Science.gov

Sample records for structure based predictive

  1. Structure based prediction of protein folding intermediates.

    PubMed

    Xie, D; Freire, E

    1994-09-01

    The complete unfolding of a protein involves the disruption of non-covalent intramolecular interactions within the protein and the subsequent hydration of the backbone and amino acid side-chains. The magnitude of the thermodynamic parameters associated with this process is known accurately for a growing number of globular proteins for which high-resolution structures are also available. The existence of this database of structural and thermodynamic information has facilitated the development of statistical procedures aimed at quantifying the relationships existing between protein structure and the thermodynamic parameters of folding/unfolding. Under some conditions proteins do not unfold completely, giving rise to states (commonly known as molten globules) in which the molecule retains some secondary structure and remains in a compact configuration after denaturation. This phenomenon is reflected in the thermodynamics of the process. Depending on the nature of the residual structure that exists after denaturation, the observed enthalpy, entropy and heat capacity changes will deviate in a particular and predictable way from the values expected for complete unfolding. For several proteins, these deviations have been shown to exhibit similar characteristics, suggesting that their equilibrium folding intermediates exhibit some common structural features. Employing empirically derived structure-energetic relationships, it is possible to identify in the native structure of the protein those regions with the higher probability of being structured in equilibrium partly folded states. In this work, a thermodynamic search algorithm aimed at identifying the structural determinants of the molten globule state has been applied to six globular proteins: alpha-lactalbumin, barnase, IIIGlc, interleukin-1 beta, phage T4 lysozyme and phage 434 repressor. Remarkably, the structural features of the predicted equilibrium intermediates coincide to a large extent with the known

  2. A protein structural classes prediction method based on predicted secondary structure and PSI-BLAST profile.

    PubMed

    Ding, Shuyan; Li, Yan; Shi, Zhuoxing; Yan, Shoujiang

    2014-02-01

    Knowledge of protein secondary structural classes plays an important role in understanding protein folding patterns. In this paper, 25 features based on position-specific scoring matrices are selected to reflect evolutionary information. In combination with 11 other rational features based on predicted protein secondary structure sequences, proposed by previous researchers, a 36-dimensional feature vector is presented to predict protein secondary structural classes for low-similarity sequences. The ASTRALtraining dataset is used to train and design our method; three other low-similarity datasets, ASTRALtest, 25PDB and 1189, are used to test it. Comparisons with other methods show that our method is effective at predicting protein secondary structural classes. A stand-alone version of the proposed method (PSSS-PSSM) is written in MATLAB and can be downloaded from http://letsgob.com/bioinfo_PSSS_PSSM/. PMID:24067326

  3. OPTIMIZATION BIAS IN ENERGY-BASED STRUCTURE PREDICTION

    PubMed Central

    Petrella, Robert J.

    2014-01-01

    Physics-based computational approaches to predicting the structure of macromolecules such as proteins are gaining increased use, but there are remaining challenges. In the current work, it is demonstrated that in energy-based prediction methods, the degree of optimization of the sampled structures can influence the prediction results. In particular, discrepancies in the degree of local sampling can bias the predictions in favor of the oversampled structures by shifting the local probability distributions of the minimum sampled energies. In simple systems, it is shown that the magnitude of the errors can be calculated from the energy surface, and for certain model systems, derived analytically. Further, it is shown that for energy wells whose forms differ only by a randomly assigned energy shift, the optimal accuracy of prediction is achieved when the sampling around each structure is equal. Energy correction terms can be used in cases of unequal sampling to reproduce the total probabilities that would occur under equal sampling, but optimal corrections only partially restore the prediction accuracy lost to unequal sampling. For multiwell systems, the determination of the correction terms is a multibody problem; it is shown that the involved cross-correlation multiple integrals can be reduced to simpler integrals. The possible implications of the current analysis for macromolecular structure prediction are discussed. PMID:25552783
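
The sampling bias described in the abstract above can be illustrated with a toy Monte Carlo sketch. It assumes two candidate structures whose sampled energies are drawn from the same standard normal distribution; this is an illustrative assumption, not the model used in the paper, and the function name and parameters are hypothetical. Because the energy distributions are identical, any preference for one structure comes only from unequal sampling, and the frequency with which structure A yields the lower minimum should approach n_a / (n_a + n_b).

```python
import random


def oversampled_win_rate(n_a, n_b, trials=5000, seed=0):
    """Fraction of trials in which structure A has the lower minimum sampled energy.

    Both structures draw energies from the same standard normal distribution
    (a toy assumption), so any bias toward A is caused by unequal sampling.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        min_a = min(rng.gauss(0.0, 1.0) for _ in range(n_a))
        min_b = min(rng.gauss(0.0, 1.0) for _ in range(n_b))
        if min_a < min_b:
            wins += 1
    return wins / trials


if __name__ == "__main__":
    # Equal sampling: neither structure is favored (rate near 0.5).
    print(oversampled_win_rate(10, 10))
    # A sampled four times as often as B: rate near 40 / (40 + 10) = 0.8.
    print(oversampled_win_rate(40, 10))
```

In this toy setting the two structures are energetically indistinguishable, so a selection rule based on the minimum sampled energy picks the oversampled structure more often purely as a sampling artifact.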

  4. Prediction of reactive hazards based on molecular structure.

    PubMed

    Saraf, S R; Rogers, W J; Mannan, M S

    2003-03-17

    There is considerable interest in prediction of reactive hazards based on chemical structure. Calorimetric measurements to determine reactivity can be resource consuming, so computational methods to predict reactivity hazards present an attractive option. This paper reviews some of the commonly employed theoretical hazard evaluation techniques, including the oxygen-balance method, ASTM CHETAH, and calculated adiabatic reaction temperature (CART). It also discusses the development of a study table to correlate and predict calorimetric properties of pure compounds. Quantitative structure-property relationships (QSPR) based on quantum mechanical calculations can be employed to correlate calorimetrically measured onset temperatures, T(o), and energies of reaction, -deltaH, with molecular properties. To test the feasibility of this approach, the QSPR technique is used to correlate differential scanning calorimeter (DSC) data, T(o) and -deltaH, with molecular properties for 19 nitro compounds. PMID:12628775
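
As a rough illustration of the oxygen-balance method mentioned in the abstract above, the sketch below uses the commonly quoted formula for C/H/N/O compounds, which assumes complete conversion to CO2, H2O and N2. The abstract does not state which variant the authors review, so the formula choice, the function name and the example compounds are assumptions made here for illustration.

```python
# Standard atomic masses in g/mol, rounded to three decimals.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def oxygen_balance(c, h, n, o):
    """Oxygen balance in percent for a compound C_c H_h N_n O_o.

    Assumes carbon goes to CO2, hydrogen to H2O and nitrogen to N2.
    Negative values mean the molecule is oxygen deficient.
    """
    molar_mass = (c * ATOMIC_MASS["C"] + h * ATOMIC_MASS["H"]
                  + n * ATOMIC_MASS["N"] + o * ATOMIC_MASS["O"])
    # Oxygen atoms needed for complete oxidation minus oxygen atoms present.
    oxygen_deficit = 2 * c + h / 2 - o
    return -1600.0 * oxygen_deficit / molar_mass


if __name__ == "__main__":
    print(f"TNT (C7H5N3O6): {oxygen_balance(7, 5, 3, 6):.1f} %")
    print(f"Nitroglycerin (C3H5N3O9): {oxygen_balance(3, 5, 3, 9):.1f} %")
```

A strongly negative value, as for TNT, indicates that the molecule carries far less oxygen than complete oxidation would require; a value near zero, as for nitroglycerin, indicates a nearly balanced composition.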

  5. Structure based activity prediction of HIV-1 reverse transcriptase inhibitors.

    PubMed

    de Jonge, Marc R; Koymans, Lucien M H; Vinkers, H Maarten; Daeyaert, Frits F D; Heeres, Jan; Lewi, Paul J; Janssen, Paul A J

    2005-03-24

    We have developed a fast and robust computational method for prediction of antiviral activity in automated de novo design of HIV-1 reverse transcriptase inhibitors. This is a structure-based approach that uses a linear relation between activity and interaction energy with discrete orientation sampling and with localized interaction energy terms. The localization allows for the analysis of mutations of the protein target and for the separation of inhibition and a specific binding to the enzyme. We apply the method to the prediction of pIC(50) of HIV-1 reverse transcriptase inhibitors. The model predicts the activity of an arbitrary compound with a q(2) of 0.681 and an average absolute error of 0.66 log value, and it is fast enough to be used in high-throughput computational applications. PMID:15771460

  6. Protein secondary structure prediction using logic-based machine learning.

    PubMed

    Muggleton, S; King, R D; Sternberg, M J

    1992-10-01

    Many attempts have been made to solve the problem of predicting protein secondary structure from the primary sequence but the best performance results are still disappointing. In this paper, the use of a machine learning algorithm which allows relational descriptions is shown to lead to improved performance. The Inductive Logic Programming computer program, Golem, was applied to learning secondary structure prediction rules for alpha/alpha domain type proteins. The input to the program consisted of 12 non-homologous proteins (1612 residues) of known structure, together with a background knowledge describing the chemical and physical properties of the residues. Golem learned a small set of rules that predict which residues are part of the alpha-helices--based on their positional relationships and chemical and physical properties. The rules were tested on four independent non-homologous proteins (416 residues) giving an accuracy of 81% (+/- 2%). This is an improvement, on identical data, over the previously reported result of 73% by King and Sternberg (1990, J. Mol. Biol., 216, 441-457) using the machine learning program PROMIS, and of 72% using the standard Garnier-Osguthorpe-Robson method. The best previously reported result in the literature for the alpha/alpha domain type is 76%, achieved using a neural net approach. Machine learning also has the advantage over neural network and statistical methods in producing more understandable results. PMID:1480619

  7. Protein-protein interface prediction based on hexagon structure similarity.

    PubMed

    Guo, Fei; Ding, Yijie; Li, Shuai Cheng; Shen, Chao; Wang, Lusheng

    2016-08-01

    Studies on protein-protein interaction are important in proteome research. Building more effective models from sequence information, structure information and physicochemical characteristics is key to protein-protein interface prediction. In this paper, we study the protein-protein interface prediction problem. We propose a novel method, based on hexagon structure similarity, for identifying interface residues of an input protein with both sequence and 3D structure information. Experiments show that our method achieves better results than some state-of-the-art methods for identifying protein-protein interfaces. Compared to existing methods, our approach improves the F-measure by at least 0.03. On a common dataset consisting of 41 complexes, our method has overall precision and recall values of 63% and 57%. On Benchmark v4.0, our method has overall precision and recall values of 55% and 56%. On CAPRI targets, our method has overall precision and recall values of 52% and 55%. PMID:26936323
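
For readers who want to relate the precision and recall figures in the abstract above to an F-measure, the following sketch computes the balanced F-measure (harmonic mean of precision and recall). The abstract does not say which F-measure variant or averaging scheme the authors used, so the values printed here are an illustration under that assumption and need not match the paper's reported numbers. The function name is hypothetical.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    beta_sq = beta * beta
    return (1.0 + beta_sq) * precision * recall / (beta_sq * precision + recall)


if __name__ == "__main__":
    # Precision/recall pairs quoted in the abstract.
    reported = {
        "41 complexes": (0.63, 0.57),
        "Benchmark v4.0": (0.55, 0.56),
        "CAPRI targets": (0.52, 0.55),
    }
    for name, (p, r) in reported.items():
        print(f"{name}: F1 = {f_measure(p, r):.2f}")
    # Prints 0.60, 0.55 and 0.53 for the three datasets, in that order.
```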

  8. A protein structural class prediction method based on novel features.

    PubMed

    Zhang, Lichao; Zhao, Xiqiang; Kong, Liang

    2013-09-01

    In this study, a 12-dimensional feature vector is constructed to reflect the general contents and spatial arrangements of the secondary structural elements of a given protein sequence. Among the 12 features, 6 novel features are specially designed to improve the prediction accuracies for α/β and α + β classes based on the distributions of α-helices and β-strands and the characteristics of parallel β-sheets and anti-parallel β-sheets. To evaluate our method, the jackknife cross-validation test is employed on two widely used datasets, 25PDB and 1189, with sequence similarity lower than 40% and 25%, respectively. Our method outperforms the recently reported methods in most cases, and the 6 newly designed features have a significant positive effect on the prediction accuracies, especially for α/β and α + β classes. PMID:23770446
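
The abstract above does not list the 12 features, so the sketch below only illustrates the general idea of turning a predicted secondary structure string into content and arrangement features. The three-state alphabet (H for helix, E for strand, C for coil), the feature names and the function name are all assumptions made for illustration and are not the features defined in the paper.

```python
from itertools import groupby


def ss_content_features(ss):
    """Simple content and segment features from a secondary structure string.

    `ss` uses H (helix), E (strand) and C (coil), one letter per residue.
    """
    if not ss:
        raise ValueError("secondary structure string must not be empty")
    n = len(ss)
    # Collapse the string into (state, run length) segments, e.g. HHHH -> ("H", 4).
    segments = [(state, len(list(run))) for state, run in groupby(ss)]

    def longest(state):
        return max((length for s, length in segments if s == state), default=0)

    def count_segments(state):
        return sum(1 for s, _ in segments if s == state)

    return {
        "helix_content": ss.count("H") / n,
        "strand_content": ss.count("E") / n,
        "longest_helix_norm": longest("H") / n,
        "longest_strand_norm": longest("E") / n,
        "helix_segments": count_segments("H"),
        "strand_segments": count_segments("E"),
    }


if __name__ == "__main__":
    print(ss_content_features("CHHHHCCEEECHH"))
```

Content fractions capture how much of the chain is helix or strand, while segment counts and longest-run lengths give a crude view of how those elements are arranged along the sequence.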

  9. SAM-T08, HMM-based protein structure prediction

    PubMed Central

    Karplus, Kevin

    2009-01-01

    The SAM-T08 web server is a protein structure prediction server that provides several useful intermediate results in addition to the final predicted 3D structure: three multiple sequence alignments of putative homologs using different iterated search procedures, prediction of local structure features including various backbone and burial properties, calibrated E-values for the significance of template searches of PDB and residue–residue contact predictions. The server has been validated as part of the CASP8 assessment of structure prediction as having good performance across all classes of predictions. The SAM-T08 server is available at http://compbio.soe.ucsc.edu/SAM_T08/T08-query.html PMID:19483096

  10. SAbPred: a structure-based antibody prediction server.

    PubMed

    Dunbar, James; Krawczyk, Konrad; Leem, Jinwoo; Marks, Claire; Nowak, Jaroslaw; Regep, Cristian; Georges, Guy; Kelm, Sebastian; Popovic, Bojana; Deane, Charlotte M

    2016-07-01

    SAbPred is a server that makes predictions of the properties of antibodies focusing on their structures. Antibody informatics tools can help improve our understanding of immune responses to disease and aid in the design and engineering of therapeutic molecules. SAbPred is a single platform containing multiple applications which can: number and align sequences; automatically generate antibody variable fragment homology models; annotate such models with estimated accuracy alongside sequence and structural properties including potential developability issues; predict paratope residues; and predict epitope patches on protein antigens. The server is available at http://opig.stats.ox.ac.uk/webapps/sabpred. PMID:27131379

  11. Structure-based mutant stability predictions on proteins of unknown structure.

    PubMed

    Gonnelli, Giulia; Rooman, Marianne; Dehouck, Yves

    2012-10-31

    The ability to rapidly and accurately predict the effects of mutations on the physicochemical properties of proteins holds tremendous importance in the rational design of modified proteins for various types of industrial, environmental or pharmaceutical applications, as well as in elucidating the genetic background of complex diseases. In many cases, the absence of an experimentally resolved structure represents a major obstacle, since most currently available predictive software crucially depends on it. We investigate here the relevance of combining coarse-grained structure-based stability predictions with a simple comparative modeling procedure. Strikingly, our results show that the use of average to high quality structural models leads to virtually no loss in predictive power compared to the use of experimental structures. Even in the case of low quality models, the decrease in performance is quite limited and this combined approach remains markedly superior to other methods based exclusively on the analysis of sequence features. PMID:22782143

  12. Improving protein structure prediction using multiple sequence-based contact predictions

    PubMed Central

    Wu, Sitao; Szilagyi, Andras; Zhang, Yang

    2011-01-01

    Summary: Although residue-residue contact maps dictate the topology of proteins, sequence-based ab initio contact predictions have found little use in actual structure prediction due to their low accuracy. We developed a composite set of nine SVM-based contact predictors which are used in I-TASSER simulation in combination with sparse template contact restraints. When testing the strategy on 273 non-homologous targets, remarkable improvements of I-TASSER models were observed for both easy and hard targets, with P-values by Student's t-test below 0.00001 and 0.001, respectively. In several cases, TM-score increases by >30%, which essentially converts “non-foldable” targets into “foldable” ones. In CASP9, I-TASSER employed ab initio contact predictions, and generated models for 26 FM targets with a GDT-score 16% and 44% higher than the second and third best servers from other groups, respectively. These findings demonstrate a new avenue to improve the accuracy of protein structure prediction, especially for free-modeling targets. PMID:21827953

  13. Structure-Based Prediction of Protein-Folding Transition Paths.

    PubMed

    Jacobs, William M; Shakhnovich, Eugene I

    2016-09-01

    We propose a general theory to describe the distribution of protein-folding transition paths. We show that transition paths follow a predictable sequence of high-free-energy transient states that are separated by free-energy barriers. Each transient state corresponds to the assembly of one or more discrete, cooperative units, which are determined directly from the native structure. We show that the transition state on a folding pathway is reached when a small number of critical contacts are formed between a specific set of substructures, after which folding proceeds downhill in free energy. This approach suggests a natural resolution for distinguishing parallel folding pathways and provides a simple means to predict the rate-limiting step in a folding reaction. Our theory identifies a common folding mechanism for proteins with diverse native structures and establishes general principles for the self-assembly of polymers with specific interactions. PMID:27602721

  14. Finite Element Based HWB Centerbody Structural Optimization and Weight Prediction

    NASA Technical Reports Server (NTRS)

    Gern, Frank H.

    2012-01-01

    This paper describes a scalable structural model suitable for Hybrid Wing Body (HWB) centerbody analysis and optimization. The geometry of the centerbody and primary wing structure is based on a Vehicle Sketch Pad (VSP) surface model of the aircraft and a FLOPS compatible parameterization of the centerbody. Structural analysis, optimization, and weight calculation are based on a Nastran finite element model of the primary HWB structural components, featuring centerbody, mid section, and outboard wing. Different centerbody designs like single bay or multi-bay options are analyzed and weight calculations are compared to current FLOPS results. For proper structural sizing and weight estimation, internal pressure and maneuver flight loads are applied. Results are presented for aerodynamic loads, deformations, and centerbody weight.

  15. Strain Concentration at Structural Discontinuities and Its Prediction Based on Characteristics of Compliance Change in Structures

    NASA Astrophysics Data System (ADS)

    Kasahara, Naoto

    Elevated temperature structural design codes pay attention to strain concentration at structural discontinuities due to creep and plasticity, since it causes an increase in creep-fatigue damage of materials. One of the difficulties in predicting strain concentration is its dependence on the magnitude of loading, the constitutive equations, and the duration of loading. In this study, the author investigated the fundamental mechanism of strain concentration and its main factors. The results revealed that strain concentration is caused by strain redistribution between elastic and inelastic regions, which can be quantified by the characteristics of structural compliance. The characteristics of structural compliance are controlled by the elastic region in structures and are insensitive to constitutive equations. This means that inelastic analysis can be easily applied to obtain compliance characteristics. By utilizing this fact, a simplified inelastic analysis method was proposed based on the characteristics of compliance change for the prediction of strain concentration.

  16. Shape and secondary structure prediction for ncRNAs including pseudoknots based on linear SVM

    PubMed Central

    2013-01-01

    Background: Accurate secondary structure prediction provides important information for understanding the tertiary structures and thus the functions of ncRNAs. However, the accuracy of the native structure derivation of ncRNAs is still not satisfactory, especially on sequences containing pseudoknots. It has recently been shown that using abstract shapes, which retain adjacency and nesting of structural features but disregard the length details of helix and loop regions, can improve the performance of structure prediction. In this work, we use SVM-based feature selection to derive the consensus abstract shape of homologous ncRNAs and apply the predicted shape to structure prediction including pseudoknots. Results: Our approach was applied to predict shapes and secondary structures on hundreds of ncRNA data sets with and without pseudoknots. The experimental results show that we can achieve 18% higher accuracy in shape prediction than the state-of-the-art consensus shape prediction tools. Using predicted shapes in structure prediction allows us to achieve approximately 29% higher sensitivity and 10% higher positive predictive value than other pseudoknot prediction tools. Conclusions: Extensive analysis of RNA properties based on SVM allows us to identify important properties of sequences and structures related to their shapes. The combination of mass data analysis and SVM-based feature selection makes our approach a promising method for shape and structure prediction. The implemented tools, Knot Shape and Knot Structure, are open source software and can be downloaded at: http://www.cse.msu.edu/~achawana/KnotShape. PMID:23369147

  17. Structure-Based Predictive model for Coal Char Combustion.

    SciTech Connect

    Hurt, R.; Colo, J; Essenhigh, R.; Hadad, C; Stanley, E.

    1997-09-24

    During the third quarter of this project, progress was made on both major technical tasks. Progress was made in the chemistry department at OSU on the calculation of thermodynamic properties for a number of model organic compounds. Modelling work was carried out at Brown to adapt a thermodynamic model of carbonaceous mesophase formation, originally applied to pitch carbonization, to the prediction of coke texture in coal combustion. This latter work makes use of the FG-DVC model of coal pyrolysis developed by Advanced Fuel Research to specify the pool of aromatic clusters that participate in the order/disorder transition. This modelling approach shows promise for the mechanistic prediction of the rank dependence of char structure and will therefore be pursued further. Crystalline ordering phenomena were also observed in a model char prepared from phenol-formaldehyde carbonized at 900 °C and 1300 °C using high-resolution TEM fringe imaging. Dramatic changes occur in the structure between 900 and 1300 °C, making this char a suitable candidate for upcoming in situ work on the hot stage TEM. Work also proceeded on molecular dynamics simulations at Boston University and on equipment modification and testing for the combustion experiments with widely varying flame types at Ohio State.

  18. An RNA secondary structure prediction method based on minimum and suboptimal free energy structures.

    PubMed

    Fu, Haoyue; Yang, Lianping; Zhang, Xiangde

    2015-09-01

    The function of an RNA molecule is mainly determined by its tertiary structure, and its secondary structure is an important determinant of that tertiary structure. Comparative methods usually give better results than single-sequence methods. Based on minimum and suboptimal free energy structures, the paper presents a novel method for predicting the conserved secondary structure of a group of related RNAs. In the method, the information from known RNA structures is used as training data in an SVM (Support Vector Machine) classifier. Our method has been tested on the benchmark dataset given by Puton et al. The results show that the average sensitivity of our method is higher than that of other comparative methods such as CentroidAlifold, MXSCARNA, RNAalifold, and TurboFold. PMID:26100179

  19. Structural kinematics based damage zone prediction in gradient structures using vibration database

    NASA Astrophysics Data System (ADS)

    Talha, Mohammad; Ashokkumar, Chimpalthradi R.

    2014-05-01

    To explore the applications of functionally graded materials (FGMs) in dynamic structures, a structural kinematics-based health monitoring technique becomes important. In this paper, the ability of the material to withstand dynamic loads is inferred from the displacements in three dimensions, specifically the net compressive and tensile displacements that each structural degree of freedom undergoes. These net displacements at each finite element node predict damage zones of the FGM where the material is likely to fail due to a vibration response, which is categorized according to loading condition. The damage zone prediction for a dynamically active FGM plate has been accomplished using Reddy's higher-order theory. The constituent material properties are assumed to vary in the thickness direction according to a power law. The proposed C0 finite element model (FEM) is applied to obtain net tensile and compressive displacement distributions across the structure. A plate made of aluminum/zirconia is considered to illustrate the concept of structural kinematics-based health monitoring of FGMs.
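
The power-law variation of material properties through the thickness, mentioned in the abstract above, is often written as P(z) = P_m + (P_c - P_m)(z/h + 1/2)^n for -h/2 <= z <= h/2. The abstract does not give the authors' exact expression, so this form, the function name and the Young's modulus values below are illustrative assumptions rather than the paper's inputs.

```python
def power_law_property(z, h, p_ceramic, p_metal, n):
    """Effective property at height z in an FGM plate of thickness h.

    Assumes the ceramic volume fraction is (z/h + 1/2)**n, so the bottom
    surface (z = -h/2) is pure metal and the top surface (z = +h/2) is
    pure ceramic. The exponent n must be positive.
    """
    if n <= 0:
        raise ValueError("power-law exponent n must be positive")
    if not -h / 2 <= z <= h / 2:
        raise ValueError("z must lie within the plate thickness")
    ceramic_fraction = (z / h + 0.5) ** n
    return p_metal + (p_ceramic - p_metal) * ceramic_fraction


if __name__ == "__main__":
    # Illustrative Young's moduli in GPa (assumed values).
    e_metal, e_ceramic = 70.0, 151.0
    for z in (-0.5, -0.25, 0.0, 0.25, 0.5):
        e = power_law_property(z, h=1.0, p_ceramic=e_ceramic, p_metal=e_metal, n=2.0)
        print(f"z/h = {z:+.2f}: E = {e:.1f} GPa")
```

With n = 1 the property varies linearly between the two faces; larger exponents shift the plate toward metal-rich behavior over most of the thickness.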

  20. STRUCTURE-BASED PREDICTIVE MODEL FOR COAL CHAR COMBUSTION

    SciTech Connect

    CHRISTOPHER M. HADAD; JOSEPH M. CALO; ROBERT H. ESSENHIGH; ROBERT H. HURT

    1998-06-04

    During the past quarter of this project, significant progress continued to be made on both major technical tasks. Progress was made at OSU on advancing the application of computational chemistry to oxidative attack on model polyaromatic hydrocarbons (PAHs) and graphitic structures. This work is directed at the application of quantitative ab initio molecular orbital theory to address the decomposition products and mechanisms of coal char reactivity. Previously, it was shown that the "hybrid" B3LYP method can be used to provide quantitative information concerning the stability of the corresponding radicals that arise by hydrogen atom abstraction from monocyclic aromatic rings. In the most recent quarter, these approaches have been extended to larger carbocyclic ring systems, such as coronene, in order to compare the properties of a large carbonaceous PAH to those of the smaller, monocyclic aromatic systems. It was concluded that, at least for bond dissociation energy considerations, the properties of the large PAHs can be modeled reasonably well by smaller systems. In addition to the preceding work, investigations were initiated on the interaction of selected radicals in the "radical pool" with the different types of aromatic structures. In particular, the different pathways for addition vs. abstraction to benzene and furan by H and OH radicals were examined. Thus far, the addition channel appears to be significantly favored over abstraction on both kinetic and thermochemical grounds. Experimental work at Brown University in support of the development of predictive structural models of coal char combustion was focused on elucidating the role of coal mineral matter impurities on reactivity. An "inverse" approach was used where a carbon material was doped with coal mineral matter. The carbon material was derived from a high carbon content fly ash (Fly Ash 23 from the Salem Basin Power Plant). The ash was obtained from Pittsburgh #8 coal (PSOC 1451). Doped

  1. STRUCTURE-BASED PREDICTIVE MODEL FOR COAL CHAR COMBUSTION

    SciTech Connect

    CHRISTOPHER M. HADAD; JOSEPH M. CALO; ROBERT H. ESSENHIGH; ROBERT H. HURT

    1998-09-11

    Progress was made this period on a number of tasks. A significant advance was made in the incorporation of macrostructural ideas into high temperature combustion models. Work at OSU by R. Essenhigh in collaboration with the University of Stuttgart has led to a theory that the zone I/II transition in char combustion lies within the range of conditions of interest for pulverized char combustion. The group has presented evidence that some combustion data, previously interpreted with zone II models, in fact fall in the transition from zone II to zone I. This idea was used at Brown to make modifications to the CBK model (a char kinetics package specially designed for carbon burnout prediction, currently used by a number of research and furnace modeling groups in academia and industry). The resulting new model version, CBK8, shows improved ability to predict extinction behavior in the late stages of combustion, especially for particles with low ash content. The full development and release of CBK8, along with detailed descriptions of the role of the zone I/II transition, will be reported on in subsequent reports. ABB-CE is currently implementing CBK7 into a special version of the CFD code Fluent for use in the modeling and design of their boilers. They have been apprised of the development, and have expressed interest in incorporating the new feature, realizing full CBK8 capabilities in their combustion codes. The computational chemistry task at OSU continued to study oxidative pathways for PAH, with emphasis this period on heteroatom-containing ring compounds. Preliminary XPS studies were also carried out. Combustion experiments were also carried out at OSU this period, leading to the acquisition of samples at various residence times and the measurement of their oxidation reactivity by nonisothermal TGA techniques. Several members of the project team attended the Carbon Conference this period and made contacts with representatives from the new FETC Consortium

  2. Structure Based Predictive Model for Coal Char Combustion

    SciTech Connect

    Robert Hurt; Joseph Calo; Robert Essenhigh; Christopher Hadad

    2000-12-30

    This unique collaborative project has taken a very fundamental look at the origin of the structure and combustion reactivity of coal chars. It was a combined experimental and theoretical effort involving three universities and collaborators from universities outside the U.S. and from U.S. National Laboratories and contract research companies. The project goal was to improve our understanding of char structure and behavior by examining the fundamental chemistry of its polyaromatic building blocks. The project team investigated the elementary oxidative attack on polyaromatic systems, coupled with a study of the assembly processes that convert these polyaromatic clusters to mature carbon materials (or chars). We believe that the work done in this project has defined a powerful new science-based approach to the understanding of char behavior. The work on aromatic oxidation pathways made extensive use of computational chemistry, and was led by Professor Christopher Hadad in the Department of Chemistry at Ohio State University. Laboratory experiments on char structure, properties, and combustion reactivity were carried out at both OSU and Brown, led by Principal Investigators Joseph Calo, Robert Essenhigh, and Robert Hurt. Modeling activities were divided into two parts: first, unique models of crystal structure development were formulated by the team at Brown (PIs Hurt and Calo) with input from Boston University and significant collaboration with Dr. Alan Kerstein at Sandia and with Dr. Zhong-Ying Chen at SAIC. Second, new combustion models were developed and tested, led by Professor Essenhigh at OSU, Dieter Foertsch (a collaborator at the University of Stuttgart), and Professor Hurt at Brown. One product of this work is the CBK8 model of carbon burnout, which has already found practical use in CFD codes and in other numerical models of pulverized fuel combustion processes, such as EPRI's NOxLOI Predictor. The remainder of the report consists of detailed technical

  3. Energy-based RNA consensus secondary structure prediction in multiple sequence alignments.

    PubMed

    Washietl, Stefan; Bernhart, Stephan H; Kellis, Manolis

    2014-01-01

    Many biologically important RNA structures are conserved in evolution, leading to characteristic mutational patterns. RNAalifold is a widely used program to predict consensus secondary structures in multiple alignments by combining evolutionary information with traditional energy-based RNA folding algorithms. Here we describe the theory and applications of the RNAalifold algorithm. Consensus secondary structure prediction not only leads to significantly more accurate structure models, but also allows the structural conservation of functional RNAs to be studied. PMID:24639158

  4. The effect of ligand-based tautomer and protomer prediction on structure-based virtual screening.

    PubMed

    Kalliokoski, Tuomo; Salo, Heikki S; Lahtela-Kakkonen, Maija; Poso, Antti

    2009-12-01

    As tautomerism and ionization may significantly change the interaction possibilities between a ligand and a target protein, these phenomena could have an effect on structure-based virtual screening. Tautomeric- and protonation-state enumeration ensures that the state with optimal interaction possibilities is included in the screening process, as the predicted state may not always be the optimal binder. However, very little has been published on whether tautomer and protomer enumeration actually improves the enrichment of active molecules compared to the alternative of using a predicted form of each molecule. In this study, a retrospective virtual screening was performed using AutoDock on 19 drug targets with a publicly available data set. It is proposed that tautomer and protomer prediction can save significant computing resources and can yield results similar to enumeration. PMID:19928753

  5. A permutation based simulated annealing algorithm to predict pseudoknotted RNA secondary structures.

    PubMed

    Tsang, Herbert H; Wiese, Kay C

    2015-01-01

    Pseudoknots are RNA tertiary structures which perform essential biological functions. This paper discusses SARNA-Predict-pk, an RNA pseudoknotted secondary structure prediction algorithm based on Simulated Annealing (SA). The research presented here extends previous work on SARNA-Predict and further examines the effect of the new algorithm to include prediction of RNA secondary structure with pseudoknots. An evaluation of the performance of SARNA-Predict-pk in terms of prediction accuracy is made via comparison with several state-of-the-art prediction algorithms using 20 individual known structures from seven RNA classes. We measured the sensitivity and specificity of nine prediction algorithms. Three of these are dynamic programming algorithms: Pseudoknot (pknotsRE), NUPACK, and pknotsRG-mfe. One uses the statistical clustering approach (Sfold), and the other five are heuristic algorithms: SARNA-Predict-pk, ILM, STAR, IPknot and HotKnots. The results presented in this paper demonstrate that SARNA-Predict-pk can out-perform other state-of-the-art algorithms in terms of prediction accuracy. This supports the use of the proposed method on pseudoknotted RNA secondary structure prediction of other known structures. PMID:26558299

  6. HYPLOSP: a knowledge-based approach to protein local structure prediction.

    PubMed

    Chen, Ching-Tai; Lin, Hsin-Nan; Sung, Ting-Yi; Hsu, Wen-Lian

    2006-12-01

    Local structure prediction can facilitate ab initio structure prediction, protein threading, and remote homology detection. However, the accuracy of existing methods is limited. In this paper, we propose a knowledge-based prediction method that assigns a measure called the local match rate to each position of an amino acid sequence to estimate the confidence of our method. Empirically, the accuracy of the method correlates positively with the local match rate; therefore, we employ it to predict the local structures of positions with a high local match rate. For positions with a low local match rate, we propose a neural network prediction method. To better utilize the knowledge-based and neural network methods, we design a hybrid prediction method, HYPLOSP (HYbrid method to Protein LOcal Structure Prediction) that combines both methods. To evaluate the performance of the proposed methods, we first perform cross-validation experiments by applying our knowledge-based method, a neural network method, and HYPLOSP to a large dataset of 3,925 protein chains. We test our methods extensively on three different structural alphabets and evaluate their performance by two widely used criteria, Maximum Deviation of backbone torsion Angle (MDA) and Q(N), which is similar to Q(3) in secondary structure prediction. We then compare HYPLOSP with three previous studies using a dataset of 56 new protein chains. HYPLOSP shows promising results in terms of MDA and Q(N) accuracy and demonstrates its alphabet-independent capability. PMID:17245815

  7. AWSEM-MD: Protein Structure Prediction Using Coarse-grained Physical Potentials and Bioinformatically Based Local Structure Biasing

    PubMed Central

    Davtyan, Aram; Schafer, Nicholas P.; Zheng, Weihua; Clementi, Cecilia; Wolynes, Peter G.; Papoian, Garegin A.

    2012-01-01

    The Associative memory, Water mediated, Structure and Energy Model (AWSEM) is a coarse-grained protein force field. AWSEM contains physically motivated terms, such as hydrogen bonding, as well as a bioinformatically based local structure biasing term, which efficiently takes into account many-body effects that are modulated by the local sequence. When combined with appropriate local or global alignments to choose memories, AWSEM can be used to perform de novo protein structure prediction. Herein we present structure prediction results for a particular choice of local sequence alignment method based on short residue sequences called fragments. We demonstrate the model’s structure prediction capabilities for three levels of global homology between the target sequence and those proteins used for local structure biasing, all of which assume that the structure of the target sequence is not known. When there are no homologs in the database of structures used for local structure biasing, AWSEM calculations produce structural predictions that are somewhat improved compared with prior works using related approaches. The inclusion of a small number of structures from homologous sequences improves structure prediction only marginally but when the fragment search is restricted to only homologous sequences, AWSEM can perform high resolution structure prediction and can be used for kinetics and dynamics studies. PMID:22545654

  8. STRUCTURE-BASED PREDICTIVE MODEL FOR COAL CHAR COMBUSTION

    SciTech Connect

    CHRISTOPHER M. HADAD; JOSEPH M. CALO; ROBERT H. ESSENHIGH; ROBERT H. HURT

    1999-01-13

    Significant progress continued to be made during the past reporting quarter on both major technical tasks. During the reporting period at OSU, computational investigations were conducted of addition vs. abstraction reactions of H, O(3P), and OH with monocyclic aromatic hydrocarbons. The potential energy surfaces for more than 80 unique reactions of H, O(3P), and OH with aromatic hydrocarbons were determined at the B3LYP/6-31G(d) level of theory. The calculated transition state barriers and reaction free energies indicate that the addition channel is preferred at 298 K, but that the abstraction channel becomes dominant at high temperatures. The thermodynamic preference for reactivity with aromatic hydrocarbons increases in the order O(3P) < H < OH. Abstraction from six-membered aromatic rings is more facile than abstraction from five-membered aromatic rings. However, addition to five-membered rings is thermodynamically more favorable than addition to six-membered rings. The free energies for the abstraction and addition reactions of H, O, and OH with aromatic hydrocarbons and the characteristics of the respective transition states can be used to calculate the reaction rate constants for these important combustion reactions. In experimental work at Brown University, the effect of reaction on the structural evolution of different chars (i.e., phenolic resin char and chars produced from three different coals) has been investigated in a TGA/TPD-MS system. It has been found that samples of these chars of different ages appeared to lose their "memory" concerning their initial structures at high burn-offs. During the reporting period, thermal desorption experiments of selected samples were conducted. These spectra show that the populations of low temperature oxygen surface complexes, which are primarily responsible for reactivity, are more similar for the high burn-off than for the low burn-off samples of different ages; i.e., the population of active sites are more

  9. PSRna: Prediction of small RNA secondary structures based on reverse complementary folding method.

    PubMed

    Li, Jin; Xu, Chengzhen; Wang, Lei; Liang, Hong; Feng, Weixing; Cai, Zhongxi; Wang, Ying; Cong, Wang; Liu, Yunlong

    2016-08-01

    Prediction of RNA secondary structures is an important problem in computational biology and bioinformatics, since RNA secondary structures are fundamental for functional analysis of RNA molecules. However, small RNA secondary structures are scarce and few algorithms have been specifically designed for predicting the secondary structures of small RNAs. Here we propose an algorithm named "PSRna" for predicting small-RNA secondary structures using reverse complementary folding and characteristic hairpin loops of small RNAs. Unlike traditional algorithms that usually generate multi-branch loops and 5′ end self-folding, PSRna first estimates the maximum number of base pairs of RNA secondary structures based on the dynamic programming algorithm, and a path matrix is constructed at the same time. Second, the backtracking paths are extracted from the path matrix based on a backtracking algorithm, and each backtracking path represents a secondary structure. To improve accuracy, the predicted RNA secondary structures are filtered based on their free energy, where only the secondary structure with the minimum free energy is identified as the candidate secondary structure. Our experiments on real data show that the proposed algorithm is superior to two popular methods, RNAfold and RNAstructure, in terms of sensitivity, specificity and Matthews correlation coefficient (MCC). PMID:27045556
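
    The dynamic programming step mentioned in this abstract (estimating the maximum number of base pairs) can be illustrated with the classic Nussinov-style recurrence. The sketch below is a generic illustration, not PSRna itself: the function name, the allowed pair set (Watson-Crick plus G-U wobble) and the `min_loop` hairpin constraint are assumptions, and the path matrix, backtracking and free-energy filtering described above are not reproduced.

```python
def max_base_pairs(seq, min_loop=3):
    """Nussinov-style DP: maximum number of nested base pairs in `seq`.

    Assumptions (not taken from the abstract): Watson-Crick and G-U wobble
    pairs are allowed, and a hairpin must enclose at least `min_loop` bases.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = maximum number of pairs in the subsequence seq[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            # case 2: base j pairs with some k, leaving > min_loop bases between
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]
```

    For example, `max_base_pairs("GGGAAAUCC")` finds three stacked pairs closing a three-base hairpin loop.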

  10. Protein Structure Prediction using a Docking-based Hierarchical Folding scheme

    PubMed Central

    Kifer, Ilona; Nussinov, Ruth; Wolfson, Haim J.

    2011-01-01

    The pathways by which proteins fold into their specific native structure are still an unsolved mystery. Currently many methods for protein structure prediction are available, and most of them tackle the problem by relying on the vast amounts of data collected from known protein structures. These methods are often not concerned with the route the protein follows to reach its final fold. This work is based on the premise that proteins fold in a hierarchical manner. We present FOBIA, an automated method for predicting a protein structure. FOBIA consists of two main stages: the first finds matches between parts of the target sequence and independently-folding structural units using profile-profile comparison. The second assembles these units into a 3D structure by searching and ranking their possible orientations towards each other using a docking-based approach. We have previously reported an application of an initial version of this strategy to homology based targets. Since then we have considerably enhanced our method’s abilities to allow it to address the more difficult template-based target category. This allows us to now apply FOBIA to the Template-Based targets of CASP8 and to show that it is both very efficient and promising. Our method can provide an alternative for Template-Based structure prediction, and in particular, the docking-based ranking technique presented here can be incorporated into any profile-profile comparison based method. PMID:21445943

  11. Protein structure prediction using a docking-based hierarchical folding scheme.

    PubMed

    Kifer, Ilona; Nussinov, Ruth; Wolfson, Haim J

    2011-06-01

    The pathways by which proteins fold into their specific native structure are still an unsolved mystery. Currently, many methods for protein structure prediction are available, and most of them tackle the problem by relying on the vast amounts of data collected from known protein structures. These methods are often not concerned with the route the protein follows to reach its final fold. This work is based on the premise that proteins fold in a hierarchical manner. We present FOBIA, an automated method for predicting a protein structure. FOBIA consists of two main stages: the first finds matches between parts of the target sequence and independently folding structural units using profile-profile comparison. The second assembles these units into a 3D structure by searching and ranking their possible orientations toward each other using a docking-based approach. We have previously reported an application of an initial version of this strategy to homology based targets. Since then we have considerably enhanced our method's abilities to allow it to address the more difficult template-based target category. This allows us to now apply FOBIA to the template-based targets of CASP8 and to show that it is both very efficient and promising. Our method can provide an alternative for template-based structure prediction, and in particular, the docking-based ranking technique presented here can be incorporated into any profile-profile comparison based method. PMID:21445943

  12. Structural protein descriptors in 1-dimension and their sequence-based predictions.

    PubMed

    Kurgan, Lukasz; Disfani, Fatemeh Miri

    2011-09-01

    The last few decades observed an increasing interest in development and application of 1-dimensional (1D) descriptors of protein structure. These descriptors project 3D structural features onto 1D strings of residue-wise structural assignments. They cover a wide-range of structural aspects including conformation of the backbone, burying depth/solvent exposure and flexibility of residues, and inter-chain residue-residue contacts. We perform first-of-its-kind comprehensive comparative review of the existing 1D structural descriptors. We define, review and categorize ten structural descriptors and we also describe, summarize and contrast over eighty computational models that are used to predict these descriptors from the protein sequences. We show that the majority of the recent sequence-based predictors utilize machine learning models, with the most popular being neural networks, support vector machines, hidden Markov models, and support vector and linear regressions. These methods provide high-throughput predictions and most of them are accessible to a non-expert user via web servers and/or stand-alone software packages. We empirically evaluate several recent sequence-based predictors of secondary structure, disorder, and solvent accessibility descriptors using a benchmark set based on CASP8 targets. Our analysis shows that the secondary structure can be predicted with over 80% accuracy and segment overlap (SOV), disorder with over 0.9 AUC, 0.6 Matthews Correlation Coefficient (MCC), and 75% SOV, and relative solvent accessibility with PCC of 0.7 and MCC of 0.6 (0.86 when homology is used). We demonstrate that the secondary structure predicted from sequence without the use of homology modeling is as good as the structure extracted from the 3D folds predicted by top-performing template-based methods. PMID:21787299

  13. Link prediction based on hyperbolic mapping with community structure for complex networks

    NASA Astrophysics Data System (ADS)

    Wang, Zuxi; Wu, Yao; Li, Qingguang; Jin, Fengdong; Xiong, Wei

    2016-05-01

    Link prediction has become a topic of growing interest in the complex network field in recent years. However, the existing link prediction methods are unsatisfactory for processing topological information and have high time complexity. This paper presents a novel method of Link Prediction with Community Structure (LPCS) based on hyperbolic mapping. Unlike the existing link prediction methods, LPCS treats the network from an overall perspective in order to utilize its global structure information. LPCS takes full advantage of the community structure and its hierarchical organization to map networks into hyperbolic space, obtains the hyperbolic coordinates which depict the global structure information of the network, then uses hyperbolic distance to describe the similarity between the nodes, and finally predicts missing links according to the degree of similarity between unconnected node pairs. The combination of the hyperbolic geometry framework and the community structure makes LPCS perform well in predicting missing links, and the time complexity of LPCS is linear, which allows LPCS to be applied to large scale networks in acceptable time. LPCS outperforms many state-of-the-art link prediction methods in networks obeying a power-law degree distribution.
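
    The final step described here, ranking unconnected node pairs by hyperbolic distance, can be sketched as follows. This is only an illustration under stated assumptions: each node is assumed to already have polar coordinates (r, theta) in a hyperbolic plane of curvature -1, the distance is the standard hyperbolic law of cosines, and the names `hyperbolic_distance` and `rank_candidate_links` are hypothetical. How LPCS derives the coordinates from community structure is not given in the abstract and is not reproduced.

```python
import math
from itertools import combinations

def hyperbolic_distance(r1, theta1, r2, theta2):
    """Distance between two points in polar coordinates (curvature -1)."""
    # Angular separation folded into [0, pi]
    dtheta = math.pi - abs(math.pi - abs(theta1 - theta2) % (2 * math.pi))
    if dtheta == 0.0:
        return abs(r1 - r2)  # points on the same ray
    arg = (math.cosh(r1) * math.cosh(r2)
           - math.sinh(r1) * math.sinh(r2) * math.cos(dtheta))
    return math.acosh(max(arg, 1.0))  # clamp guards against rounding below 1

def rank_candidate_links(coords, edges):
    """Return unconnected node pairs, most similar (closest) first.

    coords: dict mapping node -> (r, theta); edges: iterable of node pairs.
    """
    existing = {frozenset(e) for e in edges}
    candidates = [
        (hyperbolic_distance(*coords[u], *coords[v]), u, v)
        for u, v in combinations(sorted(coords), 2)
        if frozenset((u, v)) not in existing
    ]
    return [(u, v) for _, u, v in sorted(candidates)]
```

    A smaller distance is read as higher similarity, so the first pairs returned are the most likely missing links under this assumption.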

  14. Relative packing groups in template-based structure prediction: cooperative effects of true positive constraints.

    PubMed

    Day, Ryan; Qu, Xiaotao; Swanson, Rosemarie; Bohannan, Zach; Bliss, Robert; Tsai, Jerry

    2011-01-01

    Most current template-based structure prediction methods concentrate on finding the correct backbone conformation and then packing sidechains within that backbone. Our packing-based method derives distance constraints from conserved relative packing groups (RPGs). In our refinement approach, the RPGs provide a level of resolution that restrains global topology while allowing conformational sampling. In this study, we test our template-based structure prediction method using 51 prediction units from CASP7 experiments. RPG-based constraints are able to substantially improve approximately two-thirds of starting templates. Upon deeper investigation, we find that true positive spatial constraints, especially those non-local in sequence, derived from the RPGs were important to building nearer native models. Surprisingly, the fraction of incorrect or false positive constraints does not strongly influence the quality of the final candidate. This result indicates that our RPG-based true positive constraints sample the self-consistent, cooperative interactions of the native structure. The lack of such reinforcing cooperativity explains the weaker effect of false positive constraints. Generally, these findings are encouraging indications that RPGs will improve template-based structure prediction. PMID:21210729

  15. A novel method for structure-based prediction of ion channel conductance properties.

    PubMed Central

    Smart, O S; Breed, J; Smith, G R; Sansom, M S

    1997-01-01

    A rapid and easy-to-use method of predicting the conductance of an ion channel from its three-dimensional structure is presented. The method combines the pore dimensions of the channel as measured in the HOLE program with an Ohmic model of conductance. An empirically based correction factor is then applied. The method yielded good results for six experimental channel structures (none of which were included in the training set) with predictions accurate to within an average factor of 1.62 to the true values. The predictive r² was equal to 0.90, which is indicative of a good predictive ability. The procedure is used to validate model structures of alamethicin and phospholamban. Two genuine predictions for the conductance of channels with known structure but without reported conductances are given. A modification of the procedure that calculates the expected results for the effect of the addition of nonelectrolyte polymers on conductance is set out. Results for a cholera toxin B-subunit crystal structure agree well with the measured values. The difficulty in interpreting such studies is discussed, with the conclusion that measurements on channels of known structure are required. PMID:9138559
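
    The Ohmic model mentioned in this abstract can be illustrated by treating the pore as a stack of thin cylindrical slabs in series, each contributing resistance rho * dz / (pi * r^2). The sketch below assumes a pore radius profile sampled at equal steps along the channel axis and a bulk electrolyte resistivity supplied by the caller; the function name and parameters are hypothetical, and the empirically based correction factor described in the abstract is not reproduced.

```python
import math

def ohmic_conductance(radii_m, dz_m, resistivity_ohm_m):
    """Uncorrected Ohmic conductance (siemens) of a pore.

    radii_m: pore radius (metres) at successive positions along the axis
    dz_m: spacing between samples (metres), assumed uniform
    resistivity_ohm_m: bulk resistivity of the electrolyte (ohm metres)
    """
    if not radii_m or any(r <= 0 for r in radii_m):
        raise ValueError("radius profile must be non-empty and positive")
    # Slabs are in series, so their resistances add
    resistance = sum(
        resistivity_ohm_m * dz_m / (math.pi * r * r) for r in radii_m
    )
    return 1.0 / resistance
```

    Because the resistances add in series, the narrowest part of the pore dominates the result, which is why the radius profile matters more than the average radius.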

  16. Structural link prediction based on ant colony approach in social networks

    NASA Astrophysics Data System (ADS)

    Sherkat, Ehsan; Rahgozar, Maseud; Asadpour, Masoud

    2015-02-01

    As the size and number of online social networks are increasing day by day, social network analysis has become a popular issue in many branches of science. Link prediction is one of the key rolling issues in the analysis of social network evolution. As the size of social networks is increasing, the necessity for scalable link prediction algorithms is being felt more. The aim of this paper is to introduce a new unsupervised structural link prediction algorithm based on the ant colony approach. Recently, the ant colony approach has been used for solving some graph problems. Different kinds of networks are used for testing the proposed approach. In some networks, the proposed scalable algorithm has the best result in comparison to other structural unsupervised link prediction algorithms. In order to evaluate the algorithm results, methods like the top-n precision, area under the Receiver Operating Characteristic (ROC) and Precision-Recall curves are carried out on real-world networks.

  17. Genomic-scale comparison of sequence- and structure-based methods of function prediction: Does structure provide additional insight?

    PubMed Central

    Fetrow, Jacquelyn S.; Siew, Naomi; Di Gennaro, Jeannine A.; Martinez-Yamout, Maria; Dyson, H. Jane; Skolnick, Jeffrey

    2001-01-01

    A function annotation method using the sequence-to-structure-to-function paradigm is applied to the identification of all disulfide oxidoreductases in the Saccharomyces cerevisiae genome. The method identifies 27 sequences as potential disulfide oxidoreductases. All previously known thioredoxins, glutaredoxins, and disulfide isomerases are correctly identified. Three of the 27 predictions are probable false-positives. Three novel predictions, which subsequently have been experimentally validated, are presented. Two additional novel predictions suggest a disulfide oxidoreductase regulatory mechanism for two subunits (OST3 and OST6) of the yeast oligosaccharyltransferase complex. Based on homology, this prediction can be extended to a potential tumor suppressor gene, N33, in humans, whose biochemical function was not previously known. Attempts to obtain a folded, active N33 construct to test the prediction were unsuccessful. The results show that structure prediction coupled with biochemically relevant structural motifs is a powerful method for the function annotation of genome sequences and can provide more detailed, robust predictions than function prediction methods that rely on sequence comparison alone. PMID:11316881

  18. A protein structural classes prediction method based on PSI-BLAST profile.

    PubMed

    Ding, Shuyan; Yan, Shoujiang; Qi, Shuhua; Li, Yan; Yao, Yuhua

    2014-07-21

    Knowledge of protein structural classes plays an important role in understanding protein folding patterns. Prediction of protein structural class based solely on sequence data remains a challenging problem. In this study, we extract the long-range correlation information and linear correlation information from the position-specific score matrix (PSSM). A total of 3600 features are extracted; then 278 features are selected by a filter feature selection method based on the 1189 dataset. To verify the performance of our method (named LCC-PSSM), jackknife tests are performed on three widely used low similarity benchmark datasets. Comparison of our results with the existing methods shows that our method provides favorable performance for protein structural class prediction. A stand-alone version of the proposed method (LCC-PSSM) is written in the MATLAB language and can be downloaded from http://bioinfo.zstu.edu.cn/LCC-PSSM/. PMID:24607742

  19. Structural predictions based on the compositions of cathodic materials by first-principles calculations

    NASA Astrophysics Data System (ADS)

    Li, Yang; Lian, Fang; Chen, Ning; Hao, Zhen-jia; Chou, Kuo-chih

    2015-05-01

    A first-principles method is applied to comparatively study the stability of lithium metal oxides with layered or spinel structures to predict the most energetically favorable structure for different compositions. The binding and reaction energies of the real or virtual layered LiMO2 and spinel LiM2O4 (M = Sc-Cu, Y-Ag, Mg-Sr, and Al-In) are calculated. The effect of element M on the structural stability, especially in the case of multiple-cation compounds, is discussed herein. The calculation results indicate that the phase stability depends on both the binding and reaction energies. The oxidation state of element M also plays a role in determining the dominant structure, i.e., layered or spinel phase. Moreover, calculation-based theoretical predictions of the phase stability of the doped materials agree with the previously reported experimental data.

  20. An integrative structure-based framework for predicting biological effects mediated by antipeptide antibodies.

    PubMed

    Caoili, Salvador Eugenio C

    2015-12-01

    A general framework is presented for predicting quantitative biological effects mediated by antipeptide antibodies, primarily on the basis of antigen structure (possibly featuring intrinsic disorder) analyzed to estimate epitope-paratope binding affinities, which in turn is considered within the context of dose-response relationships as regards antibody concentration. This is illustrated mainly using an approach based on protein structural energetics, whereby expected amounts of solvent-accessible surface area buried upon epitope-paratope binding are related to the corresponding binding affinity, which is estimated from putative B-cell epitope structure with implicit treatment of paratope structure, for antipeptide antibodies either reacting with peptides or cross-reacting with cognate protein antigens. Key methods described are implemented in SAPPHIRE/SUITE (Structural-energetic Analysis Program for Predicting Humoral Immune Response Epitopes/SAPPHIRE User Interface Tool Ensemble; publicly accessible via http://freeshell.de/~badong/suite.htm). Representative results thus obtained are compared with published experimental data on binding affinities and quantitative biological effects, with special attention to loss of paratope sidechain conformational entropy (neglected in previous analyses) and in light of key in-vivo constraints on antigen-antibody binding affinity and antibody-mediated effects. Implications for further refinement of B-cell epitope prediction methods are discussed as regards envisioned biomedical applications including the development of prophylactic and therapeutic antibodies, peptide-based vaccines and immunodiagnostics. PMID:26410103

  1. The expert system for toxicity prediction of chemicals based on structure-activity relationship.

    PubMed Central

    Nakadate, M; Hayashi, M; Sofuni, T; Kamata, E; Aida, Y; Osada, T; Ishibe, T; Sakamura, Y; Ishidate, M

    1991-01-01

    A prediction system for chemical toxicity has been developed by means of structure-activity relationships based on the computerized fact database (BL-DB). Numbers and ratio of elements, side chains, bonding, position, and microenvironment of side chains were used as structural factors of the chemical for the prediction. Such information was obtained from the BL-DB database by Wiswesser line-formula chemical notation. In the present study, the Salmonella/microsome assay was chosen as indicative of the target toxicity of chemicals. A set of chemicals specified with mutagenicity data was retrieved, and necessary information was extracted and transferred to the working file. Rules of the relations between characteristics of chemical structure and the assay result are extracted as parameters for rules by experts on the rearranged data set. These were analyzed statistically by discriminant analysis, and the predictions with the rules were evaluated by the elimination method. Eight kinds of rules to predict the Salmonella/microsome assay were constructed, and currently results of the assay on aliphatic and heterocyclic compounds can be predicted as accurately as +90%. PMID:1820282

  2. Structure based function prediction of proteins using fragment library frequency vectors

    PubMed Central

    Yadav, Akshay; Jayaraman, Valadi Krishnamoorthy

    2012-01-01

    The function of the protein is primarily dictated by its structure. Therefore it is far more logical to find the functional clues of the protein in its overall 3-dimensional fold or its global structure. In this paper, we have developed a novel Support Vector Machines (SVM) based prediction model for functional classification and prediction of proteins using features extracted from its global structure based on fragment libraries. Fragment libraries have been previously used for ab initio modelling of proteins and protein structure comparisons. The query protein structure is broken down into a collection of short contiguous backbone fragments and this collection is discretized using a library of fragments. The input feature vector is a frequency vector that counts the number of each library fragment in the collection of fragments by all-to-all fragment comparisons. SVM models were trained and optimised for obtaining the best 10-fold cross-validation accuracy for classification. As an example, this method was applied for prediction and classification of Cell Adhesion molecules (CAMs). Thirty-four different fragment libraries with sizes ranging from 4 to 400 and fragment lengths ranging from 4 to 12 were used for obtaining the best prediction model. The best 10-fold CV accuracy of 95.25% was obtained for a library of 400 fragments of length 10. An accuracy of 87.5% was obtained on an unseen test dataset consisting of 20 CAMs and 20 NonCAMs. This shows that protein structure can be accurately and uniquely described using 400 representative fragments of length 10. PMID:23144557
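
    The discretize-and-count step described in this abstract can be sketched as below. This is a deliberately simplified illustration: fragments are assumed to be already reduced to fixed-length numeric descriptors, and squared Euclidean distance stands in for whatever structural comparison the authors used (the abstract does not specify it). The names `sq_dist` and `frequency_vector` are hypothetical, and the SVM training step is not shown.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def frequency_vector(query_fragments, library):
    """Count how many query fragments map to each library fragment.

    Every query fragment is compared against every library fragment
    (all-to-all) and assigned to its nearest library entry.
    """
    counts = [0] * len(library)
    for frag in query_fragments:
        nearest = min(range(len(library)),
                      key=lambda i: sq_dist(frag, library[i]))
        counts[nearest] += 1
    return counts
```

    The resulting vector has one entry per library fragment regardless of protein length, which is what makes it usable as a fixed-size classifier input.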

  3. Towards universal structure-based prediction of class II MHC epitopes for diverse allotypes.

    PubMed

    Bordner, Andrew J

    2010-01-01

    The binding of peptide fragments of antigens to class II MHC proteins is a crucial step in initiating a helper T cell immune response. The discovery of these peptide epitopes is important for understanding the normal immune response and its misregulation in autoimmunity and allergies and also for vaccine design. In spite of their biomedical importance, the high diversity of class II MHC proteins combined with the large number of possible peptide sequences make comprehensive experimental determination of epitopes for all MHC allotypes infeasible. Computational methods can address this need by predicting epitopes for a particular MHC allotype. We present a structure-based method for predicting class II epitopes that combines molecular mechanics docking of a fully flexible peptide into the MHC binding cleft followed by binding affinity prediction using a machine learning classifier trained on interaction energy components calculated from the docking solution. Although the primary advantage of structure-based prediction methods over the commonly employed sequence-based methods is their applicability to essentially any MHC allotype, this has not yet been convincingly demonstrated. In order to test the transferability of the prediction method to different MHC proteins, we trained the scoring method on binding data for DRB1*0101 and used it to make predictions for multiple MHC allotypes with distinct peptide binding specificities including representatives from the other human class II MHC loci, HLA-DP and HLA-DQ, as well as for two murine allotypes. The results showed that the prediction method was able to achieve significant discrimination between epitope and non-epitope peptides for all MHC allotypes examined, based on AUC values in the range 0.632-0.821. 
    We also discuss how accounting for peptide binding in multiple registers to class II MHC largely explains the systematically worse performance of prediction methods for class II MHC compared with those for class I MHC

  4. An Energy Based Fatigue Life Prediction Framework for In-Service Structural Components

    SciTech Connect

    H. Ozaltun; M. H.H. Shen; T. George; C. Cross

    2011-06-01

    An energy-based fatigue life prediction framework has been developed for calculation of the remaining fatigue life of in-service gas turbine materials. The purpose of the life prediction framework is to account for the aging effect caused by cyclic loadings on the fatigue strength of gas turbine engine structural components, which are usually designed for very long life. Previous studies indicate that the total strain energy dissipated during a monotonic fracture process and a cyclic process is a material property that can be determined by measuring the area underneath the monotonic true stress-strain curve and the sum of the area within each hysteresis loop in the cyclic process, respectively. The energy-based fatigue life prediction framework consists of the following entities: (1) development of a testing procedure to achieve plastic energy dissipation per life cycle and (2) incorporation of an energy-based fatigue life calculation scheme to determine the remaining fatigue life of in-service gas turbine materials. The accuracy of the remaining fatigue life prediction method was verified by comparison between model approximation and experimental results for Aluminum 6061-T6. The comparison shows promising agreement, thus validating the capability of the framework to produce accurate fatigue life prediction.

  5. Protein subcellular localization prediction based on compartment-specific features and structure conservation

    PubMed Central

    Su, Emily Chia-Yu; Chiu, Hua-Sheng; Lo, Allan; Hwang, Jenn-Kang; Sung, Ting-Yi; Hsu, Wen-Lian

    2007-01-01

    Background: Protein subcellular localization is crucial for genome annotation, protein function prediction, and drug discovery. Determination of subcellular localization using experimental approaches is time-consuming; thus, computational approaches become highly desirable. Extensive studies of localization prediction have led to the development of several methods including composition-based and homology-based methods. However, their performance might be significantly degraded if homologous sequences are not detected. Moreover, methods that integrate various features could suffer from the problem of low coverage in high-throughput proteomic analyses due to the lack of information to characterize unknown proteins. Results: We propose a hybrid prediction method for Gram-negative bacteria that combines a one-versus-one support vector machine (SVM) model and a structural homology approach. The SVM model comprises a number of binary classifiers, in which biological features derived from Gram-negative bacteria translocation pathways are incorporated. In the structural homology approach, we employ secondary structure alignment for structural similarity comparison and assign the known localization of the top-ranked protein as the predicted localization of a query protein. The hybrid method achieves overall accuracy of 93.7% and 93.2% using ten-fold cross-validation on the benchmark data sets. In the assessment of the evaluation data sets, our method also attains a prediction accuracy of 84.0%, especially when testing on sequences with a low level of homology to the training data. A three-way data split procedure is also incorporated to prevent overestimation of the predictive performance. In addition, we show that the prediction accuracy should be approximately 85% for non-redundant data sets of sequence identity less than 30%. Conclusion: Our results demonstrate that biological features derived from Gram-negative bacteria translocation pathways yield a significant

  6. Genetic programming based quantitative structure-retention relationships for the prediction of Kovats retention indices.

    PubMed

    Goel, Purva; Bapat, Sanket; Vyas, Renu; Tambe, Amruta; Tambe, Sanjeev S

    2015-11-13

    The development of quantitative structure-retention relationships (QSRR) aims at constructing an appropriate linear/nonlinear model for the prediction of the retention behavior (such as Kovats retention index) of a solute on a chromatographic column. Commonly, multi-linear regression and artificial neural networks are used in QSRR development for gas chromatography (GC). In this study, an artificial intelligence based data-driven modeling formalism, namely genetic programming (GP), has been introduced for the development of quantitative structure based models predicting Kovats retention indices (KRI). The novelty of the GP formalism is that given an example dataset, it searches and optimizes both the form (structure) and the parameters of an appropriate linear/nonlinear data-fitting model. Thus, it is not necessary to pre-specify the form of the data-fitting model in the GP-based modeling. These models are also less complex, simple to understand, and easy to deploy. The effectiveness of GP in constructing QSRRs has been demonstrated by developing models predicting KRIs of light hydrocarbons (case study-I) and adamantane derivatives (case study-II). In each case study, two-, three- and four-descriptor models have been developed using the KRI data available in the literature. The results of these studies clearly indicate that the GP-based models possess an excellent KRI prediction accuracy and generalization capability. Specifically, the best performing four-descriptor models in both the case studies have yielded high (>0.9) values of the coefficient of determination (R²) and low values of root mean squared error (RMSE) and mean absolute percent error (MAPE) for training, test and validation set data. The characteristic feature of this study is that it introduces a practical and effective GP-based method for developing QSRRs in gas chromatography that can be gainfully utilized for developing other types of data-driven models in chromatography science.
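
    The three error measures named in this record can be computed as sketched below. The sketch assumes the usual definition of the coefficient of determination (one minus the ratio of residual to total sum of squares); the abstract does not say which definition the authors used, and the retention index values are invented.

```python
import math


def regression_metrics(observed, predicted):
    """Return (R^2, RMSE, MAPE in percent) for paired observations."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    # MAPE is undefined when an observed value is zero.
    mape = 100.0 / n * sum(abs((o - p) / o) for o, p in zip(observed, predicted))
    return r2, rmse, mape


# Hypothetical Kovats retention indices: literature values vs model output.
kri_observed = [400.0, 500.0, 600.0, 700.0]
kri_predicted = [410.0, 490.0, 610.0, 690.0]
print(regression_metrics(kri_observed, kri_predicted))
```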

  7. Automated protein motif generation in the structure-based protein function prediction tool ProMOL.

    PubMed

    Osipovitch, Mikhail; Lambrecht, Mitchell; Baker, Cameron; Madha, Shariq; Mills, Jeffrey L; Craig, Paul A; Bernstein, Herbert J

    2015-12-01

    ProMOL, a plugin for the PyMOL molecular graphics system, is a structure-based protein function prediction tool. ProMOL includes a set of routines for building motif templates that are used for screening query structures for enzyme active sites. Previously, each motif template was generated manually and required supervision in the optimization of parameters for sensitivity and selectivity. We developed an algorithm and workflow for the automation of motif building and testing routines in ProMOL. The algorithm uses a set of empirically derived parameters for optimization and requires little user intervention. The automated motif generation algorithm was first tested in a performance comparison with a set of manually generated motifs based on identical active sites from the same 112 PDB entries. The two sets of motifs were equally effective in identifying alignments with homologs and in rejecting alignments with unrelated structures. A second set of 296 active site motifs were generated automatically, based on Catalytic Site Atlas entries with literature citations, as an expansion of the library of existing manually generated motif templates. The new motif templates exhibited comparable performance to the existing ones in terms of hit rates against native structures, homologs with the same EC and Pfam designations, and randomly selected unrelated structures with a different EC designation at the first EC digit, as well as in terms of RMSD values obtained from local structural alignments of motifs and query structures. This research is supported by NIH grant GM078077. PMID:26573864

  8. SPACE: a suite of tools for protein structure prediction and analysis based on complementarity and environment.

    PubMed

    Sobolev, Vladimir; Eyal, Eran; Gerzon, Sergey; Potapov, Vladimir; Babor, Mariana; Prilusky, Jaime; Edelman, Marvin

    2005-07-01

    We describe a suite of SPACE tools for analysis and prediction of structures of biomolecules and their complexes. LPC/CSU software provides a common definition of inter-atomic contacts and complementarity of contacting surfaces to analyze protein structure and complexes. In the current version of LPC/CSU, analyses of water molecules and nucleic acids have been added, together with improved and expanded visualization options using Chime or Java based Jmol. The SPACE suite includes servers and programs for: structural analysis of point mutations (MutaProt); side chain modeling based on surface complementarity (SCCOMP); building a crystal environment and analysis of crystal contacts (CryCo); construction and analysis of protein contact maps (CMA) and molecular docking software (LIGIN). The SPACE suite is accessed at http://ligin.weizmann.ac.il/space. PMID:15980496

  9. Structure-based activity prediction for an enzyme of unknown function

    PubMed Central

    Hermann, Johannes C.; Marti-Arbona, Ricardo; Fedorov, Alexander A.; Fedorov, Elena; Almo, Steven C.; Shoichet, Brian K.; Raushel, Frank M.

    2008-01-01

    With many genomes sequenced, a pressing challenge in biology is predicting the function of the proteins that the genes encode. When proteins are unrelated to others of known activity, bioinformatics inference for function becomes problematic. It would thus be useful to interrogate protein structures for function directly. Here, we predict the function of an enzyme of unknown activity, Tm0936 from Thermotoga maritima, by docking high-energy intermediate forms of thousands of candidate metabolites. The docking hit list was dominated by adenine analogues, which appeared to undergo C6-deamination. Four of these, including 5-methylthioadenosine and S-adenosylhomocysteine (SAH), were tested as substrates, and three had substantial catalytic rate constants (10⁵ M⁻¹ s⁻¹). The X-ray crystal structure of the complex between Tm0936 and the product resulting from the deamination of SAH, S-inosylhomocysteine, was determined, and it corresponded closely to the predicted structure. The deaminated products can be further metabolized by T. maritima in a previously uncharacterized SAH degradation pathway. Structure-based docking with high-energy forms of potential substrates may be a useful tool to annotate enzymes for function. PMID:17603473

  10. Predicting protein structural classes based on complex networks and recurrence analysis.

    PubMed

    Olyaee, Mohammad H; Yaghoubi, Ali; Yaghoobi, Mahdi

    2016-09-01

    Protein sequences are divided into four structural classes. The determination of class is a challenging and beneficial task in the bioinformatics field. Several methods have been proposed to this end, but most utilize too many features and produce unsuitable results. In the present study, features are extracted based on the predicted secondary structures. At first, predicted secondary structure sequences are mapped into two time series by the chaos game representation. Then, a recurrence matrix is calculated from each of the time series. The recurrence matrix is identified with the adjacency matrix of a complex network, and measures for the characterization of complex networks are applied to these recurrence matrices. For a given protein sequence, a total of 24 characteristic features can be calculated and these are fed into Fisher's discriminant analysis algorithm for classification. To examine the proposed method, two widely used low-similarity benchmark datasets are used to test its performance. A comparison with the results of existing methods shows that the current study's approach provides a satisfactory performance for protein structural class prediction. PMID:27320678
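
    The step from a time series to a network can be sketched as follows: two time points are connected when their values lie within a tolerance of each other, and the resulting recurrence matrix is read as an adjacency matrix. This is only an illustration of the general idea; the tolerance, the exclusion of self-loops, the choice of mean degree as the network measure, and the series values are all assumptions rather than details from the paper.

```python
def recurrence_matrix(series, eps):
    """Binary matrix with 1 where two time points are within eps of each other.

    The diagonal is set to 0 so the matrix can be read as the adjacency
    matrix of a network without self-loops (a choice made for this sketch).
    """
    n = len(series)
    return [[1 if i != j and abs(series[i] - series[j]) <= eps else 0
             for j in range(n)]
            for i in range(n)]


def mean_degree(adjacency):
    """Average number of neighbours per node, one simple network measure."""
    return sum(sum(row) for row in adjacency) / len(adjacency)


# Hypothetical time series, standing in for a chaos game representation.
series = [0.10, 0.15, 0.80, 0.12]
adjacency = recurrence_matrix(series, eps=0.1)
print(adjacency, mean_degree(adjacency))
```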

  11. Fast reconstruction and prediction of frozen flow turbulence based on structured Kalman filtering.

    PubMed

    Fraanje, Rufus; Rice, Justin; Verhaegen, Michel; Doelman, Niek

    2010-11-01

    Efficient and optimal prediction of frozen flow turbulence using the complete observation history of the wavefront sensor is an important issue in adaptive optics for large ground-based telescopes. At least for the sake of error budgeting and algorithm performance, the evaluation of an accurate estimate of the optimal performance of a particular adaptive optics configuration is important. However, due to the large number of grid points, high sampling rates, and the non-rationality of the turbulence power spectral density, the computational complexity of the optimal predictor is huge. This paper shows how a structure in the frozen flow propagation can be exploited to obtain a state-space innovation model with a particular sparsity structure. This sparsity structure enables one to efficiently compute a structured Kalman filter. By simulation it is shown that the performance can be improved and the computational complexity can be reduced in comparison with auto-regressive predictors of low order. PMID:21045884

  12. 3D Chemical Similarity Networks for Structure-Based Target Prediction and Scaffold Hopping.

    PubMed

    Lo, Yu-Chen; Senese, Silvia; Damoiseaux, Robert; Torres, Jorge Z

    2016-08-19

    Target identification remains a major challenge for modern drug discovery programs aimed at understanding the molecular mechanisms of drugs. Computational target prediction approaches like 2D chemical similarity searches have been widely used but are limited to structures sharing high chemical similarity. Here, we present a new computational approach called chemical similarity network analysis pull-down 3D (CSNAP3D) that combines 3D chemical similarity metrics and network algorithms for structure-based drug target profiling, ligand deorphanization, and automated identification of scaffold hopping compounds. In conjunction with 2D chemical similarity fingerprints, CSNAP3D achieved a >95% success rate in correctly predicting the drug targets of 206 known drugs. Significant improvement in target prediction was observed for HIV reverse transcriptase (HIVRT) compounds, which consist of diverse scaffold hopping compounds targeting the nucleotidyltransferase binding site. CSNAP3D was further applied to a set of antimitotic compounds identified in a cell-based chemical screen and identified novel small molecules that share a pharmacophore with Taxol and display a Taxol-like mechanism of action, which were validated experimentally using in vitro microtubule polymerization assays and cell-based assays. PMID:27285961
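
    The 2D chemical similarity searches mentioned in this record are commonly scored with the Tanimoto coefficient on binary fingerprints. The sketch below shows that coefficient on sets of "on" bit positions; whether CSNAP3D uses exactly this measure is not stated in the abstract, and the bit positions are invented.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


# Hypothetical fingerprints given as positions of the bits that are set.
query_bits = [1, 4, 7, 9]
reference_bits = [1, 4, 8]
print(tanimoto(query_bits, reference_bits))
```

    Because the score depends only on shared substructure bits, compounds with different scaffolds but similar 3D shape can score low, which is the limitation the 3D approach is meant to address.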

  13. Highly Accurate Structure-Based Prediction of HIV-1 Coreceptor Usage Suggests Intermolecular Interactions Driving Tropism

    PubMed Central

    Kieslich, Chris A.; Tamamis, Phanourios; Guzman, Yannis A.; Onel, Melis; Floudas, Christodoulos A.

    2016-01-01

    HIV-1 entry into host cells is mediated by interactions between the V3-loop of viral glycoprotein gp120 and chemokine receptor CCR5 or CXCR4, collectively known as HIV-1 coreceptors. Accurate genotypic prediction of coreceptor usage is of significant clinical interest and determination of the factors driving tropism has been the focus of extensive study. We have developed a method based on nonlinear support vector machines to elucidate the interacting residue pairs driving coreceptor usage and provide highly accurate coreceptor usage predictions. Our models utilize centroid-centroid interaction energies from computationally derived structures of the V3-loop:coreceptor complexes as primary features, while additional features based on established rules regarding V3-loop sequences are also investigated. We tested our method on 2455 V3-loop sequences of various lengths and subtypes, and obtained a median area under the receiver operating characteristic curve of 0.977 based on 500 runs of 10-fold cross validation. Our study is the first to elucidate a small set of specific interacting residue pairs between the V3-loop and coreceptors capable of predicting coreceptor usage with high accuracy across major HIV-1 subtypes. The developed method has been implemented as a web tool named CRUSH, CoReceptor USage prediction for HIV-1, which is available at http://ares.tamu.edu/CRUSH/. PMID:26859389

  14. Template-based structure prediction and classification of transcription factors in Arabidopsis thaliana

    PubMed Central

    Lu, Tao; Yang, Yuedong; Yao, Bo; Liu, Song; Zhou, Yaoqi; Zhang, Chi

    2012-01-01

    Transcription factors (TFs) play important roles in plants. However, there has been no systematic study of the structures and functions of most TFs in plants. Here, we performed template-based structure prediction for all TFs in Arabidopsis thaliana, with their full-length sequences as well as C-terminal and N-terminal regions. A total of 2918 model structures were obtained with a high confidence score. We find that TF families employ only a small number of templates for DNA-binding domains (DBD) but a diverse number of templates for transcription regulatory domains (TRD). Although TF families are classified according to DBD, their sizes have a significant correlation with the number of unique non-DNA-binding templates employed in the family (Pearson correlation coefficient of 0.74). That is, the size of a TF family is related to its functional diversity. Network analysis reveals new connections between TF families based on shared TRD or DBD templates; 81% of TF families share DBD and 67% share TRD templates. Two large fully connected family clusters in this network are observed along with 69 island families. In addition, 25 genes with unknown functions are found to be DNA-binding proteins and/or TFs according to predicted structures. This work provides a global view of the classification of TFs based on their DBD or TRD templates, and hence, a deeper understanding of DNA-binding and regulatory functions from a structural perspective. All structural models of TFs are deposited in the online database for public usage at http://sysbio.unl.edu/AthTF. PMID:22549903

  15. Profiles and Majority Voting-Based Ensemble Method for Protein Secondary Structure Prediction

    PubMed Central

    Bouziane, Hafida; Messabih, Belhadri; Chouarfia, Abdallah

    2011-01-01

    Machine learning techniques have been widely applied to solve the problem of predicting protein secondary structure from the amino acid sequence. They have gained substantial success in this research area. Many methods have been used including k-Nearest Neighbors (k-NNs), Hidden Markov Models (HMMs), Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs), which have attracted attention recently. Today, the main goal remains to improve the prediction quality of the secondary structure elements. The prediction accuracy has been continuously improved over the years, especially by using hybrid or ensemble methods and incorporating evolutionary information in the form of profiles extracted from alignments of multiple homologous sequences. In this paper, we investigate how best to combine k-NNs, ANNs and Multi-class SVMs (M-SVMs) to improve secondary structure prediction of globular proteins. An ensemble method which combines the outputs of two feed-forward ANNs, k-NN and three M-SVM classifiers has been applied. Ensemble members are combined using two variants of the majority voting rule. A heuristic-based filter has also been applied to refine the prediction. To investigate how much improvement the general ensemble method can give over the individual classifiers that make up the ensemble, we have experimented with the proposed system on the two widely used benchmark datasets RS126 and CB513 using cross-validation tests by including PSI-BLAST position-specific scoring matrix (PSSM) profiles as inputs. The experimental results reveal that the proposed system yields significant performance gains when compared with the best individual classifier. PMID:22058650
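
    Majority voting over per-residue predictions can be sketched as below. The abstract mentions two variants of the voting rule without describing them, so this shows only plain plurality voting; the tie-break (first label in classifier order that reaches the top count) and the three-state strings are assumptions made for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine equal-length H/E/C strings, one per classifier, residue by residue."""
    consensus = []
    for column in zip(*predictions):
        counts = Counter(column)
        top = max(counts.values())
        # Tie-break: first label, in classifier order, that reaches the top count.
        consensus.append(next(label for label in column if counts[label] == top))
    return "".join(consensus)


# Hypothetical outputs of three classifiers for a four-residue segment.
member_outputs = ["HHEC", "HEEC", "CHEC"]
print(majority_vote(member_outputs))
```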

  16. Structure Based Thermostability Prediction Models for Protein Single Point Mutations with Machine Learning Tools.

    PubMed

    Jia, Lei; Yarlagadda, Ramya; Reed, Charles C

    2015-01-01

    Thermostability issues arising from protein point mutations are a common occurrence in protein engineering. An application which predicts the thermostability of mutants can be helpful for guiding the decision-making process in protein design via mutagenesis. An in silico point mutation scanning method is frequently used to find "hot spots" in proteins for focused mutagenesis. ProTherm (http://gibk26.bio.kyutech.ac.jp/jouhou/Protherm/protherm.html) is a public database that consists of thousands of protein mutants' experimentally measured thermostability. Two data sets based on two differently measured thermostability properties of protein single point mutations, namely the unfolding free energy change (ddG) and melting temperature change (dTm), were obtained from this database. Folding free energy change calculations from Rosetta, structural information on the point mutations, and amino acid physical properties were obtained for building thermostability prediction models with informatics modeling tools. Five supervised machine learning methods (support vector machine, random forests, artificial neural network, naïve Bayes classifier, K nearest neighbor) and partial least squares regression are used for building the prediction models. Binary and ternary classifications as well as regression models were built and evaluated. Data set redundancy and balancing, the reverse mutations technique, feature selection, and comparison to other published methods were discussed. The Rosetta-calculated folding free energy change ranked as the most influential feature in all prediction models. Other descriptors also made significant contributions to increasing the accuracy of the prediction models. PMID:26361227

  18. Structure-based prediction of transcription factor binding specificity using an integrative energy function

    PubMed Central

    Farrel, Alvin; Murphy, Jonathan; Guo, Jun-tao

    2016-01-01

    Transcription factors (TFs) regulate gene expression through binding to specific target DNA sites. Accurate annotation of transcription factor binding sites (TFBSs) at genome scale represents an essential step toward our understanding of gene regulation networks. In this article, we present a structure-based method for computational prediction of TFBSs using a novel, integrative energy (IE) function. The new energy function combines a multibody (MB) knowledge-based potential and two atomic energy terms (hydrogen bond and π interaction) that might not be accurately captured by the knowledge-based potential owing to the mean force nature and low count problem. We applied the new energy function to the TFBS prediction using a non-redundant dataset that consists of TFs from 12 different families. Our results show that the new IE function improves the prediction accuracy over the knowledge-based, statistical potentials, especially for homeodomain TFs, the second largest TF family in mammals. Contact: jguo4@uncc.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307632

  19. Structure-based predictions broadly link transcription factor mutations to gene expression changes in cancers

    PubMed Central

    Ashworth, Justin; Bernard, Brady; Reynolds, Sheila; Plaisier, Christopher L.; Shmulevich, Ilya; Baliga, Nitin S.

    2014-01-01

    Thousands of unique mutations in transcription factors (TFs) arise in cancers, and the functional and biological roles of relatively few of these have been characterized. Here, we used structure-based methods developed specifically for DNA-binding proteins to systematically predict the consequences of mutations in several TFs that are frequently mutated in cancers. The explicit consideration of protein–DNA interactions was crucial to explain the roles and prevalence of mutations in TP53 and RUNX1 in cancers, and resulted in a higher specificity of detection for known p53-regulated genes among genetic associations between TP53 genotypes and genome-wide expression in The Cancer Genome Atlas, compared to existing methods of mutation assessment. Biophysical predictions also indicated that the relative prevalence of TP53 missense mutations in cancer is proportional to their thermodynamic impacts on protein stability and DNA binding, which is consistent with the selection for the loss of p53 transcriptional function in cancers. Structure and thermodynamics-based predictions of the impacts of missense mutations that focus on specific molecular functions may be increasingly useful for the precise and large-scale inference of aberrant molecular phenotypes in cancer and other complex diseases. PMID:25378323

  20. B-Pred, a structure based B-cell epitopes prediction server.

    PubMed

    Giacò, Luciano; Amicosante, Massimo; Fraziano, Maurizio; Gherardini, Pier Federico; Ausiello, Gabriele; Helmer-Citterich, Manuela; Colizzi, Vittorio; Cabibbo, Andrea

    2012-01-01

    The ability to predict immunogenic regions in selected proteins by in-silico methods has broad implications, such as allowing a quick selection of potential reagents to be used as diagnostics, vaccines, immunotherapeutics, or research tools in several branches of biological and biotechnological research. However, the prediction of antibody target sites in proteins using computational methodologies has proven to be a highly challenging task, which is likely due to the somewhat elusive nature of B-cell epitopes. This paper proposes a web-based platform for scoring potential immunological reagents based on the structures or 3D models of the proteins of interest. The method scores a protein's peptides set, which is derived from a sliding window, based on the average solvent exposure, with a filter on the average local model quality for each peptide. The platform was validated on a custom-assembled database of 1336 experimentally determined epitopes from 106 proteins for which a reliable 3D model could be obtained through standard modeling techniques. Despite showing poor sensitivity, this method can achieve a specificity of 0.70 and a positive predictive value of 0.29 by combining these two simple parameters. These values are slightly higher than those obtained with other established sequence-based or structure-based methods that have been evaluated using the same epitopes dataset. This method is implemented in a web server called B-Pred, which is accessible at http://immuno.bio.uniroma2.it/bpred. The server contains a number of original features that allow users to perform personalized reagent searches by manipulating the sliding window's width and sliding step, changing the exposure and model quality thresholds, and running sequential queries with different parameters. The B-Pred server should assist experimentalists in the rational selection of epitope antigens for a wide range of applications. PMID:22888263
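
    The scoring scheme described in this record (a sliding window scored by average solvent exposure, filtered by average local model quality) can be sketched as follows. This is not the B-Pred implementation; the function name, the thresholds, and the per-residue values are hypothetical.

```python
def score_windows(exposure, quality, width, step, min_exposure, min_quality):
    """Return (start index, average exposure) for every window passing both filters."""
    hits = []
    for start in range(0, len(exposure) - width + 1, step):
        avg_exposure = sum(exposure[start:start + width]) / width
        avg_quality = sum(quality[start:start + width]) / width
        if avg_exposure >= min_exposure and avg_quality >= min_quality:
            hits.append((start, avg_exposure))
    return hits


# Hypothetical per-residue relative solvent exposure and local model quality.
exposure = [0.2, 0.8, 0.9, 0.7, 0.1, 0.3]
quality = [0.9, 0.9, 0.8, 0.3, 0.9, 0.9]
hits = score_windows(exposure, quality, width=3, step=1,
                     min_exposure=0.6, min_quality=0.7)
print(hits)
```

    The `width`, `step`, and threshold arguments mirror the parameters the abstract says users can adjust on the server.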

  1. Delineating the relationship between amyotrophic lateral sclerosis and frontotemporal dementia: Sequence and structure-based predictions.

    PubMed

    Kumar, Vijay; Islam, Asimul; Hassan, Md Imtaiyaz; Ahmad, Faizan

    2016-09-01

    Amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) are related neurodegenerative disorders which are characterized by a rapid decline in cognitive and motor functions, and short survival. Both syndromes may be present within the same family or even in the same person. The genetic findings for both diseases also support the existence of a continuum, with mutations in the same genes being found in patients with ALS, FTD or FTD/ALS. Little is known about the molecular mechanisms underlying the differences in mutations of the same protein causing either ALS or FTD. Here, we shed light on 348 ALS and FTD missense mutations in 14 genes focusing on genic intolerance and protein stability based on available 3D structures. Using EvoTol, we prioritized the disease-causing genes and their domain. The most intolerant genes predicted by EvoTol are SQSTM1 and OPTN which are involved in protein homeostasis. Further, using ENCoM (Elastic Network Contact Model) that predicts stability based on vibrational entropy, we predicted that most of the missense mutations with destabilizing energies are in the structural regions that control the protein-protein interaction, and only a few mutations affect protein folding. We found a trend that energy changes are higher for ALS compared to FTD mutations. The stability of the ALS mutants correlated well with the duration of disease progression as compared to FTD-ALS mutants. This study provides a comprehensive understanding of the mechanism of ALS and illustrates the significance of structure-energy based studies in differentiating ALS and FTD mutations. PMID:27318084

  2. CASP11 - An Evaluation of a Modular BCL::Fold-Based Protein Structure Prediction Pipeline.

    PubMed

    Fischer, Axel W; Heinze, Sten; Putnam, Daniel K; Li, Bian; Pino, James C; Xia, Yan; Lopez, Carlos F; Meiler, Jens

    2016-01-01

    In silico prediction of a protein's tertiary structure remains an unsolved problem. The community-wide Critical Assessment of Protein Structure Prediction (CASP) experiment provides a double-blind study to evaluate improvements in protein structure prediction algorithms. We developed a protein structure prediction pipeline employing a three-stage approach, consisting of low-resolution topology search, high-resolution refinement, and molecular dynamics simulation to predict the tertiary structure of proteins from the primary structure alone or including distance restraints either from predicted residue-residue contacts, nuclear magnetic resonance (NMR) nuclear Overhauser effect (NOE) experiments, or mass spectrometry (MS) cross-linking (XL) data. The protein structure prediction pipeline was evaluated in the CASP11 experiment on twenty regular protein targets as well as thirty-three 'assisted' protein targets, which also had distance restraints available. Although the low-resolution topology search module was able to sample models with a global distance test total score (GDT_TS) value greater than 30% for twelve out of twenty proteins, frequently it was not possible to select the most accurate models for refinement, resulting in a general decay of model quality over the course of the prediction pipeline. In this study, we provide a detailed overall analysis, study one target protein in more detail as it travels through the protein structure prediction pipeline, and evaluate the impact of limited experimental data. PMID:27046050
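
    The GDT_TS value cited in this record averages the percentage of residues whose Cα atoms fall within 1, 2, 4, and 8 Å of the reference structure. The sketch below is simplified: it takes per-residue distances from a single, already fixed superposition, whereas the full GDT procedure searches for the best superposition at each cutoff. The distances are invented.

```python
def gdt_ts(distances):
    """Simplified GDT_TS (in percent) from per-residue C-alpha distances in angstroms."""
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    n = len(distances)
    fractions = [sum(d <= cutoff for d in distances) / n for cutoff in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical model-to-native distances for a five-residue fragment.
ca_distances = [0.5, 1.5, 3.0, 6.0, 12.0]
print(gdt_ts(ca_distances))
```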

  4. FPGA accelerator for protein secondary structure prediction based on the GOR algorithm

    PubMed Central

    2011-01-01

    Background: Proteins are important molecules that perform a wide range of functions in biological systems. Protein folding has recently attracted growing attention because the function of a protein can generally be derived from its molecular structure. The GOR algorithm is one of the most successful computational methods and has been widely used as an efficient analysis tool to predict secondary structure from protein sequence. However, its execution time is still intolerable given the steep growth of protein databases. Recently, FPGA chips have emerged as a promising application accelerator for bioinformatics algorithms by exploiting fine-grained custom design. Results: In this paper, we propose a complete fine-grained parallel hardware implementation on FPGA to accelerate the GOR-IV package for 2D protein structure prediction. To improve computing efficiency, we partition the parameter table into small segments and access them in parallel. We aggressively exploit data reuse schemes to minimize the need for loading data from external memory. The whole computation structure is carefully pipelined to overlap the sequence loading, computing and back-writing operations as much as possible. We implemented a complete GOR desktop system based on an FPGA chip XC5VLX330. Conclusions: The experimental results show a speedup factor of more than 430x over the original GOR-IV version and a 110x speedup over the optimized version with multi-thread SIMD implementation running on a PC platform with an AMD Phenom 9650 Quad CPU for 2D protein structure prediction. Moreover, the power consumption is only about 30% of that of current general-purpose CPUs. PMID:21342582

  5. Molecular Simulation-Based Structural Prediction of Protein Complexes in Mass Spectrometry: The Human Insulin Dimer

    PubMed Central

    Li, Jinyu; Rossetti, Giulia; Dreyer, Jens; Raugei, Simone; Ippoliti, Emiliano; Lüscher, Bernhard; Carloni, Paolo

    2014-01-01

    Protein electrospray ionization (ESI) mass spectrometry (MS)-based techniques are widely used to provide insight into structural proteomics under the assumption that non-covalent protein complexes being transferred into the gas phase preserve basically the same intermolecular interactions as in solution. Here we investigate the applicability of this assumption by extending our previous structural prediction protocol for single proteins in ESI-MS to protein complexes. We apply our protocol to the human insulin dimer (hIns2) as a test case. Our calculations reproduce the main charge and the collision cross section (CCS) measured in ESI-MS experiments. Molecular dynamics simulations for 0.075 ms show that the complex maximizes intermolecular non-bonded interactions relative to the structure in water, without affecting the cross section. The overall gas-phase structure of hIns2 does exhibit differences with the one in aqueous solution, not inferable from a comparison with calculated CCS. Hence, care should be exerted when interpreting ESI-MS proteomics data based solely on NMR and/or X-ray structural information. PMID:25210764

  6. Molecular simulation-based structural prediction of protein complexes in mass spectrometry: the human insulin dimer.

    PubMed

    Li, Jinyu; Rossetti, Giulia; Dreyer, Jens; Raugei, Simone; Ippoliti, Emiliano; Lüscher, Bernhard; Carloni, Paolo

    2014-09-01

    Protein electrospray ionization (ESI) mass spectrometry (MS)-based techniques are widely used to provide insight into structural proteomics under the assumption that non-covalent protein complexes being transferred into the gas phase preserve basically the same intermolecular interactions as in solution. Here we investigate the applicability of this assumption by extending our previous structural prediction protocol for single proteins in ESI-MS to protein complexes. We apply our protocol to the human insulin dimer (hIns2) as a test case. Our calculations reproduce the main charge and the collision cross section (CCS) measured in ESI-MS experiments. Molecular dynamics simulations for 0.075 ms show that the complex maximizes intermolecular non-bonded interactions relative to the structure in water, without affecting the cross section. The overall gas-phase structure of hIns2 does exhibit differences with the one in aqueous solution, not inferable from a comparison with calculated CCS. Hence, care should be exerted when interpreting ESI-MS proteomics data based solely on NMR and/or X-ray structural information. PMID:25210764

  7. Predicting adsorption of aromatic compounds by carbon nanotubes based on quantitative structure property relationship principles

    NASA Astrophysics Data System (ADS)

    Rahimi-Nasrabadi, Mehdi; Akhoondi, Reza; Pourmortazavi, Seied Mahdi; Ahmadi, Farhad

    2015-11-01

    Quantitative structure property relationship (QSPR) models were developed to predict the adsorption of aromatic compounds by carbon nanotubes (CNTs). Five descriptors chosen by combining self-organizing map and stepwise multiple linear regression (MLR) techniques were used to connect the structure of the studied chemicals with their adsorption descriptor (K∞) using linear and nonlinear modeling techniques. A correlation coefficient (R2) of 0.99 and a root-mean-square error (RMSE) of 0.29 for the multilayered perceptron neural network (MLP-NN) model indicate the superiority of the developed nonlinear model over the MLR model, which has an R2 of 0.93 and an RMSE of 0.36. The results of the cross-validation test showed the reliability of MLP-NN in predicting the K∞ values for the aromatic contaminants. Molar volume and hydrogen bond accepting ability were found to be the factors that most strongly influence the adsorption of the compounds. The developed QSPR, as a neural network based model, could be used to predict the adsorption of organic compounds by CNTs.
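
The two figures of merit used to compare the models in this abstract can be sketched as follows. This is an illustration only: it assumes R2 is computed as the coefficient of determination (1 minus the ratio of residual to total sum of squares), which is one common convention and may differ from the authors' exact definition. The function names and the numbers in the usage note are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

With made-up observations [1, 2, 3, 4] and predictions [1.1, 1.9, 3.2, 3.8], the residual sum of squares is 0.10 and the total sum of squares is 5, giving R2 = 0.98 and RMSE of about 0.158.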

  8. A nonlinear viscoelastic approach to durability predictions for polymer based composite structures

    NASA Technical Reports Server (NTRS)

    Brinson, Hal F.; Hiel, C. C.

    1990-01-01

    Current industry approaches for the durability assessment of metallic structures are briefly reviewed. For polymer-based composite structures, it is suggested that new approaches must be adopted to include memory or viscoelastic effects which could lead to delayed failures that might not be predicted using current techniques. A durability or accelerated life assessment plan for fiber reinforced plastics (FRP) developed and documented over the last decade or so is reviewed and discussed. Limitations to the plan are outlined and suggestions to remove the limitations are given. These include the development of a finite element code to replace the previously used lamination theory code and the development of new specimen geometries to evaluate delamination failures. The new DCB model is reviewed and results are presented. Finally, it is pointed out that new procedures are needed to determine interfacial properties, and current efforts underway to determine such properties are reviewed. Suggestions for additional efforts to develop a consistent and accurate durability predictive approach for FRP structures are outlined.

  9. A strength-based wearout model for predicting the life of composite structures

    SciTech Connect

    Schaff, J.R.; Davidson, B.D.

    1997-12-31

    A model to predict the residual strength and life of polymeric composite structures subjected to spectrum fatigue loadings is described. The model is based on the fundamental assumptions that the structure undergoes proportional loading, that the residual strength is a monotonically decreasing function of the number of fatigue cycles, and that both the life distribution due to continuous constant amplitude cycling and the residual strength distribution after an arbitrary load history may be represented by two-parameter Weibull functions. The model also incorporates a cycle mix factor to account for the drastic reduction of fatigue life that may be caused by a large number of changes in the stress amplitude of the loading. The model's predictions are compared to experimentally determined fatigue life distributions for uniaxial loadings of a number of laminates comprised of different materials and layups. Constant-amplitude, two-stress level, and spectrum fatigue loadings, including the FALSTAFF (Fighter Aircraft Loading Standard for Fatigue) spectrum, are considered. The theoretical fatigue life distributions are shown to correlate well with the experimental results. Moreover, excellent correlation of theory and experiment is obtained for an average fatigue life that is based on the 63.2% probability of failure.
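
The 63.2% figure in this abstract has a simple reading for any two-parameter Weibull distribution: the cumulative failure probability at a life equal to the scale parameter is 1 - 1/e, about 0.632, whatever the shape parameter. The sketch below illustrates that property; the function and parameter names are hypothetical and the code does not reproduce the wearout model itself.

```python
import math


def weibull_failure_probability(cycles, scale, shape):
    """Two-parameter Weibull CDF: F(N) = 1 - exp(-(N / scale) ** shape)."""
    return 1.0 - math.exp(-((cycles / scale) ** shape))


def weibull_life_at_probability(probability, scale, shape):
    """Inverse CDF: number of cycles at which the failure probability is reached."""
    return scale * (-math.log(1.0 - probability)) ** (1.0 / shape)
```

Evaluating `weibull_failure_probability(scale, scale, shape)` gives about 0.632 for any positive shape, so the life at 63.2% probability of failure is the Weibull scale parameter (often called the characteristic life).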

  10. Coupled agent-based and finite-element models for predicting scar structure following myocardial infarction.

    PubMed

    Rouillard, Andrew D; Holmes, Jeffrey W

    2014-08-01

    Following myocardial infarction, damaged muscle is gradually replaced by collagenous scar tissue. The structural and mechanical properties of the scar are critical determinants of heart function, as well as the risk of serious post-infarction complications such as infarct rupture, infarct expansion, and progression to dilated heart failure. A number of therapeutic approaches currently under development aim to alter infarct mechanics in order to reduce complications, such as implantation of mechanical restraint devices, polymer injection, and peri-infarct pacing. Because mechanical stimuli regulate scar remodeling, the long-term consequences of therapies that alter infarct mechanics must be carefully considered. Computational models have the potential to greatly improve our ability to understand and predict how such therapies alter heart structure, mechanics, and function over time. Toward this end, we developed a straightforward method for coupling an agent-based model of scar formation to a finite-element model of tissue mechanics, creating a multi-scale model that captures the dynamic interplay between mechanical loading, scar deformation, and scar material properties. The agent-based component of the coupled model predicts how fibroblasts integrate local chemical, structural, and mechanical cues as they deposit and remodel collagen, while the finite-element component predicts local mechanics at any time point given the current collagen fiber structure and applied loads. We used the coupled model to explore the balance between increasing stiffness due to collagen deposition and increasing wall stress due to infarct thinning and left ventricular dilation during the normal time course of healing in myocardial infarcts, as well as the negative feedback between strain anisotropy and the structural anisotropy it promotes in healing scar. The coupled model reproduced the observed evolution of both collagen fiber structure and regional deformation following coronary

  11. Predictive grain yield models based on canopy structure and structural plasticity

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Structural dimensions, digitally measured on stems and leaves of soybean plants during the first six reproductive growth stages (R1-R6), were used to assess the impact of five management strategies including cropping systems (conventional (C) vs. organic, (O)), tillage (conventional moldboard (C) vs...

  12. The Lung Physiome: merging imaging-based measures with predictive computational models of structure and function

    PubMed Central

    Tawhai, Merryn H; Hoffman, Eric A; Lin, Ching-Long

    2009-01-01

    Global measurements of the lung provided by standard pulmonary function tests do not give insight into the regional basis of lung function and lung disease. Advances in imaging methodologies, computer technologies, and subject-specific simulations are creating new opportunities for studying structure-function relationships in the lung through multi-disciplinary research. The digital Human Lung Atlas is an imaging-based resource compiled from male and female subjects spanning several decades of age. The Atlas comprises both structural and functional measures, and includes computational models derived to match individual subjects for personalized prediction of function. The computational models in the Atlas form part of the Lung Physiome project, which is an international effort to develop integrative models of lung function at all levels of biological organization. The computational models provide mechanistic interpretation of imaging measures; the Atlas provides structural data upon which to base model geometry, and functional data against which to test hypotheses. The example of simulating air flow on a subject-specific basis is considered. Methods for deriving multi-scale models of the airway geometry for individual subjects in the Atlas are outlined, and methods for modeling turbulent flows in the airway are reviewed. PMID:20835982

  13. A scoring function based on solvation thermodynamics for protein structure prediction

    PubMed Central

    Du, Shiqiao; Harano, Yuichi; Kinoshita, Masahiro; Sakurai, Minoru

    2012-01-01

    We predict protein structure using our recently developed free energy function for describing protein stability, which is focused on solvation thermodynamics. The function is combined with the current most reliable sampling methods, i.e., fragment assembly (FA) and comparative modeling (CM). The prediction is tested using 11 small proteins for which high-resolution crystal structures are available. For 8 of these proteins, sequence similarities are found in the database, and the prediction is performed with CM. Fairly accurate models with average Cα root mean square deviation (RMSD) ∼ 2.0 Å are successfully obtained for all cases. For the rest of the target proteins, we perform the prediction following FA protocols. For 2 cases, we obtain predicted models with an RMSD ∼ 3.0 Å as the best-scored structures. For the other case, the RMSD remains larger than 7 Å. For all the 11 target proteins, our scoring function identifies the experimentally determined native structure as the best structure. Starting from the predicted structure, replica exchange molecular dynamics is performed to further refine the structures. However, we are unable to improve its RMSD toward the experimental structure. The exhaustive sampling by coarse-grained normal mode analysis around the native structures reveals that our function has a linear correlation with RMSDs < 3.0 Å. These results suggest that the function is quite reliable for the protein structure prediction while the sampling method remains one of the major limiting factors in it. The aspects through which the methodology could further be improved are discussed.

  14. Engineering protein therapeutics: predictive performances of a structure-based virtual affinity maturation protocol.

    PubMed

    Oberlin, Michael; Kroemer, Romano; Mikol, Vincent; Minoux, Hervé; Tastan, Erdogan; Baurin, Nicolas

    2012-08-27

    The implementation of a structure-based virtual affinity maturation protocol and evaluation of its predictivity are presented. The in silico protocol is based on conformational sampling of the interface residues (using the Dead End Elimination/A* algorithm), followed by the estimation of the change of free energy of binding due to a point mutation, applying MM/PBSA calculations. Several implementations of the protocol have been evaluated for 173 mutations in 7 different protein complexes for which experimental data were available: the use of the Boltzmann averaged predictor based on the free energy of binding (ΔΔG(*)) combined with the one based on its polar component only (ΔΔE(pol*)) led to the proposal of a subset of mutations out of which 45% would have successfully enhanced the binding. When focusing on those mutations that are less likely to be introduced by natural in vivo maturation methods (99 mutations with at least two base changes in the codon), the success rate is increased to 63%. In another evaluation, focusing on 56 alanine scanning mutations, the in silico protocol was able to detect 89% of the hot-spots. PMID:22788756
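
As a rough illustration of what a Boltzmann averaged quantity looks like, the sketch below averages a set of per-conformation energies with weights proportional to exp(-E/kT). This is an assumption about the general technique, not the authors' implementation: the function name is hypothetical, and the default kT of 0.593 kcal/mol is simply the approximate thermal energy near 298 K.

```python
import math


def boltzmann_average(energies, kT=0.593):
    """Boltzmann-weighted mean of energies (same units as kT, e.g. kcal/mol).

    Weights are exp(-(E - E_min) / kT); subtracting E_min only improves
    numerical stability and does not change the result.
    """
    if not energies:
        raise ValueError("need at least one energy")
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    partition = sum(weights)
    return sum(w * e for w, e in zip(weights, energies)) / partition
```

Because low-energy conformations dominate the weights, the result always lies between the minimum energy and the plain arithmetic mean of the sampled energies.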

  15. Toolbox for Protein Structure Prediction.

    PubMed

    Roche, Daniel Barry; McGuffin, Liam James

    2016-01-01

    Protein tertiary structure prediction algorithms aim to predict, from amino acid sequence, the tertiary structure of a protein. In silico protein structure prediction methods have become extremely important, as in vitro-based structural elucidation is unable to keep pace with the current growth of sequence databases due to high-throughput next-generation sequencing, which has exacerbated the gaps in our knowledge between sequences and structures. Here we briefly discuss protein tertiary structure prediction, the biennial competition for the Critical Assessment of Techniques for Protein Structure Prediction (CASP) and its role in shaping the field. We also discuss, in detail, our cutting-edge web-server method IntFOLD2-TS for tertiary structure prediction. Furthermore, we provide a step-by-step guide on using the IntFOLD2-TS web server, along with some real world examples, where the IntFOLD server can and has been used to improve protein tertiary structure prediction and aid in functional elucidation. PMID:26519323

  16. APL: An angle probability list to improve knowledge-based metaheuristics for the three-dimensional protein structure prediction.

    PubMed

    Borguesan, Bruno; Barbachan e Silva, Mariel; Grisci, Bruno; Inostroza-Ponta, Mario; Dorn, Márcio

    2015-12-01

    Tertiary protein structure prediction is one of the most challenging problems in structural bioinformatics. Despite the advances in algorithm development and computational strategies, predicting the folded structure of a protein only from its amino acid sequence remains an unsolved problem. We present a new computational approach to predict the native-like three-dimensional structure of proteins. Conformational preferences of amino acid residues and secondary structure information were obtained from protein templates stored in the Protein Data Bank and represented as an Angle Probability List. Two knowledge-based prediction methods based on Genetic Algorithms and Particle Swarm Optimization were developed using this information. The proposed method has been tested with twenty-six case studies selected to validate our approach with different classes of proteins and folding patterns. Stereochemical and structural analyses were performed for each predicted three-dimensional structure. The results achieved suggest that the Angle Probability List can improve the effectiveness of metaheuristics used to predict the three-dimensional structure of protein molecules by reducing the conformational search space. PMID:26495908

  17. Thermodynamic Properties of Asphaltenes: A Predictive Approach Based On Computer Assisted Structure Elucidation and Atomistic Simulations

    SciTech Connect

    Diallo, Mamadou S.; Cagin, Tahir; Faulon, Jean Loup; Goddard, William A.

    2000-08-01

    The authors describe a new methodology for predicting the thermodynamic properties of petroleum geomacromolecules (asphaltenes and resins). This methodology combines computer assisted structure elucidation (CASE) with atomistic simulations (molecular mechanics and molecular dynamics and statistical mechanics). They use quantitative and qualitative structural data as input to a CASE program (SIGNATURE) to generate a sample of ten asphaltene model structures for a Saudi crude oil (Arab Berri). MM calculations and MD simulations are used to estimate selected volumetric and thermal properties of the model structures.

  18. Prediction of Protein Structural Classes for Low-Similarity Sequences Based on Consensus Sequence and Segmented PSSM

    PubMed Central

    Liang, Yunyun; Liu, Sanyang; Zhang, Shengli

    2015-01-01

    Prediction of protein structural classes for low-similarity sequences is useful for understanding fold patterns, regulation, functions, and interactions of proteins. It is well known that feature extraction is significant to prediction of protein structural class and it mainly uses protein primary sequence, predicted secondary structure sequence, and position-specific scoring matrix (PSSM). Currently, prediction solely based on the PSSM has played a key role in improving the prediction accuracy. In this paper, we propose a novel method called CSP-SegPseP-SegACP by fusing consensus sequence (CS), segmented PsePSSM, and segmented autocovariance transformation (ACT) based on PSSM. Three widely used low-similarity datasets (1189, 25PDB, and 640) are adopted in this paper. Then a 700-dimensional (700D) feature vector is constructed and the dimension is decreased to 224D by using principal component analysis (PCA). To verify the performance of our method, rigorous jackknife cross-validation tests are performed on 1189, 25PDB, and 640 datasets. Comparison of our results with the existing PSSM-based methods demonstrates that our method achieves the favorable and competitive performance. This will offer an important complementary to other PSSM-based methods for prediction of protein structural classes for low-similarity sequences. PMID:26788119

  19. Prediction of Protein Structural Classes for Low-Similarity Sequences Based on Consensus Sequence and Segmented PSSM.

    PubMed

    Liang, Yunyun; Liu, Sanyang; Zhang, Shengli

    2015-01-01

    Prediction of protein structural classes for low-similarity sequences is useful for understanding fold patterns, regulation, functions, and interactions of proteins. It is well known that feature extraction is significant to prediction of protein structural class and it mainly uses protein primary sequence, predicted secondary structure sequence, and position-specific scoring matrix (PSSM). Currently, prediction solely based on the PSSM has played a key role in improving the prediction accuracy. In this paper, we propose a novel method called CSP-SegPseP-SegACP by fusing consensus sequence (CS), segmented PsePSSM, and segmented autocovariance transformation (ACT) based on PSSM. Three widely used low-similarity datasets (1189, 25PDB, and 640) are adopted in this paper. Then a 700-dimensional (700D) feature vector is constructed and the dimension is decreased to 224D by using principal component analysis (PCA). To verify the performance of our method, rigorous jackknife cross-validation tests are performed on 1189, 25PDB, and 640 datasets. Comparison of our results with the existing PSSM-based methods demonstrates that our method achieves the favorable and competitive performance. This will offer an important complementary to other PSSM-based methods for prediction of protein structural classes for low-similarity sequences. PMID:26788119

  20. Prediction of protein secondary structure based on residue pair types and conformational states using dynamic programming algorithm.

    PubMed

    Sadeghi, Mehdi; Parto, Sahar; Arab, Shahriar; Ranjbar, Bijan

    2005-06-20

    We have used a statistical approach for protein secondary structure prediction based on information theory, simultaneously taking into consideration pairwise residue types and conformational states. Since predicting residue secondary structure by sliding a one-residue window creates ambiguity in state prediction, we used a dynamic programming algorithm to find the path with maximum score. A scoring system for residue pairs in particular conformations is derived for adjacent neighbors up to ten residues apart in sequence. The three-state overall per-residue accuracy, Q3, of this method in a jackknife test with a dataset created from PDBSELECT is more than 70%. PMID:15936021
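
The idea of resolving per-residue ambiguity with a maximum-score path can be sketched with a Viterbi-style dynamic program over the three states helix (H), strand (E) and coil (C). This is a simplified illustration, not the published method: it only couples adjacent residues, whereas the abstract describes pair scores for neighbors up to ten residues apart, and every score in the usage note is invented. A helper for the Q3 accuracy is included.

```python
STATES = ("H", "E", "C")


def best_state_path(position_scores, transition_scores):
    """Return the state sequence with the maximum total score.

    position_scores: list of dicts mapping state -> score, one dict per residue.
    transition_scores: dict mapping (previous_state, state) -> score.
    Both tables are hypothetical inputs for this sketch.
    """
    if not position_scores:
        return []
    best = [{s: position_scores[0][s] for s in STATES}]
    back = []
    for scores in position_scores[1:]:
        row, ptr = {}, {}
        for cur in STATES:
            # Best predecessor for ending in state `cur` at this residue.
            prev = max(STATES, key=lambda p: best[-1][p] + transition_scores[(p, cur)])
            row[cur] = best[-1][prev] + transition_scores[(prev, cur)] + scores[cur]
            ptr[cur] = prev
        best.append(row)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: best[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def q3(predicted, observed):
    """Three-state per-residue accuracy in percent."""
    return 100.0 * sum(p == o for p, o in zip(predicted, observed)) / len(observed)
```

With transition scores of 0 for staying in a state and -1 for switching, and residue scores that favor H, weakly favor E, then favor H again, a residue-by-residue choice would give H, E, H, while the dynamic program returns H, H, H because the two switches cost more than the weak E preference gains.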

  1. Protein Tertiary Structure Prediction Based on Main Chain Angle Using a Hybrid Bees Colony Optimization Algorithm

    NASA Astrophysics Data System (ADS)

    Mahmood, Zakaria N.; Mahmuddin, Massudi; Mahmood, Mohammed Nooraldeen

    Encoding the amino acid sequences of proteins in order to classify them into their respective families and subfamilies is an important research area. However, for a given protein, the exact action (hormonal, enzymatic, transmembrane or nuclear receptor) does not depend solely on the amino acid sequence but also on the way the amino acid thread folds. This study provides a prototype system that is able to predict a protein tertiary structure. Several methods are used to develop and evaluate the system to produce better accuracy in protein 3D structure prediction. The Bees Optimization algorithm, which is inspired by the food foraging method of honey bees, is used in the searching phase. In this study, the experiment is conducted on short sequence proteins that have been used in previous research with well-known tools. The proposed approach shows promising results.

  2. Predictions of Crystal Structure Based on Radius Ratio: How Reliable Are They?

    ERIC Educational Resources Information Center

    Nathan, Lawrence C.

    1985-01-01

    Discussion of crystalline solids in undergraduate curricula often includes the use of radius ratio rules as a method for predicting which type of crystal structure is likely to be adopted by a given ionic compound. Examines this topic, establishing more definitive guidelines for the use and reliability of the rules. (JN)
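
The radius ratio rules discussed in this record are usually stated as ranges of the cation-to-anion radius ratio, each associated with a coordination number. The sketch below encodes the textbook limits (0.155, 0.225, 0.414, 0.732); the function name is hypothetical, the ratio is assumed to be at most 1, and, as the article's title suggests, predictions made this way should be treated as a rough guide rather than a reliable rule.

```python
# Upper limit of the radius ratio, coordination number, and geometry.
RADIUS_RATIO_LIMITS = [
    (0.155, 2, "linear"),
    (0.225, 3, "trigonal planar"),
    (0.414, 4, "tetrahedral"),
    (0.732, 6, "octahedral"),
    (float("inf"), 8, "cubic"),
]


def predict_coordination(cation_radius, anion_radius):
    """Return (ratio, coordination number, geometry) from the radius ratio rules.

    Radii may be in any unit as long as both use the same one.
    """
    ratio = cation_radius / anion_radius
    for upper, coordination, geometry in RADIUS_RATIO_LIMITS:
        if ratio < upper:
            return ratio, coordination, geometry
    raise ValueError("radius ratio must be a finite number")
```

For instance, with approximate ionic radii of 102 pm for Na+ and 181 pm for Cl-, the ratio is about 0.56, which falls in the octahedral range and matches the six-coordinate rock-salt structure.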

  3. Local structure based method for prediction of the biochemical function of proteins: Applications to glycoside hydrolases.

    PubMed

    Parasuram, Ramya; Mills, Caitlyn L; Wang, Zhouxi; Somasundaram, Saroja; Beuning, Penny J; Ondrechen, Mary Jo

    2016-01-15

    Thousands of protein structures of unknown or uncertain function have been reported as a result of high-throughput structure determination techniques developed by Structural Genomics (SG) projects. However, many of the putative functional assignments of these SG proteins in the Protein Data Bank (PDB) are incorrect. While high-throughput biochemical screening techniques have provided valuable functional information for limited sets of SG proteins, the biochemical functions for most SG proteins are still unknown or uncertain. Therefore, computational methods for the reliable prediction of protein function from structure can add tremendous value to the existing SG data. In this article, we show how computational methods may be used to predict the function of SG proteins, using examples from the six-hairpin glycosidase (6-HG) and the concanavalin A-like lectin/glucanase (CAL/G) superfamilies. Using a set of predicted functional residues, obtained from computed electrostatic and chemical properties for each protein structure, it is shown that these superfamilies may be sorted into functional families according to biochemical function. Within these superfamilies, a total of 18 SG proteins were analyzed according to their predicted, local functional sites: 13 from the 6-HG superfamily, five from the CAL/G superfamily. Within the 6-HG superfamily, an uncharacterized protein BACOVA_03626 from Bacteroides ovatus (PDB 3ON6) and a hypothetical protein BT3781 from Bacteroides thetaiotaomicron (PDB 2P0V) are shown to have very strong active site matches with exo-α-1,6-mannosidases, thus likely possessing this function. Also in this superfamily, it is shown that protein BH0842, a putative glycoside hydrolase from Bacillus halodurans (PDB 2RDY), has a predicted active site that matches well with a known α-L-galactosidase. In the CAL/G superfamily, an uncharacterized glycosyl hydrolase family 16 protein from Mycobacterium smegmatis (PDB 3RQ0) is shown to have local structural

  4. Guided macro-mutation in a graded energy based genetic algorithm for protein structure prediction.

    PubMed

    Rashid, Mahmood A; Iqbal, Sumaiya; Khatib, Firas; Hoque, Md Tamjidul; Sattar, Abdul

    2016-04-01

    Protein structure prediction is considered one of the most challenging and computationally intractable combinatorial problems. Thus, the efficient modeling of the convoluted search space, the clever use of energy functions, and, more importantly, the use of effective sampling algorithms become crucial to address this problem. For protein structure modeling, an off-lattice model provides limited scope to exercise and evaluate algorithmic developments due to its astronomically large set of data-points. In contrast, an on-lattice model widens the scope and permits studying relatively larger proteins because of its finite set of data-points. In this work, we took full advantage of an on-lattice model by using a face-centered-cube lattice that has the highest packing density with the maximum degree of freedom. We proposed a graded energy-based genetic algorithm (GA) for conformational search that strategically mixes the Miyazawa-Jernigan (MJ) energy with the hydrophobic-polar (HP) energy. In our application, we introduced a 2×2 HP energy guided macro-mutation operator within the GA to explore the best possible local changes exhaustively. Conversely, the 20×20 MJ energy model, the ultimate objective function of our GA that needs to be minimized, considers the impacts amongst the 20 different amino acids and allows searching the globally acceptable conformations. On a set of benchmark proteins, our proposed approach outperformed state-of-the-art approaches in terms of the free energy levels and the root-mean-square deviations. PMID:26878130
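
The 2×2 HP energy on a face-centered cubic (FCC) lattice can be sketched in a few lines. The example assumes the usual HP convention of -1 for every pair of hydrophobic (H) residues that are lattice neighbors but not consecutive in the chain, and it represents the 12 FCC neighbor vectors as the integer vectors with exactly two nonzero components of ±1. The function name and the toy conformation are hypothetical; the MJ energy and the genetic algorithm itself are not shown.

```python
from itertools import combinations, product

# The 12 nearest-neighbor vectors of the FCC lattice: permutations of (±1, ±1, 0).
FCC_NEIGHBORS = {
    v for v in product((-1, 0, 1), repeat=3) if sum(abs(c) for c in v) == 2
}


def hp_energy(sequence, coords):
    """HP contact energy of a chain placed on the FCC lattice.

    sequence: string of 'H' (hydrophobic) and 'P' (polar) residues.
    coords: list of integer (x, y, z) lattice points, one per residue.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    energy = 0
    for i, j in combinations(range(len(sequence)), 2):
        if j - i < 2:
            continue  # chain neighbors do not count as contacts
        if sequence[i] == "H" and sequence[j] == "H":
            delta = tuple(a - b for a, b in zip(coords[i], coords[j]))
            if delta in FCC_NEIGHBORS:
                energy -= 1
    return energy
```

For the three-residue chain "HPH" placed at (0, 0, 0), (1, 1, 0) and (1, 0, 1), the two H residues are lattice neighbors, so the energy is -1; stretching the chain to (0, 0, 0), (1, 1, 0), (2, 0, 0) removes the contact and gives 0.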

  5. Combining sequence-based prediction methods and circular dichroism and infrared spectroscopic data to improve protein secondary structure determinations

    PubMed Central

    Lees, Jonathan G; Janes, Robert W

    2008-01-01

    Background: A number of sequence-based methods exist for protein secondary structure prediction. Protein secondary structures can also be determined experimentally from circular dichroism and infrared spectroscopic data using empirical analysis methods. It has been proposed that comparable accuracy can be obtained from sequence-based predictions as from these biophysical measurements. Here we have compared the secondary structure determination accuracies of sequence prediction methods with the empirically determined values from the spectroscopic data on datasets of proteins for which both crystal structures and spectroscopic data are available. Results: In this study we show that the sequence prediction methods have accuracies nearly comparable to those of spectroscopic methods. However, we also demonstrate that combining the spectroscopic and sequence techniques produces significant overall improvements in secondary structure determinations. In addition, combining the extra information content available from synchrotron radiation circular dichroism data with sequence methods also shows improvements. Conclusion: Combining sequence prediction with experimentally determined spectroscopic methods for protein secondary structure content significantly enhances the accuracy of the overall results obtained. PMID:18197968

  6. An economic prediction of refinement coefficients in wavelet-based adaptive methods for electron structure calculations.

    PubMed

    Pipek, János; Nagy, Szilvia

    2013-03-01

    The wave function of a many-electron system contains inhomogeneously distributed spatial details, which makes it possible to reduce the number of fine detail wavelets in multiresolution analysis approximations. Finding a method for decimating the unnecessary basis functions plays an essential role in avoiding an exponential increase of computational demand in wavelet-based calculations. We describe an effective prediction algorithm for the next resolution level wavelet coefficients, based on the approximate wave function expanded up to a given level. The prediction results in a reasonable approximation of the wave function and allows the unnecessary wavelets to be sorted out with great reliability. PMID:23115109

  7. Target selection for structural genomics based on combining fold recognition and crystallisation prediction methods: application to the human proteome.

    PubMed

    Bray, James E

    2012-03-01

    The objective of this study is to automatically identify regions of the human proteome that are suitable for 3D structure determination by X-ray crystallography and to annotate them according to their likelihood to produce diffraction quality crystals. The results provide a powerful tool for structural genomics laboratories who wish to select human proteins based on the statistical likelihood of crystallisation success. Combining fold recognition and crystallisation prediction algorithms enables the efficient calculation of the crystallisability of the entire human proteome. This novel study estimates that there are approximately 40,000 crystallisable regions in the human proteome. Currently, only 15% of these regions (approx. 6,000 sequences) have been solved to at least 95% sequence identity. The remaining unsolved regions have been categorised into 5 crystallisation classes and an integral membrane protein (IMP) class, based on established structure prediction, crystallisation prediction and transmembrane (TM) helix prediction algorithms. Approximately 750 unsolved regions (2% of the proteome) have been identified as having a PDB fold representative (template) and an 'optimal' likelihood of crystallisation. At the other end of the spectrum, more than 10,500 non-IMP regions with a PDB template are classified as 'very difficult' to crystallise (26%) and almost 2,500 regions (6%) were predicted to contain at least 3 TM helices. The 3D-SPECS (3D Structural Proteomics Explorer with Crystallisation Scores) website contains crystallisation predictions for the entire human proteome and can be found at http://www.bioinformaticsplus.org/3dspecs. PMID:22354707

  8. Damage Prediction and Estimation in Structural Mechanics Based on Data Mining

    SciTech Connect

    Sandhu, S S; Kanapady, R; Tamma, K K; Kamath, C; Kumar, V

    2001-07-23

    Damage in a material includes localized softening or cracks in a structural component due to high operational loads, or the presence of flaws in a structure due to various manufacturing processes. Methods that identify the presence, the location and the severity of damage in the structure are useful for non-destructive evaluation procedures that are typically employed in agile manufacturing and rapid prototyping systems. The current state-of-the-art techniques for these inverse problems are computationally intensive or ill conditioned when insufficient data exist. Early work by a number of researchers has shown that data mining techniques can provide a potential solution to this problem. In this paper, the authors investigate the use of data mining techniques for predicting failure in a variety of 2D and 3D structures using artificial neural networks (ANNs) and decision trees. This work shows that if the correct features are chosen to build the model, and the model is trained on an adequate amount of data, the model can then correctly classify the failure event as well as predict the location and severity of the damage in these structures.

  9. Evaluation of machine learning algorithms for treatment outcome prediction in patients with epilepsy based on structural connectome data

    PubMed Central

    Munsell, Brent C.; Wee, Chong-Yaw; Keller, Simon S.; Weber, Bernd; Elger, Christian; da Silva, Laura Angelica Tomaz; Nesland, Travis; Styner, Martin; Shen, Dinggang; Bonilha, Leonardo

    2015-01-01

    The objective of this study is to evaluate machine learning algorithms aimed at predicting surgical treatment outcomes in groups of patients with temporal lobe epilepsy (TLE) using only the structural brain connectome. Specifically, the brain connectome is reconstructed using white matter fiber tracts from presurgical diffusion tensor imaging. To achieve our objective, a two-stage connectome-based prediction framework is developed that gradually selects a small number of abnormal network connections that contribute to the surgical treatment outcome, and in each stage a linear kernel operation is used to further improve the accuracy of the learned classifier. Using a 10-fold cross-validation strategy, the first stage in the connectome-based framework is able to separate patients with TLE from normal controls with 80% accuracy, and the second stage in the connectome-based framework is able to correctly predict the surgical treatment outcome of patients with TLE with 70% accuracy. Compared to existing state-of-the-art methods that use VBM data, the proposed two-stage connectome-based prediction framework is a suitable alternative with comparable prediction performance. Our results additionally show that machine learning algorithms that exclusively use structural connectome data can predict treatment outcomes in epilepsy with accuracy similar to that of “expert-based” clinical decisions. In summary, using the unprecedented information provided in the brain connectome, machine learning algorithms may uncover pathological changes in brain network organization and improve outcome forecasting in the context of epilepsy. PMID:26054876
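
    The 10-fold cross-validation strategy mentioned in the abstract above can be illustrated with a short Python sketch. This is only an assumed, generic index-splitting scheme, not the authors' pipeline; the function name `k_fold_indices`, the seed, and the sample count in the usage example are hypothetical.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Hypothetical helper: shuffles the sample indices once, deals them into
    k folds, and uses each fold exactly once as the held-out test set.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i receives every k-th shuffled index starting at offset i.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

if __name__ == "__main__":
    # 25 samples is an arbitrary illustrative size.
    for fold_no, (train, test) in enumerate(k_fold_indices(25, k=10), start=1):
        print(fold_no, len(train), len(test))
```

    Each sample is held out exactly once, so the accuracies reported per fold can be averaged into a single estimate.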

  10. MicroRNAfold: pre-microRNA secondary structure prediction based on modified NCM model with thermodynamics-based scoring strategy.

    PubMed

    Han, Dianwei; Zhang, Jun; Tang, Guiliang

    2012-01-01

    An accurate prediction of the pre-microRNA secondary structure is important in miRNA informatics. Based on a recently proposed model, nucleotide cyclic motifs (NCM), to predict RNA secondary structure, we propose and implement a Modified NCM (MNCM) model with a physics-based scoring strategy to tackle the problem of pre-microRNA folding. Our microRNAfold is implemented using a global optimal algorithm based on the bottom-up local optimal solutions. Our experimental results show that microRNAfold outperforms the current leading prediction tools in terms of True Negative rate, False Negative rate, Specificity, and Matthews coefficient ratio. PMID:23155762
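
    The evaluation measures named in the abstract above can all be derived from a confusion matrix. The Python sketch below uses the standard textbook definitions; how the authors counted true and false predictions (for example, per base pair) is not stated in the abstract, so the counts in the usage example are purely hypothetical.

```python
import math

def confusion_metrics(tp, fp, tn, fn):
    """Return true negative rate, false negative rate and Matthews
    correlation coefficient (MCC) from confusion-matrix counts."""
    def ratio(num, den):
        # Report 0.0 when a rate is undefined (empty denominator).
        return num / den if den else 0.0

    tnr = ratio(tn, tn + fp)   # true negative rate (specificity)
    fnr = ratio(fn, fn + tp)   # false negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ratio(tp * tn - fp * fn, denom)
    return {"tnr": tnr, "fnr": fnr, "mcc": mcc}

if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(confusion_metrics(tp=50, fp=10, tn=30, fn=10))
```

    Note that under these standard definitions the true negative rate and specificity are the same quantity.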

  11. A Structure Based Model for the Prediction of Phospholipidosis Induction Potential of Small Molecules

    PubMed Central

    Sun, Hongmao; Shahane, Sampada; Xia, Menghang; Austin, Christopher P.; Huang, Ruili

    2012-01-01

    Drug-induced phospholipidosis (PLD), characterized by an intracellular accumulation of phospholipids and formation of concentric lamellar bodies, has raised concerns in the drug discovery community, due to its potential adverse effects. To evaluate the PLD induction potential, 4,161 non-redundant drug-like molecules from the National Institutes of Health Chemical Genomics Center (NCGC) Pharmaceutical Collection (NPC), the Library of Pharmacologically Active Compounds (LOPAC) and the Tocris Biosciences collection were screened in a quantitative high-throughput screening (qHTS) format. The potential of drug-lipid complex formation can be linked directly to the structures of drug molecules, and many PLD-inducing drugs were found to share common structural features. Support vector machine (SVM) models were constructed by using customized atom types or Molecular Operating Environment (MOE) 2D descriptors as structural descriptors. Either the compounds from LOPAC or compounds randomly selected from the entire dataset were used as the training set. The impact of training data with biased structural features and the impact of molecule descriptors emphasizing whole-molecule properties or detailed functional groups at the atom level on model performance were analyzed and discussed. Rebalancing strategies were applied to improve the predictive power of the SVM models. Using the under-sampling method, the consensus model using one third of the compounds randomly selected from the data set as the training set achieved a high accuracy of 0.90 in predicting the remaining two thirds of the compounds constituting the test set, as measured by the area under the receiver operating characteristic curve (AUC-ROC). PMID:22725677
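
    Two of the ingredients named in the abstract above, random under-sampling and the AUC-ROC measure, can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not the authors' SVM or consensus model: the function names `undersample` and `auc_roc`, the binary 0/1 label encoding, and the toy data are all hypothetical.

```python
import random

def undersample(labels, seed=0):
    """Return sorted indices of a class-balanced subset, obtained by
    randomly discarding items of the majority class (labels are 0 or 1)."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = rng.sample(majority, len(minority))
    return sorted(minority + kept)

def auc_roc(labels, scores):
    """AUC-ROC computed as the probability that a randomly chosen positive
    receives a higher score than a randomly chosen negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    labels = [1, 0, 0, 0, 0, 1]           # toy, imbalanced labels
    print(undersample(labels))             # balanced index subset
    print(auc_roc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

    The pairwise AUC computation is quadratic in the number of samples, which is acceptable for a sketch but would be replaced by a rank-based formula for large screens.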

  12. HMM-based prediction for protein structural motifs' two local properties: solvent accessibility and backbone torsion angles.

    PubMed

    Yu, Jianyong; Xiang, Leijun; Hong, Jiang; Zhang, Weidong

    2013-02-01

    Protein structure prediction is often assisted by predicting one-dimensional structural properties including relative solvent accessibility (RSA) surface and backbone torsion angles (BTA) of residues; these two properties vary continuously because proteins can move freely in three-dimensional space. Instead of subdividing them into a few arbitrarily defined states, as many popular approaches do, this paper proposes an integrated system for real-value prediction of protein structural motifs' two local properties, based on the modified Hidden Markov Model that we previously presented. The model was used to capture the relevance of RSA and the dependency of BTA between adjacent residues along the local protein chain in motifs with definite probabilities. These two properties were predicted according to their own probability distributions. The method was applied to a protein fragment library. For nine different classes of motifs, real values of RSA were predicted with a mean absolute error (MAE) of 0.122-0.175 and a Pearson's correlation coefficient (PCC) of 0.623-0.714 between predicted and actual RSA. Meanwhile, real values of BTA were obtained with MAE of 8.5°-29.4° for Φ angles and 11.2°-38.5° for ψ angles, and PCC of 0.601-0.716 for Φ and 0.597-0.713 for ψ. The results were compared with those of the well-known Real-SPINE server, and indicate that the proposed method may at least serve as the foundation to obtain better local properties from structural motifs for protein structure prediction. PMID:22894152
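
    The two accuracy measures reported in the abstract above, mean absolute error (MAE) and Pearson's correlation coefficient (PCC), follow standard definitions that the Python sketch below implements. The function names and toy values are hypothetical, and the sketch assumes plain real-valued data: for torsion angles a periodic (wrap-around) difference would be needed, which is not handled here.

```python
import math

def mean_absolute_error(predicted, actual):
    """Average absolute difference between paired predicted and actual values."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def pearson_correlation(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

if __name__ == "__main__":
    predicted_rsa = [0.10, 0.50, 0.80]   # toy values, not from the paper
    actual_rsa = [0.20, 0.30, 0.90]
    print(mean_absolute_error(predicted_rsa, actual_rsa))
    print(pearson_correlation(predicted_rsa, actual_rsa))
```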

  13. Altered sphingoid base profiles predict compromised membrane structure and permeability in atopic dermatitis

    PubMed Central

    Loiseau, Nicolas; Obata, Yasuko; Moradian, Sam; Sano, Hiromu; Yoshino, Saeko; Aburai, Kenichi; Takayama, Kozo; Sakamoto, Kazutami; Holleran, Walter M.; Elias, Peter M.; Uchida, Yoshikazu

    2013-01-01

    Background: Ceramide hydrolysis by ceramidase in the stratum corneum (SC) yields both sphingoid bases and free fatty acids (FFA). While FFA are key constituents of the lamellar bilayers that mediate the epidermal permeability barrier, whether sphingoid bases influence permeability barrier homeostasis remains unknown. Pertinently, alterations of lipid profile, including ceramide and ceramidase activities, occur in atopic dermatitis (AD). Objective: We investigated alterations in sphingoid base levels and/or profiles (sphingosine to sphinganine ratio) in the SC of normal vs. AD mice, a model that faithfully replicates human AD, and then whether altered sphingoid base levels and/or profiles influence(s) membrane stability and/or structures. Methods: Unilamellar vesicles (LV), incorporating the three major SC lipids (ceramides/FFA/cholesterol) and different ratios of sphingosine/sphinganine, encapsulating carboxyfluorescein, were used as the model of SC lipids. Membrane stability was measured as release of carboxyfluorescein. Thermal analysis of LV was conducted by differential scanning calorimetry (DSC). Results: LV containing AD levels of sphingosine/sphinganine (AD-LV) displayed altered membrane permeability vs. normal-LV. DSC analyses revealed decreases in orthorhombic structures that form tightly-packed lamellar structures in AD-LV. Conclusion: Sphingoid base composition influences lamellar membrane architecture in SC, suggesting that altered sphingoid base profiles could contribute to the barrier abnormality in AD. PMID:24070864

  14. Structure-based Comparative Analysis and Prediction of N-linked Glycosylation Sites in Evolutionarily Distant Eukaryotes

    PubMed Central

    Lam, Phuc Vinh Nguyen; Goldman, Radoslav; Karagiannis, Konstantinos; Narsule, Tejas; Simonyan, Vahan; Soika, Valerii; Mazumder, Raja

    2013-01-01

    The asparagine-X-serine/threonine (NXS/T) motif, where X is any amino acid except proline, is the consensus motif for N-linked glycosylation. Significant numbers of high-resolution crystal structures of glycosylated proteins allow us to carry out structural analysis of the N-linked glycosylation sites (NGS). Our analysis shows that there is enough structural information from diverse glycoproteins to allow the development of rules which can be used to predict NGS. A Python-based tool was developed to investigate asparagines implicated in N-glycosylation in five species: Homo sapiens, Mus musculus, Drosophila melanogaster, Arabidopsis thaliana and Saccharomyces cerevisiae. Our analysis shows that 78% of all asparagines of the NXS/T motif involved in N-glycosylation are localized in the loop/turn conformation in the human proteome. A similar distribution was revealed for all the other species examined. Comparative analysis of the occurrence of NXS/T motifs not known to be glycosylated and their reverse sequence (S/TXN) shows a similar distribution across the secondary structural elements, indicating that the NXS/T motif in itself is not biologically relevant. Based on our analysis, we have defined rules to determine NGS. Using machine learning methods based on these rules, we can predict with 93% accuracy whether a particular site will be glycosylated. If structural information is not available, the tool uses structure prediction results, giving 74% accuracy. The tool was used to identify glycosylation sites in 108 human proteins with structures and 2247 proteins without structures that have acquired NXS/T site/s due to non-synonymous variation. The tool, Structure Feature Analysis Tool (SFAT), is freely available to the public at http://hive.biochemistry.gwu.edu/tools/sfat. PMID:23459159
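
    The NXS/T consensus motif defined in the abstract above can be located in a protein sequence with a simple pattern scan. The Python sketch below is a minimal illustration and is not the SFAT tool: the function name `find_sequons` and the example sequence are hypothetical, and the scan only lists candidate motifs without applying any of the structural rules or machine learning described in the study.

```python
import re

# N, then any residue except proline (P), then serine (S) or threonine (T).
# The pattern is wrapped in a lookahead so overlapping motifs are all found.
SEQUON = re.compile(r"(?=(N[^P][ST]))")

def find_sequons(sequence):
    """Return (1-based position of the asparagine, motif) for each
    candidate NXS/T motif in a one-letter amino acid sequence."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(seq)]

if __name__ == "__main__":
    example = "MNGTANPSNNSTQ"  # hypothetical sequence for illustration
    for position, motif in find_sequons(example):
        print(position, motif)
```

    In the example sequence the stretch NPS is skipped because proline occupies the X position, while the overlapping motifs NNS and NST are both reported.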

  15. Ab Initio Based 2D Continuum Mechanics - Sensitivity Prediction for Contact Resonance Atomic Force Microscopy Based Structure Fingerprints

    NASA Astrophysics Data System (ADS)

    Tu, Qing; Lange, Björn; Lopes, J. Marcelo J.; Zauscher, Stefan; Blum, Volker

    Contact resonance AFM is demonstrated as a powerful tool for mapping differences in the mechanical properties of 2D materials and heterostructures, making it possible to resolve surface and subsurface structural differences of different domains. Measured contact resonance frequencies are related to the contact stiffness of the combined tip-sample system. Based on first-principles predicted elastic properties and a continuum approach to model the mechanical impedance, we find contact stiffness ratios between different domains of few-layer graphene on 3C-SiC(111) in excellent agreement with experiment. We next demonstrate that the approach is able to quantitatively resolve differences between other 2D materials domains, e.g., for h-BN, MoS2 and MoO3 on graphene on SiC. We show that the combined effect of several materials parameters, especially the in-plane elastic properties and the layer thickness, determines the contact stiffness, therefore boosting the sensitivity even if the out-of-plane elastic properties are similar.

  16. 3D Structure Prediction of Human β1-Adrenergic Receptor via Threading-Based Homology Modeling for Implications in Structure-Based Drug Designing

    PubMed Central

    Ul-Haq, Zaheer; Saeed, Maria; Halim, Sobia Ahsan; Khan, Waqasuddin

    2015-01-01

    Dilated cardiomyopathy is a disease of left ventricular dysfunction accompanied by impairment of the β1-adrenergic receptor (β1-AR) signal cascade. The disturbed β1-AR function may be based on an elevated sympathetic tone observed in patients with heart failure. Prolonged adrenergic stimulation may induce metabolic and electrophysiological disturbances in the myocardium, resulting in tachyarrhythmia that leads to the development of heart failure in humans and sudden death. Hence, β1-AR is considered a promising drug target, but attempts to develop effective and specific drugs against this tempting pharmaceutical target are slowed by the lack of a 3D structure of Homo sapiens β1-AR (hsβADR1). This study encompasses elucidation of 3D structural and physicochemical properties of hsβADR1 via threading-based homology modeling. Furthermore, the docking performance of several docking programs, including Surflex-Dock, FRED, and GOLD, was validated by re-docking and cross-docking experiments. GOLD and Surflex-Dock performed best in re-docking and cross-docking experiments, respectively. Consequently, Surflex-Dock was used to predict the binding modes of four hsβADR1 agonists. This study provides a clear understanding of hsβADR1 structure and its binding mechanism, and may thus help in developing treatments for cardiovascular disease, asthma and other diseases caused by malfunctioning of the target protein. PMID:25860348

  17. 3D structure prediction of human β1-adrenergic receptor via threading-based homology modeling for implications in structure-based drug designing.

    PubMed

    Ul-Haq, Zaheer; Saeed, Maria; Halim, Sobia Ahsan; Khan, Waqasuddin

    2015-01-01

    Dilated cardiomyopathy is a disease of left ventricular dysfunction accompanied by impairment of the β1-adrenergic receptor (β1-AR) signal cascade. The disturbed β1-AR function may be based on an elevated sympathetic tone observed in patients with heart failure. Prolonged adrenergic stimulation may induce metabolic and electrophysiological disturbances in the myocardium, resulting in tachyarrhythmia that leads to the development of heart failure in humans and sudden death. Hence, β1-AR is considered a promising drug target, but attempts to develop effective and specific drugs against this tempting pharmaceutical target are slowed by the lack of a 3D structure of Homo sapiens β1-AR (hsβADR1). This study encompasses elucidation of 3D structural and physicochemical properties of hsβADR1 via threading-based homology modeling. Furthermore, the docking performance of several docking programs, including Surflex-Dock, FRED, and GOLD, was validated by re-docking and cross-docking experiments. GOLD and Surflex-Dock performed best in re-docking and cross-docking experiments, respectively. Consequently, Surflex-Dock was used to predict the binding modes of four hsβADR1 agonists. This study provides a clear understanding of hsβADR1 structure and its binding mechanism, and may thus help in developing treatments for cardiovascular disease, asthma and other diseases caused by malfunctioning of the target protein. PMID:25860348

  18. Structural Bioinformatics-Based Prediction of Exceptional Selectivity of p38 MAP Kinase Inhibitor PH-797804

    SciTech Connect

    Xing, Li; Shieh, Huey S.; Selness, Shaun R.; Devraj, Rajesh V.; Walker, John K.; Devadas, Balekudru; Hope, Heidi R.; Compton, Robert P.; Schindler, John F.; Hirsch, Jeffrey L.; Benson, Alan G.; Kurumbail, Ravi G.; Stegeman, Roderick A.; Williams, Jennifer M.; Broadus, Richard M.; Walden, Zara; Monahan, Joseph B.; Pfizer

    2009-07-24

    PH-797804 is a diarylpyridinone inhibitor of p38α mitogen-activated protein (MAP) kinase derived from a racemic mixture as the more potent atropisomer (aS), first proposed by molecular modeling and subsequently confirmed by experiments. On the basis of structural comparison with a different biaryl pyrazole template and supported by dozens of high-resolution crystal structures of p38α inhibitor complexes, PH-797804 is predicted to possess a high level of specificity across the broad human kinase genome. We used a structural bioinformatics approach to identify two selectivity elements encoded by the TXXXG sequence motif on the p38α kinase hinge: (i) Thr106 that serves as the gatekeeper to the buried hydrophobic pocket occupied by 2,4-difluorophenyl of PH-797804 and (ii) the bidentate hydrogen bonds formed by the pyridinone moiety with the kinase hinge requiring an induced 180° rotation of the Met109-Gly110 peptide bond. The peptide flip occurs in p38α kinase due to the critical glycine residue marked by its conformational flexibility. Kinome-wide sequence mining revealed rare presentation of the selectivity motif. Corroboratively, PH-797804 exhibited exceptionally high specificity against MAP kinases and the related kinases. No cross-reactivity was observed in large panels of kinase screens (selectivity ratio of >500-fold). In cellular assays, PH-797804 demonstrated superior potency and selectivity consistent with the biochemical measurements. PH-797804 has met safety criteria in human phase I studies and is under clinical development for several inflammatory conditions. Understanding the rationale for selectivity at the molecular level helps elucidate the biological function and design of specific p38α kinase inhibitors.

  19. Predicting Surgery Targets in Temporal Lobe Epilepsy through Structural Connectome Based Simulations

    PubMed Central

    Hutchings, Frances; Han, Cheol E.; Keller, Simon S.; Weber, Bernd; Taylor, Peter N.; Kaiser, Marcus

    2015-01-01

    Temporal lobe epilepsy (TLE) is a prevalent neurological disorder resulting in disruptive seizures. In the case of drug resistant epilepsy resective surgery is often considered. This is a procedure hampered by unpredictable success rates, with many patients continuing to have seizures even after surgery. In this study we apply a computational model of epilepsy to patient specific structural connectivity derived from diffusion tensor imaging (DTI) of 22 individuals with left TLE and 39 healthy controls. We validate the model by examining patient-control differences in simulated seizure onset time and network location. We then investigate the potential of the model for surgery prediction by performing in silico surgical resections, removing nodes from patient networks and comparing seizure likelihood post-surgery to pre-surgery simulations. We find that, first, patients tend to transit from non-epileptic to epileptic states more often than controls in the model. Second, regions in the left hemisphere (particularly within temporal and subcortical regions) that are known to be involved in TLE are the most frequent starting points for seizures in patients in the model. In addition, our analysis also implicates regions in the contralateral and frontal locations which may play a role in seizure spreading or surgery resistance. Finally, the model predicts that patient-specific surgery (resection areas chosen on an individual, model-prompted, basis and not following a predefined procedure) may lead to better outcomes than the currently used routine clinical procedure. Taken together this work provides a first step towards patient specific computational modelling of epilepsy surgery in order to inform treatment strategies in individuals. PMID:26657566

  20. Predicting Surgery Targets in Temporal Lobe Epilepsy through Structural Connectome Based Simulations.

    PubMed

    Hutchings, Frances; Han, Cheol E; Keller, Simon S; Weber, Bernd; Taylor, Peter N; Kaiser, Marcus

    2015-12-01

    Temporal lobe epilepsy (TLE) is a prevalent neurological disorder resulting in disruptive seizures. In the case of drug resistant epilepsy resective surgery is often considered. This is a procedure hampered by unpredictable success rates, with many patients continuing to have seizures even after surgery. In this study we apply a computational model of epilepsy to patient specific structural connectivity derived from diffusion tensor imaging (DTI) of 22 individuals with left TLE and 39 healthy controls. We validate the model by examining patient-control differences in simulated seizure onset time and network location. We then investigate the potential of the model for surgery prediction by performing in silico surgical resections, removing nodes from patient networks and comparing seizure likelihood post-surgery to pre-surgery simulations. We find that, first, patients tend to transit from non-epileptic to epileptic states more often than controls in the model. Second, regions in the left hemisphere (particularly within temporal and subcortical regions) that are known to be involved in TLE are the most frequent starting points for seizures in patients in the model. In addition, our analysis also implicates regions in the contralateral and frontal locations which may play a role in seizure spreading or surgery resistance. Finally, the model predicts that patient-specific surgery (resection areas chosen on an individual, model-prompted, basis and not following a predefined procedure) may lead to better outcomes than the currently used routine clinical procedure. Taken together this work provides a first step towards patient specific computational modelling of epilepsy surgery in order to inform treatment strategies in individuals. PMID:26657566

  1. A Multi-Objective Approach for Protein Structure Prediction Based on an Energy Model and Backbone Angle Preferences

    PubMed Central

    Tsay, Jyh-Jong; Su, Shih-Chieh; Yu, Chin-Sheng

    2015-01-01

    Protein structure prediction (PSP) is concerned with the prediction of protein tertiary structure from primary structure and is a challenging computational problem. After decades of research effort, numerous solutions have been proposed for optimisation methods based on energy models. However, further investigation and improvement are still needed to increase the accuracy and similarity of structures. This study presents a novel backbone angle preference factor, which is one of the factors inducing protein folding. The proposed multiobjective optimisation approach simultaneously considers energy models and backbone angle preferences to solve the ab initio PSP. To prove the effectiveness of the multiobjective optimisation approach based on the energy models and backbone angle preferences, 75 amino acid sequences with lengths ranging from 22 to 88 amino acids were selected from the CB513 data set to be the benchmarks. The data sets were highly dissimilar, therefore indicating that they are meaningful. The experimental results showed that the root-mean-square deviation (RMSD) of the multiobjective optimisation approach based on the energy model and backbone angle preferences was superior to those of typical energy models, indicating that the proposed approach can facilitate the ab initio PSP. PMID:26151847
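
    The root-mean-square deviation (RMSD) used as the evaluation measure in the abstract above has a simple standard definition, sketched below in Python. The function name `rmsd` and the toy coordinates are hypothetical, and the sketch assumes the two structures are already superimposed and have their atoms listed in the same order; it performs no optimal alignment, which a real comparison would normally include.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of
    (x, y, z) coordinates, assumed to be pre-aligned and atom-matched."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between corresponding atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

if __name__ == "__main__":
    predicted = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]   # toy coordinates
    reference = [(0.0, 0.0, 0.0), (1.5, 1.0, 0.0)]
    print(rmsd(predicted, reference))
```

    A lower RMSD indicates a predicted structure that lies closer to the reference structure.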

  2. Structure-Based Prediction of Unstable Regions in Proteins: Applications to Protein Misfolding Diseases

    NASA Astrophysics Data System (ADS)

    Guest, Will; Cashman, Neil; Plotkin, Steven

    2009-03-01

    Protein misfolding is a necessary step in the pathogenesis of many diseases, including Creutzfeldt-Jakob disease (CJD) and familial amyotrophic lateral sclerosis (fALS). Identifying unstable structural elements in their causative proteins elucidates the early events of misfolding and presents targets for inhibition of the disease process. An algorithm was developed to calculate the Gibbs free energy of unfolding for all sequence-contiguous regions of a protein using three methods to parameterize energy changes: a modified Gō model, changes in solvent-accessible surface area, and solution of the Poisson-Boltzmann equation. The entropic effects of disulfide bonds and post-translational modifications are treated analytically. It incorporates a novel method for finding local dielectric constants inside a protein to accurately handle charge effects. We have predicted the unstable parts of prion protein and superoxide dismutase 1, the proteins involved in CJD and fALS respectively, and have used these regions as epitopes to prepare antibodies that are specific to the misfolded conformation and show promise as therapeutic agents.

  3. A structure-based Multiple-Instance Learning approach to predicting in vitro transcription factor-DNA interaction

    PubMed Central

    2015-01-01

    Background: Understanding the mechanism of transcriptional regulation remains an inspiring stage of molecular biology. Recently, in vitro protein-binding microarray experiments have greatly improved the understanding of transcription factor-DNA interaction. We present a method - MIL3D - which predicts in vitro transcription factor binding by multiple-instance learning with structural properties of DNA. Results: Evaluation on in vitro data of twenty mouse transcription factors shows that our method outperforms a method based on simple-instance learning with DNA structural properties, and the widely used k-mer counting method, for nineteen out of twenty of the transcription factors. Our analysis showed that the MIL3D approach can utilize subtle structural similarities when a strong sequence consensus is not available. Conclusion: Combining multiple-instance learning and structural properties of DNA has promising potential for studying biological regulatory networks. PMID:25917392

  4. Ecotoxicity quantitative structure-activity relationships for alcohol ethoxylate mixtures based on substance-specific toxicity predictions.

    PubMed

    Boeije, G M; Cano, M L; Marshall, S J; Belanger, S E; Van Compernolle, R; Dorn, P B; Gümbel, H; Toy, R; Wind, T

    2006-05-01

    Traditionally, ecotoxicity quantitative structure-activity relationships (QSARs) for alcohol ethoxylate (AE) surfactants have been developed by assigning the measured ecotoxicity for commercial products to the average structures (alkyl chain length and ethoxylate chain length) of these materials. Acute Daphnia magna toxicity tests for binary mixtures indicate that mixtures are more toxic than the individual AE substances corresponding with their average structures (due to the nonlinear relation of toxicity with structure). Consequently, the ecotoxicity value (expressed as effects concentration) attributed to the average structures that are used to develop the existing QSARs is expected to be too low. A new QSAR technique for complex substances, which interprets the mixture toxicity with regard to the "ethoxymers" distribution (i.e., the individual AE components) rather than the average structure, was developed. This new technique was then applied to develop new AE ecotoxicity QSARs for invertebrates, fish, and mesocosms. Despite the higher complexity, the fit and accuracy of the new QSARs are at least as good as those for the existing QSARs based on the same data set. As expected from typical ethoxymer distributions of commercial AEs, the new QSAR generally predicts less toxicity than the QSARs based on average structure. PMID:16256196

  5. Predicting healthy older adult's brain age based on structural connectivity networks using artificial neural networks.

    PubMed

    Lin, Lan; Jin, Cong; Fu, Zhenrong; Zhang, Baiwen; Bin, Guangyu; Wu, Shuicai

    2016-03-01

    Brain ageing is accompanied by changes in the connectivity of white matter (WM) and in grey matter (GM) concentration. Patients with neurodegenerative disease are more vulnerable to accelerated brain ageing, which is associated with prospective cognitive decline and disease severity. Accurate detection of accelerated ageing based on brain network analysis has great potential for early interventions designed to hinder atypical brain changes. To capture brain ageing, we proposed a novel computational approach for modeling the brain age of 112 normal older subjects (aged 50-79 years) by connectivity analyses of networks of the brain. Our proposed method applied principal component analysis (PCA) to reduce the redundancy in network topological parameters. A back-propagation artificial neural network (BPANN) improved by a hybrid genetic algorithm (GA) and Levenberg-Marquardt (LM) algorithm was established to model the relation between principal components (PCs) and brain age. The predicted brain age is strongly correlated with chronological age (r=0.8). The model has a mean absolute error (MAE) of 4.29 years. Therefore, we believe the method can provide a possible way to quantitatively describe the typical and atypical network organization of the human brain and serve as a biomarker for presymptomatic detection of neurodegenerative diseases in the future. PMID:26718834

  6. Defect Prediction and Control for Ultra-high-strength Steel Complex Structure in Hot Forming Based on FEM

    NASA Astrophysics Data System (ADS)

    Shang, Xin; Zhou, Jie; Zhuo, Fang; Luo, Yan; Li, Yang

    2015-06-01

    Cracking is the main defect in ultra-high-strength steel (UHSS) forming products. To avoid cracking, either the process parameters are adjusted or the die design is changed. However, when the part itself has an unreasonable structural design, the traditional approach of modifying process parameters makes little difference. In this paper, true stress-strain curves under different strain rates and temperatures are obtained via hot tensile tests. Then, the material constitutive model of UHSS is introduced into CAE software; this step is used to analyze and predict defects of UHSS hot forming complex structural parts based on FEM. In addition, simulation results for the changed structure (open end) are compared with those for the original structure (closed end). The results show that both the maximum reduction ratio and the stress in all directions are sharply reduced, i.e., the forming quality is improved significantly after changing the end structure. Finally, the prediction and control methods for forming defects are verified to be feasible in actual production.

  7. Toward improving the reliability of hydrologic prediction: Model structure uncertainty and its quantification using ensemble-based genetic programming framework

    NASA Astrophysics Data System (ADS)

    Parasuraman, Kamban; Elshorbagy, Amin

    2008-12-01

    Uncertainty analysis is starting to be widely acknowledged as an integral part of hydrological modeling. The conventional treatment of uncertainty analysis in hydrologic modeling is to assume a deterministic model structure, and treat its associated parameters as imperfectly known, thereby neglecting the uncertainty associated with the model structure. In this paper, a modeling framework that can explicitly account for the effect of model structure uncertainty has been proposed. The modeling framework is based on initially generating different realizations of the original data set using a non-parametric bootstrap method, and then exploiting the ability of the self-organizing algorithms, namely genetic programming, to evolve their own model structure for each of the resampled data sets. The resulting ensemble of models is then used to quantify the uncertainty associated with the model structure. The performance of the proposed modeling framework is analyzed with regards to its ability in characterizing the evapotranspiration process at the Southwest Sand Storage facility, located near Ft. McMurray, Alberta. Eddy-covariance-measured actual evapotranspiration is modeled as a function of net radiation, air temperature, ground temperature, relative humidity, and wind speed. Investigating the relation between model complexity, prediction accuracy, and uncertainty, two sets of experiments were carried out by varying the level of mathematical operators that can be used to define the predictand-predictor relationship. While the first set uses just the additive operators, the second set uses both the additive and the multiplicative operators to define the predictand-predictor relationship. The results suggest that increasing the model complexity may lead to better prediction accuracy but at an expense of increasing uncertainty. Compared to the model parameter uncertainty, the relative contribution of model structure uncertainty to the predictive uncertainty of a model is
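    The framework in this record (bootstrap resampling of the data, a self-organizing fit that picks its own structure on each resample, and an ensemble used to quantify structural uncertainty) might be sketched as below. This is a simplified stand-in, not the authors' method: genetic programming is replaced by a choice among polynomial degrees with a crude complexity penalty, and the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic predictor/predictand pair, invented for illustration.
x = np.linspace(0.0, 1.0, 60)
y = 2.0 + 3.0 * x + rng.normal(scale=0.3, size=x.size)

def fit_best_structure(xs, ys, degrees=(1, 2, 3)):
    """Let each resample choose its own model structure (polynomial degree).

    Assumption: a penalised log-error score stands in for the structure
    search that genetic programming performs in the paper.
    """
    best = None
    for d in degrees:
        coeffs = np.polyfit(xs, ys, d)
        resid = ys - np.polyval(coeffs, xs)
        score = xs.size * np.log(np.mean(resid ** 2)) + (d + 1) * np.log(xs.size)
        if best is None or score < best[0]:
            best = (score, d, coeffs)
    return best[1], best[2]

n_models = 200
grid = np.linspace(0.0, 1.0, 25)
predictions = np.empty((n_models, grid.size))
chosen_degrees = []

for i in range(n_models):
    # Non-parametric bootstrap: draw observations with replacement.
    idx = rng.integers(0, x.size, size=x.size)
    degree, coeffs = fit_best_structure(x[idx], y[idx])
    chosen_degrees.append(degree)
    predictions[i] = np.polyval(coeffs, grid)

ensemble_mean = predictions.mean(axis=0)
lower, upper = np.percentile(predictions, [2.5, 97.5], axis=0)
print("structures chosen:", sorted(set(chosen_degrees)))
print("mean width of 95% band:", float(np.mean(upper - lower)))
```

    The spread between `lower` and `upper` reflects both the resampled data and the differing structures the resamples select, which is the quantity the ensemble is meant to expose.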

  8. Quantitative vapor-phase IR intensities and DFT computations to predict absolute IR spectra based on molecular structure: I. Alkanes

    NASA Astrophysics Data System (ADS)

    Williams, Stephen D.; Johnson, Timothy J.; Sharpe, Steven W.; Yavelak, Veronica; Oates, R. P.; Brauer, Carolyn S.

    2013-11-01

    Recently recorded quantitative IR spectra of a variety of gas-phase alkanes are shown to have integrated intensities in both the C-H stretching and C-H bending regions that depend linearly on the molecular size, i.e. the number of C-H bonds. This result is well predicted from CH4 to C15H32 by density functional theory (DFT) computations of IR spectra using Becke's three parameter functional (B3LYP/6-31+G(d,p)). Using the experimental data, a simple model predicting the absolute IR band intensities of alkanes based only on structural formula is proposed: For the C-H stretching band envelope centered near 2930 cm^-1 this is given by CH_str = (34±1)×CH - (41±23) in km/mol, where CH is the number of C-H bonds in the alkane. The linearity is explained in terms of coordinated motion of methylene groups rather than the summed intensities of autonomous -CH2- units. The effect of alkyl chain length on the intensity of a C-H bending mode is explored and interpreted in terms of conformer distribution. The relative intensity contribution of a methyl mode compared to the total C-H stretch intensity is shown to be linear in the number of methyl groups in the alkane, and can be used to predict quantitative spectra a priori based on structure alone.
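    As a worked illustration of the linear model quoted in this record, the short sketch below evaluates the stretch-band intensity from the bond count. It uses only the central values of the fit (34 and 41) and ignores the stated uncertainties; the helper names are hypothetical, and the bond count assumes an acyclic alkane of formula CnH2n+2.

```python
def ch_bonds_in_alkane(n_carbons):
    """Number of C-H bonds in an acyclic alkane CnH(2n+2)."""
    return 2 * n_carbons + 2

def ch_stretch_intensity(n_ch_bonds):
    """Integrated C-H stretch intensity in km/mol.

    Central values of the quoted fit only; the +/-1 and +/-23
    uncertainties on slope and intercept are not propagated.
    """
    return 34.0 * n_ch_bonds - 41.0

for n in (1, 8, 15):
    bonds = ch_bonds_in_alkane(n)
    print(f"C{n}: {bonds} C-H bonds -> {ch_stretch_intensity(bonds):.0f} km/mol")
```

    With these central values, methane (4 C-H bonds) gives 95 km/mol and C15H32 (32 C-H bonds) gives 1047 km/mol.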

  9. Multiscale modeling of interwoven Kevlar fibers based on random walk to predict yarn structural response

    NASA Astrophysics Data System (ADS)

    Recchia, Stephen

    Kevlar is the most common high-end plastic filament yarn used in body armor, tire reinforcement, and wear resistant applications. Kevlar is a trade name for an aramid fiber. These are fibers in which the chain molecules are highly oriented along the fiber axis, so the strength of the chemical bond can be exploited. The bulk material is extruded into filaments that are bound together into yarn, which may be corded with other materials as in car tires, woven into a fabric, or layered in an epoxy to make composite panels. The high ratio of tensile strength to weight makes this material ideal for designs that decrease weight and inertia, such as automobile tires, body panels, and body armor. For designs that use Kevlar, increasing the strength-to-weight ratio, or tenacity, would improve performance or reduce the cost of all products that are based on this material. This thesis computationally and experimentally investigates the tenacity and stiffness of Kevlar yarns with varying twist ratios. The test boundary conditions were replicated with a geometrically accurate finite element model, for which a customized code that can reproduce tortuous filaments in a yarn was developed. The solid model geometry capturing filament tortuosity was implemented through a random walk method of axial geometry creation. A finite element analysis successfully recreated the yarn strength and stiffness dependency observed during the tests. The physics applied in the finite element model was reproduced in an analytical equation that was able to predict the dependency of failure strength and strain on twist ratio. The analytical solution can be employed to optimize yarn design for high strength applications.

  10. Assessment of the utility of contact-based restraints in accelerating the prediction of protein structure using molecular dynamics simulations.

    PubMed

    Raval, Alpan; Piana, Stefano; Eastwood, Michael P; Shaw, David E

    2016-01-01

    Molecular dynamics (MD) simulation is a well-established tool for the computational study of protein structure and dynamics, but its application to the important problem of protein structure prediction remains challenging, in part because extremely long timescales can be required to reach the native structure. Here, we examine the extent to which the use of low-resolution information in the form of residue-residue contacts, which can often be inferred from bioinformatics or experimental studies, can accelerate the determination of protein structure in simulation. We incorporated sets of 62, 31, or 15 contact-based restraints in MD simulations of ubiquitin, a benchmark system known to fold to the native state on the millisecond timescale in unrestrained simulations. One-third of the restrained simulations folded to the native state within a few tens of microseconds, a speedup of over an order of magnitude compared with unrestrained simulations and a demonstration of the potential for limited amounts of structural information to accelerate structure determination. Almost all of the remaining ubiquitin simulations reached near-native conformations within a few tens of microseconds, but remained trapped there, apparently due to the restraints. We discuss potential methodological improvements that would facilitate escape from these near-native traps and allow more simulations to quickly reach the native state. Finally, using a target from the Critical Assessment of protein Structure Prediction (CASP) experiment, we show that distance restraints can improve simulation accuracy: In our simulations, restraints stabilized the native state of the protein, enabling a reasonable structural model to be inferred. PMID:26266489

  11. Impact of the subtle differences in MMP-12 structure on Glide-based molecular docking for pose prediction of inhibitors

    NASA Astrophysics Data System (ADS)

    Zhang, Huan; Wang, Yajing; Xu, Feng

    2014-11-01

    Human MMP-12 is involved in many aspects of disease pathology. Substantial efforts have been made to develop MMP-12 inhibitors. However, the mechanism of some MMP-12 inhibitors is still unclear. Recently, the method of molecular modeling was used to explore the mechanism, but selecting the best candidate among the wealth of MMP-12 structures poses a challenge. In this study, we attempted to identify several criteria to predict the most appropriate MMP-12 PDB ID for enzyme-ligand interaction studies based on cross-docking by Glide. Furthermore, the parameters from PDB files such as R-free, resolution, B factor, and the molecular volume of the ligand in the complex can provide useful clues for choosing a suitable approximate initial model for pose prediction for MMP-12 inhibitors. This work might also provide a useful reference for other drug targets.

  12. Finite element prediction of seismic response modification of monumental structures utilizing base isolation

    NASA Astrophysics Data System (ADS)

    Spanos, Konstantinos; Anifantis, Nikolaos; Kakavas, Panayiotis

    2015-05-01

    The analysis of the mechanical behavior of ancient structures is an essential engineering task concerning the preservation of architectural heritage. As many monuments of classical antiquity are located in regions of earthquake activity, the safety assessment of these structures, as well as the selection of possible restoration interventions, requires numerical models capable of correctly representing their seismic response. The work presented herein was part of a research project in which a better understanding of the dynamics of classical column-architrave structures was sought by means of numerical techniques. In this paper, the seismic behavior of ancient monumental structures with multi-drum classical columns is investigated. In particular, the column-architrave classical structure under strong ground excitations was represented by a finite element method. This approach simulates the individual rock blocks as distinct rigid blocks interconnected with slidelines and incorporates seismic isolation dampers under the basement of the structure. Sliding and rocking motions of individual stone blocks and drums are modeled utilizing non-linear frictional contact conditions. The seismic isolation is modeled through the application of pad bearings under the basement of the structure. These pads are interpreted by appropriate rubber and steel layers. Time domain analyses were performed, considering the geometric and material non-linear behavior at the joints and the characteristics of pad bearings. The deformation and failure modes of drum columns subject to seismic excitations of various types and intensities were analyzed. The adverse influence of drum imperfections on structural safety was also examined.

  13. Real-time prediction of atmospheric Lagrangian coherent structures based on forecast data: An application and error analysis

    NASA Astrophysics Data System (ADS)

    BozorgMagham, Amir E.; Ross, Shane D.; Schmale, David G.

    2013-09-01

    The language of Lagrangian coherent structures (LCSs) provides a new means for studying transport and mixing of passive particles advected by an atmospheric flow field. Recent observations suggest that LCSs govern the large-scale atmospheric motion of airborne microorganisms, paving the way for more efficient models and management strategies for the spread of infectious diseases affecting plants, domestic animals, and humans. In addition, having reliable predictions of the timing of hyperbolic LCSs may contribute to improved aerobiological sampling of microorganisms with unmanned aerial vehicles and LCS-based early warning systems. Chaotic atmospheric dynamics lead to unavoidable forecasting errors in the wind velocity field, which compounds errors in LCS forecasting. In this study, we reveal the cumulative effects of errors of (short-term) wind field forecasts on the finite-time Lyapunov exponent (FTLE) fields and the associated LCSs when realistic forecast plans impose certain limits on the forecasting parameters. Objectives of this paper are to (a) quantify the accuracy of prediction of FTLE-LCS features and (b) determine the sensitivity of such predictions to forecasting parameters. Results indicate that forecasts of attracting LCSs exhibit less divergence from the archive-based LCSs than the repelling features. This result is important since attracting LCSs are the backbone of long-lived features in moving fluids. We also show under what circumstances one can trust the forecast results if one merely wants to know if an LCS passed over a region and does not need to precisely know the passage time.

  14. Refinement of modelled structures by knowledge-based energy profiles and secondary structure prediction: application to the human procarboxypeptidase A2.

    PubMed

    Aloy, P; Mas, J M; Martí-Renom, M A; Querol, E; Avilés, F X; Oliva, B

    2000-01-01

    Knowledge-based energy profiles combined with secondary structure prediction have been applied to molecular modelling refinement. To check the procedure, three different models of human procarboxypeptidase A2 (hPCPA2) have been built using the 3D structures of procarboxypeptidase A1 (pPCPA1) and bovine procarboxypeptidase A (bPCPA) as templates. The results of the refinement can be tested against the X-ray structure of hPCPA2, which has recently been determined. Regions mis-modelled in the activation segment of hPCPA2 were detected by means of pseudo-energies using Prosa II and modified afterwards according to the secondary structure prediction. Moreover, models obtained by automated methods such as COMPOSER, MODELLER and distance restraints have also been compared, and it was found possible to identify the best model by means of pseudo-energies. Two general conclusions can be drawn from this work: (1) on a given set of putative models it is possible to distinguish among them the one closest to the crystallographic structure, and (2) within a given structure it is possible to find by means of pseudo-energies those regions that have been defectively modelled. PMID:10702927

  15. Sequence and structure-based prediction of fructosyltransferase activity for functional subclassification of fungal GH32 enzymes.

    PubMed

    Trollope, Kim M; van Wyk, Niël; Kotjomela, Momo A; Volschenk, Heinrich

    2015-12-01

    Sucrolytic enzymes catalyse sucrose hydrolysis or the synthesis of fructooligosaccharides (FOSs), a prebiotic in human and animal nutrition. FOS synthesis capacity differs between sucrolytic enzymes. Amino-acid-sequence-based classification of FOS synthesizing enzymes would greatly facilitate the in silico identification of novel catalysts, as large amounts of sequence data lie untapped. The development of a bioinformatics tool to rapidly distinguish between high-level FOS-synthesizing and predominantly sucrose-hydrolysing enzymes from fungal genomic data is presented. Sequence comparison of functionally characterized enzymes displaying low- and high-level FOS synthesis revealed conserved motifs unique to each group. New light is shed on the sequence context of active site residues in three previously identified conserved motifs. We characterized two enzymes predicted to possess low- and high-level FOS synthesis activities based on their conserved motif sequences. FOS data for the enzymes confirmed our successful prediction of their FOS synthesis capacity. Structural comparison of enzymes displaying low- and high-level FOS synthesis identified steric hindrance between nystose and a long loop region present only in low-level FOS synthesizers. This loop is proposed to limit the synthesis of FOS species with higher degrees of polymerization, a phenomenon observed among enzymes displaying low-level FOS synthesis. Conserved sequence motifs surrounding catalytic residues and a distant structural determinant were identifiers of FOS synthesis capacity and allow for functional annotation of sucrolytic enzymes directly from amino acid sequence. The tool presented may also be useful to study the structure-function relationships of β-fructofuranosidases by identifying mutations present in a group of closely related enzymes displaying similar function. PMID:26426731

  16. Predicting the Effect of Mutations on Protein-Protein Binding Interactions through Structure-Based Interface Profiles

    PubMed Central

    Brender, Jeffrey R.; Zhang, Yang

    2015-01-01

    The formation of protein-protein complexes is essential for proteins to perform their physiological functions in the cell. Mutations that prevent the proper formation of the correct complexes can have serious consequences for the associated cellular processes. Since experimental determination of protein-protein binding affinity remains difficult when performed on a large scale, computational methods for predicting the consequences of mutations on binding affinity are highly desirable. We show that a scoring function based on interface structure profiles collected from analogous protein-protein interactions in the PDB is a powerful predictor of protein binding affinity changes upon mutation. As a standalone feature, the difference between the interface profile scores of the mutant and wild-type proteins has an accuracy equivalent to the best all-atom potentials, despite being two orders of magnitude faster once the profile has been constructed. Due to its unique sensitivity in collecting the evolutionary profiles of analogous binding interactions and the high speed of calculation, the interface profile score has additional advantages as a complementary feature to combine with physics-based potentials for improving the accuracy of composite scoring approaches. By incorporating the sequence-derived and residue-level coarse-grained potentials with the interface structure profile score, a composite model was constructed through random forest training, which generates a Pearson correlation coefficient >0.8 between the predicted and observed binding free-energy changes upon mutation. This accuracy is comparable to, or in most cases better than, the current best methods, but does not require high-resolution full-atomic models of the mutant structures. The binding interface profiling approach should find useful application in human-disease mutation recognition and protein interface design studies. PMID:26506533

  17. A population-based evolutionary search approach to the multiple minima problem in de novo protein structure prediction

    PubMed Central

    2013-01-01

    Background Elucidating the native structure of a protein molecule from its sequence of amino acids, a problem known as de novo structure prediction, is a long standing challenge in computational structural biology. Difficulties in silico arise due to the high dimensionality of the protein conformational space and the ruggedness of the associated energy surface. The issue of multiple minima is a particularly troublesome hallmark of energy surfaces probed with current energy functions. In contrast to the true energy surface, these surfaces are weakly-funneled and rich in comparably deep minima populated by non-native structures. For this reason, many algorithms seek to be inclusive and obtain a broad view of the low-energy regions through an ensemble of low-energy (decoy) conformations. Conformational diversity in this ensemble is key to increasing the likelihood that the native structure has been captured. Methods We propose an evolutionary search approach to address the multiple-minima problem in decoy sampling for de novo structure prediction. Two population-based evolutionary search algorithms are presented that follow the basic approach of treating conformations as individuals in an evolving population. Coarse graining and molecular fragment replacement are used to efficiently obtain protein-like child conformations from parents. Potential energy is used both to bias parent selection and determine which subset of parents and children will be retained in the evolving population. The effect on the decoy ensemble of sampling minima directly is measured by additionally mapping a conformation to its nearest local minimum before considering it for retainment. The resulting memetic algorithm thus evolves not just a population of conformations but a population of local minima. Results and conclusions Results show that both algorithms are effective in terms of sampling conformations in proximity of the known native structure. The additional minimization is shown to be
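    The memetic idea in this record, where each conformation is mapped to a nearby local minimum before it competes for a place in the population, can be illustrated on a toy problem. The sketch below is not the authors' algorithm: a one-dimensional rugged function stands in for the energy surface, a Gaussian perturbation stands in for fragment replacement, every parent produces one child (no energy-biased parent selection), and all names are hypothetical.

```python
import math
import random

def energy(x):
    """Toy rugged 'energy surface' with many local minima (not a force field)."""
    return 0.1 * x * x + math.sin(5.0 * x)

def local_minimum(x, step=0.01, max_iter=1000):
    """Greedy descent on a fixed grid step until no neighbour is lower."""
    for _ in range(max_iter):
        best = min((x - step, x, x + step), key=energy)
        if best == x:
            break
        x = best
    return x

def evolve(pop_size=20, generations=30, seed=1):
    rng = random.Random(seed)
    # The population holds local minima, not raw samples.
    population = [local_minimum(rng.uniform(-10.0, 10.0)) for _ in range(pop_size)]
    initial_best = min(energy(p) for p in population)
    for _ in range(generations):
        # Perturb each parent, then minimise the child before it competes.
        children = [local_minimum(p + rng.gauss(0.0, 1.0)) for p in population]
        # Truncation selection over parents and children keeps the lowest energies.
        population = sorted(population + children, key=energy)[:pop_size]
    return initial_best, population

initial_best, final_population = evolve()
print("best initial energy:", round(initial_best, 3))
print("best final energy:  ", round(energy(final_population[0]), 3))
```

    Because parents and children are pooled before truncation, the best energy in the population can never get worse from one generation to the next.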

  18. Static compressive strength prediction of open-hole structure based on non-linear shear behavior and micro-mechanics

    NASA Astrophysics Data System (ADS)

    Li, Wangnan; Cai, Hongneng; Li, Chao

    2014-11-01

    This paper deals with the characterization of the strength of the constituents of carbon fiber reinforced plastic (CFRP) laminate, and the prediction of the static compressive strength of open-hole structures of polymer composites. An approach that combines non-linear analysis at the macro level with a linear elastic micromechanical failure analysis at the micro level (non-linear MMF) is proposed to improve the prediction accuracy. A face-centered cubic micromechanics model is constructed to analyze the stresses in fiber and matrix at the micro level. Non-interactive failure criteria are proposed to characterize the strength of fiber and matrix. The non-linear shear behavior of the laminate is studied experimentally, and a novel approach of cubic spline interpolation is used to capture the significant non-linear shear behavior of the laminate. The user-defined material subroutine UMAT for the non-linear shear behavior is developed and incorporated into the macro-level mechanics analysis using Abaqus Python codes. The failure mechanism and static strength of open-hole compressive (OHC) structures of polymer composites are studied based on non-linear MMF. The UTS50/E51 CFRP is used to demonstrate the application of the theory of non-linear MMF.

  19. An efficient fragment-based approach for predicting the ground-state energies and structures of large molecules.

    PubMed

    Li, Shuhua; Li, Wei; Fang, Tao

    2005-05-18

    An efficient fragment-based approach for predicting the ground-state energies and structures of large molecules at the Hartree-Fock (HF) and post-HF levels is described. The physical foundation of this approach is attributed to the "quantum locality" of the electron correlation energy and the HF total energy, which is revealed by a new energy decomposition analysis of the HF total energy proposed in this work. This approach is based on the molecular fractionation with conjugated caps (MFCC) scheme (Zhang, D. W.; Zhang, J. Z. H. J. Chem. Phys. 2003, 119, 3599), by which a macromolecule is partitioned into various capped fragments and conjugated caps formed by two adjacent caps. We find that the MFCC scheme, if corrected by the interaction between non-neighboring fragments, can be used to predict the total energy of large molecules only from energy calculations on a series of small subsystems. The approach, named energy-corrected MFCC (EC-MFCC), computationally achieves linear scaling with the molecular size. Our test calculations on a broad range of medium-sized and large molecules demonstrate that this approach is able to reproduce the conventional HF and second-order Moller-Plesset perturbation theory (MP2) energies within a few millihartree in most cases. With the EC-MFCC optimization algorithm described in this work, we have obtained the optimized structures of long oligomers of trans-polyacetylene and BN nanotubes with up to about 400 atoms, which are beyond the reach of traditional computational methods. In addition, the EC-MFCC approach is also applied to estimate the heats of formation for a series of organic compounds. This approach provides an appealing alternative to the traditional additivity rules based on either bond or group contributions for the estimation of thermochemical properties. PMID:15884963

  20. Hyperspectral bands prediction based on inter-band spectral correlation structure

    NASA Astrophysics Data System (ADS)

    Ahmed, Ayman M.; Sharkawy, Mohamed El.; Elramly, Salwa H.

    2013-02-01

    Hyperspectral imaging has been widely studied in many applications, notably in climate change, vegetation, and desert studies. However, this kind of imaging produces a huge amount of data, which requires transmission, processing, and storage resources for both airborne and spaceborne imaging. Compression of hyperspectral data cubes is an effective solution for these problems. Lossless compression of hyperspectral data usually results in a low compression ratio, which may not meet the available resources; on the other hand, lossy compression may give the desired ratio, but with a significant degradation of the object identification performance of the hyperspectral data. Moreover, most hyperspectral data compression techniques exploit the similarities in the spectral dimension, which requires band reordering or regrouping to make use of the spectral redundancy. In this paper, we analyze the spectral cross correlation between bands for AVIRIS and Hyperion hyperspectral data: the spectral cross correlation matrix is calculated and its strength assessed; we propose a new technique to find highly correlated groups of bands in the hyperspectral data cube based on the "inter band correlation square"; and finally, we propose a new technique of band regrouping based on correlation value weights for different groups of bands, treated as a network of correlation.
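    The inter-band correlation analysis described in this record can be illustrated with a small sketch. It computes a band-by-band Pearson correlation matrix for a data cube and then groups neighbouring bands with a plain threshold rule. The threshold grouping is a simplification chosen for illustration and is not the "inter band correlation square" technique of the paper; the cube is synthetic and the function names are hypothetical.

```python
import numpy as np

def band_correlation_matrix(cube):
    """Pearson correlation between bands of a (rows, cols, bands) cube."""
    rows, cols, bands = cube.shape
    pixels = cube.reshape(rows * cols, bands)   # one row per pixel
    return np.corrcoef(pixels, rowvar=False)    # shape (bands, bands)

def group_adjacent_bands(corr, threshold=0.9):
    """Start a new group whenever correlation with the previous band drops."""
    groups = [[0]]
    for b in range(1, corr.shape[0]):
        if corr[b - 1, b] >= threshold:
            groups[-1].append(b)
        else:
            groups.append([b])
    return groups

# Synthetic 20x20 cube: bands 0-2 follow one scene signal, bands 3-5 another.
rng = np.random.default_rng(7)
signal_a = rng.normal(size=(20, 20))
signal_b = rng.normal(size=(20, 20))
bands = [signal_a + rng.normal(scale=0.05, size=(20, 20)) for _ in range(3)]
bands += [signal_b + rng.normal(scale=0.05, size=(20, 20)) for _ in range(3)]
cube = np.stack(bands, axis=-1)

corr = band_correlation_matrix(cube)
groups = group_adjacent_bands(corr)
print(groups)
```

    Bands within a group are candidates for prediction from one another, which is where a compression scheme can exploit the spectral redundancy.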

  1. Scientific bases, methods, and results of mathematical simulation and prediction of structure and behavior of petroleum geology systems

    SciTech Connect

    Buryakovsky, L.A.

    1992-07-01

    This paper reports that the systems approach to geology is both a sophisticated ideology and a scientific method for investigation of very complicated geological systems. As applied to petroleum geology, it includes the methodological base and technology of mathematical simulation used for modeling geological systems: the systems that have been previously investigated and estimated by experimental data and/or field studies. Because geological systems develop in time, it is very important to simulate them as dynamic systems. The main tasks in the systems approach to petroleum geology are the numerical simulation of physical and reservoir properties of rocks, pore (geofluid) pressure in reservoir beds, and hydrocarbon resources. The results of numerical simulation are used for prediction of geological system structure and behavior in both studied and noninvestigated areas.

  2. "Adapted Linear Interaction Energy": A Structure-Based LIE Parametrization for Fast Prediction of Protein-Ligand Affinities.

    PubMed

    Linder, Mats; Ranganathan, Anirudh; Brinck, Tore

    2013-02-12

    We present a structure-based parametrization of the Linear Interaction Energy (LIE) method and show that it allows for the prediction of absolute protein-ligand binding energies. We call the new model "Adapted" LIE (ALIE) because the α and β coefficients are defined by system-dependent descriptors and do therefore not require any empirical γ term. The best formulation attains a mean average deviation of 1.8 kcal/mol for a diverse test set and depends on only one fitted parameter. It is robust with respect to additional fitting and cross-validation. We compare this new approach with standard LIE by Åqvist and co-workers and the LIE + γSASA model (initially suggested by Jorgensen and co-workers) against in-house and external data sets and discuss their applicabilities. PMID:26588766
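    For orientation, the standard LIE estimate that this record builds on combines the changes in average ligand-surroundings van der Waals and electrostatic energies between the bound and free states, scaled by α and β, plus an optional constant γ. The sketch below evaluates that standard form only; the system-dependent descriptors that define α and β in ALIE are not given in the abstract and are not reproduced here. The function name and the numerical inputs are hypothetical.

```python
def lie_binding_free_energy(d_vdw, d_elec, alpha, beta, gamma=0.0):
    """Standard LIE form: dG = alpha * d<V_vdW> + beta * d<V_el> + gamma.

    d_vdw, d_elec: bound-minus-free differences in the average
    ligand-surroundings interaction energies (kcal/mol).
    alpha, beta, gamma: supplied by the caller; no defaults are implied.
    """
    return alpha * d_vdw + beta * d_elec + gamma

# Hypothetical inputs, invented purely to show the arithmetic.
dg = lie_binding_free_energy(d_vdw=-20.0, d_elec=-10.0, alpha=0.18, beta=0.5)
print(f"estimated binding free energy: {dg:.1f} kcal/mol")
```

    In the ALIE formulation described above, γ is dropped and α and β are computed from descriptors of the system instead of being fixed constants.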

  3. Water-Regulated Self-Assembly Structure Transformation and Gelation Behavior Prediction Based on a Hydrazide Derivative.

    PubMed

    Li, Yajie; Ran, Xia; Li, Qiuyue; Gao, Qiongqiong; Guo, Lijun

    2016-08-01

    Herein, we report the water-regulated supramolecular self-assembly structure transformation and the predictability of the gelation ability based on an azobenzene derivative bearing a hydrazide group, namely, N-(3,4,5-tributoxyphenyl)-N'-4-[(4-hydroxyphenyl)azophenyl] benzohydrazide (BNB-t4). The regulation effects are demonstrated in the morphological transformation from spherical to lamellar particles then back to spherical in different solvent ratios of n-propanol/water. The self-assembly behavior of BNB-t4 was characterized by minimum gelation concentration, microstructure, thermal, and mechanical stabilities. From the spectroscopy studies, it is suggested that gel formation of BNB-t4 is mainly driven by intermolecular hydrogen bonding, accompanied with the contribution from π-π stacking as well as hydrophobic interactions. The successfully established correlation between the self-assembly behavior and solubility parameters yields a facile way to predict the gelation performance of other molecules in other single or mixed solvents. PMID:27258791

  4. An allometry-based approach for understanding forest structure, predicting tree-size distribution and assessing the degree of disturbance

    PubMed Central

    Anfodillo, Tommaso; Carrer, Marco; Simini, Filippo; Popa, Ionel; Banavar, Jayanth R.; Maritan, Amos

    2013-01-01

    Tree-size distribution is one of the most investigated subjects in plant population biology. The forestry literature reports that tree-size distribution trajectories vary across different stands and/or species, whereas the metabolic scaling theory suggests that the tree number scales universally as −2 power of diameter. Here, we propose a simple functional scaling model in which these two opposing results are reconciled. Basic principles related to crown shape, energy optimization and the finite-size scaling approach were used to define a set of relationships based on a single parameter that allows us to predict the slope of the tree-size distributions in a steady-state condition. We tested the model predictions on four temperate mountain forests. Plots (4 ha each, fully mapped) were selected with different degrees of human disturbance (semi-natural stands versus formerly managed). Results showed that the size distribution range successfully fitted by the model is related to the degree of forest disturbance: in semi-natural forests the range is wide, whereas in formerly managed forests, the agreement with the model is confined to a very restricted range. We argue that simple allometric relationships, at an individual level, shape the structure of the whole forest community. PMID:23193128
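    The scaling claim in this record, that tree number varies as the −2 power of diameter, corresponds to a straight line of slope −2 on log-log axes. A minimal sketch of how such a slope might be estimated from binned counts is shown below. The diameters and counts are idealised values constructed to follow the power law exactly, not field data, and the function name is hypothetical.

```python
import numpy as np

def size_distribution_slope(diameters, counts):
    """Least-squares slope of log(count) against log(diameter)."""
    slope, _intercept = np.polyfit(np.log(diameters), np.log(counts), 1)
    return float(slope)

# Idealised stand: counts constructed as N(d) = 4e4 * d^-2 (no noise).
diameters_cm = np.array([10.0, 20.0, 40.0, 80.0])
counts = 4.0e4 * diameters_cm ** -2.0

slope = size_distribution_slope(diameters_cm, counts)
print(f"fitted exponent: {slope:.2f}")
```

    On real plots, the diameter range over which the fitted exponent stays close to the predicted one is what the record relates to the degree of disturbance.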

  5. Direct prediction of profiles of sequences compatible with a protein structure by neural networks with fragment-based local and energy-based nonlocal profiles.

    PubMed

    Li, Zhixiu; Yang, Yuedong; Faraggi, Eshel; Zhan, Jian; Zhou, Yaoqi

    2014-10-01

    Locating sequences compatible with a protein structural fold is the well-known inverse protein-folding problem. While significant progress has been made, the success rate of protein design remains low. As a result, a library of designed sequences or profile of sequences is currently employed for guiding experimental screening or directed evolution. Sequence profiles can be computationally predicted by iterative mutations of a random sequence to produce energy-optimized sequences, or by combining sequences of structurally similar fragments in a template library. The latter approach is computationally more efficient but yields less accurate profiles than the former because of lacking tertiary structural information. Here we present a method called SPIN that predicts Sequence Profiles by Integrated Neural network based on fragment-derived sequence profiles and structure-derived energy profiles. SPIN improves over the fragment-derived profile by 6.7% (from 23.6 to 30.3%) in sequence identity between predicted and wild-type sequences. The method also reduces the number of residues in low complex regions by 15.7% and has a significantly better balance of hydrophilic and hydrophobic residues at protein surface. The accuracy of sequence profiles obtained is comparable to those generated from the protein design program RosettaDesign 3.5. This highly efficient method for predicting sequence profiles from structures will be useful as a single-body scoring term for improving scoring functions used in protein design and fold recognition. It also complements protein design programs in guiding experimental design of the sequence library for screening and directed evolution of designed sequences. The SPIN server is available at http://sparks-lab.org. PMID:24898915
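
    The headline metric here is sequence identity between predicted and wild-type sequences. A minimal sketch of that calculation follows, assuming the two sequences are already aligned position by position and have equal length; the function name and the example sequences are hypothetical and are not taken from SPIN.

```python
def sequence_identity(predicted, wild_type):
    """Percentage of aligned positions where the two sequences agree.

    Assumes both sequences are pre-aligned and of equal length.
    """
    if len(predicted) != len(wild_type):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not wild_type:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for p, w in zip(predicted, wild_type) if p == w)
    return 100.0 * matches / len(wild_type)

# Made-up ten-residue example: three positions differ.
wild_type = "MKTAYIAKQR"
predicted = "MKSAYLAKQE"
identity = sequence_identity(predicted, wild_type)
```

    Note that the reported gain of 6.7% is the absolute difference between the two identities (30.3 minus 23.6), that is, 6.7 percentage points.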

  6. Direct prediction of profiles of sequences compatible to a protein structure by neural networks with fragment-based local and energy-based nonlocal profiles

    PubMed Central

    Li, Zhixiu; Yang, Yuedong; Faraggi, Eshel; Zhan, Jian; Zhou, Yaoqi

    2014-01-01

    Locating sequences compatible to a protein structural fold is the well-known inverse protein-folding problem. While significant progress has been made, the success rate of protein design remains low. As a result, a library of designed sequences or profile of sequences is currently employed for guiding experimental screening or directed evolution. Sequence profiles can be computationally predicted by iterative mutations of a random sequence to produce energy-optimized sequences, or by combining sequences of structurally similar fragments in a template library. The latter approach is computationally more efficient but yields less accurate profiles than the former because of lacking tertiary structural information. Here we present a method called SPIN that predicts Sequence Profiles by Integrated Neural network based on fragment-derived sequence profiles and structure-derived energy profiles. SPIN improves over the fragment-derived profile by 6.7% (from 23.6% to 30.3%) in sequence identity between predicted and wild-type sequences. The method also reduces the number of residues in low complex regions by 15.7% and has a significantly better balance of hydrophilic and hydrophobic residues at protein surfaces. The accuracy of sequence profiles obtained is comparable to those generated from the protein design program RosettaDesign 3.5. This highly efficient method for predicting sequence profiles from structures will be useful as a single-body scoring term for improving scoring functions used in protein design and fold recognition. It also complements protein design programs in guiding experimental design of the sequence library for screening and directed evolution of designed sequences. The SPIN server is available at http://sparks-lab.org. PMID:24898915

  7. De Novo Protein Structure Prediction

    NASA Astrophysics Data System (ADS)

    Hung, Ling-Hong; Ngan, Shing-Chung; Samudrala, Ram

    An unparalleled amount of sequence data is being made available from large-scale genome sequencing efforts. The data provide a shortcut to the determination of the function of a gene of interest, as long as there is an existing sequenced gene with similar sequence and of known function. This has spurred structural genomic initiatives with the goal of determining as many protein folds as possible (Brenner and Levitt, 2000; Burley, 2000; Brenner, 2001; Heinemann et al., 2001). The purpose of this is twofold: First, the structure of a gene product can often lead to direct inference of its function. Second, since the function of a protein is dependent on its structure, direct comparison of the structures of gene products can be more sensitive than the comparison of sequences of genes for detecting homology. Presently, structural determination by crystallography and NMR techniques is still slow and expensive in terms of manpower and resources, despite attempts to automate the processes. Computer structure prediction algorithms, while not providing the accuracy of the traditional techniques, are extremely quick and inexpensive and can provide useful low-resolution data for structure comparisons (Bonneau and Baker, 2001). Given the immense number of structures which the structural genomic projects are attempting to solve, there would be a considerable gain even if the computer structure prediction approach were applicable to a subset of proteins.

  8. Vertical Chlorophyll Canopy Structure Affects the Remote Sensing Based Predictability of LAI, Chlorophyll and Leaf Nitrogen in Agricultural Fields

    NASA Astrophysics Data System (ADS)

    Boegh, E.; Houborg, R.; Bienkowski, J.; Braban, C. F.; Dalgaard, T.; van Dijk, N.; Dragosits, U.; Holmes, E.; Magliulo, V.; Schelde, K.; Di Tommasi, P.; Vitale, L.; Theobald, M.; Cellier, P.; Sutton, M.

    2012-12-01

    Leaf nitrogen and leaf surface area influence the exchange of gases between terrestrial ecosystems and the atmosphere, and they play a significant role in the global cycles of carbon, nitrogen and water. Remote sensing can be used to estimate leaf area index (LAI), chlorophyll content (CHL) and leaf nitrogen (N), but methods are often developed using plot-scale data and not verified over extended regions characterized by variations in environmental boundary conditions (soil, atmosphere) and canopy structures. Estimation of N can be indirect due to its association with CHL; however, N is also included in pigments such as carotenoids and anthocyanin, which have different spectral signatures than CHL. Photosynthesis optimization theory suggests that plants will distribute their N resources in proportion to the light gradient within the canopy. Such vertical variation in CHL and N complicates the evaluation of remote sensing-based methods. Typically remote sensing studies measure CHL of the upper leaf, which is then multiplied by the green LAI to represent canopy chlorophyll content, or random sampling is used. In this study, field measurements and high spatial resolution (10-20 m) remote sensing images acquired from the HRG and HRVIR sensors aboard the SPOT satellites were used to assess the predictability of LAI, CHL and N in five European agricultural landscapes located in Denmark, Scotland (United Kingdom), Poland, The Netherlands and Italy. All satellite images were atmospherically corrected using the 6SV1 model with atmospheric inputs estimated by MODIS and AIRS data. Five spectral vegetation indices (SVIs) were calculated (the Normalized Difference Vegetation index, the Simple Ratio, the Enhanced Vegetation Index-2, the Green Normalized Difference Vegetation Index, and the green Chlorophyll Index), and an image-based inverse canopy radiative transfer modelling system, REGFLEC (REGularized canopy reFLECtance) was applied to each of the five European landscapes. While the

  9. DR_bind: a web server for predicting DNA-binding residues from the protein structure based on electrostatics, evolution and geometry

    PubMed Central

    Chen, Yao Chi; Wright, Jon D.; Lim, Carmay

    2012-01-01

    DR_bind is a web server that automatically predicts DNA-binding residues, given the respective protein structure based on (i) electrostatics, (ii) evolution and (iii) geometry. In contrast to machine-learning methods, DR_bind does not require a training data set or any parameters. It predicts DNA-binding residues by detecting a cluster of conserved, solvent-accessible residues that are electrostatically stabilized upon mutation to Asp−/Glu−. The server requires as input the DNA-binding protein structure in PDB format and outputs a downloadable text file of the predicted DNA-binding residues, a 3D visualization of the predicted residues highlighted in the given protein structure, and a downloadable PyMol script for visualization of the results. Calibration on 83 and 55 non-redundant DNA-bound and DNA-free protein structures yielded a DNA-binding residue prediction accuracy/precision of 90/47% and 88/42%, respectively. Since DR_bind does not require any training using protein–DNA complex structures, it may predict DNA-binding residues in novel structures of DNA-binding proteins resulting from structural genomics projects with no conservation data. The DR_bind server is freely available with no login requirement at http://dnasite.limlab.ibms.sinica.edu.tw. PMID:22661576

  10. Definition of the applicability domains of knowledge-based predictive toxicology expert systems by using a structural fragment-based approach.

    PubMed

    Ellison, Claire M; Enoch, Steven J; Cronin, Mark Td; Madden, Judith C; Judson, Philip

    2009-11-01

    The applicability domain of a (quantitative) structure-activity relationship ([Q]SAR) must be defined, if a model is to be used successfully for toxicity prediction, particularly for regulatory purposes. Previous efforts to set guidelines on the definition of applicability domains have often been biased toward quantitative, rather than qualitative, models. As a result, novel techniques are still required to define the applicability domains of structural alert models and knowledge-based systems. By using Derek for Windows as an example, this study defined the domain for the skin sensitisation structural alert rule-base. This was achieved by fragmenting the molecules within a training set of compounds, then searching the fragments for those created from a test compound. This novel method was able to highlight test chemicals which differed from those in the training set. The information was then used to designate chemicals as being either within or outside the domain of applicability for the structural alert on which that training set was based. PMID:20017582
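
    The fragment-matching idea described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: real fragmentation operates on chemical structures, whereas this stand-in simply uses overlapping character n-grams of SMILES strings, and the rule that every fragment of a test compound must occur in the training set is a hypothetical decision criterion, not the one used with Derek for Windows.

```python
def fragments(smiles, size=3):
    """Toy fragmentation: overlapping character n-grams of a SMILES string.

    This is a stand-in for chemically meaningful fragmentation.
    """
    if len(smiles) < size:
        return {smiles}
    return {smiles[i:i + size] for i in range(len(smiles) - size + 1)}

def in_domain(test_smiles, training_fragments, size=3):
    """Hypothetical rule: in domain only if every test fragment was seen in training."""
    return fragments(test_smiles, size) <= training_fragments

# Made-up training set for one structural alert.
training_set = ["CCO", "CCCO", "CCN", "CCCN"]
training_fragments = set().union(*(fragments(s) for s in training_set))

inside = in_domain("CCCCO", training_fragments)   # only fragments seen in training
outside = in_domain("CCCl", training_fragments)   # contains the unseen fragment "CCl"
```

    A compound flagged as outside the domain in this way is one whose prediction from the structural alert would deserve less confidence, which is the use the abstract describes.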

  11. Interface Structure Prediction from First-Principles

    SciTech Connect

    Zhao, Xin; Shu, Qiang; Nguyen, Manh Cuong; Wang, Yangang; Ji, Min; Xiang, Hongjun; Ho, Kai-Ming; Gong, Xingao; Wang, Cai-Zhuang

    2014-05-08

    Information about the atomic structures at solid–solid interfaces is crucial for understanding and predicting the performance of materials. Due to the complexity of the interfaces, it is very challenging to resolve their atomic structures using either experimental techniques or computer simulations. In this paper, we present an efficient first-principles computational method for interface structure prediction based on an adaptive genetic algorithm. This approach significantly reduces the computational cost, while retaining the accuracy of first-principles prediction. The method is applied to the investigation of both stoichiometric and nonstoichiometric SrTiO3 Σ3(112)[1¯10] grain boundaries with unit cell containing up to 200 atoms. Several novel low-energy structures are discovered, which provide fresh insights into the structure and stability of the grain boundaries.

  12. Geometric prediction structure for multiview video coding

    NASA Astrophysics Data System (ADS)

    Lee, Seok; Wey, Ho-Cheon; Park, Du-Sik

    2010-02-01

    A critical issue for the successful delivery of 3D video is how to compress the huge amount of multi-view video data efficiently. In this paper, we describe a geometric prediction structure for multi-view video coding. By exploiting the geometric relations between camera poses, we form prediction pairs that maximize the spatial correlation between views. To analyze the relationship between camera poses, we define a mathematical view center and view distance in 3D space. We calculate a virtual center pose from the mean rotation matrix and mean translation vector. We propose an algorithm for establishing the geometric prediction structure based on the view center and view distance. Using this prediction structure, inter-view prediction is performed on the camera pair with maximum spatial correlation. Our prediction structure also takes into account scalability in coding and transmitting the multi-view videos. Experiments were carried out using the JMVC (Joint Multiview Video Coding) software on MPEG-FTV test sequences. The overall performance of the proposed prediction structure is measured in PSNR and a subjective image quality measure such as PSPNR.
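
    As a rough illustration of the view center and view distance, the sketch below covers only the translation part of the camera poses: it averages the translation vectors and ranks cameras by Euclidean distance to that mean. The rotation averaging mentioned in the abstract is deliberately omitted, and the function names, the use of plain Euclidean distance, and the camera positions are assumptions rather than the authors' definitions.

```python
import math

def mean_translation(translations):
    """Component-wise mean of 3D camera translation vectors."""
    n = len(translations)
    return tuple(sum(t[i] for t in translations) / n for i in range(3))

def view_distance(translation, center):
    """Assumed view distance: Euclidean distance from a camera to the view center."""
    return math.dist(translation, center)

# Made-up camera positions along one axis.
cameras = {
    "cam0": (0.0, 0.0, 0.0),
    "cam1": (1.0, 0.0, 0.0),
    "cam2": (2.0, 0.0, 0.0),
    "cam3": (5.0, 0.0, 0.0),
}
center = mean_translation(list(cameras.values()))
# Cameras ordered from closest to farthest from the view center.
ordered = sorted(cameras, key=lambda name: view_distance(cameras[name], center))
```

    One plausible use of such an ordering is to pick the view nearest the center as the base view and predict the remaining views from their nearest already-coded neighbours, but the abstract does not spell out the pairing rule.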

  13. Efficient HPLC method development using structure-based database search, physico-chemical prediction and chromatographic simulation.

    PubMed

    Wang, Lin; Zheng, Jinjian; Gong, Xiaoyi; Hartman, Robert; Antonucci, Vincent

    2015-02-01

    Development of a robust HPLC method for pharmaceutical analysis can be very challenging and time-consuming. In our laboratory, we have developed a new workflow leveraging ACD/Labs software tools to improve the performance of HPLC method development. First, we established ACD-based analytical method databases that can be searched by chemical structure similarity. By taking advantage of the existing knowledge of HPLC methods archived in the databases, one can find a good starting point for HPLC method development, or even reuse an existing method as is for a new project. Second, we used the software to predict compound physicochemical properties before running actual experiments to help select appropriate method conditions for targeted screening experiments. Finally, after selecting stationary and mobile phases, we used modeling software to simulate chromatographic separations for optimized temperature and gradient program. The optimized new method was then uploaded to internal databases as knowledge available to assist future method development efforts. Routine implementation of such standardized workflows has the potential to reduce the number of experiments required for method development and facilitate systematic and efficient development of faster, greener and more robust methods leading to greater productivity. In this article, we used Loratadine method development as an example to demonstrate efficient method development using this new workflow. PMID:25481084

  14. TRITIUM RESERVOIR STRUCTURAL PERFORMANCE PREDICTION

    SciTech Connect

    Lam, P.S.; Morgan, M.J.

    2005-11-10

    The burst test is used to assess the material performance of tritium reservoirs in the surveillance program in which reservoirs have been in service for extended periods of time. A materials system model and finite element procedure were developed under a Savannah River Site Plant-Directed Research and Development (PDRD) program to predict the structural response under a full range of loading and aged material conditions of the reservoir. The results show that the predicted burst pressure and volume ductility are in good agreement with the actual burst test results for the unexposed units. The material tensile properties used in the calculations were obtained from a curved tensile specimen harvested from a companion reservoir by Electric Discharge Machining (EDM). In the absence of exposed and aged material tensile data, literature data were used for demonstrating the methodology in terms of the helium-3 concentration in the metal and the depth of penetration in the reservoir sidewall. It can be shown that the volume ductility decreases significantly with the presence of tritium and its decay product, helium-3, in the metal, as was observed in the laboratory-controlled burst tests. The model and analytical procedure provide a predictive tool for reservoir structural integrity under aging conditions. It is recommended that benchmark tests and analysis for aged materials be performed. The methodology can be augmented to predict performance for reservoirs with flaws.

  15. Characteristics and Prediction of RNA Structure

    PubMed Central

    Zhu, Daming; Zhang, Caiming; Han, Huijian; Crandall, Keith A.

    2014-01-01

    RNA secondary structures with pseudoknots are often predicted by minimizing free energy, which is NP-hard. Most RNAs fold during transcription from DNA into RNA through a hierarchical pathway wherein secondary structures form prior to tertiary structures. Real RNA secondary structures often have local instead of global optimization because of kinetic reasons. The performance of RNA structure prediction may be improved by considering dynamic and hierarchical folding mechanisms. This study is a novel report on RNA folding that accords with the golden mean characteristic based on the statistical analysis of the real RNA secondary structures of all 480 sequences from RNA STRAND, which are validated by NMR or X-ray. The length ratios of domains in these sequences are approximately 0.382L, 0.5L, 0.618L, and L, where L is the sequence length. These points are just the important golden sections of sequence. With this characteristic, an algorithm is designed to predict RNA hierarchical structures and simulate RNA folding by dynamically folding RNA structures according to the above golden section points. The sensitivity and number of predicted pseudoknots of our algorithm are better than those of the Mfold, HotKnots, McQfold, ProbKnot, and Lhw-Zhu algorithms. Experimental results reflect the folding rules of RNA from a new angle that is close to natural folding. PMID:25110687
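
    The golden section points named in the abstract (0.382L, 0.5L, 0.618L and L) are simple to compute for a given sequence length. The sketch below does only that; rounding to the nearest integer position is an assumption, and the function is a hypothetical helper, not part of the authors' folding algorithm.

```python
def golden_section_points(length):
    """Sequence positions at 0.382L, 0.5L, 0.618L and L.

    Rounding to the nearest integer position is an assumption.
    """
    ratios = (0.382, 0.5, 0.618, 1.0)
    return [round(r * length) for r in ratios]

# For a made-up sequence of 100 nucleotides.
points = golden_section_points(100)
```

    In the approach the abstract describes, points like these mark where the sequence is folded step by step, so that shorter domains form before the full-length structure.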

  16. Dynameomics: Data-driven methods and models for utilizing large-scale protein structure repositories for improving fragment-based loop prediction

    PubMed Central

    Rysavy, Steven J; Beck, David AC; Daggett, Valerie

    2014-01-01

    Protein function is intimately linked to protein structure and dynamics, yet experimentally determined structures frequently omit regions within a protein due to indeterminate data, which is often due to protein dynamics. We propose that atomistic molecular dynamics simulations provide a diverse sampling of biologically relevant structures for these missing segments (and beyond) to improve structural modeling and structure prediction. Here we make use of the Dynameomics data warehouse, which contains simulations of representatives of essentially all known protein folds. We developed novel computational methods to efficiently identify, rank and retrieve small peptide structures, or fragments, from this database. We also created a novel data model to analyze and compare large repositories of structural data, such as contained within the Protein Data Bank and the Dynameomics data warehouse. Our evaluation compares these structural repositories for improving loop predictions and analyzes the utility of our methods and models. Using a standard set of loop structures, containing 510 loops, 30 for each loop length from 4 to 20 residues, we find that the inclusion of Dynameomics structures in fragment-based methods improves the quality of the loop predictions without being dependent on sequence homology. Depending on loop length, ∼25–75% of the best predictions came from the Dynameomics set, resulting in lower main chain root-mean-square deviations for all fragment lengths using the combined fragment library. We also provide specific cases where Dynameomics fragments provide better predictions for NMR loop structures than fragments from crystal structures. Online access to these fragment libraries is available at http://www.dynameomics.org/fragments. PMID:25142412

  17. Predicting protein dynamics from structural ensembles

    NASA Astrophysics Data System (ADS)

    Copperman, J.; Guenza, M. G.

    2015-12-01

    The biological properties of proteins are uniquely determined by their structure and dynamics. A protein in solution populates a structural ensemble of metastable configurations around the global fold. From overall rotation to local fluctuations, the dynamics of proteins can cover several orders of magnitude in time scales. We propose a simulation-free coarse-grained approach which utilizes knowledge of the important metastable folded states of the protein to predict the protein dynamics. This approach is based upon the Langevin Equation for Protein Dynamics (LE4PD), a Langevin formalism in the coordinates of the protein backbone. The linear modes of this Langevin formalism organize the fluctuations of the protein, so that more extended dynamical cooperativity relates to increasing energy barriers to mode diffusion. The accuracy of the LE4PD is verified by analyzing the predicted dynamics across a set of seven different proteins for which both relaxation data and NMR solution structures are available. Using experimental NMR conformers as the input structural ensembles, LE4PD predicts quantitatively accurate results, with correlation coefficient ρ = 0.93 to NMR backbone relaxation measurements for the seven proteins. The NMR solution structure derived ensemble and predicted dynamical relaxation is compared with molecular dynamics simulation-derived structural ensembles and LE4PD predictions and is consistent in the time scale of the simulations. The use of the experimental NMR conformers frees the approach from computationally demanding simulations.

  18. A Novel Peptide Binding Prediction Approach for HLA-DR Molecule Based on Sequence and Structural Information

    PubMed Central

    Li, Zhao; Zhao, Yilei; Pan, Gaofeng; Tang, Jijun; Guo, Fei

    2016-01-01

    MHC molecules play a key role in immunology, and their binding to peptides is an important prerequisite for inducing T cell immunity. MHC II molecules do not have conserved residues, so they appear as open grooves. This makes it more difficult to predict which peptides bind to MHC II molecules. In this paper, we propose a novel method for predicting peptides that bind to MHC II molecules. First, we calculate sequence similarity and structural similarity between different MHC II molecules. Then, we reorder pseudosequences by descending similarity and use a weight formula to calculate new pocket profiles. Finally, we use three scoring functions to predict binding cores and evaluate the prediction accuracy to judge the performance of each scoring function. In the experiment, we set a parameter α in the weight formula and, by changing its value, observe how the performance of each scoring function varies. We compare our method, using its best scoring function, with some popular prediction methods and find that it outperforms them in identifying binding cores of HLA-DR molecules. PMID:27340658

  19. Structure-Based Prediction of Drug Distribution Across the Headgroup and Core Strata of a Phospholipid Bilayer Using Surrogate Phases

    PubMed Central

    2015-01-01

    locations for 27 compounds. The resulting structure-based prediction system for intrabilayer distribution will facilitate more realistic modeling of passive transport and drug interactions with those integral membrane proteins, which have the binding sites located in the bilayer, such as some enzymes, influx and efflux transporters, and receptors. If only overall bilayer accumulation is of interest, the 1-octanol/W P values suffice to model the studied set. PMID:25179490

  20. To Hit or Not to Hit, That Is the Question – Genome-wide Structure-Based Druggability Predictions for Pseudomonas aeruginosa Proteins

    PubMed Central

    Sarkar, Aurijit; Brenk, Ruth

    2015-01-01

    Pseudomonas aeruginosa is a Gram-negative bacterium known to cause opportunistic infections in immune-compromised or immunosuppressed individuals that often prove fatal. New drugs to combat this organism are therefore sought after. To this end, we subjected the gene products of predicted perturbative genes to structure-based druggability predictions using DrugPred. Making this approach suitable for large-scale predictions required the introduction of new methods for calculation of descriptors, development of a workflow to identify suitable pockets in homologous proteins and establishment of criteria to obtain valid druggability predictions based on homologs. We were able to identify 29 perturbative proteins of P. aeruginosa that may contain druggable pockets, including some of them with no or no drug-like inhibitors deposited in ChEMBL. These proteins form promising novel targets for drug discovery against P. aeruginosa. PMID:26360059

  1. To Hit or Not to Hit, That Is the Question - Genome-wide Structure-Based Druggability Predictions for Pseudomonas aeruginosa Proteins.

    PubMed

    Sarkar, Aurijit; Brenk, Ruth

    2015-01-01

    Pseudomonas aeruginosa is a Gram-negative bacterium known to cause opportunistic infections in immune-compromised or immunosuppressed individuals that often prove fatal. New drugs to combat this organism are therefore sought after. To this end, we subjected the gene products of predicted perturbative genes to structure-based druggability predictions using DrugPred. Making this approach suitable for large-scale predictions required the introduction of new methods for calculation of descriptors, development of a workflow to identify suitable pockets in homologous proteins and establishment of criteria to obtain valid druggability predictions based on homologs. We were able to identify 29 perturbative proteins of P. aeruginosa that may contain druggable pockets, including some of them with no or no drug-like inhibitors deposited in ChEMBL. These proteins form promising novel targets for drug discovery against P. aeruginosa. PMID:26360059

  2. FINDSITE-metal: Integrating evolutionary information and machine learning for structure-based metal binding site prediction at the proteome level

    PubMed Central

    Brylinski, Michal; Skolnick, Jeffrey

    2010-01-01

    The rapid accumulation of gene sequences, many of which are hypothetical proteins with unknown function, has stimulated the development of accurate computational tools for protein function prediction with evolution/structure-based approaches showing considerable promise. In this paper, we present FINDSITE-metal, a new threading-based method designed specifically to detect metal binding sites in modeled protein structures. Comprehensive benchmarks using different quality protein structures show that weakly homologous protein models provide sufficient structural information for quite accurate annotation by FINDSITE-metal. Combining structure/evolutionary information with machine learning results in highly accurate metal binding annotations; for protein models constructed by TASSER, whose average Cα RMSD from the native structure is 8.9 Å, 59.5% (71.9%) of the best of top five predicted metal locations are within 4 Å (8 Å) from a bound metal in the crystal structure. For most of the targets, multiple metal binding sites are detected with the best predicted binding site at rank 1 and within the top 2 ranks in 65.6% and 83.1% of the cases, respectively. Furthermore, for iron, copper, zinc, calcium and magnesium ions, the binding metal can be predicted with high, typically 70-90%, accuracy. FINDSITE-metal also provides a set of confidence indexes that help assess the reliability of predictions. Finally, we describe the proteome-wide application of FINDSITE-metal that quantifies the metal binding complement of the human proteome. FINDSITE-metal is freely available to the academic community at http://cssb.biology.gatech.edu/findsite-metal/. PMID:21287609

  3. Protein structural domains: definition and prediction.

    PubMed

    Ezkurdia, Iakes; Tress, Michael L

    2011-11-01

    Recognition and prediction of structural domains in proteins is an important part of structure and function prediction. This unit lists the range of tools available for domain prediction, and describes sequence and structural analysis tools that complement domain prediction methods. Also detailed are the basic domain prediction steps, along with suggested strategies for different protein sequences and potential pitfalls in domain boundary prediction. The difficult problem of domain orientation prediction is also discussed. All the resources necessary for domain boundary prediction are accessible via publicly available Web servers and databases and do not require computational expertise. PMID:22045561

  4. Input-based structure-specific proficiency predicts the neural mechanism of adult L2 syntactic processing.

    PubMed

    Deng, Taiping; Zhou, Huixia; Bi, Hong-Yan; Chen, Baoguo

    2015-06-12

    This study used Event-Related Potentials (ERPs) to explore the role of input-based structure-specific proficiency in L2 syntactic processing, using English subject-verb agreement structures as the stimuli. A pre-test/training/post-test paradigm with experimental and control groups was employed, and Chinese speakers who learned English as a second language (L2) participated in the experiment. At pre-test, no ERP component related to violations of the subject-verb agreement structures was observed in either group. During the training sessions, the experimental group learned the subject-verb agreement structures, while the control group learned other syntactic structures. After two continuous, intensive input training sessions, at post-test, a significant P600 component related to violations of the subject-verb agreement structures was elicited in the experimental group, but not in the control group. These findings suggest that input training improves structure-specific proficiency, which is reflected in the neural mechanism of L2 syntactic processing. PMID:25838243

  5. RNA secondary structure prediction using soft computing.

    PubMed

    Ray, Shubhra Sankar; Pal, Sankar K

    2013-01-01

    Prediction of RNA structure is invaluable in creating new drugs and understanding genetic diseases. Several deterministic algorithms and soft computing-based techniques have been developed for more than a decade to determine the structure from a known RNA sequence. Soft computing gained importance with the need to get approximate solutions for RNA sequences by considering the issues related with kinetic effects, cotranscriptional folding, and estimation of certain energy parameters. A brief description of some of the soft computing-based techniques, developed for RNA secondary structure prediction, is presented along with their relevance. The basic concepts of RNA and its different structural elements like helix, bulge, hairpin loop, internal loop, and multiloop are described. These are followed by different methodologies, employing genetic algorithms, artificial neural networks, and fuzzy logic. The role of various metaheuristics, like simulated annealing, particle swarm optimization, ant colony optimization, and tabu search is also discussed. A relative comparison among different techniques, in predicting 12 known RNA secondary structures, is presented, as an example. Future challenging issues are then mentioned. PMID:23702539

  6. Predicting P-glycoprotein-mediated drug transport based on support vector machine and three-dimensional crystal structure of P-glycoprotein.

    PubMed

    Bikadi, Zsolt; Hazai, Istvan; Malik, David; Jemnitz, Katalin; Veres, Zsuzsa; Hari, Peter; Ni, Zhanglin; Loo, Tip W; Clarke, David M; Hazai, Eszter; Mao, Qingcheng

    2011-01-01

    Human P-glycoprotein (P-gp) is an ATP-binding cassette multidrug transporter that confers resistance to a wide range of chemotherapeutic agents in cancer cells by active efflux of the drugs from cells. P-gp also plays a key role in limiting oral absorption and brain penetration and in facilitating biliary and renal elimination of structurally diverse drugs. Thus, identification of drugs or new molecular entities to be P-gp substrates is of vital importance for predicting the pharmacokinetics, efficacy, safety, or tissue levels of drugs or drug candidates. At present, publicly available, reliable in silico models predicting P-gp substrates are scarce. In this study, a support vector machine (SVM) method was developed to predict P-gp substrates and P-gp-substrate interactions, based on a training data set of 197 known P-gp substrates and non-substrates collected from the literature. We showed that the SVM method had a prediction accuracy of approximately 80% on an independent external validation data set of 32 compounds. A homology model of human P-gp based on the X-ray structure of mouse P-gp as a template has been constructed. We showed that molecular docking to the P-gp structures successfully predicted the geometry of P-gp-ligand complexes. Our SVM prediction and the molecular docking methods have been integrated into a free web server (http://pgp.althotas.com), which allows the users to predict whether a given compound is a P-gp substrate and how it binds to and interacts with P-gp. Utilization of such a web server may prove valuable for both rational drug design and screening. PMID:21991360

  7. Predicting P-Glycoprotein-Mediated Drug Transport Based On Support Vector Machine and Three-Dimensional Crystal Structure of P-glycoprotein

    PubMed Central

    Bikadi, Zsolt; Hazai, Istvan; Malik, David; Jemnitz, Katalin; Veres, Zsuzsa; Hari, Peter; Ni, Zhanglin; Loo, Tip W.; Clarke, David M.; Hazai, Eszter; Mao, Qingcheng

    2011-01-01

    Human P-glycoprotein (P-gp) is an ATP-binding cassette multidrug transporter that confers resistance to a wide range of chemotherapeutic agents in cancer cells by active efflux of the drugs from cells. P-gp also plays a key role in limiting oral absorption and brain penetration and in facilitating biliary and renal elimination of structurally diverse drugs. Thus, identification of drugs or new molecular entities to be P-gp substrates is of vital importance for predicting the pharmacokinetics, efficacy, safety, or tissue levels of drugs or drug candidates. At present, publicly available, reliable in silico models predicting P-gp substrates are scarce. In this study, a support vector machine (SVM) method was developed to predict P-gp substrates and P-gp-substrate interactions, based on a training data set of 197 known P-gp substrates and non-substrates collected from the literature. We showed that the SVM method had a prediction accuracy of approximately 80% on an independent external validation data set of 32 compounds. A homology model of human P-gp based on the X-ray structure of mouse P-gp as a template has been constructed. We showed that molecular docking to the P-gp structures successfully predicted the geometry of P-gp-ligand complexes. Our SVM prediction and the molecular docking methods have been integrated into a free web server (http://pgp.althotas.com), which allows the users to predict whether a given compound is a P-gp substrate and how it binds to and interacts with P-gp. Utilization of such a web server may prove valuable for both rational drug design and screening. PMID:21991360

  8. Ko Displacement Theory for Structural Shape Predictions

    NASA Technical Reports Server (NTRS)

    Ko, William L.

    2010-01-01

    The development of the Ko displacement theory for predictions of structure deformed shapes was motivated in 2003 by the Helios flying wing, which had a 247-ft (75-m) wing span with wingtip deflections reaching 40 ft (12 m). The Helios flying wing failed in midair in June 2003, creating the need to develop new technology to predict in-flight deformed shapes of unmanned aircraft wings for visual display to the ground-based pilots. Strain sensors of any type installed on a structure can sense only the surface strains and are incapable of sensing the overall deformed shape of the structure. After the invention of the Ko displacement theory, predictions of structure deformed shapes could be achieved by feeding the measured surface strains into the Ko displacement transfer functions to calculate out-of-plane deflections and cross-sectional rotations at multiple locations, mapping out the overall deformed shapes of the structures. The new Ko displacement theory combined with a strain-sensing system thus created a revolutionary new structure-shape-sensing technology.

  9. Reliable resonance assignments of selected residues of proteins with known structure based on empirical NMR chemical shift prediction

    NASA Astrophysics Data System (ADS)

    Li, Da-Wei; Meng, Dan; Brüschweiler, Rafael

    2015-05-01

    A robust NMR resonance assignment method is introduced for proteins whose 3D structure has previously been determined by X-ray crystallography. The goal of the method is to obtain a subset of correct assignments from a parsimonious set of 3D NMR experiments of 15N, 13C labeled proteins. Chemical shifts of sequential residue pairs are predicted from static protein structures using PPM_One and are then compared with the corresponding experimental shifts. Globally optimized weighted matching identifies the assignments that are robust with respect to small changes in NMR cross-peak positions. The method, termed PASSPORT, is demonstrated for 4 proteins with 100-250 amino acids using 3D NHCA and 3D CBCA(CO)NH experiments as input, producing correct assignments with high reliability for 22% of the residues. The method, which works best for Gly, Ala, Ser, and Thr residues, provides assignments that serve as anchor points for additional assignments by both manual and semi-automated methods, or they can be used directly for further studies, e.g. on ligand binding, protein dynamics, or post-translational modification, such as phosphorylation.

  10. Predicting structured metadata from unstructured metadata

    PubMed Central

    Posch, Lisa; Panahiazar, Maryam; Dumontier, Michel; Gevaert, Olivier

    2016-01-01

    Enormous amounts of biomedical data have been and are being produced by investigators all over the world. However, one crucial and limiting factor in data reuse is accurate, structured and complete description of the data or data about the data—defined as metadata. We propose a framework to predict structured metadata terms from unstructured metadata for improving quality and quantity of metadata, using the Gene Expression Omnibus (GEO) microarray database. Our framework consists of classifiers trained using term frequency-inverse document frequency (TF-IDF) features and a second approach based on topics modeled using a Latent Dirichlet Allocation model (LDA) to reduce the dimensionality of the unstructured data. Our results on the GEO database show that structured metadata terms are predicted most accurately using the TF-IDF approach, followed by LDA, with both outperforming the majority vote baseline. While some accuracy is lost by the dimensionality reduction of LDA, the difference is small for elements with few possible values, and there is a large improvement over the majority classifier baseline. Overall this is a promising approach for metadata prediction that is likely to be applicable to other datasets and has implications for researchers interested in biomedical metadata curation and metadata prediction. Database URL: http://www.yeastgenome.org/
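
    The TF-IDF features mentioned in the abstract above can be illustrated with a minimal sketch. It assumes whitespace tokenization and the plain weighting tf * log(N / df); the abstract does not state which TF-IDF variant or classifier the authors used, and the sample descriptions and the function name tfidf_vectors are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df).

    Assumption: whitespace tokenization and unsmoothed idf; the paper's
    exact weighting scheme is not given in the abstract.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical free-text sample descriptions (not taken from GEO).
docs = [
    "breast tumor tissue female",
    "liver tissue male",
    "breast cell line female",
]
vectors = tfidf_vectors(docs)
```

    Terms that occur in fewer descriptions (here "tumor") receive a larger weight than terms shared across descriptions (here "tissue"), which is what makes the vectors usable as classifier input.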

  11. [Quantitative structure activity relationship models based on heuristic method and gene expression programming for the prediction of the pK(a) values of sulfa drugs].

    PubMed

    Li, Yu-qin; Si, Hong-zong; Xiao, Yu-liang; Liu, Cai-hong; Xia, Cheng-cai; Li, Ke; Qi, Yong-xiu

    2009-05-01

    Quantitative structure-property relationships (QSPR) were developed to predict the pK(a) values of sulfa drugs via the heuristic method (HM) and gene expression programming (GEP). The descriptors of 31 sulfa drugs were calculated with the software CODESSA, which can calculate constitutional, topological, geometrical, electrostatic, and quantum chemical descriptors. HM was also used for the preselection of 4 appropriate molecular descriptors. Linear and nonlinear QSPR models were developed based on HM and GEP, respectively, and the two prediction models gave good correlation coefficients (R) of 0.90 and 0.95. The two QSPR models are useful for predicting pK(a) during the discovery of new drugs and for providing theoretical information for studying the new drugs. PMID:19618723

  12. Prediction of binding affinity and efficacy of thyroid hormone receptor ligands using QSAR and structure based modeling methods

    PubMed Central

    Politi, Regina; Rusyn, Ivan; Tropsha, Alexander

    2016-01-01

    The thyroid hormone receptor (THR) is an important member of the nuclear receptor family that can be activated by endocrine disrupting chemicals (EDC). Quantitative Structure-Activity Relationship (QSAR) models have been developed to facilitate the prioritization of THR-mediated EDC for the experimental validation. The largest database of binding affinities available at the time of the study for ligand binding domain (LBD) of THRβ was assembled to generate both continuous and classification QSAR models with an external accuracy of R2=0.55 and CCR=0.76, respectively. In addition, for the first time a QSAR model was developed to predict binding affinities of antagonists inhibiting the interaction of coactivators with the AF-2 domain of THRβ (R2=0.70). Furthermore, molecular docking studies were performed for a set of THRβ ligands (57 agonists and 15 antagonists of LBD, 210 antagonists of the AF-2 domain, supplemented by putative decoys/non-binders) using several THRβ structures retrieved from the Protein Data Bank. We found that two agonist-bound THRβ conformations could effectively discriminate their corresponding ligands from presumed non-binders. Moreover, one of the agonist conformations could discriminate agonists from antagonists. Finally, we have conducted virtual screening of a chemical library compiled by the EPA as part of the Tox21 program to identify potential THRβ-mediated EDCs using both QSAR models and docking. We concluded that the library is unlikely to have any EDC that would bind to the THRβ. Models developed in this study can be employed either to identify environmental chemicals interacting with the THR or, conversely, to eliminate the THR-mediated mechanism of action for chemicals of concern. PMID:25058446

  13. Mechanics based model for predicting structure-induced rolling resistance (SRR) of the tire-pavement system

    NASA Astrophysics Data System (ADS)

    Shakiba, Maryam; Ozer, Hasan; Ziyadi, Mojtaba; Al-Qadi, Imad L.

    2016-05-01

    The structure-induced rolling resistance of pavements, and its impact on vehicle fuel consumption, is investigated in this study. The structural response of pavement causes additional rolling resistance and fuel consumption of vehicles through deformation of pavement and various dissipation mechanisms associated with inelastic material properties and damping. Accurate and computationally efficient models are required to capture these mechanisms and obtain realistic estimates of changes in vehicle fuel consumption. Two mechanistic-based approaches are currently used to calculate vehicle fuel consumption as related to structural rolling resistance: dissipation-induced and deflection-induced methods. The deflection-induced approach is adopted in this study, and realistic representation of pavement-vehicle interactions (PVIs) is incorporated. In addition to considering viscoelastic behavior of asphalt concrete layers, the realistic representation of PVIs in this study includes non-uniform three-dimensional tire contact stresses and dynamic analysis in pavement simulations. The effects of analysis type, tire contact stresses, pavement viscoelastic properties, pavement damping coefficients, vehicle speed, and pavement temperature are then investigated.

  14. A Structured Approach to Sediment Transport Prediction

    NASA Astrophysics Data System (ADS)

    Wilcock, Peter

    2013-04-01

    There are two types of sediment transport problem. One, flow competence, concerns the conditions that initiate motion of grains on the bed surface. The other, transport capacity, concerns the rate at which sediment is transported and involves sediment found locally on the bed as well as sediment delivered from upstream. The two problems can be linked by the critical stress for incipient motion. A model for critical stress is used directly to predict flow competence. The Ashida/Parker similarity hypothesis provides a useful approximation of transport rates and incorporates local sediment effects entirely via the reference stress, a surrogate for critical stress. Although critical stress is key to both predictions, its application is quite different. The difficult problem of wash load - sizes found in transport in quantities much larger than would be predicted by their presence in the bed - makes the distinction clear and challenges any attempt to predict transport rate from a competence-like approach based on hydraulics and bed material alone. The Shields Diagram and a hiding function provide models for critical stress for uni-size and mixed-size sediment. In addition to grain size - both absolute and relative - other factors alter the critical stress of bed material. These include the proportion of fine-grained material, the aging or freshening of bed material via biologically mediated processes, and the development of bed structure at flows close to the critical stress. Although these factors directly influence the prediction of competent flows, their effect on transport rate is less clear. As flow increases, to what extent does bed strengthening through structuring and other mechanisms persist in dampening transport rate? The answer involves the condition of partial transport in which some grains in a size fraction are active and others remain inactive. Tracing of grains in the flume and field provide guidance on the domain of partial transport and thus on the

  15. Practical lessons from protein structure prediction

    PubMed Central

    Ginalski, Krzysztof; Grishin, Nick V.; Godzik, Adam; Rychlewski, Leszek

    2005-01-01

    Despite recent efforts to develop automated protein structure determination protocols, structural genomics projects are slow in generating fold assignments for complete proteomes, and spatial structures remain unknown for many protein families. Alternative cheap and fast methods to assign folds using prediction algorithms continue to provide valuable structural information for many proteins. The development of high-quality prediction methods has been boosted in the last years by objective community-wide assessment experiments. This paper gives an overview of the currently available practical approaches to protein structure prediction capable of generating accurate fold assignment. Recent advances in assessment of the prediction quality are also discussed. PMID:15805122

  16. RNAComposer and RNA 3D structure prediction for nanotechnology.

    PubMed

    Biesiada, Marcin; Pachulska-Wieczorek, Katarzyna; Adamiak, Ryszard W; Purzycka, Katarzyna J

    2016-07-01

    RNAs adopt specific, stable tertiary architectures to perform their activities. Knowledge of RNA tertiary structure is fundamental to understand RNA functions beginning with transcription and ending with turnover. Contrary to advanced RNA secondary structure prediction algorithms, which allow good accuracy when experimental data are integrated into the prediction, tertiary structure prediction of large RNAs still remains a significant challenge. However, the field of RNA tertiary structure prediction is rapidly developing and new computational methods based on different strategies are emerging. RNAComposer is a user-friendly and freely available server for 3D structure prediction of RNA up to 500 nucleotide residues. RNAComposer employs fully automated fragment assembly based on RNA secondary structure specified by the user. Importantly, this method allows incorporation of distance restraints derived from the experimental data to strengthen the 3D predictions. The potential and limitations of RNAComposer are discussed and an application to RNA design for nanotechnology is presented. PMID:27016145

  17. Prediction of binding affinity and efficacy of thyroid hormone receptor ligands using QSAR and structure-based modeling methods

    SciTech Connect

    Politi, Regina; Rusyn, Ivan; Tropsha, Alexander

    2014-10-01

    The thyroid hormone receptor (THR) is an important member of the nuclear receptor family that can be activated by endocrine disrupting chemicals (EDC). Quantitative Structure–Activity Relationship (QSAR) models have been developed to facilitate the prioritization of THR-mediated EDC for the experimental validation. The largest database of binding affinities available at the time of the study for ligand binding domain (LBD) of THRβ was assembled to generate both continuous and classification QSAR models with an external accuracy of R² = 0.55 and CCR = 0.76, respectively. In addition, for the first time a QSAR model was developed to predict binding affinities of antagonists inhibiting the interaction of coactivators with the AF-2 domain of THRβ (R² = 0.70). Furthermore, molecular docking studies were performed for a set of THRβ ligands (57 agonists and 15 antagonists of LBD, 210 antagonists of the AF-2 domain, supplemented by putative decoys/non-binders) using several THRβ structures retrieved from the Protein Data Bank. We found that two agonist-bound THRβ conformations could effectively discriminate their corresponding ligands from presumed non-binders. Moreover, one of the agonist conformations could discriminate agonists from antagonists. Finally, we have conducted virtual screening of a chemical library compiled by the EPA as part of the Tox21 program to identify potential THRβ-mediated EDCs using both QSAR models and docking. We concluded that the library is unlikely to have any EDC that would bind to the THRβ. Models developed in this study can be employed either to identify environmental chemicals interacting with the THR or, conversely, to eliminate the THR-mediated mechanism of action for chemicals of concern. - Highlights: • This is the largest curated dataset for ligand binding domain (LBD) of the THRβ. • We report the first QSAR model for antagonists of AF-2 domain of THRβ. • A combination of QSAR and docking enables

  18. Knowledge-based fragment binding prediction.

    PubMed

    Tang, Grace W; Altman, Russ B

    2014-04-01

    Target-based drug discovery must assess many drug-like compounds for potential activity. Focusing on low-molecular-weight compounds (fragments) can dramatically reduce the chemical search space. However, approaches for determining protein-fragment interactions have limitations. Experimental assays are time-consuming, expensive, and not always applicable. At the same time, computational approaches using physics-based methods have limited accuracy. With increasing high-resolution structural data for protein-ligand complexes, there is now an opportunity for data-driven approaches to fragment binding prediction. We present FragFEATURE, a machine learning approach to predict small molecule fragments preferred by a target protein structure. We first create a knowledge base of protein structural environments annotated with the small molecule substructures they bind. These substructures have low-molecular weight and serve as a proxy for fragments. FragFEATURE then compares the structural environments within a target protein to those in the knowledge base to retrieve statistically preferred fragments. It merges information across diverse ligands with shared substructures to generate predictions. Our results demonstrate FragFEATURE's ability to rediscover fragments corresponding to the ligand bound with 74% precision and 82% recall on average. For many protein targets, it identifies high scoring fragments that are substructures of known inhibitors. FragFEATURE thus predicts fragments that can serve as inputs to fragment-based drug design or serve as refinement criteria for creating target-specific compound libraries for experimental or computational screening. PMID:24762971

  19. DNA Barcoding Identification of Kadsurae Caulis and Spatholobi Caulis Based on Internal Transcribed Spacer 2 Region and Secondary Structure Prediction

    PubMed Central

    Yu, Xiaoxue; Xie, Zhiyong; Wu, Junwei; Tao, Junfei; Xu, Xinjun

    2016-01-01

    Background: Kadsurae Caulis and Spatholobi Caulis have very similar Chinese names. Their commodities are hard to distinguish because their stems look very alike after being dried and processed. These two herbal drugs are often mixed up in clinical use. Objective: Authenticity assurance is crucial for quality control of herbal drugs. Therefore, it is essential to establish a method for identifying the two herbs. Materials and Methods: In this paper, we used DNA barcoding technology, based on the internal transcribed spacer 2 (ITS2) regions, to differentiate Kadsurae Caulis and Spatholobi Caulis. Results: The ITS2 regions of these two herbs were very different. They were successfully differentiated using the DNA barcoding technique. Conclusions: DNA barcoding is a promising and reliable tool for the identification of medicinal plants. It can be a powerful complementary method for traditional authentication. Summary: The internal transcribed spacer 2 (ITS2) regions of Kadsurae Caulis and Spatholobi Caulis varied considerably, with 139 variable sites in total. Sample 1 was not Kadsurae Caulis as labeled; based on its ITS2 region, it was in fact Spatholobi Caulis. The secondary structure can also separate Kadsurae Caulis and Spatholobi Caulis effectively. DNA barcoding provided accurate and strong proof for identifying these two herbs. Abbreviations used: CTAB: hexadecyltrimethylammonium bromide; DNA: deoxyribonucleic acid; ITS2: internal transcribed spacer 2; PCR: polymerase chain reaction. PMID:27279702

  20. Topology-symmetry law of structure of natural titanosilicate micas and related heterophyllosilicates based on the extended OD theory: Structure prediction

    NASA Astrophysics Data System (ADS)

    Belokoneva, E. L.; Topnikova, A. P.; Aksenov, S. M.

    2015-01-01

    A topology-symmetry analysis of the structures in the family of titanosilicate micas and related heterophyllosilicates based on the extended OD theory reveals their kinship with the family of rhodezite, delhayelite, and other minerals that had been analyzed earlier by distinguishing sheets common for all the structures. Like in the family studied earlier, the structural variety of a more complex titanosilicate family is determined by different local symmetries of sheets. Sheets consist of central O layers of edge-sharing octahedra and H layers formed by tetrahedra connected into diortho groups and Ti(Nb,Fe) semioctahedra (octahedra). Three patterns of connection of O and H layers correspond to sheet symmetry P2/m, P21/m, and . Various symmetry modes of sheet connection in the structures are analyzed. Hypothetical structures, including structures with a higher degree of disorder, which can be found in nature or obtained by crystal synthesis, are deduced. Factors responsible for structural variety, including the existence of two main sheet varieties (with P2/m and P21/m symmetry) are considered a consequence of the difference in the chemism of the mineral formation medium.

  1. Inference of expanded Lrp-like feast/famine transcription factor targets in a non-model organism using protein structure-based prediction.

    PubMed

    Ashworth, Justin; Plaisier, Christopher L; Lo, Fang Yin; Reiss, David J; Baliga, Nitin S

    2014-01-01

    Widespread microbial genome sequencing presents an opportunity to understand the gene regulatory networks of non-model organisms. This requires knowledge of the binding sites for transcription factors whose DNA-binding properties are unknown or difficult to infer. We adapted a protein structure-based method to predict the specificities and putative regulons of homologous transcription factors across diverse species. As a proof-of-concept we predicted the specificities and transcriptional target genes of divergent archaeal feast/famine regulatory proteins, several of which are encoded in the genome of Halobacterium salinarum. This was validated by comparison to experimentally determined specificities for transcription factors in distantly related extremophiles, chromatin immunoprecipitation experiments, and cis-regulatory sequence conservation across eighteen related species of halobacteria. Through this analysis we were able to infer that Halobacterium salinarum employs a divergent local trans-regulatory strategy to regulate genes (carA and carB) involved in arginine and pyrimidine metabolism, whereas Escherichia coli employs an operon. The prediction of gene regulatory binding sites using structure-based methods is useful for the inference of gene regulatory relationships in new species that are otherwise difficult to infer. PMID:25255272

  2. RBO Aleph: leveraging novel information sources for protein structure prediction

    PubMed Central

    Mabrouk, Mahmoud; Putz, Ines; Werner, Tim; Schneider, Michael; Neeb, Moritz; Bartels, Philipp; Brock, Oliver

    2015-01-01

    RBO Aleph is a novel protein structure prediction web server for template-based modeling, protein contact prediction and ab initio structure prediction. The server has a strong emphasis on modeling difficult protein targets for which templates cannot be detected. RBO Aleph's unique features are (i) the use of combined evolutionary and physicochemical information to perform residue–residue contact prediction and (ii) leveraging this contact information effectively in conformational space search. RBO Aleph emerged as one of the leading approaches to ab initio protein structure prediction and contact prediction during the most recent Critical Assessment of Protein Structure Prediction experiment (CASP11, 2014). In addition to RBO Aleph's main focus on ab initio modeling, the server also provides state-of-the-art template-based modeling services. Based on template availability, RBO Aleph switches automatically between template-based modeling and ab initio prediction based on the target protein sequence, facilitating use especially for non-expert users. The RBO Aleph web server offers a range of tools for visualization and data analysis, such as the visualization of predicted models, predicted contacts and the estimated prediction error along the model's backbone. The server is accessible at http://compbio.robotics.tu-berlin.de/rbo_aleph/. PMID:25897112

  3. RBO Aleph: leveraging novel information sources for protein structure prediction.

    PubMed

    Mabrouk, Mahmoud; Putz, Ines; Werner, Tim; Schneider, Michael; Neeb, Moritz; Bartels, Philipp; Brock, Oliver

    2015-07-01

    RBO Aleph is a novel protein structure prediction web server for template-based modeling, protein contact prediction and ab initio structure prediction. The server has a strong emphasis on modeling difficult protein targets for which templates cannot be detected. RBO Aleph's unique features are (i) the use of combined evolutionary and physicochemical information to perform residue-residue contact prediction and (ii) leveraging this contact information effectively in conformational space search. RBO Aleph emerged as one of the leading approaches to ab initio protein structure prediction and contact prediction during the most recent Critical Assessment of Protein Structure Prediction experiment (CASP11, 2014). In addition to RBO Aleph's main focus on ab initio modeling, the server also provides state-of-the-art template-based modeling services. Based on template availability, RBO Aleph switches automatically between template-based modeling and ab initio prediction based on the target protein sequence, facilitating use especially for non-expert users. The RBO Aleph web server offers a range of tools for visualization and data analysis, such as the visualization of predicted models, predicted contacts and the estimated prediction error along the model's backbone. The server is accessible at http://compbio.robotics.tu-berlin.de/rbo_aleph/. PMID:25897112

  4. Vascular endothelial growth factor receptor-2 (VEGFR-2) inhibitors: development and validation of predictive 3-D QSAR models through extensive ligand- and structure-based approaches.

    PubMed

    Ragno, Rino; Ballante, Flavio; Pirolli, Adele; Wickersham, Richard B; Patsilinakos, Alexandros; Hesse, Stéphanie; Perspicace, Enrico; Kirsch, Gilbert

    2015-08-01

    Vascular endothelial growth factor receptor-2 (VEGFR-2) is a key element in angiogenesis, the process by which new blood vessels are formed, and is thus an important pharmaceutical target. Here, 3-D quantitative structure-activity relationship (3-D QSAR) methods were used to build a quantitative screening and pharmacophore model of the VEGFR-2 receptor for the design of inhibitors with improved activities. Most of the available experimental data were used as a training set to derive eight mono-probe models and one multi-probe quantitative model, all optimized and fully cross-validated. Notable is the use of 262 molecules, aligned following both structure-based and ligand-based protocols, as an external test set confirming the 3-D QSAR models' predictive capability and their usefulness in designing new VEGFR-2 inhibitors. According to a survey of the literature, this is the first wide-ranging computational medicinal chemistry application on VEGFR-2 inhibitors. PMID:26194852

  5. Predicting missing links via structural similarity

    NASA Astrophysics Data System (ADS)

    Lyu, Guo-Dong; Fan, Chang-Jun; Yu, Lian-Fei; Xiu, Bao-Xin; Zhang, Wei-Ming

    2015-04-01

    Predicting missing links in networks plays a significant role in modern science. On the basis of structural similarity, our paper proposes a new node-similarity-based measure called biased resource allocation (BRA), which is motivated by the resource allocation (RA) measure. Comparisons between BRA and nine well-known node-similarity-based measures on five real networks indicate that BRA performs no worse than RA, which was the best node-similarity-based index in previous research. Afterwards, based on the LocalPath (LP) and Katz measures, we propose another two improved measures, named Im-LocalPath (Im-LP) and Im-Katz respectively. Numerical results show that the prediction accuracy of both the Im-LP and Im-Katz measures improves compared with the original LP and Katz measures. Finally, a new path-similarity-based measure and its improved version, called the LYU and Im-LYU measures, are proposed; in particular, the Im-LYU measure is shown to outperform the other measures mentioned.
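
    As an illustration of the node-similarity approach, the following minimal sketch computes the standard resource allocation (RA) index that the abstract above takes as its baseline: the score of a node pair is the sum of 1/degree over their common neighbours. The BRA, Im-LP, Im-Katz and LYU measures are not defined in the abstract and are not reproduced here; the toy graph and the helper names are hypothetical.

```python
from itertools import combinations


def resource_allocation(adj, x, y):
    """Standard RA index: sum of 1/degree(z) over common neighbours z of x and y."""
    common = adj[x] & adj[y]
    return sum(1.0 / len(adj[z]) for z in common)


def rank_missing_links(adj):
    """Rank all currently unconnected node pairs by descending RA score."""
    candidates = [
        (x, y) for x, y in combinations(sorted(adj), 2) if y not in adj[x]
    ]
    return sorted(
        candidates,
        key=lambda pair: resource_allocation(adj, *pair),
        reverse=True,
    )


# Hypothetical undirected toy network given as adjacency sets.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
ranked = rank_missing_links(adj)
```

    In this toy network the pair (a, d) shares two neighbours of degree 3 and therefore ranks first, while (a, e) shares none and ranks last.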

  6. RNA-SSPT: RNA Secondary Structure Prediction Tools.

    PubMed

    Ahmad, Freed; Mahboob, Shahid; Gulzar, Tahsin; Din, Salah U; Hanif, Tanzeela; Ahmad, Hifza; Afzal, Muhammad

    2013-01-01

    The prediction of RNA structure is useful for understanding evolution in both in silico and in vitro studies. Physical methods such as NMR studies for determining RNA secondary structure are expensive and difficult. Computational RNA secondary structure prediction is easier. Comparative sequence analysis provides the best solution, but secondary structure prediction of a single RNA sequence is challenging. RNA-SSPT is a tool that computationally predicts the secondary structure of a single RNA sequence. Most RNA secondary structure prediction tools do not allow pseudoknots in the structure or are unable to locate them. The Nussinov dynamic programming algorithm has been implemented in RNA-SSPT. Current studies show that only the energetically most favorable secondary structure is required, and a modification of the algorithm is also available that produces base pairs to lower the total free energy of the secondary structure. For visualization of RNA secondary structure, NAVIEW in C language is used and modified in C# to meet the tool's requirements. RNA-SSPT is built in C# using Dot Net 2.0 in Microsoft Visual Studio 2005 Professional edition. The accuracy of RNA-SSPT is tested in terms of Sensitivity and Positive Predictive Value. It is a tool which serves both secondary structure prediction and secondary structure visualization purposes. PMID:24250115
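
    The Nussinov algorithm named in the abstract above can be sketched in its textbook form, which maximizes the number of nested base pairs by dynamic programming. This is not the RNA-SSPT implementation (which is written in C#); the minimum hairpin loop length of 3 and the inclusion of G-U wobble pairs are assumptions made for this sketch.

```python
# Assumption: Watson-Crick pairs plus G-U wobble pairs are allowed.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_max_pairs(seq, min_loop=3):
    """Return the maximum number of nested base pairs in seq.

    dp[i][j] holds the best pair count for the subsequence seq[i..j].
    min_loop is the minimum number of unpaired bases enclosed by a pair
    (an assumed parameter, not taken from the abstract).
    """
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            # case 2: base j pairs with some k, splitting the interval
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


pairs_in_hairpin = nussinov_max_pairs("GGGAAAUCC")
```

    Because the recursion only combines nested and adjacent sub-intervals, it cannot represent pseudoknots, which is consistent with the limitation of most prediction tools noted in the abstract.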

  7. kPROT: a knowledge-based scale for the propensity of residue orientation in transmembrane segments. Application to membrane protein structure prediction.

    PubMed

    Pilpel, Y; Ben-Tal, N; Lancet, D

    1999-12-10

    Modeling of integral membrane proteins and the prediction of their functional sites requires the identification of transmembrane (TM) segments and the determination of their angular orientations. Hydrophobicity scales predict accurately the location of TM helices, but are less accurate in computing angular disposition. Estimating lipid-exposure propensities of the residues from statistics of solved membrane protein structures has the disadvantage of relying on relatively few proteins. As an alternative, we propose here a scale of knowledge-based Propensities for Residue Orientation in Transmembrane segments (kPROT), derived from the analysis of more than 5000 non-redundant protein sequences. We assume that residues that tend to be exposed to the membrane are more frequent in TM segments of single-span proteins, while residues that prefer to be buried in the transmembrane bundle interior are present mainly in multi-span TMs. The kPROT value for each residue is thus defined as the logarithm of the ratio of its proportions in single and multiple TM spans. The scale is refined further by defining it for three discrete sections of the TM segment; namely, extracellular, central, and intracellular. The capacity of the kPROT scale to predict angular helical orientation was compared to that of alternative methods in a benchmark test, using a diversity of multi-span alpha-helical transmembrane proteins with a solved 3D structure. kPROT yielded an average angular error of 41 degrees, significantly lower than that of alternative scales (62-68 degrees). The new scale thus provides a useful general tool for modeling and prediction of functional residues in membrane proteins. A WWW server (http://bioinfo.weizmann.ac.il/kPROT) is available for automatic helix orientation prediction with kPROT. PMID:10588897
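
    The definition above (logarithm of the ratio of a residue's proportions in single-span and multi-span TM segments) can be sketched as follows. The abstract does not state the logarithm base or how rare residues are handled, so the natural logarithm, the restriction to residues seen in both sets, the function names, and the toy segments are all assumptions for illustration.

```python
import math
from collections import Counter


def residue_proportions(segments):
    """Fraction of each residue type across a list of TM segment strings."""
    counts = Counter("".join(segments))
    total = sum(counts.values())
    return {aa: c / total for aa, c in counts.items()}


def kprot_like_scale(single_span, multi_span):
    """Log ratio of residue proportions: single-span over multi-span segments.

    Residues missing from either set are skipped (an assumption of this
    sketch); natural log is used because the base is not given in the source.
    """
    p_single = residue_proportions(single_span)
    p_multi = residue_proportions(multi_span)
    shared = set(p_single) & set(p_multi)
    return {aa: math.log(p_single[aa] / p_multi[aa]) for aa in shared}


# Hypothetical toy segments, far too small for real statistics.
scale = kprot_like_scale(single_span=["LLLA"], multi_span=["LAGG"])
```

    Under this sign convention, a positive value marks a residue that is relatively enriched in single-span segments, which the authors interpret as a tendency to face the membrane.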

  8. Prediction of estrogen receptor binding for 58,000 chemicals using an integrated system of a tree-based model with structural alerts.

    PubMed Central

    Hong, Huixiao; Tong, Weida; Fang, Hong; Shi, Leming; Xie, Qian; Wu, Jie; Perkins, Roger; Walker, John D; Branham, William; Sheehan, Daniel M

    2002-01-01

    A number of environmental chemicals, by mimicking natural hormones, can disrupt endocrine function in experimental animals, wildlife, and humans. These chemicals, called "endocrine-disrupting chemicals" (EDCs), are such a scientific and public concern that screening and testing 58,000 chemicals for EDC activities is now statutorily mandated. Computational chemistry tools are important to biologists because they identify chemicals most important for in vitro and in vivo studies. Here we used a computational approach with integration of two rejection filters, a tree-based model, and three structural alerts to predict and prioritize estrogen receptor (ER) ligands. The models were developed using data for 232 structurally diverse chemicals (training set) with a 10^6 range of relative binding affinities (RBAs); we then validated the models by predicting ER RBAs for 463 chemicals that had ER activity data (testing set). The integrated model gave a lower false negative rate than any single component for both training and testing sets. When the integrated model was applied to approximately 58,000 potential EDCs, 80% (approximately 46,000 chemicals) were predicted to have negligible potential (log RBA < -4.5, with log RBA = 2.0 for estradiol) to bind ER. The ability to process large numbers of chemicals to predict inactivity for ER binding and to categorically prioritize the remainder provides one biologic measure to prioritize chemicals for entry into more expensive assays (most chemicals have no biologic data of any kind). The general approach for predicting ER binding reported here may be applied to other receptors and/or reversible binding mechanisms involved in endocrine disruption. PMID:11781162

  9. Local backbone structure prediction of proteins.

    PubMed

    de Brevern, Alexandre G; Benros, Cristina; Gautier, Romain; Valadié, Héléne; Hazout, Serge; Etchebest, Catherine

    2004-01-01

    A statistical analysis of the PDB structures has led us to define a new set of small 3D structural prototypes called Protein Blocks (PBs). This structural alphabet includes 16 PBs, each defined by the (phi, psi) dihedral angles of 5 consecutive residues. The amino acid distributions observed in sequence windows encompassing these PBs are used to predict, by a Bayesian approach, the local 3D structure of proteins from knowledge of their sequences alone. LocPred is software that allows users to submit a protein sequence and performs a prediction in terms of PBs. The prediction results are given both textually and graphically. PMID:15724288

  10. Evolutionary Structure Prediction of Stoichiometric Compounds

    NASA Astrophysics Data System (ADS)

    Zhu, Qiang; Oganov, Artem

    2014-03-01

    In general, for a given ionic compound AmBn at ambient pressure, the stoichiometry reflects the ratio of the valence states of the chemical species (i.e., the charges of each anion and cation). However, compounds under high pressure exhibit significantly different behavior compared with their analogs at ambient conditions. Here we developed a method to solve the crystal structure prediction problem based on evolutionary algorithms, which can predict both the stable compounds and their crystal structures at arbitrary P,T-conditions, given just the set of chemical elements. By applying this method to a wide range of binary ionic systems (Na-Cl, Mg-O, Xe-O, Cs-F, etc.), we discovered many compounds with brand new stoichiometries that can become thermodynamically stable. Further electronic structure analysis of these novel compounds indicates that several factors can contribute to this extraordinary phenomenon: (1) polyatomic anions; (2) free electron localization; (3) emergence of new valence states; (4) metallization. In particular, part of the results has been confirmed by experiment, which suggests that this approach can play a crucial role in new materials design under extreme pressure conditions. This work is funded by DARPA (Grants No. W31P4Q1210008 and W31P4Q1310005) and NSF (EAR-1114313 and DMR-1231586).

  11. Predicting road accidents: Structural time series approach

    NASA Astrophysics Data System (ADS)

    Junus, Noor Wahida Md; Ismail, Mohd Tahir

    2014-07-01

    In this paper, a model for the occurrence of road accidents in Malaysia between 1970 and 2010 was developed, and the number of road accidents was predicted with this model using the structural time series approach. The models were developed using a stepwise method, and the residuals at each step were analyzed. The accuracy of each model was assessed using the mean absolute percentage error (MAPE), and the best model was chosen based on the smallest Akaike information criterion (AIC) value. The structural time series approach found that the local linear trend model best represents the road accidents. This model allows the level and slope components to vary over time. In addition, the approach provides useful information for improving conventional time series methods.
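
    The two selection criteria named in the abstract can be sketched with their textbook definitions: MAPE as the mean of |actual - predicted| / |actual| in percent, and AIC = 2k - 2 ln L for a model with k parameters and maximized likelihood L. The candidate names, log-likelihoods, and parameter counts below are invented placeholders, not results from the paper.

```python
def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be nonzero."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical candidates: (maximized log-likelihood, number of parameters).
candidates = {
    "local level": (-120.0, 2),
    "local linear trend": (-110.0, 3),
}
best_model = min(candidates, key=lambda name: aic(*candidates[name]))

# Hypothetical observed and fitted counts for the accuracy check.
example_mape = mape([100.0, 200.0], [110.0, 180.0])
```

    With these placeholder numbers the extra parameter of the second candidate is outweighed by its higher likelihood, so it has the smaller AIC.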

  12. Comparative analysis of QSAR models for predicting pK(a) of organic oxygen acids and nitrogen bases from molecular structure.

    PubMed

    Yu, Haiying; Kühne, Ralph; Ebert, Ralf-Uwe; Schüürmann, Gerrit

    2010-11-22

    For 1143 organic compounds comprising 580 oxygen acids and 563 nitrogen bases that cover more than 17 orders of experimental pK(a) (from -5.00 to 12.23), the pK(a) prediction performances of ACD, SPARC, and two calibrations of a semiempirical quantum chemical (QC) AM1 approach have been analyzed. The overall root-mean-square errors (rms) for the acids are 0.41, 0.58 (0.42 without ortho-substituted phenols with intramolecular H-bonding), and 0.55 and for the bases are 0.65, 0.70, 1.17, and 1.27 for ACD, SPARC, and both QC methods, respectively. Method-specific performances are discussed in detail for six acid subsets (phenols and aromatic and aliphatic carboxylic acids with different substitution patterns) and nine base subsets (anilines, primary, secondary and tertiary amines, meta/para-substituted and ortho-substituted pyridines, pyrimidines, imidazoles, and quinolines). The results demonstrate an overall better performance for acids than for bases but also a substantial variation across subsets. For the overall best-performing ACD, rms ranges from 0.12 to 1.11 and 0.40 to 1.21 pK(a) units for the acid and base subsets, respectively. With regard to the squared correlation coefficient r², the results are 0.86 to 0.96 (acids) and 0.79 to 0.95 (bases) for ACD, 0.77 to 0.95 (acids) and 0.85 to 0.97 (bases) for SPARC, and 0.64 to 0.87 (acids) and 0.43 to 0.83 (bases) for the QC methods, respectively. Attention is paid to structural and method-specific causes for observed pitfalls. The significant subset dependence of the prediction performances suggests a consensus modeling approach. PMID:21033677

  13. Predictive Models of Primary Tropical Forest Structure from Geomorphometric Variables Based on SRTM in the Tapajós Region, Brazilian Amazon

    PubMed Central

    Bispo, Polyanna da Conceição; dos Santos, João Roberto; Valeriano, Márcio de Morisson; Graça, Paulo Maurício Lima de Alencastro; Balzter, Heiko; França, Helena; Bispo, Pitágoras da Conceição

    2016-01-01

    Surveying primary tropical forest over large regions is challenging. Indirect methods of relating terrain information or other external spatial datasets to forest biophysical parameters can provide forest structural maps at large scales, but the inherent uncertainties need to be evaluated fully. The goal of the present study was to evaluate relief characteristics, measured through geomorphometric variables, as predictors of forest structural characteristics such as average tree basal area (BA) and height (H) and average percentage canopy openness (CO). Our hypothesis is that geomorphometric variables are good predictors of the structure of primary tropical forest, even in areas with low altitude variation. The study was performed at the Tapajós National Forest, located in the Western State of Pará, Brazil. Forty-three plots were sampled. Predictive models for BA, H and CO were parameterized based on geomorphometric variables using multiple linear regression. Validation of the models with nine independent sample plots revealed a Root Mean Square Error (RMSE) of 3.73 m²/ha (20%) for BA, 1.70 m (12%) for H, and 1.78% (21%) for CO. The coefficients of determination between observed and predicted values were r² = 0.32 for CO, r² = 0.26 for H and r² = 0.52 for BA. The models obtained were able to adequately estimate BA and CO. In summary, it can be concluded that relief variables are good predictors of vegetation structure and enable the creation of forest structure maps in primary tropical rainforest with an acceptable uncertainty. PMID:27089013

  14. Predictive Models of Primary Tropical Forest Structure from Geomorphometric Variables Based on SRTM in the Tapajós Region, Brazilian Amazon.

    PubMed

    Bispo, Polyanna da Conceição; Dos Santos, João Roberto; Valeriano, Márcio de Morisson; Graça, Paulo Maurício Lima de Alencastro; Balzter, Heiko; França, Helena; Bispo, Pitágoras da Conceição

    2016-01-01

    Surveying primary tropical forest over large regions is challenging. Indirect methods of relating terrain information or other external spatial datasets to forest biophysical parameters can provide forest structural maps at large scales, but the inherent uncertainties need to be evaluated fully. The goal of the present study was to evaluate relief characteristics, measured through geomorphometric variables, as predictors of forest structural characteristics such as average tree basal area (BA) and height (H) and average percentage canopy openness (CO). Our hypothesis is that geomorphometric variables are good predictors of the structure of primary tropical forest, even in areas with low altitude variation. The study was performed at the Tapajós National Forest, located in the Western State of Pará, Brazil. Forty-three plots were sampled. Predictive models for BA, H and CO were parameterized based on geomorphometric variables using multiple linear regression. Validation of the models with nine independent sample plots revealed a Root Mean Square Error (RMSE) of 3.73 m²/ha (20%) for BA, 1.70 m (12%) for H, and 1.78% (21%) for CO. The coefficients of determination between observed and predicted values were r² = 0.32 for CO, r² = 0.26 for H and r² = 0.52 for BA. The models obtained were able to adequately estimate BA and CO. In summary, it can be concluded that relief variables are good predictors of vegetation structure and enable the creation of forest structure maps in primary tropical rainforest with an acceptable uncertainty. PMID:27089013

  15. Prediction of binary hard-sphere crystal structures.

    PubMed

    Filion, Laura; Dijkstra, Marjolein

    2009-04-01

    We present a method based on a combination of a genetic algorithm and Monte Carlo simulations to predict close-packed crystal structures in hard-core systems. We employ this method to predict the binary crystal structures in a mixture of large and small hard spheres with various stoichiometries and diameter ratios between 0.4 and 0.84. In addition to known binary hard-sphere crystal structures similar to NaCl and AlB2, we predict additional crystal structures with the symmetry of CrB, γCuTi, αIrV, HgBr2, AuTe2, Ag2Se, and various structures for which an atomic analog was not found. In order to determine the crystal structures at infinite pressures, we calculate the maximum packing density as a function of size ratio for the crystal structures predicted by our GA using a simulated annealing approach. PMID:19518387

  16. Phylogenetic Approaches to Natural Product Structure Prediction

    PubMed Central

    Ziemert, Nadine; Jensen, Paul R.

    2015-01-01

    Phylogenetics is the study of the evolutionary relatedness among groups of organisms. Molecular phylogenetics uses sequence data to infer these relationships for both organisms and the genes they maintain. With the large amount of publicly available sequence data, phylogenetic inference has become increasingly important in all fields of biology. In the case of natural product research, phylogenetic relationships are proving to be highly informative in terms of delineating the architecture and function of the genes involved in secondary metabolite biosynthesis. Polyketide synthases and nonribosomal peptide synthetases provide model examples in which individual domain phylogenies display different predictive capacities, resolving features ranging from substrate specificity to structural motifs associated with the final metabolic product. This chapter provides examples in which phylogeny has proven effective in terms of predicting functional or structural aspects of secondary metabolism. The basics of how to build a reliable phylogenetic tree are explained along with information about programs and tools that can be used for this purpose. Furthermore, it introduces the Natural Product Domain Seeker, a recently developed Web tool that employs phylogenetic logic to classify ketosynthase and condensation domains based on established enzyme architecture and biochemical function. PMID:23084938

  17. Development of a Support Vector Machine-Based System to Predict Whether a Compound Is a Substrate of a Given Drug Transporter Using Its Chemical Structure.

    PubMed

    Ose, Atsushi; Toshimoto, Kota; Ikeda, Kazushi; Maeda, Kazuya; Yoshida, Shuya; Yamashita, Fumiyoshi; Hashida, Mitsuru; Ishida, Takashi; Akiyama, Yutaka; Sugiyama, Yuichi

    2016-07-01

    The aim of this study was to develop an in silico prediction system to assess which of 7 categories of drug transporters (organic anion transporting polypeptide [OATP] 1B1/1B3, multidrug resistance-associated protein [MRP] 2/3/4, organic anion transporter [OAT] 1, OAT3, organic cation transporter [OCT] 1/2/multidrug and toxin extrusion [MATE] 1/2-K, multidrug resistance protein 1 [MDR1], and breast cancer resistance protein [BCRP]) recognize a compound as a substrate using its chemical structure alone. We compiled an internal data set consisting of 260 compounds that are substrates for at least 1 of the 7 categories of drug transporters. Four physicochemical parameters (charge, molecular weight, lipophilicity, and plasma unbound fraction) of each compound were used as the basic descriptors. Furthermore, a greedy algorithm was used to select 3 additional physicochemical descriptors from 731 available descriptors. In addition, transporter nonsubstrates tend not to be in the public domain; we thus tried to compile an expert-curated data set of putative nonsubstrates for each transporter using the personal opinions of 11 researchers in the field of drug transporters. The best prediction was finally achieved by a support vector machine based on 4 basic and 3 additional descriptors. The model correctly judged that 364 of 412 compounds (internal data set) and 111 of 136 compounds (external data set) were substrates, indicating that this model performs well enough to predict the specificity of transporter substrates. PMID:27262201

  18. Defining and predicting structurally conserved regions in protein superfamilies

    PubMed Central

    Huang, Ivan K.; Grishin, Nick V.

    2013-01-01

    Motivation: The structures of homologous proteins are generally better conserved than their sequences. This phenomenon is demonstrated by the prevalence of structurally conserved regions (SCRs) even in highly divergent protein families. Defining SCRs requires the comparison of two or more homologous structures and is affected by their availability and divergence, and our ability to deduce structurally equivalent positions among them. In the absence of multiple homologous structures, it is necessary to predict SCRs of a protein using information from only a set of homologous sequences and (if available) a single structure. Accurate SCR predictions can benefit homology modelling and sequence alignment. Results: Using pairwise DaliLite alignments among a set of homologous structures, we devised a simple measure of structural conservation, termed structural conservation index (SCI). SCI was used to distinguish SCRs from non-SCRs. A database of SCRs was compiled from 386 SCOP superfamilies containing 6489 protein domains. Artificial neural networks were then trained to predict SCRs with various features deduced from a single structure and homologous sequences. Assessment of the predictions via a 5-fold cross-validation method revealed that predictions based on features derived from a single structure perform similarly to ones based on homologous sequences, while combining sequence and structural features was optimal in terms of accuracy (0.755) and Matthews correlation coefficient (0.476). These results suggest that even without information from multiple structures, it is still possible to effectively predict SCRs for a protein. Finally, inspection of the structures with the worst predictions pinpoints difficulties in SCR definitions. Availability: The SCR database and the prediction server can be found at http://prodata.swmed.edu/SCR. Contact: 91huangi@gmail.com or grishin@chop.swmed.edu Supplementary information: Supplementary data are available at Bioinformatics
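
    For readers unfamiliar with the two scores quoted in the Results (accuracy 0.755 and Matthews correlation coefficient 0.476), the sketch below computes both from a binary confusion matrix using their standard definitions. The counts in the example are made up for illustration and are unrelated to the SCR data set; returning 0.0 when the MCC denominator vanishes is a common convention adopted here as an assumption.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal total is zero (assumed convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: SCR positions are "positive", non-SCR are "negative".
example_acc = accuracy(tp=40, tn=45, fp=5, fn=10)
example_mcc = matthews_cc(tp=40, tn=45, fp=5, fn=10)
```

    MCC ranges from -1 to 1 and, unlike accuracy, stays informative when the two classes are of very different sizes, which is one reason both numbers are commonly reported together.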

  19. Structure Prediction of Membrane Proteins

    NASA Astrophysics Data System (ADS)

    Hu, Xiche

    Membrane proteins play a central role in many cellular and physiological processes. It is estimated that integral membrane proteins make up about 20-30% of the proteome (Krogh et al., 2001b; Stevens and Arkin, 2000; von Heijne, 1999). They are essential mediators of material and information transfer across cell membranes. Their functions include active and passive transport of molecules into and out of cells and organelles; transduction of energy among various forms (light, electrical, and chemical energy); as well as reception and transduction of chemical and electrical signals across membranes (Avdonin, 2005; Bockaert et al., 2002; Pahl, 1999; Rehling et al., 2004; Stack et al., 1995). Identifying these transmembrane (TM) proteins and deciphering their molecular mechanisms, then, is of great importance, particularly as applied to biomedicine. Membrane proteins are the targets of a large number of pharmacologically and toxicologically active substances, and are directly involved in their uptake, metabolism, and clearance (Bettler et al., 1998; Cohen, 2002; Heusser and Jardieu, 1997; Tibes et al., 2005; Xu et al., 2005). Despite the importance of membrane proteins, the knowledge of their high-resolution structures and mechanisms of action has lagged far behind in comparison to that of water-soluble proteins: less than 1% of all three-dimensional structures deposited in the Protein Data Bank are of membrane proteins. This unfortunate disparity stems from difficulties in overexpression and the crystallization of membrane proteins (Grisshammer and Tate, 1995; Michel, 1991).

  20. Protein Structure and Function Prediction Using I-TASSER

    PubMed Central

    Yang, Jianyi; Zhang, Yang

    2016-01-01

    I-TASSER is a hierarchical protocol for automated protein structure prediction and structure-based function annotation. Starting from the amino acid sequence of target proteins, I-TASSER first generates full-length atomic structural models from multiple threading alignments and iterative structural assembly simulations followed by atomic-level structure refinement. The biological functions of the protein, including ligand-binding sites, enzyme commission number, and gene ontology terms, are then inferred from known protein function databases based on sequence and structure profile comparisons. I-TASSER is freely available as both an on-line server and a stand-alone package. This unit describes how to use the I-TASSER protocol to generate structure and function prediction and how to interpret the prediction results, as well as alternative approaches for further improving the I-TASSER modeling quality for distant-homologous and multi-domain protein targets. PMID:26678386

  1. PredyFlexy: flexibility and local structure prediction from sequence

    PubMed Central

    de Brevern, Alexandre G.; Bornot, Aurélie; Craveur, Pierrick; Etchebest, Catherine; Gelly, Jean-Christophe

    2012-01-01

    Protein structures are necessary for understanding protein function at a molecular level. Dynamics and flexibility of protein structures are also key elements of protein function. We have therefore proposed to look at protein flexibility using novel methods: (i) using a structural alphabet and (ii) combining classical X-ray B-factor data and molecular dynamics simulations. First, we established a library composed of structural prototypes (LSPs) to describe protein structure by a limited set of recurring local structures. We developed a prediction method that proposes structural candidates in terms of LSPs and predicts protein flexibility along a given sequence. Second, we examine flexibility according to two different descriptors: X-ray B-factors, considered as good indicators of flexibility, and the root mean square fluctuations, based on molecular dynamics simulations. We then define three flexibility classes and propose a method based on the LSP prediction method for predicting flexibility along the sequence. This method does not resort to sophisticated learning of flexibility but predicts flexibility from the average flexibility of predicted local structures. The method is implemented in the PredyFlexy web server. Results are similar to those obtained with the most recent, cutting-edge methods based on direct learning of flexibility data conducted with sophisticated algorithms. PredyFlexy can be accessed at http://www.dsimb.inserm.fr/dsimb_tools/predyflexy/. PMID:22689641

  2. Quantifying variances in comparative RNA secondary structure prediction

    PubMed Central

    2013-01-01

    Background: With the advancement of next-generation sequencing and transcriptomics technologies, regulatory effects involving RNA, in particular RNA structural changes, are being detected. These results often rely on RNA secondary structure predictions. However, current approaches to RNA secondary structure modelling produce predictions with a high variance in predictive accuracy, and we have little quantifiable knowledge about the reasons for these variances. Results: In this paper we explore a number of factors which can contribute to poor RNA secondary structure prediction quality. We establish a quantified relationship between alignment quality and loss of accuracy. Furthermore, we define two new measures to quantify uncertainty in alignment-based structure predictions. One of the measures improves on the “reliability score” reported by PPfold, and considers alignment uncertainty as well as base-pair probabilities. The other measure considers the information entropy for SCFGs over a space of input alignments. Conclusions: Our predictive accuracy improves on the PPfold reliability score. We can successfully characterize many of the underlying reasons for and variances in poor prediction. However, there is still variability unaccounted for, which we therefore suggest comes from the RNA secondary structure predictive model itself. PMID:23634662

  3. Nucleosome structure incorporated histone acetylation site prediction in Arabidopsis thaliana

    PubMed Central

    2010-01-01

    Background: Acetylation is a crucial post-translational modification of histones and plays a key role in the regulation of gene expression. Owing to limited data and the lack of a clear acetylation consensus sequence, few studies have focused on the prediction of lysine acetylation sites. Several systematic prediction studies have been conducted for human and yeast, but fewer for Arabidopsis thaliana. Results: Given the insufficient observations of acetylation sites, we analyzed the contributions of the peptide-alignment-based distance definition and 3D structure factors to acetylation prediction. We found that traditional structure contributes little to acetylation site prediction. Identified acetylation sites of histones in Arabidopsis thaliana are conserved and cross-predictable with those of human by peptide-based methods. However, the predicted specificity is overestimated because of the existence of non-observed acetylable sites. Here, by performing a complete exploration of the factors that affect the acetylability of lysines in histones, we focused on the relative position of lysine at the nucleosome level and defined a new structure feature to improve the performance of predicting the acetylability of all the histone lysines in A. thaliana. Conclusion: We found a new spatially correlated acetylation factor and defined an ε-N spatial-location-based feature, which contains five core spatial ellipsoid wired areas. By incorporating the new feature, the performance of predicting the acetylability of all the histone lysines in A. thaliana was improved, and previously mispredicted acetylable lysines were corrected relative to the peptide-based prediction. PMID:21047388

  4. A structural alphabet for local protein structures: improved prediction methods.

    PubMed

    Etchebest, Catherine; Benros, Cristina; Hazout, Serge; de Brevern, Alexandre G

    2005-06-01

    Three-dimensional protein structures can be described with a library of 3D fragments that define a structural alphabet. We have previously proposed such an alphabet, composed of 16 patterns of five consecutive amino acids, called Protein Blocks (PBs). These PBs have been used to describe protein backbones and to predict local structures from protein sequences. The Q16 prediction rate reaches 40.7% with an optimization procedure. This article examines two aspects of PBs. First, we determine the effect of the enlargement of databanks on their definition. The results show that the geometrical features of the different PBs are preserved (local RMSD value equal to 0.41 Å on average) and sequence-structure specificities reinforced when databanks are enlarged. Second, we improve the methods for optimizing PB predictions from sequences, revisiting the optimization procedure and exploring different local prediction strategies. Use of a statistical optimization procedure for the sequence-local structure relation improves prediction accuracy by 8 percentage points (Q16 = 48.7%). Better recognition of repetitive structures occurs without losing the prediction efficiency of the other local folds. Adding secondary structure prediction improved the accuracy of Q16 by only 1%. An entropy index (Neq), strongly related to the RMSD value of the difference between predicted PBs and true local structures, is proposed to estimate prediction quality. The Neq is linearly correlated with the Q16 prediction rate distributions, computed for a large set of proteins. An "expected" prediction rate QE16 is deduced with a mean error of 5%. PMID:15822101
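
    The abstract does not give the formula for the entropy index Neq. The sketch below assumes the commonly used form, the exponential of the Shannon entropy of the predicted PB probabilities at one position, Neq = exp(-sum_x p_x ln p_x), which can be read as an "equivalent number" of plausible PBs. The function name and the normalization step are choices made for this illustration.

```python
import math


def neq(probabilities):
    """Equivalent number of states: exp of the Shannon entropy (natural log).

    The input is normalized first, so unnormalized scores are accepted;
    zero entries contribute nothing to the entropy.
    """
    total = sum(probabilities)
    entropy = -sum(
        (p / total) * math.log(p / total) for p in probabilities if p > 0
    )
    return math.exp(entropy)


# Two limiting cases for a 16-letter structural alphabet.
neq_uniform = neq([1.0 / 16] * 16)       # no preference among the 16 PBs
neq_certain = neq([1.0] + [0.0] * 15)    # a single PB carries all the probability
```

    Under this assumed definition the index ranges from 1 (one PB predicted with certainty) to 16 (all PBs equally likely), so lower values indicate more confident local predictions.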

  5. A physical approach to protein structure prediction: CASP4 results

    SciTech Connect

    Crivelli, Silvia; Eskow, Elizabeth; Bader, Brett; Lamberti, Vincent; Byrd, Richard; Schnabel, Robert; Head-Gordon, Teresa

    2001-02-27

    We describe our global optimization method called Stochastic Perturbation with Soft Constraints (SPSC), which uses information from known proteins to predict secondary structure, but not in the tertiary structure predictions or in generating the terms of the physics-based energy function. Our approach is also characterized by the use of an all-atom energy function that includes a novel hydrophobic solvation function derived from experiments, which shows promising ability for energy discrimination against misfolded structures. We present the results obtained using our SPSC method and energy function for blind prediction in the 4th Critical Assessment of Techniques for Protein Structure Prediction (CASP4) competition, and show that our approach is more effective on targets for which less information from known proteins is available. In fact, our SPSC method produced the best prediction for one of the most difficult targets of the competition, a new-fold protein of 240 amino acids.

  6. Predicting complex mineral structures using genetic algorithms.

    PubMed

    Mohn, Chris E; Kob, Walter

    2015-10-28

    We show that symmetry-adapted genetic algorithms are capable of finding the ground state of a range of complex crystalline phases including layered- and incommensurate super-structures. This opens the way for the atomistic prediction of complex crystal structures of functional materials and mineral phases. PMID:26441052

  7. Predicting structure in nonsymmetric sparse matrix factorizations

    SciTech Connect

    Gilbert, J.R.; Ng, E.

    1991-12-31

    Many computations on sparse matrices have a phase that predicts the nonzero structure of the output, followed by a phase that actually performs the numerical computation. We study structure prediction for computations that involve nonsymmetric row and column permutations and nonsymmetric or non-square matrices. Our tools are bipartite graphs, matchings, and alternating paths. Our main new result concerns LU factorization with partial pivoting. We show that if a square matrix A has the strong Hall property (i.e., is fully indecomposable) then an upper bound due to George and Ng on the nonzero structure of L + U is as tight as possible. To show this, we prove a crucial result about alternating paths in strong Hall graphs. The alternating-paths theorem seems to be of independent interest: it can also be used to prove related results about structure prediction for QR factorization that are due to Coleman, Edenbrandt, Gilbert, Hare, Johnson, Olesky, Pothen, and van den Driessche.
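
    The two-phase pattern described in the first sentence (predict the nonzero structure, then compute numerically) can be illustrated on a simpler operation than the LU factorization studied in the report. The sketch below predicts the nonzero structure of a sparse product C = A B from the structures of A and B alone; the row-set representation and the function name are assumptions for illustration, and the result is an upper bound that is exact only when no numerical cancellation occurs.

```python
def predict_product_structure(a_rows, b_rows):
    """Symbolic phase for C = A @ B.

    a_rows[i] is the set of column indices holding nonzeros in row i of A,
    and b_rows[k] is the same for row k of B. Row i of C can be nonzero only
    in columns reachable through some k with A[i, k] != 0 and B[k, j] != 0.
    """
    return [set().union(*(b_rows[k] for k in cols)) for cols in a_rows]


# Hypothetical 3x3 structures (values are irrelevant in the symbolic phase).
a_rows = [{0, 2}, {1}, set()]
b_rows = [{1}, {0, 2}, {2}]
c_rows = predict_product_structure(a_rows, b_rows)
```

    A numerical phase would then allocate storage for exactly these positions and fill in the values, which is the practical payoff of having a tight structural bound.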

  8. Predicting structure in nonsymmetric sparse matrix factorizations

    SciTech Connect

    Gilbert, J.R.; Ng, E.

    1991-01-01

    Many computations on sparse matrices have a phase that predicts the nonzero structure of the output, followed by a phase that actually performs the numerical computation. We study structure prediction for computations that involve nonsymmetric row and column permutations and nonsymmetric or non-square matrices. Our tools are bipartite graphs, matchings, and alternating paths. Our main new result concerns LU factorization with partial pivoting. We show that if a square matrix A has the strong Hall property (i.e., is fully indecomposable) then an upper bound due to George and Ng on the nonzero structure of L + U is as tight as possible. To show this, we prove a crucial result about alternating paths in strong Hall graphs. The alternating-paths theorem seems to be of independent interest: it can also be used to prove related results about structure prediction for QR factorization that are due to Coleman, Edenbrandt, Gilbert, Hare, Johnson, Olesky, Pothen, and van den Driessche.

  9. Predicting structure in nonsymmetric sparse matrix factorizations

    SciTech Connect

    Gilbert, J.R.; Ng, E.G.

    1992-10-01

    Many computations on sparse matrices have a phase that predicts the nonzero structure of the output, followed by a phase that actually performs the numerical computation. We study structure prediction for computations that involve nonsymmetric row and column permutations and nonsymmetric or non-square matrices. Our tools are bipartite graphs, matchings, and alternating paths. Our main new result concerns LU factorization with partial pivoting. We show that if a square matrix A has the strong Hall property (i.e., is fully indecomposable) then an upper bound due to George and Ng on the nonzero structure of L + U is as tight as possible. To show this, we prove a crucial result about alternating paths in strong Hall graphs. The alternating-paths theorem seems to be of independent interest: it can also be used to prove related results about structure prediction for QR factorization that are due to Coleman, Edenbrandt, Gilbert, Hare, Johnson, Olesky, Pothen, and van den Driessche.

  10. Predicting Odor Perceptual Similarity from Odor Structure

    PubMed Central

    Weiss, Tali; Frumin, Idan; Khan, Rehan M.; Sobel, Noam

    2013-01-01

    To understand the brain mechanisms of olfaction we must understand the rules that govern the link between odorant structure and odorant perception. Natural odors are in fact mixtures made of many molecules, and there is currently no method to look at the molecular structure of such odorant-mixtures and predict their smell. In three separate experiments, we asked 139 subjects to rate the pairwise perceptual similarity of 64 odorant-mixtures ranging in size from 4 to 43 mono-molecular components. We then tested alternative models to link odorant-mixture structure to odorant-mixture perceptual similarity. Whereas a model that considered each mono-molecular component of a mixture separately provided a poor prediction of mixture similarity, a model that represented the mixture as a single structural vector provided consistent correlations between predicted and actual perceptual similarity (r≥0.49, p<0.001). An optimized version of this model yielded a correlation of r = 0.85 (p<0.001) between predicted and actual mixture similarity. In other words, we developed an algorithm that can look at the molecular structure of two novel odorant-mixtures, and predict their ensuing perceptual similarity. That this goal was attained using a model that considers the mixtures as a single vector is consistent with a synthetic rather than analytical brain processing mechanism in olfaction. PMID:24068899
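
    The idea of representing a whole mixture as a single structural vector can be sketched as follows. The abstract does not say how the vector is built or how two vectors are compared, so summing the component descriptors and scoring with cosine similarity are assumptions of this sketch, and the descriptor values are made up.

```python
import math


def mixture_vector(components):
    """Collapse a mixture into one vector by summing its components' descriptors."""
    return [sum(values) for values in zip(*components)]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


# Hypothetical 3-descriptor vectors for the molecules in two mixtures.
mixture_x = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
mixture_y = [[2.0, 2.0, 4.0]]

vx = mixture_vector(mixture_x)
vy = mixture_vector(mixture_y)
similarity = cosine_similarity(vx, vy)
print(vx, vy, similarity)
```

    Here the two mixtures have different components but the same summed direction, so the sketch scores them as maximally similar. A component-by-component comparison would not capture this.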

  11. A proposed architecture for the central domain of the bacterial enhancer-binding proteins based on secondary structure prediction and fold recognition.

    PubMed Central

    Osuna, J.; Soberón, X.; Morett, E.

    1997-01-01

    The expression of genes transcribed by the RNA polymerase with the alternative sigma factor sigma 54 (E sigma 54) is absolutely dependent on activator proteins that bind to enhancer-like sites, located far upstream from the promoter. These unique prokaryotic proteins, known as enhancer-binding proteins (EBP), mediate open promoter complex formation in a reaction dependent on NTP hydrolysis. The best characterized proteins of this family of regulators are NtrC and NifA, which activate genes required for ammonia assimilation and nitrogen fixation, respectively. In a recent IRBM course ("Frontiers of protein structure prediction," IRBM, Pomezia, Italy, 1995; see web site http://www.mrc-cpe.cam.uk/irbm-course95/), one of us (J.O.) participated in the elaboration of the proposal that the Central domain of the EBPs might adopt the classical mononucleotide-binding fold. This suggestion was based on the results of a new protein fold recognition algorithm (Map) and on the mapping of correlated mutations calculated for the sequence family on the same mononucleotide-binding fold topology. In this work, we present new data that support the previous conclusion. The results from a number of different secondary structure prediction programs suggest that the Central domain could adopt an alpha/beta topology. The fold recognition programs ProFIT 0.9, 3D PROFILE combined with secondary structure prediction, and 123D suggest a mononucleotide-binding fold topology for the Central domain amino acid sequence. Finally, and most importantly, three of five reported residue alterations that impair the Central domain ATPase activity of the E sigma 54 activators are mapped to polypeptide regions that might be playing equivalent roles as those involved in nucleotide-binding in the mononucleotide-binding proteins. Furthermore, the known residue substitutions that alter the function of the E sigma 54 activators, leaving intact the Central domain ATPase activity, are mapped on region proposed to

  12. Crystal structure prediction of rigid molecules.

    PubMed

    Elking, Dennis M; Fusti-Molnar, Laszlo; Nichols, Anthony

    2016-08-01

    A non-polarizable force field based on atomic multipoles fit to reproduce experimental crystal properties and ab initio gas-phase dimers is described. The Ewald method is used to calculate both long-range electrostatic and 1/r^6 dispersion energies of crystals. The dispersion energy of a crystal calculated by a cutoff method is shown to converge slowly to the exact Ewald result. A method for constraining space-group symmetry during unit-cell optimization is derived. Results for locally optimizing 4427 unit cells including volume, cell parameters, unit-cell r.m.s.d. and CPU timings are given for both flexible and rigid molecule optimization. An algorithm for randomly generating rigid molecule crystals is described. Using the correct experimentally determined space group, the average and maximum number of random crystals needed to find the correct experimental structure is given for 2440 rigid single component crystals. The force field energy rank of the correct experimental structure is presented for the same set of 2440 rigid single component crystals assuming the correct space group. A complete crystal prediction is performed for two rigid molecules by searching over the 32 most probable space groups. PMID:27484371

  13. Predicting polymeric crystal structures by evolutionary algorithms

    NASA Astrophysics Data System (ADS)

    Zhu, Qiang; Sharma, Vinit; Oganov, Artem R.; Ramprasad, Ramamurthy

    2014-10-01

    The recently developed evolutionary algorithm USPEX proved to be a tool that enables accurate and reliable prediction of structures. Here we extend this method to predict the crystal structure of polymers by constrained evolutionary search, where each monomeric unit is treated as a building block with fixed connectivity. This greatly reduces the search space and allows the initial structure generation with different sequences and packings of these blocks. The new constrained evolutionary algorithm is successfully tested and validated on a diverse range of experimentally known polymers, namely, polyethylene, polyacetylene, poly(glycolic acid), poly(vinyl chloride), poly(oxymethylene), poly(phenylene oxide), and poly (p-phenylene sulfide). By fixing the orientation of polymeric chains, this method can be further extended to predict the structures of complex linear polymers, such as all polymorphs of poly(vinylidene fluoride), nylon-6 and cellulose. The excellent agreement between predicted crystal structures and experimentally known structures assures a major role of this approach in the efficient design of the future polymeric materials.

  14. Data-Based Predictive Control with Multirate Prediction Step

    NASA Technical Reports Server (NTRS)

    Barlow, Jonathan S.

    2010-01-01

    Data-based predictive control is an emerging control method that stems from Model Predictive Control (MPC). MPC computes current control action based on a prediction of the system output a number of time steps into the future and is generally derived from a known model of the system. Data-based predictive control has the advantage of deriving predictive models and controller gains from input-output data. Thus, a controller can be designed from the outputs of complex simulation code or a physical system where no explicit model exists. If the output data happens to be corrupted by periodic disturbances, the designed controller will also have the built-in ability to reject these disturbances without the need to know them. When data-based predictive control is implemented online, it becomes a version of adaptive control. One challenge of MPC is computational requirements increasing with prediction horizon length. This paper develops a closed-loop dynamic output feedback controller that minimizes a multi-step-ahead receding-horizon cost function with multirate prediction step. One result is a reduced influence of prediction horizon and the number of system outputs on the computational requirements of the controller. Another result is an emphasis on portions of the prediction window that are sampled more frequently. A third result is the ability to include more outputs in the feedback path than in the cost function.

  15. Protein Structure Prediction with Evolutionary Algorithms

    SciTech Connect

    Hart, W.E.; Krasnogor, N.; Pelta, D.A.; Smith, J.

    1999-02-08

    Evolutionary algorithms have been successfully applied to a variety of molecular structure prediction problems. In this paper we reconsider the design of genetic algorithms that have been applied to a simple protein structure prediction problem. Our analysis considers the impact of several algorithmic factors for this problem: the conformational representation, the energy formulation and the way in which infeasible conformations are penalized. Further, we empirically evaluated the impact of these factors on a small set of polymer sequences. Our analysis leads to specific recommendations for both GAs as well as other heuristic methods for solving PSP on the HP model.
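
    The three factors named in this abstract (conformational representation, energy formulation, and penalty for infeasible conformations) can be made concrete with a minimal sketch of an HP-model fitness function. The 2D square lattice, the absolute-move encoding, and the penalty weight are assumptions of this sketch, not choices reported by the authors.

```python
# Absolute moves on a 2D square lattice (an assumed representation).
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def fold(moves):
    """Turn a move string into lattice coordinates, starting at the origin."""
    coords = [(0, 0)]
    for move in moves:
        dx, dy = MOVES[move]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    return coords


def hp_energy(sequence, moves, penalty=10):
    """Energy = -1 per non-bonded H-H lattice contact, plus a penalty per overlap."""
    coords = fold(moves)
    # Residues that land on an already occupied site make the fold infeasible.
    collisions = len(coords) - len(set(coords))
    energy = 0
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):  # skip chain neighbours
            if sequence[i] == "H" and sequence[j] == "H":
                (x1, y1), (x2, y2) = coords[i], coords[j]
                if abs(x1 - x2) + abs(y1 - y2) == 1:
                    energy -= 1
    return energy + penalty * collisions


print(hp_energy("HPPH", "URD"))  # compact fold with one H-H contact
print(hp_energy("HPPH", "RRR"))  # fully extended chain
print(hp_energy("HPPH", "UDU"))  # self-overlapping, penalized
```

    A genetic algorithm would minimize this value over move strings. Penalizing overlaps instead of rejecting them outright keeps infeasible conformations in the population, which is one of the design choices the abstract says it examines.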

  16. Multipass Membrane Protein Structure Prediction Using Rosetta

    PubMed Central

    Yarov-Yarovoy, Vladimir; Schonbrun, Jack; Baker, David

    2006-01-01

    We describe the adaptation of the Rosetta de novo structure prediction method for prediction of helical transmembrane protein structures. The membrane environment is modeled by embedding the protein chain into a model membrane represented by parallel planes defining hydrophobic, interface, and polar membrane layers for each energy evaluation. The optimal embedding is determined by maximizing the exposure of surface hydrophobic residues within the membrane and minimizing hydrophobic exposure outside of the membrane. Protein conformations are built up using the Rosetta fragment assembly method and evaluated using a new membrane-specific version of the Rosetta low-resolution energy function in which residue–residue and residue–environment interactions are functions of the membrane layer in addition to amino acid identity, distance, and density. We find that lower energy and more native-like structures are achieved by sequential addition of helices to a growing chain, which may mimic some aspects of helical protein biogenesis after translocation, rather than folding the whole chain simultaneously as in the Rosetta soluble protein prediction method. In tests on 12 membrane proteins for which the structure is known, between 51 and 145 residues were predicted with root-mean-square deviation <4Å from the native structure. PMID:16372357
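
    The root-mean-square deviation used as the accuracy measure above can be computed as in the following minimal sketch. It assumes the two coordinate sets are already superimposed and paired atom by atom. Structure evaluations normally perform an optimal superposition first, which is omitted here, and the coordinates are made up.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must have the same length")
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical two-atom example, coordinates in angstroms.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
deviation = rmsd(native, model)
print(deviation)
```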

  17. A new protein structure representation for efficient protein function prediction.

    PubMed

    Maghawry, Huda A; Mostafa, Mostafa G M; Gharib, Tarek F

    2014-12-01

    One of the challenging problems in bioinformatics is the prediction of protein function. Protein function is the main key that can be used to classify different proteins. Protein function can be inferred experimentally with very small throughput or computationally with very high throughput. Computational methods are sequence based or structure based. Structure-based methods produce more accurate protein function prediction. In this article, we propose a new protein structure representation for efficient protein function prediction. The representation is based on three-dimensional patterns of protein residues. In the analysis, we used protein function based on enzyme activity through six mechanistically diverse enzyme superfamilies: amidohydrolase, crotonase, haloacid dehalogenase, isoprenoid synthase type I, and vicinal oxygen chelate. We applied three different classification methods, naïve Bayes, k-nearest neighbors, and random forest, to predict the enzyme superfamily of a given protein. The prediction accuracy using the proposed representation outperforms a recently introduced representation method that is based only on the distance patterns. The results show that the proposed representation achieved prediction accuracy up to 98%, with improvement of about 10% on average. PMID:25343279

  18. Status of research aimed at predicting structural integrity

    SciTech Connect

    Reuter, W.G.

    1997-12-31

    Considerable research has been performed throughout the world on measuring the fracture toughness of metals. The existing capability fills the need encountered when selecting materials, thermal-mechanical treatments, welding procedures, etc., but cannot predict the fracture process of structural components containing cracks. The Idaho National Engineering and Environmental Laboratory and the Massachusetts Institute of Technology have been collaborating for a number of years on developing capabilities for using fracture toughness results to predict structural integrity. Because of the high cost of fabricating and testing structural components, these studies have been limited to predicting the fracture process in specimens containing surface cracks. This paper summarizes the present status of the experimental studies of using fracture toughness data to predict crack growth initiation in specimens (structural components) containing surface cracks. These results are limited to homogeneous base materials.

  19. WeFold: a coopetition for protein structure prediction.

    PubMed

    Khoury, George A; Liwo, Adam; Khatib, Firas; Zhou, Hongyi; Chopra, Gaurav; Bacardit, Jaume; Bortot, Leandro O; Faccioli, Rodrigo A; Deng, Xin; He, Yi; Krupa, Pawel; Li, Jilong; Mozolewska, Magdalena A; Sieradzan, Adam K; Smadbeck, James; Wirecki, Tomasz; Cooper, Seth; Flatten, Jeff; Xu, Kefan; Baker, David; Cheng, Jianlin; Delbem, Alexandre C B; Floudas, Christodoulos A; Keasar, Chen; Levitt, Michael; Popović, Zoran; Scheraga, Harold A; Skolnick, Jeffrey; Crivelli, Silvia N

    2014-09-01

    The protein structure prediction problem continues to elude scientists. Despite the introduction of many methods, only modest gains were made over the last decade for certain classes of prediction targets. To address this challenge, a social-media based worldwide collaborative effort, named WeFold, was undertaken by 13 labs. During the collaboration, the laboratories were simultaneously competing with each other. Here, we present the first attempt at "coopetition" in scientific research applied to the protein structure prediction and refinement problems. The coopetition was possible by allowing the participating labs to contribute different components of their protein structure prediction pipelines and create new hybrid pipelines that they tested during CASP10. This manuscript describes both successes and areas needing improvement as identified throughout the first WeFold experiment and discusses the efforts that are underway to advance this initiative. A footprint of all contributions and structures are publicly accessible at http://www.wefold.org. PMID:24677212

  20. WeFold: A Coopetition for Protein Structure Prediction

    PubMed Central

    Khoury, George A.; Liwo, Adam; Khatib, Firas; Zhou, Hongyi; Chopra, Gaurav; Bacardit, Jaume; Bortot, Leandro O.; Faccioli, Rodrigo A.; Deng, Xin; He, Yi; Krupa, Pawel; Li, Jilong; Mozolewska, Magdalena A.; Sieradzan, Adam K.; Smadbeck, James; Wirecki, Tomasz; Cooper, Seth; Flatten, Jeff; Xu, Kefan; Baker, David; Cheng, Jianlin; Delbem, Alexandre C. B.; Floudas, Christodoulos A.; Keasar, Chen; Levitt, Michael; Popović, Zoran; Scheraga, Harold A.; Skolnick, Jeffrey; Crivelli, Silvia N.; Players, Foldit

    2014-01-01

    The protein structure prediction problem continues to elude scientists. Despite the introduction of many methods, only modest gains were made over the last decade for certain classes of prediction targets. To address this challenge, a social-media based worldwide collaborative effort, named WeFold, was undertaken by thirteen labs. During the collaboration, the labs were simultaneously competing with each other. Here, we present the first attempt at “coopetition” in scientific research applied to the protein structure prediction and refinement problems. The coopetition was possible by allowing the participating labs to contribute different components of their protein structure prediction pipelines and create new hybrid pipelines that they tested during CASP10. This manuscript describes both successes and areas needing improvement as identified throughout the first WeFold experiment and discusses the efforts that are underway to advance this initiative. A footprint of all contributions and structures are publicly accessible at http://www.wefold.org. PMID:24677212

  1. Structural class prediction of protein using novel feature extraction method from chaos game representation of predicted secondary structure.

    PubMed

    Zhang, Lichao; Kong, Liang; Han, Xiaodong; Lv, Jinfeng

    2016-07-01

    Protein structural class prediction plays an important role in protein structure and function analysis, drug design and many other biological applications. Extracting good representation from protein sequence is fundamental for this prediction task. In recent years, although several secondary structure based feature extraction strategies have been specially proposed for low-similarity protein sequences, the prediction accuracy still remains limited. To explore the potential of secondary structure information, this study proposed a novel feature extraction method from the chaos game representation of predicted secondary structure to mainly capture sequence order information and secondary structure segments distribution information in a given protein sequence. Several kinds of prediction accuracies obtained by the jackknife test are reported on three widely used low-similarity benchmark datasets (25PDB, 1189 and 640). Compared with the state-of-the-art prediction methods, the proposed method achieves the highest overall accuracies on all the three datasets. The experimental results confirm that the proposed feature extraction method is effective for accurate prediction of protein structural class. Moreover, it is anticipated that the proposed method could be extended to other graphical representations of protein sequence and be helpful in future research. PMID:27084358
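
    A chaos game representation of a predicted secondary structure string can be sketched as follows. The three-letter alphabet (H, E, C), the triangle vertices, and the starting point are assumptions of this sketch. The abstract does not give the authors' exact construction or the features they extract from it.

```python
# Assumed triangle vertices for helix (H), strand (E) and coil (C).
VERTICES = {"H": (0.0, 0.0), "E": (1.0, 0.0), "C": (0.5, 1.0)}


def chaos_game(secondary_structure, start=(0.5, 0.5)):
    """Map each residue to a point by moving halfway toward its state's vertex."""
    x, y = start
    points = []
    for state in secondary_structure:
        vx, vy = VERTICES[state]
        x, y = (x + vx) / 2, (y + vy) / 2
        points.append((x, y))
    return points


points = chaos_game("HEC")
print(points)
```

    Because each point depends on all earlier states, the point cloud retains sequence order information, which is the property the abstract says the features are meant to capture.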

  2. Towards cheminformatics-based estimation of drug therapeutic index: Predicting the protective index of anticonvulsants using a new quantitative structure-index relationship approach.

    PubMed

    Chen, Shangying; Zhang, Peng; Liu, Xin; Qin, Chu; Tao, Lin; Zhang, Cheng; Yang, Sheng Yong; Chen, Yu Zong; Chui, Wai Keung

    2016-06-01

    The overall efficacy and safety profile of a new drug is partially evaluated by the therapeutic index in clinical studies and by the protective index (PI) in preclinical studies. In-silico predictive methods may facilitate the assessment of these indicators. Although QSAR and QSTR models can be used for predicting PI, their predictive capability has not been evaluated. To test this capability, we developed QSAR and QSTR models for predicting the activity and toxicity of anticonvulsants at accuracy levels above the literature-reported threshold (LT) of good QSAR models as tested by both the internal 5-fold cross validation and external validation method. These models showed significantly compromised PI predictive capability due to the cumulative errors of the QSAR and QSTR models. Therefore, in this investigation a new quantitative structure-index relationship (QSIR) model was devised and it showed improved PI predictive capability that superseded the LT of good QSAR models. The QSAR, QSTR and QSIR models were developed using support vector regression (SVR) method with the parameters optimized by using the greedy search method. The molecular descriptors relevant to the prediction of anticonvulsant activities, toxicities and PIs were analyzed by a recursive feature elimination method. The selected molecular descriptors are primarily associated with the drug-like, pharmacological and toxicological features and those used in the published anticonvulsant QSAR and QSTR models. This study suggested that QSIR is useful for estimating the therapeutic index of drug candidates. PMID:27262528
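
    The cumulative-error argument in this abstract can be made explicit with a short worked equation. It assumes the usual preclinical definition of the protective index as the ratio of the median toxic dose to the median effective dose. The first-order error propagation below is an illustration and is not taken from the paper.

```latex
\mathrm{PI} = \frac{\mathrm{TD}_{50}}{\mathrm{ED}_{50}},
\qquad
\frac{\delta\,\mathrm{PI}}{\mathrm{PI}}
  \approx \frac{\delta\,\mathrm{TD}_{50}}{\mathrm{TD}_{50}}
        - \frac{\delta\,\mathrm{ED}_{50}}{\mathrm{ED}_{50}},
\qquad
\left|\frac{\delta\,\mathrm{PI}}{\mathrm{PI}}\right|
  \le \left|\frac{\delta\,\mathrm{TD}_{50}}{\mathrm{TD}_{50}}\right|
    + \left|\frac{\delta\,\mathrm{ED}_{50}}{\mathrm{ED}_{50}}\right|
```

    When the toxicity (QSTR) and activity (QSAR) predictions err in opposite directions, their relative errors add in the predicted PI. A model trained on the index directly avoids combining two separately predicted quantities.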

  3. Fractal structure enables temporal prediction in music.

    PubMed

    Rankin, Summer K; Fink, Philip W; Large, Edward W

    2014-10-01

    1/f serial correlations and statistical self-similarity (fractal structure) have been measured in various dimensions of musical compositions. Musical performances also display 1/f properties in expressive tempo fluctuations, and listeners predict tempo changes when synchronizing. Here the authors show that the 1/f structure is sufficient for listeners to predict the onset times of upcoming musical events. These results reveal what information listeners use to anticipate events in complex, non-isochronous acoustic rhythms, and this will entail innovative models of temporal synchronization. This finding could improve therapies for Parkinson's and related disorders and inform deeper understanding of how endogenous neural rhythms anticipate events in complex, temporally structured communication signals. PMID:25324107

  4. Protein structure prediction enhanced with evolutionary diversity: SPEED.

    SciTech Connect

    DeBartolo, J.; Hocky, G.; Wilde, M.; Xu, J.; Freed, K. F.; Sosnick, T. R.; Univ. of Chicago; Toyota Technological Inst. at Chicago

    2010-03-01

    For naturally occurring proteins, similar sequence implies similar structure. Consequently, multiple sequence alignments (MSAs) often are used in template-based modeling of protein structure and have been incorporated into fragment-based assembly methods. Our previous homology-free structure prediction study introduced an algorithm that mimics the folding pathway by coupling the formation of secondary and tertiary structure. Moves in the Monte Carlo procedure involve only a change in a single pair of φ,ψ backbone dihedral angles that are obtained from a Protein Data Bank-based distribution appropriate for each amino acid, conditional on the type and conformation of the flanking residues. We improve this method by using MSAs to enrich the sampling distribution, but in a manner that does not require structural knowledge of any protein sequence (i.e., not homologous fragment insertion). In combination with other tools, including clustering and refinement, the accuracies of the predicted secondary and tertiary structures are substantially improved and a global and position-resolved measure of confidence is introduced for the accuracy of the predictions. Performance of the method in the Critical Assessment of Structure Prediction (CASP8) is discussed.

  5. Base Rates, Contingencies, and Prediction Behavior

    ERIC Educational Resources Information Center

    Kareev, Yaakov; Fiedler, Klaus; Avrahami, Judith

    2009-01-01

    A skew in the base rate of upcoming events can often provide a better cue for accurate predictions than a contingency between signals and events. The authors study prediction behavior and test people's sensitivity to both base rate and contingency; they also examine people's ability to compare the benefits of both for prediction. They formalize…

  6. Predicting PDZ domain mediated protein interactions from structure

    PubMed Central

    2013-01-01

    Background: PDZ domains are structural protein domains that recognize simple linear amino acid motifs, often at protein C-termini, and mediate protein-protein interactions (PPIs) in important biological processes, such as ion channel regulation, cell polarity and neural development. PDZ domain-peptide interaction predictors have been developed based on domain and peptide sequence information. Since domain structure is known to influence binding specificity, we hypothesized that structural information could be used to predict new interactions compared to sequence-based predictors. Results: We developed a novel computational predictor of PDZ domain and C-terminal peptide interactions using a support vector machine trained with PDZ domain structure and peptide sequence information. Performance was estimated using extensive cross validation testing. We used the structure-based predictor to scan the human proteome for ligands of 218 PDZ domains and show that the predictions correspond to known PDZ domain-peptide interactions and PPIs in curated databases. The structure-based predictor is complementary to the sequence-based predictor, finding unique known and novel PPIs, and is less dependent on training–testing domain sequence similarity. We used a functional enrichment analysis of our hits to create a predicted map of PDZ domain biology. This map highlights PDZ domain involvement in diverse biological processes, some only found by the structure-based predictor. Based on this analysis, we predict novel PDZ domain involvement in xenobiotic metabolism and suggest new interactions for other processes including wound healing and Wnt signalling. Conclusions: We built a structure-based predictor of PDZ domain-peptide interactions, which can be used to scan C-terminal proteomes for PDZ interactions. We also show that the structure-based predictor finds many known PDZ mediated PPIs in human that were not found by our previous sequence-based predictor and is less dependent on

  7. Structural load prediction methods for space payloads

    NASA Technical Reports Server (NTRS)

    Wada, B. K.

    1982-01-01

    The state of the art in structural loads prediction procedures for spacecraft is summarized. Three categories of prediction techniques delineated by cost, complexity, comprehensiveness, accuracy, and applications are outlined. The lowest cost method has been used for earth resources, communications, and weather satellites, the medium cost method for sun-synchronous orbits and the large space telescope, and the most expensive for planetary missions, comet rendezvous, and out-of-ecliptic orbits, all assuming Shuttle launch. The lowest cost method involves a mass-acceleration curve. A shock spectra technique predicts a least upper bound for loads. A recovered transient method analyzes the interface acceleration of two connected launch vehicles. The most accurate method devised thus far is a transient analysis of the total launch vehicle/payload dynamic system.

  8. Structure Prediction of RNA Loops with a Probabilistic Approach

    PubMed Central

    Li, Jun; Zhang, Jian; Wang, Jun; Li, Wenfei; Wang, Wei

    2016-01-01

    The knowledge of the tertiary structure of RNA loops is important for understanding their functions. In this work we develop an efficient approach named RNApps, specifically designed for predicting the tertiary structure of RNA loops, including hairpin loops, internal loops, and multi-way junction loops. It includes a probabilistic coarse-grained RNA model, an all-atom statistical energy function, a sequential Monte Carlo growth algorithm, and a simulated annealing procedure. The approach is tested with a dataset including nine RNA loops, a 23S ribosomal RNA, and a large dataset containing 876 RNAs. The performance is evaluated and compared with a homology modeling based predictor and an ab initio predictor. It is found that RNApps has comparable performance with the former one and outdoes the latter in terms of structure predictions. The approach holds great promise for accurate and efficient RNA tertiary structure prediction. PMID:27494763

  9. Structure Prediction of RNA Loops with a Probabilistic Approach.

    PubMed

    Li, Jun; Zhang, Jian; Wang, Jun; Li, Wenfei; Wang, Wei

    2016-08-01

    The knowledge of the tertiary structure of RNA loops is important for understanding their functions. In this work we develop an efficient approach named RNApps, specifically designed for predicting the tertiary structure of RNA loops, including hairpin loops, internal loops, and multi-way junction loops. It includes a probabilistic coarse-grained RNA model, an all-atom statistical energy function, a sequential Monte Carlo growth algorithm, and a simulated annealing procedure. The approach is tested with a dataset including nine RNA loops, a 23S ribosomal RNA, and a large dataset containing 876 RNAs. The performance is evaluated and compared with a homology modeling based predictor and an ab initio predictor. It is found that RNApps has comparable performance with the former one and outdoes the latter in terms of structure predictions. The approach holds great promise for accurate and efficient RNA tertiary structure prediction. PMID:27494763

  10. 3D protein structure prediction using Imperialist Competitive algorithm and half sphere exposure prediction.

    PubMed

    Khaji, Erfan; Karami, Masoumeh; Garkani-Nejad, Zahra

    2016-02-21

    Predicting the native structure of proteins based on half-sphere exposure and contact numbers has been studied deeply within recent years. Online predictors of these vectors and secondary structures of amino acids sequences have made it possible to design a function for the folding process. By choosing variant structures and directs for each secondary structure, a random conformation can be generated, and a potential function can then be assigned. Minimizing the potential function utilizing meta-heuristic algorithms is the final step of finding the native structure of a given amino acid sequence. In this work, Imperialist Competitive algorithm was used in order to accelerate the process of minimization. Moreover, we applied an adaptive procedure to apply revolutionary changes. Finally, we considered a more accurate tool for prediction of secondary structure. The results of the computational experiments on standard benchmark show the superiority of the new algorithm over the previous methods with similar potential function. PMID:26718864

  11. Contingency Table Browser - prediction of early stage protein structure.

    PubMed

    Kalinowska, Barbara; Krzykalski, Artur; Roterman, Irena

    2015-01-01

    The Early Stage (ES) intermediate represents the starting structure in protein folding simulations based on the Fuzzy Oil Drop (FOD) model. The accuracy of FOD predictions is greatly dependent on the accuracy of the chosen intermediate. A suitable intermediate can be constructed using the sequence-structure relationship information contained in the so-called contingency table; this table expresses the likelihood of encountering various structural motifs for each tetrapeptide fragment in the amino acid sequence. The limited accuracy with which such structures could previously be predicted provided the motivation for a more in-depth study of the contingency table itself. The Contingency Table Browser is a tool which can visualize, search and analyze the table. Our work presents possible applications of the Contingency Table Browser, among them the analysis of specific protein sequences from the point of view of their structural ambiguity. PMID:26664034
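
    As a rough illustration of the tetrapeptide fragments that index the contingency table, the sketch below lists the fragments of a sequence. It assumes overlapping windows advanced one residue at a time, which the abstract does not state; the function name and the example sequence are hypothetical and are not part of the Contingency Table Browser.

```python
def tetrapeptides(sequence):
    """Return all overlapping 4-residue fragments of a one-letter amino acid sequence.

    Assumption: fragments overlap and the window advances by one residue.
    Sequences shorter than four residues yield an empty list.
    """
    return [sequence[i:i + 4] for i in range(len(sequence) - 3)]


# Hypothetical six-residue sequence, for illustration only.
fragments = tetrapeptides("ACDEFG")
print(fragments)
```

    Each fragment would then serve as a lookup key into a table of structural-motif likelihoods.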

  12. A novel fold recognition method using composite predicted secondary structures.

    PubMed

    An, Yuling; Friesner, Richard A

    2002-08-01

    In this work, we introduce a new method for fold recognition using composite secondary structures assembled from different secondary structure prediction servers for a given target sequence. An automatic, complete, and robust way of finding all possible combinations of predicted secondary structure segments (SSS) for the target sequence and clustering them into a few flexible clusters, each containing patterns with the same number of SSS, is developed. This program then takes two steps in choosing plausible homologues: (i) an SSS-based alignment excludes impossible templates whose SSS patterns are very different from any of those of the target; (ii) a residue-based alignment selects good structural templates based on sequence similarity and secondary structure similarity between the target and only those templates left in the first stage. The secondary structure of each residue in the target is selected from one of the predictions to find the best match with the template. Truncation is applied to a target where different predictions vary. In most cases, a target is also divided into N-terminal and C-terminal fragments, each of which is used as a separate subsequence. Our program was tested on the fold recognition targets from CASP3 with known PDB codes and some available targets from CASP4. The results are compared with a structural homologue list for each target produced by the CE program (Shindyalov and Bourne, Protein Eng 1998;11:739-747). The program successfully locates homologues with high Z-score and low root-mean-square deviation within the top 30-50 predictions in the overwhelming majority of cases. PMID:12112702

  13. Adaptive modelling of structured molecular representations for toxicity prediction

    NASA Astrophysics Data System (ADS)

    Bertinetto, Carlo; Duce, Celia; Micheli, Alessio; Solaro, Roberto; Tiné, Maria Rosaria

    2012-12-01

    We investigated the possibility of modelling structure-toxicity relationships by direct treatment of the molecular structure (without using descriptors) through an adaptive model able to retain the appropriate structural information. With respect to traditional descriptor-based approaches, this provides a more general and flexible way to tackle prediction problems that is particularly suitable when little or no background knowledge is available. Our method employs a tree-structured molecular representation, which is processed by a recursive neural network (RNN). To explore the realization of RNN modelling in toxicological problems, we employed a data set containing growth impairment concentrations (IGC50) for Tetrahymena pyriformis.

  14. Improving Predictions of Protein-Protein Interfaces by Combining Amino Acid-Specific Classifiers Based on Structural and Physicochemical Descriptors with Their Weighted Neighbor Averages

    PubMed Central

    de Moraes, Fábio R.; Neshich, Izabella A. P.; Mazoni, Ivan; Yano, Inácio H.; Pereira, José G. C.; Salim, José A.; Jardine, José G.; Neshich, Goran

    2014-01-01

    Protein-protein interactions are involved in nearly all regulatory processes in the cell and are considered one of the most important issues in molecular biology and pharmaceutical sciences but are still not fully understood. Structural and computational biology contributed greatly to the elucidation of the mechanism of protein interactions. In this paper, we present a collection of the physicochemical and structural characteristics that distinguish interface-forming residues (IFR) from free surface residues (FSR). We formulated a linear discriminative analysis (LDA) classifier to assess whether chosen descriptors from the BlueStar STING database (http://www.cbi.cnptia.embrapa.br/SMS/) are suitable for such a task. Receiver operating characteristic (ROC) analysis indicates that the particular physicochemical and structural descriptors used for building the linear classifier perform much better than a random classifier and in fact, successfully outperform some of the previously published procedures, whose performance indicators were recently compared by other research groups. The results presented here show that the selected set of descriptors can be utilized to predict IFRs, even when homologue proteins are missing (particularly important for orphan proteins where no homologue is available for comparative analysis/indication) or, when certain conformational changes accompany interface formation. The development of amino acid type specific classifiers is shown to increase IFR classification performance. Also, we found that the addition of an amino acid conservation attribute did not improve the classification prediction. This result indicates that the increase in predictive power associated with amino acid conservation is exhausted by adequate use of an extensive list of independent physicochemical and structural parameters that, by themselves, fully describe the nano-environment at protein-protein interfaces. The IFR classifier developed in this study is now

  15. Improving predictions of protein-protein interfaces by combining amino acid-specific classifiers based on structural and physicochemical descriptors with their weighted neighbor averages.

    PubMed

    de Moraes, Fábio R; Neshich, Izabella A P; Mazoni, Ivan; Yano, Inácio H; Pereira, José G C; Salim, José A; Jardine, José G; Neshich, Goran

    2014-01-01

    Protein-protein interactions are involved in nearly all regulatory processes in the cell and are considered one of the most important issues in molecular biology and pharmaceutical sciences but are still not fully understood. Structural and computational biology contributed greatly to the elucidation of the mechanism of protein interactions. In this paper, we present a collection of the physicochemical and structural characteristics that distinguish interface-forming residues (IFR) from free surface residues (FSR). We formulated a linear discriminative analysis (LDA) classifier to assess whether chosen descriptors from the BlueStar STING database (http://www.cbi.cnptia.embrapa.br/SMS/) are suitable for such a task. Receiver operating characteristic (ROC) analysis indicates that the particular physicochemical and structural descriptors used for building the linear classifier perform much better than a random classifier and in fact, successfully outperform some of the previously published procedures, whose performance indicators were recently compared by other research groups. The results presented here show that the selected set of descriptors can be utilized to predict IFRs, even when homologue proteins are missing (particularly important for orphan proteins where no homologue is available for comparative analysis/indication) or, when certain conformational changes accompany interface formation. The development of amino acid type specific classifiers is shown to increase IFR classification performance. Also, we found that the addition of an amino acid conservation attribute did not improve the classification prediction. This result indicates that the increase in predictive power associated with amino acid conservation is exhausted by adequate use of an extensive list of independent physicochemical and structural parameters that, by themselves, fully describe the nano-environment at protein-protein interfaces. The IFR classifier developed in this study is now

  16. PCI-SS: MISO dynamic nonlinear protein secondary structure prediction

    PubMed Central

    Green, James R; Korenberg, Michael J; Aboul-Magd, Mohammed O

    2009-01-01

    Background: Since the function of a protein is largely dictated by its three dimensional configuration, determining a protein's structure is of fundamental importance to biology. Here we report on a novel approach to determining the one dimensional secondary structure of proteins (distinguishing α-helices, β-strands, and non-regular structures) from primary sequence data which makes use of Parallel Cascade Identification (PCI), a powerful technique from the field of nonlinear system identification. Results: Using PSI-BLAST divergent evolutionary profiles as input data, dynamic nonlinear systems are built through a black-box approach to model the process of protein folding. Genetic algorithms (GAs) are applied in order to optimize the architectural parameters of the PCI models. The three-state prediction problem is broken down into a combination of three binary sub-problems and protein structure classifiers are built using 2 layers of PCI classifiers. Careful construction of the optimization, training, and test datasets ensures that no homology exists between any training and testing data. A detailed comparison between PCI and 9 contemporary methods is provided over a set of 125 new protein chains guaranteed to be dissimilar to all training data. Unlike other secondary structure prediction methods, here a web service is developed to provide both human- and machine-readable interfaces to PCI-based protein secondary structure prediction. This server, called PCI-SS, is available at . In addition to a dynamic PHP-generated web interface for humans, a Simple Object Access Protocol (SOAP) interface is added to permit invocation of the PCI-SS service remotely. This machine-readable interface facilitates incorporation of PCI-SS into multi-faceted systems biology analysis pipelines requiring protein secondary structure information, and greatly simplifies high-throughput analyses. XML is used to represent the input protein sequence data and also to encode the resulting

  17. Prediction of the structure of a novel amylopectin-based Cd-associated molecule in the stem of common reed grown in the presence of Cd.

    PubMed

    Higuchi, Kyoko; Ito, Naho; Nukada, Tomoo

    2016-10-01

    We previously found a novel Cd-associated molecule with an apparent molecular weight of 10-50 kDa in common reeds grown in the presence of Cd. The partial structure of this molecule was predicted by enzymatic digestion to release Cd from a trace amount that had been partially purified from the cell sap. The major component was branched α-glucan, whereas a peptide, β-1,4 glucan, and mannose were found as minor components. Uronic acids appeared to provide functional groups that bind Cd. PMID:27296512

  18. PredictProtein—an open resource for online prediction of protein structural and functional features

    PubMed Central

    Yachdav, Guy; Kloppmann, Edda; Kajan, Laszlo; Hecht, Maximilian; Goldberg, Tatyana; Hamp, Tobias; Hönigschmid, Peter; Schafferhans, Andrea; Roos, Manfred; Bernhofer, Michael; Richter, Lothar; Ashkenazy, Haim; Punta, Marco; Schlessinger, Avner; Bromberg, Yana; Schneider, Reinhard; Vriend, Gerrit; Sander, Chris; Ben-Tal, Nir; Rost, Burkhard

    2014-01-01

    PredictProtein is a meta-service for sequence analysis that has been predicting structural and functional features of proteins since 1992. Queried with a protein sequence it returns: multiple sequence alignments, predicted aspects of structure (secondary structure, solvent accessibility, transmembrane helices (TMSEG) and strands, coiled-coil regions, disulfide bonds and disordered regions) and function. The service incorporates analysis methods for the identification of functional regions (ConSurf), homology-based inference of Gene Ontology terms (metastudent), comprehensive subcellular localization prediction (LocTree3), protein–protein binding sites (ISIS2), protein–polynucleotide binding sites (SomeNA) and predictions of the effect of point mutations (non-synonymous SNPs) on protein function (SNAP2). Our goal has always been to develop a system optimized to meet the demands of experimentalists not highly experienced in bioinformatics. To this end, the PredictProtein results are presented as both text and a series of intuitive, interactive and visually appealing figures. The web server and sources are available at http://ppopen.rostlab.org. PMID:24799431

  19. On lattice protein structure prediction revisited.

    PubMed

    Dotu, Ivan; Cebrián, Manuel; Van Hentenryck, Pascal; Clote, Peter

    2011-01-01

    Protein structure prediction is regarded as a highly challenging problem both for the biology and for the computational communities. In recent years, many approaches have been developed, moving to increasingly complex lattice models and off-lattice models. This paper presents a Large Neighborhood Search (LNS) to find the native state for the Hydrophobic-Polar (HP) model on the Face-Centered Cubic (FCC) lattice or, in other words, a self-avoiding walk on the FCC lattice having a maximum number of H-H contacts. The algorithm starts with a tabu-search algorithm, whose solution is then improved by a combination of constraint programming and LNS. The flexible framework of this hybrid algorithm allows an adaptation to the Miyazawa-Jernigan contact potential, in place of the HP model, thus suggesting its potential for tertiary structure prediction. Benchmarking statistics are given for our method against the hydrophobic core threading program HPstruct, an exact method which can be viewed as complementary to our method. PMID:21358007
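
    The objective described above (a self-avoiding walk on the FCC lattice with as many H-H contacts as possible) can be sketched as a scoring function. This is only a minimal illustration of the HP objective, not the authors' tabu-search or LNS code; the function names are hypothetical, and the sketch assumes the common convention of representing FCC sites by the 12 offsets (±1, ±1, 0) and their permutations and of counting only contacts between residues that are not adjacent in the chain.

```python
from itertools import combinations

# The 12 nearest-neighbour offsets of the FCC lattice: exactly two coordinates are +/-1.
FCC_NEIGHBOURS = {
    (dx, dy, dz)
    for dx in (-1, 0, 1)
    for dy in (-1, 0, 1)
    for dz in (-1, 0, 1)
    if abs(dx) + abs(dy) + abs(dz) == 2
}


def is_neighbour(a, b):
    """True if lattice points a and b are nearest neighbours on the FCC lattice."""
    return (b[0] - a[0], b[1] - a[1], b[2] - a[2]) in FCC_NEIGHBOURS


def is_valid_walk(coords):
    """True if the walk is self-avoiding and consecutive residues are lattice neighbours."""
    if len(set(coords)) != len(coords):
        return False
    return all(is_neighbour(a, b) for a, b in zip(coords, coords[1:]))


def hh_contacts(sequence, coords):
    """Count H-H contacts between residues that are not consecutive in the chain."""
    if len(sequence) != len(coords) or not is_valid_walk(coords):
        raise ValueError("coords must be a valid self-avoiding FCC walk matching the sequence")
    return sum(
        1
        for i, j in combinations(range(len(coords)), 2)
        if j - i > 1
        and sequence[i] == "H"
        and sequence[j] == "H"
        and is_neighbour(coords[i], coords[j])
    )


# Four residues folded into a tetrahedron: every non-consecutive pair is in contact.
walk = [(0, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1)]
print(hh_contacts("HHHH", walk))
```

    A search method such as the one in the abstract would try to maximize this count over all valid walks; the sketch only evaluates a given conformation.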

  20. Accurate Prediction of Docked Protein Structure Similarity.

    PubMed

    Akbal-Delibas, Bahar; Pomplun, Marc; Haspel, Nurit

    2015-09-01

    One of the major challenges for protein-protein docking methods is to accurately discriminate native-like structures. The protein docking community agrees on the existence of a relationship between various favorable intermolecular interactions (e.g. Van der Waals, electrostatic, desolvation forces, etc.) and the similarity of a conformation to its native structure. Different docking algorithms often formulate this relationship as a weighted sum of selected terms and calibrate their weights against specific training data to evaluate and rank candidate structures. However, the exact form of this relationship is unknown and the accuracy of such methods is impaired by the pervasiveness of false positives. Unlike the conventional scoring functions, we propose a novel machine learning approach that not only ranks the candidate structures relative to each other but also indicates how similar each candidate is to the native conformation. We trained the AccuRMSD neural network with an extensive dataset using the back-propagation learning algorithm. Our method predicted the RMSDs of unbound docked complexes with a 0.4 Å error margin. PMID:26335807
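
    For readers unfamiliar with the quantity being predicted, the sketch below computes a plain coordinate RMSD between two equally sized sets of atom positions. It is only an illustration of the RMSD definition: the abstract does not say which atoms are compared or whether the structures are superimposed first, so the function (a hypothetical name) assumes the coordinates are already in a common frame and are listed in matching order.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z) positions.

    Assumption: both structures are already in the same reference frame,
    so no superposition is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical two-atom structures: the second is the first shifted by 2 units along z.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
candidate = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(rmsd(native, candidate))
```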

  1. THE FUTURE OF COMPUTER-BASED TOXICITY PREDICTION: MECHANISM-BASED MODELS VS. INFORMATION MINING APPROACHES

    EPA Science Inventory


    When we speak of computer-based toxicity prediction, we are generally referring to a broad array of approaches which rely primarily upon chemical structure ...

  2. Predicting loop–helix tertiary structural contacts in RNA pseudoknots

    PubMed Central

    Cao, Song; Giedroc, David P.; Chen, Shi-Jie

    2010-01-01

    Tertiary interactions between loops and helical stems play critical roles in the biological function of many RNA pseudoknots. However, quantitative predictions for RNA tertiary interactions remain elusive. Here we report a statistical mechanical model for the prediction of noncanonical loop–stem base-pairing interactions in RNA pseudoknots. Central to the model is the evaluation of the conformational entropy for the pseudoknotted folds with defined loop–stem tertiary structural contacts. We develop an RNA virtual bond-based conformational model (Vfold model), which permits a rigorous computation of the conformational entropy for a given fold that contains loop–stem tertiary contacts. With the entropy parameters predicted from the Vfold model and the energy parameters for the tertiary contacts as inserted parameters, we can then predict the RNA folding thermodynamics, from which we can extract the tertiary contact thermodynamic parameters from theory–experimental comparisons. These comparisons reveal a contact enthalpy (ΔH) of −14 kcal/mol and a contact entropy (ΔS) of −38 cal/mol/K for a protonated C+•(G–C) base triple at pH 7.0, and (ΔH = −7 kcal/mol, ΔS = −19 cal/mol/K) for an unprotonated base triple. Tests of the model for a series of pseudoknots show good theory–experiment agreement. Based on the extracted energy parameters for the tertiary structural contacts, the model enables predictions for the structure, stability, and folding pathways for RNA pseudoknots with known or postulated loop–stem tertiary contacts from the nucleotide sequence alone. PMID:20100813
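
    To give a feel for the magnitude of the contact parameters quoted above, the sketch below combines them through the standard relation ΔG = ΔH − TΔS. The temperature of 37 °C (310.15 K) is an assumption made for illustration and is not stated in the abstract; the function and variable names are hypothetical.

```python
CAL_PER_KCAL = 1000.0


def contact_free_energy(delta_h_kcal, delta_s_cal_per_k, temperature_k):
    """Return dG in kcal/mol from dH (kcal/mol), dS (cal/mol/K) and T (K)."""
    return delta_h_kcal - temperature_k * delta_s_cal_per_k / CAL_PER_KCAL


T_ASSUMED = 310.15  # K (37 degrees C); an assumption, not a value from the abstract

# Parameters quoted in the abstract for the protonated and unprotonated base triples.
protonated = contact_free_energy(-14.0, -38.0, T_ASSUMED)
unprotonated = contact_free_energy(-7.0, -19.0, T_ASSUMED)

print(f"protonated:   {protonated:.2f} kcal/mol")    # about -2.21
print(f"unprotonated: {unprotonated:.2f} kcal/mol")  # about -1.11
```

    Under this assumed temperature the large enthalpy and entropy terms nearly cancel, leaving a net stabilization of only a few kcal/mol per contact.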

  3. Structure prediction of magnetosome-associated proteins.

    PubMed

    Nudelman, Hila; Zarivach, Raz

    2014-01-01

    Magnetotactic bacteria (MTB) are Gram-negative bacteria that can navigate along geomagnetic fields. This ability is a result of a unique intracellular organelle, the magnetosome. These organelles are composed of membrane-enclosed magnetite (Fe3O4) or greigite (Fe3S4) crystals ordered into chains along the cell. Magnetosome formation, assembly, and magnetic nano-crystal biomineralization are controlled by magnetosome-associated proteins (MAPs). Most MAP-encoding genes are located in a conserved genomic region - the magnetosome island (MAI). The MAI appears to be conserved in all MTB that have been analyzed so far, although the MAI size and organization differ between species. It was shown that MAI deletion leads to a non-magnetic phenotype, further highlighting its important role in magnetosome formation. Today, about 28 proteins are known to be involved in magnetosome formation, but the structures and functions of most MAPs are unknown. To reveal the structure-function relationship of MAPs, we used bioinformatics tools to build homology models as a way to understand their possible role in magnetosome formation. Here we present an overview of predicted 3D structural models for all known Magnetospirillum gryphiswaldense strain MSR-1 MAPs. PMID:24523717

  4. Evaluation, analysis and prediction of geologic structures

    NASA Astrophysics Data System (ADS)

    Woodward, Nicholas B.

    2012-08-01

    Balanced cross-sections claim to be better because they apply a rigorous set of rules to develop the conceptual model of the structures present in an area. Balanced cross-sections can be further improved and become more useful to understanding real physical problems by collection of additional data such as seismic reflection surveys, collection of additional stratigraphic data, or collection of rock fabric information. The additional information validates the initial model and provides details on deformation conditions and on local rock responses to the deformation. Although individual cross-sections are two dimensional, the objective of evaluation and analysis of deformed regions should be three dimensional whenever possible to recognize the challenges of the real world. Subsurface system analysis derived from the hydrologic community emphasizes conceptual model development through model verification, validation, uncertainty quantification, benchmarking and meta-analysis. Their approach includes many steps informally used by the structural geology community but in a much more explicit way. Newer geological applications of structural geology would benefit from this more rigorous approach for designing and doing performance predictions as technological needs become more socially sensitive such as for carbon storage sites, new areas of energy exploration in higher population density areas, or for nuclear waste storage facilities.

  5. A wave based method to predict the absorption, reflection and transmission coefficient of two-dimensional rigid frame porous structures with periodic inclusions

    NASA Astrophysics Data System (ADS)

    Deckers, Elke; Claeys, Claus; Atak, Onur; Groby, Jean-Philippe; Dazel, Olivier; Desmet, Wim

    2016-05-01

    This paper presents an extension to the Wave Based Method to predict the absorption, reflection and transmission coefficients of a porous material with an embedded periodic set of inclusions. The porous unit cell is described using the Multi-Level methodology and by embedding Bloch-Floquet periodicity conditions in the weighted residual scheme. The dynamic pressure field in the semi-infinite acoustic domains is approximated using a novel wave function set that fulfils the Helmholtz equation, the Bloch-Floquet periodicity conditions and the Sommerfeld radiation condition. The method is meshless and computationally efficient, which makes it well suited for optimisation studies.

  6. Predicting electrical measurements by applying scatterometry to complex spacer structures

    NASA Astrophysics Data System (ADS)

    Sendelbach, Matthew; Ayala, Javier; Herrera, Pedro

    2007-03-01

    The comparison of scatterometry measurements of complex spacer structures to electrical test measurements is discussed. Details of the NFET and PFET structures are presented, along with a summary of the scatterometry models used to represent the structures. Before comparison data are shown, a methodology and set of metrics are presented that assist in the analysis and interpretation of comparison data. The methodology, called Prediction Analysis, has its roots in TMU analysis, where both measurements are subject to error. But in Prediction Analysis, an "apples-to-apples" comparison of the measurements is not the goal, and the measurements may be reported in different units. The goal of Prediction Analysis is to analyze the components of error in a correlation and use this analysis to predict a measurement based on the knowledge of another measurement, such that the predicted measurement is bounded. This method is used in this work to determine how well scatterometry measurements of certain parameters correlate to electrical measurements of gate resistance, gate Lpoly, and transistor current Ion. Clear correlations are demonstrated, and physical explanations for these correlations are presented. Because of these correlations, the scatterometry measurements can be used as a predictor of electrical performance significantly before the electrical test occurs. As a result, scatterometry can be a reliable measurement technique for improving spacer controls and reducing the mean time to detect (MTTD) some profile abnormalities.

  7. Optimizing nondecomposable loss functions in structured prediction.

    PubMed

    Ranjbar, Mani; Lan, Tian; Wang, Yang; Robinovitch, Steven N; Li, Ze-Nian; Mori, Greg

    2013-04-01

    We develop an algorithm for structured prediction with nondecomposable performance measures. The algorithm learns parameters of Markov Random Fields (MRFs) and can be applied to multivariate performance measures. Examples include performance measures such as Fβ score (natural language processing), intersection over union (object category segmentation), Precision/Recall at k (search engines), and ROC area (binary classifiers). We attack this optimization problem by approximating the loss function with a piecewise linear function. The loss augmented inference forms a Quadratic Program (QP), which we solve using LP relaxation. We apply this approach to two tasks: object class-specific segmentation and human action retrieval from videos. We show significant improvement over baseline approaches that either use simple loss functions or simple scoring functions on the PASCAL VOC and H3D Segmentation datasets, and a nursing home action recognition dataset. PMID:22868650
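
    Two of the performance measures named above can be written down in a few lines, which also shows why they are called nondecomposable: the score of a whole set is not the average of scores over its parts. The sketch below is only an illustration of the measures themselves, not of the authors' learning algorithm; the function names are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def f_beta(y_true, y_pred, beta=1.0):
    """F-beta score for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0  # 0.0 on empty denominator: a convention


def intersection_over_union(mask_a, mask_b):
    """IoU of two segmentations given as sets of pixel indices."""
    union = mask_a | mask_b
    return len(mask_a & mask_b) / len(union) if union else 0.0


y_true = [1, 1, 1, 0]
y_pred = [1, 0, 0, 0]

whole = f_beta(y_true, y_pred)
# Scoring the two halves separately and averaging gives a different number.
halves = (f_beta(y_true[:2], y_pred[:2]) + f_beta(y_true[2:], y_pred[2:])) / 2

print(whole, halves)
print(intersection_over_union({1, 2, 3}, {2, 3, 4}))
```

    Because the measure couples all predictions through the shared counts of true positives, false positives and false negatives, the loss cannot be split into independent per-example terms, which is the difficulty the abstract addresses.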

  8. Data-directed RNA secondary structure prediction using probabilistic modeling.

    PubMed

    Deng, Fei; Ledda, Mirko; Vaziri, Sana; Aviran, Sharon

    2016-08-01

    Structure dictates the function of many RNAs, but secondary RNA structure analysis is either labor intensive and costly or relies on computational predictions that are often inaccurate. These limitations are alleviated by integration of structure probing data into prediction algorithms. However, existing algorithms are optimized for a specific type of probing data. Recently, new chemistries combined with advances in sequencing have facilitated structure probing at unprecedented scale and sensitivity. These novel technologies and anticipated wealth of data highlight a need for algorithms that readily accommodate more complex and diverse input sources. We implemented and investigated a recently outlined probabilistic framework for RNA secondary structure prediction and extended it to accommodate further refinement of structural information. This framework utilizes direct likelihood-based calculations of pseudo-energy terms per considered structural context and can readily accommodate diverse data types and complex data dependencies. We use real data in conjunction with simulations to evaluate performances of several implementations and to show that proper integration of structural contexts can lead to improvements. Our tests also reveal discrepancies between real data and simulations, which we show can be alleviated by refined modeling. We then propose statistical preprocessing approaches to standardize data interpretation and integration into such a generic framework. We further systematically quantify the information content of data subsets, demonstrating that high reactivities are major drivers of SHAPE-directed predictions and that better understanding of less informative reactivities is key to further improvements. Finally, we provide evidence for the adaptive capability of our framework using mock probe simulations. PMID:27251549

  9. Ab Initio Prediction of Transcription Factor Targets Using Structural Knowledge

    PubMed Central

    Kaplan, Tommy; Friedman, Nir; Margalit, Hanah

    2005-01-01

    Current approaches for identification and detection of transcription factor binding sites rely on an extensive set of known target genes. Here we describe a novel structure-based approach applicable to transcription factors with no prior binding data. Our approach combines sequence data and structural information to infer context-specific amino acid–nucleotide recognition preferences. These are used to predict binding sites for novel transcription factors from the same structural family. We demonstrate our approach on the Cys2His2 Zinc Finger protein family, and show that the learned DNA-recognition preferences are compatible with experimental results. We use these preferences to perform a genome-wide scan for direct targets of Drosophila melanogaster Cys2His2 transcription factors. By analyzing the predicted targets along with gene annotation and expression data we infer the function and activity of these proteins. PMID:16103898

  10. Factor Structure of Self-Regulation in Preschoolers: Testing Models of a Field-Based Assessment for Predicting Early School Readiness

    PubMed Central

    Denham, Susanne A; Warren-Khot, Heather K.; Bassett, Hideko Hamada; Wyatt, Todd; Perna, Alyssa

    2011-01-01

    The importance of early self-regulatory skill has seen increased focus in the applied research literature, given the implications of these skills for early school success. A three-factor latent structure of self-regulation, consisting of compliance, cool executive control, and hot executive control, was tested against alternative models and retained as best fitting. Tests of model equivalence indicated the model held invariant across Head Start and private child care samples. Partial invariance was supported for age and gender. In the validity model, because of the substantial amount of shared variance among latent factors, we included a second-order factor explaining the two types of executive control. Higher-Order Executive Control positively predicted teacher report of learning behaviors and social competence in the classroom. These findings are discussed in light of their practical and theoretical significance. PMID:22104321

  11. Predictive modeling of neuroanatomic structures for brain atrophy detection

    NASA Astrophysics Data System (ADS)

    Hu, Xintao; Guo, Lei; Nie, Jingxin; Li, Kaiming; Liu, Tianming

    2010-03-01

    In this paper, we present a predictive modeling approach for neuroanatomic structures that detects brain atrophy from cross-sectional MRI images. The underlying premise of applying predictive modeling to atrophy detection is that brain atrophy is defined as a significant deviation of part of the anatomy from what the remaining normal anatomy predicts for that part. The steps of predictive modeling are as follows. The central cortical surface under consideration is reconstructed from a brain tissue map, and regions of interest (ROIs) on it are predicted from other reliable anatomies. The vertex pair-wise distance between the predicted vertex and the true one within the abnormal region is expected to be larger than that of a vertex in a normal brain region. The change in white matter/gray matter ratio within a spherical region is used to identify the direction of vertex displacement. In this way, the severity of brain atrophy can be defined quantitatively by the displacements of those vertices. The proposed predictive modeling method has been evaluated using both simulated atrophies and MRI images of Alzheimer's disease.

  12. Structure prediction of magnetosome-associated proteins

    PubMed Central

    Nudelman, Hila; Zarivach, Raz

    2014-01-01

    Magnetotactic bacteria (MTB) are Gram-negative bacteria that can navigate along geomagnetic fields. This ability is a result of a unique intracellular organelle, the magnetosome. These organelles are composed of membrane-enclosed magnetite (Fe3O4) or greigite (Fe3S4) crystals ordered into chains along the cell. Magnetosome formation, assembly, and magnetic nano-crystal biomineralization are controlled by magnetosome-associated proteins (MAPs). Most MAP-encoding genes are located in a conserved genomic region – the magnetosome island (MAI). The MAI appears to be conserved in all MTB that have been analyzed so far, although the MAI size and organization differ between species. It was shown that MAI deletion leads to a non-magnetic phenotype, further highlighting its important role in magnetosome formation. Today, about 28 proteins are known to be involved in magnetosome formation, but the structures and functions of most MAPs are unknown. To reveal the structure–function relationship of MAPs, we used bioinformatics tools to build homology models as a way to understand their possible role in magnetosome formation. Here we present an overview of predicted 3D structural models for all known Magnetospirillum gryphiswaldense strain MSR-1 MAPs. PMID:24523717

  13. Predicting inclusion behaviour and framework structures in organic crystals.

    PubMed

    Cruz-Cabeza, Aurora J; Day, Graeme M; Jones, William

    2009-12-01

    We have used well-established computational methods to generate and explore the crystal structure landscapes of four organic molecules of well-known inclusion behaviour. Using these methods, we are able to generate both close-packed crystal structures and high-energy open frameworks containing voids of molecular dimensions. Some of these high-energy open frameworks correspond to real structures observed experimentally when the appropriate guest molecules are present during crystallisation. We propose a combination of crystal structure prediction methodologies with structure rankings based on relative lattice energy and solvent-accessible volume as a way of selecting likely inclusion frameworks completely ab initio. This methodology can be used as part of a rational strategy in the design of inclusion compounds, and also for the anticipation of inclusion behaviour in organic molecules. PMID:19876969

  14. Synthesis, characterization, crystal structure and predicting the second-order optical nonlinearity of a new dicobalt(III) complex with Schiff base ligand

    NASA Astrophysics Data System (ADS)

    Zarei, Seyed Amir; Piltan, Mohammad; Hassanzadeh, Keyumars; Akhtari, Keivan; Cinčić, Dominik

    2015-03-01

    The synthesis and characterization of the dicobalt(III) complex [Co2L2(OMe)2] of the tetradentate Schiff base ligand N,N′-bis(2-hydroxybenzylidene)-2,2-dimethyl-1,3-propanediamine (H2L) are reported. The crystal structure of the complex has been determined and exhibits pseudo-octahedral geometry around both cobalt(III) ions. In the complexation process, H2L acts as a doubly negatively charged tetradentate ligand, L2-, and the methoxy group acts as a bridging ligand. The geometry of the complex is optimized by density functional theory (DFT) using B3LYP/6-311G(d,p). The calculated geometric parameters are in good agreement with the corresponding experimental data. The second-order nonlinear optical (NLO) property of the complex is evaluated by DFT/B3LYP/6-311G(d,p) on the basis of the optimized structure and shows an enhancement relative to the calculated value for H2L. The calculated NLO value of the complex is much greater than the corresponding value of urea.

  15. Structural imaging biomarkers of Alzheimer's disease: predicting disease progression.

    PubMed

    Eskildsen, Simon F; Coupé, Pierrick; Fonov, Vladimir S; Pruessner, Jens C; Collins, D Louis

    2015-01-01

    Optimized magnetic resonance imaging (MRI)-based biomarkers of Alzheimer's disease (AD) may allow earlier detection and refined prediction of the disease. In addition, they could serve as valuable tools when designing therapeutic studies of individuals at risk of AD. In this study, we combine (1) a novel method for grading medial temporal lobe structures with (2) robust cortical thickness measurements to predict AD among subjects with mild cognitive impairment (MCI) from a single T1-weighted MRI scan. Using AD and cognitively normal individuals, we generate a set of features potentially discriminating between MCI subjects who convert to AD and those who remain stable over a period of 3 years. Using mutual information-based feature selection, we identify 5 key features optimizing the classification of MCI converters. These features are the left and right hippocampi gradings and cortical thicknesses of the left precuneus, left superior temporal sulcus, and right anterior part of the parahippocampal gyrus. We show that these features are highly stable in cross-validation and enable a prediction accuracy of 72% using a simple linear discriminant classifier, the highest prediction accuracy obtained on the baseline Alzheimer's Disease Neuroimaging Initiative first phase cohort to date. The proposed structural features are consistent with Braak stages and previously reported atrophic patterns in AD and are easy to transfer to new cohorts and to clinical practice. PMID:25260851

  16. PREDICTING TOXICOLOGICAL ENDPOINTS OF CHEMICALS USING QUANTITATIVE STRUCTURE-ACTIVITY RELATIONSHIPS (QSARS)

    EPA Science Inventory

    Quantitative structure-activity relationships (QSARs) are being developed to predict the toxicological endpoints for untested chemicals similar in structure to chemicals that have known experimental toxicological data. Based on a very large number of predetermined descriptors, a...

  17. Gogny HFB prediction of nuclear structure properties

    SciTech Connect

    Goriely, S.; Hilaire, S.; Girod, M.

    2011-10-28

    Large scale mean field calculations from proton to neutron drip lines have been performed using the Hartree-Fock-Bogoliubov method based on the Gogny nucleon-nucleon effective interaction. This extensive study has shown the ability of the method to reproduce bulk nuclear structure data available experimentally. This includes nuclear masses, radii, matter densities, deformations, and moments of inertia, as well as collective modes (low energy and giant resonances). In particular, the first mass table based on a Gogny-Hartree-Fock-Bogoliubov calculation including an explicit and coherent account of all the quadrupole correlation energies is presented. The rms deviation with respect to essentially all the available mass data is 798 keV. Nearly 8000 nuclei have been studied under the axial symmetry hypothesis and going beyond the mean-field approach.

  18. Structural model of ρ1 GABAC receptor based on evolutionary analysis: Testing of predicted protein–protein interactions involved in receptor assembly and function

    PubMed Central

    Adamian, Larisa; Gussin, Hélène A; Tseng, Yan Yuan; Muni, Niraj J; Feng, Feng; Qian, Haohua; Pepperberg, David R; Liang, Jie

    2009-01-01

    The homopentameric ρ1 GABAC receptor is a ligand-gated ion channel with a binding pocket for γ-aminobutyric acid (GABA) at the interfaces of N-terminal extracellular domains. We combined evolutionary analysis, structural modeling, and experimental testing to study determinants of GABAC receptor assembly and channel gating. We estimated the posterior probability of selection pressure at amino acid residue sites measured as ω-values and built a comparative structural model, which identified several polar residues under strong selection pressure at the subunit interfaces that may form intersubunit hydrogen bonds or salt bridges. At three selected sites (R111, T151, and E55), mutations disrupting intersubunit interactions had strong effects on receptor folding, assembly, and function. We next examined the role of a predicted intersubunit salt bridge for residue pair R158–D204. The mutant R158D, where the positively charged residue is replaced by a negatively charged aspartate, yielded a partially degraded receptor and lacked membrane surface expression. The membrane surface expression was rescued by the double mutant R158D–D204R, where positive and negative charges are switched, although the mutant receptor was inactive. The single mutants R158A, D204R, and D204A exhibited diminished activities and altered kinetic profiles with fast recovery kinetics, suggesting that the R158–D204 salt bridge may stabilize the open state of the GABAC receptor. Our results emphasize the functional importance of highly conserved polar residues at the protein–protein interfaces in GABAC ρ1 receptors and demonstrate how the integration of computational and experimental approaches can aid discovery of functionally important interactions. PMID:19768800

  19. RNAex: an RNA secondary structure prediction server enhanced by high-throughput structure-probing data.

    PubMed

    Wu, Yang; Qu, Rihao; Huang, Yiming; Shi, Binbin; Liu, Mengrong; Li, Yang; Lu, Zhi John

    2016-07-01

    Several high-throughput technologies have been developed to probe RNA base pairs and loops at the transcriptome level in multiple species. However, to obtain the final RNA secondary structure, extensive effort and considerable expertise is required to statistically process the probing data and combine them with free energy models. Therefore, we developed an RNA secondary structure prediction server that is enhanced by experimental data (RNAex). RNAex is a web interface that enables non-specialists to easily access cutting-edge structure-probing data and predict RNA secondary structures enhanced by in vivo and in vitro data. RNAex annotates the RNA editing, RNA modification and SNP sites on the predicted structures. It provides four structure-folding methods, restrained MaxExpect, SeqFold, RNAstructure (Fold) and RNAfold that can be selected by the user. The performance of these four folding methods has been verified by previous publications on known structures. We re-mapped the raw sequencing data of the probing experiments to the whole genome for each species. RNAex thus enables users to predict secondary structures for both known and novel RNA transcripts in human, mouse, yeast and Arabidopsis. The RNAex web server is available at http://RNAex.ncrnalab.org/. PMID:27137891

  20. RNAex: an RNA secondary structure prediction server enhanced by high-throughput structure-probing data

    PubMed Central

    Wu, Yang; Qu, Rihao; Huang, Yiming; Shi, Binbin; Liu, Mengrong; Li, Yang; Lu, Zhi John

    2016-01-01

    Several high-throughput technologies have been developed to probe RNA base pairs and loops at the transcriptome level in multiple species. However, to obtain the final RNA secondary structure, extensive effort and considerable expertise is required to statistically process the probing data and combine them with free energy models. Therefore, we developed an RNA secondary structure prediction server that is enhanced by experimental data (RNAex). RNAex is a web interface that enables non-specialists to easily access cutting-edge structure-probing data and predict RNA secondary structures enhanced by in vivo and in vitro data. RNAex annotates the RNA editing, RNA modification and SNP sites on the predicted structures. It provides four structure-folding methods, restrained MaxExpect, SeqFold, RNAstructure (Fold) and RNAfold that can be selected by the user. The performance of these four folding methods has been verified by previous publications on known structures. We re-mapped the raw sequencing data of the probing experiments to the whole genome for each species. RNAex thus enables users to predict secondary structures for both known and novel RNA transcripts in human, mouse, yeast and Arabidopsis. The RNAex web server is available at http://RNAex.ncrnalab.org/. PMID:27137891

  1. A Hybrid Loss for Multiclass and Structured Prediction.

    PubMed

    Shi, Qinfeng; Reid, Mark; Caetano, Tiberio; van den Hengel, Anton; Wang, Zhenhua

    2015-01-01

    We propose a novel hybrid loss for multiclass and structured prediction problems that is a convex combination of a log loss for Conditional Random Fields (CRFs) and a multiclass hinge loss for Support Vector Machines (SVMs). We provide a sufficient condition for when the hybrid loss is Fisher consistent for classification. This condition depends on a measure of dominance between labels, specifically the gap between the probabilities of the best label and the second best label. We also prove Fisher consistency is necessary for parametric consistency when learning models such as CRFs. We demonstrate empirically that the hybrid loss typically performs at least as well as, and often better than, both of its constituent losses on a variety of tasks, such as human action recognition. In doing so we also provide an empirical comparison of the efficacy of probabilistic and margin based approaches to multiclass and structured prediction. PMID:26353204
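
    The convex combination described in this abstract can be illustrated with a minimal Python sketch. It assumes raw per-class scores, a softmax log loss, and a Crammer-Singer style multiclass hinge loss; the function names and the mixing weight `alpha` are illustrative choices rather than the authors' notation, and the paper's exact formulation may differ.

```python
import math


def log_loss(scores, y):
    """Negative log-probability of label y under a softmax over the scores."""
    m = max(scores)  # subtract the maximum for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[y]


def hinge_loss(scores, y):
    """Multiclass hinge loss: penalise a margin below 1 over the best rival label."""
    rival = max(s for k, s in enumerate(scores) if k != y)
    return max(0.0, 1.0 + rival - scores[y])


def hybrid_loss(scores, y, alpha):
    """Convex combination: alpha * log loss + (1 - alpha) * hinge loss."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * log_loss(scores, y) + (1.0 - alpha) * hinge_loss(scores, y)
```

    With `alpha = 1` the sketch reduces to the log loss, and with `alpha = 0` it reduces to the hinge loss.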

  2. Prediction of the structure of symmetrical protein assemblies

    PubMed Central

    André, Ingemar; Bradley, Philip; Wang, Chu; Baker, David

    2007-01-01

    Biological supramolecular systems are commonly built up by the self-assembly of identical protein subunits to produce symmetrical oligomers with cyclical, icosahedral, or helical symmetry that play roles in processes ranging from allosteric control and molecular transport to motor action. The large size of these systems often makes them difficult to structurally characterize using experimental techniques. We have developed a computational protocol to predict the structure of symmetrical protein assemblies based on the structure of a single subunit. The method carries out simultaneous optimization of backbone, side chain, and rigid-body degrees of freedom, while restricting the search space to symmetrical conformations. Using this protocol, we can reconstruct, starting from the structure of a single subunit, the structure of cyclic oligomers and the icosahedral virus capsid of satellite panicum virus using a rigid backbone approximation. We predict the oligomeric state of EscJ from the type III secretion system both in its proposed cyclical and crystallized helical form. Finally, we show that the method can recapitulate the structure of an amyloid-like fibril formed by the peptide NNQQNY from the yeast prion protein Sup35 starting from the amino acid sequence alone and searching the complete space of backbone, side chain, and rigid-body degrees of freedom. PMID:17978193

  3. Generalized Pattern Search Algorithm for Peptide Structure Prediction

    PubMed Central

    Nicosia, Giuseppe; Stracquadanio, Giovanni

    2008-01-01

    Finding the near-native structure of a protein is one of the most important open problems in structural biology and biological physics. The problem becomes dramatically more difficult when a given protein has no regular secondary structure or it does not show a fold similar to structures already known. This situation occurs frequently when we need to predict the tertiary structure of small molecules, called peptides. In this research work, we propose a new ab initio algorithm, the generalized pattern search algorithm, based on the well-known class of Search-and-Poll algorithms. We performed an extensive set of simulations over a well-known set of 44 peptides to investigate the robustness and reliability of the proposed algorithm, and we compared the peptide conformation with a state-of-the-art algorithm for peptide structure prediction known as PEPstr. In particular, we tested the algorithm on the instances proposed by the originators of PEPstr, to validate the proposed algorithm; the experimental results confirm that the generalized pattern search algorithm outperforms PEPstr by 21.17% in terms of average root mean-square deviation, RMSD Cα. PMID:18487293
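
    For readers unfamiliar with the metric quoted above, the following Python sketch computes a plain coordinate RMSD. It assumes the two structures list the same atoms (for example Cα atoms) in the same order and have already been superimposed; the function name `rmsd` is hypothetical, and no optimal alignment step is performed.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean-square deviation between two equally long lists of (x, y, z) points."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared Euclidean distances between matched atoms.
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))
```

    A lower value means the predicted conformation lies closer to the reference structure.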

  4. A tool for the prediction of structures of complex sugars.

    PubMed

    Xia, Junchao; Margulis, Claudio

    2008-12-01

    In two recent back-to-back articles (Xia et al., J Chem Theory Comput 3:1620-1628 and 1629-1643, 2007a, b) we have started to address the problem of complex oligosaccharide conformation and folding. The scheme previously presented was based on exhaustive searches in configuration space in conjunction with Nuclear Overhauser Effect (NOE) calculations and the use of a complex rotameric library that takes branching into account. NOEs are extremely useful for structural determination but only provide information about short range interactions and ordering. Instead, the measurement of residual dipolar couplings (RDC) yields information about molecular ordering or folding that is long range in nature. In this article we show the results obtained by incorporating RDC calculations into our prediction scheme. Using this new approach we are able to accurately predict the structure of six human milk sugars: LNF-1, LND-1, LNF-2, LNF-3, LNnT and LNT. Our exhaustive search in dihedral configuration space combined with RDC and NOE calculations allows for highly accurate structural predictions that, because of the non-ergodic nature of these molecules on a time scale compatible with molecular dynamics simulations, are extremely hard to obtain otherwise (Almond et al., Biochemistry 43:5853-5863, 2004). Molecular dynamics simulations in explicit solvent using as initial configurations the structures predicted by our algorithm show that the histo-blood group epitopes in these sugars are relatively rigid and that the whole family of oligosaccharides derives its conformational variability almost exclusively from their common linkage (β-D-GlcNAc-(1→3)-β-D-Gal), which can exist in two distinct conformational states. A population analysis based on the conformational variability of this flexible glycosidic link indicates that the relative population of the two distinct states varies for different human milk oligosaccharides. PMID:18953494

  5. PREDICTING RNA STRUCTURE BY MULTIPLE TEMPLATE HOMOLOGY MODELING

    PubMed Central

    FLORES, SAMUEL C.; WAN, YAQI; RUSSELL, RICK; ALTMAN, RUSS B.

    2010-01-01

    Despite the importance of 3D structure for understanding the myriad functions of RNAs in cells, most RNA molecules remain out of reach of crystallographic and NMR methods. However, certain structural information such as base pairing and some tertiary contacts can be determined readily for many RNAs by bioinformatics or relatively low cost experiments. Further, because RNA structure is highly modular, it is possible to deduce local 3D structure from the solved structures of evolutionarily related RNAs or even unrelated RNAs that share the same module. RNABuilder is a software package that generates model RNA structures by treating the kinematics and forces at separate, multiple levels of resolution. Kinematically, bonds in bases, certain stretches of residues, and some entire molecules are rigid while other bonds remain flexible. Forces act on the rigid bases and selected individual atoms. Here we use RNABuilder to predict the structure of the 200-nucleotide Azoarcus group I intron by homology modeling against fragments of the distantly-related Twort and Tetrahymena group I introns and by incorporating base pairing forces where necessary. In the absence of any information from the solved Azoarcus intron crystal structure, the model accurately depicts the global topology, secondary and tertiary connections, and gives an overall RMSD value of 4.6 Å relative to the crystal structure. The accuracy of the model is even higher in the intron core (RMSD = 3.5 Å), whereas deviations are modestly larger for peripheral regions that differ more substantially between the different introns. These results lay the groundwork for using this approach for larger and more diverse group I introns, as well as for still larger RNAs and RNA-protein complexes such as group II introns and the ribosomal subunits. PMID:19908374

  6. Predicting fracture in micron-scale polycrystalline silicon MEMS structures.

    SciTech Connect

    Hazra, Siddharth S.; de Boer, Maarten Pieter; Boyce, Brad Lee; Ohlhausen, James Anthony; Foulk, James W., III; Reedy, Earl David, Jr.

    2010-09-01

    Designing reliable MEMS structures presents numerous challenges. Polycrystalline silicon fractures in a brittle manner with considerable variability in measured strength. Furthermore, it is not clear how to use a measured tensile strength distribution to predict the strength of a complex MEMS structure. To address such issues, two recently developed high throughput MEMS tensile test techniques have been used to measure strength distribution tails. The measured tensile strength distributions enable the definition of a threshold strength as well as an inferred maximum flaw size. The nature of strength-controlling flaws has been identified and sources of the observed variation in strength investigated. A double edge-notched specimen geometry was also tested to study the effect of a severe, micron-scale stress concentration on the measured strength distribution. Strength-based, Weibull-based, and fracture mechanics-based failure analyses were performed and compared with the experimental results.
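
    One common way to express a strength distribution with a threshold is the three-parameter Weibull form, P_f(σ) = 1 - exp(-((σ - σ_u)/σ_0)^m) for σ > σ_u and P_f = 0 otherwise. The Python sketch below evaluates that expression. It is an assumption that this form matches the report's Weibull-based analysis, and the parameter names and the numeric values in the usage note are hypothetical, not data from the study.

```python
import math


def weibull_failure_probability(stress, threshold, scale, modulus):
    """Three-parameter Weibull failure probability at a given applied stress.

    threshold: stress below which no failure is predicted (sigma_u)
    scale:     characteristic strength above the threshold (sigma_0)
    modulus:   Weibull modulus m controlling the spread of strengths
    """
    if scale <= 0 or modulus <= 0:
        raise ValueError("scale and modulus must be positive")
    if stress <= threshold:
        return 0.0  # below the threshold strength the model predicts no failures
    return 1.0 - math.exp(-(((stress - threshold) / scale) ** modulus))
```

    For example, with a hypothetical threshold of 2.0 GPa, a scale of 1.0 GPa, and a modulus of 10, an applied stress of 3.0 GPa gives a failure probability of 1 - e^-1, about 0.632.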

  7. Statistical energy analysis response prediction methods for structural systems

    NASA Technical Reports Server (NTRS)

    Davis, R. F.

    1979-01-01

    The results of an effort to document methods for accomplishing response predictions for commonly encountered aerospace structural configurations are presented. Application of these methods to specified aerospace structures to provide sample analyses is included. An applications manual, with the structural analyses appended as example problems, is given. Comparisons of the response predictions with measured data are provided for three of the example problems.

  8. The Proteome Folding Project: Proteome-scale prediction of structure and function

    PubMed Central

    Drew, Kevin; Winters, Patrick; Butterfoss, Glenn L.; Berstis, Viktors; Uplinger, Keith; Armstrong, Jonathan; Riffle, Michael; Schweighofer, Erik; Bovermann, Bill; Goodlett, David R.; Davis, Trisha N.; Shasha, Dennis; Malmström, Lars; Bonneau, Richard

    2011-01-01

    The incompleteness of proteome structure and function annotation is a critical problem for biologists and, in particular, severely limits interpretation of high-throughput and next-generation experiments. We have developed a proteome annotation pipeline based on structure prediction, where function and structure annotations are generated using an integration of sequence comparison, fold recognition, and grid-computing-enabled de novo structure prediction. We predict protein domain boundaries and three-dimensional (3D) structures for protein domains from 94 genomes (including human, Arabidopsis, rice, mouse, fly, yeast, Escherichia coli, and worm). De novo structure predictions were distributed on a grid of more than 1.5 million CPUs worldwide (World Community Grid). We generated significant numbers of new confident fold annotations (9% of domains that are otherwise unannotated in these genomes). We demonstrate that predicted structures can be combined with annotations from the Gene Ontology database to predict new and more specific molecular functions. PMID:21824995

  9. Ichthyophonus parasite phylogeny based on ITS rDNA structure prediction and alignment identifies six clades, with a single dominant marine type

    USGS Publications Warehouse

    Gregg, Jacob; Thompson, Rachel L.; Purcell, Maureen; Friedman, Carolyn S.; Hershberger, Paul

    2016-01-01

    Despite their widespread, global impact in both wild and cultured fishes, little is known of the diversity, transmission patterns, and phylogeography of parasites generally identified as Ichthyophonus. This study constructed a phylogeny based on the structural alignment of internal transcribed spacer (ITS) rDNA sequences to compare Ichthyophonus isolates from fish hosts in the Atlantic and Pacific oceans, and several rivers and aquaculture sites in North America, Europe, and Japan. Structure of the Ichthyophonus ITS1–5.8S–ITS2 transcript exhibited several homologies with other eukaryotes, and 6 distinct clades were identified within Ichthyophonus. A single clade contained a majority (71 of 98) of parasite isolations. This ubiquitous Ichthyophonus type occurred in 13 marine and anadromous hosts and was associated with epizootics in Atlantic herring, Chinook salmon, and American shad. A second clade contained all isolates from aquaculture, despite great geographic separation of the freshwater hosts. Each of the 4 remaining clades contained isolates from single host species. This study is the first to evaluate the genetic relationships among Ichthyophonus species across a significant portion of their host and geographic range. Additionally, parasite infection prevalence is reported in 16 fish species.

  10. Ichthyophonus parasite phylogeny based on ITS rDNA structure prediction and alignment identifies six clades, with a single dominant marine type.

    PubMed

    Gregg, Jacob L; Powers, Rachel L; Purcell, Maureen K; Friedman, Carolyn S; Hershberger, Paul K

    2016-07-01

    Despite their widespread, global impact in both wild and cultured fishes, little is known of the diversity, transmission patterns, and phylogeography of parasites generally identified as Ichthyophonus. This study constructed a phylogeny based on the structural alignment of internal transcribed spacer (ITS) rDNA sequences to compare Ichthyophonus isolates from fish hosts in the Atlantic and Pacific oceans, and several rivers and aquaculture sites in North America, Europe, and Japan. Structure of the Ichthyophonus ITS1-5.8S-ITS2 transcript exhibited several homologies with other eukaryotes, and 6 distinct clades were identified within Ichthyophonus. A single clade contained a majority (71 of 98) of parasite isolations. This ubiquitous Ichthyophonus type occurred in 13 marine and anadromous hosts and was associated with epizootics in Atlantic herring, Chinook salmon, and American shad. A second clade contained all isolates from aquaculture, despite great geographic separation of the freshwater hosts. Each of the 4 remaining clades contained isolates from single host species. This study is the first to evaluate the genetic relationships among Ichthyophonus species across a significant portion of their host and geographic range. Additionally, parasite infection prevalence is reported in 16 fish species. PMID:27409236