Sample records for background protein structures

  1. Structural deformation upon protein-protein interaction: A structural alphabet approach

    PubMed Central

    Martin, Juliette; Regad, Leslie; Lecornet, Hélène; Camproux, Anne-Claude

    2008-01-01

    Background In a number of protein-protein complexes, the 3D structures of bound and unbound partners significantly differ, supporting the induced fit hypothesis for protein-protein binding. Results In this study, we explore the induced fit modifications on a set of 124 proteins available in both bound and unbound forms, in terms of local structure. The local structure is described using a structural alphabet of 27 structural letters that allows a detailed description of the backbone. Using a control set to distinguish induced fit from experimental error and natural protein flexibility, we show that the fraction of structural letters modified upon binding is significantly greater than in the control set (36% versus 28%). This proportion is even greater in the interface regions (41%). Interface regions preferentially involve coils. Our analysis further reveals that some structural letters in coil are not favored in the interface. We show that certain structural letters in coil are particularly subject to modifications at the interface, and that the severity of structural change also varies. This information is used to derive a structural letter substitution matrix that summarizes the local structural changes observed in our data set. We also illustrate the usefulness of our approach to identify common binding motifs in unrelated proteins. Conclusion Our study provides qualitative information about induced fit. These results could be of help for flexible docking. PMID:18307769
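
    The "fraction of structural letters modified upon binding" reported in the abstract above can be illustrated with a minimal sketch. It assumes the bound and unbound forms have already been encoded as equal-length, position-aligned strings of structural letters; the function name and the example strings are hypothetical and are not data from the study.

```python
def fraction_modified(unbound: str, bound: str) -> float:
    """Return the fraction of aligned positions whose structural letter differs."""
    if len(unbound) != len(bound):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not unbound:
        raise ValueError("sequences must be non-empty")
    changed = sum(1 for a, b in zip(unbound, bound) if a != b)
    return changed / len(unbound)


# Hypothetical structural-letter strings, for illustration only.
unbound_letters = "AABCDDEF"
bound_letters = "AABCEDFF"
print(fraction_modified(unbound_letters, bound_letters))  # 2 of 8 positions differ
```

    Restricting the same calculation to interface positions, or to a control set of repeated structures of the same protein, would give the kind of comparison the abstract describes.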

  2. Course 12: Proteins: Structural, Thermodynamic and Kinetic Aspects

    NASA Astrophysics Data System (ADS)

    Finkelstein, A. V.

    1 Introduction 2 Overview of protein architectures and discussion of physical background of their natural selection 2.1 Protein structures 2.2 Physical selection of protein structures 3 Thermodynamic aspects of protein folding 3.1 Reversible denaturation of protein structures 3.2 What do denatured proteins look like? 3.3 Why denaturation of a globular protein is the first-order phase transition 3.4 "Gap" in energy spectrum: The main characteristic that distinguishes protein chains from random polymers 4 Kinetic aspects of protein folding 4.1 Protein folding in vivo 4.2 Protein folding in vitro (in the test-tube) 4.3 Theory of protein folding rates and solution of the Levinthal paradox

  3. Projections for fast protein structure retrieval

    PubMed Central

    Bhattacharya, Sourangshu; Bhattacharyya, Chiranjib; Chandra, Nagasuma R

    2006-01-01

    Background In recent times, there has been an exponential rise in the number of protein structures in databases such as the PDB, so the design of fast algorithms capable of querying such databases is becoming an increasingly important research issue. This paper reports an algorithm, motivated by spectral graph matching techniques, for retrieving protein structures similar to a query structure from a large protein structure database. Each protein structure is specified by the 3D coordinates of residues of the protein. The algorithm is based on a novel characterization of the residues, called projections, leading to a similarity measure between the residues of the two proteins. This measure is exploited to efficiently compute the optimal equivalences. Results Experimental results show that the current algorithm outperforms the state of the art on benchmark datasets in terms of speed without losing accuracy. Search results on the SCOP 95% nonredundant database, for fold similarity with 5 proteins from different SCOP classes, show that the current method performs competitively with the standard algorithm CE. The algorithm is also capable of detecting non-topological similarities between two proteins, which is not possible with most state-of-the-art tools such as Dali. PMID:17254310

  4. Exploring representations of protein structure for automated remote homology detection and mapping of protein structure space

    PubMed Central

    2014-01-01

    Background Due to rapid sequencing of genomes, there are now millions of deposited protein sequences with no known function. Fast sequence-based comparisons allow detecting close homologs for a protein of interest to transfer functional information from the homologs to the given protein. Sequence-based comparison cannot detect remote homologs, in which evolution has adjusted the sequence while largely preserving structure. Structure-based comparisons can detect remote homologs but most methods for doing so are too expensive to apply at a large scale over structural databases of proteins. Recently, fragment-based structural representations have been proposed that allow fast detection of remote homologs with reasonable accuracy. These representations have also been used to obtain linearly-reducible maps of protein structure space. It has been shown, as additionally supported from analysis in this paper that such maps preserve functional co-localization of the protein structure space. Methods Inspired by a recent application of the Latent Dirichlet Allocation (LDA) model for conducting structural comparisons of proteins, we propose higher-order LDA-obtained topic-based representations of protein structures to provide an alternative route for remote homology detection and organization of the protein structure space in few dimensions. Various techniques based on natural language processing are proposed and employed to aid the analysis of topics in the protein structure domain. Results We show that a topic-based representation is just as effective as a fragment-based one at automated detection of remote homologs and organization of protein structure space. We conduct a detailed analysis of the information content in the topic-based representation, showing that topics have semantic meaning. The fragment-based and topic-based representations are also shown to allow prediction of superfamily membership. Conclusions This work opens exciting venues in designing novel

  5. Protein structural similarity search by Ramachandran codes

    PubMed Central

    Lo, Wei-Cheng; Huang, Po-Jung; Chang, Chih-Hung; Lyu, Ping-Chiang

    2007-01-01

    Background Protein structural data has increased exponentially, such that fast and accurate tools are necessary to perform structure similarity searches. To improve the search speed, several methods have been designed to reduce three-dimensional protein structures to one-dimensional text strings that are then analyzed by traditional sequence alignment methods; however, the accuracy is usually sacrificed and the speed is still unable to match sequence similarity search tools. Here, we aimed to improve the linear encoding methodology and develop efficient search tools that can rapidly retrieve structural homologs from large protein databases. Results We propose a new linear encoding method, SARST (Structural similarity search Aided by Ramachandran Sequential Transformation). SARST transforms protein structures into text strings through a Ramachandran map organized by nearest-neighbor clustering and uses a regenerative approach to produce substitution matrices. Then, classical sequence similarity search methods can be applied to the structural similarity search. Its accuracy is similar to that of Combinatorial Extension (CE) and it works over 243,000 times faster, searching 34,000 proteins in 0.34 sec with a 3.2-GHz CPU. SARST provides statistically meaningful expectation values to assess the retrieved information. It has been implemented as a web service and a stand-alone Java program that is able to run on many different platforms. Conclusion As a database search method, SARST can rapidly distinguish high from low similarities and efficiently retrieve homologous structures. It demonstrates that the easily accessible linear encoding methodology has the potential to serve as a foundation for efficient protein structural similarity search tools. These search tools are expected to be applicable to automated and high-throughput functional annotations or predictions for the ever increasing number of published protein structures in this post-genomic era. PMID:17716377
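
    The idea of reducing a structure to a text string via the Ramachandran map can be sketched as follows. This is not SARST itself: the abstract says SARST organizes the map by nearest-neighbor clustering, whereas this sketch substitutes a plain regular grid over (phi, psi) as a simplifying assumption. The function name, bin width, and alphabet are all hypothetical.

```python
import string


def encode_ramachandran(angles, bin_deg=60):
    """Map (phi, psi) pairs in degrees to one letter per residue.

    Assumption: a regular grid over the Ramachandran map stands in for
    the clustered map used by the published method.
    """
    if 360 % bin_deg != 0:
        raise ValueError("bin_deg must divide 360")
    n = 360 // bin_deg  # number of bins along each axis
    alphabet = string.ascii_uppercase + string.ascii_lowercase + string.digits
    if n * n > len(alphabet):
        raise ValueError("grid has more cells than available symbols")
    letters = []
    for phi, psi in angles:
        col = int((phi + 180) % 360 // bin_deg)  # phi bin, 0..n-1
        row = int((psi + 180) % 360 // bin_deg)  # psi bin, 0..n-1
        letters.append(alphabet[row * n + col])
    return "".join(letters)


# Two helix-like residues followed by one strand-like residue (illustrative angles).
print(encode_ramachandran([(-60, -45), (-60, -45), (-120, 130)]))
```

    Once structures are strings, ordinary sequence search tools can compare them, which is the source of the speed the abstract reports; the quality of the result depends on how the map is partitioned and on the substitution matrix, neither of which this sketch attempts to reproduce.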

  6. Proteins with Novel Structure, Function and Dynamics

    NASA Technical Reports Server (NTRS)

    Pohorille, Andrew

    2014-01-01

    Recently, a small enzyme that ligates two RNA fragments with the rate of 10^6 above background was evolved in vitro (Seelig and Szostak, Nature 448:828-831, 2007). This enzyme does not resemble any contemporary protein (Chao et al., Nature Chem. Biol. 9:81-83, 2013). It consists of a dynamic, catalytic loop, a small, rigid core containing two zinc ions coordinated by neighboring amino acids, and two highly flexible tails that might be unimportant for protein function. In contrast to other proteins, this enzyme does not contain ordered secondary structure elements, such as alpha-helix or beta-sheet. The loop is kept together by just two interactions of a charged residue and a histidine with a zinc ion, which they coordinate on the opposite side of the loop. Such a structure appears to be very fragile. Surprisingly, computer simulations indicate otherwise. As the coordinating, charged residue is mutated to alanine, another, nearby charged residue takes its place, thus keeping the structure nearly intact. If this residue is also substituted by alanine, a salt bridge involving two other charged residues on the opposite sides of the loop keeps the loop in place. These adjustments are facilitated by high flexibility of the protein. Computational predictions have been confirmed experimentally, as both mutants retain full activity and overall structure. These results challenge our notions about what is required for protein activity and about the relationship between protein dynamics, stability and robustness. We hypothesize that small, highly dynamic proteins could be both active and fault tolerant in ways that many other proteins are not, i.e. they can adjust to retain their structure and activity even if subjected to mutations in structurally critical regions. This opens the doors for designing proteins with novel functions, structures and dynamics that have not yet been considered.

  7. Quality assessment of protein model-structures based on structural and functional similarities

    PubMed Central

    2012-01-01

    Background Experimental determination of protein 3D structures is expensive, time consuming and sometimes impossible. A gap between number of protein structures deposited in the World Wide Protein Data Bank and the number of sequenced proteins constantly broadens. Computational modeling is deemed to be one of the ways to deal with the problem. Although protein 3D structure prediction is a difficult task, many tools are available. These tools can model it from a sequence or partial structural information, e.g. contact maps. Consequently, biologists have the ability to generate automatically a putative 3D structure model of any protein. However, the main issue becomes evaluation of the model quality, which is one of the most important challenges of structural biology. Results GOBA - Gene Ontology-Based Assessment is a novel Protein Model Quality Assessment Program. It estimates the compatibility between a model-structure and its expected function. GOBA is based on the assumption that a high quality model is expected to be structurally similar to proteins functionally similar to the prediction target. Whereas DALI is used to measure structure similarity, protein functional similarity is quantified using standardized and hierarchical description of proteins provided by Gene Ontology combined with Wang's algorithm for calculating semantic similarity. Two approaches are proposed to express the quality of protein model-structures. One is a single model quality assessment method, the other is its modification, which provides a relative measure of model quality. Exhaustive evaluation is performed on data sets of model-structures submitted to the CASP8 and CASP9 contests. Conclusions The validation shows that the method is able to discriminate between good and bad model-structures. The best of tested GOBA scores achieved 0.74 and 0.8 as a mean Pearson correlation to the observed quality of models in our CASP8 and CASP9-based validation sets. GOBA also obtained the best

  8. Protein Structure Prediction by Protein Threading

    NASA Astrophysics Data System (ADS)

    Xu, Ying; Liu, Zhijie; Cai, Liming; Xu, Dong

    The seminal work of Bowie, Lüthy, and Eisenberg (Bowie et al., 1991) on "the inverse protein folding problem" laid the foundation of protein structure prediction by protein threading. By using simple measures for fitness of different amino acid types to local structural environments defined in terms of solvent accessibility and protein secondary structure, the authors derived a simple and yet profoundly novel approach to assessing if a protein sequence fits well with a given protein structural fold. Their follow-up work (Elofsson et al., 1996; Fischer and Eisenberg, 1996; Fischer et al., 1996a,b) and the work by Jones, Taylor, and Thornton (Jones et al., 1992) on protein fold recognition led to the development of a new brand of powerful tools for protein structure prediction, which we now term "protein threading." These computational tools have played a key role in extending the utility of all the experimentally solved structures by X-ray crystallography and nuclear magnetic resonance (NMR), providing structural models and functional predictions for many of the proteins encoded in the hundreds of genomes that have been sequenced up to now.

  9. Classification of protein quaternary structure by functional domain composition

    PubMed Central

    Yu, Xiaojing; Wang, Chuan; Li, Yixue

    2006-01-01

    Background The number and the arrangement of subunits that form a protein are referred to as quaternary structure. Quaternary structure is an important protein attribute that is closely related to its function. Proteins with quaternary structure are called oligomeric proteins. Oligomeric proteins are involved in various biological processes, such as metabolism, signal transduction, and chromosome replication. Thus, it is highly desirable to develop some computational methods to automatically classify the quaternary structure of proteins from their sequences. Results To explore this problem, we adopted an approach based on the functional domain composition of proteins. Every protein was represented by a vector calculated from the domains in the PFAM database. The nearest neighbor algorithm (NNA) was used for classifying the quaternary structure of proteins from this information. The jackknife cross-validation test was performed on the non-redundant protein dataset in which the sequence identity was less than 25%. The overall success rate obtained is 75.17%. Additionally, to demonstrate the effectiveness of this method, we predicted the proteins in an independent dataset and achieved an overall success rate of 84.11% Conclusion Compared with the amino acid composition method and Blast, the results indicate that the domain composition approach may be a more effective and promising high-throughput method in dealing with this complicated problem in bioinformatics. PMID:16584572
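
    The classification scheme in the abstract above (a domain-composition vector per protein, then a nearest neighbor rule) can be sketched in a few lines. The abstract does not say which similarity measure was used, so cosine similarity is an assumption here; the domain names, labels, and function names are hypothetical placeholders, not PFAM entries or data from the study.

```python
import math


def domain_vector(domains, vocabulary):
    """Binary vector: 1 if the protein contains the domain, else 0."""
    return [1 if d in domains else 0 for d in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbor_label(query, training):
    """Return the label of the training vector most similar to the query."""
    best_vector, best_label = max(
        training, key=lambda item: cosine_similarity(query, item[0])
    )
    return best_label


# Hypothetical domain vocabulary and training set, for illustration only.
vocabulary = ["DOM1", "DOM2", "DOM3", "DOM4"]
training = [
    (domain_vector({"DOM1", "DOM2"}, vocabulary), "monomer"),
    (domain_vector({"DOM3", "DOM4"}, vocabulary), "homodimer"),
]
query = domain_vector({"DOM3"}, vocabulary)
print(nearest_neighbor_label(query, training))
```

    A jackknife test, as mentioned in the abstract, would repeat this prediction for every protein in turn while leaving that protein out of the training set.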

  10. Structure prediction of polyglutamine disease proteins: comparison of methods

    PubMed Central

    2014-01-01

    Background The expansion of polyglutamine (poly-Q) repeats in several unrelated proteins is associated with at least ten neurodegenerative diseases. The length of the poly-Q regions plays an important role in the progression of the diseases. The number of glutamines (Q) is inversely related to the onset age of these polyglutamine diseases, and the expansion of poly-Q repeats has been associated with protein misfolding. However, very little is known about the structural changes induced by the expansion of the repeats. Computational methods can provide an alternative to determine the structure of these poly-Q proteins, but it is important to evaluate their performance before large scale prediction work is done. Results In this paper, two popular protein structure prediction programs, I-TASSER and Rosetta, have been used to predict the structure of the N-terminal fragment of a protein associated with Huntington's disease with 17 glutamines. Results show that both programs have the ability to find the native structures, but I-TASSER performs better for the overall task. Conclusions Both I-TASSER and Rosetta can be used for structure prediction of proteins with poly-Q repeats. Knowledge of poly-Q structure may significantly contribute to development of therapeutic strategies for poly-Q diseases. PMID:25080018

  11. Structural deformation upon protein-protein interaction: a structural alphabet approach.

    PubMed

    Martin, Juliette; Regad, Leslie; Lecornet, Hélène; Camproux, Anne-Claude

    2008-02-28

    In a number of protein-protein complexes, the 3D structures of bound and unbound partners significantly differ, supporting the induced fit hypothesis for protein-protein binding. In this study, we explore the induced fit modifications on a set of 124 proteins available in both bound and unbound forms, in terms of local structure. The local structure is described using a structural alphabet of 27 structural letters that allows a detailed description of the backbone. Using a control set to distinguish induced fit from experimental error and natural protein flexibility, we show that the fraction of structural letters modified upon binding is significantly greater than in the control set (36% versus 28%). This proportion is even greater in the interface regions (41%). Interface regions preferentially involve coils. Our analysis further reveals that some structural letters in coil are not favored in the interface. We show that certain structural letters in coil are particularly subject to modifications at the interface, and that the severity of structural change also varies. This information is used to derive a structural letter substitution matrix that summarizes the local structural changes observed in our data set. We also illustrate the usefulness of our approach to identify common binding motifs in unrelated proteins. Our study provides qualitative information about induced fit. These results could be of help for flexible docking.

  12. Designing and benchmarking the MULTICOM protein structure prediction system

    PubMed Central

    2013-01-01

    Background Predicting protein structure from sequence is one of the most significant and challenging problems in bioinformatics. Numerous bioinformatics techniques and tools have been developed to tackle almost every aspect of protein structure prediction ranging from structural feature prediction, template identification and query-template alignment to structure sampling, model quality assessment, and model refinement. How to synergistically select, integrate and improve the strengths of the complementary techniques at each prediction stage and build a high-performance system is becoming a critical issue for constructing a successful, competitive protein structure predictor. Results Over the past several years, we have constructed a standalone protein structure prediction system MULTICOM that combines multiple sources of information and complementary methods at all five stages of the protein structure prediction process including template identification, template combination, model generation, model assessment, and model refinement. The system was blindly tested during the ninth Critical Assessment of Techniques for Protein Structure Prediction (CASP9) in 2010 and yielded very good performance. In addition to studying the overall performance on the CASP9 benchmark, we thoroughly investigated the performance and contributions of each component at each stage of prediction. Conclusions Our comprehensive and comparative study not only provides useful and practical insights about how to select, improve, and integrate complementary methods to build a cutting-edge protein structure prediction system but also identifies a few new sources of information that may help improve the design of a protein structure prediction system. Several components used in the MULTICOM system are available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/. PMID:23442819

  13. Protein Structure

    ERIC Educational Resources Information Center

    Asmus, Elaine Garbarino

    2007-01-01

    Individual students model specific amino acids and then, through dehydration synthesis, a class of students models a protein. The students clearly learn amino acid structure, primary, secondary, tertiary, and quaternary structure in proteins and the nature of the bonds maintaining a protein's shape. This activity is fun, concrete, inexpensive and…

  14. Improved protein surface comparison and application to low-resolution protein structure data

    PubMed Central

    2010-01-01

    Background Recent advancements of experimental techniques for determining protein tertiary structures raise significant challenges for protein bioinformatics. With the number of known structures of unknown function expanding at a rapid pace, an urgent task is to provide reliable clues to their biological function on a large scale. Conventional approaches for structure comparison are not suitable for a real-time database search due to their slow speed. Moreover, a new challenge has arisen from recent techniques such as electron microscopy (EM), which provide low-resolution structure data. Previously, we have introduced a method for protein surface shape representation using the 3D Zernike descriptors (3DZDs). The 3DZD enables fast structure database searches, taking advantage of its rotation invariance and compact representation. The search results of protein surfaces represented with the 3DZD have shown good agreement with the existing structure classifications, but some discrepancies were also observed. Results The three new surface representations of backbone atoms, originally devised all-atom-surface representation, and the combination of all-atom surface with the backbone representation are examined. All representations are encoded with the 3DZD. Also, we have investigated the applicability of the 3DZD for searching protein EM density maps of varying resolutions. The surface representations are evaluated on structure retrieval using two existing classifications, SCOP and the CE-based classification. Conclusions Overall, the 3DZDs representing backbone atoms show better retrieval performance than the original all-atom surface representation. The performance further improved when the two representations are combined. Moreover, we observed that the 3DZD is also powerful in comparing low-resolution structures obtained by electron microscopy. PMID:21172052

  15. Representing and comparing protein structures as paths in three-dimensional space

    PubMed Central

    Zhi, Degui; Krishna, S Sri; Cao, Haibo; Pevzner, Pavel; Godzik, Adam

    2006-01-01

    Background Most existing formulations of protein structure comparison are based on detailed atomic level descriptions of protein structures and bypass potential insights that arise from a higher-level abstraction. Results We propose a structure comparison approach based on a simplified representation of proteins that describes its three-dimensional path by local curvature along the generalized backbone of the polypeptide. We have implemented a dynamic programming procedure that aligns curvatures of proteins by optimizing a defined sum turning angle deviation measure. Conclusion Although our procedure does not directly optimize global structural similarity as measured by RMSD, our benchmarking results indicate that it can surprisingly well recover the structural similarity defined by structure classification databases and traditional structure alignment programs. In addition, our program can recognize similarities between structures with extensive conformation changes that are beyond the ability of traditional structure alignment programs. We demonstrate the applications of procedure to several contexts of structure comparison. An implementation of our procedure, CURVE, is available as a public webserver. PMID:17052359
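
    The "path in three-dimensional space" view described above can be made concrete with a small sketch that turns a list of backbone points into turning angles and compares two such profiles. This is only an illustration under stated assumptions: the published procedure aligns curvatures with dynamic programming, while the comparison below is a gap-free sum over equal-length profiles, and all function names are hypothetical.

```python
import math


def turning_angles(points):
    """Angle in degrees between each pair of successive path segments."""
    angles = []
    for p, q, r in zip(points, points[1:], points[2:]):
        u = [b - a for a, b in zip(p, q)]  # segment p -> q
        v = [b - a for a, b in zip(q, r)]  # segment q -> r
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(a * a for a in v))
        if norm_u == 0 or norm_v == 0:
            raise ValueError("consecutive points must be distinct")
        cos_t = sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
        cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding error
        angles.append(math.degrees(math.acos(cos_t)))
    return angles


def turning_angle_deviation(angles_a, angles_b):
    """Sum of absolute differences between two equal-length angle profiles."""
    if len(angles_a) != len(angles_b):
        raise ValueError("this sketch compares equal-length profiles only")
    return sum(abs(a - b) for a, b in zip(angles_a, angles_b))


# Illustrative paths: a staircase and a straight line with the same number of points.
staircase = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (2, 1, 0)]
straight = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
print(turning_angles(staircase))
print(turning_angle_deviation(turning_angles(staircase), turning_angles(straight)))
```

    Because turning angles depend only on the local shape of the path, they are unchanged by rotating or translating the structure, which is why no superposition step is needed before comparing profiles.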

  16. Visual signal detection in structured backgrounds. II. Effects of contrast gain control, background variations, and white noise

    NASA Technical Reports Server (NTRS)

    Eckstein, M. P.; Ahumada, A. J. Jr; Watson, A. B.

    1997-01-01

    Studies of visual detection of a signal superimposed on one of two identical backgrounds show performance degradation when the background has high contrast and is similar in spatial frequency and/or orientation to the signal. To account for this finding, models include a contrast gain control mechanism that pools activity across spatial frequency, orientation and space to inhibit (divisively) the response of the receptor sensitive to the signal. In tasks in which the observer has to detect a known signal added to one of M different backgrounds due to added visual noise, the main sources of degradation are the stochastic noise in the image and the suboptimal visual processing. We investigate how these two sources of degradation (contrast gain control and variations in the background) interact in a task in which the signal is embedded in one of M locations in a complex spatially varying background (structured background). We use backgrounds extracted from patient digital medical images. To isolate effects of the fixed deterministic background (the contrast gain control) from the effects of the background variations, we conduct detection experiments with three different background conditions: (1) uniform background, (2) a repeated sample of structured background, and (3) different samples of structured background. Results show that human visual detection degrades from the uniform background condition to the repeated background condition and degrades even further in the different backgrounds condition. These results suggest that both the contrast gain control mechanism and the background random variations degrade human performance in detection of a signal in a complex, spatially varying background. A filter model and added white noise are used to generate estimates of sampling efficiencies, an equivalent internal noise, an equivalent contrast-gain-control-induced noise, and an equivalent noise due to the variations in the structured background.

  17. HDAPD: a web tool for searching the disease-associated protein structures

    PubMed Central

    2010-01-01

    Background The structures of disease-associated proteins are important for structure-based drug design against a particular disease. Until now, protein structures have usually been searched through a PDB id or some sequence information. However, in the HDAPD database presented here, the structure of a disease-associated protein can be searched directly by keying in the name of the associated disease. Description A search in HDAPD can be easily initiated by keying in some key words of a disease, protein name, protein type, or PDB id. The protein sequence can be presented in FASTA format and directly copied for a BLAST search. HDAPD is also interfaced with Jmol so that users can observe and operate a protein structure with Jmol. Gene ontology data such as cellular components, molecular functions, and biological processes are provided once a hyperlink to Gene Ontology (GO) is clicked. Further, HDAPD provides a link to the KEGG map so that the position of the protein and its relationship with other proteins in a metabolic pathway can be found from the map. The latest literature for the protein, searched from PubMed, is also presented as a length-controllable list of titles, journals, authors, and abstracts. Conclusions Since the HDAPD data content can be routinely updated through a purpose-built PHP-MySQL web page, the database is useful for searching the structures of disease-associated proteins that may play important roles in disease development, in support of structure-based drug design against these diseases. PMID:20158919

  18. Accelerating large-scale protein structure alignments with graphics processing units

    PubMed Central

    2012-01-01

    Background Large-scale protein structure alignment, an indispensable tool to structural bioinformatics, poses a tremendous challenge on computational resources. To ensure structure alignment accuracy and efficiency, efforts have been made to parallelize traditional alignment algorithms in grid environments. However, these solutions are costly and of limited accessibility. Others trade alignment quality for speedup by using high-level characteristics of structure fragments for structure comparisons. Findings We present ppsAlign, a parallel protein structure Alignment framework designed and optimized to exploit the parallelism of Graphics Processing Units (GPUs). As a general-purpose GPU platform, ppsAlign could take many concurrent methods, such as TM-align and Fr-TM-align, into the parallelized algorithm design. We evaluated ppsAlign on an NVIDIA Tesla C2050 GPU card, and compared it with existing software solutions running on an AMD dual-core CPU. We observed a 36-fold speedup over TM-align, a 65-fold speedup over Fr-TM-align, and a 40-fold speedup over MAMMOTH. Conclusions ppsAlign is a high-performance protein structure alignment tool designed to tackle the computational complexity issues from protein structural data. The solution presented in this paper allows large-scale structure comparisons to be performed using massive parallel computing power of GPU. PMID:22357132

  19. Functional structural motifs for protein-ligand, protein-protein, and protein-nucleic acid interactions and their connection to supersecondary structures.

    PubMed

    Kinjo, Akira R; Nakamura, Haruki

    2013-01-01

    Protein functions are mediated by interactions between proteins and other molecules. One useful approach to analyze protein functions is to compare and classify the structures of interaction interfaces of proteins. Here, we describe the procedures for compiling a database of interface structures and efficiently comparing the interface structures. To do so requires a good understanding of the data structures of the Protein Data Bank (PDB). Therefore, we also provide a detailed account of the PDB exchange dictionary necessary for extracting data that are relevant for analyzing interaction interfaces and secondary structures. We identify recurring structural motifs by classifying similar interface structures, and we define a coarse-grained representation of supersecondary structures (SSS) which represents a sequence of two or three secondary structure elements including their relative orientations as a string of four to seven letters. By examining the correspondence between structural motifs and SSS strings, we show that no SSS string has particularly high propensity to be found interaction interfaces in general, indicating any SSS can be used as a binding interface. When individual structural motifs are examined, there are some SSS strings that have high propensity for particular groups of structural motifs. In addition, it is shown that while the SSS strings found in particular structural motifs for nonpolymer and protein interfaces are as abundant as in other structural motifs that belong to the same subunit, structural motifs for nucleic acid interfaces exhibit somewhat stronger preference for SSS strings. In regard to protein folds, many motif-specific SSS strings were found across many folds, suggesting that SSS may be a useful description to investigate the universality of ligand binding modes.

  20. Hydrogen atoms in protein structures: high-resolution X-ray diffraction structure of the DFPase

    PubMed Central

    2013-01-01

    Background Hydrogen atoms represent about half of the total number of atoms in proteins and are often involved in substrate recognition and catalysis. Unfortunately, X-ray protein crystallography at usual resolution fails to directly access their positions, mainly because light atoms display weak contributions to diffraction. However, sub-Ångstrom diffraction data, careful modeling and a proper refinement strategy can allow the positioning of a significant part of hydrogen atoms. Results A comprehensive study on the X-ray structure of the diisopropyl-fluorophosphatase (DFPase) was performed, and the hydrogen atoms were modeled, including those of solvent molecules. This model was compared to the available neutron structure of DFPase, and differences in the protein and the active site solvation were noticed. Conclusions A further examination of the DFPase X-ray structure provides substantial evidence about the presence of an activated water molecule that may constitute an interesting piece of information with regard to the enzymatic hydrolysis mechanism. PMID:23915572

  1. Protein structure prediction with local adjust tabu search algorithm

    PubMed Central

    2014-01-01

    Background Protein folding structure prediction is one of the most challenging problems in the bioinformatics domain. Because of the complexity of the realistic protein structure, a simplified structure model and a computational method should be adopted in the research. The AB off-lattice model is one of the simplified models, which only considers two classes of amino acids, hydrophobic (A) residues and hydrophilic (B) residues. Results The main work of this paper is to discuss how to optimize the lowest energy configurations in the 2D off-lattice model and the 3D off-lattice model by using Fibonacci sequences and real protein sequences. To avoid becoming trapped in local minima and to converge faster to the global minimum, we introduce a novel method (SATS) to the protein structure problem, which combines the simulated annealing algorithm and the tabu search algorithm. Various strategies, such as the new encoding strategy, the adaptive neighborhood generation strategy and the local adjustment strategy, are adopted successfully for high-speed searching of the optimal conformation corresponding to the lowest energy of the protein sequences. Experimental results show that some of the results obtained by the improved SATS are better than those reported in the previous literature, and we can be sure that the lowest energy folding state for short Fibonacci sequences has been found. Conclusions Although the off-lattice models are not very realistic, they can reflect some important characteristics of the realistic protein. It can be found that the 3D off-lattice model is more like the native folding structure of the realistic protein than the 2D off-lattice model. In addition, compared with some previous studies, the proposed hybrid algorithm can more effectively and more quickly search the spatial folding structure of a protein chain. PMID:25474708
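The abstract does not state the energy function being minimized. As a minimal sketch, the block below assumes the commonly used 2D AB off-lattice form attributed to Stillinger and co-workers: unit-length bonds, a bending term (1 - cos θ)/4 per interior residue, and a Lennard-Jones-like term between non-adjacent residues whose attraction coefficient is 1 for A-A, 0.5 for B-B and -0.5 for A-B pairs. The function names and the bend-angle encoding are illustrative and not taken from the paper. A search method such as the SATS hybrid would treat `ab_energy` as its objective.

```python
from math import cos, sin, hypot

def coords_from_angles(bend_angles):
    """Build a 2D chain with unit bonds from n-2 bend angles (radians)."""
    points = [(0.0, 0.0), (1.0, 0.0)]
    heading = 0.0
    for theta in bend_angles:
        heading += theta  # a bend of 0 keeps the chain straight
        x, y = points[-1]
        points.append((x + cos(heading), y + sin(heading)))
    return points

def ab_energy(sequence, bend_angles):
    """Energy of a 2D AB off-lattice conformation (assumed Stillinger form)."""
    n = len(sequence)
    if n < 3 or len(bend_angles) != n - 2:
        raise ValueError("need n >= 3 residues and exactly n - 2 bend angles")
    points = coords_from_angles(bend_angles)
    # Bending term: one contribution per interior residue.
    energy = sum((1.0 - cos(theta)) / 4.0 for theta in bend_angles)
    # Pairwise term over residues that are not bonded neighbours.
    for i in range(n - 2):
        for j in range(i + 2, n):
            r = hypot(points[i][0] - points[j][0], points[i][1] - points[j][1])
            xi = 1 if sequence[i] == "A" else -1
            xj = 1 if sequence[j] == "A" else -1
            c = (1 + xi + xj + 5 * xi * xj) / 8.0  # 1, 0.5 or -0.5
            energy += 4.0 * (r ** -12 - c * r ** -6)
    return energy

# Straight three-residue chains: only residues 0 and 2 interact, at distance 2.
e_aaa = ab_energy("AAA", [0.0])
e_aab = ab_energy("AAB", [0.0])
```

In this sketch the hydrophobic pair in "AAA" lowers the energy while the mixed pair in "AAB" raises it, which is the qualitative behaviour the A/B classification is meant to capture.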

  2. An approach to large scale identification of non-obvious structural similarities between proteins

    PubMed Central

    Cherkasov, Artem; Jones, Steven JM

    2004-01-01

    Background A new sequence independent bioinformatics approach allowing genome-wide search for proteins with similar three dimensional structures has been developed. By utilizing the numerical output of the sequence threading it establishes putative non-obvious structural similarities between proteins. When applied to the testing set of proteins with known three dimensional structures the developed approach was able to recognize structurally similar proteins with high accuracy. Results The method has been developed to identify pathogenic proteins with low sequence identity and high structural similarity to host analogues. Such protein structure relationships would be hypothesized to arise through convergent evolution or through ancient horizontal gene transfer events, now undetectable using current sequence alignment techniques. The pathogen proteins, which could mimic or interfere with host activities, would represent candidate virulence factors. The developed approach utilizes the numerical outputs from the sequence-structure threading. It identifies the potential structural similarity between a pair of proteins by correlating the threading scores of the corresponding two primary sequences against the library of the standard folds. This approach allowed up to 64% sensitivity and 99.9% specificity in distinguishing protein pairs with high structural similarity. Conclusion Preliminary results obtained by comparison of the genomes of Homo sapiens and several strains of Chlamydia trachomatis have demonstrated the potential usefulness of the method in the identification of bacterial proteins with known or potential roles in virulence. PMID:15147578
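The core idea described above, correlating the threading scores of two sequences against a common fold library, can be illustrated with a short sketch. The abstract does not say which correlation measure or decision threshold was used, so the Pearson coefficient and the score values below are assumptions chosen purely for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length threading-score vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least 2 entries")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)

# Hypothetical threading scores of three proteins against the same five folds.
scores_a = [8.1, 2.0, 1.5, 6.7, 0.9]
scores_b = [7.6, 2.4, 1.1, 7.0, 1.2]  # profile similar to protein a
scores_c = [1.0, 7.9, 6.5, 0.8, 2.2]  # profile unlike protein a

r_ab = pearson(scores_a, scores_b)
r_ac = pearson(scores_a, scores_c)
```

Under this reading, a pair such as (a, b) with highly correlated fold-library profiles would be flagged as putatively similar in structure even when the sequences themselves show little identity.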

  3. Protein structure based prediction of catalytic residues

    PubMed Central

    2013-01-01

    Background Worldwide structural genomics projects continue to release new protein structures at an unprecedented pace, so far nearly 6000, but only about 60% of these proteins have any sort of functional annotation. Results We explored a range of features that can be used for the prediction of functional residues given a known three-dimensional structure. These features include various centrality measures of nodes in graphs of interacting residues: closeness, betweenness and page-rank centrality. We also analyzed the distance of functional amino acids to the general center of mass (GCM) of the structure, relative solvent accessibility (RSA), and the use of relative entropy as a measure of sequence conservation. From the selected features, neural networks were trained to identify catalytic residues. We found that using distance to the GCM together with amino acid type provides a good discriminant function, when combined independently with sequence conservation. Using an independent test set of 29 annotated protein structures, the method returned 411 of the initial 9262 residues as the most likely to be involved in function. The output 411 residues contain 70 of the annotated 111 catalytic residues. This represents an approximately 14-fold enrichment of catalytic residues on the entire input set (corresponding to a sensitivity of 63% and a precision of 17%), a performance competitive with that of other state-of-the-art methods. Conclusions We found that several of the graph based measures utilize the same underlying feature of protein structures, which can be simply and more effectively captured with the distance to GCM definition. This also has the added advantage of simplicity and easy implementation. Meanwhile sequence conservation remains by far the most influential feature in identifying functional residues. We also found that due to the rapid changes in size and composition of sequence databases, conservation calculations must be recalibrated for specific
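The sensitivity, precision and enrichment quoted in this record follow directly from the four counts it reports. The sketch below reproduces that arithmetic; the function name is illustrative, and enrichment is assumed here to mean precision divided by the background fraction of catalytic residues in the input set.

```python
def enrichment_summary(n_total, n_positive, n_selected, n_hits):
    """Sensitivity, precision and fold enrichment from raw counts."""
    sensitivity = n_hits / n_positive   # share of true residues recovered
    precision = n_hits / n_selected     # share of predictions that are true
    background = n_positive / n_total   # share of true residues in the input
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "enrichment": precision / background,
    }

# Counts from the abstract: 9262 residues, 111 catalytic, 411 returned, 70 hits.
stats = enrichment_summary(n_total=9262, n_positive=111, n_selected=411, n_hits=70)
```

With these counts the sensitivity is 70/111, the precision is 70/411 and the background rate is 111/9262, which gives the roughly 14-fold enrichment stated in the abstract.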

  4. Structures of membrane proteins

    PubMed Central

    Vinothkumar, Kutti R.; Henderson, Richard

    2010-01-01

    In reviewing the structures of membrane proteins determined up to the end of 2009, we present in words and pictures the most informative examples from each family. We group the structures together according to their function and architecture to provide an overview of the major principles and variations on the most common themes. The first structures, determined 20 years ago, were those of naturally abundant proteins with limited conformational variability, and each membrane protein structure determined was a major landmark. With the advent of complete genome sequences and efficient expression systems, there has been an explosion in the rate of membrane protein structure determination, with many classes represented. New structures are published every month and more than 150 unique membrane protein structures have been determined. This review analyses the reasons for this success, discusses the challenges that still lie ahead, and presents a concise summary of the key achievements with illustrated examples selected from each class. PMID:20667175

  5. BAYESIAN PROTEIN STRUCTURE ALIGNMENT.

    PubMed

    Rodriguez, Abel; Schmidler, Scott C

    The analysis of the three-dimensional structure of proteins is an important topic in molecular biochemistry. Structure plays a critical role in defining the function of proteins and is more strongly conserved than amino acid sequence over evolutionary timescales. A key challenge is the identification and evaluation of structural similarity between proteins; such analysis can aid in understanding the role of newly discovered proteins and help elucidate evolutionary relationships between organisms. Computational biologists have developed many clever algorithmic techniques for comparing protein structures, however, all are based on heuristic optimization criteria, making statistical interpretation somewhat difficult. Here we present a fully probabilistic framework for pairwise structural alignment of proteins. Our approach has several advantages, including the ability to capture alignment uncertainty and to estimate key "gap" parameters which critically affect the quality of the alignment. We show that several existing alignment methods arise as maximum a posteriori estimates under specific choices of prior distributions and error models. Our probabilistic framework is also easily extended to incorporate additional information, which we demonstrate by including primary sequence information to generate simultaneous sequence-structure alignments that can resolve ambiguities obtained using structure alone. This combined model also provides a natural approach for the difficult task of estimating evolutionary distance based on structural alignments. The model is illustrated by comparison with well-established methods on several challenging protein alignment examples.

  6. Objective identification of residue ranges for the superposition of protein structures

    PubMed Central

    2011-01-01

    Background The automation of objectively selecting amino acid residue ranges for structure superpositions is important for meaningful and consistent protein structure analyses. So far there is no widely-used standard for choosing these residue ranges for experimentally determined protein structures, where the manual selection of residue ranges or the use of suboptimal criteria remain commonplace. Results We present an automated and objective method for finding amino acid residue ranges for the superposition and analysis of protein structures, in particular for structure bundles resulting from NMR structure calculations. The method is implemented in an algorithm, CYRANGE, that yields, without protein-specific parameter adjustment, appropriate residue ranges in most commonly occurring situations, including low-precision structure bundles, multi-domain proteins, symmetric multimers, and protein complexes. Residue ranges are chosen to comprise as many residues of a protein domain that increasing their number would lead to a steep rise in the RMSD value. Residue ranges are determined by first clustering residues into domains based on the distance variance matrix, and then refining for each domain the initial choice of residues by excluding residues one by one until the relative decrease of the RMSD value becomes insignificant. A penalty for the opening of gaps favours contiguous residue ranges in order to obtain a result that is as simple as possible, but not simpler. Results are given for a set of 37 proteins and compared with those of commonly used protein structure validation packages. We also provide residue ranges for 6351 NMR structures in the Protein Data Bank. Conclusions The CYRANGE method is capable of automatically determining residue ranges for the superposition of protein structure bundles for a large variety of protein structures. The method correctly identifies ordered regions. Global structure superpositions based on the CYRANGE residue ranges allow a

  7. Structural domains and main-chain flexibility in prion proteins.

    PubMed

    Blinov, N; Berjanskii, M; Wishart, D S; Stepanova, M

    2009-02-24

    In this study we describe a novel approach to define structural domains and to characterize the local flexibility in both human and chicken prion proteins. The approach we use is based on a comprehensive theory of collective dynamics in proteins that was recently developed. This method determines the essential collective coordinates, which can be found from molecular dynamics trajectories via principal component analysis. Under this particular framework, we are able to identify the domains where atoms move coherently while at the same time to determine the local main-chain flexibility for each residue. We have verified this approach by comparing our results for the predicted dynamic domain systems with the computed main-chain flexibility profiles and the NMR-derived random coil indexes for human and chicken prion proteins. The three sets of data show excellent agreement. Additionally, we demonstrate that the dynamic domains calculated in this fashion provide a highly sensitive measure of protein collective structure and dynamics. Furthermore, such an analysis is capable of revealing structural and dynamic properties of proteins that are inaccessible to the conventional assessment of secondary structure. Using the collective dynamic simulation approach described here along with a high-temperature simulations of unfolding of human prion protein, we have explored whether locations of relatively low stability could be identified where the unfolding process could potentially be facilitated. According to our analysis, the locations of relatively low stability may be associated with the beta-sheet formed by strands S1 and S2 and the adjacent loops, whereas helix HC appears to be a relatively stable part of the protein. We suggest that this kind of structural analysis may provide a useful background for a more quantitative assessment of potential routes of spontaneous misfolding in prion proteins.

  8. Automatic classification of protein structures relying on similarities between alignments

    PubMed Central

    2012-01-01

    Background Identification of protein structural cores requires isolation of sets of proteins all sharing a same subset of structural motifs. In the context of an ever growing number of available 3D protein structures, standard and automatic clustering algorithms require adaptations so as to allow for efficient identification of such sets of proteins. Results When considering a pair of 3D structures, they are stated as similar or not according to the local similarities of their matching substructures in a structural alignment. This binary relation can be represented in a graph of similarities where a node represents a 3D protein structure and an edge states that two 3D protein structures are similar. Therefore, classifying proteins into structural families can be viewed as a graph clustering task. Unfortunately, because such a graph encodes only pairwise similarity information, clustering algorithms may include in the same cluster a subset of 3D structures that do not share a common substructure. In order to overcome this drawback we first define a ternary similarity on a triple of 3D structures as a constraint to be satisfied by the graph of similarities. Such a ternary constraint takes into account similarities between pairwise alignments, so as to ensure that the three involved protein structures do have some common substructure. We propose hereunder a modification algorithm that eliminates edges from the original graph of similarities and gives a reduced graph in which no ternary constraints are violated. Our approach is then first to build a graph of similarities, then to reduce the graph according to the modification algorithm, and finally to apply to the reduced graph a standard graph clustering algorithm. Such method was used for classifying ASTRAL-40 non-redundant protein domains, identifying significant pairwise similarities with Yakusa, a program devised for rapid 3D structure alignments. Conclusions We show that filtering similarities prior to standard

  9. Identify High-Quality Protein Structural Models by Enhanced K-Means.

    PubMed

    Wu, Hongjie; Li, Haiou; Jiang, Min; Chen, Cheng; Lv, Qiang; Wu, Chuang

    2017-01-01

    Background. One critical issue in protein three-dimensional structure prediction using either ab initio or comparative modeling involves identification of high-quality protein structural models from generated decoys. Currently, clustering algorithms are widely used to identify near-native models; however, their performance is dependent upon different conformational decoys, and, for some algorithms, the accuracy declines when the decoy population increases. Results. Here, we proposed two enhanced K-means clustering algorithms capable of robustly identifying high-quality protein structural models. The first one employs the clustering algorithm SPICKER to determine the initial centroids for basic K-means clustering (SK-means), whereas the other employs squared distance to optimize the initial centroids (K-means++). Our results showed that SK-means and K-means++ were more robust as compared with SPICKER alone, detecting 33 (59%) and 42 (75%) of 56 targets, respectively, with template modeling scores better than or equal to those of SPICKER. Conclusions. We observed that the classic K-means algorithm showed a similar performance to that of SPICKER, which is a widely used algorithm for protein-structure identification. Both SK-means and K-means++ demonstrated substantial improvements relative to results from SPICKER and classical K-means.

  10. Identify High-Quality Protein Structural Models by Enhanced K-Means

    PubMed Central

    Li, Haiou; Chen, Cheng; Lv, Qiang; Wu, Chuang

    2017-01-01

    Background. One critical issue in protein three-dimensional structure prediction using either ab initio or comparative modeling involves identification of high-quality protein structural models from generated decoys. Currently, clustering algorithms are widely used to identify near-native models; however, their performance is dependent upon different conformational decoys, and, for some algorithms, the accuracy declines when the decoy population increases. Results. Here, we proposed two enhanced K-means clustering algorithms capable of robustly identifying high-quality protein structural models. The first one employs the clustering algorithm SPICKER to determine the initial centroids for basic K-means clustering (SK-means), whereas the other employs squared distance to optimize the initial centroids (K-means++). Our results showed that SK-means and K-means++ were more robust as compared with SPICKER alone, detecting 33 (59%) and 42 (75%) of 56 targets, respectively, with template modeling scores better than or equal to those of SPICKER. Conclusions. We observed that the classic K-means algorithm showed a similar performance to that of SPICKER, which is a widely used algorithm for protein-structure identification. Both SK-means and K-means++ demonstrated substantial improvements relative to results from SPICKER and classical K-means. PMID:28421198
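The squared-distance seeding that the abstract calls K-means++ can be sketched as follows. This is the generic k-means++ initialization, not the authors' code: the decoys are stood in for by hypothetical 2D feature vectors with Euclidean distance, whereas real structural decoys would need a structural distance such as RMSD.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_pp_init(points, k, rng):
    """Choose k initial centroids by k-means++ (squared-distance) seeding."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [rng.choice(points)]  # first centroid: uniform at random
    while len(centroids) < k:
        # Squared distance from every point to its nearest chosen centroid.
        d2 = [min(squared_distance(p, c) for c in centroids) for p in points]
        if sum(d2) == 0:
            # Every point coincides with a centroid; fall back to uniform.
            centroids.append(rng.choice(points))
            continue
        # Far-away points are proportionally more likely to be picked.
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids

# Hypothetical decoy features forming two tight groups and one outlier.
decoys = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0), (5.0, -8.0)]
seeds = kmeans_pp_init(decoys, 3, random.Random(0))
```

Because an already chosen point has zero weight, the seeds are distinct whenever the input points are distinct, and they tend to be spread across the groups before ordinary K-means iterations begin.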

  11. Modularity in protein structures: study on all-alpha proteins.

    PubMed

    Khan, Taushif; Ghosh, Indira

    2015-01-01

    Modularity is known as one of the most important features of protein's robust and efficient design. The architecture and topology of proteins play a vital role by providing necessary robust scaffolds to support organism's growth and survival in constant evolutionary pressure. These complex biomolecules can be represented by several layers of modular architecture, but it is pivotal to understand and explore the smallest biologically relevant structural component. In the present study, we have developed a component-based method, using protein's secondary structures and their arrangements (i.e. patterns) in order to investigate its structural space. Our result on all-alpha protein shows that the known structural space is highly populated with limited set of structural patterns. We have also noticed that these frequently observed structural patterns are present as modules or "building blocks" in large proteins (i.e. higher secondary structure content). From structural descriptor analysis, observed patterns are found to be within similar deviation; however, frequent patterns are found to be distinctly occurring in diverse functions e.g. in enzymatic classes and reactions. In this study, we are introducing a simple approach to explore protein structural space using combinatorial- and graph-based geometry methods, which can be used to describe modularity in protein structures. Moreover, analysis indicates that protein function seems to be the driving force that shapes the known structure space.

  12. Extraction, integration and analysis of alternative splicing and protein structure distributed information

    PubMed Central

    D'Antonio, Matteo; Masseroli, Marco

    2009-01-01

    Background Alternative splicing has been demonstrated to affect most of human genes; different isoforms from the same gene encode for proteins which differ for a limited number of residues, thus yielding similar structures. This suggests possible correlations between alternative splicing and protein structure. In order to support the investigation of such relationships, we have developed the Alternative Splicing and Protein Structure Scrutinizer (PASS), a Web application to automatically extract, integrate and analyze human alternative splicing and protein structure data sparsely available in the Alternative Splicing Database, Ensembl databank and Protein Data Bank. Primary data from these databases have been integrated and analyzed using the Protein Identifier Cross-Reference, BLAST, CLUSTALW and FeatureMap3D software tools. Results A database has been developed to store the considered primary data and the results from their analysis; a system of Perl scripts has been implemented to automatically create and update the database and analyze the integrated data; a Web interface has been implemented to make the analyses easily accessible; a database has been created to manage user accesses to the PASS Web application and store user's data and searches. Conclusion PASS automatically integrates data from the Alternative Splicing Database with protein structure data from the Protein Data Bank. Additionally, it comprehensively analyzes the integrated data with publicly available well-known bioinformatics tools in order to generate structural information of isoform pairs. Further analysis of such valuable information might reveal interesting relationships between alternative splicing and protein structure differences, which may be significantly associated with different functions. PMID:19828075

  13. Protein enriched pasta: structure and digestibility of its protein network.

    PubMed

    Laleg, Karima; Barron, Cécile; Santé-Lhoutellier, Véronique; Walrand, Stéphane; Micard, Valérie

    2016-02-01

    Wheat (W) pasta was enriched in 6% gluten (G), 35% faba (F) or 5% egg (E) to increase its protein content (13% to 17%). The impact of the enrichment on the multiscale structure of the pasta and on in vitro protein digestibility was studied. Increasing the protein content (W- vs. G-pasta) strengthened pasta structure at molecular and macroscopic scales but reduced its protein digestibility by 3% by forming a higher covalently linked protein network. Greater changes in the macroscopic and molecular structure of the pasta were obtained by varying the nature of protein used for enrichment. Proteins in G- and E-pasta were highly covalently linked (28-32%) resulting in a strong pasta structure. Conversely, F-protein (98% SDS-soluble) altered the pasta structure by diluting gluten and formed a weak protein network (18% covalent link). As a result, protein digestibility in F-pasta was significantly higher (46%) than in E- (44%) and G-pasta (39%). The effect of low (55 °C, LT) vs. very high temperature (90 °C, VHT) drying on the protein network structure and digestibility was shown to cause greater molecular changes than pasta formulation. Whatever the pasta, a general strengthening of its structure, a 33% to 47% increase in covalently linked proteins and a higher β-sheet structure were observed. However, these structural differences were evened out after the pasta was cooked, resulting in identical protein digestibility in LT and VHT pasta. Even after VHT drying, F-pasta had the best amino acid profile with the highest protein digestibility, proof of its nutritional interest.

  14. Structure-based barcoding of proteins.

    PubMed

    Metri, Rahul; Jerath, Gaurav; Kailas, Govind; Gacche, Nitin; Pal, Adityabarna; Ramakrishnan, Vibin

    2014-01-01

    A reduced representation in the format of a barcode has been developed to provide an overview of the topological nature of a given protein structure from a 3D coordinate file. The molecular structure of a protein coordinate file from the Protein Data Bank is first expressed in terms of an alpha-numero code and further converted to a barcode image. The barcode representation can be used to compare and contrast different proteins based on their structure. The utility of this method has been exemplified by comparing structural barcodes of proteins that belong to the same fold family, and across different folds. In addition to this, we have attempted to provide an illustration of (i) the structural changes often seen in a given protein molecule upon interaction with ligands and (ii) modifications in the overall topology of a given protein during evolution. The program is fully downloadable from the website http://www.iitg.ac.in/probar/. © 2013 The Protein Society.

  15. Taking advantage of local structure descriptors to analyze interresidue contacts in protein structures and protein complexes.

    PubMed

    Martin, Juliette; Regad, Leslie; Etchebest, Catherine; Camproux, Anne-Claude

    2008-11-15

    Interresidue contacts in protein structures and at protein-protein interfaces are classically described by the amino acid types of the interacting residues, and the local structural context of the contact, if any, is described using secondary structures. In this study, we present an alternate analysis of interresidue contacts using local structures defined by the structural alphabet introduced by Camproux et al. This structural alphabet allows a 3D structure to be described as a sequence of prototype fragments called structural letters, of 27 different types. Each residue can then be assigned to a particular local structure, even in loop regions. The analysis of interresidue contacts within protein structures defined using Voronoï tessellations reveals that pairwise contact specificity is greater in terms of structural letters than amino acids. Using a simple heuristic based on specificity score comparison, we find that 74% of the long-range contacts within protein structures are better described using structural letters than amino acid types. The investigation is extended to a set of protein-protein complexes, showing that similar global rules apply as for intraprotein contacts, with 64% of the interprotein contacts best described by local structures. We then present an evaluation of pairing functions integrating structural letters to decoy scoring and show that some complexes could benefit from the use of structural letter-based pairing functions.

  16. Predicting PDZ domain mediated protein interactions from structure

    PubMed Central

    2013-01-01

    Background PDZ domains are structural protein domains that recognize simple linear amino acid motifs, often at protein C-termini, and mediate protein-protein interactions (PPIs) in important biological processes, such as ion channel regulation, cell polarity and neural development. PDZ domain-peptide interaction predictors have been developed based on domain and peptide sequence information. Since domain structure is known to influence binding specificity, we hypothesized that structural information could be used to predict new interactions compared to sequence-based predictors. Results We developed a novel computational predictor of PDZ domain and C-terminal peptide interactions using a support vector machine trained with PDZ domain structure and peptide sequence information. Performance was estimated using extensive cross validation testing. We used the structure-based predictor to scan the human proteome for ligands of 218 PDZ domains and show that the predictions correspond to known PDZ domain-peptide interactions and PPIs in curated databases. The structure-based predictor is complementary to the sequence-based predictor, finding unique known and novel PPIs, and is less dependent on training–testing domain sequence similarity. We used a functional enrichment analysis of our hits to create a predicted map of PDZ domain biology. This map highlights PDZ domain involvement in diverse biological processes, some only found by the structure-based predictor. Based on this analysis, we predict novel PDZ domain involvement in xenobiotic metabolism and suggest new interactions for other processes including wound healing and Wnt signalling. Conclusions We built a structure-based predictor of PDZ domain-peptide interactions, which can be used to scan C-terminal proteomes for PDZ interactions. We also show that the structure-based predictor finds many known PDZ mediated PPIs in human that were not found by our previous sequence-based predictor and is less dependent on

  17. MolTalk – a programming library for protein structures and structure analysis

    PubMed Central

    Diemand, Alexander V; Scheib, Holger

    2004-01-01

    Background Two of the mostly unsolved but increasingly urgent problems for modern biologists are a) to quickly and easily analyse protein structures and b) to comprehensively mine the wealth of information, which is distributed along with the 3D co-ordinates by the Protein Data Bank (PDB). Tools which address this issue need to be highly flexible and powerful but at the same time must be freely available and easy to learn. Results We present MolTalk, an elaborate programming language, which consists of the programming library libmoltalk implemented in Objective-C and the Smalltalk-based interpreter MolTalk. MolTalk combines the advantages of an easy to learn and programmable procedural scripting with the flexibility and power of a full programming language. An overview of currently available applications of MolTalk is given and with PDBChainSaw one such application is described in more detail. PDBChainSaw is a MolTalk-based parser and information extraction utility of PDB files. Weekly updates of the PDB are synchronised with PDBChainSaw and are available for free download from the MolTalk project page following the link to PDBChainSaw. For each chain in a protein structure, PDBChainSaw extracts the sequence from its co-ordinates and provides additional information from the PDB-file header section, such as scientific organism, compound name, and EC code. Conclusion MolTalk provides a rich set of methods to analyse and even modify experimentally determined or modelled protein structures. These methods vary in complexity and are thus suitable for beginners and advanced programmers alike. We envision MolTalk to be most valuable in the following applications: 1) To analyse protein structures repetitively in large-scale, i.e. to benchmark protein structure prediction methods or to evaluate structural models. The quality of the resulting 3D-models can be assessed by e.g. calculating a Ramachandran-Sasisekharan plot. 2) To quickly retrieve information for (a limited
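The per-chain sequence extraction attributed to PDBChainSaw can be illustrated outside MolTalk. The sketch below is plain Python and does not use or reproduce the MolTalk or libmoltalk API. It assumes the fixed-column legacy PDB format, reads one residue per alpha-carbon ATOM record in the first model, and maps residues other than the 20 standard ones to "X".

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def chain_sequences(pdb_lines):
    """Return {chain_id: one-letter sequence} read from CA ATOM records."""
    residues = {}
    seen = set()
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # keep only the first model of a multi-model file
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        chain_id = line[21]
        key = (chain_id, line[22:27])  # residue number plus insertion code
        if key in seen:
            continue  # skip alternate locations of the same residue
        seen.add(key)
        letter = THREE_TO_ONE.get(line[17:20], "X")
        residues.setdefault(chain_id, []).append(letter)
    return {chain: "".join(letters) for chain, letters in residues.items()}

def make_atom_line(serial, name, res_name, chain_id, res_seq):
    """Format the leading columns of an ATOM record (illustrative helper)."""
    return "ATOM  {:>5} {:<4} {:>3} {}{:>4} ".format(
        serial, name, res_name, chain_id, res_seq)

example = [
    make_atom_line(1, " N  ", "MET", "A", 1),
    make_atom_line(2, " CA ", "MET", "A", 1),
    make_atom_line(3, " CA ", "GLY", "A", 2),
    make_atom_line(4, " CA ", "UNK", "B", 1),
]
sequences = chain_sequences(example)
```

Reading the sequence from coordinates rather than from SEQRES records means that residues missing from the model are also missing from the result, which is the behaviour the abstract describes.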

  18. Template-based structure modeling of protein-protein interactions

    PubMed Central

    Szilagyi, Andras; Zhang, Yang

    2014-01-01

    The structure of protein-protein complexes can be constructed by using the known structure of other protein complexes as a template. The complex structure templates are generally detected either by homology-based sequence alignments or, given the structure of monomer components, by structure-based comparisons. Critical improvements have been made in recent years by utilizing interface recognition and by recombining monomer and complex template libraries. Encouraging progress has also been witnessed in genome-wide applications of template-based modeling, with modeling accuracy comparable to high-throughput experimental data. Nevertheless, bottlenecks exist due to the incompleteness of the protein-protein complex structure library and the lack of methods for distant homologous template identification and full-length complex structure refinement. PMID:24721449

  19. Mapping monomeric threading to protein-protein structure prediction.

    PubMed

    Guerler, Aysam; Govindarajoo, Brandon; Zhang, Yang

    2013-03-25

    The key step of template-based protein-protein structure prediction is the recognition of complexes from experimental structure libraries that have similar quaternary fold. Maintaining two monomer and dimer structure libraries is however laborious, and inappropriate library construction can degrade template recognition coverage. We propose a novel strategy SPRING to identify complexes by mapping monomeric threading alignments to protein-protein interactions based on the original oligomer entries in the PDB, which does not rely on library construction and increases the efficiency and quality of complex template recognitions. SPRING is tested on 1838 nonhomologous protein complexes which can recognize correct quaternary template structures with a TM score >0.5 in 1115 cases after excluding homologous proteins. The average TM score of the first model is 60% and 17% higher than that by HHsearch and COTH, respectively, while the number of targets with an interface RMSD <2.5 Å by SPRING is 134% and 167% higher than these competing methods. SPRING is controlled with ZDOCK on 77 docking benchmark proteins. Although the relative performance of SPRING and ZDOCK depends on the level of homology filters, a combination of the two methods can result in a significantly higher model quality than ZDOCK at all homology thresholds. These data demonstrate a new efficient approach to quaternary structure recognition that is ready to use for genome-scale modeling of protein-protein interactions due to the high speed and accuracy.
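
    The TM score cutoff of 0.5 quoted above can be made concrete with a small sketch of the commonly used TM-score definition. This is an illustration only, not SPRING's code: the function names `tm_d0` and `tm_score` are hypothetical, the per-residue distances are assumed to come from a superposition that has already been computed, and the real score additionally maximizes over superpositions, which is omitted here.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in Angstrom) of the TM-score."""
    # Simplification for this sketch: reject short chains where the
    # standard expression would give a non-positive d0.
    if target_length < 19:
        raise ValueError("sketch only handles target_length >= 19")
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, target_length):
    """TM-score for ONE fixed superposition.

    distances: distances (Angstrom) between aligned residue pairs.
    target_length: number of residues in the target structure.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    # Normalizing by the target length penalizes unaligned residues.
    return total / target_length
```

    Under this definition a perfect, full-length match scores 1, and a model that aligns only half of the target perfectly scores 0.5.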

  20. Graphene as a protein crystal mounting material to reduce background scatter.

    PubMed

    Wierman, Jennifer L; Alden, Jonathan S; Kim, Chae Un; McEuen, Paul L; Gruner, Sol M

    2013-10-01

    The overall signal-to-noise ratio per unit dose for X-ray diffraction data from protein crystals can be improved by reducing the mass and density of all material surrounding the crystals. This article demonstrates a path towards the practical ultimate in background reduction by use of atomically thin graphene sheets as a crystal mounting platform for protein crystals. The results show the potential for graphene in protein crystallography and other cases where X-ray scatter from the mounting material must be reduced and specimen dehydration prevented, such as in coherent X-ray diffraction imaging of microscopic objects.

  1. Graphene as a protein crystal mounting material to reduce background scatter

    PubMed Central

    Wierman, Jennifer L.; Alden, Jonathan S.; Kim, Chae Un; McEuen, Paul L.; Gruner, Sol M.

    2013-01-01

    The overall signal-to-noise ratio per unit dose for X-ray diffraction data from protein crystals can be improved by reducing the mass and density of all material surrounding the crystals. This article demonstrates a path towards the practical ultimate in background reduction by use of atomically thin graphene sheets as a crystal mounting platform for protein crystals. The results show the potential for graphene in protein crystallography and other cases where X-ray scatter from the mounting material must be reduced and specimen dehydration prevented, such as in coherent X-ray diffraction imaging of microscopic objects. PMID:24068843

  2. Recent developments in structural proteomics for protein structure determination.

    PubMed

    Liu, Hsuan-Liang; Hsu, Jyh-Ping

    2005-05-01

    The major challenges in structural proteomics include identifying all the proteins on the genome-wide scale, determining their structure-function relationships, and outlining the precise three-dimensional structures of the proteins. Protein structures are typically determined by experimental approaches such as X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy. However, the knowledge of three-dimensional space by these techniques is still limited. Thus, computational methods such as comparative and de novo approaches and molecular dynamic simulations are intensively used as alternative tools to predict the three-dimensional structures and dynamic behavior of proteins. This review summarizes recent developments in structural proteomics for protein structure determination; including instrumental methods such as X-ray crystallography and NMR spectroscopy, and computational methods such as comparative and de novo structure prediction and molecular dynamics simulations.

  3. CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction.

    PubMed

    Cui, Xuefeng; Lu, Zhiwu; Wang, Sheng; Jing-Yan Wang, Jim; Gao, Xin

    2016-06-15

    Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades on sequence alignment, threading and alignment-free methods, protein homology detection remains a challenging open problem. Recently, network methods that try to find transitive paths in the protein structure space demonstrate the importance of incorporating network information of the structure space. Yet, current methods merge the sequence space and the structure space into a single space, and thus introduce inconsistency in combining different sources of information. We present a novel network-based protein homology detection method, CMsearch, based on cross-modal learning. Instead of exploring a single network built from the mixture of sequence and structure space information, CMsearch builds two separate networks to represent the sequence space and the structure space. It then learns sequence-structure correlation by simultaneously taking sequence information, structure information, sequence space information and structure space information into consideration. We tested CMsearch on two challenging tasks, protein homology detection and protein structure prediction, by querying all 8332 PDB40 proteins. Our results demonstrate that CMsearch is insensitive to the similarity metrics used to define the sequence and the structure spaces. By using HMM-HMM alignment as the sequence similarity metric, CMsearch clearly outperforms state-of-the-art homology detection methods and the CASP-winning template-based protein structure prediction methods. Our program is freely available for download from http://sfb.kaust.edu.sa/Pages/Software.aspx. Contact: xin.gao@kaust.edu.sa. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  4. Structural Genomics of Protein Phosphatases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Almo,S.; Bonanno, J.; Sauder, J.

    The New York SGX Research Center for Structural Genomics (NYSGXRC) of the NIGMS Protein Structure Initiative (PSI) has applied its high-throughput X-ray crystallographic structure determination platform to systematic studies of all human protein phosphatases and protein phosphatases from biomedically-relevant pathogens. To date, the NYSGXRC has determined structures of 21 distinct protein phosphatases: 14 from human, 2 from mouse, 2 from the pathogen Toxoplasma gondii, 1 from Trypanosoma brucei, the parasite responsible for African sleeping sickness, and 2 from the principal mosquito vector of malaria in Africa, Anopheles gambiae. These structures provide insights into both normal and pathophysiologic processes, including transcriptional regulation, regulation of major signaling pathways, neural development, and type 1 diabetes. In conjunction with the contributions of other international structural genomics consortia, these efforts promise to provide an unprecedented database and materials repository for structure-guided experimental and computational discovery of inhibitors for all classes of protein phosphatases.

  5. Modular architecture of protein structures and allosteric communications: potential implications for signaling proteins and regulatory linkages

    PubMed Central

    del Sol, Antonio; Araúzo-Bravo, Marcos J; Amoros, Dolors; Nussinov, Ruth

    2007-01-01

    Background Allosteric communications are vital for cellular signaling. Here we explore a relationship between protein architectural organization and shortcuts in signaling pathways. Results We show that protein domains consist of modules interconnected by residues that mediate signaling through the shortest pathways. These mediating residues tend to be located at the inter-modular boundaries, which are more rigid and display a larger number of long-range interactions than intra-modular regions. The inter-modular boundaries contain most of the residues centrally conserved in the protein fold, which may be crucial for information transfer between amino acids. Our approach to modular decomposition relies on a representation of protein structures as residue-interacting networks, and removal of the most central residue contacts, which are assumed to be crucial for allosteric communications. The modular decomposition of 100 multi-domain protein structures indicates that modules constitute the building blocks of domains. The analysis of 13 allosteric proteins revealed that modules characterize experimentally identified functional regions. Based on the study of an additional functionally annotated dataset of 115 proteins, we propose that high-modularity modules include functional sites and are the basic functional units. We provide examples (the Gαs subunit and P450 cytochromes) to illustrate that the modular architecture of active sites is linked to their functional specialization. Conclusion Our method decomposes protein structures into modules, allowing the study of signal transmission between functional sites. A modular configuration might be advantageous: it allows signaling proteins to expand their regulatory linkages and may elicit a broader range of control mechanisms either via modular combinations or through modulation of inter-modular linkages. PMID:17531094

  6. Lessons from making the Structural Classification of Proteins (SCOP) and their implications for protein structure modelling.

    PubMed

    Andreeva, Antonina

    2016-06-15

    The Structural Classification of Proteins (SCOP) database has facilitated the development of many tools and algorithms and it has been successfully used in protein structure prediction and large-scale genome annotations. During the development of SCOP, numerous exceptions were found to topological rules, along with complex evolutionary scenarios and peculiarities in proteins including the ability to fold into alternative structures. This article reviews cases of structural variations observed for individual proteins and among groups of homologues, knowledge of which is essential for protein structure modelling. © 2016 The Author(s). Published by Portland Press Limited on behalf of the Biochemical Society.

  7. Mining protein loops using a structural alphabet and statistical exceptionality

    PubMed Central

    2010-01-01

    Background Protein loops encompass 50% of protein residues in available three-dimensional structures. These regions are often involved in protein functions, e.g. binding site, catalytic pocket... However, the description of protein loops with conventional tools is an uneasy task. Regular secondary structures, helices and strands, have been widely studied whereas loops, because they are highly variable in terms of sequence and structure, are difficult to analyze. Due to data sparsity, long loops have rarely been systematically studied. Results We developed a simple and accurate method that allows the description and analysis of the structures of short and long loops using structural motifs without restriction on loop length. This method is based on the structural alphabet HMM-SA. HMM-SA allows the simplification of a three-dimensional protein structure into a one-dimensional string of states, where each state is a four-residue prototype fragment, called structural letter. The difficult task of the structural grouping of huge data sets is thus easily accomplished by handling structural letter strings as in conventional protein sequence analysis. We systematically extracted all seven-residue fragments in a bank of 93000 protein loops and grouped them according to the structural-letter sequence, named structural word. This approach permits a systematic analysis of loops of all sizes since we consider the structural motifs of seven residues rather than complete loops. We focused the analysis on highly recurrent words of loops (observed more than 30 times). Our study reveals that 73% of loop-lengths are covered by only 3310 highly recurrent structural words (out of 28274 observed words). These structural words have low structural variability (mean RMSd of 0.85 Å). As expected, half of these motifs display a flanking-region preference but interestingly, two thirds are shared by short (less than 12 residues) and long loops. Moreover, half of recurrent motifs exhibit a

  8. SDSL-ESR-based protein structure characterization.

    PubMed

    Strancar, Janez; Kavalenka, Aleh; Urbancic, Iztok; Ljubetic, Ajasja; Hemminga, Marcus A

    2010-03-01

    As proteins are key molecules in living cells, knowledge about their structure can provide important insights and applications in science, biotechnology, and medicine. However, many protein structures are still a big challenge for existing high-resolution structure-determination methods, as can be seen in the number of protein structures published in the Protein Data Bank. This is especially the case for less-ordered, more hydrophobic and more flexible protein systems. The lack of efficient methods for structure determination calls for urgent development of a new class of biophysical techniques. This work attempts to address this problem with a novel combination of site-directed spin labelling electron spin resonance spectroscopy (SDSL-ESR) and protein structure modelling, which is coupled by restriction of the conformational spaces of the amino acid side chains. Comparison of the application to four different protein systems enables us to generalize the new method and to establish a general procedure for determination of protein structure.

  9. Structure and non-structure of centrosomal proteins.

    PubMed

    Dos Santos, Helena G; Abia, David; Janowski, Robert; Mortuza, Gulnahar; Bertero, Michela G; Boutin, Maïlys; Guarín, Nayibe; Méndez-Giraldez, Raúl; Nuñez, Alfonso; Pedrero, Juan G; Redondo, Pilar; Sanz, María; Speroni, Silvia; Teichert, Florian; Bruix, Marta; Carazo, José M; Gonzalez, Cayetano; Reina, José; Valpuesta, José M; Vernos, Isabelle; Zabala, Juan C; Montoya, Guillermo; Coll, Miquel; Bastolla, Ugo; Serrano, Luis

    2013-01-01

    Here we perform a large-scale study of the structural properties and the expression of proteins that constitute the human Centrosome. Centrosomal proteins tend to be larger than generic human proteins (control set), since their genes contain on average more exons (20.3 versus 14.6). They are rich in predicted disordered regions, which cover 57% of their length, compared to 39% in the general human proteome. They also contain several regions that are dually predicted to be disordered and coiled-coil at the same time: 55 proteins (15%) contain disordered and coiled-coil fragments that cover more than 20% of their length. Helices prevail over strands in regions homologous to known structures (47% predicted helical residues against 17% predicted as strands), and even more in the whole centrosomal proteome (52% against 7%), while for control human proteins 34.5% of the residues are predicted as helical and 12.8% are predicted as strands. This difference is mainly due to residues predicted as disordered and helical (30% in centrosomal and 9.4% in control proteins), which may correspond to alpha-helix forming molecular recognition features (α-MoRFs). We performed expression assays for 120 full-length centrosomal proteins and 72 domain constructs that we have predicted to be globular. These full-length proteins are often insoluble: Only 39 out of 120 expressed proteins (32%) and 19 out of 72 domains (26%) were soluble. We built or retrieved structural models for 277 out of 361 human proteins whose centrosomal localization has been experimentally verified. We could not find any suitable structural template with more than 20% sequence identity for 84 centrosomal proteins (23%), for which around 74% of the residues are predicted to be disordered or coiled-coils. The three-dimensional models that we built are available at http://ub.cbm.uam.es/centrosome/models/index.php.

  10. Significance of structural changes in proteins: expected errors in refined protein structures.

    PubMed Central

    Stroud, R. M.; Fauman, E. B.

    1995-01-01

    A quantitative expression key to evaluating significant structural differences or induced shifts between any two protein structures is derived. Because crystallography leads to reports of a single (or sometimes dual) position for each atom, the significance of any structural change based on comparison of two structures depends critically on knowing the expected precision of each median atomic position reported, and on extracting it for each atom, from the information provided in the Protein Data Bank and in the publication. The differences between structures of protein molecules that should be identical, and that are normally distributed, indicating that they are not affected by crystal contacts, were analyzed with respect to many potential indicators of structure precision, so as to extract, essentially by "machine learning" principles, a generally applicable expression involving the highest correlates. Eighteen refined crystal structures from the Protein Data Bank, in which there are multiple molecules in the crystallographic asymmetric unit, were selected and compared. The thermal B factor, the connectivity of the atom, and the ratio of the number of reflections to the number of atoms used in refinement correlate best with the magnitude of the positional differences between regions of the structures that otherwise would be expected to be the same. These results are embodied in a six-parameter equation that can be applied to any crystallographically refined structure to estimate the expected uncertainty in position of each atom. Structure change in a macromolecule can thus be referenced to the expected uncertainty in atomic position as reflected in the variance between otherwise identical structures with the observed values of correlated parameters. PMID:8563637

  11. Advances in Homology Protein Structure Modeling

    PubMed Central

    Xiang, Zhexin

    2007-01-01

    Homology modeling plays a central role in determining protein structure in the structural genomics project. The importance of homology modeling has been steadily increasing because of the large gap that exists between the overwhelming number of available protein sequences and experimentally solved protein structures, and also, more importantly, because of the increasing reliability and accuracy of the method. In fact, a protein sequence with over 30% identity to a known structure can often be predicted with an accuracy equivalent to a low-resolution X-ray structure. The recent advances in homology modeling, especially in detecting distant homologues, aligning sequences with template structures, modeling of loops and side chains, as well as detecting errors in a model, have contributed to reliable prediction of protein structure, which was not possible even several years ago. The ongoing efforts in solving protein structures, which can be time-consuming and often difficult, will continue to spur the development of a host of new computational methods that can fill in the gap and further contribute to understanding the relationship between protein structure and function. PMID:16787261

  12. 3D Protein structure prediction with genetic tabu search algorithm

    PubMed Central

    2010-01-01

    Background Protein structure prediction (PSP) has important applications in different fields, such as drug design, disease prediction, and so on. In protein structure prediction, there are two important issues. The first one is the design of the structure model and the second one is the design of the optimization technology. Because of the complexity of the realistic protein structure, the structure model adopted in this paper is a simplified model, which is called off-lattice AB model. After the structure model is assumed, optimization technology is needed for searching the best conformation of a protein sequence based on the assumed structure model. However, PSP is an NP-hard problem even if the simplest model is assumed. Thus, many algorithms have been developed to solve the global optimization problem. In this paper, a hybrid algorithm, which combines genetic algorithm (GA) and tabu search (TS) algorithm, is developed to complete this task. Results In order to develop an efficient optimization algorithm, several improved strategies are developed for the proposed genetic tabu search algorithm. The combined use of these strategies can improve the efficiency of the algorithm. In these strategies, tabu search introduced into the crossover and mutation operators can improve the local search capability, the adoption of variable population size strategy can maintain the diversity of the population, and the ranking selection strategy can improve the possibility of an individual with low energy value entering into next generation. Experiments are performed with Fibonacci sequences and real protein sequences. Experimental results show that the lowest energy obtained by the proposed GATS algorithm is lower than that obtained by previous methods. Conclusions The hybrid algorithm has the advantages from both genetic algorithm and tabu search algorithm. It makes use of the advantage of multiple search points in genetic algorithm, and can overcome poor hill

  13. Trends in structural coverage of the protein universe and the impact of the Protein Structure Initiative.

    PubMed

    Khafizov, Kamil; Madrid-Aliste, Carlos; Almo, Steven C; Fiser, Andras

    2014-03-11

    The exponential growth of protein sequence data provides an ever-expanding body of unannotated and misannotated proteins. The National Institutes of Health-supported Protein Structure Initiative and related worldwide structural genomics efforts facilitate functional annotation of proteins through structural characterization. Recently there have been profound changes in the taxonomic composition of sequence databases, which are effectively redefining the scope and contribution of these large-scale structure-based efforts. The faster-growing bacterial genomic entries have overtaken the eukaryotic entries over the last 5 y, but also have become more redundant. Despite the enormous increase in the number of sequences, the overall structural coverage of proteins--including proteins for which reliable homology models can be generated--on the residue level has increased from 30% to 40% over the last 10 y. Structural genomics efforts contributed ∼50% of this new structural coverage, despite determining only ∼10% of all new structures. Based on current trends, it is expected that ∼55% structural coverage (the level required for significant functional insight) will be achieved within 15 y, whereas without structural genomics efforts, realizing this goal will take approximately twice as long.

  14. FPGA accelerator for protein secondary structure prediction based on the GOR algorithm

    PubMed Central

    2011-01-01

    Background Protein is an important molecule that performs a wide range of functions in biological systems. Recently, protein folding has attracted much more attention since the function of a protein can generally be derived from its molecular structure. The GOR algorithm is one of the most successful computational methods and has been widely used as an efficient analysis tool to predict secondary structure from protein sequence. However, the execution time is still intolerable with the steep growth in protein databases. Recently, FPGA chips have emerged as one promising application accelerator to accelerate bioinformatics algorithms by exploiting fine-grained custom design. Results In this paper, we propose a complete fine-grained parallel hardware implementation on FPGA to accelerate the GOR-IV package for 2D protein structure prediction. To improve computing efficiency, we partition the parameter table into small segments and access them in parallel. We aggressively exploit data reuse schemes to minimize the need for loading data from external memory. The whole computation structure is carefully pipelined to overlap the sequence loading, computing and back-writing operations as much as possible. We implemented a complete GOR desktop system based on an FPGA chip XC5VLX330. Conclusions The experimental results show a speedup factor of more than 430x over the original GOR-IV version and 110x speedup over the optimized version with multi-thread SIMD implementation running on a PC platform with AMD Phenom 9650 Quad CPU for 2D protein structure prediction. However, the power consumption is only about 30% of that of current general-purpose CPUs. PMID:21342582

  15. PDBFlex: exploring flexibility in protein structures

    PubMed Central

    Hrabe, Thomas; Li, Zhanwen; Sedova, Mayya; Rotkiewicz, Piotr; Jaroszewski, Lukasz; Godzik, Adam

    2016-01-01

    The PDBFlex database, available freely and with no login requirements at http://pdbflex.org, provides information on flexibility of protein structures as revealed by the analysis of variations between depositions of different structural models of the same protein in the Protein Data Bank (PDB). PDBFlex collects information on all instances of such depositions, identifying them by a 95% sequence identity threshold, performs analysis of their structural differences and clusters them according to their structural similarities for easy analysis. The PDBFlex contains tools and viewers enabling in-depth examination of structural variability including: 2D-scaling visualization of RMSD distances between structures of the same protein, graphs of average local RMSD in the aligned structures of protein chains, graphical presentation of differences in secondary structure and observed structural disorder (unresolved residues), difference distance maps between all sets of coordinates and 3D views of individual structures and simulated transitions between different conformations, the latter displayed using JSMol visualization software. PMID:26615193

  16. Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring

    PubMed Central

    2012-01-01

    Background Much progress has been made in understanding the 3D structure of proteins using methods such as NMR and X-ray crystallography. The resulting 3D structures are extremely informative, but do not always reveal which sites and residues within the structure are of special importance. Recently, there are indications that multiple-residue, sub-domain structural relationships within the larger 3D consensus structure of a protein can be inferred from the analysis of the multiple sequence alignment data of a protein family. These intra-dependent clusters of associated sites are used to indicate hierarchical inter-residue relationships within the 3D structure. To reveal the patterns of associations among individual amino acids or sub-domain components within the structure, we apply a k-modes attribute (aligned site) clustering algorithm to the ubiquitin and transthyretin families in order to discover associations among groups of sites within the multiple sequence alignment. We then observe what these associations imply within the 3D structure of these two protein families. Results The k-modes site clustering algorithm we developed maximizes the intra-group interdependencies based on a normalized mutual information measure. The clusters formed correspond to sub-structural components or binding and interface locations. Applying this data-directed method to the ubiquitin and transthyretin protein family multiple sequence alignments as a test bed, we located numerous interesting associations of interdependent sites. These clusters were then arranged into cluster tree diagrams which revealed four structural sub-domains within the single domain structure of ubiquitin and a single large sub-domain within transthyretin associated with the interface among transthyretin monomers. In addition, several clusters of mutually interdependent sites were discovered for each protein family, each of which appear to play an important role in the molecular structure and/or function

  17. Trends in structural coverage of the protein universe and the impact of the Protein Structure Initiative

    PubMed Central

    Khafizov, Kamil; Madrid-Aliste, Carlos; Almo, Steven C.; Fiser, Andras

    2014-01-01

    The exponential growth of protein sequence data provides an ever-expanding body of unannotated and misannotated proteins. The National Institutes of Health-supported Protein Structure Initiative and related worldwide structural genomics efforts facilitate functional annotation of proteins through structural characterization. Recently there have been profound changes in the taxonomic composition of sequence databases, which are effectively redefining the scope and contribution of these large-scale structure-based efforts. The faster-growing bacterial genomic entries have overtaken the eukaryotic entries over the last 5 y, but also have become more redundant. Despite the enormous increase in the number of sequences, the overall structural coverage of proteins—including proteins for which reliable homology models can be generated—on the residue level has increased from 30% to 40% over the last 10 y. Structural genomics efforts contributed ∼50% of this new structural coverage, despite determining only ∼10% of all new structures. Based on current trends, it is expected that ∼55% structural coverage (the level required for significant functional insight) will be achieved within 15 y, whereas without structural genomics efforts, realizing this goal will take approximately twice as long. PMID:24567391

  18. Building protein-protein interaction networks for Leishmania species through protein structural information.

    PubMed

    Dos Santos Vasconcelos, Crhisllane Rafaele; de Lima Campos, Túlio; Rezende, Antonio Mauro

    2018-03-06

    Systematic analysis of a parasite interactome is a key approach to understand different biological processes. It makes it possible to elucidate disease mechanisms, to predict protein functions and to select promising targets for drug development. Currently, several approaches for protein interaction prediction for non-model species incorporate only small fractions of the entire proteomes and their interactions. Based on this perspective, this study presents an integration of computational methodologies, protein network predictions and comparative analysis of the protozoan species Leishmania braziliensis and Leishmania infantum. These parasites cause Leishmaniasis, a worldwide distributed and neglected disease, with limited treatment options using currently available drugs. The predicted interactions were obtained from a meta-approach, applying rigid body docking tests and template-based docking on protein structures predicted by different comparative modeling techniques. In addition, we trained a machine-learning algorithm (Gradient Boosting) using docking information performed on a curated set of positive and negative protein interaction data. Our final model obtained an AUC = 0.88, with recall = 0.69, specificity = 0.88 and precision = 0.83. Using this approach, it was possible to confidently predict 681 protein structures and 6198 protein interactions for L. braziliensis, and 708 protein structures and 7391 protein interactions for L. infantum. The predicted networks were integrated to protein interaction data already available, analyzed using several topological features and used to classify proteins as essential for network stability. The present study demonstrated the importance of integrating different methodologies of interaction prediction to increase the coverage of the protein interaction of the studied protocols, and it made available protein structures and interactions not previously reported.
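
    The recall, specificity and precision reported above are standard ratios of confusion-matrix counts. A minimal sketch of those definitions follows; the function name `classification_metrics` is hypothetical, the counts in the usage note are arbitrary illustration values and not data from the study, and AUC is omitted because it requires ranked prediction scores rather than counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Recall, specificity and precision from confusion-matrix counts.

    tp/fp: predicted interactions that are true/false.
    tn/fn: predicted non-interactions that are true/false.
    Assumes every denominator below is non-zero.
    """
    return {
        "recall": tp / (tp + fn),        # share of real interactions recovered
        "specificity": tn / (tn + fp),   # share of non-interactions rejected
        "precision": tp / (tp + fp),     # share of predictions that are real
    }
```

    For example, `classification_metrics(tp=8, fp=2, tn=6, fn=4)` gives a recall of 8/12, a specificity of 6/8 and a precision of 8/10.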

  19. Quantification of the Influence of Protein-Protein Interactions on Adsorbed Protein Structure and Bioactivity

    PubMed Central

    Wei, Yang; Thyparambil, Aby A.; Latour, Robert A.

    2013-01-01

    While protein-surface interactions have been widely studied, relatively little is understood at this time regarding how protein-surface interaction effects are influenced by protein-protein interactions and how these effects combine with the internal stability of a protein to influence its adsorbed-state structure and bioactivity. The objectives of this study were to develop a method to study these combined effects under widely varying protein-protein interaction conditions using hen egg-white lysozyme (HEWL) adsorbed on silica glass, poly(methyl methacrylate), and polyethylene as our model systems. In order to vary protein-protein interaction effects over a wide range, HEWL was first adsorbed to each surface type under widely varying protein solution concentrations for 2 h to saturate the surface, followed by immersion in pure buffer solution for 15 h to equilibrate the adsorbed protein layers in the absence of additionally adsorbing protein. Periodic measurements were made at selected time points of the areal density of the adsorbed protein layer as an indicator of the level of protein-protein interaction effects within the layer, and these values were then correlated with measurements of the adsorbed protein’s secondary structure and bioactivity. The results from these studies indicate that protein-protein interaction effects help stabilize the structure of HEWL adsorbed on silica glass, have little influence on the structural behavior of HEWL on HDPE, and actually serve to destabilize HEWL’s structure on PMMA. The bioactivity of HEWL on silica glass and HDPE was found to decrease in direct proportion to the degree of adsorption-induced protein unfolding. A direct correlation between bioactivity and the conformational state of adsorbed HEWL was less apparent on PMMA, thus suggesting that other factors influenced HEWL’s bioactivity on this surface, such as the accessibility of HEWL’s bioactive site being blocked by neighboring proteins or the surface

  20. Mixture models for protein structure ensembles.

    PubMed

    Hirsch, Michael; Habeck, Michael

    2008-10-01

    Protein structure ensembles provide important insight into the dynamics and function of a protein and contain information that is not captured with a single static structure. However, it is not clear a priori to what extent the variability within an ensemble is caused by internal structural changes. Additional variability results from overall translations and rotations of the molecule. And most experimental data do not provide information to relate the structures to a common reference frame. To report meaningful values of intrinsic dynamics, structural precision, conformational entropy, etc., it is therefore important to disentangle local from global conformational heterogeneity. We consider the task of disentangling local from global heterogeneity as an inference problem. We use probabilistic methods to infer from the protein ensemble missing information on reference frames and stable conformational sub-states. To this end, we model a protein ensemble as a mixture of Gaussian probability distributions of either entire conformations or structural segments. We learn these models from a protein ensemble using the expectation-maximization algorithm. Our first model can be used to find multiple conformers in a structure ensemble. The second model partitions the protein chain into locally stable structural segments or core elements and less structured regions typically found in loops. Both models are simple to implement and contain only a single free parameter: the number of conformers or structural segments. Our models can be used to analyse experimental ensembles, molecular dynamics trajectories and conformational change in proteins. The Python source code for protein ensemble analysis is available from the authors upon request.
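
    The expectation-maximization step mentioned in the abstract above can be illustrated with a much simpler model. The following is a minimal sketch, assuming a one-dimensional Gaussian mixture with user-supplied starting means. It omits the rigid-body superposition (translations and rotations) that the authors treat as missing data, and the function name is hypothetical, not taken from their Python code.

```python
import math

def em_gmm_1d(data, means, n_iter=50):
    """Fit a 1D Gaussian mixture by EM, starting from the given means."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            dens = [
                w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300  # guard against complete underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps the density finite
    return weights, means, variances
```

    As in the abstract, the only free parameter is the number of components, set here by the length of the initial `means` list.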

  1. SSEP: secondary structural elements of proteins

    PubMed Central

    Shanthi, V.; Selvarani, P.; Kiran Kumar, Ch.; Mohire, C. S.; Sekar, K.

    2003-01-01

    SSEP is a comprehensive resource for accessing information related to the secondary structural elements present in the 25 and 90% non-redundant protein chains. The database contains 1771 protein chains from 1670 protein structures and 6182 protein chains from 5425 protein structures in 25 and 90% non-redundant protein chains, respectively. The current version provides information about the α-helical segments and β-strand fragments of varying lengths. In addition, it also contains the information about 3₁₀-helix, β- and ν-turns and hairpin loops. The free graphics program RASMOL has been interfaced with the search engine to visualize the three-dimensional structures of the user-queried secondary structural fragment. The database is updated regularly and is available through Bioinformatics web server at http://cluster.physics.iisc.ernet.in/ssep/ or http://144.16.71.148/ssep/. PMID:12824336

  2. Protein Structure Determination using Metagenome sequence data

    PubMed Central

    Ovchinnikov, Sergey; Park, Hahnbeom; Varghese, Neha; Huang, Po-Ssu; Pavlopoulos, Georgios A.; Kim, David E.; Kamisetty, Hetunandan; Kyrpides, Nikos C.; Baker, David

    2017-01-01

    Despite decades of work by structural biologists, there are still ~5200 protein families with unknown structure outside the range of comparative modeling. We show that Rosetta structure prediction guided by residue-residue contacts inferred from evolutionary information can accurately model proteins that belong to large families, and that metagenome sequence data more than triples the number of protein families with sufficient sequences for accurate modeling. We then integrate metagenome data, contact based structure matching and Rosetta structure calculations to generate models for 614 protein families with currently unknown structures; 206 are membrane proteins and 137 have folds not represented in the PDB. This approach provides the representative models for large protein families originally envisioned as the goal of the protein structure initiative at a fraction of the cost. PMID:28104891

  3. Structural anatomy of telomere OB proteins.

    PubMed

    Horvath, Martin P

    2011-10-01

    Telomere DNA-binding proteins protect the ends of chromosomes in eukaryotes. A subset of these proteins are constructed with one or more OB folds and bind with G+T-rich single-stranded DNA found at the extreme termini. The resulting DNA-OB protein complex interacts with other telomere components to coordinate critical telomere functions of DNA protection and DNA synthesis. While the first crystal and NMR structures readily explained protection of telomere ends, the picture of how single-stranded DNA becomes available to serve as primer and template for synthesis of new telomere DNA is only recently coming into focus. New structures of telomere OB fold proteins alongside insights from genetic and biochemical experiments have made significant contributions towards understanding how protein-binding OB proteins collaborate with DNA-binding OB proteins to recruit telomerase and DNA polymerase for telomere homeostasis. This review surveys telomere OB protein structures alongside highly comparable structures derived from replication protein A (RPA) components, with the goal of providing a molecular context for understanding telomere OB protein evolution and mechanism of action in protection and synthesis of telomere DNA.

  4. An Interactive Introduction to Protein Structure

    ERIC Educational Resources Information Center

    Lee, W. Theodore

    2004-01-01

    To improve student understanding of protein structure and the significance of noncovalent interactions in protein structure and function, students are assigned a project to write a paper complemented with computer-generated images. The assignment provides an opportunity for students to select a protein structure that is of interest and detail…

  5. Ensemble-based evaluation for protein structure models.

    PubMed

    Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke

    2016-06-15

    Comparing protein tertiary structures is a fundamental procedure in structural biology and protein bioinformatics. Structure comparison is important particularly for evaluating computational protein structure models. Most of the model structure evaluation methods perform rigid body superimposition of a structure model to its crystal structure and measure the difference of the corresponding residue or atom positions between them. However, these methods neglect intrinsic flexibility of proteins by treating the native structure as a rigid molecule. Because different parts of proteins have different levels of flexibility, for example, exposed loop regions are usually more flexible than the core region of a protein structure, disagreement of a model to the native needs to be evaluated differently depending on the flexibility of residues in a protein. We propose a score named FlexScore for comparing protein structures that considers flexibility of each residue in the native state of proteins. Flexibility information may be extracted from experiments such as NMR or molecular dynamics simulation. FlexScore considers an ensemble of conformations of a protein described as a multivariate Gaussian distribution of atomic displacements and compares a query computational model with the ensemble. We compare FlexScore with other commonly used structure similarity scores over various examples. FlexScore agrees with experts' intuitive assessment of computational models and provides information of practical usefulness of models. Availability and implementation: https://bitbucket.org/mjamroz/flexscore Contact: dkihara@purdue.edu Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  6. Ensemble-based evaluation for protein structure models

    PubMed Central

    Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke

    2016-01-01

    Motivation: Comparing protein tertiary structures is a fundamental procedure in structural biology and protein bioinformatics. Structure comparison is important particularly for evaluating computational protein structure models. Most of the model structure evaluation methods perform rigid body superimposition of a structure model to its crystal structure and measure the difference of the corresponding residue or atom positions between them. However, these methods neglect intrinsic flexibility of proteins by treating the native structure as a rigid molecule. Because different parts of proteins have different levels of flexibility, for example, exposed loop regions are usually more flexible than the core region of a protein structure, disagreement of a model to the native needs to be evaluated differently depending on the flexibility of residues in a protein. Results: We propose a score named FlexScore for comparing protein structures that considers flexibility of each residue in the native state of proteins. Flexibility information may be extracted from experiments such as NMR or molecular dynamics simulation. FlexScore considers an ensemble of conformations of a protein described as a multivariate Gaussian distribution of atomic displacements and compares a query computational model with the ensemble. We compare FlexScore with other commonly used structure similarity scores over various examples. FlexScore agrees with experts’ intuitive assessment of computational models and provides information of practical usefulness of models. Availability and implementation: https://bitbucket.org/mjamroz/flexscore Contact: dkihara@purdue.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307633

  7. Beta-structures in fibrous proteins.

    PubMed

    Kajava, Andrey V; Squire, John M; Parry, David A D

    2006-01-01

    The beta-form of protein folding, one of the earliest protein structures to be defined, was originally observed in studies of silks. It was then seen in early studies of synthetic polypeptides and, of course, is now known to be present in a variety of guises as an essential component of globular protein structures. However, in the last decade or so it has become clear that the beta-conformation of chains is present not only in many of the amyloid structures associated with, for example, Alzheimer's Disease, but also in the prion structures associated with the spongiform encephalopathies. Furthermore, X-ray crystallography studies have revealed the high incidence of the beta-fibrous proteins among virulence factors of pathogenic bacteria and viruses. Here we describe the basic forms of the beta-fold, summarize the many different new forms of beta-structural fibrous arrangements that have been discovered, and review advances in structural studies of amyloid and prion fibrils. These and other issues are described in detail in later chapters.

  8. Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL): adapting the Partial Phylogenetic Profiling algorithm to scan sequences for signatures that predict protein function

    PubMed Central

    2010-01-01

    Background Comparative genomics methods such as phylogenetic profiling can mine powerful inferences from inherently noisy biological data sets. We introduce Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL), a method that applies the Partial Phylogenetic Profiling (PPP) approach locally within a protein sequence to discover short sequence signatures associated with functional sites. The approach is based on the basic scoring mechanism employed by PPP, namely the use of binomial distribution statistics to optimize sequence similarity cutoffs during searches of partitioned training sets. Results Here we illustrate and validate the ability of the SIMBAL method to find functionally relevant short sequence signatures by application to two well-characterized protein families. In the first example, we partitioned a family of ABC permeases using a metabolic background property (urea utilization). Thus, the TRUE set for this family comprised members whose genome of origin encoded a urea utilization system. By moving a sliding window across the sequence of a permease, and searching each subsequence in turn against the full set of partitioned proteins, the method found which local sequence signatures best correlated with the urea utilization trait. Mapping of SIMBAL "hot spots" onto crystal structures of homologous permeases reveals that the significant sites are gating determinants on the cytosolic face rather than, say, docking sites for the substrate-binding protein on the extracellular face. In the second example, we partitioned a protein methyltransferase family using gene proximity as a criterion. In this case, the TRUE set comprised those methyltransferases encoded near the gene for the substrate RF-1. SIMBAL identifies sequence regions that map onto the substrate-binding interface while ignoring regions involved in the methyltransferase reaction mechanism in general. Neither method for training set construction requires any prior experimental

  9. Dissecting protein loops with a statistical scalpel suggests a functional implication of some structural motifs

    PubMed Central

    2011-01-01

    Background One of the strategies for protein function annotation is to search for particular structural motifs that are known to be shared by proteins with a given function. Results Here, we present a systematic extraction of structural motifs of seven residues from protein loops and we explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet), which allows simplification of protein structures into uni-dimensional sequences, and advanced pattern statistics adapted to short sequences. Structural motifs of interest are selected by looking for structural motifs significantly over-represented in SCOP superfamilies in protein loops. We discovered two types of structural motifs significantly over-represented in SCOP superfamilies: (i) ubiquitous motifs, shared by several superfamilies and (ii) superfamily-specific motifs, over-represented in few superfamilies. A comparison of ubiquitous words with known small structural motifs shows that they contain well-described motifs such as turn, niche or nest motifs. A comparison between superfamily-specific motifs and biological annotations of Swiss-Prot reveals that some of them actually correspond to functional sites involved in the binding sites of small ligands, such as ATP/GTP, NAD(P) and SAH/SAM. Conclusions Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins. PMID:21689388

  10. The interface of protein structure, protein biophysics, and molecular evolution

    PubMed Central

    Liberles, David A; Teichmann, Sarah A; Bahar, Ivet; Bastolla, Ugo; Bloom, Jesse; Bornberg-Bauer, Erich; Colwell, Lucy J; de Koning, A P Jason; Dokholyan, Nikolay V; Echave, Julian; Elofsson, Arne; Gerloff, Dietlind L; Goldstein, Richard A; Grahnen, Johan A; Holder, Mark T; Lakner, Clemens; Lartillot, Nicholas; Lovell, Simon C; Naylor, Gavin; Perica, Tina; Pollock, David D; Pupko, Tal; Regan, Lynne; Roger, Andrew; Rubinstein, Nimrod; Shakhnovich, Eugene; Sjölander, Kimmen; Sunyaev, Shamil; Teufel, Ashley I; Thorne, Jeffrey L; Thornton, Joseph W; Weinreich, Daniel M; Whelan, Simon

    2012-01-01

    Abstract The interface of protein structural biology, protein biophysics, molecular evolution, and molecular population genetics forms the foundations for a mechanistic understanding of many aspects of protein biochemistry. Current efforts in interdisciplinary protein modeling are in their infancy and the state of the art of such models is described. Beyond the relationship between amino acid substitution and static protein structure, protein function, and corresponding organismal fitness, other considerations are also discussed. More complex mutational processes such as insertion and deletion and domain rearrangements and even circular permutations should be evaluated. The role of intrinsically disordered proteins is still controversial, but may be increasingly important to consider. Protein geometry and protein dynamics as a deviation from static considerations of protein structure are also important. Protein expression level is known to be a major determinant of evolutionary rate and several considerations including selection at the mRNA level and the role of interaction specificity are discussed. Lastly, the relationship between modeling and needed high-throughput experimental data as well as experimental examination of protein evolution using ancestral sequence resurrection and in vitro biochemistry are presented, towards an aim of ultimately generating better models for biological inference and prediction. PMID:22528593

  11. Comparative Protein Structure Modeling Using MODELLER.

    PubMed

    Webb, Benjamin; Sali, Andrej

    2014-09-08

    Functional characterization of a protein sequence is one of the most frequent problems in biology. This task is usually facilitated by accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described. Copyright © 2014 John Wiley & Sons, Inc.

  12. Structural anatomy of telomere OB proteins

    PubMed Central

    Horvath, Martin P.

    2015-01-01

    Telomere DNA-binding proteins protect the ends of chromosomes in eukaryotes. A subset of these proteins are constructed with one or more OB folds and bind with G+T-rich single-stranded DNA found at the extreme termini. The resulting DNA-OB protein complex interacts with other telomere components to coordinate critical telomere functions of DNA protection and DNA synthesis. While the first crystal and NMR structures readily explained protection of telomere ends, the picture of how single-stranded DNA becomes available to serve as primer and template for synthesis of new telomere DNA is only recently coming into focus. New structures of telomere OB fold proteins alongside insights from genetic and biochemical experiments have made significant contributions towards understanding how protein-binding OB proteins collaborate with DNA-binding OB proteins to recruit telomerase and DNA polymerase for telomere homeostasis. This review surveys telomere OB protein structures alongside highly comparable structures derived from replication protein A (RPA) components, with the goal of providing a molecular context for understanding telomere OB protein evolution and mechanism of action in protection and synthesis of telomere DNA. PMID:21950380

  13. Protein Molecular Structures, Protein SubFractions, and Protein Availability Affected by Heat Processing: A Review

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yu, P.

    2007-01-01

    The utilization and availability of protein depended on the types of protein and their specific susceptibility to enzymatic hydrolysis (inhibitory activities) in the gastrointestinal tract and was highly associated with protein molecular structures. Studying internal protein structure and protein subfraction profiles led to an understanding of the components that make up a whole protein. An understanding of the molecular structure of the whole protein was often vital to understanding its digestive behavior and nutritive value in animals. In this review, recently obtained information on protein molecular structural effects of heat processing was reviewed, in relation to protein characteristics affecting digestive behavior and nutrient utilization and availability. The emphasis of this review was on (1) using the newly advanced synchrotron technology (S-FTIR) as a novel approach to reveal protein molecular chemistry affected by heat processing within intact plant tissues; (2) revealing the effects of heat processing on the profile changes of protein subfractions associated with digestive behaviors and kinetics manipulated by heat processing; (3) prediction of the changes of protein availability and supply after heat processing, using the advanced DVE/OEB and NRC-2001 models, and (4) obtaining information on optimal processing conditions of protein as intestinal protein source to achieve target values for potential high net absorbable protein in the small intestine. The information described in this article may give better insight into the mechanisms involved and the intrinsic protein molecular structural changes occurring upon processing.

  14. Fourier-based classification of protein secondary structures.

    PubMed

    Shu, Jian-Jun; Yong, Kian Yan

    2017-04-15

    The correct prediction of protein secondary structures is one of the key issues in predicting the correct protein folded shape, which is used for determining gene function. Existing methods make use of amino acid properties as indices to classify protein secondary structures, but are faced with a significant number of misclassifications. The paper presents a technique for the classification of protein secondary structures based on protein "signal-plotting" and the use of the Fourier technique for digital signal processing. New indices are proposed to classify protein secondary structures by analyzing hydrophobicity profiles. The approach is simple and straightforward. Results show that more types of protein secondary structures can be classified by means of these newly-proposed indices. Copyright © 2017 Elsevier Inc. All rights reserved.
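
    The general idea of treating a hydrophobicity profile as a signal can be sketched with a plain discrete Fourier transform. The abstract above does not name a hydrophobicity scale or define its new indices, so the Kyte-Doolittle scale and the "dominant period" readout below are assumptions chosen for illustration, not the authors' indices.

```python
import cmath

# Kyte-Doolittle hydropathy values (an assumed choice of scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def power_spectrum(seq):
    """Return (period in residues, power) pairs for a sequence's profile."""
    profile = [KD[a] for a in seq]
    n = len(profile)
    mean = sum(profile) / n
    centered = [h - mean for h in profile]  # remove the zero-frequency term
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            h * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, h in enumerate(centered)
        )
        spectrum.append((n / k, abs(coeff) ** 2))
    return spectrum

def dominant_period(seq):
    """Period (in residues) at which the hydrophobicity signal is strongest."""
    return max(power_spectrum(seq), key=lambda pair: pair[1])[0]
```

    A strictly alternating hydrophobic/polar sequence such as "VKVKVKVKVKVK" has its power concentrated at a period of two residues, the pattern usually associated with amphipathic β-strands, whereas amphipathic α-helices tend to peak near 3.6 residues.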

  15. Structure based alignment and clustering of proteins (STRALCP)

    DOEpatents

    Zemla, Adam T.; Zhou, Carol E.; Smith, Jason R.; Lam, Marisa W.

    2013-06-18

    Disclosed are computational methods of clustering a set of protein structures based on local and pair-wise global similarity values. Pair-wise local and global similarity values are generated based on pair-wise structural alignments for each protein in the set of protein structures. Initially, the protein structures are clustered based on pair-wise local similarity values. The protein structures are then clustered based on pair-wise global similarity values. For each given cluster both a representative structure and spans of conserved residues are identified. The representative protein structure is used to assign newly-solved protein structures to a group. The spans are used to characterize conservation and assign a "structural footprint" to the cluster.

  16. Reduction of background clutter in structured lighting systems

    DOEpatents

    Carlson, Jeffrey J.; Giles, Michael K.; Padilla, Denise D.; Davidson, Jr., Patrick A.; Novick, David K.; Wilson, Christopher W.

    2010-06-22

    Methods for segmenting the reflected light of an illumination source having a characteristic wavelength from background illumination (i.e. clutter) in structured lighting systems can comprise pulsing the light source used to illuminate a scene, pulsing the light source synchronously with the opening of a shutter in an imaging device, estimating the contribution of background clutter by interpolation of images of the scene collected at multiple spectral bands not including the characteristic wavelength and subtracting the estimated background contribution from an image of the scene comprising the wavelength of the light source and, placing a polarizing filter between the imaging device and the scene, where the illumination source can be polarized in the same orientation as the polarizing filter. Apparatus for segmenting the light of an illumination source from background illumination can comprise an illuminator, an image receiver for receiving images of multiple spectral bands, a processor for calculations and interpolations, and a polarizing filter.

  17. Efficient protein structure search using indexing methods

    PubMed Central

    2013-01-01

    Understanding functions of proteins is one of the most important challenges in many studies of biological processes. The function of a protein can be predicted by analyzing the functions of structurally similar proteins, thus finding structurally similar proteins accurately and efficiently from a large set of proteins is crucial. A protein structure can be represented as a vector by 3D-Zernike Descriptor (3DZD) which compactly represents the surface shape of the protein tertiary structure. This simplified representation accelerates the searching process. However, computing the similarity of two protein structures is still computationally expensive, thus it is hard to efficiently process many simultaneous search requests for structurally similar proteins. This paper proposes indexing techniques which substantially reduce the search time to find structurally similar proteins. In particular, we first exploit two indexing techniques, i.e., iDistance and iKernel, on the 3DZDs. After that, we extend the techniques to further improve the search speed for protein structures. The extended indexing techniques build and utilize a reduced index constructed from the first few attributes of 3DZDs of protein structures. To retrieve top-k similar structures, top-10 × k similar structures are first found using the reduced index, and top-k structures are selected among them. We also modify the indexing techniques to support θ-based nearest neighbor search, which returns data points at a distance less than θ from the query point. The results show that both iDistance and iKernel significantly enhance the searching speed. In top-k nearest neighbor search, the searching time is reduced by 69.6%, 77%, 77.4% and 87.9% using iDistance, iKernel, the extended iDistance, and the extended iKernel, respectively. In θ-based nearest neighbor search, the searching time is reduced by 80%, 81%, 95.6% and 95.6% using iDistance, iKernel, the extended iDistance, and the extended iKernel, respectively. PMID:23691543

  18. Efficient protein structure search using indexing methods.

    PubMed

    Kim, Sungchul; Sael, Lee; Yu, Hwanjo

    2013-01-01

    Understanding functions of proteins is one of the most important challenges in many studies of biological processes. The function of a protein can be predicted by analyzing the functions of structurally similar proteins, thus finding structurally similar proteins accurately and efficiently from a large set of proteins is crucial. A protein structure can be represented as a vector by 3D-Zernike Descriptor (3DZD) which compactly represents the surface shape of the protein tertiary structure. This simplified representation accelerates the searching process. However, computing the similarity of two protein structures is still computationally expensive, thus it is hard to efficiently process many simultaneous search requests for structurally similar proteins. This paper proposes indexing techniques which substantially reduce the search time to find structurally similar proteins. In particular, we first exploit two indexing techniques, i.e., iDistance and iKernel, on the 3DZDs. After that, we extend the techniques to further improve the search speed for protein structures. The extended indexing techniques build and utilize a reduced index constructed from the first few attributes of 3DZDs of protein structures. To retrieve top-k similar structures, top-10 × k similar structures are first found using the reduced index, and top-k structures are selected among them. We also modify the indexing techniques to support θ-based nearest neighbor search, which returns data points at a distance less than θ from the query point. The results show that both iDistance and iKernel significantly enhance the searching speed. In top-k nearest neighbor search, the searching time is reduced by 69.6%, 77%, 77.4% and 87.9% using iDistance, iKernel, the extended iDistance, and the extended iKernel, respectively. In θ-based nearest neighbor search, the searching time is reduced by 80%, 81%, 95.6% and 95.6% using iDistance, iKernel, the extended iDistance, and the extended iKernel, respectively.
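
    The two-stage retrieval described in the abstract above (find the top 10 × k candidates on a reduced representation, then select the top k among them) can be sketched as a filter-and-refine search. This sketch uses a plain linear scan and Euclidean distance in place of the iDistance and iKernel indexes, and the function names and the default of three leading attributes are assumptions made for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k_filter_refine(query, vectors, k, n_reduced=3, factor=10):
    """Return indices of the k vectors closest to query, via a reduced filter.

    Stage 1 ranks all vectors using only their first n_reduced attributes
    and keeps factor * k candidates. Stage 2 re-ranks those candidates
    with the full vectors.
    """
    reduced_query = query[:n_reduced]
    candidates = sorted(
        range(len(vectors)),
        key=lambda i: euclidean(reduced_query, vectors[i][:n_reduced]),
    )[: factor * k]
    return sorted(candidates, key=lambda i: euclidean(query, vectors[i]))[:k]
```

    The candidate pool size trades accuracy for speed: a vector that looks close on its leading attributes but is far away overall can displace a true neighbor when the pool is too small, which is why the filter keeps more candidates than are finally returned.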

  19. A 'periodic table' for protein structures.

    PubMed

    Taylor, William R

    2002-04-11

    Current structural genomics programs aim systematically to determine the structures of all proteins coded in both human and other genomes, providing a complete picture of the number and variety of protein structures that exist. In the past, estimates have been made on the basis of the incomplete sample of structures currently known. These estimates have varied greatly (between 1,000 and 10,000; see for example refs 1 and 2), partly because of limited sample size but also owing to the difficulties of distinguishing one structure from another. This distinction is usually topological, based on the fold of the protein; however, in strict topological terms (neglecting to consider intra-chain cross-links), protein chains are open strings and hence are all identical. To avoid this trivial result, topologies are determined by considering secondary links in the form of intra-chain hydrogen bonds (secondary structure) and tertiary links formed by the packing of secondary structures. However, small additions to or loss of structure can make large changes to these perceived topologies and such subjective solutions are neither robust nor amenable to automation. Here I formalize both secondary and tertiary links to allow the rigorous and automatic definition of protein topology.

  20. Protein structure recognition: From eigenvector analysis to structural threading method

    NASA Astrophysics Data System (ADS)

    Cao, Haibo

    In this work, we try to understand the protein folding problem using pair-wise hydrophobic interaction as the dominant interaction for the protein folding process. We found a strong correlation between amino acid sequence and the corresponding native structure of the protein. Applications of this correlation discussed in this dissertation include domain partitioning and a new structural threading method, as well as the performance of this method in the CASP5 competition. In the first part, we give a brief introduction to the protein folding problem. Some essential knowledge and progress from other research groups are discussed. This part includes discussions of interactions among amino acid residues, the lattice HP model, and the designability principle. In the second part, we try to establish the correlation between amino acid sequence and the corresponding native structure of the protein. This correlation was observed in our eigenvector study of the protein contact matrix. We believe the correlation is universal, thus it can be used in automatic partition of protein structures into folding domains. In the third part, we discuss a threading method based on the correlation between amino acid sequence and the dominant eigenvector of the structure contact-matrix. A mathematically straightforward iteration scheme provides a self-consistent optimum global sequence-structure alignment. The computational efficiency of this method makes it possible to search whole protein structure databases for structural homology without relying on sequence similarity. The sensitivity and specificity of this method are discussed, along with a case of blind test prediction. In the appendix, we list the overall performance of this threading method in CASP5 blind test in comparison with other existing approaches.
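
    As a rough illustration of the quantity this abstract relies on, the dominant eigenvector of a residue contact matrix can be obtained by power iteration. The sketch below is not the dissertation's implementation: the 6.5 Å cutoff, the binary contact definition and the function names are assumptions made for the example.

```python
import math


def contact_matrix(coords, cutoff=6.5):
    """Binary contact matrix: 1.0 if two distinct residues lie within `cutoff`.

    `coords` is a list of (x, y, z) positions, e.g. one point per residue.
    The cutoff value is an assumed, illustrative choice.
    """
    n = len(coords)
    return [
        [
            1.0 if i != j and math.dist(coords[i], coords[j]) <= cutoff else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


def dominant_eigenvector(matrix, iterations=500, tol=1e-12):
    """Power iteration for the eigenvector of the largest eigenvalue.

    The iteration uses (A + I) instead of A. Both share eigenvectors, but the
    shift avoids oscillation when A also has an eigenvalue of equal magnitude
    and opposite sign (as for chain-like, bipartite contact graphs).
    """
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) + v[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        if max(abs(a - b) for a, b in zip(w, v)) < tol:
            return w
        v = w
    return v


if __name__ == "__main__":
    # Three points 3.8 apart on a line: contacts 0-1 and 1-2, but not 0-2.
    chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    print(dominant_eigenvector(contact_matrix(chain)))
```

    The largest components of the resulting vector mark the most highly connected positions, which is the kind of structural profile the abstract correlates with the amino acid sequence.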

  1. TAP score: torsion angle propensity normalization applied to local protein structure evaluation

    PubMed Central

    Tosatto, Silvio CE; Battistutta, Roberto

    2007-01-01

    Background Experimentally determined protein structures may contain errors and require validation. Conformational criteria based on the Ramachandran plot are mainly used to distinguish between distorted and adequately refined models. While the readily available criteria are sufficient to detect totally wrong structures, establishing the more subtle differences between plausible structures remains more challenging. Results A new criterion, called TAP score, measuring local sequence to structure fitness based on torsion angle propensities normalized against the global minimum and maximum is introduced. It is shown to be more accurate than previous methods at estimating the validity of a protein model in terms of commonly used experimental quality parameters on two test sets representing the full PDB database and a subset of obsolete PDB structures. Highly selective TAP thresholds are derived to recognize over 90% of the top experimental structures in the absence of experimental information. Both a web server and an executable version of the TAP score are available at . Conclusion A novel procedure for energy normalization (TAP) has significantly improved the possibility to recognize the best experimental structures. It will allow the user to more reliably isolate problematic structures in the context of automated experimental structure determination. PMID:17504537
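
    The phrase "normalized against the global minimum and maximum" can be read as a min-max normalization of a summed propensity score. The sketch below shows that generic idea only. It is not the published TAP implementation; the function name and the way the per-residue bounds are supplied are assumptions.

```python
def min_max_normalized_score(observed, minima, maxima):
    """Scale a summed per-residue score to the range [0, 1].

    observed: propensity of each residue's actual torsion angles
    minima:   lowest propensity attainable for each residue (hypothetical input)
    maxima:   highest propensity attainable for each residue (hypothetical input)
    """
    total = sum(observed)
    lowest = sum(minima)
    highest = sum(maxima)
    if highest == lowest:
        raise ValueError("score range is empty; cannot normalize")
    return (total - lowest) / (highest - lowest)


if __name__ == "__main__":
    # Two residues with made-up propensity values.
    print(min_max_normalized_score([0.5, 1.0], [0.0, 0.0], [1.0, 2.0]))
```

    Normalizing in this way makes scores comparable across sequences of different length and composition, since each is expressed relative to the best and worst values that sequence could reach.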

  2. Structures composing protein domains.

    PubMed

    Kubrycht, Jaroslav; Sigler, Karel; Souček, Pavel; Hudeček, Jiří

    2013-08-01

    This review summarizes available data concerning intradomain structures (IS) such as functionally important amino acid residues, short linear motifs, conserved or disordered regions, peptide repeats, broadly occurring secondary structures or folds, etc. IS form structural features (units or elements) necessary for interactions with proteins or non-peptidic ligands, enzyme reactions and some structural properties of proteins. These features have often been related to a single structural level (e.g. primary structure) mostly requiring certain structural context of other levels (e.g. secondary structures or supersecondary folds) as follows also from some examples reported or demonstrated here. In addition, we deal with some functionally important dynamic properties of IS (e.g. flexibility and different forms of accessibility), and more special dynamic changes of IS during enzyme reactions and allosteric regulation. Selected notes concern also some experimental methods, still more necessary tools of bioinformatic processing and clinically interesting relationships. Copyright © 2013 Elsevier Masson SAS. All rights reserved.

  3. Current strategies for protein production and purification enabling membrane protein structural biology.

    PubMed

    Pandey, Aditya; Shin, Kyungsoo; Patterson, Robin E; Liu, Xiang-Qin; Rainey, Jan K

    2016-12-01

    Membrane proteins are still heavily under-represented in the protein data bank (PDB), owing to multiple bottlenecks. The typical low abundance of membrane proteins in their natural hosts makes it necessary to overexpress these proteins either in heterologous systems or through in vitro translation/cell-free expression. Heterologous expression of proteins, in turn, leads to multiple obstacles, owing to the unpredictability of compatibility of the target protein for expression in a given host. The highly hydrophobic and (or) amphipathic nature of membrane proteins also leads to challenges in producing a homogeneous, stable, and pure sample for structural studies. Circumventing these hurdles has become possible through the introduction of novel protein production protocols; efficient protein isolation and sample preparation methods; and, improvement in hardware and software for structural characterization. Combined, these advances have made the past 10-15 years very exciting and eventful for the field of membrane protein structural biology, with an exponential growth in the number of solved membrane protein structures. In this review, we focus on both the advances and diversity of protein production and purification methods that have allowed this growth in structural knowledge of membrane proteins through X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, and cryo-electron microscopy (cryo-EM).

  4. Current strategies for protein production and purification enabling membrane protein structural biology

    PubMed Central

    Pandey, Aditya; Shin, Kyungsoo; Patterson, Robin E.; Liu, Xiang-Qin; Rainey, Jan K.

    2017-01-01

    Membrane proteins are still heavily underrepresented in the protein data bank (PDB) due to multiple bottlenecks. The typical low abundance of membrane proteins in their natural hosts makes it necessary to overexpress these proteins either in heterologous systems or through in vitro translation/cell-free expression. Heterologous expression of proteins, in turn, leads to multiple obstacles due to the unpredictability of compatibility of the target protein for expression in a given host. The highly hydrophobic and/or amphipathic nature of membrane proteins also leads to challenges in producing a homogeneous, stable, and pure sample for structural studies. Circumventing these hurdles has become possible through introduction of novel protein production protocols; efficient protein isolation and sample preparation methods; and, improvement in hardware and software for structural characterization. Combined, these advances have made the past 10–15 years very exciting and eventful for the field of membrane protein structural biology, with an exponential growth in the number of solved membrane protein structures. In this review, we focus on both the advances and diversity of protein production and purification methods that have allowed this growth in structural knowledge of membrane proteins through X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, and cryo-electron microscopy (cryo-EM). PMID:27010607

  5. Protein flexibility in the light of structural alphabets

    PubMed Central

    Craveur, Pierrick; Joseph, Agnel P.; Esque, Jeremy; Narwani, Tarun J.; Noël, Floriane; Shinada, Nicolas; Goguet, Matthieu; Leonard, Sylvain; Poulain, Pierre; Bertrand, Olivier; Faure, Guilhem; Rebehmed, Joseph; Ghozlane, Amine; Swapna, Lakshmipuram S.; Bhaskara, Ramachandra M.; Barnoud, Jonathan; Téletchéa, Stéphane; Jallu, Vincent; Cerny, Jiri; Schneider, Bohdan; Etchebest, Catherine; Srinivasan, Narayanaswamy; Gelly, Jean-Christophe; de Brevern, Alexandre G.

    2015-01-01

    Protein structures are valuable tools to understand protein function. Nonetheless, proteins are often considered as rigid macromolecules while their structures exhibit specific flexibility, which is essential to complete their functions. Analyses of protein structures and dynamics are often performed with a simplified three-state description, i.e., the classical secondary structures. More precise and complete description of protein backbone conformation can be obtained using libraries of small protein fragments that are able to approximate every part of protein structures. These libraries, called structural alphabets (SAs), have been widely used in structure analysis field, from definition of ligand binding sites to superimposition of protein structures. SAs are also well suited to analyze the dynamics of protein structures. Here, we review innovative approaches that investigate protein flexibility based on SAs description. Coupled to various sources of experimental data (e.g., B-factor) and computational methodology (e.g., Molecular Dynamic simulation), SAs turn out to be powerful tools to analyze protein dynamics, e.g., to examine allosteric mechanisms in large set of structures in complexes, to identify order/disorder transition. SAs were also shown to be quite efficient to predict protein flexibility from amino-acid sequence. Finally, in this review, we exemplify the interest of SAs for studying flexibility with different cases of proteins implicated in pathologies and diseases. PMID:26075209

  6. Protein structure similarity from Principle Component Correlation analysis.

    PubMed

    Zhou, Xiaobo; Chou, James; Wong, Stephen T C

    2006-01-25

    Owing to rapid expansion of protein structure databases in recent years, methods of structure comparison are becoming increasingly effective and important in revealing novel information on functional properties of proteins and their roles in the grand scheme of evolutionary biology. Currently, the structural similarity between two proteins is measured by the root-mean-square-deviation (RMSD) in their best-superimposed atomic coordinates. RMSD is the golden rule of measuring structural similarity when the structures are nearly identical; it, however, fails to detect the higher order topological similarities in proteins evolved into different shapes. We propose new algorithms for extracting geometrical invariants of proteins that can be effectively used to identify homologous protein structures or topologies in order to quantify both close and remote structural similarities. We measure structural similarity between proteins by correlating the principle components of their secondary structure interaction matrix. In our approach, the Principle Component Correlation (PCC) analysis, a symmetric interaction matrix for a protein structure is constructed with relationship parameters between secondary elements that can take the form of distance, orientation, or other relevant structural invariants. When using a distance-based construction in the presence or absence of encoded N to C terminal sense, there are strong correlations between the principle components of interaction matrices of structurally or topologically similar proteins. The PCC method is extensively tested for protein structures that belong to the same topological class but are significantly different by RMSD measure. The PCC analysis can also differentiate proteins having similar shapes but different topological arrangements. Additionally, we demonstrate that when using two independently defined interaction matrices, comparison of their maximum eigenvalues can be highly effective in clustering structurally or

  7. Origins of structure in globular proteins.

    PubMed Central

    Chan, H S; Dill, K A

    1990-01-01

    The principal forces of protein folding--hydrophobicity and conformational entropy--are nonspecific. A long-standing puzzle has, therefore, been: What forces drive the formation of the specific internal architectures in globular proteins? We find that any self-avoiding flexible polymer molecule will develop large amounts of secondary structure, helices and parallel and antiparallel sheets, as it is driven to increasing compactness by any force of attraction among the chain monomers. Thus structure formation arises from the severity of steric constraints in compact polymers. This steric principle of organization can account for why short helices are stable in globular proteins, why there are parallel and anti-parallel sheets in proteins, and why weakly unfolded proteins have some secondary structure. On this basis, it should be possible to construct copolymers, not necessarily using amino acids, that can collapse to maximum compactness in incompatible solvents and that should then have structural organization resembling that of proteins. PMID:2385597

  8. In situ data collection and structure refinement from microcapillary protein crystallization

    PubMed Central

    Yadav, Maneesh K.; Gerdts, Cory J.; Sanishvili, Ruslan; Smith, Ward W.; Roach, L. Spencer; Ismagilov, Rustem F.; Kuhn, Peter; Stevens, Raymond C.

    2007-01-01

    In situ X-ray data collection has the potential to eliminate the challenging task of mounting and cryocooling often fragile protein crystals, reducing a major bottleneck in the structure determination process. An apparatus used to grow protein crystals in capillaries and to compare the background X-ray scattering of the components, including thin-walled glass capillaries against Teflon, and various fluorocarbon oils against each other, is described. Using thaumatin as a test case at 1.8 Å resolution, this study demonstrates that high-resolution electron density maps and refined models can be obtained from in situ diffraction of crystals grown in microcapillaries. PMID:17468785

  9. Gaia: automated quality assessment of protein structure models.

    PubMed

    Kota, Pradeep; Ding, Feng; Ramachandran, Srinivas; Dokholyan, Nikolay V

    2011-08-15

    Increasing use of structural modeling for understanding structure-function relationships in proteins has led to the need to ensure that the protein models being used are of acceptable quality. Quality of a given protein structure can be assessed by comparing various intrinsic structural properties of the protein to those observed in high-resolution protein structures. In this study, we present tools to compare a given structure to high-resolution crystal structures. We assess packing by calculating the total void volume, the percentage of unsatisfied hydrogen bonds, the number of steric clashes and the scaling of the accessible surface area. We assess covalent geometry by determining bond lengths, angles, dihedrals and rotamers. The statistical parameters for the above measures, obtained from high-resolution crystal structures, enable us to provide a quality-score that points to specific areas where a given protein structural model needs improvement. We provide these tools that appraise protein structures in the form of a web server Gaia (http://chiron.dokhlab.org). Gaia evaluates the packing and covalent geometry of a given protein structure and provides quantitative comparison of the given structure to high-resolution crystal structures. Contact: dokh@unc.edu. Supplementary data are available at Bioinformatics online.

  10. Protein family clustering for structural genomics.

    PubMed

    Yan, Yongpan; Moult, John

    2005-10-28

    A major goal of structural genomics is the provision of a structural template for a large fraction of protein domains. The magnitude of this task depends on the number and nature of protein sequence families. With a large number of bacterial genomes now fully sequenced, it is possible to obtain improved estimates of the number and diversity of families in that kingdom. We have used an automated clustering procedure to group all sequences in a set of genomes into protein families. Bench-marking shows the clustering method is sensitive at detecting remote family members, and has a low level of false positives. This comprehensive protein family set has been used to address the following questions. (1) What is the structure coverage for currently known families? (2) How will the number of known apparent families grow as more genomes are sequenced? (3) What is a practical strategy for maximizing structure coverage in future? Our study indicates that approximately 20% of known families with three or more members currently have a representative structure. The study indicates also that the number of apparent protein families will be considerably larger than previously thought: We estimate that, by the criteria of this work, there will be about 250,000 protein families when 1000 microbial genomes have been sequenced. However, the vast majority of these families will be small, and it will be possible to obtain structural templates for 70-80% of protein domains with an achievable number of representative structures, by systematically sampling the larger families.

  11. A structural-alphabet-based strategy for finding structural motifs across protein families

    PubMed Central

    Wu, Chih Yuan; Chen, Yao Chi; Lim, Carmay

    2010-01-01

    Proteins with insignificant sequence and overall structure similarity may still share locally conserved contiguous structural segments; i.e. structural/3D motifs. Most methods for finding 3D motifs require a known motif to search for other similar structures or functionally/structurally crucial residues. Here, without requiring a query motif or essential residues, a fully automated method for discovering 3D motifs of various sizes across protein families with different folds based on a 16-letter structural alphabet is presented. It was applied to structurally non-redundant proteins bound to DNA, RNA, obligate/non-obligate proteins as well as free DNA-binding proteins (DBPs) and proteins with known structures but unknown function. Its usefulness was illustrated by analyzing the 3D motifs found in DBPs. A non-specific motif was found with a ‘corner’ architecture that confers a stable scaffold and enables diverse interactions, making it suitable for binding not only DNA but also RNA and proteins. Furthermore, DNA-specific motifs present ‘only’ in DBPs were discovered. The motifs found can provide useful guidelines in detecting binding sites and computational protein redesign. PMID:20525797

  12. Accounting for epistatic interactions improves the functional analysis of protein structures.

    PubMed

    Wilkins, Angela D; Venner, Eric; Marciano, David C; Erdin, Serkan; Atri, Benu; Lua, Rhonald C; Lichtarge, Olivier

    2013-11-01

    The constraints under which sequence, structure and function coevolve are not fully understood. Bringing this mutual relationship to light can reveal the molecular basis of binding, catalysis and allostery, thereby identifying function and rationally guiding protein redesign. Underlying these relationships are the epistatic interactions that occur when the consequences of a mutation to a protein are determined by the genetic background in which it occurs. Based on prior data, we hypothesize that epistatic forces operate most strongly between residues nearby in the structure, resulting in smooth evolutionary importance across the structure. We find that when residue scores of evolutionary importance are distributed smoothly between nearby residues, functional site prediction accuracy improves. Accordingly, we designed a novel measure of evolutionary importance that focuses on the interaction between pairs of structurally neighboring residues. This measure, which we term pair-interaction Evolutionary Trace, yields greater functional site overlap and better structure-based proteome-wide functional predictions. Our data show that the structural smoothness of evolutionary importance is a fundamental feature of the coevolution of sequence, structure and function. Mutations operate on individual residues, but selective pressure depends in part on the extent to which a mutation perturbs interactions with neighboring residues. In practice, this principle led us to redefine the importance of a residue in terms of the importance of its epistatic interactions with neighbors, yielding better annotation of functional residues, motivating experimental validation of a novel functional site in LexA and refining protein function prediction. Contact: lichtarge@bcm.edu. Supplementary data are available at Bioinformatics online.

  13. NMR-based automated protein structure determination.

    PubMed

    Würz, Julia M; Kazemi, Sina; Schmidt, Elena; Bagaria, Anurag; Güntert, Peter

    2017-08-15

    NMR spectra analysis for protein structure determination can now in many cases be performed by automated computational methods. This overview of the computational methods for NMR protein structure analysis presents recent automated methods for signal identification in multidimensional NMR spectra, sequence-specific resonance assignment, collection of conformational restraints, and structure calculation, as implemented in the CYANA software package. These algorithms are sufficiently reliable and integrated into one software package to enable the fully automated structure determination of proteins starting from NMR spectra without manual interventions or corrections at intermediate steps, with an accuracy of 1-2 Å backbone RMSD in comparison with manually solved reference structures. Copyright © 2017 Elsevier Inc. All rights reserved.

  14. Utilization of protein intrinsic disorder knowledge in structural proteomics

    PubMed Central

    Oldfield, Christopher J.; Xue, Bin; Van, Ya-Yue; Ulrich, Eldon L.; Markley, John L.; Dunker, A. Keith; Uversky, Vladimir N.

    2014-01-01

    Intrinsically disordered proteins (IDPs) and proteins with long disordered regions are highly abundant in various proteomes. Despite their lack of well-defined ordered structure, these proteins and regions are frequently involved in crucial biological processes. Although in recent years these proteins have attracted the attention of many researchers, IDPs represent a significant challenge for structural characterization since these proteins can impact many of the processes in the structure determination pipeline. Here we investigate the effects of IDPs on the structure determination process and the utility of disorder prediction in selecting and improving proteins for structural characterization. Examination of the extent of intrinsic disorder in existing crystal structures found that relatively few protein crystal structures contain extensive regions of intrinsic disorder. Although intrinsic disorder is not the only cause of crystallization failures and many structured proteins cannot be crystallized, filtering out highly disordered proteins from structure-determination target lists is still likely to be cost effective. Therefore it is desirable to exclude highly disordered proteins from structure-determination target lists, and we show that disorder prediction can be applied effectively to enrich structure determination pipelines with proteins more likely to yield crystal structures. For structural investigation of specific proteins, disorder prediction can be used to improve targets for structure determination. Finally, a framework for considering intrinsic disorder in the structure determination pipeline is proposed. PMID:23232152

  15. Resource for structure related information on transmembrane proteins

    NASA Astrophysics Data System (ADS)

    Tusnády, Gábor E.; Simon, István

    Transmembrane proteins are involved in a wide variety of vital biological processes including transport of water-soluble molecules, flow of information and energy production. Despite significant efforts to determine the structures of these proteins, only a few thousand solved structures are known so far. Here, we review the various resources for structure-related information on these types of proteins ranging from the 3D structure to the topology and from the up-to-date databases to the various Internet sites and servers dealing with structure prediction and structure analysis. Abbreviations: 3D, three dimensional; PDB, Protein Data Bank; TMP, transmembrane protein.

  16. Overcoming barriers to membrane protein structure determination.

    PubMed

    Bill, Roslyn M; Henderson, Peter J F; Iwata, So; Kunji, Edmund R S; Michel, Hartmut; Neutze, Richard; Newstead, Simon; Poolman, Bert; Tate, Christopher G; Vogel, Horst

    2011-04-01

    After decades of slow progress, the pace of research on membrane protein structures is beginning to quicken thanks to various improvements in technology, including protein engineering and microfocus X-ray diffraction. Here we review these developments and, where possible, highlight generic new approaches to solving membrane protein structures based on recent technological advances. Rational approaches to overcoming the bottlenecks in the field are urgently required as membrane proteins, which typically comprise ~30% of the proteomes of organisms, are dramatically under-represented in the structural database of the Protein Data Bank.

  17. Biophysical and structural considerations for protein sequence evolution

    PubMed Central

    2011-01-01

    Background Protein sequence evolution is constrained by the biophysics of folding and function, causing interdependence between interacting sites in the sequence. However, current site-independent models of sequence evolution do not take this into account. Recent attempts to integrate the influence of structure and biophysics into phylogenetic models via statistical/informational approaches have not resulted in expected improvements in model performance. This suggests that further innovations are needed for progress in this field. Results Here we develop a coarse-grained physics-based model of protein folding and binding function, and compare it to a popular informational model. We find that both models violate the assumption of the native sequence being close to a thermodynamic optimum, causing directional selection away from the native state. Sampling and simulation show that the physics-based model is more specific for fold-defining interactions that vary less among residue types. The informational model diffuses further in sequence space with fewer barriers and tends to provide less support for an invariant sites model, although amino acid substitutions are generally conservative. Both approaches produce sequences with natural features like dN/dS < 1 and gamma-distributed rates across sites. Conclusions Simple coarse-grained models of protein folding can describe some natural features of evolving proteins but are currently not accurate enough to use in evolutionary inference. This is partly due to improper packing of the hydrophobic core. We suggest possible improvements on the representation of structure, folding energy, and binding function, as regards both native and non-native conformations, and describe a large number of possible applications for such a model. PMID:22171550

  18. Quality assessment of protein model-structures based on structural and functional similarities.

    PubMed

    Konopka, Bogumil M; Nebel, Jean-Christophe; Kotulska, Malgorzata

    2012-09-21

    Experimental determination of protein 3D structures is expensive, time consuming and sometimes impossible. The gap between the number of protein structures deposited in the World Wide Protein Data Bank and the number of sequenced proteins is constantly widening. Computational modeling is deemed to be one of the ways to deal with the problem. Although protein 3D structure prediction is a difficult task, many tools are available. These tools can model a structure from a sequence or partial structural information, e.g. contact maps. Consequently, biologists have the ability to generate automatically a putative 3D structure model of any protein. However, the main issue becomes evaluation of the model quality, which is one of the most important challenges of structural biology. GOBA--Gene Ontology-Based Assessment is a novel Protein Model Quality Assessment Program. It estimates the compatibility between a model-structure and its expected function. GOBA is based on the assumption that a high quality model is expected to be structurally similar to proteins functionally similar to the prediction target. Whereas DALI is used to measure structure similarity, protein functional similarity is quantified using standardized and hierarchical description of proteins provided by Gene Ontology combined with Wang's algorithm for calculating semantic similarity. Two approaches are proposed to express the quality of protein model-structures. One is a single model quality assessment method, the other is its modification, which provides a relative measure of model quality. Exhaustive evaluation is performed on data sets of model-structures submitted to the CASP8 and CASP9 contests. The validation shows that the method is able to discriminate between good and bad model-structures. The best of tested GOBA scores achieved 0.74 and 0.8 as a mean Pearson correlation to the observed quality of models in our CASP8 and CASP9-based validation sets. GOBA also obtained the best result for two targets of CASP8, and

  19. An ambiguity principle for assigning protein structural domains.

    PubMed

    Postic, Guillaume; Ghouzam, Yassine; Chebrek, Romain; Gelly, Jean-Christophe

    2017-01-01

    Ambiguity is the quality of being open to several interpretations. For an image, it arises when the contained elements can be delimited in two or more distinct ways, which may cause confusion. We postulate that it also applies to the analysis of protein three-dimensional structure, which consists in dividing the molecule into subunits called domains. Because different definitions of what constitutes a domain can be used to partition a given structure, the same protein may have different but equally valid domain annotations. However, knowledge and experience generally displace our ability to accept more than one way to decompose the structure of an object (in this case, a protein). This human bias in structure analysis is particularly harmful because it leads to ignoring potential avenues of research. We present an automated method capable of producing multiple alternative decompositions of protein structure (web server and source code available at www.dsimb.inserm.fr/sword/). Our innovative algorithm assigns structural domains through the hierarchical merging of protein units, which are evolutionarily preserved substructures that describe protein architecture at an intermediate level, between domain and secondary structure. To validate the use of these protein units for decomposing protein structures into domains, we set up an extensive benchmark made of expert annotations of structural domains and including state-of-the-art domain parsing algorithms. The relevance of our "multipartitioning" approach is shown through numerous examples of applications covering protein function, evolution, folding, and structure prediction. Finally, we introduce a measure for the structural ambiguity of protein molecules.

  20. An ambiguity principle for assigning protein structural domains

    PubMed Central

    Postic, Guillaume; Ghouzam, Yassine; Chebrek, Romain; Gelly, Jean-Christophe

    2017-01-01

    Ambiguity is the quality of being open to several interpretations. For an image, it arises when the contained elements can be delimited in two or more distinct ways, which may cause confusion. We postulate that it also applies to the analysis of protein three-dimensional structure, which consists in dividing the molecule into subunits called domains. Because different definitions of what constitutes a domain can be used to partition a given structure, the same protein may have different but equally valid domain annotations. However, knowledge and experience generally displace our ability to accept more than one way to decompose the structure of an object—in this case, a protein. This human bias in structure analysis is particularly harmful because it leads to ignoring potential avenues of research. We present an automated method capable of producing multiple alternative decompositions of protein structure (web server and source code available at www.dsimb.inserm.fr/sword/). Our innovative algorithm assigns structural domains through the hierarchical merging of protein units, which are evolutionarily preserved substructures that describe protein architecture at an intermediate level, between domain and secondary structure. To validate the use of these protein units for decomposing protein structures into domains, we set up an extensive benchmark made of expert annotations of structural domains and including state-of-the-art domain parsing algorithms. The relevance of our “multipartitioning” approach is shown through numerous examples of applications covering protein function, evolution, folding, and structure prediction. Finally, we introduce a measure for the structural ambiguity of protein molecules. PMID:28097215

  1. De Novo Protein Structure Prediction

    NASA Astrophysics Data System (ADS)

    Hung, Ling-Hong; Ngan, Shing-Chung; Samudrala, Ram

    An unparalleled amount of sequence data is being made available from large-scale genome sequencing efforts. The data provide a shortcut to the determination of the function of a gene of interest, as long as there is an existing sequenced gene with similar sequence and of known function. This has spurred structural genomic initiatives with the goal of determining as many protein folds as possible (Brenner and Levitt, 2000; Burley, 2000; Brenner, 2001; Heinemann et al., 2001). The purpose of this is twofold: First, the structure of a gene product can often lead to direct inference of its function. Second, since the function of a protein is dependent on its structure, direct comparison of the structures of gene products can be more sensitive than the comparison of sequences of genes for detecting homology. Presently, structural determination by crystallography and NMR techniques is still slow and expensive in terms of manpower and resources, despite attempts to automate the processes. Computer structure prediction algorithms, while not providing the accuracy of the traditional techniques, are extremely quick and inexpensive and can provide useful low-resolution data for structure comparisons (Bonneau and Baker, 2001). Given the immense number of structures which the structural genomic projects are attempting to solve, there would be a considerable gain even if the computer structure prediction approach were applicable to a subset of proteins.

  2. Structural determination of intact proteins using mass spectrometry

    DOEpatents

    Kruppa, Gary [San Francisco, CA; Schoeniger, Joseph S [Oakland, CA; Young, Malin M [Livermore, CA]

    2008-05-06

    The present invention relates to novel methods of determining the sequence and structure of proteins. Specifically, the present invention allows for the analysis of intact proteins within a mass spectrometer. Therefore, preparatory separations need not be performed prior to introducing a protein sample into the mass spectrometer. Also disclosed herein are new instrumental developments for enhancing the signal from the desired modified proteins, methods for producing controlled protein fragments in the mass spectrometer, eliminating complex microseparations, and protein preparatory chemical steps necessary for cross-linking based protein structure determination. Additionally, the preferred method of the present invention involves the determination of protein structures utilizing a top-down analysis of protein structures to search for covalent modifications. In the preferred method, intact proteins are ionized and fragmented within the mass spectrometer.

  3. SCOWLP classification: Structural comparison and analysis of protein binding regions

    PubMed Central

    Teyra, Joan; Paszkowski-Rogacz, Maciej; Anders, Gerd; Pisabarro, M Teresa

    2008-01-01

    Background Detailed information about protein interactions is critical for our understanding of the principles governing protein recognition mechanisms. The structures of many proteins have been experimentally determined in complex with different ligands bound either in the same or different binding regions. Thus, the structural interactome requires the development of tools to classify protein binding regions. A proper classification may provide a general view of the regions that a protein uses to bind others and also facilitate a detailed comparative analysis of the interacting information for specific protein binding regions at the atomic level. Such classification might be of potential use for deciphering protein interaction networks, understanding protein function, rational engineering and design. Description Protein binding regions (PBRs) might be ideally described as well-defined separated regions that share no interacting residues with one another. However, PBRs are often irregular, discontinuous and can share a wide range of interacting residues among them. The criteria to define an individual binding region can often be arbitrary and may differ from other binding regions within a protein family. Therefore, the rationale behind protein interface classification should aim to fulfil the requirements of the analysis to be performed. We extract detailed interaction information of protein domains, peptides and interfacial solvent from the SCOWLP database and we classify the PBRs of each domain family. For this purpose, we define a similarity index based on the overlapping of interacting residues mapped in pair-wise structural alignments. We perform our classification with agglomerative hierarchical clustering using the complete-linkage method. Our classification is calculated at different similarity cut-offs to allow flexibility in the analysis of PBRs, a feature especially interesting for those protein families with conflictive binding regions. The hierarchical
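
    The clustering step described in this abstract can be illustrated with a minimal Python sketch. It is not the SCOWLP implementation: the similarity index is assumed here to be a Jaccard ratio over sets of interacting residue positions, and the names `similarity`, `complete_linkage`, and the example regions are hypothetical.

```python
def similarity(a, b):
    """Assumed similarity index: shared interacting residues / all interacting residues."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def complete_linkage(regions, cutoff):
    """Agglomerative clustering of binding regions with complete linkage.

    regions: dict mapping a region name to a set of residue positions
             (positions are assumed to be already mapped onto a common alignment).
    cutoff:  two clusters are merged only while their *least* similar pair
             of members still has similarity >= cutoff.
    """
    clusters = [[name] for name in regions]

    def link(c1, c2):
        # Complete linkage: cluster similarity is the minimum pairwise similarity.
        return min(similarity(regions[x], regions[y]) for x in c1 for y in c2)

    while len(clusters) > 1:
        best, (i, j) = max(
            ((link(clusters[i], clusters[j]), (i, j))
             for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda t: t[0],
        )
        if best < cutoff:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical example: two overlapping regions and one distant region.
example = {"r1": {1, 2, 3, 4}, "r2": {2, 3, 4, 5}, "r3": {40, 41, 42}}
loose = complete_linkage(example, cutoff=0.5)
strict = complete_linkage(example, cutoff=0.7)
```

    Running the same data at several cut-offs, as in the two calls above, mirrors the idea of computing the classification at different similarity levels.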

  4. Proteins as sponges: a statistical journey along protein structure organization principles.

    PubMed

    Paola, Luisa Di; Paci, Paola; Santoni, Daniele; Ruvo, Micol De; Giuliani, Alessandro

    2012-02-27

    The analysis of a large database of protein structures by means of topological and shape indexes inspired by complex network and fractal analysis shed light on some organizational principles of proteins. Proteins appear much more similar to "fractal" sponges than to closely packed spheres, casting doubts on the tenability of the hydrophobic core concept. Principal component analysis highlighted three main order parameters shaping the protein universe: (1) "size", with the consequent generation of progressively less dense and more empty structures at an increasing number of residues, (2) "microscopic structuring", linked to the existence of a spectrum going from the prevalence of heterologous (different hydrophobicity) to the prevalence of homologous (similar hydrophobicity) contacts, and (3) "fractal shape", an organizing protein data set along a continuum going from approximately linear to very intermingled structures. Perhaps the time has come for seriously taking into consideration the real relevance of time-honored principles like the hydrophobic core and hydrophobic effect.

  5. A simple and fast heuristic for protein structure comparison

    PubMed Central

    Pelta, David A; González, Juan R; Moreno Vega, Marcos

    2008-01-01

    Background Protein structure comparison is a key problem in bioinformatics. Several methods exist for protein comparison, and solving the Maximum Contact Map Overlap problem (MAX-CMO) is one of the available alternatives. Although this problem may be solved using exact algorithms, researchers require approximate algorithms that obtain good quality solutions using less computational resources than the former. Results We propose a variable neighborhood search metaheuristic for solving MAX-CMO. We analyze this strategy in two aspects: 1) from an optimization point of view the strategy is tested on two different datasets, obtaining an error of 3.5% (over 2702 pairs) and 1.7% (over 161 pairs) with respect to optimal values; thus leading to highly accurate solutions in a simpler and less expensive way than exact algorithms; 2) in terms of protein structure classification, we conduct experiments on three datasets and show that it is feasible to detect structural similarities at SCOP's family and CATH's architecture levels using normalized overlap values. Some limitations and the role of normalization are outlined for doing classification at SCOP's fold level. Conclusion We designed, implemented and tested a new tool for solving MAX-CMO, based on a well-known metaheuristic technique. The good balance between solution's quality and computational effort makes it a valuable tool. Moreover, to the best of our knowledge, this is the first time the MAX-CMO measure is tested at SCOP's fold and CATH's architecture levels with encouraging results. Software is available for download at . PMID:18366735
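
    To make the objective concrete, the following Python sketch scores a given alignment by its contact map overlap. It shows only the quantity that MAX-CMO maximizes, not the variable neighborhood search of the paper. The 8.0 distance cutoff, the normalization formula, and all function names are assumptions chosen for illustration.

```python
import math
from itertools import combinations


def contact_map(coords, cutoff=8.0):
    """Return the set of residue index pairs (i, j), i < j, closer than cutoff.

    coords: one (x, y, z) point per residue (for example C-alpha atoms).
    The cutoff value is an assumption; the abstract does not state one.
    """
    return {
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(coords), 2)
        if math.dist(a, b) <= cutoff
    }


def overlap(contacts_a, contacts_b, alignment):
    """Count contacts of A whose aligned residue pair is also a contact in B.

    alignment: dict mapping residue indices of A to residue indices of B.
    """
    count = 0
    for i, j in contacts_a:
        if i in alignment and j in alignment:
            p, q = alignment[i], alignment[j]
            if (min(p, q), max(p, q)) in contacts_b:
                count += 1
    return count


def normalized_overlap(contacts_a, contacts_b, alignment):
    """One possible normalization (assumed): 2 * overlap / (|A| + |B|)."""
    total = len(contacts_a) + len(contacts_b)
    return 2 * overlap(contacts_a, contacts_b, alignment) / total if total else 0.0


# Toy example: two short chains with identical local geometry.
map_a = contact_map([(0, 0, 0), (3, 0, 0), (6, 0, 0), (20, 0, 0)], cutoff=4.0)
map_b = contact_map([(0, 0, 0), (3, 0, 0), (6, 0, 0)], cutoff=4.0)
score = overlap(map_a, map_b, {0: 0, 1: 1, 2: 2})
```

    A MAX-CMO solver searches over alignments for the one with the highest overlap; a heuristic such as the one described above would call a scoring function like this repeatedly while exploring neighboring alignments.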

  6. Use of designed sequences in protein structure recognition.

    PubMed

    Kumar, Gayatri; Mudgal, Richa; Srinivasan, Narayanaswamy; Sandhya, Sankaran

    2018-05-09

    Knowledge of the protein structure is a pre-requisite for improved understanding of molecular function. The gap in the sequence-structure space has increased in the post-genomic era. Grouping related protein sequences into families can aid in narrowing the gap. In the Pfam database, structure description is provided for part or full-length proteins of 7726 families. For the remaining 52% of the families, information on 3-D structure is not yet available. We use the computationally designed sequences that are intermediately related to two protein domain families, which are already known to share the same fold. These strategically designed sequences enable detection of distant relationships and here, we have employed them for the purpose of structure recognition of protein families of yet unknown structure. We first measured the success rate of our approach using a dataset of protein families of known fold and achieved a success rate of 88%. Next, for 1392 families of yet unknown structure, we made structural assignments for part/full length of the proteins. Fold association for 423 domains of unknown function (DUFs) are provided as a step towards functional annotation. The results indicate that knowledge-based filling of gaps in protein sequence space is a lucrative approach for structure recognition. Such sequences assist in traversal through protein sequence space and effectively function as 'linkers', where natural linkers between distant proteins are unavailable. This article was reviewed by Oliviero Carugo, Christine Orengo and Srikrishna Subramanian.

  7. Rift Valley fever virus structural and non-structural proteins: Recombinant protein expression and immunoreactivity against antisera from sheep

    USDA-ARS's Scientific Manuscript database

    The Rift Valley fever virus (RVFV) encodes structural proteins, nucleoprotein (N), N-terminus glycoprotein (Gn), C-terminus glycoprotein (Gc) and L protein, 78-kDa and non-structural proteins NSm and NSs. Using the baculovirus system we expressed the full-length coding sequence of N, NSs, NSm, Gc an...

  8. Introduction to Protein Structure through Genetic Diseases

    ERIC Educational Resources Information Center

    Schneider, Tanya L.; Linton, Brian R.

    2008-01-01

    An illuminating way to learn about protein function is to explore high-resolution protein structures. Analysis of the proteins involved in genetic diseases has been used to introduce students to protein structure and the role that individual mutations can play in the onset of disease. Known mutations can be correlated to changes in protein…

  9. Accounting for epistatic interactions improves the functional analysis of protein structures

    PubMed Central

    Wilkins, Angela D.; Venner, Eric; Marciano, David C.; Erdin, Serkan; Atri, Benu; Lua, Rhonald C.; Lichtarge, Olivier

    2013-01-01

    Motivation: The constraints under which sequence, structure and function coevolve are not fully understood. Bringing this mutual relationship to light can reveal the molecular basis of binding, catalysis and allostery, thereby identifying function and rationally guiding protein redesign. Underlying these relationships are the epistatic interactions that occur when the consequences of a mutation to a protein are determined by the genetic background in which it occurs. Based on prior data, we hypothesize that epistatic forces operate most strongly between residues nearby in the structure, resulting in smooth evolutionary importance across the structure. Methods and Results: We find that when residue scores of evolutionary importance are distributed smoothly between nearby residues, functional site prediction accuracy improves. Accordingly, we designed a novel measure of evolutionary importance that focuses on the interaction between pairs of structurally neighboring residues. This measure that we term pair-interaction Evolutionary Trace yields greater functional site overlap and better structure-based proteome-wide functional predictions. Conclusions: Our data show that the structural smoothness of evolutionary importance is a fundamental feature of the coevolution of sequence, structure and function. Mutations operate on individual residues, but selective pressure depends in part on the extent to which a mutation perturbs interactions with neighboring residues. In practice, this principle led us to redefine the importance of a residue in terms of the importance of its epistatic interactions with neighbors, yielding better annotation of functional residues, motivating experimental validation of a novel functional site in LexA and refining protein function prediction. Contact: lichtarge@bcm.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24021383

  10. mTM-align: a server for fast protein structure database search and multiple protein structure alignment.

    PubMed

    Dong, Runze; Pan, Shuo; Peng, Zhenling; Zhang, Yang; Yang, Jianyi

    2018-05-21

    With the rapid increase of the number of protein structures in the Protein Data Bank, it becomes urgent to develop algorithms for efficient protein structure comparisons. In this article, we present the mTM-align server, which consists of two closely related modules: one for structure database search and the other for multiple structure alignment. The database search is speeded up based on a heuristic algorithm and a hierarchical organization of the structures in the database. The multiple structure alignment is performed using the recently developed algorithm mTM-align. Benchmark tests demonstrate that our algorithms outperform other peering methods for both modules, in terms of speed and accuracy. One of the unique features for the server is the interplay between database search and multiple structure alignment. The server provides service not only for performing fast database search, but also for making accurate multiple structure alignment with the structures found by the search. For the database search, it takes about 2-5 min for a structure of a medium size (∼300 residues). For the multiple structure alignment, it takes a few seconds for ∼10 structures of medium sizes. The server is freely available at: http://yanglab.nankai.edu.cn/mTM-align/.

  11. The Protein Structure Initiative Structural Biology Knowledgebase Technology Portal: a structural biology web resource.

    PubMed

    Gifford, Lida K; Carter, Lester G; Gabanyi, Margaret J; Berman, Helen M; Adams, Paul D

    2012-06-01

    The Technology Portal of the Protein Structure Initiative Structural Biology Knowledgebase (PSI SBKB; http://technology.sbkb.org/portal/ ) is a web resource providing information about methods and tools that can be used to relieve bottlenecks in many areas of protein production and structural biology research. Several useful features are available on the web site, including multiple ways to search the database of over 250 technological advances, a link to videos of methods on YouTube, and access to a technology forum where scientists can connect, ask questions, get news, and develop collaborations. The Technology Portal is a component of the PSI SBKB ( http://sbkb.org ), which presents integrated genomic, structural, and functional information for all protein sequence targets selected by the Protein Structure Initiative. Created in collaboration with the Nature Publishing Group, the SBKB offers an array of resources for structural biologists, such as a research library, editorials about new research advances, a featured biological system each month, and a functional sleuth for searching protein structures of unknown function. An overview of the various features and examples of user searches highlight the information, tools, and avenues for scientific interaction available through the Technology Portal.

  12. Local backbone structure prediction of proteins

    PubMed Central

    De Brevern, Alexandre G.; Benros, Cristina; Gautier, Romain; Valadié, Hélène; Hazout, Serge; Etchebest, Catherine

    2004-01-01

    Summary A statistical analysis of the PDB structures has led us to define a new set of small 3D structural prototypes called Protein Blocks (PBs). This structural alphabet includes 16 PBs, each one is defined by the (φ, Ψ) dihedral angles of 5 consecutive residues. The amino acid distributions observed in sequence windows encompassing these PBs are used to predict by a Bayesian approach the local 3D structure of proteins from the sole knowledge of their sequences. LocPred is a software which allows the users to submit a protein sequence and performs a prediction in terms of PBs. The prediction results are given both textually and graphically. PMID:15724288

  13. How Structure Defines Affinity in Protein-Protein Interactions

    PubMed Central

    Erijman, Ariel; Rosenthal, Eran; Shifman, Julia M.

    2014-01-01

    Protein-protein interactions (PPI) in nature are conveyed by a multitude of binding modes involving various surfaces, secondary structure elements and intermolecular interactions. This diversity results in PPI binding affinities that span more than nine orders of magnitude. Several early studies attempted to correlate PPI binding affinities to various structure-derived features with limited success. The growing number of high-resolution structures, the appearance of more precise methods for measuring binding affinities and the development of new computational algorithms enable more thorough investigations in this direction. Here, we use a large dataset of PPI structures with the documented binding affinities to calculate a number of structure-based features that could potentially define binding energetics. We explore how well each calculated biophysical feature alone correlates with binding affinity and determine the features that could be used to distinguish between high-, medium- and low- affinity PPIs. Furthermore, we test how various combinations of features could be applied to predict binding affinity and observe a slow improvement in correlation as more features are incorporated into the equation. In addition, we observe a considerable improvement in predictions if we exclude from our analysis low-resolution and NMR structures, revealing the importance of capturing exact intermolecular interactions in our calculations. Our analysis should facilitate prediction of new interactions on the genome scale, better characterization of signaling networks and design of novel binding partners for various target proteins. PMID:25329579

  14. Improved cryoEM-Guided Iterative Molecular Dynamics–Rosetta Protein Structure Refinement Protocol for High Precision Protein Structure Prediction

    PubMed Central

    2016-01-01

    Many excellent methods exist that incorporate cryo-electron microscopy (cryoEM) data to constrain computational protein structure prediction and refinement. Previously, it was shown that iteration of two such orthogonal sampling and scoring methods – Rosetta and molecular dynamics (MD) simulations – facilitated exploration of conformational space in principle. Here, we go beyond a proof-of-concept study and address significant remaining limitations of the iterative MD–Rosetta protein structure refinement protocol. Specifically, all parts of the iterative refinement protocol are now guided by medium-resolution cryoEM density maps, and previous knowledge about the native structure of the protein is no longer necessary. Models are identified solely based on score or simulation time. All four benchmark proteins showed substantial improvement through three rounds of the iterative refinement protocol. The best-scoring final models of two proteins had sub-Ångstrom RMSD to the native structure over residues in secondary structure elements. Molecular dynamics was most efficient in refining secondary structure elements and was thus highly complementary to the Rosetta refinement which is most powerful in refining side chains and loop regions. PMID:25883538

  15. Protein structure determination by exhaustive search of Protein Data Bank derived databases.

    PubMed

    Stokes-Rees, Ian; Sliz, Piotr

    2010-12-14

    Parallel sequence and structure alignment tools have become ubiquitous and invaluable at all levels in the study of biological systems. We demonstrate the application and utility of this same parallel search paradigm to the process of protein structure determination, benefitting from the large and growing corpus of known structures. Such searches were previously computationally intractable. Through the method of Wide Search Molecular Replacement, developed here, they can be completed in a few hours with the aid of national-scale federated cyberinfrastructure. By dramatically expanding the range of models considered for structure determination, we show that small (less than 12% structural coverage) and low sequence identity (less than 20% identity) template structures can be identified through multidimensional template scoring metrics and used for structure determination. Many new macromolecular complexes can benefit significantly from such a technique due to the lack of known homologous protein folds or sequences. We demonstrate the effectiveness of the method by determining the structure of a full-length p97 homologue from Trichoplusia ni. Example cases with the MHC/T-cell receptor complex and the EmoB protein provide systematic estimates of minimum sequence identity, structure coverage, and structural similarity required for this method to succeed. We describe how this structure-search approach and other novel computationally intensive workflows are made tractable through integration with the US national computational cyberinfrastructure, allowing, for example, rapid processing of the entire Structural Classification of Proteins protein fragment database.

  16. Classification of proteins: available structural space for molecular modeling.

    PubMed

    Andreeva, Antonina

    2012-01-01

    The wealth of available protein structural data provides unprecedented opportunity to study and better understand the underlying principles of protein folding and protein structure evolution. A key to achieving this lies in the ability to analyse these data and to organize them in a coherent classification scheme. Over the past years several protein classifications have been developed that aim to group proteins based on their structural relationships. Some of these classification schemes explore the concept of structural neighbourhood (structural continuum), whereas others utilize the notion of protein evolution and thus provide a discrete rather than continuum view of protein structure space. This chapter presents a strategy for classification of proteins with known three-dimensional structure. Steps in the classification process along with basic definitions are introduced. Examples illustrating some fundamental concepts of protein folding and evolution with a special focus on the exceptions to them are presented.

  17. Functional classification of protein structures by local structure matching in graph representation.

    PubMed

    Mills, Caitlyn L; Garg, Rohan; Lee, Joslynn S; Tian, Liang; Suciu, Alexandru; Cooperman, Gene; Beuning, Penny J; Ondrechen, Mary Jo

    2018-03-31

    As a result of high-throughput protein structure initiatives, over 14,400 protein structures have been solved by structural genomics (SG) centers and participating research groups. While the totality of SG data represents a tremendous contribution to genomics and structural biology, reliable functional information for these proteins is generally lacking. Better functional predictions for SG proteins will add substantial value to the structural information already obtained. Our method described herein, Graph Representation of Active Sites for Prediction of Function (GRASP-Func), predicts quickly and accurately the biochemical function of proteins by representing residues at the predicted local active site as graphs rather than in Cartesian coordinates. We compare the GRASP-Func method to our previously reported method, structurally aligned local sites of activity (SALSA), using the ribulose phosphate binding barrel (RPBB), 6-hairpin glycosidase (6-HG), and Concanavalin A-like Lectins/Glucanase (CAL/G) superfamilies as test cases. In each of the superfamilies, SALSA and the much faster method GRASP-Func yield similar correct classification of previously characterized proteins, providing a validated benchmark for the new method. In addition, we analyzed SG proteins using our SALSA and GRASP-Func methods to predict function. Forty-one SG proteins in the RPBB superfamily, nine SG proteins in the 6-HG superfamily, and one SG protein in the CAL/G superfamily were successfully classified into one of the functional families in their respective superfamily by both methods. This improved, faster, validated computational method can yield more reliable predictions of function that can be used for a wide variety of applications by the community. © 2018 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.

  18. Thermostabilisation of membrane proteins for structural studies

    PubMed Central

    Magnani, Francesca; Serrano-Vega, Maria J.; Shibata, Yoko; Abdul-Hussein, Saba; Lebon, Guillaume; Miller-Gallacher, Jennifer; Singhal, Ankita; Strege, Annette; Thomas, Jennifer A.; Tate, Christopher G.

    2017-01-01

    The thermostability of an integral membrane protein in detergent solution is a key parameter that dictates the likelihood of obtaining well-diffracting crystals suitable for structure determination. However, many mammalian membrane proteins are too unstable for crystallisation. We developed a thermostabilisation strategy based on systematic mutagenesis coupled to a radioligand-binding thermostability assay that can be applied to receptors, ion channels and transporters. It takes approximately 6-12 months to thermostabilise a G protein-coupled receptor (GPCR) containing 300 amino acid residues. The resulting thermostabilised membrane proteins are more easily crystallised and result in high-quality structures. This methodology has facilitated structure-based drug design applied to GPCRs, because it is possible to determine multiple structures of the thermostabilised receptors bound to low affinity ligands. Protocols and advice are given on how to develop thermostability assays for membrane proteins and how to combine mutations to make an optimally stable mutant suitable for structural studies. PMID:27466713

  19. Protein docking by the interface structure similarity: how much structure is needed?

    PubMed

    Sinha, Rohita; Kundrotas, Petras J; Vakser, Ilya A

    2012-01-01

    The increasing availability of co-crystallized protein-protein complexes provides an opportunity to use template-based modeling for protein-protein docking. Structure alignment techniques are useful in detection of remote target-template similarities. The size of the structure involved in the alignment is important for the success in modeling. This paper describes a systematic large-scale study to find the optimal definition/size of the interfaces for the structure alignment-based docking applications. The results showed that structural areas corresponding to the cutoff values <12 Å across the interface inadequately represent structural details of the interfaces. With the increase of the cutoff beyond 12 Å, the success rate for the benchmark set of 99 protein complexes, did not increase significantly for higher accuracy models, and decreased for lower-accuracy models. The 12 Å cutoff was optimal in our interface alignment-based docking, and a likely best choice for the large-scale (e.g., on the scale of the entire genome) applications to protein interaction networks. The results provide guidelines for the docking approaches, including high-throughput applications to modeled structures.
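
    As a rough illustration of how a distance cutoff defines an interface, the Python sketch below selects the residues of each chain that lie within a cutoff of any residue of the partner chain. The representation of a residue by a single point, the function name `interface_residues`, and the toy coordinates are assumptions; the paper's exact interface definition may differ.

```python
import math


def interface_residues(chain_a, chain_b, cutoff=12.0):
    """Return (ids_a, ids_b): residues within cutoff of the partner chain.

    chain_a, chain_b: lists of (residue_id, (x, y, z)) tuples, one
    representative point per residue (an assumption of this sketch).
    cutoff: distance threshold in angstroms; 12.0 follows the value
    reported as optimal in the abstract above.
    """
    near_a, near_b = set(), set()
    for id_a, xyz_a in chain_a:
        for id_b, xyz_b in chain_b:
            if math.dist(xyz_a, xyz_b) <= cutoff:
                near_a.add(id_a)
                near_b.add(id_b)
    return near_a, near_b


# Toy two-chain complex with hypothetical coordinates.
receptor = [(1, (0.0, 0.0, 0.0)), (2, (30.0, 0.0, 0.0))]
ligand = [(101, (10.0, 0.0, 0.0)), (102, (50.0, 0.0, 0.0))]
wide = interface_residues(receptor, ligand, cutoff=12.0)
narrow = interface_residues(receptor, ligand, cutoff=8.0)
```

    Varying `cutoff` as in the two calls above changes how much of the structure enters the alignment, which is the parameter the study examines.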

  20. Protein structure database search and evolutionary classification.

    PubMed

    Yang, Jinn-Moon; Tung, Chi-Hua

    2006-01-01

    As more protein structures become available and structural genomics efforts provide structural models in a genome-wide strategy, there is a growing need for fast and accurate methods for discovering homologous proteins and evolutionary classifications of newly determined structures. We have developed 3D-BLAST, in part, to address these issues. 3D-BLAST is as fast as BLAST and calculates the statistical significance (E-value) of an alignment to indicate the reliability of the prediction. Using this method, we first identified 23 states of the structural alphabet that represent pattern profiles of the backbone fragments and then used them to represent protein structure databases as structural alphabet sequence databases (SADB). Our method enhanced BLAST as a search method, using a new structural alphabet substitution matrix (SASM) to find the longest common substructures with high-scoring structured segment pairs from an SADB database. Using personal computers with Intel Pentium4 (2.8 GHz) processors, our method searched more than 10 000 protein structures in 1.3 s and achieved a good agreement with search results from detailed structure alignment methods. [3D-BLAST is available at http://3d-blast.life.nctu.edu.tw].

  1. The Structural Biology Knowledgebase: a portal to protein structures, sequences, functions, and methods.

    PubMed

    Gabanyi, Margaret J; Adams, Paul D; Arnold, Konstantin; Bordoli, Lorenza; Carter, Lester G; Flippen-Andersen, Judith; Gifford, Lida; Haas, Juergen; Kouranov, Andrei; McLaughlin, William A; Micallef, David I; Minor, Wladek; Shah, Raship; Schwede, Torsten; Tao, Yi-Ping; Westbrook, John D; Zimmerman, Matthew; Berman, Helen M

    2011-07-01

    The Protein Structure Initiative's Structural Biology Knowledgebase (SBKB, URL: http://sbkb.org ) is an open web resource designed to turn the products of the structural genomics and structural biology efforts into knowledge that can be used by the biological community to understand living systems and disease. Here we will present examples on how to use the SBKB to enable biological research. For example, a protein sequence or Protein Data Bank (PDB) structure ID search will provide a list of related protein structures in the PDB, associated biological descriptions (annotations), homology models, structural genomics protein target status, experimental protocols, and the ability to order available DNA clones from the PSI:Biology-Materials Repository. A text search will find publication and technology reports resulting from the PSI's high-throughput research efforts. Web tools that aid in research, including a system that accepts protein structure requests from the community, will also be described. Created in collaboration with the Nature Publishing Group, the Structural Biology Knowledgebase monthly update also provides a research library, editorials about new research advances, news, and an events calendar to present a broader view of structural genomics and structural biology.

  2. Protein Structure and Function Prediction Using I-TASSER

    PubMed Central

    Yang, Jianyi; Zhang, Yang

    2016-01-01

    I-TASSER is a hierarchical protocol for automated protein structure prediction and structure-based function annotation. Starting from the amino acid sequence of target proteins, I-TASSER first generates full-length atomic structural models from multiple threading alignments and iterative structural assembly simulations followed by atomic-level structure refinement. The biological functions of the protein, including ligand-binding sites, enzyme commission number, and gene ontology terms, are then inferred from known protein function databases based on sequence and structure profile comparisons. I-TASSER is freely available as both an on-line server and a stand-alone package. This unit describes how to use the I-TASSER protocol to generate structure and function prediction and how to interpret the prediction results, as well as alternative approaches for further improving the I-TASSER modeling quality for distant-homologous and multi-domain protein targets. PMID:26678386

  3. Recognition of coarse-grained protein tertiary structure.

    PubMed

    Lezon, Timothy; Banavar, Jayanth R; Maritan, Amos

    2004-05-15

    A model of the protein backbone is considered in which each residue is characterized by the location of its Cα atom and one of a discrete set of conformal (phi, psi) states. We investigate the key differences between a description that offers a locally precise fit to known backbone structures and one that provides a globally accurate fit to protein structures. Using a statistical scoring scheme and threading, a protein's local best-fit conformation is highly recognizable, but its global structure cannot be directly determined from an amino acid sequence. The incorporation of information about the conformal states of neighboring residues along the chain allows one to accurately translate the local structure into a global structure. We present a two-step algorithm, which recognizes up to 95% of the tested protein native-state structures to within a 2.5 Å root mean square deviation. Copyright 2004 Wiley-Liss, Inc.
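
    The root mean square deviation criterion mentioned above can be illustrated with a short sketch. It assumes the two coordinate sets are already optimally superposed and matched residue by residue; the superposition step itself is omitted, and the function names are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally sized lists of
    (x, y, z) coordinates that are assumed to be already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

def within_threshold(coords_a, coords_b, threshold=2.5):
    """True if the two structures agree to within `threshold` angstroms RMSD."""
    return rmsd(coords_a, coords_b) <= threshold
```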

  4. Recovery of Background Structures in Nanoscale Helium Ion Microscope Imaging

    PubMed Central

    Carasso, Alfred S; Vladár, András E

    2014-01-01

    This paper discusses a two step enhancement technique applicable to noisy Helium Ion Microscope images in which background structures are not easily discernible due to a weak signal. The method is based on a preliminary adaptive histogram equalization, followed by ‘slow motion’ low-exponent Lévy fractional diffusion smoothing. This combined approach is unexpectedly effective, resulting in a companion enhanced image in which background structures are rendered much more visible, and noise is significantly reduced, all with minimal loss of image sharpness. The method also provides useful enhancements of scanning charged-particle microscopy images obtained by composing multiple drift-corrected ‘fast scan’ frames. The paper includes software routines, written in Interactive Data Language (IDL), that can perform the above image processing tasks. PMID:26601050

  5. Recovery of Background Structures in Nanoscale Helium Ion Microscope Imaging.

    PubMed

    Carasso, Alfred S; Vladár, András E

    2014-01-01

    This paper discusses a two step enhancement technique applicable to noisy Helium Ion Microscope images in which background structures are not easily discernible due to a weak signal. The method is based on a preliminary adaptive histogram equalization, followed by 'slow motion' low-exponent Lévy fractional diffusion smoothing. This combined approach is unexpectedly effective, resulting in a companion enhanced image in which background structures are rendered much more visible, and noise is significantly reduced, all with minimal loss of image sharpness. The method also provides useful enhancements of scanning charged-particle microscopy images obtained by composing multiple drift-corrected 'fast scan' frames. The paper includes software routines, written in Interactive Data Language (IDL), that can perform the above image processing tasks.

  6. High-Throughput Characterization of Intrinsic Disorder in Proteins from the Protein Structure Initiative

    PubMed Central

    Johnson, Derrick E.; Xue, Bin; Sickmeier, Megan D.; Meng, Jingwei; Cortese, Marc S.; Oldfield, Christopher J.; Le Gall, Tanguy; Dunker, A. Keith; Uversky, Vladimir N.

    2012-01-01

    The identification of intrinsically disordered proteins (IDPs) among the targets that fail to form satisfactory crystal structures in the Protein Structure Initiative represents a key to reducing the costs and time for determining three-dimensional structures of proteins. To help in this endeavor, several Protein Structure Initiative Centers were asked to send samples of both crystallizable proteins and proteins that failed to crystallize. The abundance of intrinsic disorder in these proteins was evaluated via computational analysis using Predictors of Natural Disordered Regions (PONDR®) and the potential cleavage sites and corresponding fragments were determined. Then, the target proteins were analyzed for intrinsic disorder by their resistance to limited proteolysis. The rates of tryptic digestion of sample target proteins were compared to those of lysozyme/myoglobin, apo-myoglobin and α-casein as standards of ordered, partially disordered and completely disordered proteins, respectively. At the next stage, the protein samples were subjected to both far-UV and near-UV circular dichroism (CD) analysis. For most of the samples, a good agreement between CD data, predictions of disorder and the rates of limited tryptic digestion was established. Further experimentation is being performed on a smaller subset of these samples in order to obtain more detailed information on the ordered/disordered nature of the proteins. PMID:22651963

  7. General overview on structure prediction of twilight-zone proteins.

    PubMed

    Khor, Bee Yin; Tye, Gee Jun; Lim, Theam Soon; Choong, Yee Siew

    2015-09-04

    Protein structure prediction from amino acid sequence has been one of the most challenging aspects in computational structural biology despite significant progress in recent years, as shown by the critical assessment of protein structure prediction (CASP) experiments. When experimentally determined structures are unavailable, the predicted structures may serve as starting points to study a protein. If the target protein contains a homologous region, a high-resolution (typically <1.5 Å) model can be built via comparative modelling. However, when confronted with low sequence similarity of the target protein (a so-called twilight-zone protein, whose sequence identity with available templates is less than 30%), the protein structure prediction has to be initiated from scratch. Traditionally, twilight-zone proteins have been predicted via threading or ab initio methods. Based on the current trend, a combination of different methods brings improved success in the prediction of twilight-zone proteins. In this mini review, the methods, progress and challenges for the prediction of twilight-zone proteins are discussed.
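
    The 30% sequence identity threshold for twilight-zone proteins can be illustrated with a short sketch. It assumes a pairwise alignment is already available as two equal-length gapped strings, and it counts identity over columns where neither sequence has a gap; other conventions for the denominator exist, so this choice is an assumption. The function names are hypothetical.

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percent identity over alignment columns in which neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_target, aligned_template)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)

def is_twilight_zone(identity, threshold=30.0):
    """True if the identity (in percent) falls below the twilight-zone threshold."""
    return identity < threshold
```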

  8. Comparative Protein Structure Modeling Using MODELLER

    PubMed Central

    Webb, Benjamin; Sali, Andrej

    2016-01-01

    Comparative protein structure modeling predicts the three-dimensional structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and how to use the ModBase database of such models, and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described. PMID:27322406

  9. Exploring the Universe of Protein Structures beyond the Protein Data Bank

    PubMed Central

    Cossio, Pilar; Trovato, Antonio; Pietrucci, Fabio; Seno, Flavio; Maritan, Amos; Laio, Alessandro

    2010-01-01

    It is currently believed that the atlas of existing protein structures is faithfully represented in the Protein Data Bank. However, whether this atlas covers the full universe of all possible protein structures is still a highly debated issue. By using a sophisticated numerical approach, we performed an exhaustive exploration of the conformational space of a 60 amino acid polypeptide chain described with an accurate all-atom interaction potential. We generated a database of around 30,000 compact folds with at least of secondary structure corresponding to local minima of the potential energy. This ensemble plausibly represents the universe of protein folds of similar length; indeed, all the known folds are represented in the set with good accuracy. However, we discover that the known folds form a rather small subset, which cannot be reproduced by choosing random structures in the database. Rather, natural and possible folds differ by the contact order, on average significantly smaller in the former. This suggests the presence of an evolutionary bias, possibly related to kinetic accessibility, towards structures with shorter loops between contacting residues. Beside their conceptual relevance, the new structures open a range of practical applications such as the development of accurate structure prediction strategies, the optimization of force fields, and the identification and design of novel folds. PMID:21079678
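
    Contact order, the quantity used above to separate natural from merely possible folds, can be sketched as the average sequence separation between contacting residues, normalized by chain length. The version below is a simplified illustration: it uses one Cα coordinate per residue, an 8 Å cutoff, and a minimum sequence separation of 2, all of which are assumptions and may differ from the definition used in the paper.

```python
import math

def relative_contact_order(ca_coords, cutoff=8.0, min_separation=2):
    """Average sequence separation of contacting residue pairs divided by
    chain length. Cutoff and minimum separation are illustrative choices."""
    n = len(ca_coords)
    total_separation = 0
    contacts = 0
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                total_separation += j - i
                contacts += 1
    if contacts == 0:
        return 0.0
    return total_separation / (contacts * n)
```

    A lower value means contacts are formed mostly between residues that are close in sequence, which corresponds to shorter loops between contacting residues.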

  10. Dissecting the relationship between protein structure and sequence variation

    NASA Astrophysics Data System (ADS)

    Shahmoradi, Amir; Wilke, Claus; Wilke Lab Team

    2015-03-01

    Over the past decade several independent works have shown that some structural properties of proteins are capable of predicting protein evolution. The strength and significance of these structure-sequence relations, however, appear to vary widely among different proteins, with absolute correlation strengths ranging from 0.1 to 0.8. Here we present the results from a comprehensive search for the potential biophysical and structural determinants of protein evolution by studying more than 200 structural and evolutionary properties in a dataset of 209 monomeric enzymes. We discuss the main protein characteristics responsible for the general patterns of protein evolution, and identify sequence divergence as the main determinant of the strengths of virtually all structure-evolution relationships, explaining ~10-30% of observed variation in sequence-structure relations. In addition to sequence divergence, we identify several protein structural properties that are moderately but significantly coupled with the strength of sequence-structure relations. In particular, proteins with more homogeneous backbone hydrogen bond energies, large fractions of helical secondary structures and low fractions of beta sheets tend to have the strongest sequence-structure relation. BEACON-NSF center for the study of evolution in action.

  11. Binding free energy analysis of protein-protein docking model structures by evERdock.

    PubMed

    Takemura, Kazuhiro; Matubayasi, Nobuyuki; Kitao, Akio

    2018-03-14

    To aid the evaluation of protein-protein complex model structures generated by protein docking prediction (decoys), we previously developed a method to calculate the binding free energies for complexes. The method combines a short (2 ns) all-atom molecular dynamics simulation with explicit solvent and solution theory in the energy representation (ER). We showed that this method successfully selected structures similar to the native complex structure (near-native decoys) as the lowest binding free energy structures. In our current work, we applied this method (evERdock) to 100 or 300 model structures of four protein-protein complexes. The crystal structures and the near-native decoys showed the lowest binding free energy of all the examined structures, indicating that evERdock can successfully evaluate decoys. Several decoys that show low interface root-mean-square distance but relatively high binding free energy were also identified. Analysis of the fraction of native contacts, hydrogen bonds, and salt bridges at the protein-protein interface indicated that these decoys were insufficiently optimized at the interface. After optimizing the interactions around the interface by including interfacial water molecules, the binding free energies of these decoys were improved. We also investigated the effect of solute entropy on binding free energy and found that consideration of the entropy term does not necessarily improve the evaluations of decoys using the normal mode analysis for entropy calculation.
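
    The fraction of native contacts used in the interface analysis above can be sketched as a set comparison. This is a minimal illustration that assumes contacts have already been extracted as pairs of residue identifiers; how a contact is defined (atoms, distance cutoff) is not stated in the abstract and is left to the caller. The function name is hypothetical.

```python
def fraction_native_contacts(native_contacts, decoy_contacts):
    """Fraction of the native interface contacts that are also present in the decoy.

    Each contact is a hashable pair such as (receptor_residue, ligand_residue).
    """
    native = set(native_contacts)
    if not native:
        raise ValueError("the native structure must have at least one contact")
    return len(native & set(decoy_contacts)) / len(native)
```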

  12. Binding free energy analysis of protein-protein docking model structures by evERdock

    NASA Astrophysics Data System (ADS)

    Takemura, Kazuhiro; Matubayasi, Nobuyuki; Kitao, Akio

    2018-03-01

    To aid the evaluation of protein-protein complex model structures generated by protein docking prediction (decoys), we previously developed a method to calculate the binding free energies for complexes. The method combines a short (2 ns) all-atom molecular dynamics simulation with explicit solvent and solution theory in the energy representation (ER). We showed that this method successfully selected structures similar to the native complex structure (near-native decoys) as the lowest binding free energy structures. In our current work, we applied this method (evERdock) to 100 or 300 model structures of four protein-protein complexes. The crystal structures and the near-native decoys showed the lowest binding free energy of all the examined structures, indicating that evERdock can successfully evaluate decoys. Several decoys that show low interface root-mean-square distance but relatively high binding free energy were also identified. Analysis of the fraction of native contacts, hydrogen bonds, and salt bridges at the protein-protein interface indicated that these decoys were insufficiently optimized at the interface. After optimizing the interactions around the interface by including interfacial water molecules, the binding free energies of these decoys were improved. We also investigated the effect of solute entropy on binding free energy and found that consideration of the entropy term does not necessarily improve the evaluations of decoys using the normal mode analysis for entropy calculation.

  13. Protein-protein structure prediction by scoring molecular dynamics trajectories of putative poses.

    PubMed

    Sarti, Edoardo; Gladich, Ivan; Zamuner, Stefano; Correia, Bruno E; Laio, Alessandro

    2016-09-01

    The prediction of protein-protein interactions and their structural configuration remains a largely unsolved problem. Most of the algorithms aimed at finding the native conformation of a protein complex starting from the structure of its monomers are based on searching the structure corresponding to the global minimum of a suitable scoring function. However, protein complexes are often highly flexible, with mobile side chains and transient contacts due to thermal fluctuations. Flexibility can be neglected if one aims at finding quickly the approximate structure of the native complex, but may play a role in structure refinement, and in discriminating solutions characterized by similar scores. We here benchmark the capability of some state-of-the-art scoring functions (BACH-SixthSense, PIE/PISA and Rosetta) in discriminating finite-temperature ensembles of structures corresponding to the native state and to non-native configurations. We produce the ensembles by running thousands of molecular dynamics simulations in explicit solvent starting from poses generated by rigid docking and optimized in vacuum. We find that while Rosetta outperformed the other two scoring functions in scoring the structures in vacuum, BACH-SixthSense and PIE/PISA perform better in distinguishing near-native ensembles of structures generated by molecular dynamics in explicit solvent. Proteins 2016; 84:1312-1320. © 2016 Wiley Periodicals, Inc.

  14. Protein sectors: evolutionary units of three-dimensional structure

    PubMed Central

    Halabi, Najeeb; Rivoire, Olivier; Leibler, Stanislas; Ranganathan, Rama

    2011-01-01

    Proteins display a hierarchy of structural features at primary, secondary, tertiary, and higher-order levels, an organization that guides our current understanding of their biological properties and evolutionary origins. Here, we reveal a structural organization distinct from this traditional hierarchy by statistical analysis of correlated evolution between amino acids. Applied to the S1A serine proteases, the analysis indicates a decomposition of the protein into three quasi-independent groups of correlated amino acids that we term “protein sectors”. Each sector is physically connected in the tertiary structure, has a distinct functional role, and constitutes an independent mode of sequence divergence in the protein family. Functionally relevant sectors are evident in other protein families as well, suggesting that they may be general features of proteins. We propose that sectors represent a structural organization of proteins that reflects their evolutionary histories. PMID:19703402

  15. A novel structural tree for wrap-proteins, a subclass of (α+β)-proteins.

    PubMed

    Boshkova, Eugenia A; Gordeev, Alexey B; Efimov, Alexander V

    2014-01-01

    In this paper, a novel structural subclass of (α+β)-proteins is presented. A characteristic feature of these proteins and domains is that they consist of strongly twisted and coiled β-sheets wrapped around one or two α-helices, so they are referred to here as wrap-proteins. It is shown that overall folds of the wrap-proteins can be obtained by stepwise addition of α-helices and/or β-strands to the strongly twisted and coiled β-hairpin taken as the starting structure in modeling. As a result of modeling, a structural tree for the wrap-proteins was constructed that includes 201 folds of which 49 occur in known nonhomologous proteins.

  16. Structural alignment of protein descriptors - a combinatorial model.

    PubMed

    Antczak, Maciej; Kasprzak, Marta; Lukasiak, Piotr; Blazewicz, Jacek

    2016-09-17

    Structural alignment of proteins is one of the most challenging problems in molecular biology. The tertiary structure of a protein strictly correlates with its function and computationally predicted structures are nowadays a main premise for understanding the latter. However, computationally derived 3D models often exhibit deviations from the native structure. A way to confirm a model is a comparison with other structures. The structural alignment of a pair of proteins can be defined with the use of a concept of protein descriptors. The protein descriptors are local substructures of protein molecules, which allow us to divide the original problem into a set of subproblems and, consequently, to propose a more efficient algorithmic solution. In the literature, one can find many applications of the descriptors concept that prove its usefulness for insight into protein 3D structures, but the proposed approaches are presented rather from the biological perspective than from the computational or algorithmic point of view. Efficient algorithms for identification and structural comparison of descriptors can become crucial components of methods for structural quality assessment as well as tertiary structure prediction. In this paper, we propose a new combinatorial model and new polynomial-time algorithms for the structural alignment of descriptors. The model is based on the maximum-size assignment problem, which we define here and prove that it can be solved in polynomial time. We demonstrate suitability of this approach by comparison with an exact backtracking algorithm. Besides a simplification coming from the combinatorial modeling, both on the conceptual and complexity level, we gain with this approach high quality of obtained results, in terms of 3D alignment accuracy and processing efficiency. All the proposed algorithms were developed and integrated in a computationally efficient tool descs-standalone, which allows the user to identify and structurally compare

  17. A Web-Accessible Protein Structure Prediction Pipeline

    DTIC Science & Technology

    2009-06-01

    Abstract Proteins are the molecular basis of nearly all structural, catalytic, sensory, and regulatory functions in living organisms. The biological...sensory, and regulatory functions in living organisms. The structure of a protein is essential in understanding its function at the molecular level...Characterizing sequence-structure and structure-function relationships have been the goals of molecular biology for more than three decades

  18. Proteins without unique 3D structures: biotechnological applications of intrinsically unstable/disordered proteins.

    PubMed

    Uversky, Vladimir N

    2015-03-01

    Intrinsically disordered proteins (IDPs) and intrinsically disordered protein regions (IDPRs) are functional proteins or regions that do not have unique 3D structures under functional conditions. Therefore, from the viewpoint of their lack of stable 3D structure, IDPs/IDPRs are inherently unstable. As much as structure and function of normal ordered globular proteins are determined by their amino acid sequences, the lack of unique 3D structure in IDPs/IDPRs and their disorder-based functionality are also encoded in the amino acid sequences. Because of their specific sequence features and distinctive conformational behavior, these intrinsically unstable proteins or regions have several applications in biotechnology. This review introduces some of the most characteristic features of IDPs/IDPRs (such as peculiarities of amino acid sequences of these proteins and regions, their major structural features, and peculiar responses to changes in their environment) and describes how these features can be used in biotechnology, for example for the proteome-wide analysis of the abundance of extended IDPs, for recombinant protein isolation and purification, as polypeptide nanoparticles for drug delivery, as solubilization tools, and as thermally sensitive carriers of active peptides and proteins. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  19. Automation of NMR structure determination of proteins.

    PubMed

    Altieri, Amanda S; Byrd, R Andrew

    2004-10-01

    The automation of protein structure determination using NMR is coming of age. The tedious processes of resonance assignment, followed by assignment of NOE (nuclear Overhauser enhancement) interactions (now intertwined with structure calculation), assembly of input files for structure calculation, intermediate analyses of incorrect assignments and bad input data, and finally structure validation are all being automated with sophisticated software tools. The robustness of the different approaches continues to deal with problems of completeness and uniqueness; nevertheless, the future is very bright for automation of NMR structure generation to approach the levels found in X-ray crystallography. Currently, near completely automated structure determination is possible for small proteins, and the prospect for medium-sized and large proteins is good. Copyright 2004 Elsevier Ltd.

  20. Query3d: a new method for high-throughput analysis of functional residues in protein structures

    PubMed Central

    Ausiello, Gabriele; Via, Allegra; Helmer-Citterich, Manuela

    2005-01-01

    Background The identification of local similarities between two protein structures can provide clues of a common function. Many different methods exist for searching for similar subsets of residues in proteins of known structure. However, the lack of functional and structural information on single residues, together with the low level of integration of this information in comparison methods, is a limitation that prevents these methods from being fully exploited in high-throughput analyses. Results Here we describe Query3d, a program that is both a structural DBMS (Database Management System) and a local comparison method. The method conserves a copy of all the residues of the Protein Data Bank annotated with a variety of functional and structural information. New annotations can be easily added from a variety of methods and known databases. The algorithm makes it possible to create complex queries based on the residues' function and then to compare only subsets of the selected residues. Functional information is also essential to speed up the comparison and the analysis of the results. Conclusion With Query3d, users can easily obtain statistics on how many and which residues share certain properties in all proteins of known structure. At the same time, the method also finds their structural neighbours in the whole PDB. Programs and data can be accessed through the PdbFun web interface. PMID:16351754

  1. Structural perturbation of proteins in low denaturant concentrations.

    PubMed

    Basak, S; Debnath, D; Haque, E; Ray, S; Chakrabarti, A

    2001-01-01

    The presence of very low concentrations of the widely used chemical denaturants, guanidinium chloride and urea, induces changes in the tertiary structure of proteins. We have presented results on such changes in four structurally unrelated proteins to show that such structural perturbations are common irrespective of their origin. Data representative of such structural changes are shown for the monomeric globular proteins such as horseradish peroxidase (HRP) from a plant, human serum albumin (HSA) and prothrombin from ovine blood serum, and for the membrane-associated, worm-like elongated protein, spectrin, from ovine erythrocytes. Structural alterations in these proteins were reflected in quenching studies of tryptophan fluorescence using the widely used quencher acrylamide. Stern-Volmer quenching constants measured in the presence of the denaturants, even at concentrations below 100 mM, were higher than those measured in the absence of the denaturants. Both steady-state and time-resolved fluorescence emission properties of tryptophan and of the extrinsic probe PRODAN were used for monitoring conformational changes in the proteins in the presence of different low concentrations of the denaturants. These results are consistent with earlier studies from our laboratory indicating structural perturbations in proteins at the tertiary level, keeping their native-like secondary structure and their biological activity more or less intact.
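
    The Stern-Volmer quenching constant referred to above follows from the standard relation F0/F = 1 + K_SV [Q], where F0 and F are the fluorescence intensities without and with quencher and [Q] is the quencher concentration. The sketch below estimates K_SV by a least-squares fit with the intercept fixed at 1. It assumes purely linear quenching behavior; the function name is hypothetical, and the abstract does not describe the authors' fitting procedure.

```python
def stern_volmer_constant(quencher_conc, fluorescence, f0):
    """Least-squares estimate of K_SV from F0/F = 1 + K_SV * [Q],
    with the intercept constrained to 1.

    quencher_conc: quencher concentrations [Q] (e.g. in mol/L)
    fluorescence:  fluorescence intensities F measured at those concentrations
    f0:            fluorescence intensity without quencher
    """
    numerator = sum(q * (f0 / f - 1.0) for q, f in zip(quencher_conc, fluorescence))
    denominator = sum(q * q for q in quencher_conc)
    if denominator == 0:
        raise ValueError("at least one non-zero quencher concentration is required")
    return numerator / denominator
```

    A larger K_SV means the fluorophore is more accessible to the quencher, which is how the higher constants in the presence of denaturant are read as a sign of structural perturbation.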

  2. The structure of a cholesterol-trapping protein

    Science.gov Websites

    February 28, 2003. Berkeley Lab Science Beat. The structure of a ... Institute researchers determined the three-dimensional structure of a protein that controls cholesterol level in the bloodstream. Knowing the structure of the protein, a cellular receptor that ensnares

  3. Computer analysis of protein functional sites projection on exon structure of genes in Metazoa

    PubMed Central

    2015-01-01

    Background Study of the relationship between the structural and functional organization of proteins and their coding genes is necessary for an understanding of the evolution of molecular systems and can provide new knowledge for many applications for designing proteins with improved medical and biological properties. It is well known that the functional properties of proteins are determined by their functional sites. Functional sites are usually represented by a small number of amino acid residues that are distantly located from each other in the amino acid sequence. They are highly conserved within their functional group and vary significantly in structure between such groups. Given these facts, analysis of the general properties of the structural organization of the functional sites at the protein level and at the level of the exon-intron structure of the coding gene remains a relevant problem. Results One approach to this analysis is the projection of amino acid residue positions of the functional sites along with the exon boundaries to the gene structure. In this paper, we examined the discontinuity of the functional sites in the exon-intron structure of genes and the distribution of lengths and phases of the functional site encoding exons in vertebrate genes. We have shown that the DNA fragments coding the functional sites were in the same exons, or in close exons. We observed a tendency toward clustering of the exons that code functional sites, which could be considered as the unit of protein evolution. We studied the characteristics of the structure of the exon boundaries that code, and do not code, functional sites in 11 Metazoa species. This is accompanied by a reduced frequency of intercodon gaps (phase 0) in exons encoding the amino acid residue functional site, which may be evidence of the existence of evolutionary limitations to the exon shuffling. Conclusions These results characterize the features of the coding exon-intron structure that affect the

  4. A novel Multi-Agent Ada-Boost algorithm for predicting protein structural class with the information of protein secondary structure.

    PubMed

    Fan, Ming; Zheng, Bin; Li, Lihua

    2015-10-01

    Knowledge of the structural class of a given protein is important for understanding its folding patterns. Although many efforts have been made, predicting protein structural class solely from protein sequences remains a challenging problem. The feature extraction and classification of proteins are the main problems in prediction. In this research, we extended our earlier work regarding these two aspects. In protein feature extraction, we proposed a scheme by calculating the word frequency and word position from sequences of amino acid, reduced amino acid, and secondary structure. For an accurate classification of the structural class of protein, we developed a novel Multi-Agent Ada-Boost (MA-Ada) method by integrating the features of Multi-Agent system into Ada-Boost algorithm. Extensive experiments were conducted to test and compare the proposed method using four low-homology benchmark datasets. The results showed classification accuracies of 88.5%, 96.0%, 88.4%, and 85.5%, respectively, which are much better than those of the existing methods. The source code and dataset are available on request.
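
    The word-frequency part of the feature extraction described above can be sketched as counting overlapping words of length k in a sequence. This is an illustration only: the three-letter secondary structure alphabet (H, E, C), the word length, and the function name are assumptions, and the word-position features and the MA-Ada classifier from the paper are not reproduced here.

```python
from collections import Counter
from itertools import product

def word_frequencies(sequence, k=2, alphabet="HEC"):
    """Relative frequency of every possible length-k word over `alphabet`,
    counted with overlapping windows along `sequence`."""
    words = ["".join(letters) for letters in product(alphabet, repeat=k)]
    windows = max(len(sequence) - k + 1, 0)
    counts = Counter(sequence[i:i + k] for i in range(windows))
    return {w: (counts[w] / windows if windows else 0.0) for w in words}
```

    For example, `word_frequencies("HHEEC")` yields a 9-element feature dictionary in which "HH", "HE", "EE" and "EC" each have frequency 0.25 and all other words have frequency 0.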

  5. Structural adaptation of extreme halophilic proteins through decrease of conserved hydrophobic contact surface

    PubMed Central

    2011-01-01

    Background Halophiles are extremophilic microorganisms growing optimally at high salt concentrations. There are two strategies used by halophiles to maintain proper osmotic pressure in their cytoplasm: accumulation of molar concentrations of potassium and chloride with extensive adaptation of the intracellular macromolecules ("salt-in" strategy) or biosynthesis and/or accumulation of organic osmotic solutes ("osmolyte" strategy). Our work was aimed at contributing to the understanding of the shared molecular mechanisms of protein haloadaptation through a detailed and systematic comparison of a sample of several three-dimensional structures of halophilic and non-halophilic proteins. Structural differences observed between the "salt-in" and the mesophilic homologous proteins were contrasted to those observed between the "osmolyte" and mesophilic pairs. Results The results suggest that the haloadaptation strategy in the presence of molar salt concentrations, but not of osmolytes, necessitates a weakening of the hydrophobic interactions, in particular at the level of conserved hydrophobic contacts. Weakening of these interactions counterbalances their strengthening by the presence of salts in solution and may help the structure avoid aggregation and/or loss of function in hypersaline environments. Conclusions Considering the significant increase in biotechnology applications of halophiles, the understanding of halophilicity can provide the theoretical basis for the engineering of proteins that are of great interest because they are stable at salt concentrations that cause the denaturation or aggregation of the majority of macromolecules. PMID:22192175

  6. Accurate protein structure modeling using sparse NMR data and homologous structure information.

    PubMed

    Thompson, James M; Sgourakis, Nikolaos G; Liu, Gaohua; Rossi, Paolo; Tang, Yuefeng; Mills, Jeffrey L; Szyperski, Thomas; Montelione, Gaetano T; Baker, David

    2012-06-19

    While information from homologous structures plays a central role in X-ray structure determination by molecular replacement, such information is rarely used in NMR structure determination because it can be incorrect, both locally and globally, when evolutionary relationships are inferred incorrectly or there has been considerable evolutionary structural divergence. Here we describe a method that allows robust modeling of protein structures of up to 225 residues by combining ¹H(N), ¹³C, and ¹⁵N backbone and ¹³Cβ chemical shift data, distance restraints derived from homologous structures, and a physically realistic all-atom energy function. Accurate models are distinguished from inaccurate models generated using incorrect sequence alignments by requiring that (i) the all-atom energies of models generated using the restraints are lower than those of models generated in unrestrained calculations and (ii) the low-energy structures converge to within 2.0 Å backbone rmsd over 75% of the protein. Benchmark calculations on known structures and blind targets show that the method can accurately model protein structures, even with very remote homology information, to a backbone rmsd of 1.2-1.9 Å relative to the conventionally determined NMR ensembles and of 0.9-1.6 Å relative to X-ray structures for well-defined regions of the protein structures. This approach facilitates the accurate modeling of protein structures using backbone chemical shift data without the need for side-chain resonance assignments and extensive analysis of NOESY cross-peak assignments.

  7. An overview of the structures of protein-DNA complexes

    PubMed Central

    Luscombe, Nicholas M; Austin, Susan E; Berman, Helen M; Thornton, Janet M

    2000-01-01

    On the basis of a structural analysis of 240 protein-DNA complexes contained in the Protein Data Bank (PDB), we have classified the DNA-binding proteins involved into eight different structural/functional groups, which are further classified into 54 structural families. Here we present this classification and review the functions, structures and binding interactions of these protein-DNA complexes. PMID:11104519

  8. Structure-Based Druggability Assessment of the Mammalian Structural Proteome with Inclusion of Light Protein Flexibility

    PubMed Central

    Loving, Kathryn A.; Lin, Andy; Cheng, Alan C.

    2014-01-01

    Advances reported over the last few years and the increasing availability of protein crystal structure data have greatly improved structure-based druggability approaches. However, in practice, nearly all druggability estimation methods are applied to protein crystal structures as rigid proteins, with protein flexibility often not directly addressed. The inclusion of protein flexibility is important in correctly identifying the druggability of pockets that would be missed by methods based solely on the rigid crystal structure. These include cryptic pockets and flexible pockets often found at protein-protein interaction interfaces. Here, we apply an approach that uses protein modeling in concert with druggability estimation to account for light protein backbone movement and protein side-chain flexibility in protein binding sites. We assess the advantages and limitations of this approach on widely-used protein druggability sets. Applying the approach to all mammalian protein crystal structures in the PDB results in identification of 69 proteins with potential druggable cryptic pockets. PMID:25079060

  9. Discrete-continuous duality of protein structure space.

    PubMed

    Sadreyev, Ruslan I; Kim, Bong-Hyun; Grishin, Nick V

    2009-06-01

    Recently, the nature of protein structure space has been widely discussed in the literature. The traditional discrete view of protein universe as a set of separate folds has been criticized in the light of growing evidence that almost any arrangement of secondary structures is possible and the whole protein space can be traversed through a path of similar structures. Here we argue that the discrete and continuous descriptions are not mutually exclusive, but complementary: the space is largely discrete in evolutionary sense, but continuous geometrically when purely structural similarities are quantified. Evolutionary connections are mainly confined to separate structural prototypes corresponding to folds as islands of structural stability, with few remaining traceable links between the islands. However, for a geometric similarity measure, it is usually possible to find a reasonable cutoff that yields paths connecting any two structures through intermediates.
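
    The idea that a similarity cutoff yields paths between structures through intermediates can be sketched as a graph search. The sketch below assumes a precomputed matrix of pairwise structural dissimilarities; the function name, the toy matrix, and the cutoff values are hypothetical and do not come from the paper.

```python
from collections import deque


def path_at_cutoff(dist, start, goal, cutoff):
    """Breadth-first search over the graph that links two structures
    whenever their dissimilarity is <= cutoff.

    Returns the list of structure indices along a shortest path,
    or None if the two structures are not connected at this cutoff.
    """
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in range(len(dist)):
            if nxt not in parent and dist[node][nxt] <= cutoff:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical pairwise dissimilarities among four structures.
toy = [
    [0.0, 1.0, 3.0, 5.0],
    [1.0, 0.0, 1.5, 4.0],
    [3.0, 1.5, 0.0, 1.2],
    [5.0, 4.0, 1.2, 0.0],
]

loose = path_at_cutoff(toy, 0, 3, cutoff=2.0)   # connected through intermediates
strict = path_at_cutoff(toy, 0, 3, cutoff=1.1)  # disconnected at a tighter cutoff
```

    In this toy case structures 0 and 3 are dissimilar when compared directly, yet a permissive cutoff links them through two intermediates, while a stricter cutoff leaves them in separate components.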

  10. Hidden Structural Codes in Protein Intrinsic Disorder.

    PubMed

    Borkosky, Silvia S; Camporeale, Gabriela; Chemes, Lucía B; Risso, Marikena; Noval, María Gabriela; Sánchez, Ignacio E; Alonso, Leonardo G; de Prat Gay, Gonzalo

    2017-10-17

    Intrinsic disorder is a major structural category in biology, accounting for more than 30% of coding regions across the domains of life, yet consists of conformational ensembles in equilibrium, a major challenge in protein chemistry. Anciently evolved papillomavirus genomes constitute an unparalleled case for sequence to structure-function correlation in cases in which there are no folded structures. E7, the major transforming oncoprotein of human papillomaviruses, is a paradigmatic example among the intrinsically disordered proteins. Analysis of a large number of sequences of the same viral protein allowed for the identification of a handful of residues with absolute conservation, scattered along the sequence of its N-terminal intrinsically disordered domain, which intriguingly are mostly leucine residues. Mutation of these led to a pronounced increase in both α-helix and β-sheet structural content, reflected by drastic effects on equilibrium propensities and oligomerization kinetics, and uncovered the existence of local structural elements that oppose canonical folding. These folding relays suggest the existence of yet undefined hidden structural codes behind intrinsic disorder in this model protein. Thus, evolution pinpoints conformational hot spots that could not have been identified by direct experimental methods for analyzing or perturbing the equilibrium of an intrinsically disordered protein ensemble.

  11. PBxplore: a tool to analyze local protein structure and deformability with Protein Blocks

    PubMed Central

    Craveur, Pierrick; Joseph, Agnel Praveen; Jallu, Vincent

    2017-01-01

    This paper describes the development and application of a suite of tools, called PBxplore, to analyze the dynamics and deformability of protein structures using Protein Blocks (PBs). Proteins are highly dynamic macromolecules, and a classical way to analyze their inherent flexibility is to perform molecular dynamics simulations. The advantage of using small structural prototypes such as PBs is to give a good approximation of the local structure of the protein backbone. More importantly, by reducing the conformational complexity of protein structures, PBs allow analysis of local protein deformability that cannot be done with other methods, and they have been used efficiently in different applications. PBxplore is able to process large amounts of data such as those produced by molecular dynamics simulations. It produces frequencies, entropy and information logo outputs as text and graphics. PBxplore is available at https://github.com/pierrepo/PBxplore and is released under the open-source MIT license. PMID:29177113
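
    The frequency and entropy outputs mentioned above can be illustrated with a short, self-contained sketch. It does not use the PBxplore API; it assumes each simulation frame has already been encoded as a string of Protein Block letters, and the function name and the toy frames are hypothetical.

```python
import math
from collections import Counter


def positional_entropy(pb_sequences):
    """Per-position Shannon entropy (in nats) across equal-length PB strings.

    A position that always shows the same block has entropy 0; a position
    that switches between blocks has higher entropy (more deformable).
    """
    length = len(pb_sequences[0])
    if any(len(seq) != length for seq in pb_sequences):
        raise ValueError("all PB sequences must have the same length")
    entropies = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in pb_sequences)
        total = sum(counts.values())
        h = -sum((c / total) * math.log(c / total) for c in counts.values())
        entropies.append(h + 0.0)  # "+ 0.0" normalizes -0.0 to 0.0
    return entropies


# Hypothetical PB encodings of four frames of a four-residue stretch.
frames = ["mmmd", "mmnd", "mmmd", "mlnd"]
entropy = positional_entropy(frames)
```

    Taking exp(H) at each position gives an effective number of blocks sampled there, which ranges from 1 for a rigid position up to the alphabet size for a fully variable one.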

  12. Protein Structures Revealed at Record Pace

    ScienceCinema

    Hura, Greg

    2017-12-11

    The structure of a protein in days -- not months or years -- ushers in a new era in genomics research. Berkeley Lab scientists have developed a high-throughput protein pipeline that could expedite the development of biofuels and elucidate how proteins carry out life's vital functions.

  13. Protein Structures Revealed at Record Pace

    ScienceCinema

    Greg Hura

    2017-12-09

    The structure of a protein in days -- not months or years -- ushers in a new era in genomics research. Berkeley Lab scientists have developed a high-throughput protein pipeline that could expedite the development of biofuels and elucidate how proteins carry out life's vital functions.

  14. Structural principles within the human-virus protein-protein interaction network

    PubMed Central

    Franzosa, Eric A.; Xia, Yu

    2011-01-01

    General properties of the antagonistic biomolecular interactions between viruses and their hosts (exogenous interactions) remain poorly understood, and may differ significantly from known principles governing the cooperative interactions within the host (endogenous interactions). Systems biology approaches have been applied to study the combined interaction networks of virus and human proteins, but such efforts have so far revealed only low-resolution patterns of host-virus interaction. Here, we layer curated and predicted 3D structural models of human-virus and human-human protein complexes on top of traditional interaction networks to reconstruct the human-virus structural interaction network. This approach reveals atomic resolution, mechanistic patterns of host-virus interaction, and facilitates systematic comparison with the host’s endogenous interactions. We find that exogenous interfaces tend to overlap with and mimic endogenous interfaces, thereby competing with endogenous binding partners. The endogenous interfaces mimicked by viral proteins tend to participate in multiple endogenous interactions which are transient and regulatory in nature. While interface overlap in the endogenous network results largely from gene duplication followed by divergent evolution, viral proteins frequently achieve interface mimicry without any sequence or structural similarity to an endogenous binding partner. Finally, while endogenous interfaces tend to evolve more slowly than the rest of the protein surface, exogenous interfaces—including many sites of endogenous-exogenous overlap—tend to evolve faster, consistent with an evolutionary “arms race” between host and pathogen. These significant biophysical, functional, and evolutionary differences between host-pathogen and within-host protein-protein interactions highlight the distinct consequences of antagonism versus cooperation in biological networks. PMID:21680884

  15. Structural Mass Spectrometry of Proteins Using Hydroxyl Radical Based Protein Footprinting

    PubMed Central

    Wang, Liwen; Chance, Mark R.

    2011-01-01

    Structural MS is a rapidly growing field with many applications in basic research and pharmaceutical drug development. In this feature article the overall technology is described and several examples of how hydroxyl radical based footprinting MS can be used to map interfaces, evaluate protein structure, and identify ligand dependent conformational changes in proteins are described. PMID:21770468

  16. Relation between native ensembles and experimental structures of proteins

    PubMed Central

    Best, Robert B.; Lindorff-Larsen, Kresten; DePristo, Mark A.; Vendruscolo, Michele

    2006-01-01

    Different experimental structures of the same protein or of proteins with high sequence similarity contain many small variations. Here we construct ensembles of “high-sequence similarity Protein Data Bank” (HSP) structures and consider the extent to which such ensembles represent the structural heterogeneity of the native state in solution. We find that different NMR measurements probing structure and dynamics of given proteins in solution, including order parameters, scalar couplings, and residual dipolar couplings, are remarkably well reproduced by their respective high-sequence similarity Protein Data Bank ensembles; moreover, we show that the effects of uncertainties in structure determination are insufficient to explain the results. These results highlight the importance of accounting for native-state protein dynamics in making comparisons with ensemble-averaged experimental data and suggest that even a modest number of structures of a protein determined under different conditions, or with small variations in sequence, capture a representative subset of the true native-state ensemble. PMID:16829580

  17. A protein relational database and protein family knowledge bases to facilitate structure-based design analyses.

    PubMed

    Mobilio, Dominick; Walker, Gary; Brooijmans, Natasja; Nilakantan, Ramaswamy; Denny, R Aldrin; Dejoannis, Jason; Feyfant, Eric; Kowticwar, Rupesh K; Mankala, Jyoti; Palli, Satish; Punyamantula, Sairam; Tatipally, Maneesh; John, Reji K; Humblet, Christine

    2010-08-01

    The Protein Data Bank is the most comprehensive source of experimental macromolecular structures. It can, however, be difficult at times to locate relevant structures with the Protein Data Bank search interface. This is particularly true when searching for complexes containing specific interactions between protein and ligand atoms. Moreover, searching within a family of proteins can be tedious. For example, one cannot search for some conserved residue as residue numbers vary across structures. We describe herein three databases, Protein Relational Database, Kinase Knowledge Base, and Matrix Metalloproteinase Knowledge Base, containing protein structures from the Protein Data Bank. In Protein Relational Database, atom-atom distances between protein and ligand have been precalculated allowing for millisecond retrieval based on atom identity and distance constraints. Ring centroids, centroid-centroid and centroid-atom distances and angles have also been included permitting queries for pi-stacking interactions and other structural motifs involving rings. Other geometric features can be searched through the inclusion of residue pair and triplet distances. In Kinase Knowledge Base and Matrix Metalloproteinase Knowledge Base, the catalytic domains have been aligned into common residue numbering schemes. Thus, by searching across Protein Relational Database and Kinase Knowledge Base, one can easily retrieve structures wherein, for example, a ligand of interest is making contact with the gatekeeper residue.
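
    The retrieval pattern described above, precalculated protein-ligand atom distances queried by atom identity and a distance constraint, can be sketched with an in-memory SQLite table. The schema, column names, index, and rows below are hypothetical illustrations and are not the actual Protein Relational Database design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contact ("
    " structure_id TEXT, protein_atom TEXT, ligand_atom TEXT, distance REAL)"
)
# Made-up rows standing in for precalculated atom-atom distances (in Å).
conn.executemany(
    "INSERT INTO contact VALUES (?, ?, ?, ?)",
    [
        ("toy1", "N", "O", 2.9),
        ("toy2", "N", "O", 4.1),
        ("toy3", "N", "O", 3.2),
        ("toy3", "O", "N", 3.0),
    ],
)
# An index on the queried columns is what makes lookups fast on large tables.
conn.execute(
    "CREATE INDEX idx_contact ON contact (protein_atom, ligand_atom, distance)"
)


def find_contacts(conn, protein_atom, ligand_atom, max_distance):
    """Return (structure_id, distance) pairs matching atom identity and distance."""
    return conn.execute(
        "SELECT structure_id, distance FROM contact"
        " WHERE protein_atom = ? AND ligand_atom = ? AND distance <= ?"
        " ORDER BY distance",
        (protein_atom, ligand_atom, max_distance),
    ).fetchall()


hits = find_contacts(conn, "N", "O", 3.5)
```

    Because the distances are stored rather than computed at query time, the search reduces to an indexed range lookup, which is consistent with the fast retrieval the abstract reports.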

  18. A Generative Angular Model of Protein Structure Evolution

    PubMed Central

    Golden, Michael; García-Portugués, Eduardo; Sørensen, Michael; Mardia, Kanti V.; Hamelryck, Thomas; Hein, Jotun

    2017-01-01

    Abstract Recently described stochastic models of protein evolution have demonstrated that the inclusion of structural information in addition to amino acid sequences leads to a more reliable estimation of evolutionary parameters. We present a generative, evolutionary model of protein structure and sequence that is valid on a local length scale. The model concerns the local dependencies between sequence and structure evolution in a pair of homologous proteins. The evolutionary trajectory between the two structures in the protein pair is treated as a random walk in dihedral angle space, which is modeled using a novel angular diffusion process on the two-dimensional torus. Coupling sequence and structure evolution in our model allows for modeling both “smooth” conformational changes and “catastrophic” conformational jumps, conditioned on the amino acid changes. The model has interpretable parameters and is comparatively more realistic than previous stochastic models, providing new insights into the relationship between sequence and structure evolution. For example, using the trained model we were able to identify an apparent sequence–structure evolutionary motif present in a large number of homologous protein pairs. The generative nature of our model enables us to evaluate its validity and its ability to simulate aspects of protein evolution conditioned on an amino acid sequence, a related amino acid sequence, a related structure or any combination thereof. PMID:28453724
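
    A random walk in dihedral angle space on the torus can be sketched in a few lines. This is a deliberately simplified stand-in: it uses driftless Gaussian steps with wraparound, not the angular diffusion process of the paper, and the function names, step size, and starting angles are hypothetical.

```python
import math
import random


def wrap(angle):
    """Map an angle in radians onto the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def torus_walk(phi, psi, steps, sigma, rng):
    """Random walk of a (phi, psi) pair on the two-dimensional torus.

    Each step adds independent Gaussian noise to both angles and wraps
    the result, so the walker leaves one edge and re-enters at the other.
    """
    path = [(phi, psi)]
    for _ in range(steps):
        phi = wrap(phi + rng.gauss(0.0, sigma))
        psi = wrap(psi + rng.gauss(0.0, sigma))
        path.append((phi, psi))
    return path


# Seeded generator so that the illustrative trajectory is reproducible.
trajectory = torus_walk(-1.0, 2.5, steps=200, sigma=0.4, rng=random.Random(0))
```

    The wrapping step is what distinguishes a walk on the torus from a walk in the plane: angles near +pi and -pi are neighbors, which matters for conformations close to the edges of the Ramachandran plot.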

  19. PDB-UF: database of predicted enzymatic functions for unannotated protein structures from structural genomics.

    PubMed

    von Grotthuss, Marcin; Plewczynski, Dariusz; Ginalski, Krzysztof; Rychlewski, Leszek; Shakhnovich, Eugene I

    2006-02-06

    The number of protein structures from structural genomics centers in the Protein Data Bank (PDB) is increasing dramatically. Many of these structures are functionally unannotated because they have no sequence similarity to proteins of known function. However, it is possible to successfully infer function using only structural similarity. Here we present the PDB-UF database, a web-accessible collection of predictions of enzymatic properties based on structure-function relationships. The assignments were made for three-dimensional protein structures of unknown function that come from structural genomics initiatives. We show that 4 hypothetical proteins (with PDB accession codes: 1VH0, 1NS5, 1O6D, and 1TO0), for which standard BLAST tools such as PSI-BLAST or RPS-BLAST failed to assign any function, are probably methyltransferase enzymes. We suggest that structure-based prediction of an EC number should use different similarity score cutoffs for different protein folds. Moreover, performing the annotation using two different algorithms can reduce the rate of false positive assignments. We believe that the presented web-based repository will help to decrease the number of protein structures that have functions marked as "unknown" in the PDB file. The database is available at http://paradox.harvard.edu/PDB-UF and http://bioinfo.pl/PDB-UF.

  20. Structural genomics analysis of uncharacterized protein families overrepresented in human gut bacteria identifies a novel glycoside hydrolase

    PubMed Central

    2014-01-01

    Background Bacteroides spp. form a significant part of our gut microbiome and are well known for optimized metabolism of diverse polysaccharides. Initial analysis of the archetypal Bacteroides thetaiotaomicron genome identified 172 glycosyl hydrolases and a large number of uncharacterized proteins associated with polysaccharide metabolism. Results BT_1012 from Bacteroides thetaiotaomicron VPI-5482 is a protein of unknown function and a member of a large protein family consisting entirely of uncharacterized proteins. Initial sequence analysis predicted that this protein has two domains, one at the N-terminus and one at the C-terminus. A PSI-BLAST search found over 150 full-length homologs and over 90 half-size homologs consisting only of the N-terminal domain. The experimentally determined three-dimensional structure of the BT_1012 protein confirms its two-domain architecture, and structural analysis of both domains suggests their specific functions. The N-terminal domain is a putative catalytic domain with significant similarity to known glycoside hydrolases. The C-terminal domain has a beta-sandwich fold typically found in C-terminal domains of other glycosyl hydrolases; however, these domains are typically involved in substrate binding. We describe the structure of the BT_1012 protein and discuss its sequence-structure relationship and possible functional implications. Conclusions Structural and sequence analyses of the BT_1012 protein identify it as a glycosyl hydrolase, expanding an already impressive catalog of enzymes involved in polysaccharide metabolism in Bacteroides spp. Based on this, we have renamed the Pfam families representing the two domains found in the BT_1012 protein, PF13204 and PF12904, as putative glycoside hydrolase and glycoside hydrolase-associated C-terminal domain, respectively. PMID:24742328

  1. DNA mimic proteins: functions, structures, and bioinformatic analysis.

    PubMed

    Wang, Hao-Ching; Ho, Chun-Han; Hsu, Kai-Cheng; Yang, Jinn-Moon; Wang, Andrew H-J

    2014-05-13

    DNA mimic proteins have DNA-like negative surface charge distributions, and they function by occupying the DNA binding sites of DNA binding proteins to prevent these sites from being accessed by DNA. DNA mimic proteins control the activities of a variety of DNA binding proteins and are involved in a wide range of cellular mechanisms such as chromatin assembly, DNA repair, transcription regulation, and gene recombination. However, the sequences and structures of DNA mimic proteins are diverse, making them difficult to predict by bioinformatic search. To date, only a few DNA mimic proteins have been reported. These DNA mimics were not found by searching for functional motifs in their sequences but were revealed only by structural analysis of their charge distribution. This review highlights the biological roles and structures of 16 reported DNA mimic proteins. We also discuss approaches that might be used to discover new DNA mimic proteins.

  2. The use of experimental structures to model protein dynamics.

    PubMed

    Katebi, Ataur R; Sankar, Kannan; Jia, Kejue; Jernigan, Robert L

    2015-01-01

    The number of solved protein structures deposited in the Protein Data Bank (PDB) has increased dramatically in recent years. For some specific proteins, this number is very high; for example, there are over 550 solved structures for HIV-1 protease, a protein that is essential for the life cycle of human immunodeficiency virus (HIV), which causes acquired immunodeficiency syndrome (AIDS) in humans. The large number of structures for the same protein and its variants includes a sample of different conformational states of the protein. A rich set of structures solved experimentally for the same protein has information buried within the dataset that can explain the functional dynamics and structural mechanism of the protein. To extract the dynamics information and functional mechanism from the experimental structures, this chapter focuses on two methods: Principal Component Analysis (PCA) and Elastic Network Models (ENM). PCA is a widely used statistical dimensionality reduction technique to classify and visualize high-dimensional data. ENMs, on the other hand, are a well-established, simple biophysical method for modeling the functionally important global motions of proteins. This chapter covers the basics of these two. Moreover, an improved ENM version that utilizes the variations found within a given set of structures for a protein is described. As a practical example, we have extracted the functional dynamics and mechanism of the HIV-1 protease dimeric structure by using a set of 329 PDB structures of this protein. We have described, step by step, how to select a set of protein structures, how to extract the needed information from the PDB files for PCA, how to extract the dynamics information using PCA, how to calculate ENM modes, how to measure the congruency between the dynamics computed from the principal components (PCs) and the ENM modes, and how to compute entropies using the PCs. We provide the computer programs or references to software tools to accomplish each step
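
    As an illustration of the starting point for an elastic network calculation, the sketch below builds the connectivity (Kirchhoff) matrix of a Gaussian network model, one common ENM variant. The abstract does not say which variant the chapter uses, so this choice, the 7.0 Å cutoff, the function name, and the toy coordinates are all assumptions.

```python
import math


def kirchhoff_matrix(coords, cutoff=7.0):
    """Build the Gaussian network model connectivity matrix.

    Off-diagonal entries are -1 for residue pairs whose representative
    atoms lie within `cutoff` (in Å) and 0 otherwise; each diagonal
    entry is the number of contacts of that residue.
    """
    n = len(coords)
    gamma = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1
                gamma[i][i] += 1
                gamma[j][j] += 1
    return gamma


# Hypothetical C-alpha trace: five atoms spaced 3.8 Å apart along a line.
ca_coords = [(3.8 * i, 0.0, 0.0) for i in range(5)]
gamma = kirchhoff_matrix(ca_coords, cutoff=7.0)
```

    The normal modes of the model are the eigenvectors of this matrix, with the low-frequency ones describing the most collective motions; in practice the eigendecomposition would be delegated to a numerical library.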

  3. GeneSilico protein structure prediction meta-server.

    PubMed

    Kurowski, Michal A; Bujnicki, Janusz M

    2003-07-01

    Rigorous assessments of protein structure prediction have demonstrated that fold recognition methods can identify remote similarities between proteins when standard sequence search methods fail. It has been shown that the accuracy of predictions is improved when refined multiple sequence alignments are used instead of single sequences and if different methods are combined to generate a consensus model. There are several meta-servers available that integrate protein structure predictions performed by various methods, but they do not allow for submission of user-defined multiple sequence alignments and they seldom offer confidentiality of the results. We developed a novel WWW gateway for protein structure prediction, which combines the useful features of other meta-servers available, but with much greater flexibility of the input. The user may submit an amino acid sequence or a multiple sequence alignment to a set of methods for primary, secondary and tertiary structure prediction. Fold-recognition results (target-template alignments) are converted into full-atom 3D models and the quality of these models is uniformly assessed. A consensus between different FR methods is also inferred. The results are conveniently presented on-line on a single web page over a secure, password-protected connection. The GeneSilico protein structure prediction meta-server is freely available for academic users at http://genesilico.pl/meta.

  4. Matt: local flexibility aids protein multiple structure alignment.

    PubMed

    Menke, Matthew; Berger, Bonnie; Cowen, Lenore

    2008-01-01

    Even when there is agreement on what measure a protein multiple structure alignment should be optimizing, finding the optimal alignment is computationally prohibitive. One approach used by many previous methods is aligned fragment pair chaining, where short structural fragments from all the proteins are aligned against each other optimally, and the final alignment chains these together in geometrically consistent ways. Ye and Godzik have recently suggested that adding geometric flexibility may help better model protein structures in a variety of contexts. We introduce the program Matt (Multiple Alignment with Translations and Twists), an aligned fragment pair chaining algorithm that, in intermediate steps, allows local flexibility between fragments: small translations and rotations are temporarily allowed to bring sets of aligned fragments closer, even if they are physically impossible under rigid body transformations. After a dynamic programming assembly guided by these "bent" alignments, geometric consistency is restored in the final step before the alignment is output. Matt is tested against other recent multiple protein structure alignment programs on the popular Homstrad and SABmark benchmark datasets. Matt's global performance is competitive with the other programs on Homstrad, and Matt outperforms them on SABmark, a benchmark of multiple structure alignments of proteins with more distant homology. On both datasets, Matt demonstrates an ability to better align the ends of alpha-helices and beta-strands, an important characteristic of any structure alignment program intended to help construct a structural template library for threading approaches to the inverse protein-folding problem. The related question of whether Matt alignments can be used to distinguish distantly homologous structure pairs from pairs of proteins that are not homologous is also considered. For this purpose, a p-value score based on the length of the common core and average root

  5. Protein Structure Determination from Pseudocontact Shifts Using ROSETTA

    PubMed Central

    Schmitz, Christophe; Vernon, Robert; Otting, Gottfried; Baker, David; Huber, Thomas

    2013-01-01

    Paramagnetic metal ions generate pseudocontact shifts (PCSs) in nuclear magnetic resonance spectra that are manifested as easily measurable changes in chemical shifts. Metals can be incorporated into proteins through metal binding tags, and PCS data constitute powerful long-range restraints on the positions of nuclear spins relative to the coordinate system of the magnetic susceptibility anisotropy tensor (Δχ-tensor) of the metal ion. We show that three-dimensional structures of proteins can reliably be determined using PCS data from a single metal binding site combined with backbone chemical shifts. The program PCS-ROSETTA automatically determines the Δχ-tensor and metal position from the PCS data during the structure calculations, without any prior knowledge of the protein structure. The program can determine structures accurately for proteins of up to 150 residues, offering a powerful new approach to protein structure determination that relies exclusively on readily measurable backbone chemical shifts and easily discriminates between correctly and incorrectly folded conformations. PMID:22285518

  6. Predicting protein-protein interactions on a proteome scale by matching evolutionary and structural similarities at interfaces using PRISM.

    PubMed

    Tuncbag, Nurcan; Gursoy, Attila; Nussinov, Ruth; Keskin, Ozlem

    2011-08-11

    Prediction of protein-protein interactions at the structural level on the proteome scale is important because it allows prediction of protein function, helps drug discovery and takes steps toward genome-wide structural systems biology. We provide a protocol (termed PRISM, protein interactions by structural matching) for large-scale prediction of protein-protein interactions and assembly of protein complex structures. The method consists of two components: rigid-body structural comparisons of target proteins to known template protein-protein interfaces and flexible refinement using a docking energy function. The PRISM rationale follows our observation that globally different protein structures can interact via similar architectural motifs. PRISM predicts binding residues by using structural similarity and evolutionary conservation of putative binding residue 'hot spots'. Ultimately, PRISM could help to construct cellular pathways and functional, proteome-scale annotation. PRISM is implemented in Python and runs in a UNIX environment. The program accepts Protein Data Bank-formatted protein structures and is available at http://prism.ccbb.ku.edu.tr/prism_protocol/.

  7. Method for protein structure alignment

    DOEpatents

    Blankenbecler, Richard; Ohlsson, Mattias; Peterson, Carsten; Ringner, Markus

    2005-02-22

    This invention provides a method for protein structure alignment. More particularly, the present invention provides a method for identification, classification and prediction of protein structures. The present invention involves two key ingredients. First, an energy or cost function formulation of the problem simultaneously in terms of binary (Potts) assignment variables and real-valued atomic coordinates. Second, a minimization of the energy or cost function by an iterative method, where in each iteration (1) a mean field method is employed for the assignment variables and (2) exact rotation and/or translation of atomic coordinates is performed, weighted with the corresponding assignment variables.

  8. Improved protein surface comparison and application to low-resolution protein structure data.

    PubMed

    Sael, Lee; Kihara, Daisuke

    2010-12-14

    Recent advancements of experimental techniques for determining protein tertiary structures raise significant challenges for protein bioinformatics. With the number of known structures of unknown function expanding at a rapid pace, an urgent task is to provide reliable clues to their biological function on a large scale. Conventional approaches for structure comparison are not suitable for a real-time database search due to their slow speed. Moreover, a new challenge has arisen from recent techniques such as electron microscopy (EM), which provide low-resolution structure data. Previously, we have introduced a method for protein surface shape representation using the 3D Zernike descriptors (3DZDs). The 3DZD enables fast structure database searches, taking advantage of its rotation invariance and compact representation. Search results for protein surfaces represented with the 3DZD have shown good agreement with the existing structure classifications, but some discrepancies were also observed. Three surface representations are examined: a new surface representation of backbone atoms, the originally devised all-atom surface representation, and the combination of the all-atom surface with the backbone representation. All representations are encoded with the 3DZD. Also, we have investigated the applicability of the 3DZD for searching protein EM density maps of varying resolutions. The surface representations are evaluated on structure retrieval using two existing classifications, SCOP and the CE-based classification. Overall, the 3DZDs representing backbone atoms show better retrieval performance than the original all-atom surface representation. Performance improved further when the two representations were combined. Moreover, we observed that the 3DZD is also powerful in comparing low-resolution structures obtained by electron microscopy.

  9. Visualizing and Clustering Protein Similarity Networks: Sequences, Structures, and Functions.

    PubMed

    Mai, Te-Lun; Hu, Geng-Ming; Chen, Chi-Ming

    2016-07-01

    Research in the recent decade has demonstrated the usefulness of protein network knowledge in furthering the study of molecular evolution of proteins, understanding the robustness of cells to perturbation, and annotating new protein functions. In this study, we aimed to provide a general clustering approach to visualize the sequence-structure-function relationship of protein networks, and investigate possible causes for inconsistency in the protein classifications based on sequences, structures, and functions. Such visualization of protein networks could facilitate our understanding of the overall relationship among proteins and help researchers comprehend various protein databases. As a demonstration, we clustered 1437 enzymes by their sequences and structures using the minimum span clustering (MSC) method. The general structure of this protein network was delineated at two clustering resolutions, and the second level MSC clustering was found to be highly similar to existing enzyme classifications. The clusterings of these enzymes based on sequence, structure, and function information are consistent with one another. For proteases, the Jaccard's similarity coefficient is 0.86 between sequence and function classifications, 0.82 between sequence and structure classifications, and 0.78 between structure and function classifications. From our clustering results, we discussed possible examples of divergent evolution and convergent evolution of enzymes. Our clustering approach provides a panoramic view of the sequence-structure-function network of proteins, helps visualize the relation between related proteins intuitively, and is useful in predicting the structure and function of newly determined protein sequences.
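
    The abstract above reports Jaccard similarity coefficients between classifications but does not say which variant was computed. As a rough illustration only, the sketch below uses the common pair-counting form: of all item pairs grouped together by at least one of two classifications, the fraction grouped together by both. The function name and the toy labels are hypothetical, not taken from the paper.

```python
from itertools import combinations


def jaccard_between_partitions(labels_a, labels_b):
    """Pair-counting Jaccard coefficient between two clusterings of the same items.

    labels_a[i] and labels_b[i] are the cluster labels of item i under the
    two classifications (for example, sequence-based and function-based).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have equal length")
    both = only_a = only_b = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1          # pair co-clustered in both classifications
        elif same_a:
            only_a += 1        # co-clustered only in the first
        elif same_b:
            only_b += 1        # co-clustered only in the second
    denom = both + only_a + only_b
    # If no pair is co-clustered anywhere, the two partitions agree trivially.
    return both / denom if denom else 1.0


# Toy labels for four hypothetical enzymes.
by_sequence = [0, 0, 1, 1]
by_function = [0, 0, 0, 1]
score = jaccard_between_partitions(by_sequence, by_function)
```

    A value of 1 means the two classifications group every pair identically; values near 0 mean they rarely agree on which items belong together.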

  10. Topological properties of complex networks in protein structures

    NASA Astrophysics Data System (ADS)

    Kim, Kyungsik; Jung, Jae-Won; Min, Seungsik

    2014-03-01

    We study topological properties of networks in structural classification of proteins. We model the native-state protein structure as a network made of its constituent amino-acids and their interactions. We treat four structural classes of proteins composed predominantly of α helices and β sheets and consider several proteins from each of these classes whose sizes range from amino acids of the Protein Data Bank. Particularly, we simulate and analyze the network metrics such as the mean degree, the probability distribution of degree, the clustering coefficient, the characteristic path length, the local efficiency, and the cost. This work was supported by the KMAR and DP under Grant WISE project (153-3100-3133-302-350).
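
    Three of the network metrics named above (mean degree, clustering coefficient, characteristic path length) can be sketched for a small residue-contact graph as follows. This is a minimal illustration, assuming the graph is already given as an undirected adjacency dictionary; how the authors define contacts between amino acids is not stated in the abstract, and the toy graph is hypothetical.

```python
from collections import deque


def mean_degree(adj):
    """Average number of neighbours per node."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def clustering_coefficient(adj):
    """Average local clustering coefficient; nodes of degree < 2 contribute 0."""
    total = 0.0
    for nbrs in adj.values():
        nbrs = list(nbrs)
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges among the neighbours of this node.
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nbrs[j] in adj[nbrs[i]]
        )
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def characteristic_path_length(adj):
    """Mean shortest-path length over all reachable ordered node pairs (BFS)."""
    total = pairs = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Toy contact graph: residues 0, 1, 2 form a triangle; residue 3 touches only 2.
contacts = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
```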

  11. 3D bioprinting of structural proteins.

    PubMed

    Włodarczyk-Biegun, Małgorzata K; Del Campo, Aránzazu

    2017-07-01

    3D bioprinting is a booming method to obtain scaffolds of different materials with predesigned and customized morphologies and geometries. In this review we focus on the experimental strategies and recent achievements in the bioprinting of major structural proteins (collagen, silk, fibrin), as a particularly interesting technology to reconstruct the biochemical and biophysical composition and hierarchical morphology of natural scaffolds. The flexibility in molecular design offered by structural proteins, combined with the flexibility in mixing, deposition, and mechanical processing inherent to bioprinting technologies, enables the fabrication of highly functional scaffolds and tissue mimics with a degree of complexity and organization which has only just started to be explored. Here we describe the printing parameters and physical (mechanical) properties of bioinks based on structural proteins, including the biological function of the printed scaffolds. We describe applied printing techniques and cross-linking methods, highlighting the modifications implemented to improve scaffold properties. The used cell types, cell viability, and possible construct applications are also reported. We envision that the application of printing technologies to structural proteins will enable unprecedented control over their supramolecular organization, conferring printed scaffolds biological properties and functions close to natural systems. Copyright © 2017 Elsevier Ltd. All rights reserved.

  12. Structural changes of malt proteins during boiling.

    PubMed

    Jin, Bei; Li, Lin; Liu, Guo-Qin; Li, Bing; Zhu, Yu-Kui; Liao, Liao-Ning

    2009-03-09

    Changes in the physicochemical properties and structure of proteins derived from two malt varieties (Baudin and Guangmai) during wort boiling were investigated by differential scanning calorimetry, SDS-PAGE, two-dimensional electrophoresis, gel filtration chromatography and circular dichroism spectroscopy. The results showed that both protein content and amino acid composition changed only slightly during boiling, and that boiling might cause a gradual unfolding of protein structures, as indicated by the decrease in surface hydrophobicity and free sulfhydryl content and enthalpy value, as well as reduced alpha-helix contents and markedly increased random coil contents. It was also found that the major component of both worts was a boiling-resistant protein with a molecular mass of 40 kDa, and that according to the two-dimensional electrophoresis and SE-HPLC analyses, a small amount of soluble aggregates might be formed via hydrophobic interactions. It was thus concluded that changes of protein structure caused by boiling that might influence beer quality are largely independent of malt variety.

  13. Conservation of protein structure over four billion years.

    PubMed

    Ingles-Prieto, Alvaro; Ibarra-Molero, Beatriz; Delgado-Delgado, Asuncion; Perez-Jimenez, Raul; Fernandez, Julio M; Gaucher, Eric A; Sanchez-Ruiz, Jose M; Gavira, Jose A

    2013-09-03

    Little is known about the evolution of protein structures and the degree of protein structure conservation over planetary time scales. Here, we report the X-ray crystal structures of seven laboratory resurrections of Precambrian thioredoxins dating up to approximately four billion years ago. Despite considerable sequence differences compared with extant enzymes, the ancestral proteins display the canonical thioredoxin fold, whereas only small structural changes have occurred over four billion years. This remarkable degree of structure conservation since a time near the last common ancestor of life supports a punctuated-equilibrium model of structure evolution in which the generation of new folds occurs over comparatively short periods and is followed by long periods of structural stasis. Copyright © 2013 Elsevier Ltd. All rights reserved.

  14. A hidden Markov model derived structural alphabet for proteins.

    PubMed

    Camproux, A C; Gautier, R; Tufféry, P

    2004-06-04

    Understanding and predicting protein structures depends on the complexity and the accuracy of the models used to represent them. We have set up a hidden Markov model that discretizes protein backbone conformation as a series of overlapping fragments (states) of four residues in length. This approach learns simultaneously the geometry of the states and their connections. We obtain, using a statistical criterion, an optimal systematic decomposition of the conformational variability of the protein peptidic chain in 27 states with strong connection logic. This result is stable over different protein sets. Our model fits well the previous knowledge related to protein architecture organisation and seems able to grab some subtle details of protein organisation, such as helix sub-level organisation schemes. Taking into account the dependence between the states results in a description of local protein structure of low complexity. On average, the model makes use of only 8.3 states among 27 to describe each position of a protein structure. Although we use short fragments, the learning process on entire protein conformations captures the logic of the assembly on a larger scale. Using such a model, the structure of proteins can be reconstructed with an average accuracy close to 1.1 Å root-mean-square deviation and for a complexity of only 3. Finally, we also observe that sequence specificity increases with the number of states of the structural alphabet. Such models can constitute a very relevant approach to the analysis of protein architecture in particular for protein structure prediction.

  15. 3D Complex: A Structural Classification of Protein Complexes

    PubMed Central

    Levy, Emmanuel D; Pereira-Leal, Jose B; Chothia, Cyrus; Teichmann, Sarah A

    2006-01-01

    Most of the proteins in a cell assemble into complexes to carry out their function. It is therefore crucial to understand the physicochemical properties as well as the evolution of interactions between proteins. The Protein Data Bank represents an important source of information for such studies, because more than half of the structures are homo- or heteromeric protein complexes. Here we propose the first hierarchical classification of whole protein complexes of known 3-D structure, based on representing their fundamental structural features as a graph. This classification provides the first overview of all the complexes in the Protein Data Bank and allows nonredundant sets to be derived at different levels of detail. This reveals that between one-half and two-thirds of known structures are multimeric, depending on the level of redundancy accepted. We also analyse the structures in terms of the topological arrangement of their subunits and find that they form a small number of arrangements compared with all theoretically possible ones. This is because most complexes contain four subunits or less, and the large majority are homomeric. In addition, there is a strong tendency for symmetry in complexes, even for heteromeric complexes. Finally, through comparison of Biological Units in the Protein Data Bank with the Protein Quaternary Structure database, we identified many possible errors in quaternary structure assignments. Our classification, available as a database and Web server at http://www.3Dcomplex.org, will be a starting point for future work aimed at understanding the structure and evolution of protein complexes. PMID:17112313

  16. Rebelling for a Reason: Protein Structural “Outliers”

    PubMed Central

    Arumugam, Gandhimathi; Nair, Anu G.; Hariharaputran, Sridhar; Ramanathan, Sowdhamini

    2013-01-01

    Analysis of structural variation in domain superfamilies can reveal constraints in protein evolution which aids protein structure prediction and classification. Structure-based sequence alignment of distantly related proteins, organized in PASS2 database, provides clues about structurally conserved regions among different functional families. Some superfamily members show large structural differences which are functionally relevant. This paper analyses the impact of structural divergence on function for multi-member superfamilies, selected from the PASS2 superfamily alignment database. Functional annotations within superfamilies, with structural outliers or ‘rebels’, are discussed in the context of structural variations. Overall, these data reinforce the idea that functional similarities cannot be extrapolated from mere structural conservation. The implication for fold-function prediction is that the functional annotations can only be inherited with very careful consideration, especially at low sequence identities. PMID:24073209

  17. HARMONY: a server for the assessment of protein structures

    PubMed Central

    Pugalenthi, G.; Shameer, K.; Srinivasan, N.; Sowdhamini, R.

    2006-01-01

    Protein structure validation is an important step in computational modeling and structure determination. Stereochemical assessment of protein structures examines internal parameters such as bond lengths and Ramachandran (φ,ψ) angles. Gross structure prediction methods such as inverse folding procedure and structure determination especially at low resolution can sometimes give rise to models that are incorrect due to assignment of misfolds or mistracing of electron density maps. Such errors are not reflected as strain in internal parameters. HARMONY is a procedure that examines the compatibility between the sequence and the structure of a protein by assigning scores to individual residues and their amino acid exchange patterns after considering their local environments. Local environments are described by the backbone conformation, solvent accessibility and hydrogen bonding patterns. We are now providing HARMONY through a web server such that users can submit their protein structure files and, if required, the alignment of homologous sequences. Scores are mapped on the structure for subsequent examination that is useful to also recognize regions of possible local errors in protein structures. HARMONY server is located at PMID:16844999

  18. The Use of Experimental Structures to Model Protein Dynamics

    PubMed Central

    Katebi, Ataur R.; Sankar, Kannan; Jia, Kejue; Jernigan, Robert L.

    2014-01-01

    Summary The number of solved protein structures submitted to the Protein Data Bank (PDB) has increased dramatically in recent years. For some specific proteins, this number is very high – for example, there are over 550 solved structures for HIV-1 protease, one protein that is essential for the life cycle of human immunodeficiency virus (HIV) which causes acquired immunodeficiency syndrome (AIDS) in humans. The large number of structures for the same protein and its variants include a sample of different conformational states of the protein. A rich set of structures solved experimentally for the same protein has information buried within the dataset that can explain the functional dynamics and structural mechanism of the protein. To extract the dynamics information and functional mechanism from the experimental structures, this chapter focuses on two methods – Principal Component Analysis (PCA) and Elastic Network Models (ENM). PCA is a widely used statistical dimensionality reduction technique to classify and visualize high-dimensional data. On the other hand, ENMs are a well-established, simple biophysical method for modeling the functionally important global motions of proteins. This chapter covers the basics of these two. Moreover, an improved ENM version that utilizes the variations found within a given set of structures for a protein is described. As a practical example, we have extracted the functional dynamics and mechanism of HIV-1 protease dimeric structure by using a set of 329 PDB structures of this protein. We have described, step by step, how to select a set of protein structures, how to extract the needed information from the PDB files for PCA, how to extract the dynamics information using PCA, how to calculate ENM modes, how to measure the congruency between the dynamics computed from the principal components (PCs) and the ENM modes, and how to compute entropies using the PCs. We provide the computer programs or references to software tools to

  19. Website on Protein Interaction and Protein Structure Related Work

    NASA Technical Reports Server (NTRS)

    Samanta, Manoj; Liang, Shoudan; Biegel, Bryan (Technical Monitor)

    2003-01-01

    In today's world, three seemingly diverse fields - computer information technology, nanotechnology and biotechnology - are joining forces to enlarge our scientific knowledge and solve complex technological problems. Our group is dedicated to conducting theoretical research exploring the challenges in this area. The major areas of research include: 1) Yeast Protein Interactions; 2) Protein Structures; and 3) Current Transport through Small Molecules.

  20. GeneSilico protein structure prediction meta-server

    PubMed Central

    Kurowski, Michal A.; Bujnicki, Janusz M.

    2003-01-01

    Rigorous assessments of protein structure prediction have demonstrated that fold recognition methods can identify remote similarities between proteins when standard sequence search methods fail. It has been shown that the accuracy of predictions is improved when refined multiple sequence alignments are used instead of single sequences and if different methods are combined to generate a consensus model. There are several meta-servers available that integrate protein structure predictions performed by various methods, but they do not allow for submission of user-defined multiple sequence alignments and they seldom offer confidentiality of the results. We developed a novel WWW gateway for protein structure prediction, which combines the useful features of other meta-servers available, but with much greater flexibility of the input. The user may submit an amino acid sequence or a multiple sequence alignment to a set of methods for primary, secondary and tertiary structure prediction. Fold-recognition results (target-template alignments) are converted into full-atom 3D models and the quality of these models is uniformly assessed. A consensus between different FR methods is also inferred. The results are conveniently presented on-line on a single web page over a secure, password-protected connection. The GeneSilico protein structure prediction meta-server is freely available for academic users at http://genesilico.pl/meta. PMID:12824313

  1. Protein domain assignment from the recurrence of locally similar structures

    PubMed Central

    Tai, Chin-Hsien; Sam, Vichetra; Gibrat, Jean-Francois; Garnier, Jean; Munson, Peter J.

    2010-01-01

    Domains are basic units of protein structure and essential for exploring protein fold space and structure evolution. With the structural genomics initiative, the number of protein structures in the Protein Databank (PDB) is increasing dramatically and domain assignments need to be done automatically. Most existing structural domain assignment programs define domains using the compactness of the domains and/or the number and strength of intra-domain versus inter-domain contacts. Here we present a different approach based on the recurrence of locally similar structural pieces (LSSPs) found by one-against-all structure comparisons with a dataset of 6,373 protein chains from the PDB. Residues of the query protein are clustered using LSSPs via three different procedures to define domains. This approach gives results that are comparable to several existing programs that use geometrical and other structural information explicitly. Remarkably, most of the proteins that contribute the LSSPs defining a domain do not themselves contain the domain of interest. This study shows that domains can be defined by a collection of relatively small locally similar structural pieces containing, on average, four secondary structure elements. In addition, it indicates that domains are indeed made of recurrent small structural pieces that are used to build protein structures of many different folds as suggested by recent studies. PMID:21287617

  2. Fragger: a protein fragment picker for structural queries.

    PubMed

    Berenger, Francois; Simoncini, David; Voet, Arnout; Shrestha, Rojan; Zhang, Kam Y J

    2017-01-01

    Protein modeling and design activities often require querying the Protein Data Bank (PDB) with a structural fragment, possibly containing gaps. For some applications, it is preferable to work on a specific subset of the PDB or with unpublished structures. These requirements, along with specific user needs, motivated the creation of a new software to manage and query 3D protein fragments. Fragger is a protein fragment picker that allows protein fragment databases to be created and queried. All fragment lengths are supported and any set of PDB files can be used to create a database. Fragger can efficiently search a fragment database with a query fragment and a distance threshold. Matching fragments are ranked by distance to the query. The query fragment can have structural gaps and the allowed amino acid sequences matching a query can be constrained via a regular expression of one-letter amino acid codes. Fragger also incorporates a tool to compute the backbone RMSD of one versus many fragments in high throughput. Fragger should be useful for protein design, loop grafting and related structural bioinformatics tasks.
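
    The kind of query described above (a distance threshold, ranking by distance, and a regular expression over one-letter amino acid codes) can be sketched as follows. This is not Fragger's code or interface: the function names, the tuple layout of the database, and the use of a plain coordinate RMSD without superposition are all assumptions made for illustration.

```python
import math
import re


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of 3D points.

    No superposition is performed; the fragments are assumed to be
    pre-aligned, which a real fragment picker may not assume.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("fragments must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2
        for pa, pb in zip(coords_a, coords_b)
        for xa, xb in zip(pa, pb)
    )
    return math.sqrt(squared / len(coords_a))


def search_fragments(query, database, threshold, seq_pattern=None):
    """Return (distance, name) hits within threshold, closest first.

    database is a list of (name, one_letter_sequence, coords) tuples;
    seq_pattern, if given, must match the whole fragment sequence.
    """
    pattern = re.compile(seq_pattern) if seq_pattern else None
    hits = []
    for name, sequence, coords in database:
        if len(coords) != len(query):
            continue
        if pattern is not None and not pattern.fullmatch(sequence):
            continue
        distance = rmsd(query, coords)
        if distance <= threshold:
            hits.append((distance, name))
    return sorted(hits)


# Hypothetical two-residue fragments (one point per residue).
query = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
database = [
    ("frag_a", "GA", [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]),
    ("frag_b", "GP", [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]),
    ("frag_c", "AA", [(0.0, 0.0, 0.5), (1.0, 0.0, 0.5)]),
]
hits = search_fragments(query, database, threshold=2.0, seq_pattern="G[AP]")
```

    Here the pattern `G[AP]` keeps only fragments whose first residue is glycine and whose second is alanine or proline, so `frag_c` is excluded despite being geometrically close.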

  3. Meeting Report: Structural Determination of Environmentally Responsive Proteins

    PubMed Central

    Reinlib, Leslie

    2005-01-01

    The three-dimensional structure of gene products continues to be a missing lynchpin between linear genome sequences and our understanding of the normal and abnormal function of proteins and pathways. Enhanced activity in this area is likely to lead to better understanding of how discrete changes in molecular patterns and conformation underlie functional changes in protein complexes and, with it, sensitivity of an individual to an exposure. The National Institute of Environmental Health Sciences convened a workshop of experts in structural determination and environmental health to solicit advice for future research in structural resolution relative to environmentally responsive proteins and pathways. The highest priorities recommended by the workshop were to support studies of structure, analysis, control, and design of conformational and functional states at molecular resolution for environmentally responsive molecules and complexes; promote understanding of dynamics, kinetics, and ligand responses; investigate the mechanisms and steps in posttranslational modifications, protein partnering, impact of genetic polymorphisms on structure/function, and ligand interactions; and encourage integrated experimental and computational approaches. The workshop participants also saw value in improving the throughput and purity of protein samples and macromolecular assemblies; developing optimal processes for design, production, and assembly of macromolecular complexes; encouraging studies on protein–protein and macromolecular interactions; and examining assemblies of individual proteins and their functions in pathways of interest for environmental health. PMID:16263521

  4. Protein structure-structure alignment with discrete Fréchet distance.

    PubMed

    Jiang, Minghui; Xu, Ying; Zhu, Binhai

    2008-02-01

    Matching two geometric objects in two-dimensional (2D) and three-dimensional (3D) spaces is a central problem in computer vision, pattern recognition, and protein structure prediction. In particular, the problem of aligning two polygonal chains under translation and rotation to minimize their distance has been studied using various distance measures. It is well known that the Hausdorff distance is useful for matching two point sets, and that the Fréchet distance is a superior measure for matching two polygonal chains. The discrete Fréchet distance closely approximates the (continuous) Fréchet distance, and is a natural measure for the geometric similarity of the folded 3D structures of biomolecules such as proteins. In this paper, we present new algorithms for matching two polygonal chains in two dimensions to minimize their discrete Fréchet distance under translation and rotation, and an effective heuristic for matching two polygonal chains in three dimensions. We also describe our empirical results on the application of the discrete Fréchet distance to protein structure-structure alignment.
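
    For readers unfamiliar with the measure, the discrete Fréchet distance between two fixed polygonal chains can be computed with the standard dynamic-programming recurrence, sketched below. This only evaluates the distance for chains in a given placement; the minimization over translation and rotation that the paper addresses is not attempted here, and the function name and example coordinates are illustrative.

```python
import math


def discrete_frechet(chain_p, chain_q):
    """Discrete Fréchet distance between two sequences of points.

    table[i][j] holds the distance for the prefixes chain_p[:i+1] and
    chain_q[:j+1]; each step may advance along either chain or both.
    """
    n, m = len(chain_p), len(chain_q)
    if n == 0 or m == 0:
        raise ValueError("both chains must be non-empty")
    table = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(chain_p[i], chain_q[j])
            if i == 0 and j == 0:
                table[i][j] = d
            elif i == 0:
                table[i][j] = max(table[0][j - 1], d)
            elif j == 0:
                table[i][j] = max(table[i - 1][0], d)
            else:
                best_previous = min(
                    table[i - 1][j], table[i][j - 1], table[i - 1][j - 1]
                )
                table[i][j] = max(best_previous, d)
    return table[n - 1][m - 1]


# Two parallel three-point chains, one unit apart.
chain_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
chain_b = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (2.0, 1.0, 0.0)]
distance = discrete_frechet(chain_a, chain_b)
```

    The table costs O(nm) time and space for chains of n and m points; for protein backbones the points would typically be one representative atom per residue, which is an assumption here rather than a detail given in the abstract.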

  5. Amino Acid Distribution Rules Predict Protein Fold: Protein Grammar for Beta-Strand Sandwich-Like Structures

    PubMed Central

    Kister, Alexander

    2015-01-01

    We present an alternative approach to protein 3D folding prediction based on determination of rules that specify distribution of “favorable” residues, that are mainly responsible for a given fold formation, and “unfavorable” residues, that are incompatible with that fold, in polypeptide sequences. The process of determining favorable and unfavorable residues is iterative. The starting assumptions are based on the general principles of protein structure formation as well as structural features peculiar to a protein fold under investigation. The initial assumptions are tested one-by-one for a set of all known proteins with a given structure. The assumption is accepted as a “rule of amino acid distribution” for the protein fold if it holds true for all, or nearly all, structures. If the assumption is not accepted as a rule, it can be modified to better fit the data and then tested again in the next step of the iterative search algorithm, or rejected. We determined the set of amino acid distribution rules for a large group of beta sandwich-like proteins characterized by a specific arrangement of strands in two beta sheets. It was shown that this set of rules is highly sensitive (~90%) and very specific (~99%) for identifying sequences of proteins with specified beta sandwich fold structure. The advantage of the proposed approach is that it does not require that query proteins have a high degree of homology to proteins with known structure. So long as the query protein satisfies residue distribution rules, it can be confidently assigned to its respective protein fold. Another advantage of our approach is that it allows for a better understanding of which residues play an essential role in protein fold formation. It may, therefore, facilitate rational protein engineering design. PMID:25625198
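
    Sensitivity and specificity figures such as those quoted above are computed from how a rule set classifies sequences of known fold. The sketch below shows the arithmetic only; the boolean lists are made-up toy data, not results from the paper, and the function name is hypothetical.

```python
def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) for paired boolean calls.

    predicted[i] is True when the rule set assigns sequence i to the fold;
    actual[i] is True when sequence i really has that fold.
    """
    if len(predicted) != len(actual):
        raise ValueError("inputs must have equal length")
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    # Sensitivity: fraction of true fold members that the rules recover.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: fraction of non-members that the rules correctly reject.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy data: five sequences, two of which truly have the fold.
rule_calls = [True, True, False, False, True]
true_fold = [True, False, False, False, True]
sens, spec = sensitivity_specificity(rule_calls, true_fold)
```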

  6. Visualisation and graph-theoretic analysis of a large-scale protein structural interactome

    PubMed Central

    Bolser, Dan; Dafas, Panos; Harrington, Richard; Park, Jong; Schroeder, Michael

    2003-01-01

    Background Large-scale protein interaction maps provide a new, global perspective with which to analyse protein function. PSIMAP, the Protein Structural Interactome Map, is a database of all the structurally observed interactions between superfamilies of protein domains with known three-dimensional structure in the PDB. PSIMAP incorporates both functional and evolutionary information into a single network. Results We present a global analysis of PSIMAP using several distinct network measures relating to centrality, interactivity, fault-tolerance, and taxonomic diversity. We found the following results: Centrality: we show that the center and barycenter of PSIMAP do not coincide, and that the superfamilies forming the barycenter relate to very general functions, while those constituting the center relate to enzymatic activity. Interactivity: we identify the P-loop and immunoglobulin superfamilies as the most highly interactive. We successfully use connectivity and cluster index, which characterise the connectivity of a superfamily's neighbourhood, to discover superfamilies of complex I and II. This is particularly significant as the structure of complex I is not yet solved. Taxonomic diversity: we found that highly interactive superfamilies are in general taxonomically very diverse and are thus amongst the oldest. Fault-tolerance: we found that the network is very robust because, for the majority of superfamilies, removal from the network will not break it up. Conclusions Overall, we can single out the P-loop containing nucleotide triphosphate hydrolases superfamily as it is the most highly connected and has the highest taxonomic diversity. In addition, this superfamily has the highest interaction rank, is the barycenter of the network (it has the shortest average path to every other superfamily in the network), and is an articulation vertex, whose removal will disconnect the network. More generally, we conclude that the graph-theoretic and taxonomic analysis of

  7. ProTSAV: A protein tertiary structure analysis and validation server.

    PubMed

    Singh, Ankita; Kaushik, Rahul; Mishra, Avinash; Shanker, Asheesh; Jayaram, B

    2016-01-01

    Quality assessment of predicted model structures of proteins is as important as the protein tertiary structure prediction. A highly efficient quality assessment of predicted model structures directs further research on function. Here we present a new server ProTSAV, capable of evaluating predicted model structures based on some popular online servers and standalone tools. ProTSAV furnishes the user with a single quality score in case of individual protein structure along with a graphical representation and ranking in case of multiple protein structure assessment. The server is validated on ~64,446 protein structures including experimental structures from RCSB and predicted model structures for CASP targets and from public decoy sets. ProTSAV succeeds in predicting quality of protein structures with a specificity of 100% and a sensitivity of 98% on experimentally solved structures and achieves a specificity of 88% and a sensitivity of 91% on predicted protein structures of CASP11 targets under 2 Å. The server overcomes the limitations of any single server/method and is seen to be robust in helping in quality assessment. ProTSAV is freely available at http://www.scfbio-iitd.res.in/software/proteomics/protsav.jsp. Copyright © 2015 Elsevier B.V. All rights reserved.

  8. Bhageerath-H: A homology/ab initio hybrid server for predicting tertiary structures of monomeric soluble proteins

    PubMed Central

    2014-01-01

    Background The advent of the human genome sequencing project has led to a spurt in the number of protein sequences in the databanks. Success of structure based drug discovery severely hinges on the availability of structures. Despite significant progress in the area of experimental protein structure determination, the sequence-structure gap is continually widening. Data driven homology based computational methods have proved successful in predicting tertiary structures for sequences sharing medium to high sequence similarities. With dwindling similarities of query sequences, advanced homology/ab initio hybrid approaches are being explored to solve the structure prediction problem. Here we describe Bhageerath-H, a homology/ab initio hybrid software/server for predicting protein tertiary structures with advancing drug design attempts as one of the goals. Results Bhageerath-H web-server was validated on 75 CASP10 targets which showed TM-scores ≥0.5 in 91% of the cases and Cα RMSDs ≤5Å from the native in 58% of the targets, which is well above the CASP10 water mark. Comparison with some leading servers demonstrated the uniqueness of the hybrid methodology in effectively sampling conformational space, scoring best decoys and refining low resolution models to high and medium resolution. Conclusion Bhageerath-H methodology is web enabled for the scientific community as a freely accessible web server. The methodology is fielded in the on-going CASP11 experiment. PMID:25521245

  9. Sequence-similar, structure-dissimilar protein pairs in the PDB.

    PubMed

    Kosloff, Mickey; Kolodny, Rachel

    2008-05-01

    It is often assumed that in the Protein Data Bank (PDB), two proteins with similar sequences will also have similar structures. Accordingly, it has proved useful to develop subsets of the PDB from which "redundant" structures have been removed, based on a sequence-based criterion for similarity. Similarly, when predicting protein structure using homology modeling, if a template structure for modeling a target sequence is selected by sequence alone, this implicitly assumes that all sequence-similar templates are equivalent. Here, we show that this assumption is often not correct and that standard approaches to create subsets of the PDB can lead to the loss of structurally and functionally important information. We have carried out sequence-based structural superpositions and geometry-based structural alignments of a large number of protein pairs to determine the extent to which sequence similarity ensures structural similarity. We find many examples where two proteins that are similar in sequence have structures that differ significantly from one another. The source of the structural differences usually has a functional basis. The number of such protein pairs that are identified and the magnitude of the dissimilarity depend on the approach that is used to calculate the differences; in particular sequence-based structure superpositioning will identify a larger number of structurally dissimilar pairs than geometry-based structural alignments. When two sequences can be aligned in a statistically meaningful way, sequence-based structural superpositioning provides a meaningful measure of structural differences. This approach and geometry-based structure alignments reveal somewhat different information and one or the other might be preferable in a given application. Our results suggest that in some cases, notably homology modeling, the common use of nonredundant datasets, culled from the PDB based on sequence, may mask important structural and functional information.

  10. Constraint Logic Programming approach to protein structure prediction.

    PubMed

    Dal Palù, Alessandro; Dovier, Agostino; Fogolari, Federico

    2004-11-30

    The protein structure prediction problem is one of the most challenging problems in biological sciences. Many approaches have been proposed using database information and/or simplified protein models. The protein structure prediction problem can be cast in the form of an optimization problem. Notwithstanding its importance, the problem has very seldom been tackled by Constraint Logic Programming, a declarative programming paradigm suitable for solving combinatorial optimization problems. Constraint Logic Programming techniques have been applied to the protein structure prediction problem on the face-centered cube lattice model. Molecular dynamics techniques, endowed with the notion of constraint, have also been exploited. Even using a very simplified model, Constraint Logic Programming on the face-centered cube lattice model allowed us to obtain acceptable results for a few small proteins. As a test implementation their (known) secondary structure and the presence of disulfide bridges are used as constraints. Simplified structures obtained in this way have been converted to all atom models with plausible structure. Results have been compared with a similar approach using a well-established technique such as molecular dynamics. The results obtained on small proteins show that Constraint Logic Programming techniques can be employed for studying simplified protein models, which can be converted into realistic all atom models. The advantage of Constraint Logic Programming over other, much more explored, methodologies, resides in the rapid software prototyping, in the easy way of encoding heuristics, and in exploiting all the advances made in this research area, e.g. in constraint propagation and its use for pruning the huge search space.
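
    To make the lattice model concrete, the sketch below checks the two basic geometric constraints of a chain on a face-centered cubic lattice: consecutive residues sit on neighboring sites, and no site is occupied twice. It is plain Python rather than Constraint Logic Programming, the names are hypothetical, and it is not the authors' encoding; secondary-structure and disulfide constraints are left out.

```python
from itertools import product

# The 12 nearest-neighbour steps on an FCC lattice with integer coordinates:
# exactly two coordinates change by +/-1 and the third stays fixed.
FCC_STEPS = {
    v for v in product((-1, 0, 1), repeat=3)
    if sum(abs(c) for c in v) == 2
}


def is_valid_fcc_conformation(path):
    """True if `path` (a list of integer (x, y, z) sites, one per residue)
    is self-avoiding and every consecutive pair is an FCC neighbour pair."""
    if len(set(path)) != len(path):  # a site is occupied twice
        return False
    return all(
        tuple(b[i] - a[i] for i in range(3)) in FCC_STEPS
        for a, b in zip(path, path[1:])
    )


# Toy four-residue chain (made up for illustration).
chain = [(0, 0, 0), (1, 1, 0), (2, 1, 1), (2, 2, 2)]
print(len(FCC_STEPS), is_valid_fcc_conformation(chain))
```

    A constraint solver would state these same conditions declaratively and let propagation prune the search space, which is the advantage the abstract describes.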

  11. Protein structure analysis of mutations causing inheritable diseases. An e-Science approach with life scientist friendly interfaces.

    PubMed

    Venselaar, Hanka; Te Beek, Tim A H; Kuipers, Remko K P; Hekkelman, Maarten L; Vriend, Gert

    2010-11-08

    Many newly detected point mutations are located in protein-coding regions of the human genome. Knowledge of their effects on the protein's 3D structure provides insight into the protein's mechanism, can aid the design of further experiments, and eventually can lead to the development of new medicines and diagnostic tools. In this article we describe HOPE, a fully automatic program that analyzes the structural and functional effects of point mutations. HOPE collects information from a wide range of information sources including calculations on the 3D coordinates of the protein by using WHAT IF Web services, sequence annotations from the UniProt database, and predictions by DAS services. Homology models are built with YASARA. Data is stored in a database and used in a decision scheme to identify the effects of a mutation on the protein's 3D structure and function. HOPE builds a report with text, figures, and animations that is easy to use and understandable for (bio)medical researchers. We tested HOPE by comparing its output to the results of manually performed projects. In all straightforward cases HOPE performed similarly to a trained bioinformatician. The use of 3D structures helps optimize the results in terms of reliability and details. HOPE's results are easy to understand and are presented in a way that is attractive for researchers without an extensive bioinformatics background.

  12. Structural hot spots for the solubility of globular proteins

    PubMed Central

    Ganesan, Ashok; Siekierska, Aleksandra; Beerten, Jacinte; Brams, Marijke; Van Durme, Joost; De Baets, Greet; Van der Kant, Rob; Gallardo, Rodrigo; Ramakers, Meine; Langenberg, Tobias; Wilkinson, Hannah; De Smet, Frederik; Ulens, Chris; Rousseau, Frederic; Schymkowitz, Joost

    2016-01-01

    Natural selection shapes protein solubility to physiological requirements and recombinant applications that require higher protein concentrations are often problematic. This raises the question whether the solubility of natural protein sequences can be improved. We here show an anti-correlation between the number of aggregation prone regions (APRs) in a protein sequence and its solubility, suggesting that mutational suppression of APRs provides a simple strategy to increase protein solubility. We show that mutations at specific positions within a protein structure can act as APR suppressors without affecting protein stability. These hot spots for protein solubility are both structure and sequence dependent but can be computationally predicted. We demonstrate this by reducing the aggregation of human α-galactosidase and protective antigen of Bacillus anthracis through mutation. Our results indicate that many proteins possess hot spots allowing to adapt protein solubility independently of structure and function. PMID:26905391

  13. Chromophore Structure of Photochromic Fluorescent Protein Dronpa: Acid-Base Equilibrium of Two Cis Configurations.

    PubMed

    Higashino, Asuka; Mizuno, Misao; Mizutani, Yasuhisa

    2016-04-07

    Dronpa is a novel photochromic fluorescent protein that exhibits fast response to light. The present article is the first report of the resonance and preresonance Raman spectra of Dronpa. We used the intensity and frequency of Raman bands to determine the structure of the Dronpa chromophore in two thermally stable photochromic states. The acid-base equilibrium in one photochromic state was observed by spectroscopic pH titration. The Raman spectra revealed that the chromophore in this state shows a protonation/deprotonation transition with a pKa of 5.2 ± 0.3 and maintains the cis configuration. The observed resonance Raman bands showed that the other photochromic state of the chromophore is in a trans configuration. The results demonstrate that Raman bands selectively enhanced for the chromophore yield valuable information on the molecular structure of the chromophore in photochromic fluorescent proteins after careful elimination of the fluorescence background.

  14. Tertiary alphabet for the observable protein structural universe.

    PubMed

    Mackenzie, Craig O; Zhou, Jianfu; Grigoryan, Gevorg

    2016-11-22

    Here, we systematically decompose the known protein structural universe into its basic elements, which we dub tertiary structural motifs (TERMs). A TERM is a compact backbone fragment that captures the secondary, tertiary, and quaternary environments around a given residue, comprising one or more disjoint segments (three on average). We seek the set of universal TERMs that capture all structure in the Protein Data Bank (PDB), finding remarkable degeneracy. Only ∼600 TERMs are sufficient to describe 50% of the PDB at sub-Angstrom resolution. However, more rare geometries also exist, and the overall structural coverage grows logarithmically with the number of TERMs. We go on to show that universal TERMs provide an effective mapping between sequence and structure. We demonstrate that TERM-based statistics alone are sufficient to recapitulate close-to-native sequences given either NMR or X-ray backbones. Furthermore, sequence variability predicted from TERM data agrees closely with evolutionary variation. Finally, locations of TERMs in protein chains can be predicted from sequence alone based on sequence signatures emergent from TERM instances in the PDB. For multisegment motifs, this method identifies spatially adjacent fragments that are not contiguous in sequence, a major bottleneck in structure prediction. Although all TERMs recur in diverse proteins, some appear specialized for certain functions, such as interface formation, metal coordination, or even water binding. Structural biology has benefited greatly from previously observed degeneracies in structure. The decomposition of the known structural universe into a finite set of compact TERMs offers exciting opportunities toward better understanding, design, and prediction of protein structure.

  15. Tertiary alphabet for the observable protein structural universe

    PubMed Central

    Mackenzie, Craig O.; Zhou, Jianfu; Grigoryan, Gevorg

    2016-01-01

    Here, we systematically decompose the known protein structural universe into its basic elements, which we dub tertiary structural motifs (TERMs). A TERM is a compact backbone fragment that captures the secondary, tertiary, and quaternary environments around a given residue, comprising one or more disjoint segments (three on average). We seek the set of universal TERMs that capture all structure in the Protein Data Bank (PDB), finding remarkable degeneracy. Only ∼600 TERMs are sufficient to describe 50% of the PDB at sub-Angstrom resolution. However, more rare geometries also exist, and the overall structural coverage grows logarithmically with the number of TERMs. We go on to show that universal TERMs provide an effective mapping between sequence and structure. We demonstrate that TERM-based statistics alone are sufficient to recapitulate close-to-native sequences given either NMR or X-ray backbones. Furthermore, sequence variability predicted from TERM data agrees closely with evolutionary variation. Finally, locations of TERMs in protein chains can be predicted from sequence alone based on sequence signatures emergent from TERM instances in the PDB. For multisegment motifs, this method identifies spatially adjacent fragments that are not contiguous in sequence—a major bottleneck in structure prediction. Although all TERMs recur in diverse proteins, some appear specialized for certain functions, such as interface formation, metal coordination, or even water binding. Structural biology has benefited greatly from previously observed degeneracies in structure. The decomposition of the known structural universe into a finite set of compact TERMs offers exciting opportunities toward better understanding, design, and prediction of protein structure. PMID:27810958

  16. A Template-Based Protein Structure Reconstruction Method Using Deep Autoencoder Learning.

    PubMed

    Li, Haiou; Lyu, Qiang; Cheng, Jianlin

    2016-12-01

    Protein structure prediction is an important problem in computational biology, and is widely applied to various biomedical problems such as protein function study, protein design, and drug design. In this work, we developed a novel deep learning approach based on a deeply stacked denoising autoencoder for protein structure reconstruction. We applied our approach to a template-based protein structure prediction using only the 3D structural coordinates of homologous template proteins as input. The templates were identified for a target protein by a PSI-BLAST search. 3DRobot (a program that automatically generates diverse and well-packed protein structure decoys) was used to generate initial decoy models for the target from the templates. A stacked denoising autoencoder was trained on the decoys to obtain a deep learning model for the target protein. The trained deep model was then used to reconstruct the final structural model for the target sequence. With target proteins that have highly similar template proteins as benchmarks, the GDT-TS score of the predicted structures is greater than 0.7, suggesting that the deep autoencoder is a promising method for protein structure reconstruction.
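
    The GDT-TS score reported above is conventionally the average, over distance cutoffs of 1, 2, 4, and 8 Å, of the fraction of Cα atoms within the cutoff of their native positions. The sketch below is a simplification under stated assumptions: it takes per-residue distances from a single fixed superposition, whereas the full GDT procedure searches over many superpositions to maximize each fraction. The function name and the distances are hypothetical.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT-TS on a 0-1 scale from per-residue C-alpha distances
    (Angstrom) between a model and the native structure."""
    if not distances:
        raise ValueError("need at least one residue distance")
    n = len(distances)
    # Fraction of residues within each cutoff, then the mean over cutoffs.
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return sum(fractions) / len(cutoffs)


# Toy distances (made up): fractions are 0.2, 0.4, 0.6, 0.8.
score = gdt_ts([0.5, 1.5, 3.0, 6.0, 10.0])
print(f"GDT-TS = {score:.2f}, above 0.7: {score > 0.7}")
```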

  17. MEGADOCK-Web: an integrated database of high-throughput structure-based protein-protein interaction predictions.

    PubMed

    Hayashi, Takanori; Matsuzaki, Yuri; Yanagisawa, Keisuke; Ohue, Masahito; Akiyama, Yutaka

    2018-05-08

    Protein-protein interactions (PPIs) play several roles in living cells, and computational PPI prediction is a major focus of many researchers. The three-dimensional (3D) structure and binding surface are important for the design of PPI inhibitors. Therefore, rigid body protein-protein docking calculations for two protein structures are expected to allow elucidation of PPIs different from known complexes in terms of 3D structures because known PPI information is not explicitly required. We have developed rapid PPI prediction software based on protein-protein docking, called MEGADOCK. In order to fully utilize the benefits of computational PPI predictions, it is necessary to construct a comprehensive database to gather prediction results and their predicted 3D complex structures and to make them easily accessible. Although several databases exist that provide predicted PPIs, the previous databases do not contain a sufficient number of entries for the purpose of discovering novel PPIs. In this study, we constructed an integrated database of MEGADOCK PPI predictions, named MEGADOCK-Web. MEGADOCK-Web provides more than 10 times the number of PPI predictions than previous databases and enables users to conduct PPI predictions that cannot be found in conventional PPI prediction databases. In MEGADOCK-Web, there are 7528 protein chains and 28,331,628 predicted PPIs from all possible combinations of those proteins. Each protein structure is annotated with PDB ID, chain ID, UniProt AC, related KEGG pathway IDs, and known PPI pairs. Additionally, MEGADOCK-Web provides four powerful functions: 1) searching precalculated PPI predictions, 2) providing annotations for each predicted protein pair with an experimentally known PPI, 3) visualizing candidates that may interact with the query protein on biochemical pathways, and 4) visualizing predicted complex structures through a 3D molecular viewer. MEGADOCK-Web provides a huge amount of comprehensive PPI predictions based on
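
    As a quick arithmetic check, the number of predicted PPIs quoted in this record equals the number of unordered pairs of distinct chains that can be formed from 7528 chains. The snippet below only verifies that count; it says nothing about how the database itself was generated.

```python
import math

n_chains = 7528

# Unordered pairs of distinct chains: n * (n - 1) / 2.
n_pairs = math.comb(n_chains, 2)

assert n_pairs == n_chains * (n_chains - 1) // 2
print(n_pairs)
```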

  18. Fibrous Protein Structures: Hierarchy, History and Heroes.

    PubMed

    Squire, John M; Parry, David A D

    2017-01-01

    During the 1930s and 1940s the technique of X-ray diffraction was applied widely by William Astbury and his colleagues to a number of naturally-occurring fibrous materials. On the basis of the diffraction patterns obtained, he observed that the structure of each of the fibres was dominated by one of a small number of different types of molecular conformation. One group of fibres, known as the k-m-e-f group of proteins (keratin - myosin - epidermin - fibrinogen), gave rise to diffraction characteristics that became known as the α-pattern. Others, such as those from a number of silks, gave rise to a different pattern - the β-pattern, while connective tissues yielded a third unique set of diffraction characteristics. At the time of Astbury's work, the structures of these materials were unknown, though the spacings of the main X-ray reflections gave an idea of the axial repeats and the lateral packing distances. In a breakthrough in the early 1950s, the basic structures of all of these fibrous proteins were determined. It was found that the long protein chains, composed of strings of amino acids, could be folded up in a systematic manner to generate a limited number of structures that were consistent with the X-ray data. The most important of these were known as the α-helix, the β-sheet, and the collagen triple helix. These studies provided information about the basic building blocks of all proteins, both fibrous and globular. They did not, however, provide detailed information about how these molecules packed together in three-dimensions to generate the fibres found in vivo. A number of possible packing arrangements were subsequently deduced from the X-ray diffraction and other data, but it is only in the last few years, through the continued improvements of electron microscopy, that the packing details within some fibrous proteins can now be seen directly. Here we outline briefly some of the milestones in fibrous protein structure determination, the role of the

  19. Control of Flexible Structures (COFS) Flight Experiment Background and Description

    NASA Technical Reports Server (NTRS)

    Hanks, B. R.

    1985-01-01

    A fundamental problem in designing and delivering large space structures to orbit is to provide sufficient structural stiffness and static configuration precision to meet performance requirements. These requirements are directly related to control requirements and the degree of control system sophistication available to supplement the as-built structure. Background and rationale are presented for a research study in structures, structural dynamics, and controls using a relatively large, flexible beam as a focus. This experiment would address fundamental problems applicable to large, flexible space structures in general and would involve a combination of ground tests, flight behavior prediction, and instrumented orbital tests. Intended to be multidisciplinary but basic within each discipline, the experiment should provide improved understanding and confidence in making design trades between structural conservatism and control system sophistication for meeting static shape and dynamic response/stability requirements. Quantitative results should be obtained for use in improving the validity of ground tests for verifying flight performance analyses.

  20. SCPRED: Accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences

    PubMed Central

    Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke

    2008-01-01

    Background Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. Results SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. Conclusion The SCPRED can accurately find similar structures for sequences that share low identity with sequence used for the prediction. 
The high predictive accuracy achieved by SCPRED is attributed to the design of

  1. An Evolutionarily Structured Universe of Protein Architecture

    PubMed Central

    Caetano-Anollés, Gustavo; Caetano-Anollés, Derek

    2003-01-01

    Protein structural diversity encompasses a finite set of architectural designs. Embedded in these topologies are evolutionary histories that we here uncover using cladistic principles and measurements of protein-fold usage and sharing. The reconstructed phylogenies are inherently rooted and depict histories of protein and proteome diversification. Proteome phylogenies showed two monophyletic sister-groups delimiting Bacteria and Archaea, and a topology rooted in Eucarya. This suggests three dramatic evolutionary events and a common ancestor with a eukaryotic-like, gene-rich, and relatively modern organization. Conversely, a general phylogeny of protein architectures showed that structural classes of globular proteins appeared early in evolution and in defined order, the α/β class being the first. Although most ancestral folds shared a common architecture of barrels or interleaved β-sheets and α-helices, many were clearly derived, such as polyhedral folds in the all-α class and β-sandwiches, β-propellers, and β-prisms in all-β proteins. We also describe transformation pathways of architectures that are prevalently used in nature. For example, β-barrels with increased curl and stagger were favored evolutionary outcomes in the all-β class. Interestingly, we found cases where structural change followed the α-to-β tendency uncovered in the tree of architectures. Lastly, we traced the total number of enzymatic functions associated with folds in the trees and show that there is a general link between structure and enzymatic function. PMID:12840035

  2. MetaGO: Predicting Gene Ontology of Non-homologous Proteins Through Low-Resolution Protein Structure Prediction and Protein-Protein Network Mapping.

    PubMed

    Zhang, Chengxin; Zheng, Wei; Freddolino, Peter L; Zhang, Yang

    2018-03-10

    Homology-based transferal remains the major approach to computational protein function annotations, but it becomes increasingly unreliable when the sequence identity between query and template decreases below 30%. We propose a novel pipeline, MetaGO, to deduce Gene Ontology attributes of proteins by combining sequence homology-based annotation with low-resolution structure prediction and comparison, and partner's homology-based protein-protein network mapping. The pipeline was tested on a large-scale set of 1000 non-redundant proteins from the CAFA3 experiment. Under the stringent benchmark conditions where templates with >30% sequence identity to the query are excluded, MetaGO achieves average F-measures of 0.487, 0.408, and 0.598, for Molecular Function, Biological Process, and Cellular Component, respectively, which are significantly higher than those achieved by other state-of-the-art function annotation methods. Detailed data analysis shows that the major advantage of MetaGO lies in the new functional homolog detections from partner's homology-based network mapping and structure-based local and global structure alignments, the confidence scores of which can be optimally combined through logistic regression. These data demonstrate the power of using a hybrid model incorporating protein structure and interaction networks to deduce new functional insights beyond traditional sequence homology-based referrals, especially for proteins that lack homologous function templates. The MetaGO pipeline is available at http://zhanglab.ccmb.med.umich.edu/MetaGO/. Copyright © 2018. Published by Elsevier Ltd.
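
    The F-measure used to score the predictions is, in its standard form, the harmonic mean of precision and recall. The sketch below assumes that standard definition; the abstract does not state which variant was used (for example, a maximum over confidence thresholds), and the function name and input values are hypothetical.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Made-up precision/recall values for illustration.
print(f"{f_measure(0.6, 0.4):.2f}")
```

    Because the harmonic mean is dominated by the smaller of the two values, a method cannot reach a high F-measure by maximizing recall alone.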

  3. Structure and sources of the sporadic meteor background from video observations

    NASA Astrophysics Data System (ADS)

    Jakšová, Ivana; Porubčan, Vladimír; Klačka, Jozef

    2015-10-01

    We investigate and discuss the structure of the sporadic meteor background population in the near-Earth space based on video meteor orbits from the SonotaCo database (SonotaCo 2009, WGN, 37, 55). The selection of the shower meteors was done by the Southworth-Hawkins streams-search criterion (Southworth & Hawkins 1963, Smithson. Contr. Astrophys., 7, 261). Of a total of 117786 orbits, 69.34% were assigned to sporadic background meteors. Our analysis revealed all the known sporadic sources, such as the dominant apex source which is splitting into the northern and southern branch. Part of a denser ring structure about the apex source connecting the antihelion and north toroidal sources is also evident. We showed that the annual activity of the apex source is similar to the annual variation in activity of the whole sporadic background. The antihelion source exhibits a very broad maximum from July until January and the north toroidal source shows three maxima similar to the radar observations by the Canadian Meteor Orbit Radar (CMOR). Potential parent bodies of the sporadic population were searched for by comparison of the distributions of the orbital elements of sporadic meteors, minor planets and comets.

  4. Design and structure of an equilibrium protein folding intermediate: a hint into dynamical regions of proteins.

    PubMed

    Ayuso-Tejedor, Sara; Angarica, Vladimir Espinosa; Bueno, Marta; Campos, Luis A; Abián, Olga; Bernadó, Pau; Sancho, Javier; Jiménez, M Angeles

    2010-07-23

    Partly unfolded protein conformations close to the native state may play important roles in protein function and in protein misfolding. Structural analyses of such conformations, which are essential for their full physicochemical understanding, are complicated by their characteristic low populations at equilibrium. We stabilize here with a single mutation the equilibrium intermediate of apoflavodoxin thermal unfolding and determine its solution structure by NMR. It consists of a large native region identical with that observed in the X-ray structure of the wild-type protein plus an unfolded region. Small-angle X-ray scattering analysis indicates that the calculated ensemble of structures is consistent with the actual degree of expansion of the intermediate. The unfolded region encompasses discontinuous sequence segments that cluster in the 3D structure of the native protein forming the FMN cofactor binding loops and the binding site of a variety of partner proteins. Analysis of the apoflavodoxin inner interfaces reveals that those becoming destabilized in the intermediate are more polar than other inner interfaces of the protein. Natively folded proteins contain hydrophobic cores formed by the packing of hydrophobic surfaces, while natively unfolded proteins are rich in polar residues. The structure of the apoflavodoxin thermal intermediate suggests that the regions of natively folded proteins that are easily responsive to thermal activation may contain cores of intermediate hydrophobicity. Copyright (c) 2010 Elsevier Ltd. All rights reserved.

  5. Structural study of surfactant-dependent interaction with protein

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mehan, Sumit; Aswal, Vinod K.; Kohlbrecher, Joachim

    2015-06-24

    Small-angle neutron scattering (SANS) has been used to study the complex structure of anionic BSA protein with three different (cationic DTAB, anionic SDS and non-ionic C12E10) surfactants. These systems form very different surfactant-dependent complexes. We show that the structure of protein-surfactant complex is initiated by the site-specific electrostatic interaction between the components, followed by the hydrophobic interaction at high surfactant concentrations. It is also found that hydrophobic interaction is preferred over the electrostatic interaction in deciding the resultant structure of protein-surfactant complexes.

  6. Structural study of surfactant-dependent interaction with protein

    NASA Astrophysics Data System (ADS)

    Mehan, Sumit; Aswal, Vinod K.; Kohlbrecher, Joachim

    2015-06-01

    Small-angle neutron scattering (SANS) has been used to study the complex structure of anionic BSA protein with three different (cationic DTAB, anionic SDS and non-ionic C12E10) surfactants. These systems form very different surfactant-dependent complexes. We show that the structure of protein-surfactant complex is initiated by the site-specific electrostatic interaction between the components, followed by the hydrophobic interaction at high surfactant concentrations. It is also found that hydrophobic interaction is preferred over the electrostatic interaction in deciding the resultant structure of protein-surfactant complexes.

  7. Structural Elements Regulating AAA+ Protein Quality Control Machines.

    PubMed

    Chang, Chiung-Wen; Lee, Sukyeong; Tsai, Francis T F

    2017-01-01

    Members of the ATPases Associated with various cellular Activities (AAA+) superfamily participate in essential and diverse cellular pathways in all kingdoms of life by harnessing the energy of ATP binding and hydrolysis to drive their biological functions. Although most AAA+ proteins share a ring-shaped architecture, AAA+ proteins have evolved distinct structural elements that are fine-tuned to their specific functions. A central question in the field is how ATP binding and hydrolysis are coupled to substrate translocation through the central channel of ring-forming AAA+ proteins. In this mini-review, we will discuss structural elements present in AAA+ proteins involved in protein quality control, drawing similarities to their known role in substrate interaction by AAA+ proteins involved in DNA translocation. Elements to be discussed include the pore loop-1, the Inter-Subunit Signaling (ISS) motif, and the Pre-Sensor I insert (PS-I) motif. Lastly, we will summarize our current understanding on the inter-relationship of those structural elements and propose a model how ATP binding and hydrolysis might be coupled to polypeptide translocation in protein quality control machines.

  8. Structure of synaptophysin: a hexameric MARVEL-domain channel protein.

    PubMed

    Arthur, Christopher P; Stowell, Michael H B

    2007-06-01

    Synaptophysin I (SypI) is an archetypal member of the MARVEL-domain family of integral membrane proteins and one of the first synaptic vesicle proteins to be identified and cloned. Almost all MARVEL-domain proteins are involved in membrane apposition and vesicle-trafficking events, but their precise role in these processes is unclear. We have purified mammalian SypI and determined its three-dimensional (3D) structure by using electron microscopy and single-particle 3D reconstruction. The hexameric structure resembles an open basket with a large pore and tenuous interactions within the cytosolic domain. The structure suggests a model for Synaptophysin's role in fusion and recycling that is regulated by known interactions with the SNARE machinery. This 3D structure of a MARVEL-domain protein provides a structural foundation for understanding the role of these important proteins in a variety of biological processes.

  9. Relationship between Molecular Structure Characteristics of Feed Proteins and Protein In vitro Digestibility and Solubility.

    PubMed

    Bai, Mingmei; Qin, Guixin; Sun, Zewei; Long, Guohui

    2016-08-01

    The nutritional value of feed proteins and their utilization by livestock are related not only to the chemical composition but also to the structure of feed proteins, but few studies thus far have investigated the relationship between the structure of feed proteins and their solubility as well as digestibility in monogastric animals. To address this question we analyzed soybean meal, fish meal, corn distiller's dried grains with solubles, corn gluten meal, and feather meal by Fourier transform infrared (FTIR) spectroscopy to determine the protein molecular spectral band characteristics for amides I and II as well as α-helices and β-sheets and their ratios. Protein solubility and in vitro digestibility were measured with the Kjeldahl method using 0.2% KOH solution and the pepsin-pancreatin two-step enzymatic method, respectively. We found that all measured spectral band intensities (height and area) of feed proteins were correlated with their in vitro digestibility and solubility (p≤0.003); moreover, the relatively quantitative amounts of α-helices, random coils, and α-helix to β-sheet ratio in protein secondary structures were positively correlated with protein in vitro digestibility and solubility (p≤0.004). On the other hand, the percentage of β-sheet structures was negatively correlated with protein in vitro digestibility (p<0.001) and solubility (p = 0.002). These results demonstrate that the molecular structure characteristics of feed proteins are closely related to their in vitro digestibility at 28 h and solubility. Furthermore, the α-helix-to-β-sheet ratio can be used to predict the nutritional value of feed proteins.

  10. Relationship between Molecular Structure Characteristics of Feed Proteins and Protein In vitro Digestibility and Solubility

    PubMed Central

    Bai, Mingmei; Qin, Guixin; Sun, Zewei; Long, Guohui

    2016-01-01

    The nutritional value of feed proteins and their utilization by livestock are related not only to the chemical composition but also to the structure of feed proteins, but few studies thus far have investigated the relationship between the structure of feed proteins and their solubility as well as digestibility in monogastric animals. To address this question we analyzed soybean meal, fish meal, corn distiller’s dried grains with solubles, corn gluten meal, and feather meal by Fourier transform infrared (FTIR) spectroscopy to determine the protein molecular spectral band characteristics for amides I and II as well as α-helices and β-sheets and their ratios. Protein solubility and in vitro digestibility were measured with the Kjeldahl method using 0.2% KOH solution and the pepsin-pancreatin two-step enzymatic method, respectively. We found that all measured spectral band intensities (height and area) of feed proteins were correlated with their in vitro digestibility and solubility (p≤0.003); moreover, the relatively quantitative amounts of α-helices, random coils, and α-helix to β-sheet ratio in protein secondary structures were positively correlated with protein in vitro digestibility and solubility (p≤0.004). On the other hand, the percentage of β-sheet structures was negatively correlated with protein in vitro digestibility (p<0.001) and solubility (p = 0.002). These results demonstrate that the molecular structure characteristics of feed proteins are closely related to their in vitro digestibility at 28 h and solubility. Furthermore, the α-helix-to-β-sheet ratio can be used to predict the nutritional value of feed proteins. PMID:26954145

  11. Identification of Conserved Water Sites in Protein Structures for Drug Design.

    PubMed

    Jukič, Marko; Konc, Janez; Gobec, Stanislav; Janežič, Dušanka

    2017-12-26

    Identification of conserved waters in protein structures is a challenging task with applications in molecular docking and protein stability prediction. As an alternative to computationally demanding simulations of proteins in water, experimental cocrystallized waters in the Protein Data Bank (PDB) in combination with a local structure alignment algorithm can be used for reliable prediction of conserved water sites. We developed the ProBiS H2O approach based on the previously developed ProBiS algorithm, which enables identification of conserved water sites in proteins using experimental protein structures from the PDB or a set of custom protein structures available to the user. With a protein structure, a binding site, or an individual water molecule as a query, ProBiS H2O collects similar proteins from the PDB and performs local or binding site-specific superimpositions of the query structure with similar proteins using the ProBiS algorithm. It collects the experimental water molecules from the similar proteins and transposes them to the query protein. Transposed waters are clustered by their mutual proximity, which enables identification of discrete sites in the query protein with high water conservation. ProBiS H2O is a robust and fast new approach that uses existing experimental structural data to identify conserved water sites on the interfaces of protein complexes, for example protein-small molecule interfaces, and elsewhere on the protein structures. It has been successfully validated in several reported proteins in which conserved water molecules were found to play an important role in ligand binding with applications in drug design.
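
    The clustering of transposed waters by mutual proximity can be sketched as follows. This is a minimal Python illustration, not the ProBiS H2O implementation: the abstract does not say which clustering algorithm the tool uses, so the single-linkage rule, the 1.0 Å cutoff, the function name cluster_waters, and the coordinates are all assumptions made for the example.

```python
import math

def cluster_waters(coords, cutoff=1.0):
    """Group water oxygens so that any two waters closer than `cutoff` (in Å)
    end up in the same cluster (single-linkage; an assumed rule, see lead-in)."""
    unvisited = set(range(len(coords)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            # Collect every not-yet-assigned water within the cutoff of water i.
            near = [j for j in unvisited
                    if math.dist(coords[i], coords[j]) <= cutoff]
            for j in near:
                unvisited.remove(j)
            cluster.extend(near)
            frontier.extend(near)
        clusters.append(sorted(cluster))
    # Largest clusters first: these are the candidate conserved sites.
    return sorted(clusters, key=len, reverse=True)

# Hypothetical transposed water positions (Å) from several superimposed structures.
waters = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.2, 0.3, 0.0),
          (10.0, 10.0, 10.0), (10.4, 10.0, 10.0), (25.0, 0.0, 0.0)]
print(cluster_waters(waters))  # [[0, 1, 2], [3, 4], [5]]
```

    Dividing each cluster size by the number of superimposed structures would give a simple per-site conservation score; that normalisation is likewise an assumption of this sketch.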

  12. Blind test of physics-based prediction of protein structures.

    PubMed

    Shell, M Scott; Ozkan, S Banu; Voelz, Vincent; Wu, Guohong Albert; Dill, Ken A

    2009-02-01

    We report here a multiprotein blind test of a computer method to predict native protein structures based solely on an all-atom physics-based force field. We use the AMBER 96 potential function with an implicit (GB/SA) model of solvation, combined with replica-exchange molecular-dynamics simulations. Coarse conformational sampling is performed using the zipping and assembly method (ZAM), an approach that is designed to mimic the putative physical routes of protein folding. ZAM was applied to the folding of six proteins, from 76 to 112 monomers in length, in CASP7, a community-wide blind test of protein structure prediction. Because these predictions have about the same level of accuracy as typical bioinformatics methods, and do not utilize information from databases of known native structures, this work opens up the possibility of predicting the structures of membrane proteins, synthetic peptides, or other foldable polymers, for which there is little prior knowledge of native structures. This approach may also be useful for predicting physical protein folding routes, non-native conformations, and other physical properties from amino acid sequences.
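
    The replica-exchange step mentioned above is commonly governed by a Metropolis criterion: two replicas at temperatures T_i and T_j with potential energies E_i and E_j swap with probability min(1, exp[(β_i − β_j)(E_i − E_j)]), where β = 1/(k_B T). The Python sketch below shows only this textbook rule; the energies, temperatures, and function names are hypothetical, and nothing here is taken from the ZAM or AMBER code.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def swap_probability(e_i, t_i, e_j, t_j):
    """Metropolis acceptance probability for exchanging the configurations
    of two replicas with energies e_i, e_j (kcal/mol) at temperatures t_i, t_j (K)."""
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    exponent = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if exponent >= 0.0 else math.exp(exponent)

def attempt_swap(e_i, t_i, e_j, t_j, rng=random):
    """Return True if the exchange is accepted."""
    return rng.random() < swap_probability(e_i, t_i, e_j, t_j)

# Hypothetical neighbouring replicas: the colder one currently has the higher energy,
# so moving the lower-energy configuration to the colder temperature is always accepted.
print(swap_probability(-100.0, 300.0, -105.0, 320.0))  # 1.0
# The reverse situation is accepted only some of the time.
print(swap_probability(-105.0, 300.0, -100.0, 320.0))
```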

  13. Blind Test of Physics-Based Prediction of Protein Structures

    PubMed Central

    Shell, M. Scott; Ozkan, S. Banu; Voelz, Vincent; Wu, Guohong Albert; Dill, Ken A.

    2009-01-01

    We report here a multiprotein blind test of a computer method to predict native protein structures based solely on an all-atom physics-based force field. We use the AMBER 96 potential function with an implicit (GB/SA) model of solvation, combined with replica-exchange molecular-dynamics simulations. Coarse conformational sampling is performed using the zipping and assembly method (ZAM), an approach that is designed to mimic the putative physical routes of protein folding. ZAM was applied to the folding of six proteins, from 76 to 112 monomers in length, in CASP7, a community-wide blind test of protein structure prediction. Because these predictions have about the same level of accuracy as typical bioinformatics methods, and do not utilize information from databases of known native structures, this work opens up the possibility of predicting the structures of membrane proteins, synthetic peptides, or other foldable polymers, for which there is little prior knowledge of native structures. This approach may also be useful for predicting physical protein folding routes, non-native conformations, and other physical properties from amino acid sequences. PMID:19186130

  14. Protein Structural Analysis via Mass Spectrometry-Based Proteomics

    PubMed Central

    Artigues, Antonio; Nadeau, Owen W.; Rimmer, Mary Ashley; Villar, Maria T.; Du, Xiuxia; Fenton, Aron W.; Carlson, Gerald M.

    2017-01-01

    Modern mass spectrometry (MS) technologies have provided a versatile platform that can be combined with a large number of techniques to analyze protein structure and dynamics. These techniques include the three detailed in this chapter: 1) hydrogen/deuterium exchange (HDX), 2) limited proteolysis, and 3) chemical crosslinking (CX). HDX relies on the change in mass of a protein upon its dilution into deuterated buffer, which results in varied deuterium content within its backbone amides. Structural information on surface exposed, flexible or disordered linker regions of proteins can be achieved through limited proteolysis, using a variety of proteases and only small extents of digestion. CX refers to the covalent coupling of distinct chemical species and has been used to analyze the structure, function and interactions of proteins by identifying crosslinking sites that are formed by small multi-functional reagents, termed crosslinkers. Each of these MS applications is capable of revealing structural information for proteins when used either with or without other typical high resolution techniques, including NMR and X-ray crystallography. PMID:27975228

  15. Implementation of a parallel protein structure alignment service on cloud.

    PubMed

    Hung, Che-Lun; Lin, Yaw-Ling

    2013-01-01

    Protein structure alignment has become an important strategy by which to identify evolutionary relationships between protein sequences. Several alignment tools are currently available for online comparison of protein structures. In this paper, we propose a parallel protein structure alignment service based on the Hadoop distribution framework. This service includes a protein structure alignment algorithm, a refinement algorithm, and a MapReduce programming model. The refinement algorithm refines the result of alignment. To process vast numbers of protein structures in parallel, the alignment and refinement algorithms are implemented using MapReduce. We analyzed and compared the structure alignments produced by different methods using a dataset randomly selected from the PDB database. The experimental results verify that the proposed algorithm refines the resulting alignments more accurately than existing algorithms. Meanwhile, the computational performance of the proposed service is proportional to the number of processors used in our cloud platform.

  16. Implementation of a Parallel Protein Structure Alignment Service on Cloud

    PubMed Central

    Hung, Che-Lun; Lin, Yaw-Ling

    2013-01-01

    Protein structure alignment has become an important strategy by which to identify evolutionary relationships between protein sequences. Several alignment tools are currently available for online comparison of protein structures. In this paper, we propose a parallel protein structure alignment service based on the Hadoop distribution framework. This service includes a protein structure alignment algorithm, a refinement algorithm, and a MapReduce programming model. The refinement algorithm refines the result of alignment. To process vast numbers of protein structures in parallel, the alignment and refinement algorithms are implemented using MapReduce. We analyzed and compared the structure alignments produced by different methods using a dataset randomly selected from the PDB database. The experimental results verify that the proposed algorithm refines the resulting alignments more accurately than existing algorithms. Meanwhile, the computational performance of the proposed service is proportional to the number of processors used in our cloud platform. PMID:23671842

  17. Rapid search for tertiary fragments reveals protein sequence–structure relationships

    PubMed Central

    Zhou, Jianfu; Grigoryan, Gevorg

    2015-01-01

    Finding backbone substructures from the Protein Data Bank that match an arbitrary query structural motif, composed of multiple disjoint segments, is a problem of growing relevance in structure prediction and protein design. Although numerous protein structure search approaches have been proposed, methods that address this specific task without additional restrictions and on practical time scales are generally lacking. Here, we propose a solution, dubbed MASTER, that is both rapid, enabling searches over the Protein Data Bank in a matter of seconds, and provably correct, finding all matches below a user-specified root-mean-square deviation cutoff. We show that despite the potentially exponential time complexity of the problem, running times in practice are modest even for queries with many segments. The ability to explore naturally plausible structural and sequence variations around a given motif has the potential to synthesize its design principles in an automated manner; so we go on to illustrate the utility of MASTER to protein structural biology. We demonstrate its capacity to rapidly establish structure–sequence relationships, uncover the native designability landscapes of tertiary structural motifs, identify structural signatures of binding, and automatically rewire protein topologies. Given the broad utility of protein tertiary fragment searches, we hope that providing MASTER in an open-source format will enable novel advances in understanding, predicting, and designing protein structure. PMID:25420575

  18. Contemporary Methodology for Protein Structure Determination.

    ERIC Educational Resources Information Center

    Hunkapiller, Michael W.; And Others

    1984-01-01

    Describes the nature and capabilities of methods used to characterize protein and peptide structure, indicating that they have undergone changes which have improved the speed, reliability, and applicability of the process. Also indicates that high-performance liquid chromatography and gel electrophoresis have made purifying proteins and peptides a…

  19. The Structure and Function of Non-Collagenous Bone Proteins

    NASA Technical Reports Server (NTRS)

    Hook, Magnus; McQuillan, David J.

    1997-01-01

    The research done under the cooperative research agreement for the project titled 'The structure and function of non-collagenous bone proteins' represented the first phase of an ongoing program to define the structural and functional relationships of the principal noncollagenous proteins in bone. An ultimate goal of this research is to enable design and execution of useful pharmacological compounds that will have a beneficial effect in treatment of osteoporosis, both land-based and induced by long-duration space travel. The goals of the now complete first phase were as follows: 1. Establish and/or develop powerful recombinant protein expression systems; 2. Develop and refine isolation and purification of recombinant proteins; 3. Express wild-type non-collagenous bone proteins; 4. Express site-specific mutant proteins and domains of wild-type proteins to enhance likelihood of crystal formation for subsequent solution of structure.

  20. MolTalk--a programming library for protein structures and structure analysis.

    PubMed

    Diemand, Alexander V; Scheib, Holger

    2004-04-19

    Two of the mostly unsolved but increasingly urgent problems for modern biologists are a) to quickly and easily analyse protein structures and b) to comprehensively mine the wealth of information, which is distributed along with the 3D co-ordinates by the Protein Data Bank (PDB). Tools which address this issue need to be highly flexible and powerful but at the same time must be freely available and easy to learn. We present MolTalk, an elaborate programming language, which consists of the programming library libmoltalk implemented in Objective-C and the Smalltalk-based interpreter MolTalk. MolTalk combines the advantages of an easy to learn and programmable procedural scripting with the flexibility and power of a full programming language. An overview of currently available applications of MolTalk is given and with PDBChainSaw one such application is described in more detail. PDBChainSaw is a MolTalk-based parser and information extraction utility of PDB files. Weekly updates of the PDB are synchronised with PDBChainSaw and are available for free download from the MolTalk project page http://www.moltalk.org following the link to PDBChainSaw. For each chain in a protein structure, PDBChainSaw extracts the sequence from its co-ordinates and provides additional information from the PDB-file header section, such as scientific organism, compound name, and EC code. MolTalk provides a rich set of methods to analyse and even modify experimentally determined or modelled protein structures. These methods vary in complexity and are thus suitable for beginners and advanced programmers alike. We envision MolTalk to be most valuable in the following applications: 1) To analyse protein structures repetitively in large-scale, i.e. to benchmark protein structure prediction methods or to evaluate structural models. The quality of the resulting 3D-models can be assessed by e.g. calculating a Ramachandran-Sasisekharan plot. 2) To quickly retrieve information for (a limited number of
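
    Extracting a per-chain sequence from co-ordinates, as PDBChainSaw is described as doing, can be approximated in a few lines. The sketch below is Python rather than MolTalk and uses no MolTalk API; it reads the fixed-column ATOM records of the PDB file format and takes one residue per C-alpha atom. The names chain_sequences and atom_line, and the demo records, are hypothetical.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def chain_sequences(pdb_lines):
    """Return {chain_id: one-letter sequence} built from C-alpha ATOM records."""
    residues = {}  # chain id -> list of one-letter codes, in file order
    seen = set()   # (chain, residue number + insertion code) already counted
    for line in pdb_lines:
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        if line[12:16].strip() != "CA":   # atom name, columns 13-16
            continue
        if line[16] not in (" ", "A"):    # keep only the first alternate location
            continue
        chain = line[21]                  # chain identifier, column 22
        key = (chain, line[22:27])        # residue number + insertion code
        if key in seen:
            continue
        seen.add(key)
        # Non-standard residue names are reported as "X".
        residues.setdefault(chain, []).append(THREE_TO_ONE.get(line[17:20], "X"))
    return {chain: "".join(codes) for chain, codes in residues.items()}

def atom_line(serial, name, res, chain, seq):
    """Format a minimal fixed-column ATOM record (hypothetical demo helper)."""
    return (f"ATOM  {serial:5d} {name:<4s} {res:>3s} {chain}{seq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")

demo = [
    atom_line(1, " N  ", "MET", "A", 1),
    atom_line(2, " CA ", "MET", "A", 1),
    atom_line(3, " CA ", "LYS", "A", 2),
    atom_line(4, " CA ", "GLY", "B", 1),
]
print(chain_sequences(demo))  # {'A': 'MK', 'B': 'G'}
```

    A sequence read this way contains only residues that have co-ordinates, so it can be shorter than the sequence given in the file header.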

  1. Predicting nucleic acid binding interfaces from structural models of proteins.

    PubMed

    Dror, Iris; Shazman, Shula; Mukherjee, Srayanta; Zhang, Yang; Glaser, Fabian; Mandel-Gutfreund, Yael

    2012-02-01

    The function of DNA- and RNA-binding proteins can be inferred from the characterization and accurate prediction of their binding interfaces. However, the main pitfall of various structure-based methods for predicting nucleic acid binding function is that they are all limited to a relatively small number of proteins for which high-resolution three-dimensional structures are available. In this study, we developed a pipeline for extracting functional electrostatic patches from surfaces of protein structural models, obtained using the I-TASSER protein structure predictor. The largest positive patches are extracted from the protein surface using the patchfinder algorithm. We show that functional electrostatic patches extracted from an ensemble of structural models highly overlap the patches extracted from high-resolution structures. Furthermore, by testing our pipeline on a set of 55 known nucleic acid binding proteins for which I-TASSER produces high-quality models, we show that the method accurately identifies the nucleic acids binding interface on structural models of proteins. Employing a combined patch approach we show that patches extracted from an ensemble of models better predict the real nucleic acid binding interfaces compared with patches extracted from independent models. Overall, these results suggest that combining information from a collection of low-resolution structural models could be a valuable approach for functional annotation. We suggest that our method will be further applicable for predicting other functional surfaces of proteins with unknown structure. Copyright © 2011 Wiley Periodicals, Inc.

  2. Predicting nucleic acid binding interfaces from structural models of proteins

    PubMed Central

    Dror, Iris; Shazman, Shula; Mukherjee, Srayanta; Zhang, Yang; Glaser, Fabian; Mandel-Gutfreund, Yael

    2011-01-01

    The function of DNA- and RNA-binding proteins can be inferred from the characterization and accurate prediction of their binding interfaces. However, the main pitfall of various structure-based methods for predicting nucleic acid binding function is that they are all limited to a relatively small number of proteins for which high-resolution three-dimensional structures are available. In this study, we developed a pipeline for extracting functional electrostatic patches from surfaces of protein structural models, obtained using the I-TASSER protein structure predictor. The largest positive patches are extracted from the protein surface using the patchfinder algorithm. We show that functional electrostatic patches extracted from an ensemble of structural models highly overlap the patches extracted from high-resolution structures. Furthermore, by testing our pipeline on a set of 55 known nucleic acid binding proteins for which I-TASSER produces high-quality models, we show that the method accurately identifies the nucleic acids binding interface on structural models of proteins. Employing a combined patch approach we show that patches extracted from an ensemble of models better predict the real nucleic acid binding interfaces compared to patches extracted from independent models. Overall, these results suggest that combining information from a collection of low-resolution structural models could be a valuable approach for functional annotation. We suggest that our method will be further applicable for predicting other functional surfaces of proteins with unknown structure. PMID:22086767

  3. Understand protein functions by comparing the similarity of local structural environments.

    PubMed

    Chen, Jiawen; Xie, Zhong-Ru; Wu, Yinghao

    2017-02-01

    The three-dimensional structures of proteins play an essential role in regulating binding between proteins and their partners, offering a direct relationship between structures and functions of proteins. It is widely accepted that the function of a protein can be determined if its structure is similar to other proteins whose functions are known. However, it is also observed that proteins with similar global structures do not necessarily correspond to the same function, while proteins with very different folds can share similar functions. This indicates that function similarity originates from the local structural information of proteins instead of their global shapes. We assume that proteins with similar local environments prefer binding to similar types of molecular targets. To test this assumption, we designed a new structural indicator to define the similarity of local environment between residues in different proteins. This indicator was further used to calculate the probability that a given residue binds to a specific type of structural neighbors, including DNA, RNA, small molecules and proteins. After applying the method to a large-scale non-redundant database of proteins, we show that the positive signal of binding probability calculated from the local structural indicator is statistically meaningful. In summary, our studies suggested that the local environment of residues in a protein is a good indicator to recognize specific binding partners of the protein. The new method could be a potential addition to a suite of existing template-based approaches for protein function prediction. Copyright © 2016 Elsevier B.V. All rights reserved.

  4. Infrared light-induced protein crystallization. Structuring of protein interfacial water and periodic self-assembly

    NASA Astrophysics Data System (ADS)

    Kowacz, Magdalena; Marchel, Mateusz; Juknaité, Lina; Esperança, José M. S. S.; Romão, Maria João; Carvalho, Ana Luísa; Rebelo, Luís Paulo N.

    2017-01-01

    We show that a physical trigger, a non-ionizing infrared (IR) radiation at wavelengths strongly absorbed by liquid water, can be used to induce and kinetically control protein (periodic) self-assembly in solution. This phenomenon is explained by considering the effect of IR light on the structuring of protein interfacial water. Our results indicate that the IR radiation can promote enhanced mutual correlations of water molecules in the protein hydration shell. We report on the radiation-induced increase in both the strength and cooperativeness of H-bonds. The presence of a structured dipolar hydration layer can lead to attractive interactions between like-charged biomacromolecules in solution (and crystal nucleation events). Furthermore, our study suggests that enveloping the protein within a layer of structured solvent (an effect enhanced by IR light) can prevent non-specific protein aggregation, favoring periodic self-assembly. Recognizing the ability to affect protein-water interactions by means of IR radiation may have important implications for biological and bio-inspired systems.

  5. Automated crystallographic system for high-throughput protein structure determination.

    PubMed

    Brunzelle, Joseph S; Shafaee, Padram; Yang, Xiaojing; Weigand, Steve; Ren, Zhong; Anderson, Wayne F

    2003-07-01

    High-throughput structural genomic efforts require software that is highly automated, distributive and requires minimal user intervention to determine protein structures. Preliminary experiments were set up to test whether automated scripts could utilize a minimum set of input parameters and produce a set of initial protein coordinates. From this starting point, a highly distributive system was developed that could determine macromolecular structures at a high throughput rate, warehouse and harvest the associated data. The system uses a web interface to obtain input data and display results. It utilizes a relational database to store the initial data needed to start the structure-determination process as well as generated data. A distributive program interface administers the crystallographic programs which determine protein structures. Using a test set of 19 protein targets, 79% were determined automatically.

  6. Challenging the state of the art in protein structure prediction: Highlights of experimental target structures for the 10th Critical Assessment of Techniques for Protein Structure Prediction Experiment CASP10.

    PubMed

    Kryshtafovych, Andriy; Moult, John; Bales, Patrick; Bazan, J Fernando; Biasini, Marco; Burgin, Alex; Chen, Chen; Cochran, Frank V; Craig, Timothy K; Das, Rhiju; Fass, Deborah; Garcia-Doval, Carmela; Herzberg, Osnat; Lorimer, Donald; Luecke, Hartmut; Ma, Xiaolei; Nelson, Daniel C; van Raaij, Mark J; Rohwer, Forest; Segall, Anca; Seguritan, Victor; Zeth, Kornelius; Schwede, Torsten

    2014-02-01

    For the last two decades, CASP has assessed the state of the art in techniques for protein structure prediction and identified areas which required further development. CASP would not have been possible without the prediction targets provided by the experimental structural biology community. In the latest experiment, CASP10, more than 100 structures were suggested as prediction targets, some of which appeared to be extraordinarily difficult for modeling. In this article, authors of some of the most challenging targets discuss which specific scientific question motivated the experimental structure determination of the target protein, which structural features were especially interesting from a structural or functional perspective, and to what extent these features were correctly reproduced in the predictions submitted to CASP10. Specifically, the following targets will be presented: the acid-gated urea channel, a difficult to predict transmembrane protein from the important human pathogen Helicobacter pylori; the structure of human interleukin (IL)-34, a recently discovered helical cytokine; the structure of a functionally uncharacterized enzyme OrfY from Thermoproteus tenax formed by a gene duplication and a novel fold; an ORFan domain of mimivirus sulfhydryl oxidase R596; the fiber protein gene product 17 from bacteriophage T7; the bacteriophage CBA-120 tailspike protein; a virus coat protein from metagenomic samples of the marine environment; and finally, an unprecedented class of structure prediction targets based on engineered disulfide-rich small proteins. Copyright © 2013 The Authors. Wiley Periodicals, Inc.

  7. Challenging the state-of-the-art in protein structure prediction: Highlights of experimental target structures for the 10th Critical Assessment of Techniques for Protein Structure Prediction Experiment CASP10

    PubMed Central

    Kryshtafovych, Andriy; Moult, John; Bales, Patrick; Bazan, J. Fernando; Biasini, Marco; Burgin, Alex; Chen, Chen; Cochran, Frank V.; Craig, Timothy K.; Das, Rhiju; Fass, Deborah; Garcia-Doval, Carmela; Herzberg, Osnat; Lorimer, Donald; Luecke, Hartmut; Ma, Xiaolei; Nelson, Daniel C.; van Raaij, Mark J.; Rohwer, Forest; Segall, Anca; Seguritan, Victor; Zeth, Kornelius; Schwede, Torsten

    2014-01-01

    For the last two decades, CASP has assessed the state of the art in techniques for protein structure prediction and identified areas which required further development. CASP would not have been possible without the prediction targets provided by the experimental structural biology community. In the latest experiment, CASP10, over 100 structures were suggested as prediction targets, some of which appeared to be extraordinarily difficult for modeling. In this paper, authors of some of the most challenging targets discuss which specific scientific question motivated the experimental structure determination of the target protein, which structural features were especially interesting from a structural or functional perspective, and to what extent these features were correctly reproduced in the predictions submitted to CASP10. Specifically, the following targets will be presented: the acid-gated urea channel, a difficult to predict trans-membrane protein from the important human pathogen Helicobacter pylori; the structure of human interleukin IL-34, a recently discovered helical cytokine; the structure of a functionally uncharacterized enzyme OrfY from Thermoproteus tenax formed by a gene duplication and a novel fold; an ORFan domain of mimivirus sulfhydryl oxidase R596; the fibre protein gp17 from bacteriophage T7; the Bacteriophage CBA-120 tailspike protein; a virus coat protein from metagenomic samples of the marine environment; and finally an unprecedented class of structure prediction targets based on engineered disulfide-rich small proteins. PMID:24318984

  8. From Ramachandran Maps to Tertiary Structures of Proteins.

    PubMed

    DasGupta, Debarati; Kaushik, Rahul; Jayaram, B

    2015-08-27

    Sequence to structure of proteins is an unsolved problem. A possible coarse grained resolution to this entails specification of all the torsional (Φ, Ψ) angles along the backbone of the polypeptide chain. The Ramachandran map quite elegantly depicts the allowed conformational (Φ, Ψ) space of proteins which is still very large for the purposes of accurate structure generation. We have divided the allowed (Φ, Ψ) space in Ramachandran maps into 27 distinct conformations sufficient to regenerate a structure to within 5 Å from the native, at least for small proteins, thus reducing the structure prediction problem to a specification of an alphanumeric string, i.e., the amino acid sequence together with one of the 27 conformations preferred by each amino acid residue. This still theoretically results in 27^n conformations for a protein comprising "n" amino acids. We then investigated the spatial correlations at the two-residue (dipeptide) and three-residue (tripeptide) levels in what may be described as higher order Ramachandran maps, with the premise that the allowed conformational space starts to shrink as we introduce neighborhood effects. We found, for instance, for a tripeptide which potentially can exist in any of the 27^3 "allowed" conformations, three-fourths of these conformations are redundant to the 95% confidence level, suggesting sequence context dependent preferred conformations. We then created a look-up table of preferred conformations at the tripeptide level and correlated them with energetically favorable conformations. We found in particular that Boltzmann probabilities calculated from van der Waals energies for each conformation of tripeptides correlate well with the observed populations in the structural database (the average correlation coefficient is ∼0.8). An alpha-numeric string and hence the tertiary structure can be generated for any sequence from the look-up table within minutes on a single processor and to a higher level of accuracy
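
    The comparison of Boltzmann probabilities with observed populations can be illustrated with a short Python sketch. A tripeptide with 27 states per residue has 27 × 27 × 27 = 19,683 nominal conformations; the code computes p_i = exp(−E_i / k_B T) / Σ_j exp(−E_j / k_B T) for a handful of them and correlates the result with observed frequencies. The energies, populations, temperature, and function names below are invented for the example and are not data from the study.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def boltzmann_probabilities(energies, temperature=300.0):
    """Convert energies (kcal/mol) into normalised Boltzmann probabilities."""
    beta = 1.0 / (K_B * temperature)
    e_min = min(energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

print(27 ** 3)  # 19683 nominal tripeptide conformations

# Hypothetical van der Waals energies and observed populations for four conformations.
energies = [-2.1, -1.4, -0.9, -0.2]
observed = [0.55, 0.25, 0.15, 0.05]
predicted = boltzmann_probabilities(energies)
print(pearson(predicted, observed))
```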

  9. Using linear algebra for protein structural comparison and classification

    PubMed Central

    2009-01-01

    In this article, we describe a novel methodology to extract semantic characteristics from protein structures using linear algebra in order to compose structural signature vectors which may be used efficiently to compare and classify protein structures into fold families. These signatures are built from the pattern of hydrophobic intrachain interactions using Singular Value Decomposition (SVD) and Latent Semantic Indexing (LSI) techniques. Considering proteins as documents and contacts as terms, we have built a retrieval system which is able to find conserved contacts in samples of myoglobin fold family and to retrieve these proteins among proteins of varied folds with precision of up to 80%. The classifier is a web tool available at our laboratory website. Users can search for similar chains from a specific PDB, view and compare their contact maps and browse their structures using a JMol plug-in. PMID:21637532

  10. Using linear algebra for protein structural comparison and classification.

    PubMed

    Gomide, Janaína; Melo-Minardi, Raquel; Dos Santos, Marcos Augusto; Neshich, Goran; Meira, Wagner; Lopes, Júlio César; Santoro, Marcelo

    2009-07-01

    In this article, we describe a novel methodology to extract semantic characteristics from protein structures using linear algebra in order to compose structural signature vectors which may be used efficiently to compare and classify protein structures into fold families. These signatures are built from the pattern of hydrophobic intrachain interactions using Singular Value Decomposition (SVD) and Latent Semantic Indexing (LSI) techniques. Considering proteins as documents and contacts as terms, we have built a retrieval system which is able to find conserved contacts in samples of myoglobin fold family and to retrieve these proteins among proteins of varied folds with precision of up to 80%. The classifier is a web tool available at our laboratory website. Users can search for similar chains from a specific PDB, view and compare their contact maps and browse their structures using a JMol plug-in.

  11. Protein Structure Determination by Assembling Super-Secondary Structure Motifs Using Pseudocontact Shifts.

    PubMed

    Pilla, Kala Bharath; Otting, Gottfried; Huber, Thomas

    2017-03-07

    Computational and nuclear magnetic resonance hybrid approaches provide efficient tools for 3D structure determination of small proteins, but currently available algorithms struggle to perform with larger proteins. Here we demonstrate a new computational algorithm that assembles the 3D structure of a protein from its constituent super-secondary structural motifs (Smotifs) with the help of pseudocontact shift (PCS) restraints for backbone amide protons, where the PCSs are produced from different metal centers. The algorithm, DINGO-PCS (3D assembly of Individual Smotifs to Near-native Geometry as Orchestrated by PCSs), employs the PCSs to recognize, orient, and assemble the constituent Smotifs of the target protein without any other experimental data or computational force fields. Using a universal Smotif database, the DINGO-PCS algorithm exhaustively enumerates any given Smotif. We benchmarked the program against ten different protein targets ranging from 100 to 220 residues with different topologies. For nine of these targets, the method was able to identify near-native Smotifs. Copyright © 2017 Elsevier Ltd. All rights reserved.

  12. Protein 3D Structure Computed from Evolutionary Sequence Variation

    PubMed Central

    Sheridan, Robert; Hopf, Thomas A.; Pagnani, Andrea; Zecchina, Riccardo; Sander, Chris

    2011-01-01

    The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7–4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org). This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein

  13. Structural studies of human glioma pathogenesis-related protein 1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Asojo, Oluwatoyin A., E-mail: oasojo@unmc.edu; Koski, Raymond A.; Bonafé, Nathalie

    2011-10-01

    Structural analysis of a truncated soluble domain of human glioma pathogenesis-related protein 1, a membrane protein implicated in the proliferation of aggressive brain cancer, is presented. Human glioma pathogenesis-related protein 1 (GLIPR1) is a membrane protein that is highly upregulated in brain cancers but is barely detectable in normal brain tissue. GLIPR1 is composed of a signal peptide that directs its secretion, a conserved cysteine-rich CAP (cysteine-rich secretory proteins, antigen 5 and pathogenesis-related 1 proteins) domain and a transmembrane domain. GLIPR1 is currently being investigated as a candidate for prostate cancer gene therapy and for glioblastoma targeted therapy. Crystal structures of a truncated soluble domain of the human GLIPR1 protein (sGLIPR1) solved by molecular replacement using a truncated polyalanine search model of the CAP domain of stecrisp, a snake-venom cysteine-rich secretory protein (CRISP), are presented. The correct molecular-replacement solution could only be obtained by removing all loops from the search model. The native structure was refined to 1.85 Å resolution and that of a Zn²⁺ complex was refined to 2.2 Å resolution. The latter structure revealed that the putative binding cavity coordinates Zn²⁺ similarly to snake-venom CRISPs, which are involved in Zn²⁺-dependent mechanisms of inflammatory modulation. Both sGLIPR1 structures have extensive flexible loop/turn regions and unique charge distributions that were not observed in any of the previously reported CAP protein structures. A model is also proposed for the structure of full-length membrane-bound GLIPR1.

  14. Structural Assembly of Multidomain Proteins and Protein Complexes Guided by the Overall Rotational Diffusion Tensor

    PubMed Central

    Ryabov, Yaroslav; Fushman, David

    2008-01-01

    We present a simple and robust approach that uses the overall rotational diffusion tensor as a structural constraint for domain positioning in multidomain proteins and protein-protein complexes. This method offers the possibility of using NMR relaxation data for detailed structure characterization of such systems, provided the structures of individual domains are available. The proposed approach extends the concept of using long-range information contained in the overall rotational diffusion tensor. In contrast to the existing approaches, we use both the principal axes and principal values of the protein's rotational diffusion tensor to determine not only the orientation but also the relative positioning of the individual domains in a protein. This is achieved by finding the domain arrangement in a molecule that provides the best possible agreement with all components of the overall rotational diffusion tensor derived from experimental data. The accuracy of the proposed approach is demonstrated for two protein systems with known domain arrangement and parameters of the overall tumbling: the HIV-1 protease homodimer and Maltose Binding Protein. The accuracy of the method and its sensitivity to domain positioning are also tested using computer-generated data for three protein complexes, for which the experimental diffusion tensors are not available. In addition, the proposed method is applied here to determine, for the first time, the structure of both open and closed conformations of the Lys48-linked di-ubiquitin chain, where domain motions render accurate structure determination by other methods impossible. The proposed method opens new avenues for improving structure characterization of proteins in solution. PMID:17550252

  15. Synchrotron IR microspectroscopy for protein structure analysis: Potential and questions

    DOE PAGES

    Yu, Peiqiang

    2006-01-01

    Synchrotron radiation-based Fourier transform infrared microspectroscopy (S-FTIR) has been developed as a rapid, direct, non-destructive, bioanalytical technique. This technique takes advantage of synchrotron light brightness and small effective source size and is capable of exploring the molecular chemical make-up within microstructures of a biological tissue without destruction of inherent structures at ultra-spatial resolutions within cellular dimension. To date there has been very little application of this advanced technique to the study of pure protein inherent structure at a cellular level in biological tissues. In this review, a novel approach is introduced to show the potential of the newly developed, advanced synchrotron-based analytical technology, which can be used to localize relatively “pure” protein in plant tissues and to reveal, in relative terms, protein inherent structure and protein molecular chemical make-up within intact tissue at cellular and subcellular levels. Several complex protein IR spectral data analysis techniques (Gaussian and Lorentzian multi-component peak modeling, univariate and multivariate analysis, principal component analysis (PCA), and hierarchical cluster analysis (CLA)) are employed to reveal, in relative terms, features of protein inherent structure and to distinguish protein inherent structure differences between varieties/species and treatments in plant tissues. By using a multi-peak modeling procedure, RELATIVE estimates (but not EXACT determinations) of protein secondary structure can be made for comparison purposes. The arguments for and against the multi-peak modeling/fitting procedure for relative estimation of protein structure are discussed. By using the PCA and CLA analyses, plant molecular structures can be qualitatively separated, one group from another, statistically, even though the spectral assignments are not known. The synchrotron-based technology provides a new approach for protein structure research in

  16. Genome Pool Strategy for Structural Coverage of Protein Families

    PubMed Central

    Jaroszewski, Lukasz; Slabinski, Lukasz; Wooley, John; Deacon, Ashley M.; Lesley, Scott A.; Wilson, Ian A.; Godzik, Adam

    2010-01-01

    As noticed by generations of structural biologists, closely homologous proteins may have substantially different crystallization properties and propensities. These observations can be used to systematically introduce additional dimensionality into crystallization trials by targeting homologous proteins from multiple genomes in a “genome pool” strategy. Through extensive use of our recently introduced “crystallization feasibility score” (Slabinski et al., 2007a), we can explain that the genome pool strategy works well because the crystallization feasibility scores are surprisingly broad within families of homologous proteins, with most families containing a range of optimal to very difficult targets. We also show that some families can be regarded as relatively “easy”, where a significant number of proteins are predicted to have optimal crystallization features, and others are “very difficult”, where almost none are predicted to result in a crystal structure. Thus, such variable distributions of crystallizability preferences lead to uneven structural coverage of known families, with “easier” or “optimal” families having several times more solved structures than “very difficult” ones. Nevertheless, this latter category can be successfully targeted by increasing the number of genomes that are used to select targets from a given family. On average, adding 10 new genomes to the “genome pool” provides more promising targets for 7 “very difficult” families. In contrast, our crystallization feasibility score does not indicate that any specific microbial genomes can be readily classified as “easier” or “very difficult” with respect to providing suitable candidates for crystallization and structure determination. Finally, our analyses show that specific physicochemical properties of the protein sequence favor successful outcomes for structure determination and, hence, the group of proteins with known 3D

  17. What determines the spectrum of protein native state structures?

    PubMed

    Lezon, Timothy R; Banavar, Jayanth R; Lesk, Arthur M; Maritan, Amos

    2006-05-01

    We present a brief summary of the key factors underlying protein structure, as developed in the investigations of Pauling, Ramachandran, and Rose. We then outline a simplified physical model of proteins that focuses on geometry and symmetry. Although this model superficially appears unrelated to the detailed chemical descriptions commonly applied to proteins, we show that it captures the essential elements of the chemistry and provides a unified framework for understanding the common characteristics of folded proteins. We suggest that the spectrum of protein native state structures is determined by geometry and symmetry and the role of the sequence is to choose its native state structure from this predetermined menu. © 2006 Wiley-Liss, Inc.

  18. A method of searching for related literature on protein structure analysis by considering a user's intention

    PubMed Central

    2015-01-01

    Background In recent years, with advances in techniques for protein structure analysis, the knowledge about protein structure and function has been published in a vast number of articles. A method to search for specific publications from such a large pool of articles is needed. In this paper, we propose a method to search for related articles on protein structure analysis by using an article itself as a query. Results Each article is represented as a set of concepts in the proposed method. Then, by using similarities among concepts formulated from databases such as Gene Ontology, similarities between articles are evaluated. In this framework, the desired search results vary depending on the user's search intention because a variety of information is included in a single article. Therefore, the proposed method provides not only one input article (primary article) but also additional articles related to it as an input query to determine the search intention of the user, based on the relationship between two query articles. In other words, based on the concepts contained in the input article and additional articles, we actualize a relevant literature search that considers user intention by varying the degree of attention given to each concept and modifying the concept hierarchy graph. Conclusions We performed an experiment to retrieve relevant papers from articles on protein structure analysis registered in the Protein Data Bank by using three query datasets. The experimental results yielded search results with better accuracy than when user intention was not considered, confirming the effectiveness of the proposed method. PMID:25952498

  19. Distance matrix-based approach to protein structure prediction.

    PubMed

    Kloczkowski, Andrzej; Jernigan, Robert L; Wu, Zhijun; Song, Guang; Yang, Lei; Kolinski, Andrzej; Pokarowski, Piotr

    2009-03-01

    Much structural information is encoded in the internal distances; a distance matrix-based approach can be used to predict protein structure and dynamics, and for structural refinement. Our approach is based on the square distance matrix $D = [r_{ij}^2]$ containing all square distances between residues in proteins. This distance matrix contains more information than the contact matrix $C$, which has elements of either 0 or 1 depending on whether the distance $r_{ij}$ is greater or less than a cutoff value $r_{\mathrm{cutoff}}$. We have performed spectral decomposition of the distance matrices, $D = \sum_k \lambda_k v_k v_k^T$, in terms of eigenvalues $\lambda_k$ and the corresponding eigenvectors $v_k$, and found that it contains at most five nonzero terms. A dominant eigenvector is proportional to $r^2$, the square distance of points from the center of mass, with the next three being the principal components of the system of points. By predicting $r^2$ from the sequence we can approximate a distance matrix of a protein with an expected RMSD value of about 7.3 Å, and by combining it with the prediction of the first principal component we can improve this approximation to 4.0 Å. We can also explain the role of hydrophobic interactions for the protein structure, because r is highly correlated with the hydrophobic profile of the sequence. Moreover, r is highly correlated with several sequence profiles which are useful in protein structure prediction, such as contact number, the residue-wise contact order (RWCO) or mean square fluctuations (i.e. crystallographic temperature factors). We have also shown that the next three components are related to spatial directionality of the secondary structure elements, and they may be also predicted from the sequence, improving overall structure prediction. We have also shown that the large number of available HIV-1 protease structures provides a remarkable sampling of conformations, which can be viewed as direct structural information about the
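
    The statement that the spectral decomposition of a square distance matrix has at most five nonzero terms can be checked numerically. For points in three dimensions, each squared distance expands as |x_i|² + |x_j|² − 2 x_i·x_j, a sum of matrices of rank 1, 1 and at most 3, so the rank of D (the number of nonzero eigenvalues of this symmetric matrix) cannot exceed 5. The sketch below is an illustration under stated assumptions: the coordinates are random integers rather than real residue positions, and the helper names are hypothetical. Exact rational arithmetic is used so that the rank is not blurred by rounding.

```python
from fractions import Fraction
import random

def squared_distance_matrix(points):
    """Return D with D[i][j] = squared Euclidean distance between points i and j."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]

def exact_rank(matrix):
    """Rank by Gaussian elimination over exact fractions (no rounding error)."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    rank = 0
    for col in range(len(rows[0])):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            if factor != 0:
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank

# Hypothetical "residue" positions: 12 random integer points in 3D.
random.seed(0)
points = [tuple(random.randint(-20, 20) for _ in range(3)) for _ in range(12)]

D = squared_distance_matrix(points)
rank_D = exact_rank(D)
print("matrix size:", len(D), "rank:", rank_D)
```

    The bound holds for any number of points, which is why a handful of predicted profiles (such as $r^2$ and the principal components mentioned above) can in principle determine the whole matrix.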

  20. Structural and evolutionary analysis of Leishmania Alba proteins.

    PubMed

    da Costa, Kauê Santana; Galúcio, João Marcos Pereira; Leonardo, Elvis Santos; Cardoso, Guelber; Leal, Élcio; Conde, Guilherme; Lameira, Jerônimo

    2017-10-01

    The Alba superfamily proteins share a common RNA-binding domain. These proteins participate in a variety of regulatory pathways by controlling developmental gene expression. They also interact with ribosomal subunits, translation factors, and other RNA-binding proteins. The Leishmania infantum genome encodes two Alba-domain proteins, LiAlba1 and LiAlba3. In this work, we used homology modeling, protein-protein docking, and molecular dynamics (MD) simulations to explore the details of the Alba1-Alba3-RNA complex from Leishmania infantum at the molecular level. In addition, we compared the structure of LiAlba3 with the human ribonuclease P component, Rpp20. We also mapped the ligand-binding residues on the Alba3 surface to analyze its druggability and performed mutational analyses in Alba3 using alanine scanning to identify residues involved in its function and structural stability. These results suggest that the RGG-box motif of LiAlba1 is important for protein function and stability. Finally, we discuss the function of Alba proteins in the context of pathogen adaptation to host cells. The data provided herein will facilitate further translational research regarding Alba structure and function. Copyright © 2017 Elsevier B.V. All rights reserved.

  1. Knowledge-based prediction of protein backbone conformation using a structural alphabet.

    PubMed

    Vetrivel, Iyanar; Mahajan, Swapnil; Tyagi, Manoj; Hoffmann, Lionel; Sanejouand, Yves-Henri; Srinivasan, Narayanaswamy; de Brevern, Alexandre G; Cadet, Frédéric; Offmann, Bernard

    2017-01-01

    Libraries of structural prototypes that abstract protein local structures are known as structural alphabets and have proven to be very useful in various aspects of protein structure analyses and predictions. One such library, Protein Blocks, is composed of 16 standard structural prototypes, each five residues long. Analyzing a protein in this way involves describing its structure as a string of Protein Blocks. Predicting the local structure of a protein in terms of Protein Blocks is the general objective of this work. A new approach, PB-kPRED, is proposed towards this aim. It involves (i) organizing the structural knowledge in the form of a database of pentapeptide fragments extracted from all protein structures in the PDB and (ii) applying a knowledge-based algorithm that does not rely on any secondary structure predictions and/or sequence alignment profiles, to scan this database and predict the most probable backbone conformations for the protein local structures. PB-kPRED does, however, preferentially use structural information from homologues when it is available. The predictions were evaluated rigorously on 15,544 query proteins representing a non-redundant subset of the PDB filtered at a 30% sequence identity cut-off. We have shown that the kPRED method was able to achieve mean accuracies ranging from 40.8% to 66.3% depending on the availability of homologues. The impact of the different strategies for scanning the database on the prediction was evaluated and is discussed. Our results highlight the usefulness of the method in the context of proteins without any known structural homologues. A scoring function that gives a good estimate of the accuracy of prediction was further developed. This score estimates very well the accuracy of the algorithm (R² of 0.82). An online version of the tool is provided freely for non-commercial usage at http://www.bo-protscience.fr/kpred/.

  2. Protein thermal denaturation is modulated by central residues in the protein structure network.

    PubMed

    Souza, Valquiria P; Ikegami, Cecília M; Arantes, Guilherme M; Marana, Sandro R

    2016-03-01

    Network structural analysis, known as residue interaction networks or graphs (RIN or RIG, respectively) or protein structural networks or graphs (PSN or PSG, respectively), comprises a useful tool for detecting important residues for protein function, stability, folding and allostery. In RIN, the tertiary structure is represented by a network in which residues (nodes) are connected by interactions (edges). Such structural networks have consistently presented a few central residues that are important for shortening the pathways linking any two residues in a protein structure. To experimentally demonstrate that central residues effectively participate in protein properties, mutations were directed to seven central residues of the β-glucosidase Sfβgly (β-D-glucoside glucohydrolase; EC 3.2.1.21). These mutations reduced the thermal stability of the enzyme, as evaluated by changes in transition temperature (Tm) and the denaturation rate at 45 °C. Moreover, mutations directed to the vicinity of a central residue also caused significant decreases in the Tm of Sfβgly and clearly increased the unfolding rate constant at 45 °C. However, mutations at noncentral residues or at surrounding residues did not affect the thermal stability of Sfβgly. Therefore, the data reported in the present study suggest that the perturbation of the central residues reduced the stability of the native structure of Sfβgly. These results are in agreement with previous findings showing that networks are robust, whereas attacks on central nodes cause network failure. Finally, the present study demonstrates that central residues underlie the functional properties of proteins. © 2016 Federation of European Biochemical Societies.
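
    The notion of a "central residue" that shortens the paths between other residues can be made concrete with betweenness centrality. The sketch below is a generic illustration, not the procedure used in the study: the contact list is invented (residue numbers 10, 11, 12, 50, 90, 91, 92 are hypothetical), and betweenness is only one of several centrality measures that could be applied. It uses Brandes' algorithm for unweighted, undirected graphs.

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm: betweenness centrality for an unweighted, undirected graph."""
    score = {v: 0.0 for v in adj}
    for source in adj:
        order = []                          # nodes in order of non-decreasing distance
        preds = {v: [] for v in adj}        # predecessors on shortest paths
        n_paths = {v: 0 for v in adj}       # number of shortest paths from source
        dist = {v: -1 for v in adj}
        n_paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                        # breadth-first search
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in adj}
        for w in reversed(order):           # accumulate from the farthest nodes back
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each unordered pair was counted from both endpoints.
    return {v: s / 2 for v, s in score.items()}

# Hypothetical residue contacts: two tight clusters bridged by residue 50.
contacts = [(10, 11), (11, 12), (10, 12), (12, 50),
            (50, 90), (90, 91), (91, 92), (90, 92)]
adjacency = {}
for a, b in contacts:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

centrality = betweenness(adjacency)
most_central = max(centrality, key=centrality.get)
print(most_central, centrality)
```

    In this toy network every shortest path between the two clusters runs through residue 50, so it receives the highest score; in a real RIN the edges would come from a contact criterion applied to the experimental structure.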

  3. Cross-Linking/Mass Spectrometry for Studying Protein Structures and Protein-Protein Interactions: Where Are We Now and Where Should We Go from Here?

    PubMed

    Sinz, Andrea

    2018-05-28

    Structural mass spectrometry (MS) is gaining increasing importance for deriving valuable three-dimensional structural information on proteins and protein complexes, and it complements existing techniques, such as NMR spectroscopy and X-ray crystallography. Structural MS unites different MS-based techniques, such as hydrogen/deuterium exchange, native MS, ion-mobility MS, protein footprinting, and chemical cross-linking/MS, and it allows fundamental questions in structural biology to be addressed. In this Minireview, I will focus on the cross-linking/MS strategy. This method not only delivers tertiary structural information on proteins, but is also increasingly being used to decipher protein interaction networks, both in vitro and in vivo. Cross-linking/MS is currently one of the most promising MS-based approaches to derive structural information on very large and transient protein assemblies and intrinsically disordered proteins. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  4. Tracing Primordial Protein Evolution through Structurally Guided Stepwise Segment Elongation

    PubMed Central

    Watanabe, Hideki; Yamasaki, Kazuhiko; Honda, Shinya

    2014-01-01

    The understanding of how primordial proteins emerged has been a fundamental and longstanding issue in biology and biochemistry. For a better understanding of primordial protein evolution, we synthesized an artificial protein on the basis of an evolutionary hypothesis, segment-based elongation starting from an autonomously foldable short peptide. A 10-residue protein, chignolin, the smallest foldable polypeptide ever reported, was used as a structural support to facilitate higher structural organization and gain-of-function in the development of an artificial protein. Repetitive cycles of segment elongation and subsequent phage display selection successfully produced a 25-residue protein, termed AF.2A1, with nanomolar affinity against the Fc region of immunoglobulin G. AF.2A1 shows exquisite molecular recognition ability such that it can distinguish conformational differences of the same molecule. The structure determined by NMR measurements demonstrated that AF.2A1 forms a globular protein-like conformation with the chignolin-derived β-hairpin and a tryptophan-mediated hydrophobic core. Using sequence analysis and a mutation study, we discovered that the structural organization and gain-of-function emerged from the vicinity of the chignolin segment, revealing that the structural support served as the core in both structural and functional development. Here, we propose an evolutionary model for primordial proteins in which a foldable segment serves as the evolving core to facilitate structural and functional evolution. This study provides insights into primordial protein evolution and also presents a novel methodology for designing small sized proteins useful for industrial and pharmaceutical applications. PMID:24356963

  5. Four signature motifs define the first class of structurally related large coiled-coil proteins in plants.

    PubMed Central

    Gindullis, Frank; Rose, Annkatrin; Patel, Shalaka; Meier, Iris

    2002-01-01

    Background Animal and yeast proteins containing long coiled-coil domains are involved in attaching other proteins to the large, solid-state components of the cell. One subgroup of long coiled-coil proteins are the nuclear lamins, which are involved in attaching chromatin to the nuclear envelope and have recently been implicated in inherited human diseases. In contrast to other eukaryotes, long coiled-coil proteins have been barely investigated in plants. Results We have searched the completed Arabidopsis genome and have identified a family of structurally related long coiled-coil proteins. Filament-like plant proteins (FPP) were identified by sequence similarity to a tomato cDNA that encodes a coiled-coil protein which interacts with the nuclear envelope-associated protein, MAF1. The FPP family is defined by four novel unique sequence motifs and by two clusters of long coiled-coil domains separated by a non-coiled-coil linker. All family members are expressed in a variety of Arabidopsis tissues. A homolog sharing the structural features was identified in the monocot rice, indicating conservation among angiosperms. Conclusion Except for myosins, this is the first characterization of a family of long coiled-coil proteins in plants. The tomato homolog of the FPP family binds in a yeast two-hybrid assay to a nuclear envelope-associated protein. This might suggest that FPP family members function in nuclear envelope biology. Because the full Arabidopsis genome does not appear to contain genes for lamins, it is of interest to investigate other long coiled-coil proteins, which might functionally replace lamins in the plant kingdom. PMID:11972898

  6. Probing Protein Structure in Vivo with FRET

    PubMed Central

    Davis, Trisha; Muller, Eric

    2012-01-01

    Fluorescence resonance energy transfer (FRET) is widely used to construct probes for cellular activities and to complement two-hybrid results that predict protein-protein interactions. The Yeast Resource Center promotes an underutilized potential of FRET as an in vivo tool to position proteins within low resolution structures derived from electron microscopy. The success of this approach using widefield microscopy depends upon the choice of filter sets, standardized image acquisition, a robust metric and controls matched to the structure under investigation. A comparison of various CFP and YFP filter combinations from Chroma and Semrock demonstrated the strength of the Chroma filters when coupled with our FRET metric, termed FretR. Coupling CFP and YFP to a selection of proteins of known structure allowed us to create a standard curve of FretR versus distance. How well other FRET metrics conform was also evaluated. Finally FretR was linked to an approximation of the efficiency of energy transfer. Together this feature set has allowed us to contribute to our understanding of the organization of the yeast spindle pole body, cohesin complex and gamma-tubulin complex.
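
    The link between energy-transfer efficiency and distance that underlies such a standard curve is usually described by the Förster relation, E = 1 / (1 + (r / R0)^6). The sketch below only evaluates that textbook relation; it does not reproduce the FretR metric, whose definition is not given in this abstract. The Förster radius of 4.9 nm is an assumed typical value for a CFP/YFP pair, and the distances are arbitrary.

```python
def fret_efficiency(distance_nm, forster_radius_nm=4.9):
    """Förster relation: E = 1 / (1 + (r / R0)^6).

    forster_radius_nm is an assumed typical value for a CFP/YFP pair.
    """
    if distance_nm < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)

# Arbitrary donor-acceptor separations (nm) for an illustrative curve.
distances_nm = [2.0, 4.9, 8.0, 10.0]
curve = [(r, fret_efficiency(r)) for r in distances_nm]

for r, e in curve:
    print(f"r = {r:4.1f} nm  ->  E = {e:.3f}")
```

    Because of the sixth-power dependence, efficiency falls from near 1 to near 0 over a few nanometres around R0, which is what makes FRET useful as a distance indicator on the scale of protein complexes.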

  7. Crystal structure of mitochondrial respiratory membrane protein complex II.

    PubMed

    Sun, Fei; Huo, Xia; Zhai, Yujia; Wang, Aojin; Xu, Jianxing; Su, Dan; Bartlam, Mark; Rao, Zihe

    2005-07-01

    The mitochondrial respiratory Complex II or succinate:ubiquinone oxidoreductase (SQR) is an integral membrane protein complex in both the tricarboxylic acid cycle and aerobic respiration. Here we report the first crystal structure of Complex II from porcine heart at 2.4 Å resolution and its complex structure with inhibitors 3-nitropropionate and 2-thenoyltrifluoroacetone (TTFA) at 3.5 Å resolution. Complex II is comprised of two hydrophilic proteins, flavoprotein (Fp) and iron-sulfur protein (Ip), and two transmembrane proteins (CybL and CybS), as well as prosthetic groups required for electron transfer from succinate to ubiquinone. The structure correlates the protein environments around prosthetic groups with their unique midpoint redox potentials. Two ubiquinone binding sites are discussed and elucidated by TTFA binding. The Complex II structure provides a bona fide model for study of the mitochondrial respiratory system and human mitochondrial diseases related to mutations in this complex.

  8. Discrete Haar transform and protein structure.

    PubMed

    Morosetti, S

    1997-12-01

    The discrete Haar transform of the sequence of the backbone dihedral angles (phi and psi) was performed over a set of high-resolution X-ray protein structures from the Brookhaven Protein Data Bank. The dihedral angles were then recalculated by the inverse transform, using a growing number of Haar functions, from the lower to the higher degree. New structures were built from these dihedral angles, with standard values for bond lengths and angles, and with omega = 0 degrees. The reconstructed structures were compared with the experimental ones, and analyzed by visual inspection and statistical analysis. When half of the Haar coefficients were used, none of the reconstructed structures had yet collapsed into a tertiary fold, but most of the secondary motifs were already formed. These results indicate a substantial separation of structural information in the space of the Haar transform, with the secondary structural information mainly present in the Haar coefficients of lower degrees, and the tertiary information present in the higher degree coefficients. Because of this separation, the representation of the folded structures in the space of the Haar transform seems a promising candidate to encompass the problem of premature convergence in genetic algorithms.
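    The truncate-and-invert procedure described above can be sketched as follows. This is a minimal illustration, assuming an orthonormal Haar transform on a sequence whose length is a power of two and treating angles as plain numbers (no periodic wrapping); the function names and the toy angle values are hypothetical and are not taken from the paper.

```python
import math


def haar_forward(signal):
    """Orthonormal discrete Haar transform; output is ordered coarse to fine."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs = [float(x) for x in signal]
    s = math.sqrt(2.0)
    length = n
    while length > 1:
        half = length // 2
        avg = [(coeffs[2 * i] + coeffs[2 * i + 1]) / s for i in range(half)]
        diff = [(coeffs[2 * i] - coeffs[2 * i + 1]) / s for i in range(half)]
        coeffs[:length] = avg + diff  # averages first, details after
        length = half
    return coeffs


def haar_inverse(coeffs):
    """Inverse of haar_forward."""
    out = [float(c) for c in coeffs]
    s = math.sqrt(2.0)
    length = 1
    while length < len(out):
        merged = []
        for a, d in zip(out[:length], out[length:2 * length]):
            merged.append((a + d) / s)
            merged.append((a - d) / s)
        out[:2 * length] = merged
        length *= 2
    return out


def reconstruct(signal, keep):
    """Keep only the `keep` lowest-degree coefficients, then invert."""
    coeffs = haar_forward(signal)
    truncated = coeffs[:keep] + [0.0] * (len(coeffs) - keep)
    return haar_inverse(truncated)


# Toy phi-angle sequence (degrees), invented for illustration.
phi = [-60, -60, -120, -120, -60, -60, -120, -120]
print(reconstruct(phi, keep=4))  # half of the coefficients
```

    Keeping only the first coefficients reproduces the slowly varying part of the angle sequence, which mirrors the abstract's observation that low-degree coefficients carry the secondary-structure signal.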

  9. MODBASE, a database of annotated comparative protein structure models

    PubMed Central

    Pieper, Ursula; Eswar, Narayanan; Stuart, Ashley C.; Ilyin, Valentin A.; Sali, Andrej

    2002-01-01

    MODBASE (http://guitar.rockefeller.edu/modbase) is a relational database of annotated comparative protein structure models for all available protein sequences matched to at least one known protein structure. The models are calculated by MODPIPE, an automated modeling pipeline that relies on PSI-BLAST, IMPALA and MODELLER. MODBASE uses the MySQL relational database management system for flexible and efficient querying, and the MODVIEW Netscape plugin for viewing and manipulating multiple sequences and structures. It is updated regularly to reflect the growth of the protein sequence and structure databases, as well as improvements in the software for calculating the models. For ease of access, MODBASE is organized into different datasets. The largest dataset contains models for domains in 304 517 out of 539 171 unique protein sequences in the complete TrEMBL database (23 March 2001); only models based on significant alignments (PSI-BLAST E-value < 10⁻⁴) and models assessed to have the correct fold are included. Other datasets include models for target selection and structure-based annotation by the New York Structural Genomics Research Consortium, models for prediction of genes in the Drosophila melanogaster genome, models for structure determination of several ribosomal particles and models calculated by the MODWEB comparative modeling web server. PMID:11752309

  10. Tuning structure of oppositely charged nanoparticle and protein complexes

    NASA Astrophysics Data System (ADS)

    Kumar, Sugam; Aswal, V. K.; Callow, P.

    2014-04-01

    Small-angle neutron scattering (SANS) has been used to probe the structures of anionic silica nanoparticles (LS30) and cationic lysozyme protein (M.W. 14.7 kD, I.P. ~ 11.4) by tuning their interaction through pH variation. The protein adsorption on nanoparticles is found to increase with pH and is determined by the electrostatic attraction between the two components as well as the repulsion between protein molecules. We show that the strong electrostatic attraction between nanoparticles and protein molecules leads to protein-mediated aggregation of nanoparticles, which is characterized by fractal structures. At pH 5, the protein adsorption gives rise to nanoparticle aggregation having surface fractal morphology with close packing of nanoparticles. The surface fractals transform to open structures of mass fractal morphology at higher pH (7 and 9) on approaching the isoelectric point (I.P.).

  11. MEGADOCK: An All-to-All Protein-Protein Interaction Prediction System Using Tertiary Structure Data

    PubMed Central

    Ohue, Masahito; Matsuzaki, Yuri; Uchikoga, Nobuyuki; Ishida, Takashi; Akiyama, Yutaka

    2014-01-01

    The elucidation of protein-protein interaction (PPI) networks is important for understanding cellular structure and function and structure-based drug design. However, the development of an effective method to conduct exhaustive PPI screening represents a computational challenge. We have been investigating a protein docking approach based on shape complementarity and physicochemical properties. We describe here the development of the protein-protein docking software package “MEGADOCK” that samples an extremely large number of protein dockings at high speed. MEGADOCK reduces the calculation time required for docking by using several techniques such as a novel scoring function called the real Pairwise Shape Complementarity (rPSC) score. We showed that MEGADOCK is capable of exhaustive PPI screening by completing docking calculations 7.5 times faster than the conventional docking software, ZDOCK, while maintaining an acceptable level of accuracy. When MEGADOCK was applied to a subset of a general benchmark dataset to predict 120 relevant interacting pairs from 120 x 120 = 14,400 combinations of proteins, an F-measure value of 0.231 was obtained. Further, we showed that MEGADOCK can be applied to a large-scale protein-protein interaction-screening problem with accuracy better than random. When our approach is combined with parallel high-performance computing systems, it is now feasible to search and analyze protein-protein interactions while taking into account three-dimensional structures at the interactome scale. MEGADOCK is freely available at http://www.bi.cs.titech.ac.jp/megadock. PMID:23855673
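    For readers unfamiliar with the F-measure quoted above, the sketch below shows how it is computed as the harmonic mean of precision and recall. The counts in the example are invented for illustration and are not MEGADOCK results; the function name is likewise an assumption.

```python
def f_measure(true_pos, false_pos, false_neg):
    """F-measure (F1): harmonic mean of precision and recall."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical screen: 120 true pairs among 120 x 120 = 14,400 candidates.
# Suppose a method predicts 100 pairs, of which 40 are correct.
tp, fp = 40, 60
fn = 120 - tp
print(round(f_measure(tp, fp, fn), 4))
```

    Because only 120 of the 14,400 candidate pairs are true interactions, accuracy would be dominated by true negatives; the F-measure ignores them, which makes it a more informative score for this kind of all-to-all screen.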

  12. Linking structural biology with genome research: Beamlines for the Berlin "Protein Structure Factory" initiative

    NASA Astrophysics Data System (ADS)

    Illing, Gerd; Saenger, Wolfram; Heinemann, Udo

    2000-06-01

    The Protein Structure Factory will be established to characterize proteins encoded by human genes or cDNAs, which will be selected by criteria of potential structural novelty or medical or biotechnological usefulness. It represents an integrative approach to structure analysis combining bioinformatics techniques, automated gene expression and purification of gene products, generation of a biophysical fingerprint of the proteins and the determination of their three-dimensional structures either by NMR spectroscopy or by X-ray diffraction. The use of synchrotron radiation will be crucial to the Protein Structure Factory: high brilliance and tunable wavelengths are prerequisites for fast data collection, the use of small crystals and multiwavelength anomalous diffraction (MAD) phasing. With the opening of BESSY II, direct access to a third-generation XUV storage ring source with excellent conditions is available nearby. An insertion device with two MAD beamlines and one constant energy station will be set up by 2001.

  13. The role of porcine reproductive and respiratory syndrome (PRRS) virus structural and non-structural proteins in virus pathogenesis.

    PubMed

    Music, Nedzad; Gagnon, Carl A

    2010-12-01

    Porcine reproductive and respiratory syndrome (PRRS) is an economically devastating viral disease affecting the swine industry worldwide. The etiological agent, PRRS virus (PRRSV), possesses an RNA viral genome with nine open reading frames (ORFs). The ORF1a and ORF1b replicase-associated genes encode the polyproteins pp1a and pp1ab, respectively. The pp1a is processed into nine non-structural proteins (nsps): nsp1α, nsp1β, and nsp2 to nsp8. Proteolytic cleavage of pp1ab generates products nsp9 to nsp12. The proteolytic pp1a cleavage products process and cleave pp1a and pp1ab into nsp products. The nsp9 to nsp12 are involved in virus genome transcription and replication. The 3' end of the viral genome encodes four minor and three major structural proteins. The GP₂ₐ, GP₃ and GP₄ (encoded by ORF2a, 3 and 4) are glycosylated membrane-associated minor structural proteins. The fourth minor structural protein, the E protein (encoded by ORF2b), is an unglycosylated membrane-associated protein. The viral envelope contains two major structural proteins: a glycosylated major envelope protein GP₅ (encoded by ORF5) and an unglycosylated membrane M protein (encoded by ORF6). The third major structural protein is the nucleocapsid N protein (encoded by ORF7). All PRRSV non-structural and structural proteins are essential for virus replication, and PRRSV infectivity is relatively intolerant to subtle changes within the structural proteins. PRRSV virulence is multigenic and resides in both the non-structural and structural viral proteins. This review discusses the molecular characteristics and the biological and immunological functions of the PRRSV structural proteins and nsps, and their involvement in virus pathogenesis.

  14. Improved in-cell structure determination of proteins at near-physiological concentration

    PubMed Central

    Ikeya, Teppei; Hanashima, Tomomi; Hosoya, Saori; Shimazaki, Manato; Ikeda, Shiro; Mishima, Masaki; Güntert, Peter; Ito, Yutaka

    2016-01-01

    Investigating three-dimensional (3D) structures of proteins in living cells by in-cell nuclear magnetic resonance (NMR) spectroscopy opens an avenue towards understanding the structural basis of their functions and physical properties under physiological conditions inside cells. In-cell NMR provides data at atomic resolution non-invasively, and has been used to detect protein-protein interactions, thermodynamics of protein stability, the behavior of intrinsically disordered proteins, etc. in cells. However, so far only a single de novo 3D protein structure could be determined based on data derived only from in-cell NMR. Here we introduce methods that enable in-cell NMR protein structure determination for a larger number of proteins at concentrations that approach physiological ones. The new methods comprise (1) advances in the processing of non-uniformly sampled NMR data, which reduces the measurement time for the intrinsically short-lived in-cell NMR samples, (2) automatic chemical shift assignment for obtaining an optimal resonance assignment, and (3) structure refinement with Bayesian inference, which makes it possible to calculate accurate 3D protein structures from sparse data sets of conformational restraints. As an example application we determined the structure of the B1 domain of protein G at about 250 μM concentration in living E. coli cells. PMID:27910948

  15. Design of structurally distinct proteins using strategies inspired by evolution

    DOE PAGES

    Jacobs, T. M.; Williams, B.; Williams, T.; ...

    2016-05-06

    Natural recombination combines pieces of preexisting proteins to create new tertiary structures and functions. In this paper, we describe a computational protocol, called SEWING, which is inspired by this process and builds new proteins from connected or disconnected pieces of existing structures. Helical proteins designed with SEWING contain structural features absent from other de novo designed proteins and, in some cases, remain folded at more than 100°C. High-resolution structures of the designed proteins CA01 and DA05R1 were solved by x-ray crystallography (2.2 angstrom resolution) and nuclear magnetic resonance, respectively, and there was excellent agreement with the design models. Finally, this method provides a new strategy to rapidly create large numbers of diverse and designable protein scaffolds.

  16. Structural studies of G protein-coupled receptors.

    PubMed

    Lu, Mengjie; Wu, Beili

    2016-11-01

    G protein-coupled receptors (GPCRs) comprise the largest membrane protein family. These receptors sense a variety of signaling molecules, activate multiple intracellular signal pathways, and act as the targets of over 40% of marketed drugs. Recent progress on GPCR structural studies provides invaluable insights into the structure-function relationship of the GPCR superfamily, deepening our understanding of the molecular mechanisms of GPCR signal transduction. Here, we review recent breakthroughs on GPCR structure determination and the structural features of GPCRs, and take the structures of chemokine receptor CCR5 and purinergic receptors P2Y1R and P2Y12R as examples to discuss the importance of GPCR structures on functional studies and drug discovery. In addition, we discuss the prospect of GPCR structure-based drug discovery. © 2016 IUBMB Life, 68(11):894-903, 2016. © 2016 International Union of Biochemistry and Molecular Biology.

  17. Prelude and Fugue, predicting local protein structure, early folding regions and structural weaknesses.

    PubMed

    Kwasigroch, Jean Marc; Rooman, Marianne

    2006-07-15

    Prelude&Fugue are bioinformatics tools aiming at predicting the local 3D structure of a protein from its amino acid sequence in terms of seven backbone torsion angle domains, using database-derived potentials. Prelude(&Fugue) computes all lowest free energy conformations of a protein or protein region, ranked by increasing energy, and possibly satisfying some interresidue distance constraints specified by the user. (Prelude&)Fugue detects sequence regions whose predicted structure is significantly preferred relative to other conformations in the absence of tertiary interactions. These programs can be used for predicting secondary structure, tertiary structure of short peptides, flickering early folding sequences and peptides that adopt a preferred conformation in solution. They can also be used for detecting structural weaknesses, i.e. sequence regions that are not optimal with respect to the tertiary fold. http://babylone.ulb.ac.be/Prelude_and_Fugue.

  18. 3DProIN: Protein-Protein Interaction Networks and Structure Visualization.

    PubMed

    Li, Hui; Liu, Chunmei

    2014-06-14

    3DProIN is a computational tool to visualize protein-protein interaction networks in both two-dimensional (2D) and three-dimensional (3D) views. It models protein-protein interactions in a graph and explores the biologically relevant features of the tertiary structures of each protein in the network. Properties such as color, shape and name of each node (protein) of the network can be edited in either 2D or 3D views. 3DProIN is implemented using 3D Java and C programming languages. The internet crawl technique is also used to parse dynamically grasped protein interactions from the Protein Data Bank (PDB). It is a Java applet component that is embedded in the web page, and it can be used on different platforms including Linux, Mac and Windows using web browsers such as Firefox, Internet Explorer, Chrome and Safari. It was also converted into a Mac app and submitted to the App Store as a free app. Mac users can also download the app from our website. 3DProIN is available for academic research at http://bicompute.appspot.com.

  19. Rapid functional diversification in the structurally conserved ELAV family of neuronal RNA binding proteins

    PubMed Central

    Samson, Marie-Laure

    2008-01-01

    Background The Drosophila gene embryonic lethal abnormal visual system (elav) is the prototype of a gene family present in all metazoans. Its members encode structurally conserved neuronal proteins with three RNA Recognition Motifs (RRM) but they paradoxically act at diverse levels of post-transcriptional regulation. In an attempt to understand the history of this family, we searched for orthologs in eleven completely sequenced genomes, including those of humans, D. melanogaster and C. elegans, for which cDNAs are available. Results We analyzed 23 orthologs/paralogs of elav, and found evidence of gain/loss of gene copy number. For one set of genes, including elav itself, the coding sequences are free of introns and their products most resemble ELAV. The remaining genes show remarkable conservation of their exon organization, and their products most resemble FNE and RBP9, proteins encoded by the two elav paralogs of Drosophila. Remarkably, three of the conserved exon junctions are both close to structural elements, involved respectively in protein-RNA interactions and in the regulation of sub-cellular localization, and in the vicinity of diverse sequence variations. Conclusion The data indicate that the essential elav gene of Drosophila is newly emerged, restricted to dipterans and of retrotransposed origin. We propose that the conserved exon junctions constitute potential sites for sequence/function modifications, and that RRM binding proteins, whose function relies upon plastic RNA-protein interactions, may have played an important role in brain evolution. PMID:18715504

  20. GalaxyRefineComplex: Refinement of protein-protein complex model structures driven by interface repacking.

    PubMed

    Heo, Lim; Lee, Hasup; Seok, Chaok

    2016-08-18

    Protein-protein docking methods have been widely used to gain an atomic-level understanding of protein interactions. However, docking methods that employ low-resolution energy functions are popular because of computational efficiency. Low-resolution docking tends to generate protein complex structures that are not fully optimized. GalaxyRefineComplex takes such low-resolution docking structures and refines them to improve model accuracy in terms of both interface contact and inter-protein orientation. This refinement method allows flexibility at the protein interface and in the overall docking structure to capture conformational changes that occur upon binding. Symmetric refinement is also provided for symmetric homo-complexes. This method was validated by refining models produced by available docking programs, including ZDOCK and M-ZDOCK, and was successfully applied to CAPRI targets in a blind fashion. An example of using the refinement method with an existing docking method for ligand binding mode prediction of a drug target is also presented. A web server that implements the method is freely available at http://galaxy.seoklab.org/refinecomplex.

  1. Structure modification and functionality of whey proteins: quantitative structure-activity relationship approach.

    PubMed

    Nakai, S; Li-Chan, E

    1985-10-01

    According to the original idea of quantitative structure-activity relationship, electric, hydrophobic, and structural parameters should be taken into consideration for elucidating functionality. Changes in these parameters are reflected in the property of protein solubility upon modification of whey proteins by heating. Although solubility is itself a functional property, it has been utilized to explain other functionalities of proteins. However, better correlations were obtained when hydrophobic parameters of the proteins were used in conjunction with solubility. Various treatments reported in the literature were applied to whey protein concentrate in an attempt to obtain whipping and gelling properties similar to those of egg white. Mapping simplex optimization was used to search for the best results. Improvement in whipping properties by pepsin hydrolysis may have been due to higher protein solubility, and good gelling properties resulting from polyphosphate treatment may have been due to an increase in exposable hydrophobicity. However, the results of angel food cake making were still unsatisfactory.

  2. Membrane protein structure determination — The next generation

    PubMed Central

    Moraes, Isabel; Evans, Gwyndaf; Sanchez-Weatherby, Juan; Newstead, Simon; Stewart, Patrick D. Shaw

    2014-01-01

    The field of Membrane Protein Structural Biology has grown significantly since its first landmark in 1985 with the first three-dimensional atomic resolution structure of a membrane protein. Nearly twenty-six years later, the crystal structure of the beta2 adrenergic receptor in complex with G protein has contributed to another landmark in the field leading to the 2012 Nobel Prize in Chemistry. At present, more than 350 unique membrane protein structures solved by X-ray crystallography (http://blanco.biomol.uci.edu/mpstruc/exp/list, Stephen White Lab at UC Irvine) are available in the Protein Data Bank. The advent of genomics and proteomics initiatives combined with high-throughput technologies, such as automation, miniaturization, integration and third-generation synchrotrons, has enhanced membrane protein structure determination rate. X-ray crystallography is still the only method capable of providing detailed information on how ligands, cofactors, and ions interact with proteins, and is therefore a powerful tool in biochemistry and drug discovery. Yet the growth of membrane protein crystals suitable for X-ray diffraction studies amazingly remains a fine art and a major bottleneck in the field. It is often necessary to apply as many innovative approaches as possible. In this review we draw attention to the latest methods and strategies for the production of suitable crystals for membrane protein structure determination. In addition we also highlight the impact that third-generation synchrotron radiation has made in the field, summarizing the latest strategies used at synchrotron beamlines for screening and data collection from such demanding crystals. This article is part of a Special Issue entitled: Structural and biophysical characterisation of membrane protein-ligand binding. PMID:23860256

  3. Exploring Protein Dynamics Space: The Dynasome as the Missing Link between Protein Structure and Function

    PubMed Central

    Hensen, Ulf; Meyer, Tim; Haas, Jürgen; Rex, René; Vriend, Gert; Grubmüller, Helmut

    2012-01-01

    Proteins are usually described and classified according to amino acid sequence, structure or function. Here, we develop a minimally biased scheme to compare and classify proteins according to their internal mobility patterns. This approach is based on the notion that proteins not only fold into recurring structural motifs but might also be carrying out only a limited set of recurring mobility motifs. The complete set of these patterns, which we tentatively call the dynasome, spans a multi-dimensional space with axes, the dynasome descriptors, characterizing different aspects of protein dynamics. The unique dynamic fingerprint of each protein is represented as a vector in the dynasome space. The difference between any two vectors, consequently, gives a reliable measure of the difference between the corresponding protein dynamics. We characterize the properties of the dynasome by comparing the dynamics fingerprints obtained from molecular dynamics simulations of 112 proteins but our approach is, in principle, not restricted to any specific source of data of protein dynamics. We conclude that: 1. the dynasome consists of a continuum of proteins, rather than well separated classes. 2. For the majority of proteins we observe strong correlations between structure and dynamics. 3. Proteins with similar function carry out similar dynamics, which suggests a new method to improve protein function annotation based on protein dynamics. PMID:22606222

  4. Applications of graph theory in protein structure identification

    PubMed Central

    2011-01-01

    There is a growing interest in the identification of proteins on the proteome-wide scale. Among the different kinds of protein structure identification methods, graph-theoretic methods are particularly precise. Owing to their lower costs, higher effectiveness and other advantages, they have drawn increasing attention from researchers. Specifically, graph-theoretic methods have been widely used in homology identification, side-chain cluster identification, peptide sequencing and so on. This paper reviews several methods for solving protein structure identification problems using graph theory. We mainly introduce classical methods and mathematical models including homology modeling based on clique finding, identification of side-chain clusters in protein structures based on the graph spectrum, and de novo peptide sequencing via tandem mass spectrometry using the spectrum graph model. In addition, concluding remarks and future priorities of each method are given. PMID:22165974

  5. Integrated Structural Biology for α-Helical Membrane Protein Structure Determination.

    PubMed

    Xia, Yan; Fischer, Axel W; Teixeira, Pedro; Weiner, Brian; Meiler, Jens

    2018-04-03

    While great progress has been made, only 10% of the nearly 1,000 integral, α-helical, multi-span membrane protein families are represented by at least one experimentally determined structure in the PDB. Previously, we developed the algorithm BCL::MP-Fold, which samples the large conformational space of membrane proteins de novo by assembling predicted secondary structure elements guided by knowledge-based potentials. Here, we present a case study of rhodopsin fold determination by integrating sparse and/or low-resolution restraints from multiple experimental techniques including electron microscopy, electron paramagnetic resonance spectroscopy, and nuclear magnetic resonance spectroscopy. Simultaneous incorporation of orthogonal experimental restraints not only significantly improved the sampling accuracy but also allowed identification of the correct fold, which is demonstrated by a protein size-normalized transmembrane root-mean-square deviation as low as 1.2 Å. The protocol developed in this case study can be used for the determination of unknown membrane protein folds when limited experimental restraints are available. Copyright © 2018 Elsevier Ltd. All rights reserved.

  6. Fast protein tertiary structure retrieval based on global surface shape similarity.

    PubMed

    Sael, Lee; Li, Bin; La, David; Fang, Yi; Ramani, Karthik; Rustamov, Raif; Kihara, Daisuke

    2008-09-01

    Characterization and identification of similar tertiary structure of proteins provides rich information for investigating function and evolution. The importance of structure similarity searches is increasing as structure databases continue to expand, partly due to the structural genomics projects. A crucial drawback of conventional protein structure comparison methods, which compare structures by their main-chain orientation or the spatial arrangement of secondary structure, is that a database search is too slow to be done in real-time. Here we introduce a global surface shape representation by three-dimensional (3D) Zernike descriptors, which represent a protein structure compactly as a series expansion of 3D functions. With this simplified representation, a search against a few thousand structures takes less than a minute. To investigate the agreement between the surface representation defined by the 3D Zernike descriptor and the conventional main-chain based representation, a benchmark was performed against a protein classification generated by the combinatorial extension algorithm. Despite the different representation, the 3D Zernike descriptor retrieved proteins of the same conformation defined by combinatorial extension in 89.6% of the cases within the top five closest structures. The real-time protein structure search by 3D Zernike descriptor will open up new possibilities for large-scale global and local protein surface shape comparison. 2008 Wiley-Liss, Inc.

  7. Bayesian comparison of protein structures using partial Procrustes distance.

    PubMed

    Ejlali, Nasim; Faghihi, Mohammad Reza; Sadeghi, Mehdi

    2017-09-26

    An important topic in bioinformatics is the protein structure alignment. Some statistical methods have been proposed for this problem, but most of them align two protein structures based on the global geometric information without considering the effect of neighbourhood in the structures. In this paper, we provide a Bayesian model to align protein structures, by considering the effect of both local and global geometric information of protein structures. Local geometric information is incorporated to the model through the partial Procrustes distance of small substructures. These substructures are composed of β-carbon atoms from the side chains. Parameters are estimated using a Markov chain Monte Carlo (MCMC) approach. We evaluate the performance of our model through some simulation studies. Furthermore, we apply our model to a real dataset and assess the accuracy and convergence rate. Results show that our model is much more efficient than previous approaches.

  8. Knowledge-based computational intelligence development for predicting protein secondary structures from sequences.

    PubMed

    Shen, Hong-Bin; Yi, Dong-Liang; Yao, Li-Xiu; Yang, Jie; Chou, Kuo-Chen

    2008-10-01

    In the postgenomic age, with the avalanche of protein sequences generated and relatively slow progress in determining their structures by experiments, it is important to develop automated methods to predict the structure of a protein from its sequence. The membrane proteins are a special group in the protein family that accounts for approximately 30% of all proteins; however, solved membrane protein structures only represent less than 1% of known protein structures to date. Although a great success has been achieved for developing computational intelligence techniques to predict secondary structures in both globular and membrane proteins, there is still much challenging work in this regard. In this review article, we firstly summarize the recent progress of automation methodology development in predicting protein secondary structures, especially in membrane proteins; we will then give some future directions in this research field.

  9. Columba: an integrated database of proteins, structures, and annotations.

    PubMed

    Trissl, Silke; Rother, Kristian; Müller, Heiko; Steinke, Thomas; Koch, Ina; Preissner, Robert; Frömmel, Cornelius; Leser, Ulf

    2005-03-31

    Structural and functional research often requires the computation of sets of protein structures based on certain properties of the proteins, such as sequence features, fold classification, or functional annotation. Compiling such sets using current web resources is tedious because the necessary data are spread over many different databases. To facilitate this task, we have created COLUMBA, an integrated database of annotations of protein structures. COLUMBA currently integrates twelve different databases, including PDB, KEGG, Swiss-Prot, CATH, SCOP, the Gene Ontology, and ENZYME. The database can be searched using either keyword search or data source-specific web forms. Users can thus quickly select and download PDB entries that, for instance, participate in a particular pathway, are classified as containing a certain CATH architecture, are annotated as having a certain molecular function in the Gene Ontology, and whose structures have a resolution under a defined threshold. The results of queries are provided in both machine-readable extensible markup language and human-readable format. The structures themselves can be viewed interactively on the web. The COLUMBA database facilitates the creation of protein structure data sets for many structure-based studies. It allows queries to be combined across a number of structure-related databases not covered by other projects at present. Thus, information on both many and few protein structures can be used efficiently. The web interface for COLUMBA is available at http://www.columba-db.de.

  10. Structure and Protein-Protein Interaction Studies on Chlamydia trachomatis Protein CT670 (YscO Homolog)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lorenzini, Emily; Singer, Alexander; Singh, Bhag

    2010-07-28

    Comparative genomic studies have identified many proteins that are found only in various Chlamydiae species and exhibit no significant sequence similarity to any protein in organisms that do not belong to this group. The CT670 protein of Chlamydia trachomatis is one of the proteins whose genes are in one of the type III secretion gene clusters but whose cellular functions are not known. CT670 shares several characteristics with the YscO protein of Yersinia pestis, including the neighboring genes, size, charge, and secondary structure, but the structures and/or functions of these proteins remain to be determined. Although a BLAST search with CT670 did not identify YscO as a related protein, our analysis indicated that these two proteins exhibit significant sequence similarity. In this paper, we report that the CT670 crystal, solved at a resolution of 2 Å, consists of a single coiled coil containing just two long helices. Gel filtration and analytical ultracentrifugation studies showed that in solution CT670 exists in both monomeric and dimeric forms and that the monomer predominates at lower protein concentrations. We examined the interaction of CT670 with many type III secretion system-related proteins (viz., CT091, CT665, CT666, CT667, CT668, CT669, CT671, CT672, and CT673) by performing bacterial two-hybrid assays. In these experiments, CT670 was found to interact only with the CT671 protein (YscP homolog), whose gene is immediately downstream of ct670. A specific interaction between CT670 and CT671 was also observed when affinity chromatography pull-down experiments were performed. These results suggest that CT670 and CT671 are putative homologs of the YscO and YscP proteins, respectively, and that they likely form a chaperone-effector pair.

  11. Superimposition of protein structures with dynamically weighted RMSD.

    PubMed

    Wu, Di; Wu, Zhijun

    2010-02-01

    In protein modeling, one often needs to superimpose a group of structures for a protein. A common way to do this is to translate and rotate the structures so that the square root of the sum of squares of coordinate differences of the atoms in the structures, called the root-mean-square deviation (RMSD) of the structures, is minimized. While it has provided a general way of aligning a group of structures, this approach has not taken into account the fact that different atoms may have different properties and they should be compared differently. For this reason, when superimposed with RMSD, the coordinate differences of different atoms should be evaluated with different weights. The resulting RMSD is called the weighted RMSD (wRMSD). Here we investigate the use of a special wRMSD for superimposing a group of structures with weights assigned to the atoms according to certain thermal motions of the atoms. We call such an RMSD the dynamically weighted RMSD (dRMSD). We show that the thermal motions of the atoms can be obtained from several sources such as the mean-square fluctuations that can be estimated by Gaussian network model analysis. We show that the superimposition of structures with dRMSD can successfully identify protein domains and protein motions, and that it has important implications in practice, e.g., in aligning the ensemble of structures determined by nuclear magnetic resonance.
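
    As a rough illustration of the weighted RMSD idea in the record above, the sketch below evaluates a weighted deviation between two coordinate sets that are assumed to be already superimposed; it does not perform the superposition itself. The inverse-fluctuation weighting in `weights_from_fluctuations` is an assumption made for illustration and is not stated in the abstract, and both function names are hypothetical.

```python
import math


def weights_from_fluctuations(msf):
    """Assumed weighting (not from the abstract): atoms with larger
    mean-square fluctuation receive proportionally smaller weight."""
    return [1.0 / f for f in msf]


def weighted_rmsd(coords_a, coords_b, weights):
    """Weighted RMSD between two equally sized lists of (x, y, z) tuples.

    The coordinates are taken as given, i.e. already superimposed.
    """
    if not (len(coords_a) == len(coords_b) == len(weights)):
        raise ValueError("coordinate lists and weights must have equal length")
    total_w = sum(weights)
    acc = 0.0
    for (ax, ay, az), (bx, by, bz), w in zip(coords_a, coords_b, weights):
        # Squared distance between corresponding atoms, scaled by the weight.
        acc += w * ((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2)
    return math.sqrt(acc / total_w)


if __name__ == "__main__":
    a = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
    b = [(1.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
    # The second atom is assumed to fluctuate four times more than the first.
    w = weights_from_fluctuations([1.0, 4.0])
    print(weighted_rmsd(a, b, [1.0, 1.0]), weighted_rmsd(a, b, w))
```

    With uniform weights this reduces to the ordinary RMSD; down-weighting the mobile atom lowers its contribution to the score.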

  12. Utilizing knowledge base of amino acids structural neighborhoods to predict protein-protein interaction sites.

    PubMed

    Jelínek, Jan; Škoda, Petr; Hoksza, David

    2017-12-06

    Protein-protein interactions (PPI) play a key role in an investigation of various biochemical processes, and their identification is thus of great importance. Although computational prediction of which amino acids take part in a PPI has been an active field of research for some time, the quality of in-silico methods is still far from perfect. We have developed a novel prediction method called INSPiRE which benefits from a knowledge base built from data available in Protein Data Bank. All proteins involved in PPIs were converted into labeled graphs with nodes corresponding to amino acids and edges to pairs of neighboring amino acids. A structural neighborhood of each node was then encoded into a bit string and stored in the knowledge base. When predicting PPIs, INSPiRE labels amino acids of unknown proteins as interface or non-interface based on how often their structural neighborhood appears as interface or non-interface in the knowledge base. We evaluated INSPiRE's behavior with respect to different types and sizes of the structural neighborhood. Furthermore, we examined the suitability of several different features for labeling the nodes. Our evaluations showed that INSPiRE clearly outperforms existing methods with respect to Matthews correlation coefficient. In this paper we introduce a new knowledge-based method for identification of protein-protein interaction sites called INSPiRE. Its knowledge base utilizes structural patterns of known interaction sites in the Protein Data Bank which are then used for PPI prediction. Extensive experiments on several well-established datasets show that INSPiRE significantly surpasses existing PPI approaches.

  13. Rapid and reliable protein structure determination via chemical shift threading.

    PubMed

    Hafsa, Noor E; Berjanskii, Mark V; Arndt, David; Wishart, David S

    2018-01-01

    Protein structure determination using nuclear magnetic resonance (NMR) spectroscopy can be both time-consuming and labor intensive. Here we demonstrate how chemical shift threading can permit rapid, robust, and accurate protein structure determination using only chemical shift data. Threading is a relatively old bioinformatics technique that uses a combination of sequence information and predicted (or experimentally acquired) low-resolution structural data to generate high-resolution 3D protein structures. The key motivations behind using NMR chemical shifts for protein threading lie in the fact that they are easy to measure, they are available prior to 3D structure determination, and they contain vital structural information. The method we have developed uses not only sequence and chemical shift similarity but also chemical shift-derived secondary structure, shift-derived super-secondary structure, and shift-derived accessible surface area to generate a high quality protein structure regardless of the sequence similarity (or lack thereof) to a known structure already in the PDB. The method (called E-Thrifty) was found to be very fast (often < 10 min/structure) and to significantly outperform other shift-based or threading-based structure determination methods (in terms of top template model accuracy), with an average TM-score performance of 0.68 (vs. 0.50-0.62 for other methods). Coupled with recent developments in chemical shift refinement, these results suggest that protein structure determination, using only NMR chemical shifts, is becoming increasingly practical and reliable. E-Thrifty is available as a web server at http://ethrifty.ca .

  14. The use of supramolecular structures as protein ligands.

    PubMed

    Stopa, Barbara; Jagusiak, Anna; Konieczny, Leszek; Piekarska, Barbara; Rybarska, Janina; Zemanek, Grzegorz; Król, Marcin; Piwowar, Piotr; Roterman, Irena

    2013-11-01

    Congo red dye as well as other eagerly self-assembling organic molecules which form rod-like or ribbon-like supramolecular structures in water solutions, appears to represent a new class of protein ligands with possible wide-ranging medical applications. Such molecules associate with proteins as integral clusters and preferentially penetrate into areas of low molecular stability. Abnormal, partly unfolded proteins are the main binding target for such ligands, while well packed molecules are generally inaccessible. Of particular interest is the observation that local susceptibility for binding supramolecular ligands may be promoted in some proteins as a consequence of function-derived structural changes, and that such complexation may alter the activity profile of target proteins. Examples are presented in this paper.

  15. Protein structure estimation from NMR data by matrix completion.

    PubMed

    Li, Zhicheng; Li, Yang; Lei, Qiang; Zhao, Qing

    2017-09-01

    Knowledge of protein structures is very important to understand their corresponding physical and chemical properties. Nuclear Magnetic Resonance (NMR) spectroscopy is one of the main methods to measure protein structure. In this paper, we propose a two-stage approach to calculate the structure of a protein from a highly incomplete distance matrix, where most data are obtained from NMR. We first randomly "guess" a small part of unobservable distances by utilizing the triangle inequality, which is crucial for the second stage. Then we use matrix completion to calculate the protein structure from the obtained incomplete distance matrix. We apply the accelerated proximal gradient algorithm to solve the corresponding optimization problem. Furthermore, the recovery error of our method is analyzed, and its efficiency is demonstrated by several practical examples.
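
    The first stage described in the record above relies on the triangle inequality to constrain an unobserved distance. A minimal sketch of that idea follows: for a missing pair (i, j), every third point k with both distances known gives |d_ik - d_kj| <= d_ij <= d_ik + d_kj. Drawing the guess uniformly from the resulting interval is an assumption made here; the abstract does not say how the random guess is drawn. The function names and the dictionary layout are hypothetical.

```python
import random


def triangle_bounds(known, i, j, n):
    """Bounds on the unknown distance d_ij implied by the triangle inequality.

    `known` maps index pairs (a, b) with a < b to measured distances;
    `n` is the number of points.
    """
    lower, upper = 0.0, float("inf")
    for k in range(n):
        if k == i or k == j:
            continue
        d_ik = known.get((min(i, k), max(i, k)))
        d_kj = known.get((min(k, j), max(k, j)))
        if d_ik is None or d_kj is None:
            continue  # this k does not constrain the pair
        lower = max(lower, abs(d_ik - d_kj))
        upper = min(upper, d_ik + d_kj)
    return lower, upper


def guess_distance(known, i, j, n, rng=random):
    """Random guess inside the triangle-inequality interval (assumed uniform)."""
    lower, upper = triangle_bounds(known, i, j, n)
    if upper == float("inf"):
        return None  # no common neighbour with both distances known
    return rng.uniform(lower, upper)


if __name__ == "__main__":
    known = {(0, 1): 3.0, (1, 2): 4.0}
    print(triangle_bounds(known, 0, 2, 3))
```

    The guessed entries would then be added to the incomplete distance matrix before the matrix completion step, which is not sketched here.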

  16. Exploring Human Diseases and Biological Mechanisms by Protein Structure Prediction and Modeling.

    PubMed

    Wang, Juexin; Luttrell, Joseph; Zhang, Ning; Khan, Saad; Shi, NianQing; Wang, Michael X; Kang, Jing-Qiong; Wang, Zheng; Xu, Dong

    2016-01-01

    Protein structure prediction and modeling provide a tool for understanding protein functions by computationally constructing protein structures from amino acid sequences and analyzing them. With help from protein prediction tools and web servers, users can obtain the three-dimensional protein structure models and gain knowledge of functions from the proteins. In this chapter, we will provide several examples of such studies. As an example, structure modeling methods were used to investigate the relation between mutation-caused misfolding of protein and human diseases including epilepsy and leukemia. Protein structure prediction and modeling were also applied in nucleotide-gated channels and their interaction interfaces to investigate their roles in brain and heart cells. In molecular mechanism studies of plants, rice salinity tolerance mechanism was studied via structure modeling on crucial proteins identified by systems biology analysis; trait-associated protein-protein interactions were modeled, which sheds some light on the roles of mutations in soybean oil/protein content. In the age of precision medicine, we believe protein structure prediction and modeling will play more and more important roles in investigating biomedical mechanism of diseases and drug design.

  17. Mining protein loops using a structural alphabet and statistical exceptionality.

    PubMed

    Regad, Leslie; Martin, Juliette; Nuel, Gregory; Camproux, Anne-Claude

    2010-02-04

    Protein loops encompass 50% of protein residues in available three-dimensional structures. These regions are often involved in protein functions, e.g., binding sites and catalytic pockets. However, the description of protein loops with conventional tools is not an easy task. Regular secondary structures, helices and strands, have been widely studied whereas loops, because they are highly variable in terms of sequence and structure, are difficult to analyze. Due to data sparsity, long loops have rarely been systematically studied. We developed a simple and accurate method that allows the description and analysis of the structures of short and long loops using structural motifs without restriction on loop length. This method is based on the structural alphabet HMM-SA. HMM-SA allows the simplification of a three-dimensional protein structure into a one-dimensional string of states, where each state is a four-residue prototype fragment, called a structural letter. The difficult task of the structural grouping of huge data sets is thus easily accomplished by handling structural letter strings as in conventional protein sequence analysis. We systematically extracted all seven-residue fragments in a bank of 93000 protein loops and grouped them according to the structural-letter sequence, named a structural word. This approach permits a systematic analysis of loops of all sizes since we consider the structural motifs of seven residues rather than complete loops. We focused the analysis on highly recurrent words of loops (observed more than 30 times). Our study reveals that 73% of loop-lengths are covered by only 3310 highly recurrent structural words (out of 28274 observed words). These structural words have low structural variability (mean RMSd of 0.85 Å). As expected, half of these motifs display a flanking-region preference but interestingly, two thirds are shared by short (less than 12 residues) and long loops. Moreover, half of recurrent motifs exhibit a significant level of

  18. Structure elucidation of dimeric transmembrane domains of bitopic proteins.

    PubMed

    Bocharov, Eduard V; Volynsky, Pavel E; Pavlov, Konstantin V; Efremov, Roman G; Arseniev, Alexander S

    2010-01-01

    The interaction between transmembrane helices is of great interest because it directly determines the biological activity of a membrane protein. Either destroying or enhancing such interactions can result in many diseases related to dysfunction of different tissues in the human body. One much-studied form of membrane protein, known as a bitopic protein, is a dimer containing two membrane-spanning helices associating laterally. Establishing structure-function relationships as well as the rational design of new types of drugs targeting membrane proteins requires precise structural information about this class of objects. At present, to investigate the spatial structure and internal dynamics of such transmembrane helical dimers, several strategies have been developed based mainly on a combination of NMR spectroscopy, optical spectroscopy, protein engineering and molecular modeling. These approaches were successfully applied to homo- and heterodimeric transmembrane fragments of several bitopic proteins, which play important roles in normal and in pathological conditions of the human organism.

  19. Defining and predicting structurally conserved regions in protein superfamilies

    PubMed Central

    Huang, Ivan K.; Grishin, Nick V.

    2013-01-01

    Motivation: The structures of homologous proteins are generally better conserved than their sequences. This phenomenon is demonstrated by the prevalence of structurally conserved regions (SCRs) even in highly divergent protein families. Defining SCRs requires the comparison of two or more homologous structures and is affected by their availability and divergence, and our ability to deduce structurally equivalent positions among them. In the absence of multiple homologous structures, it is necessary to predict SCRs of a protein using information from only a set of homologous sequences and (if available) a single structure. Accurate SCR predictions can benefit homology modelling and sequence alignment. Results: Using pairwise DaliLite alignments among a set of homologous structures, we devised a simple measure of structural conservation, termed structural conservation index (SCI). SCI was used to distinguish SCRs from non-SCRs. A database of SCRs was compiled from 386 SCOP superfamilies containing 6489 protein domains. Artificial neural networks were then trained to predict SCRs with various features deduced from a single structure and homologous sequences. Assessment of the predictions via a 5-fold cross-validation method revealed that predictions based on features derived from a single structure perform similarly to ones based on homologous sequences, while combining sequence and structural features was optimal in terms of accuracy (0.755) and Matthews correlation coefficient (0.476). These results suggest that even without information from multiple structures, it is still possible to effectively predict SCRs for a protein. Finally, inspection of the structures with the worst predictions pinpoints difficulties in SCR definitions. Availability: The SCR database and the prediction server can be found at http://prodata.swmed.edu/SCR. Contact: 91huangi@gmail.com or grishin@chop.swmed.edu Supplementary information: Supplementary data are available at Bioinformatics
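
    The record above reports accuracy and the Matthews correlation coefficient (MCC) for a binary SCR versus non-SCR prediction. For reference, a minimal sketch of how both quantities follow from a confusion matrix is given below. The counts in the usage lines are invented for illustration and are not taken from the study; returning 0.0 when the denominator vanishes is a common convention adopted here as an assumption.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of correctly labeled positions."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal sum is zero (assumed convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts: SCR positions are the positive class.
    tp, tn, fp, fn = 60, 90, 30, 20
    print(round(accuracy(tp, tn, fp, fn), 3), round(matthews_cc(tp, tn, fp, fn), 3))
```

    Unlike accuracy, the MCC stays near zero for a predictor that labels everything as the majority class, which is why it is often reported alongside accuracy for unbalanced data.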

  20. Mixing and Matching Detergents for Membrane Protein NMR Structure Determination

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Columbus, Linda; Lipfert, Jan; Jambunathan, Kalyani

    2009-10-21

    One major obstacle to membrane protein structure determination is the selection of a detergent micelle that mimics the native lipid bilayer. Currently, detergents are selected by exhaustive screening because the effects of protein-detergent interactions on protein structure are poorly understood. In this study, the structure and dynamics of an integral membrane protein in different detergents are investigated by nuclear magnetic resonance (NMR) and electron paramagnetic resonance (EPR) spectroscopy and small-angle X-ray scattering (SAXS). The results suggest that matching of the micelle dimensions to the protein's hydrophobic surface avoids exchange processes that reduce the completeness of the NMR observations. Based on these dimensions, several mixed micelles were designed that improved the completeness of NMR observations. These findings provide a basis for the rational design of mixed micelles that may advance membrane protein structure determination by NMR.

  1. The simulation study of protein-protein interfaces based on the 4-helix bundle structure

    NASA Astrophysics Data System (ADS)

    Fukuda, Masaki; Komatsu, Yu; Morikawa, Ryota; Miyakawa, Takeshi; Takasu, Masako; Akanuma, Satoshi; Yamagishi, Akihiko

    2013-02-01

    Docking of two protein molecules is induced by intermolecular interactions. Our purposes in this study are: designing binding interfaces on the two proteins that specifically interact with each other; and inducing intermolecular interactions between the two proteins by mixing them. A 4-helix bundle structure was chosen as a scaffold on which binding interfaces were created. Based on this scaffold, we designed binding interfaces involving charged and nonpolar amino acid residues. We performed molecular dynamics (MD) simulation to identify suitable amino acid residues for the interfaces. We chose YciF protein as the scaffold for the protein-protein docking simulation. We observed the structure of two YciF protein molecules (I and II), and we calculated the distance between centroids (centers of gravity) of the interfaces' surface planes of molecules I and II. We found that the docking of the two protein molecules can be controlled by the number of hydrophobic and charged amino acid residues involved in the interfaces. The presence of six hydrophobic and five charged amino acid residues within an interface was most suitable for the protein-protein docking.

  2. Structural changes in gluten protein structure after addition of emulsifier. A Raman spectroscopy study

    NASA Astrophysics Data System (ADS)

    Ferrer, Evelina G.; Gómez, Analía V.; Añón, María C.; Puppo, María C.

    2011-06-01

    A food protein product, gluten protein, was chemically modified with varying levels of sodium stearoyl lactylate (SSL), and the extent of the modifications (secondary and tertiary structures) of this protein was analyzed using Raman spectroscopy. Analysis of the Amide I band showed an increase in its intensity, mainly after the addition of 0.25% SSL to wheat flour to produce modified gluten protein, pointing to the formation of a more ordered structure. Side chain vibrations also confirmed the observed changes.

  3. Protein Structure Classification and Loop Modeling Using Multiple Ramachandran Distributions.

    PubMed

    Najibi, Seyed Morteza; Maadooliat, Mehdi; Zhou, Lan; Huang, Jianhua Z; Gao, Xin

    2017-01-01

    Recently, the study of protein structures using angular representations has attracted much attention among structural biologists. The main challenge is how to efficiently model the continuous conformational space of the protein structures based on the differences and similarities between different Ramachandran plots. Despite the presence of statistical methods for modeling angular data of proteins, there is still a substantial need for more sophisticated and faster statistical tools to model the large-scale circular datasets. To address this need, we have developed a nonparametric method for collective estimation of multiple bivariate density functions for a collection of populations of protein backbone angles. The proposed method takes into account the circular nature of the angular data using trigonometric spline which is more efficient compared to existing methods. This collective density estimation approach is widely applicable when there is a need to estimate multiple density functions from different populations with common features. Moreover, the coefficients of adaptive basis expansion for the fitted densities provide a low-dimensional representation that is useful for visualization, clustering, and classification of the densities. The proposed method provides a novel and unique perspective to two important and challenging problems in protein structure research: structure-based protein classification and angular-sampling-based protein loop structure prediction.

  4. On the large scale structure of X-ray background sources

    NASA Technical Reports Server (NTRS)

    Bi, H. G.; Meszaros, A.; Meszaros, P.

    1991-01-01

    The large scale clustering of the sources responsible for the X-ray background is discussed, under the assumption of a discrete origin. The formalism necessary for calculating the X-ray spatial fluctuations in the most general case where the source density contrast in structures varies with redshift is developed. A comparison of this with observational limits is useful for obtaining information concerning various galaxy formation scenarios. The calculations presented show that a varying density contrast has a small impact on the expected X-ray fluctuations. This strengthens and extends previous conclusions concerning the size and comoving density of large scale structures at redshifts between 0.5 and 4.0.

  5. Density functional study of molecular interactions in secondary structures of proteins.

    PubMed

    Takano, Yu; Kusaka, Ayumi; Nakamura, Haruki

    2016-01-01

    Proteins play diverse and vital roles in biology, which are dominated by their three-dimensional structures. The three-dimensional structure of a protein determines its functions and chemical properties. Protein secondary structures, including α-helices and β-sheets, are key components of the protein architecture. Molecular interactions, in particular hydrogen bonds, play significant roles in the formation of protein secondary structures. Precise and quantitative estimations of these interactions are required to understand the principles underlying the formation of three-dimensional protein structures. In the present study, we have investigated the molecular interactions in α-helices and β-sheets, using ab initio wave function-based methods, the Hartree-Fock method (HF) and the second-order Møller-Plesset perturbation theory (MP2), density functional theory, and molecular mechanics. The characteristic interactions essential for forming the secondary structures are discussed quantitatively.

  6. Comparable contributions of structural-functional constraints and expression level to the rate of protein sequence evolution

    PubMed Central

    Wolf, Maxim Y; Wolf, Yuri I; Koonin, Eugene V

    2008-01-01

    Background Proteins show a broad range of evolutionary rates. Understanding the factors that are responsible for the characteristic rate of evolution of a given protein arguably is one of the major goals of evolutionary biology. A long-standing general assumption used to be that the evolution rate is, primarily, determined by the specific functional constraints that affect the given protein. These constraints were traditionally thought to depend both on the specific features of the protein's structure and its biological role. The advent of systems biology brought about new types of data, such as expression level and protein-protein interactions, and unexpectedly, a variety of correlations between protein evolution rate and these variables have been observed. The strongest connections by far were repeatedly seen between protein sequence evolution rate and the expression level of the respective gene. It has been hypothesized that this link is due to the selection for the robustness of the protein structure to mistranslation-induced misfolding that is particularly important for highly expressed proteins and is the dominant determinant of the sequence evolution rate. Results This work is an attempt to assess the relative contributions of protein domain structure and function, on the one hand, and expression level on the other hand, to the rate of sequence evolution. To this end, we performed a genome-wide analysis of the effect of the fusion of a pair of domains in multidomain proteins on the difference in the domain-specific evolutionary rates. The mistranslation-induced misfolding hypothesis would predict that, within multidomain proteins, fused domains, on average, should evolve at substantially closer rates than the same domains in different proteins because, within a multidomain protein, all domains are translated at the same rate. We performed a comprehensive comparison of the evolutionary rates of mammalian and plant protein domains that are either joined in

  7. Structure and Calcium Binding Properties of a Neuronal Calcium-Myristoyl Switch Protein, Visinin-Like Protein 3.

    PubMed

    Li, Congmin; Lim, Sunghyuk; Braunewell, Karl H; Ames, James B

    2016-01-01

    Visinin-like protein 3 (VILIP-3) belongs to a family of Ca2+-myristoyl switch proteins that regulate signal transduction in the brain and retina. Here we analyze Ca2+ binding, characterize Ca2+-induced conformational changes, and determine the NMR structure of myristoylated VILIP-3. Three Ca2+ bind cooperatively to VILIP-3 at EF2, EF3 and EF4 (KD = 0.52 μM and Hill slope of 1.8). NMR assignments, mutagenesis and structural analysis indicate that the covalently attached myristoyl group is solvent exposed in Ca2+-bound VILIP-3, whereas Ca2+-free VILIP-3 contains a sequestered myristoyl group that interacts with protein residues (E26, Y64, V68), which are distinct from myristate contacts seen in other Ca2+-myristoyl switch proteins. The myristoyl group in VILIP-3 forms an unusual L-shaped structure that places the C14 methyl group inside a shallow protein groove, in contrast to the much deeper myristoyl binding pockets observed for recoverin, NCS-1 and GCAP1. Thus, the myristoylated VILIP-3 protein structure determined in this study is quite different from those of other known myristoyl switch proteins (recoverin, NCS-1, and GCAP1). We propose that myristoylation serves to fine tune the three-dimensional structures of neuronal calcium sensor proteins as a means of generating functional diversity.
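
    The cooperative binding parameters quoted in the record above (KD = 0.52 μM, Hill slope 1.8) can be read through the standard Hill equation, Y = [Ca]^n / (KD^n + [Ca]^n). The sketch below assumes that the reported KD is the half-saturation Ca2+ concentration of that equation; the abstract does not state the fitting model, and the function name is hypothetical.

```python
def hill_saturation(ca_um, kd_um=0.52, n=1.8):
    """Fractional saturation from the Hill equation.

    ca_um: free Ca2+ concentration in micromolar.
    kd_um: half-saturation concentration in micromolar (assumed meaning of KD).
    n:     Hill coefficient.
    """
    return ca_um ** n / (kd_um ** n + ca_um ** n)


if __name__ == "__main__":
    for ca in (0.1, 0.52, 1.0, 5.0):
        print(f"{ca:5.2f} uM -> {hill_saturation(ca):.3f}")
```

    Under this reading, saturation is exactly one half at 0.52 μM, and a Hill coefficient above 1 makes the transition around that concentration steeper than for independent sites.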

  8. Lipid nanotechnologies for structural studies of membrane-associated proteins.

    PubMed

    Stoilova-McPhie, Svetla; Grushin, Kirill; Dalm, Daniela; Miller, Jaimy

    2014-11-01

    We present a methodology of lipid nanotubes (LNT) and nanodisks technologies optimized in our laboratory for structural studies of membrane-associated proteins at close to physiological conditions. The application of these lipid nanotechnologies for structure determination by cryo-electron microscopy (cryo-EM) is fundamental for understanding and modulating their function. The LNTs in our studies are single bilayer galactosylceramide based nanotubes of ∼20 nm inner diameter and a few microns in length, that self-assemble in aqueous solutions. The lipid nanodisks (NDs) are self-assembled discoid lipid bilayers of ∼10 nm diameter, which are stabilized in aqueous solutions by a belt of amphipathic helical scaffold proteins. By combining LNT and ND technologies, we can examine structurally how the membrane curvature and lipid composition modulate the function of the membrane-associated proteins. As proof of principle, we have engineered these lipid nanotechnologies to mimic the activated platelet's phosphatidylserine-rich membrane and have successfully assembled functional membrane-bound coagulation factor VIII in vitro for structure determination by cryo-EM. The macromolecular organization of the proteins bound to ND and LNT is further defined by fitting the known atomic structures within the calculated three-dimensional maps. The combination of LNT and ND technologies offers a means to control the design and assembly of a wide range of functional membrane-associated proteins and complexes for structural studies by cryo-EM. The presented results confirm the suitability of the developed methodology for studying the functional structure of membrane-associated proteins, such as the coagulation factors, at a close to physiological environment. © 2014 Wiley Periodicals, Inc.

  9. Geometry motivated alternative view on local protein backbone structures.

    PubMed

    Zacharias, Jan; Knapp, Ernst Walter

    2013-11-01

    We present an alternative to the classical Ramachandran plot (R-plot) to display local protein backbone structure. Instead of the (φ, ψ)-backbone angles relating to the chemical architecture of polypeptides, generic helical parameters are used. These are the rotation or twist angle ϑ and the helical rise parameter d. Plots with these parameters provide a different view on the nature of local protein backbone structures. They allow the local structures to be displayed in polar (d, ϑ)-coordinates, which is not possible for an R-plot, where structural regimes connected by periodicity appear disconnected. There are other advantages as well, such as a clear discrimination of the handedness of a local structure and a larger spread of the different local structure domains (the latter can yield a better separation of different local secondary structure motifs), among many more. Compared to the R-plot, we are not aware of any major disadvantage to classifying local polypeptide structures with the (d, ϑ)-plot, except that it requires some elementary computations. To facilitate usage of the new (d, ϑ)-plot for protein structures we provide a web application (http://agknapp.chemie.fu-berlin.de/secsass), which shows the (d, ϑ)-plot side-by-side with the R-plot. © 2013 The Protein Society.

  10. Hot spot of structural ambivalence in prion protein revealed by secondary structure principal component analysis.

    PubMed

    Yamamoto, Norifumi

    2014-08-21

    The conformational conversion of proteins into an aggregation-prone form is a common feature of various neurodegenerative disorders including Alzheimer's, Huntington's, Parkinson's, and prion diseases. In the early stage of prion diseases, secondary structure conversion in prion protein (PrP) causing β-sheet expansion facilitates the formation of a pathogenic isoform with a high content of β-sheets and strong aggregation tendency to form amyloid fibrils. Herein, we propose a straightforward method to extract essential information regarding the secondary structure conversion of proteins from molecular simulations, named secondary structure principal component analysis (SSPCA). The definite existence of a PrP isoform with an increased β-sheet structure was confirmed in a free-energy landscape constructed by mapping protein structural data into a reduced space according to the principal components determined by the SSPCA. We suggest a "spot" of structural ambivalence in PrP, the C-terminal part of helix 2, that lacks a strong intrinsic secondary structure, thus promoting a partial α-helix-to-β-sheet conversion. This result is important to understand how the pathogenic conformational conversion of PrP is initiated in prion diseases. The SSPCA has great potential to solve various challenges in studying highly flexible molecular systems, such as intrinsically disordered proteins, structurally ambivalent peptides, and chameleon sequences.

  11. A Structural Perspective on the Modulation of Protein-Protein Interactions with Small Molecules.

    PubMed

    Demirel, Habibe Cansu; Dogan, Tunca; Tuncbag, Nurcan

    2018-05-31

    Protein-protein interactions (PPIs) are the key components in many cellular processes including signaling pathways, enzymatic reactions and epigenetic regulation. Abnormal interactions of some proteins may be pathogenic and cause various disorders including cancer and neurodegenerative diseases. Although inhibiting PPIs with small molecules is a challenging task, it gained an increasing interest because of its strong potential for drug discovery and design. The knowledge of the interface as well as the structural and chemical characteristics of the PPIs and their roles in the cellular pathways are necessary for a rational design of small molecules to modulate PPIs. In this study, we review the recent progress in the field and detail the physicochemical properties of PPIs including binding hot spots with a focus on structural methods. Then, we review recent approaches for structural prediction of PPIs. Finally, we revisit the concept of targeting PPIs in a systems biology perspective and we refer to the non-structural approaches, usually employed when the structural information is not present. Copyright © Bentham Science Publishers.

  12. Solving coiled-coil protein structures

    DOE PAGES

    Dauter, Zbigniew

    2015-02-26

    With the availability of more than 100,000 entries stored in the Protein Data Bank (PDB) that can be used as search models, molecular replacement (MR) is currently the most popular method of solving crystal structures of macromolecules. Significant methodological efforts have been directed in recent years towards making this approach more powerful and practical. This resulted in the creation of several computer programs, highly automated and user friendly, that are able to successfully solve many structures even by researchers who, although interested in structures of biomolecules, are not very experienced in crystallography.

  13. Structural interactions between retroviral Gag proteins examined by cysteine cross-linking.

    PubMed Central

    Hansen, M S; Barklis, E

    1995-01-01

    We have examined structural interactions between Gag proteins within Moloney murine leukemia virus (M-MuLV) particles by making use of the cysteine-specific cross-linking agents iodine and bis-maleimido hexane. Virion-associated wild-type M-MuLV Pr65Gag proteins in immature particles were intermolecularly cross-linked at cysteines to form Pr65Gag oligomers, from dimers to pentamers or hexamers. Following a systematic approach of cysteine-to-serine mutagenesis, we have shown that cross-linking of Pr65Gag occurred at cysteines of the nucleocapsid (NC) Cys-His motif, suggesting that the Cys-His motifs within virus particles are packed in close proximity. The M-MuLV Pr65Gag protein did not cross-link to the human immunodeficiency virus Pr55Gag protein when the two molecules were coexpressed, indicating either that they did not coassemble or that heterologous Gag proteins were not in close enough proximity to be cross-linked. Using an assembly-competent, protease-minus, cysteine-minus Pr65Gag protein as a template, novel cysteine residues were generated in the M-MuLV capsid domain major homology region (MHR). Cross-linking of proteins containing MHR cysteines showed above-background levels of Gag-Gag dimers but also identified a novel cellular factor, present in virions, that cross-linked to MHR residues. Although the NC cysteine mutation was compatible with M-MuLV particle assembly, deletions of the NC domain were not tolerated. These results suggest that the Cys-His motif is held in close proximity within immature M-MuLV particles by interactions between CA domains and/or non-Cys-His motif domains of the NC. PMID:7815493

  14. Ultra-high-resolution X-ray structure of proteins.

    PubMed

    Lecomte, C; Guillot, B; Muzet, N; Pichon-Pesme, V; Jelsch, C

    2004-04-01

    The constant advances in synchrotron radiation sources and crystallogenesis methods and the impulse of structural genomics projects have brought biocrystallography to a context favorable to subatomic resolution protein and nucleic acid structures. Thus, as soon as such precision can be frequently obtained, the amount of information available in the precise electron density should also be easily and naturally exploited, similarly to the field of small molecule charge density studies. Indeed, the use of a nonspherical model for the atomic electron density in the refinement of subatomic resolution protein structures allows the experimental description of their electrostatic properties. Some methods we have developed and implemented in our multipolar refinement program MoPro for this purpose are presented. Examples of successful applications to several subatomic resolution protein structures, including the 0.66 angstrom resolution human aldose reductase, are described.

  15. Sequential Release of Proteins from Structured Multishell Microcapsules.

    PubMed

    Shimanovich, Ulyana; Michaels, Thomas C T; De Genst, Erwin; Matak-Vinkovic, Dijana; Dobson, Christopher M; Knowles, Tuomas P J

    2017-10-09

    In nature, a wide range of functional materials is based on proteins. Increasing attention is also turning to the use of proteins as artificial biomaterials in the form of films, gels, particles, and fibrils that offer great potential for applications in areas ranging from molecular medicine to materials science. To date, however, most such applications have been limited to single component materials despite the fact that their natural analogues are composed of multiple types of proteins with a variety of functionalities that are coassembled in a highly organized manner on the micrometer scale, a process that is currently challenging to achieve in the laboratory. Here, we demonstrate the fabrication of multicomponent protein microcapsules where the different components are positioned in a controlled manner. We use molecular self-assembly to generate multicomponent structures on the nanometer scale and droplet microfluidics to bring together the different components on the micrometer scale. Using this approach, we synthesize a wide range of multiprotein microcapsules containing three well-characterized proteins: glucagon, insulin, and lysozyme. The localization of each protein component in multishell microcapsules has been detected by labeling protein molecules with different fluorophores, and the final three-dimensional microcapsule structure has been resolved by using confocal microscopy together with image analysis techniques. In addition, we show that these structures can be used to tailor the release of such functional proteins in a sequential manner. Moreover, our observations demonstrate that the protein release mechanism from multishell capsules is driven by the kinetic control of mass transport of the cargo and by the dissolution of the shells. The ability to generate artificial materials that incorporate a variety of different proteins with distinct functionalities increases the breadth of the potential applications of artificial protein-based materials

  16. Bioinformatics and variability in drug response: a protein structural perspective

    PubMed Central

    Lahti, Jennifer L.; Tang, Grace W.; Capriotti, Emidio; Liu, Tianyun; Altman, Russ B.

    2012-01-01

    Marketed drugs frequently perform worse in clinical practice than in the clinical trials on which their approval is based. Many therapeutic compounds are ineffective for a large subpopulation of patients to whom they are prescribed; worse, a significant fraction of patients experience adverse effects more severe than anticipated. The unacceptable risk–benefit profile for many drugs mandates a paradigm shift towards personalized medicine. However, prior to adoption of patient-specific approaches, it is useful to understand the molecular details underlying variable drug response among diverse patient populations. Over the past decade, progress in structural genomics led to an explosion of available three-dimensional structures of drug target proteins while efforts in pharmacogenetics offered insights into polymorphisms correlated with differential therapeutic outcomes. Together these advances provide the opportunity to examine how altered protein structures arising from genetic differences affect protein–drug interactions and, ultimately, drug response. In this review, we first summarize structural characteristics of protein targets and common mechanisms of drug interactions. Next, we describe the impact of coding mutations on protein structures and drug response. Finally, we highlight tools for analysing protein structures and protein–drug interactions and discuss their application for understanding altered drug responses associated with protein structural variants. PMID:22552919

  17. De Novo Proteins with Life-Sustaining Functions Are Structurally Dynamic.

    PubMed

    Murphy, Grant S; Greisman, Jack B; Hecht, Michael H

    2016-01-29

    Designing and producing novel proteins that fold into stable structures and provide essential biological functions are key goals in synthetic biology. In initial steps toward achieving these goals, we constructed a combinatorial library of de novo proteins designed to fold into 4-helix bundles. As described previously, screening this library for sequences that function in vivo to rescue conditionally lethal mutants of Escherichia coli (auxotrophs) yielded several de novo sequences, termed SynRescue proteins, which rescued four different E. coli auxotrophs. In an effort to understand the structural requirements necessary for auxotroph rescue, we investigated the biophysical properties of the SynRescue proteins, using both computational and experimental approaches. Results from circular dichroism, size-exclusion chromatography, and NMR demonstrate that the SynRescue proteins are α-helical and relatively stable. Surprisingly, however, they do not form well-ordered structures. Instead, they form dynamic structures that fluctuate between monomeric and dimeric states. These findings show that a well-ordered structure is not a prerequisite for life-sustaining functions, and suggest that dynamic structures may have been important in the early evolution of protein function.

  18. Structure-guided wavelength tuning in far-red fluorescent proteins

    PubMed Central

    Ng, Ho-Leung; Lin, Michael Z.

    2017-01-01

    In recent years, protein engineers have succeeded in tuning the excitation spectra of natural fluorescent proteins from green wavelengths into orange and red wavelengths, resulting in the creation of a series of fluorescent proteins with emission in the far-red portions of the optical spectrum. These results have arisen from the synergistic combination of structural knowledge of fluorescent proteins, chemical intuition, and high-throughput screening methods. Here we review structural features found in autocatalytic far-red fluorescent proteins, and discuss how they add to our understanding of the biophysical mechanisms of wavelength tuning in biological chromophores. PMID:27468111

  19. Making the Bend: DNA Tertiary Structure and Protein-DNA Interactions

    PubMed Central

    Harteis, Sabrina; Schneider, Sabine

    2014-01-01

    DNA structure functions as an overlapping code to the DNA sequence. Rapid progress in understanding the role of DNA structure in gene regulation, DNA damage recognition and genome stability has been made. The three-dimensional structure of both proteins and DNA plays a crucial role in their specific interaction, and proteins can recognise the chemical signature of DNA sequence (“base readout”) as well as the intrinsic DNA structure (“shape recognition”). These recognition mechanisms do not exist in isolation but, depending on the individual interaction partners, are combined to various extents. The driving force for the interaction between protein and DNA remains the unique thermodynamics of each individual DNA-protein pair. In this review we focus on the structures and conformations adopted by DNA, both influenced by and influencing the specific interaction with the corresponding protein binding partner, as well as their underlying thermodynamics. PMID:25026169

  20. G-LoSA for Prediction of Protein-Ligand Binding Sites and Structures.

    PubMed

    Lee, Hui Sun; Im, Wonpil

    2017-01-01

    Recent advances in high-throughput structure determination and computational protein structure prediction have significantly enriched the universe of protein structure. However, there is still a large gap between the number of available protein structures and the number of proteins whose function is annotated with high accuracy. Computational structure-based protein function prediction has emerged to reduce this knowledge gap. The identification of a ligand binding site and its structure is critical to the determination of a protein's molecular function. We present a computational methodology for predicting small-molecule ligand binding sites and ligand structures using G-LoSA, our protein local structure alignment and similarity measurement tool. All the computational procedures described here can be easily implemented using G-LoSA Toolkit, a package of standalone software programs and preprocessed PDB structure libraries. G-LoSA and G-LoSA Toolkit are freely available to academic users at http://compbio.lehigh.edu/GLoSA. We also present a case study to show the potential of our template-based approach harnessing G-LoSA for protein function prediction.

  1. Local Structural Differences in Homologous Proteins: Specificities in Different SCOP Classes

    PubMed Central

    Joseph, Agnel Praveen; Valadié, Hélène; Srinivasan, Narayanaswamy; de Brevern, Alexandre G.

    2012-01-01

    The constant increase in the number of solved protein structures is of great help in understanding the basic principles behind protein folding and evolution. 3-D structural knowledge is valuable in designing and developing methods for comparison, modelling and prediction of protein structures. These approaches for structure analysis can be directly applied to studying protein function and to drug design. The backbone of a protein structure favours certain local conformations which include α-helices, β-strands and turns. Libraries of a limited number of local conformations (Structural Alphabets) were developed in the past to obtain a useful categorization of backbone conformation. Protein Block (PB) is one such Structural Alphabet that gave a reasonable structure approximation of 0.42 Å. In this study, we use the PB description of local structures to analyse conformations that are preferred sites for structural variations and insertions among groups of related folds. This knowledge can be utilized in improving tools for structure comparison that work by analysing local structure similarities. Conformational differences between homologous proteins are known to occur often in the regions comprising turns and loops. Interestingly, these differences are found to have specific preferences depending upon the structural classes of proteins. Such class-specific preferences are mainly seen in the all-β class with changes involving short helical conformations and hairpin turns. A test carried out on a benchmark dataset also indicates that the use of knowledge on the class-specific variations can improve the performance of a PB-based structure comparison approach. The preference for the indel sites also seems to be confined to a few backbone conformations involving β-turns and helix C-caps. These are mainly associated with short loops joining the regular secondary structures that mediate a reversal in the chain direction. Rare β-turns of type I’ and II’ are also

  2. Soliton concepts and protein structure

    NASA Astrophysics Data System (ADS)

    Krokhotin, Andrei; Niemi, Antti J.; Peng, Xubiao

    2012-03-01

    Structural classification shows that the number of different protein folds is surprisingly small. It also appears that proteins are built in a modular fashion from a relatively small number of components. Here we propose that the modular building blocks are made of the dark soliton solution of a generalized discrete nonlinear Schrödinger equation. We find that practically all protein loops can be obtained simply by scaling the size and by joining together a number of copies of the soliton, one after another. The soliton has only two loop-specific parameters, and we compute their statistical distribution in the Protein Data Bank (PDB). We explicitly construct a collection of 200 sets of parameters, each determining a soliton profile that describes a different short loop. The ensuing profiles cover practically all those proteins in PDB that have a resolution which is better than 2.0 Å, with a precision such that the average root-mean-square distance between the loop and its soliton is less than the experimental B-factor fluctuation distance. We also present two examples that describe how the loop library can be employed both to model and to analyze folded proteins.

  3. Soliton concepts and protein structure.

    PubMed

    Krokhotin, Andrei; Niemi, Antti J; Peng, Xubiao

    2012-03-01

    Structural classification shows that the number of different protein folds is surprisingly small. It also appears that proteins are built in a modular fashion from a relatively small number of components. Here we propose that the modular building blocks are made of the dark soliton solution of a generalized discrete nonlinear Schrödinger equation. We find that practically all protein loops can be obtained simply by scaling the size and by joining together a number of copies of the soliton, one after another. The soliton has only two loop-specific parameters, and we compute their statistical distribution in the Protein Data Bank (PDB). We explicitly construct a collection of 200 sets of parameters, each determining a soliton profile that describes a different short loop. The ensuing profiles cover practically all those proteins in PDB that have a resolution which is better than 2.0 Å, with a precision such that the average root-mean-square distance between the loop and its soliton is less than the experimental B-factor fluctuation distance. We also present two examples that describe how the loop library can be employed both to model and to analyze folded proteins.

  4. SFG analysis of surface bound proteins: a route towards structure determination.

    PubMed

    Weidner, Tobias; Castner, David G

    2013-08-14

    The surface of a material is rapidly covered with proteins once that material is placed in a biological environment. The structure and function of these bound proteins play a key role in the interactions and communications of the material with the biological environment. Thus, it is crucial to gain a molecular level understanding of surface bound protein structure. While X-ray diffraction and solution phase NMR methods are well established for determining the structure of proteins in the crystalline or solution phase, there is not a corresponding single technique that can provide the same level of structural detail about proteins at surfaces or interfaces. However, recent advances in sum frequency generation (SFG) vibrational spectroscopy have significantly increased our ability to obtain structural information about surface bound proteins and peptides. A multi-technique approach of combining SFG with (1) protein engineering methods to selectively introduce mutations and isotopic labels, (2) other experimental methods such as time-of-flight secondary ion mass spectrometry (ToF-SIMS) and near edge X-ray absorption fine structure (NEXAFS) to provide complementary information, and (3) molecular dynamic (MD) simulations to extend the molecular level experimental results is a particularly promising route for structural characterization of surface bound proteins and peptides. By using model peptides and small proteins with well-defined structures, methods have been developed to determine the orientation of both backbone and side chains to the surface.

  5. SFG analysis of surface bound proteins: A route towards structure determination

    PubMed Central

    Weidner, Tobias; Castner, David G.

    2013-01-01

    The surface of a material is rapidly covered with proteins once that material is placed in a biological environment. The structure and function of these bound proteins play a key role in the interactions and communications of the material with the biological environment. Thus, it is crucial to gain a molecular level understanding of surface bound protein structure. While X-ray diffraction and solution phase NMR methods are well established for determining the structure of proteins in the crystalline or solution phase, there is not a corresponding single technique that can provide the same level of structural detail about proteins at surfaces or interfaces. However, recent advances in sum frequency generation (SFG) vibrational spectroscopy have significantly increased our ability to obtain structural information about surface bound proteins and peptides. A multi-technique approach of combining SFG with (1) protein engineering methods to selectively introduce mutations and isotopic labels, (2) other experimental methods such as time-of-flight secondary ion mass spectrometry (ToF-SIMS) and near edge x-ray absorption fine structure (NEXAFS) to provide complementary information, and (3) molecular dynamic (MD) simulations to extend the molecular level experimental results is a particularly promising route for structural characterization of surface bound proteins and peptides. By using model peptides and small proteins with well-defined structures, methods have been developed to determine the orientation of both backbone and side chains to the surface. PMID:23727992

  6. DNA Nanotubes for NMR Structure Determination of Membrane Proteins

    PubMed Central

    Bellot, Gaëtan; McClintock, Mark A.; Chou, James J; Shih, William M.

    2013-01-01

    Structure determination of integral membrane proteins by solution NMR represents one of the most important challenges of structural biology. A Residual-Dipolar-Coupling-based refinement approach can be used to solve the structure of membrane proteins up to 40 kDa in size; however, a weak-alignment medium that is detergent-resistant is required. Previously, availability of media suitable for weak alignment of membrane proteins was severely limited. We describe here a protocol for robust, large-scale synthesis of detergent-resistant DNA nanotubes that can be assembled into dilute liquid crystals for application as weak-alignment media in solution NMR structure determination of membrane proteins in detergent micelles. The DNA nanotubes are heterodimers of 400-nm-long six-helix bundles each self-assembled from an M13-based p7308 scaffold strand and >170 short oligonucleotide staple strands. Compatibility with proteins bearing considerable positive charge as well as modulation of molecular alignment, towards collection of linearly independent restraints, can be introduced by reducing the negative charge of DNA nanotubes via counter ions and small DNA binding molecules. This detergent-resistant liquid-crystal medium offers a number of properties conducive to membrane protein alignment, including high-yield production, thermal stability, buffer compatibility, and structural programmability. Production of sufficient nanotubes for 4–5 NMR experiments can be completed in one week by a single individual. PMID:23518667

  7. Free-Energy Landscape of Protein-Ligand Interactions Coupled with Protein Structural Changes.

    PubMed

    Moritsugu, Kei; Terada, Tohru; Kidera, Akinori

    2017-02-02

    Protein-ligand interactions are frequently coupled with protein structural changes. Focusing on the coupling, we present the free-energy surface (FES) of the ligand-binding process for glutamine-binding protein (GlnBP) and its ligand, glutamine, in which glutamine binding accompanies large-scale domain closure. All-atom simulations were performed in explicit solvents by multiscale enhanced sampling (MSES), which adopts a multicopy and multiscale scheme to achieve enhanced sampling of systems with a large number of degrees of freedom. The structural ensemble derived from the MSES simulation yielded the FES of the coupling, described in terms of both the ligand's and protein's degrees of freedom at atomic resolution, and revealed the tight coupling between the two degrees of freedom. The derived FES led to the determination of definite structural states, which suggested the dominant pathways of glutamine binding to GlnBP: first, glutamine migrates via diffusion to form a dominant encounter complex with Arg75 on the large domain of GlnBP, through strong polar interactions. Subsequently, the closing motion of GlnBP occurs to form ligand interactions with the small domain, finally completing the native-specific complex structure. The formation of hydrogen bonds between glutamine and the small domain is considered to be a rate-limiting step, inducing desolvation of the protein-ligand interface to form the specific native complex. The key interactions to attain high specificity for glutamine, the "door keeper" existing between the two domains (Asp10-Lys115) and the "hydrophobic sandwich" formed between the ligand glutamine and Phe13/Phe50, have been successfully mapped on the pathway derived from the FES.

  8. Instability of the roll-streak structure induced by background turbulence in pretransitional Couette flow

    NASA Astrophysics Data System (ADS)

    Farrell, Brian F.; Ioannou, Petros J.; Nikolaidis, Marios-Andreas

    2017-03-01

    Although the roll-streak structure is ubiquitous in both observations and simulations of pretransitional wall-bounded shear flow, this structure is linearly stable if the idealization of laminar flow is made. Lacking an instability, the large transient growth of the roll-streak structure has been invoked to explain its appearance as resulting from chance occurrence in the background turbulence of perturbations configured to optimally excite it. However, there is an alternative interpretation for the role of free-stream turbulence in the genesis of the roll-streak structure, which is that the background turbulence interacts with the roll-streak structure to destabilize it. Statistical state dynamics (SSD) provides analysis methods for studying instabilities of this type that arise from interaction between the coherent and incoherent components of turbulence. SSD in the form of a closure at second order is used in this work to analyze the cooperative eigenmodes arising from interaction between the coherent streamwise invariant component and the incoherent background component of turbulence. In pretransitional Couette flow a manifold of stable modes with roll-streak form is found to exist in the presence of low-intensity background turbulence. The least stable mode of this manifold is destabilized at a critical value of a parameter controlling the background turbulence intensity and a finite-amplitude roll-streak structure arises from this instability through a bifurcation in this parameter. Although this bifurcation has analytical expression only in the infinite ensemble formulation of second order SSD, referred to in this work as the S3T system, it is closely reflected in numerical simulations of the dynamically similar quasilinear system, referred to as the restricted nonlinear (RNL) system, as well as of the full Navier-Stokes equations. This correspondence is verified using ensemble implementations of the RNL system and the Navier-Stokes equations. The S3T

  9. Three-dimensional (3D) structure prediction of the American and African oil-palms β-ketoacyl-[ACP] synthase-II protein by comparative modelling

    PubMed Central

    Wang, Edina; Chinni, Suresh; Bhore, Subhash Janardhan

    2014-01-01

    Background: The fatty-acid profile of a vegetable oil determines its properties and nutritional value. Palm-oil obtained from the African oil-palm [Elaeis guineensis Jacq. (Tenera)] contains 44% palmitic acid (C16:0), but palm-oil obtained from the American oil-palm [Elaeis oleifera] contains only 25% C16:0. In part, the β-ketoacyl-[ACP] synthase II (KASII) [EC: 2.3.1.179] protein is responsible for the high level of C16:0 in palm-oil derived from the African oil-palm. To understand more about the E. guineensis KASII (EgKASII) and E. oleifera KASII (EoKASII) proteins, it is essential to know their structures. Hence, this study was undertaken. Objective: The objective of this study was to predict the three-dimensional (3D) structures of the EgKASII and EoKASII proteins using molecular modelling tools. Materials and Methods: The amino-acid sequences for the KASII proteins were retrieved from the protein database of the National Center for Biotechnology Information (NCBI), USA. The 3D structures were predicted for both proteins using homology modelling and an ab-initio approach to protein structure prediction. Molecular dynamics (MD) simulation was performed to refine the predicted structures. The predicted structure models were evaluated, and root mean square deviation (RMSD) and root mean square fluctuation (RMSF) values were calculated. Results: The homology modelling showed that the EgKASII and EoKASII proteins are 78% and 74% similar to Streptococcus pneumoniae KASII and Brucella melitensis KASII, respectively. The EgKASII and EoKASII structures predicted by the ab-initio approach show 6% and 9% deviation, respectively, from the structures predicted by homology modelling. The structure refinement and validation confirmed that the predicted structures are accurate. Conclusion: The 3D structures for the EgKASII and EoKASII proteins were predicted. However, further research is essential to understand the interaction of the EgKASII and EoKASII proteins with their substrates.
PMID
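
    The RMSD and RMSF values named in this abstract follow standard definitions, which can be sketched as below. This is a minimal illustration, assuming the structures are already superimposed and given as lists of (x, y, z) tuples; no alignment step is performed, and the function names and toy coordinates are hypothetical rather than taken from the study.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally sized coordinate sets."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and equal in size")
    squared = sum(
        (a - b) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for a, b in zip(point_a, point_b)
    )
    return math.sqrt(squared / len(coords_a))


def rmsf(frames):
    """Per-atom root mean square fluctuation about the mean position."""
    if not frames:
        raise ValueError("need at least one frame")
    n_frames = len(frames)
    result = []
    for atom in range(len(frames[0])):
        # Mean position of this atom over all frames.
        mean = [sum(f[atom][k] for f in frames) / n_frames for k in range(3)]
        # Mean squared displacement from that mean position.
        msd = sum(
            sum((f[atom][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        result.append(math.sqrt(msd))
    return result


# Toy data: two atoms, the second structure shifted by 1 along z.
model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(model_a, model_b))

# Toy trajectory: atom 0 oscillates along x, atom 1 stays fixed.
trajectory = [
    [(-1.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
]
print(rmsf(trajectory))
```

    RMSD compares two whole structures and yields one number, whereas RMSF is computed per atom over a trajectory, which is why the two are usually reported together after MD refinement.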

  10. Yellow fluorescent protein phiYFPv (Phialidium): structure and structure-based mutagenesis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pletneva, Nadya V.; Pletnev, Vladimir Z.; Souslova, Ekaterina

    The yellow fluorescent protein phiYFPv (λ_em^max ≃ 537 nm) with improved folding has been developed from the spectrally identical wild-type phiYFP found in the marine jellyfish Phialidium. The latter fluorescent protein is one of only two known cases of naturally occurring proteins that exhibit emission spectra in the yellow–orange range (535–555 nm). Here, the crystal structure of phiYFPv has been determined at 2.05 Å resolution. The ‘yellow’ chromophore formed from the sequence triad Thr65-Tyr66-Gly67 adopts the bicyclic structure typical of fluorophores emitting in the green spectral range. It was demonstrated that perfect antiparallel π-stacking of chromophore Tyr66 and the proximal Tyr203, as well as Val205, facing the chromophore phenolic ring are chiefly responsible for the observed yellow emission of phiYFPv at 537 nm. Structure-based site-directed mutagenesis has been used to identify the key functional residues in the chromophore environment. The obtained results have been utilized to improve the properties of phiYFPv and its homologous monomeric biomarker tagYFP.

  11. Structure-sequence based analysis for identification of conserved regions in proteins

    DOEpatents

    Zemla, Adam T; Zhou, Carol E; Lam, Marisa W; Smith, Jason R; Pardes, Elizabeth

    2013-05-28

    Disclosed are computational methods, and associated hardware and software products, for scoring conservation in a protein structure based on a computationally identified family or cluster of protein structures. A method of computationally identifying a family or cluster of protein structures is also disclosed herein.

  12. Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields.

    PubMed

    Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo

    2016-01-11

    Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.
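
    The Q3 and Q8 figures quoted in this abstract are per-residue accuracies over three and eight secondary-structure states. A minimal sketch of how such scores are computed follows. The eight-to-three reduction used here (H, G, I to helix; E, B to strand; everything else to coil) is one common convention and is an assumption, not necessarily the mapping used by DeepCNF; the function names and example strings are hypothetical.

```python
# One common reduction of DSSP eight-state codes to three states (assumed).
DSSP8_TO_3 = {
    "H": "H", "G": "H", "I": "H",   # helices
    "E": "E", "B": "E",             # strands and bridges
    "T": "C", "S": "C", "-": "C",   # turns, bends and coil
}


def to_three_state(ss8):
    """Map an eight-state secondary-structure string to three states."""
    return "".join(DSSP8_TO_3[code] for code in ss8)


def q_accuracy(predicted, observed):
    """Fraction of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("strings must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical ten-residue example.
observed8 = "HHGGEEBTS-"
predicted8 = "HHHHEEETTT"

q8 = q_accuracy(predicted8, observed8)
q3 = q_accuracy(to_three_state(predicted8), to_three_state(observed8))
print(q8, q3)
```

    The example shows why Q3 is always at least as high as Q8 for the same prediction: confusions within a class, such as G versus H, count as errors for Q8 but vanish after the reduction to three states.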

  13. Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields

    NASA Astrophysics Data System (ADS)

    Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo

    2016-01-01

    Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.

  14. G-protein-coupled receptor structures were not built in a day.

    PubMed

    Blois, Tracy M; Bowie, James U

    2009-07-01

    Among the most exciting recent developments in structural biology is the structure determination of G-protein-coupled receptors (GPCRs), which comprise the largest class of membrane proteins in mammalian cells and have enormous importance for disease and drug development. The GPCR structures are perhaps the most visible examples of a nascent revolution in membrane protein structure determination. Like other major milestones in science, however, such as the sequencing of the human genome, these achievements were built on a hidden foundation of technological developments. Here, we describe some of the methods that are fueling the membrane protein structure revolution and have enabled the determination of the current GPCR structures, along with new techniques that may lead to future structures.

  15. Sequence repeats and protein structure

    NASA Astrophysics Data System (ADS)

    Hoang, Trinh X.; Trovato, Antonio; Seno, Flavio; Banavar, Jayanth R.; Maritan, Amos

    2012-11-01

    Repeats are frequently found in known protein sequences. The level of sequence conservation in tandem repeats correlates with their propensities to be intrinsically disordered. We employ a coarse-grained model of a protein with a two-letter amino acid alphabet, hydrophobic (H) and polar (P), to examine the sequence-structure relationship in the realm of repeated sequences. A fraction of repeated sequences comprises a distinct class of bad folders, whose folding temperatures are much lower than those of random sequences. Imperfection in sequence repetition improves the folding properties of the bad folders while deteriorating those of the good folders. Our results may explain why nature has utilized repeated sequences for their versatility and especially to design functional proteins that are intrinsically unstructured at physiological temperatures.

  16. Heat-induced Protein Structure and Subfractions in Relation to Protein Degradation Kinetics and Intestinal Availability in Dairy Cattle

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Doiron, K.; Yu, P; McKinnon, J

    2009-01-01

    The objectives of this study were to reveal protein structures of feed tissues affected by heat processing at a cellular level, using the synchrotron-based Fourier transform infrared microspectroscopy as a novel approach, and quantify protein structure in relation to protein digestive kinetics and nutritive value in the rumen and intestine in dairy cattle. The parameters assessed included (1) protein structure α-helix to β-sheet ratio; (2) protein subfractions profiles; (3) protein degradation kinetics and effective degradability; (4) predicted nutrient supply using the intestinally absorbed protein supply (DVE)/degraded protein balance (OEB) system for dairy cattle. In this study, Vimy flaxseed protein was used as a model feed protein and was autoclave-heated at 120 °C for 20, 40, and 60 min in treatments T1, T2, and T3, respectively. The results showed that synchrotron-based Fourier transform infrared microspectroscopy revealed and identified the heat-induced protein structure changes. Heating at 120 °C for 40 and 60 min increased the protein structure α-helix to β-sheet ratio. There were linear effects of heating time on the ratio. The heating also changed chemical profiles, which showed soluble CP decreased upon heating with concomitant increases in nonprotein nitrogen, neutral, and acid detergent insoluble nitrogen. The protein subfractions with the greatest changes were PB1, which showed a dramatic reduction, and PB2, which showed a dramatic increase, demonstrating a decrease in overall protein degradability. In situ results showed a reduction in rumen-degradable protein and in rumen-degradable dry matter without differences between the treatments. Intestinal digestibility, determined using a 3-step in vitro procedure, showed no changes to rumen undegradable protein. Modeling results showed that heating increased total intestinally absorbable protein (feed DVE value) and decreased degraded protein balance (feed OEB value), but there were no

  17. Structure and assembly of scalable porous protein cages

    NASA Astrophysics Data System (ADS)

    Sasaki, Eita; Böhringer, Daniel; van de Waterbeemd, Michiel; Leibundgut, Marc; Zschoche, Reinhard; Heck, Albert J. R.; Ban, Nenad; Hilvert, Donald

    2017-03-01

    Proteins that self-assemble into regular shell-like polyhedra are useful, both in nature and in the laboratory, as molecular containers. Here we describe cryo-electron microscopy (EM) structures of two versatile encapsulation systems that exploit engineered electrostatic interactions for cargo loading. We show that increasing the number of negative charges on the lumenal surface of lumazine synthase, a protein that naturally assembles into a ~1-MDa dodecahedron composed of 12 pentamers, induces stepwise expansion of the native protein shell, giving rise to thermostable ~3-MDa and ~6-MDa assemblies containing 180 and 360 subunits, respectively. Remarkably, these expanded particles assume unprecedented tetrahedrally and icosahedrally symmetric structures constructed entirely from pentameric units. Large keyhole-shaped pores in the shell, not present in the wild-type capsid, enable diffusion-limited encapsulation of complementarily charged guests. The structures of these supercharged assemblies demonstrate how programmed electrostatic effects can be effectively harnessed to tailor the architecture and properties of protein cages.

  18. Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field.

    PubMed

    Xu, Dong; Zhang, Yang

    2012-07-01

    Ab initio protein folding is one of the major unsolved problems in computational biology owing to the difficulties in force field design and conformational search. We developed a novel program, QUARK, for template-free protein structure prediction. Query sequences are first broken into fragments of 1-20 residues where multiple fragment structures are retrieved at each position from unrelated experimental structures. Full-length structure models are then assembled from fragments using replica-exchange Monte Carlo simulations, which are guided by a composite knowledge-based force field. A number of novel energy terms and Monte Carlo movements are introduced and the particular contributions to enhancing the efficiency of both force field and search engine are analyzed in detail. The QUARK prediction procedure is depicted and tested on the structure modeling of 145 nonhomologous proteins. Although no global templates are used and all fragments from experimental structures with template modeling score >0.5 are excluded, QUARK can successfully construct 3D models of correct folds in one-third of cases for short proteins up to 100 residues. In the ninth community-wide Critical Assessment of protein Structure Prediction experiment, the QUARK server outperformed the second and third best servers by 18 and 47% based on the cumulative Z-score of global distance test-total scores in the FM category. Although ab initio protein folding remains a significant challenge, these data demonstrate new progress toward the solution of the most important problem in the field. Copyright © 2012 Wiley Periodicals, Inc.

  19. Dynamic protein interaction networks and new structural paradigms in signaling

    PubMed Central

    Csizmok, Veronika; Follis, Ariele Viacava; Kriwacki, Richard W.; Forman-Kay, Julie D.

    2017-01-01

    Understanding signaling and other complex biological processes requires elucidating the critical roles of intrinsically disordered proteins and regions (IDPs/IDRs), which represent ~30% of the proteome and enable unique regulatory mechanisms. In this review we describe the structural heterogeneity of disordered proteins that underpins these mechanisms and the latest progress in obtaining structural descriptions of ensembles of disordered proteins that are needed for linking structure and dynamics to function. We describe the diverse interactions of IDPs that can have unusual characteristics such as “ultrasensitivity” and “regulated folding and unfolding”. We also summarize the mounting data showing that large-scale assembly and protein phase separation occurs within a variety of signaling complexes and cellular structures. In addition, we discuss efforts to therapeutically target disordered proteins with small molecules. Overall, we interpret the remodeling of disordered state ensembles due to binding and post-translational modifications within an expanded framework for allostery that provides significant insights into how disordered proteins transmit biological information. PMID:26922996

  20. VoroMQA: Assessment of protein structure quality using interatomic contact areas.

    PubMed

    Olechnovič, Kliment; Venclovas, Česlovas

    2017-06-01

    In the absence of experimentally determined protein structure, many biological questions can be addressed using computational structural models. However, the utility of protein structural models depends on their quality. Therefore, the estimation of the quality of predicted structures is an important problem. One of the approaches to this problem is the use of knowledge-based statistical potentials. Such methods typically rely on the statistics of distances and angles of residue-residue or atom-atom interactions collected from experimentally determined structures. Here, we present VoroMQA (Voronoi tessellation-based Model Quality Assessment), a new method for the estimation of protein structure quality. Our method combines the idea of statistical potentials with the use of interatomic contact areas instead of distances. Contact areas, derived using Voronoi tessellation of protein structure, are used to describe and seamlessly integrate both explicit interactions between protein atoms and implicit interactions of protein atoms with solvent. VoroMQA produces scores at atomic, residue, and global levels, all in the fixed range from 0 to 1. The method was tested on the CASP data and compared to several other single-model quality assessment methods. VoroMQA showed strong performance in the recognition of the native structure and in the structural model selection tests, thus demonstrating the efficacy of interatomic contact areas in estimating protein structure quality. The software implementation of VoroMQA is freely available as a standalone application and as a web server at http://bioinformatics.lt/software/voromqa. Proteins 2017; 85:1131-1145. © 2017 Wiley Periodicals, Inc.

  1. Phylogenetic continuum indicates "galaxies" in the protein universe: preliminary results on the natural group structures of proteins.

    PubMed

    Ladunga, I

    1992-04-01

    The markedly nonuniform, even systematic distribution of sequences in the protein "universe" has been analyzed by methods of protein taxonomy. Mapping of the natural hierarchical system of proteins has revealed some dense cores, i.e., well-defined clusterings of proteins that seem to be natural structural groupings, possibly seeds for a future protein taxonomy. The aim was not to force proteins into more or less man-made categories by discriminant analysis, but to find structurally similar groups, possibly of common evolutionary origin. Single-valued distance measures between pairs of superfamilies from the Protein Identification Resource were defined by two χ²-like methods on tripeptide frequencies and the variable-length subsequence identity method derived from dot-matrix comparisons. Distance matrices were processed by several methods of cluster analysis to detect phylogenetic continuum between highly divergent proteins. Only well-defined clusters characterized by relatively unique structural, intracellular environmental, organismal, and functional attribute states were selected as major protein groups, including subsets of viral and Escherichia coli proteins, hormones, inhibitors, plant, ribosomal, serum and structural proteins, amino acid synthases, and clusters dominated by certain oxidoreductases and apolar and DNA-associated enzymes. The limited repertoire of functional patterns due to small genome size, the high rate of recombination, specific features of the bacterial membranes, or of the virus cycle canalize certain proteins of viruses and Gram-negative bacteria, respectively, to organismal groups.

  2. Protein structure based prediction of catalytic residues.

    PubMed

    Fajardo, J Eduardo; Fiser, Andras

    2013-02-22

    Worldwide structural genomics projects continue to release new protein structures at an unprecedented pace, so far nearly 6000, but only about 60% of these proteins have any sort of functional annotation. We explored a range of features that can be used for the prediction of functional residues given a known three-dimensional structure. These features include various centrality measures of nodes in graphs of interacting residues: closeness, betweenness and page-rank centrality. We also analyzed the distance of functional amino acids to the general center of mass (GCM) of the structure, relative solvent accessibility (RSA), and the use of relative entropy as a measure of sequence conservation. From the selected features, neural networks were trained to identify catalytic residues. We found that using distance to the GCM together with amino acid type provides a good discriminant function, when combined independently with sequence conservation. Using an independent test set of 29 annotated protein structures, the method returned 411 of the initial 9262 residues as the most likely to be involved in function. The output 411 residues contain 70 of the annotated 111 catalytic residues. This represents an approximately 14-fold enrichment of catalytic residues on the entire input set (corresponding to a sensitivity of 63% and a precision of 17%), a performance competitive with that of other state-of-the-art methods. We found that several of the graph-based measures utilize the same underlying feature of protein structures, which can be simply and more effectively captured with the distance to GCM definition. This also has the added advantage of simplicity and easy implementation. Meanwhile, sequence conservation remains by far the most influential feature in identifying functional residues. We also found that, due to the rapid changes in size and composition of sequence databases, conservation calculations must be recalibrated for specific reference databases.
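
    The sensitivity, precision, and fold-enrichment figures in this abstract follow directly from the four counts it reports. The short Python sketch below reproduces that arithmetic; the function and variable names are illustrative and not taken from the paper.

```python
def enrichment_summary(total_residues, annotated, selected, hits):
    """Return (sensitivity, precision, fold enrichment) from raw counts.

    total_residues: all residues in the test set
    annotated:      residues annotated as catalytic
    selected:       residues returned by the predictor
    hits:           selected residues that are annotated as catalytic
    """
    sensitivity = hits / annotated             # share of catalytic residues found
    precision = hits / selected                # share of predictions that are correct
    background_rate = annotated / total_residues
    fold_enrichment = precision / background_rate
    return sensitivity, precision, fold_enrichment


# Counts quoted in the abstract: 9262 residues, 111 catalytic, 411 returned, 70 correct.
sensitivity, precision, fold = enrichment_summary(9262, 111, 411, 70)
print(f"sensitivity={sensitivity:.0%} precision={precision:.0%} enrichment={fold:.1f}x")
```

    With the reported counts this gives a sensitivity of about 63%, a precision of about 17%, and an enrichment of roughly 14-fold over the background rate of 111 in 9262, consistent with the values stated in the abstract.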

  3. Structural stability of proteins in aqueous and nonpolar environments

    NASA Astrophysics Data System (ADS)

    Yasuda, Satoshi; Oshima, Hiraku; Kinoshita, Masahiro

    2012-10-01

    A protein folds into its native structure with the α-helix and/or β-sheet in aqueous solution under the physiological condition. The relative content of these secondary structures largely varies from protein to protein. However, such structural variability is not exhibited in nonaqueous environment. For example, there is a strong trend that alcohol induces a protein to form α-helices, and many of the membrane proteins within the lipid bilayer consist of α-helices. Here we investigate the structural stability of proteins in aqueous and nonpolar environments using our recently developed free-energy function F = (Λ − TS)/(k_B T_0) = Λ/(k_B T_0) − S/k_B (T_0 = 298 K and the absolute temperature T is set at T_0), which is based on statistical thermodynamics. Λ/(k_B T_0) and S/k_B are the energetic and entropic components, respectively, and k_B is Boltzmann's constant. A smaller value of the positive quantity, −S, represents higher efficiency of the backbone and side-chain packing promoted by the entropic effect arising from the translational displacement of solvent molecules or the CH2, CH3, and CH groups which constitute nonpolar chains of lipid molecules. As for Λ, in aqueous solution, a transition to a more compact structure of a protein accompanies the break of protein-solvent hydrogen bonds: As the number of donors and acceptors buried without protein intramolecular hydrogen bonding increases, Λ becomes higher. In nonpolar solvent, lower Λ simply implies more intramolecular hydrogen bonds formed. We find the following. The α-helix and β-sheet are advantageous with respect to −S as well as Λ and are to be formed as much as possible. In aqueous solution, the solvent-entropy effect on the structural stability is so strong that the close packing of side chains is dominantly important, and the α-helix and β-sheet contents are judiciously adjusted to accomplish it. In nonpolar solvent, the solvent-entropy effect is substantially weaker than in aqueous solution. Λ is

  4. Systematic Comparison of Crystal and NMR Protein Structures Deposited in the Protein Data Bank

    PubMed Central

    Sikic, Kresimir; Tomic, Sanja; Carugo, Oliviero

    2010-01-01

    Nearly all the macromolecular three-dimensional structures deposited in Protein Data Bank were determined by either crystallographic (X-ray) or Nuclear Magnetic Resonance (NMR) spectroscopic methods. This paper reports a systematic comparison of the crystallographic and NMR results deposited in the files of the Protein Data Bank, in order to find out to what extent this information can be aggregated in bioinformatics. A non-redundant data set containing 109 NMR – X-ray structure pairs of nearly identical proteins was derived from the Protein Data Bank. A series of comparisons were performed, focusing on both global features and local details. It was observed that: (1) the RMSD values between NMR and crystal structures range from about 1.5 Å to about 2.5 Å; (2) the correlation between conformational deviations and residue type reveals that hydrophobic amino acids are more similar in crystal and NMR structures than hydrophilic amino acids; (3) the correlation between solvent accessibility of the residues and their conformational variability in solid state and in solution is relatively modest (correlation coefficient = 0.462); (4) beta strands on average match better between NMR and crystal structures than helices and loops; (5) conformational differences between loops are independent of crystal packing interactions in the solid state; (6) very seldom, side chains buried in the protein interior are observed to adopt different orientations in the solid state and in solution. PMID:21293729
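
    The RMSD values in point (1) measure the average displacement between equivalent atoms in two structures. A minimal Python sketch of the calculation is given below. It assumes the two coordinate sets are already optimally superposed and list equivalent atoms in the same order; the abstract does not describe the superposition protocol, and the function name and toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (same units as the input, e.g. Å).

    Assumes both lists hold (x, y, z) tuples for equivalent atoms in the
    same order and that the structures have already been superposed.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must have the same number of atoms")
    if not coords_a:
        raise ValueError("coordinate sets must not be empty")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Toy example: every atom is shifted by exactly 2 Å along z.
model_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_b = [(0.0, 0.0, 2.0), (3.8, 0.0, 2.0), (7.6, 0.0, 2.0)]
toy_rmsd = rmsd(model_a, model_b)
```

    Because every atom in the toy example moves by the same 2 Å, the RMSD equals 2 Å; in real NMR versus crystal comparisons the per-atom deviations differ, and the RMSD is dominated by the largest ones.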

  5. A computational tool to predict the evolutionarily conserved protein-protein interaction hot-spot residues from the structure of the unbound protein.

    PubMed

    Agrawal, Neeraj J; Helk, Bernhard; Trout, Bernhardt L

    2014-01-21

    Identifying hot-spot residues - residues that are critical to protein-protein binding - can help to elucidate a protein's function and assist in designing therapeutic molecules to target those residues. We present a novel computational tool, termed spatial-interaction-map (SIM), to predict the hot-spot residues of an evolutionarily conserved protein-protein interaction from the structure of an unbound protein alone. SIM can predict the protein hot-spot residues with an accuracy of 36-57%. Thus, the SIM tool can be used to predict the yet unknown hot-spot residues for many proteins for which the structure of the protein-protein complexes are not available, thereby providing a clue to their functions and an opportunity to design therapeutic molecules to target these proteins. Copyright © 2013 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.

  6. Heterochiral Knottin Protein: Folding and Solution Structure.

    PubMed

    Mong, Surin K; Cochran, Frank V; Yu, Hongtao; Graziano, Zachary; Lin, Yu-Shan; Cochran, Jennifer R; Pentelute, Bradley L

    2017-10-31

    Homochirality is a general feature of biological macromolecules, and Nature includes few examples of heterochiral proteins. Herein, we report on the design, chemical synthesis, and structural characterization of heterochiral proteins possessing loops of amino acids of chirality opposite to that of the rest of a protein scaffold. Using the protein Ecballium elaterium trypsin inhibitor II, we discover that selective β-alanine substitution favors the efficient folding of our heterochiral constructs. Solution nuclear magnetic resonance spectroscopy of one such heterochiral protein reveals a homogeneous global fold. Additionally, steered molecular dynamics simulations indicate that β-alanine reduces the free energy required to fold the protein. We also find these heterochiral proteins to be more resistant to proteolysis than homochiral l-proteins. This work informs the design of heterochiral protein architectures containing stretches of both d- and l-amino acids.

  7. Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions.

    PubMed

    Krissinel, E; Henrick, K

    2004-12-01

    The present paper describes the SSM algorithm of protein structure comparison in three dimensions, which includes an original procedure of matching graphs built on the protein's secondary-structure elements, followed by an iterative three-dimensional alignment of protein backbone Calpha atoms. The SSM results are compared with those obtained from other protein comparison servers, and the advantages and disadvantages of different scores that are used for structure recognition are discussed. A new score, balancing the r.m.s.d. and alignment length Nalign, is proposed. It is found that different servers agree reasonably well on the new score, while showing considerable differences in r.m.s.d. and Nalign.

  8. Structural features that predict real-value fluctuations of globular proteins.

    PubMed

    Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke

    2012-05-01

    It is crucial to consider dynamics for understanding the biological function of proteins. We used a large number of molecular dynamics (MD) trajectories of nonhomologous proteins as references and examined static structural features of proteins that are most relevant to fluctuations. We examined correlation of individual structural features with fluctuations and further investigated effective combinations of features for predicting the real value of residue fluctuations using the support vector regression (SVR). It was found that some structural features have higher correlation than crystallographic B-factors with fluctuations observed in MD trajectories. Moreover, SVR that uses combinations of static structural features showed accurate prediction of fluctuations with an average Pearson's correlation coefficient of 0.669 and a root mean square error of 1.04 Å. This correlation coefficient is higher than the one observed in predictions by the Gaussian network model (GNM). An advantage of the developed method over the GNMs is that the former predicts the real value of fluctuation. The results help improve our understanding of relationships between protein structure and fluctuation. Furthermore, the developed method provides a convenient, practical way to predict fluctuations of proteins using easily computed static structural features of proteins. Copyright © 2012 Wiley Periodicals, Inc.
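
    The two evaluation figures quoted here, Pearson's correlation coefficient and root mean square error, capture different things: the first measures whether predicted and reference fluctuations rise and fall together, the second measures how far the predicted values are from the reference in Å. The Python sketch below computes both for a pair of per-residue profiles; the function names and toy numbers are illustrative assumptions, not the authors' code or data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted, reference):
    """Root mean square error, in the units of the inputs."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))


# Toy per-residue fluctuations in Å (made-up values).
reference = [1.0, 2.0, 3.0, 4.0]
predicted = [0.5, 1.0, 1.5, 2.0]   # correct shape, but half the amplitude

r_value = pearson_r(predicted, reference)
error = rmse(predicted, reference)
```

    In the toy case the correlation is perfect while the error is still above 1 Å, because the predicted profile has the right shape but the wrong scale. This is the distinction the abstract draws when it notes that the method predicts the real value of fluctuation, so both metrics are needed to assess it.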

  9. Structural features that predict real-value fluctuations of globular proteins

    PubMed Central

    Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke

    2012-01-01

    It is crucial to consider dynamics for understanding the biological function of proteins. We used a large number of molecular dynamics trajectories of non-homologous proteins as references and examined static structural features of proteins that are most relevant to fluctuations. We examined correlation of individual structural features with fluctuations and further investigated effective combinations of features for predicting the real-value of residue fluctuations using the support vector regression. It was found that some structural features have higher correlation than crystallographic B-factors with fluctuations observed in molecular dynamics trajectories. Moreover, support vector regression that uses combinations of static structural features showed accurate prediction of fluctuations with an average Pearson’s correlation coefficient of 0.669 and a root mean square error of 1.04 Å. This correlation coefficient is higher than the one observed for the prediction by the Gaussian network model. An advantage of the developed method over the Gaussian network models is that the former predicts the real-value of fluctuation. The results help improve our understanding of relationships between protein structure and fluctuation. Furthermore, the developed method provides a convenient, practical way to predict fluctuations of proteins using easily computed static structural features of proteins. PMID:22328193

  10. Protein Flexibility Facilitates Quaternary Structure Assembly and Evolution

    PubMed Central

    Marsh, Joseph A.; Teichmann, Sarah A.

    2014-01-01

    The intrinsic flexibility of proteins allows them to undergo large conformational fluctuations in solution or upon interaction with other molecules. Proteins also commonly assemble into complexes with diverse quaternary structure arrangements. Here we investigate how the flexibility of individual protein chains influences the assembly and evolution of protein complexes. We find that flexibility appears to be particularly conducive to the formation of heterologous (i.e., asymmetric) intersubunit interfaces. This leads to a strong association between subunit flexibility and homomeric complexes with cyclic and asymmetric quaternary structure topologies. Similarly, we also observe that the more nonhomologous subunits that assemble together within a complex, the more flexible those subunits tend to be. Importantly, these findings suggest that subunit flexibility should be closely related to the evolutionary history of a complex. We confirm this by showing that evolutionarily more recent subunits are generally more flexible than evolutionarily older subunits. Finally, we investigate the very different explorations of quaternary structure space that have occurred in different evolutionary lineages. In particular, the increased flexibility of eukaryotic proteins appears to enable the assembly of heteromeric complexes with more unique components. PMID:24866000

  11. Protein Delivery into Plant Cells: Toward In vivo Structural Biology

    PubMed Central

    Cedeño, Cesyen; Pauwels, Kris; Tompa, Peter

    2017-01-01

    Understanding the biologically relevant structural and functional behavior of proteins inside living plant cells is only possible through the combination of structural biology and cell biology. The state-of-the-art structural biology techniques are typically applied to molecules that are isolated from their native context. Although most experimental conditions can be easily controlled while dealing with an isolated, purified protein, a serious shortcoming of such in vitro work is that we cannot mimic the extremely complex intracellular environment in which the protein exists and functions. Therefore, it is highly desirable to investigate proteins in their natural habitat, i.e., within live cells. This is the major ambition of in-cell NMR, which aims to approach structure-function relationship under true in vivo conditions following delivery of labeled proteins into cells under physiological conditions. With a multidisciplinary approach that includes recombinant protein production, confocal fluorescence microscopy, nuclear magnetic resonance (NMR) spectroscopy and different intracellular protein delivery strategies, we explore the possibility to develop in-cell NMR studies in living plant cells. While we provide a comprehensive framework to set-up in-cell NMR, we identified the efficient intracellular introduction of isotope-labeled proteins as the major bottleneck. Based on experiments with the paradigmatic intrinsically disordered proteins (IDPs) Early Response to Dehydration protein 10 and 14, we also established the subcellular localization of ERD14 under abiotic stress. PMID:28469623

  12. Conditional protein splicing: a new tool to control protein structure and function in vitro and in vivo.

    PubMed

    Mootz, Henning D; Blum, Elyse S; Tyszkiewicz, Amy B; Muir, Tom W

    2003-09-03

    Protein splicing is a naturally occurring process in which an intervening intein domain excises itself out of a precursor polypeptide in an autocatalytic fashion with concomitant linkage of the two flanking extein sequences by a native peptide bond. We have recently reported an engineered split VMA intein whose splicing activity in trans between two polypeptides can be triggered by the small molecule rapamycin. In this report, we show that this conditional protein splicing (CPS) system can be used in mammalian cells. Two model constructs harboring maltose-binding protein (MBP) and a His-tag as exteins were expressed from a constitutive promoter after transient transfection. The splicing product MBP-His was detected by Western blotting and immunoprecipitation in cells treated with rapamycin or a nontoxic analogue thereof. No background splicing in the absence of the small-molecule inducer was observed over a 24-h time course. Product formation could be detected within 10 min of addition of rapamycin, indicating the advantage of the posttranslational nature of CPS for quick responses. The level of protein splicing was dose dependent and could be competitively attenuated with the small molecule ascomycin. In related studies, the geometric flexibility of the CPS components was investigated with a series of purified proteins. The FKBP and FRB domains, which are dimerized by rapamycin and thereby induce the reconstitution of the split intein, were fused to the extein sequences of the split intein halves. CPS was still triggered by rapamycin when FKBP and FRB occupied one or both of the extein positions. This finding suggests yet further applications of CPS in the area of proteomics. In summary, CPS holds great promise to become a powerful new tool to control protein structure and function in vitro and in living cells.

  13. Crystal structure of the protein At3g01520, a eukaryotic universal stress protein-like protein from Arabidopsis thaliana in complex with AMP.

    PubMed

    Kim, Do Jin; Bitto, Eduard; Bingman, Craig A; Kim, Hyun-Jung; Han, Byung Woo; Phillips, George N

    2015-07-01

    Members of the universal stress protein (USP) family are conserved in a phylogenetically diverse range of prokaryotes, fungi, protists, and plants and confer abilities to respond to a wide range of environmental stresses. Arabidopsis thaliana contains 44 USP domain-containing proteins, and USP domain is found either in a small protein with unknown physiological function or in an N-terminal portion of a multi-domain protein, usually a protein kinase. Here, we report the first crystal structure of a eukaryotic USP-like protein encoded from the gene At3g01520. The crystal structure of the protein At3g01520 was determined by the single-wavelength anomalous dispersion method and refined to an R factor of 21.8% (Rfree = 26.1%) at 2.5 Å resolution. The crystal structure includes three At3g01520 protein dimers with one AMP molecule bound to each protomer, comprising a Rossmann-like α/β overall fold. The bound AMP and conservation of residues in the ATP-binding loop suggest that the protein At3g01520 also belongs to the ATP-binding USP subfamily members. © 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.

  14. Modular prediction of protein structural classes from sequences of twilight-zone identity with predicting sequences

    PubMed Central

    2009-01-01

    Background Knowledge of structural class is used by numerous methods for identification of structural/functional characteristics of proteins and could be used for the detection of remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and which are designed for high identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences. Results The proposed MODular Approach to Structural class prediction (MODAS) method is unique as it allows for selection of any subset of the classes. MODAS is also the first to utilize a novel, custom-built feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes including conservation of residues and arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers that are tailored for each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets and comparison with over two dozens of modern competing predictors show that MODAS provides the best overall accuracy that ranges between 80% and 96.7% (83.5% for the twilight-zone datasets), depending on the dataset. This translates into 19% and 8% error rate reduction when compared against the best performing competing method on two largest datasets. The proposed predictor provides accurate predictions at 58% accuracy for membrane proteins class, which is not considered by majority of existing methods, in spite that this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes. Conclusions The improved predictions stem from the novel

  15. WeFold: A Coopetition for Protein Structure Prediction

    PubMed Central

    Khoury, George A.; Liwo, Adam; Khatib, Firas; Zhou, Hongyi; Chopra, Gaurav; Bacardit, Jaume; Bortot, Leandro O.; Faccioli, Rodrigo A.; Deng, Xin; He, Yi; Krupa, Pawel; Li, Jilong; Mozolewska, Magdalena A.; Sieradzan, Adam K.; Smadbeck, James; Wirecki, Tomasz; Cooper, Seth; Flatten, Jeff; Xu, Kefan; Baker, David; Cheng, Jianlin; Delbem, Alexandre C. B.; Floudas, Christodoulos A.; Keasar, Chen; Levitt, Michael; Popović, Zoran; Scheraga, Harold A.; Skolnick, Jeffrey; Crivelli, Silvia N.; Players, Foldit

    2014-01-01

    The protein structure prediction problem continues to elude scientists. Despite the introduction of many methods, only modest gains were made over the last decade for certain classes of prediction targets. To address this challenge, a social-media based worldwide collaborative effort, named WeFold, was undertaken by thirteen labs. During the collaboration, the labs were simultaneously competing with each other. Here, we present the first attempt at “coopetition” in scientific research applied to the protein structure prediction and refinement problems. The coopetition was possible by allowing the participating labs to contribute different components of their protein structure prediction pipelines and create new hybrid pipelines that they tested during CASP10. This manuscript describes both successes and areas needing improvement as identified throughout the first WeFold experiment and discusses the efforts that are underway to advance this initiative. A footprint of all contributions and structures are publicly accessible at http://www.wefold.org. PMID:24677212

  16. A generative, probabilistic model of local protein structure.

    PubMed

    Boomsma, Wouter; Mardia, Kanti V; Taylor, Charles C; Ferkinghoff-Borg, Jesper; Krogh, Anders; Hamelryck, Thomas

    2008-07-01

    Despite significant progress in recent years, protein structure prediction maintains its status as one of the prime unsolved problems in computational biology. One of the key remaining challenges is an efficient probabilistic exploration of the structural space that correctly reflects the relative conformational stabilities. Here, we present a fully probabilistic, continuous model of local protein structure in atomic detail. The generative model makes efficient conformational sampling possible and provides a framework for the rigorous analysis of local sequence-structure correlations in the native state. Our method represents a significant theoretical and practical improvement over the widely used fragment assembly technique by avoiding the drawbacks associated with a discrete and nonprobabilistic approach.

  17. Structure of the ordered hydration of amino acids in proteins: analysis of crystal structures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Biedermannová, Lada, E-mail: lada.biedermannova@ibt.cas.cz; Schneider, Bohdan

    2015-10-27

    The hydration of protein crystal structures was studied at the level of individual amino acids. The dependence of the number of water molecules and their preferred spatial localization on various parameters, such as solvent accessibility, secondary structure and side-chain conformation, was determined. Crystallography provides unique information about the arrangement of water molecules near protein surfaces. Using a nonredundant set of 2818 protein crystal structures with a resolution of better than 1.8 Å, the extent and structure of the hydration shell of all 20 standard amino-acid residues were analyzed as a function of the residue conformation, secondary structure and solvent accessibility. The results show how hydration depends on the amino-acid conformation and the environment in which it occurs. After conformational clustering of individual residues, the density distribution of water molecules was compiled and the preferred hydration sites were determined as maxima in the pseudo-electron-density representation of water distributions. Many hydration sites interact with both main-chain and side-chain amino-acid atoms, and several occurrences of hydration sites with less canonical contacts, such as carbon–donor hydrogen bonds, OH–π interactions and off-plane interactions with aromatic heteroatoms, are also reported. Information about the location and relative importance of the empirically determined preferred hydration sites in proteins has applications in improving the current methods of hydration-site prediction in molecular replacement, ab initio protein structure prediction and the set-up of molecular-dynamics simulations.

  18. Protein Structural Perturbation and Aggregation on Homogeneous Surfaces

    PubMed Central

    Sethuraman, Ananthakrishnan; Belfort, Georges

    2005-01-01

    We have demonstrated that globular proteins, such as hen egg lysozyme in phosphate buffered saline at room temperature, lose native structural stability and activity when adsorbed onto well-defined homogeneous solid surfaces. This structural loss is evidenced by an α-helix to turns/random transition during the first 30 min, followed by a slow α-helix to β-sheet transition. Increase in intramolecular and intermolecular β-sheet content suggests conformational rearrangement and aggregation between different protein molecules, respectively. Amide I band attenuated total reflection/Fourier transformed infrared (ATR/FTIR) spectroscopy was used to quantify the secondary structure content of lysozyme adsorbed on six different self-assembled alkanethiol monolayer surfaces with –CH3, –OPh, –CF3, –CN, –OCH3, and –OH exposed functional end groups. Activity measurements of adsorbed lysozyme were in good agreement with the structural perturbations. Both surface chemistry (type of functional groups, wettability) and adsorbate concentration (i.e., lateral interactions) are responsible for the observed structural changes during adsorption. A kinetic model is proposed to describe secondary structural changes that occur in two dynamic phases. The results presented in this article demonstrate the utility of the ATR/FTIR spectroscopic technique for in situ characterization of protein secondary structures during adsorption on flat surfaces. PMID:15542559

  19. Thermal green protein, an extremely stable, nonaggregating fluorescent protein created by structure-guided surface engineering

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Close, Devin W.; Paul, Craig Don; Langan, Patricia S.

    In this paper, we describe the engineering and X-ray crystal structure of Thermal Green Protein (TGP), an extremely stable, highly soluble, non-aggregating green fluorescent protein. TGP is a soluble variant of the fluorescent protein eCGP123, which despite being highly stable, has proven to be aggregation-prone. The X-ray crystal structure of eCGP123, also determined within the context of this paper, was used to carry out rational surface engineering to improve its solubility, leading to TGP. The approach involved simultaneously eliminating crystal lattice contacts while increasing the overall negative charge of the protein. Despite intentional disruption of lattice contacts and introduction of high entropy glutamate side chains, TGP crystallized readily in a number of different conditions and the X-ray crystal structure of TGP was determined to 1.9 Å resolution. The structural reasons for the enhanced stability of TGP and eCGP123 are discussed. We demonstrate the utility of using TGP as a fusion partner in various assays and significantly, in amyloid assays in which the standard fluorescent protein, EGFP, is undesirable because of aberrant oligomerization.

  20. Thermal green protein, an extremely stable, nonaggregating fluorescent protein created by structure-guided surface engineering

    DOE PAGES

    Close, Devin W.; Paul, Craig Don; Langan, Patricia S.; ...

    2015-05-08

    In this paper, we describe the engineering and X-ray crystal structure of Thermal Green Protein (TGP), an extremely stable, highly soluble, non-aggregating green fluorescent protein. TGP is a soluble variant of the fluorescent protein eCGP123, which despite being highly stable, has proven to be aggregation-prone. The X-ray crystal structure of eCGP123, also determined within the context of this paper, was used to carry out rational surface engineering to improve its solubility, leading to TGP. The approach involved simultaneously eliminating crystal lattice contacts while increasing the overall negative charge of the protein. Despite intentional disruption of lattice contacts and introduction of high entropy glutamate side chains, TGP crystallized readily in a number of different conditions and the X-ray crystal structure of TGP was determined to 1.9 Å resolution. The structural reasons for the enhanced stability of TGP and eCGP123 are discussed. We demonstrate the utility of using TGP as a fusion partner in various assays and significantly, in amyloid assays in which the standard fluorescent protein, EGFP, is undesirable because of aberrant oligomerization.

  1. Ab Initio Protein Structure Assembly Using Continuous Structure Fragments and Optimized Knowledge-based Force Field

    PubMed Central

    Xu, Dong; Zhang, Yang

    2012-01-01

    Ab initio protein folding is one of the major unsolved problems in computational biology due to the difficulties in force field design and conformational search. We developed a novel program, QUARK, for template-free protein structure prediction. Query sequences are first broken into fragments of 1–20 residues where multiple fragment structures are retrieved at each position from unrelated experimental structures. Full-length structure models are then assembled from fragments using replica-exchange Monte Carlo simulations, which are guided by a composite knowledge-based force field. A number of novel energy terms and Monte Carlo movements are introduced and the particular contributions to enhancing the efficiency of both force field and search engine are analyzed in detail. QUARK prediction procedure is depicted and tested on the structure modeling of 145 non-homologous proteins. Although no global templates are used and all fragments from experimental structures with template modeling score (TM-score) >0.5 are excluded, QUARK can successfully construct 3D models of correct folds in 1/3 cases of short proteins up to 100 residues. In the ninth community-wide Critical Assessment of protein Structure Prediction (CASP9) experiment, QUARK server outperformed the second and third best servers by 18% and 47% based on the cumulative Z-score of global distance test-total (GDT-TS) scores in the free modeling (FM) category. Although ab initio protein folding remains a significant challenge, these data demonstrate new progress towards the solution of the most important problem in the field. PMID:22411565
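
    The replica-exchange Monte Carlo search mentioned in this abstract can be illustrated with the textbook swap criterion between two temperature replicas. This is a minimal sketch only: the function name `swap_probability`, the reduced units (Boltzmann constant set to 1) and the example energies are assumptions for illustration, and nothing here is taken from the QUARK implementation, whose energy terms and move set the abstract does not specify.

```python
import math


def swap_probability(energy_i, energy_j, temp_i, temp_j, k_boltzmann=1.0):
    """Metropolis acceptance probability for exchanging two replicas.

    Replica i currently holds a conformation with energy_i at temp_i,
    replica j holds energy_j at temp_j. The standard criterion is
    min(1, exp((beta_i - beta_j) * (energy_i - energy_j))).
    """
    if temp_i <= 0 or temp_j <= 0:
        raise ValueError("temperatures must be positive")
    beta_i = 1.0 / (k_boltzmann * temp_i)
    beta_j = 1.0 / (k_boltzmann * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # Guard the exponential: any non-negative delta is always accepted.
    if delta >= 0:
        return 1.0
    return math.exp(delta)


# Hypothetical energies in reduced units (illustration only).
p_downhill = swap_probability(10.0, 5.0, temp_i=1.0, temp_j=2.0)
p_uphill = swap_probability(5.0, 10.0, temp_i=1.0, temp_j=2.0)
```

    In this sketch a swap that moves the lower-energy conformation to the colder replica is always accepted, while the reverse swap is accepted only with Boltzmann-weighted probability, which is what lets conformations trapped at low temperature escape through the hotter replicas.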

  2. Structure of Haze Forming Proteins in White Wines: Vitis vinifera Thaumatin-Like Proteins

    PubMed Central

    Marangon, Matteo; Van Sluyter, Steven C.; Waters, Elizabeth J.; Menz, Robert I.

    2014-01-01

    Grape thaumatin-like proteins (TLPs) play roles in plant-pathogen interactions and can cause protein haze in white wine unless removed prior to bottling. Different isoforms of TLPs have different hazing potential and aggregation behavior. Here we present the elucidation of the molecular structures of three grape TLPs that display different hazing potential. The three TLPs have very similar structures despite belonging to two different classes (F2/4JRU is a thaumatin-like protein while I/4L5H and H2/4MBT are VVTL1), and having different unfolding temperatures (56 vs. 62°C), with protein F2/4JRU being heat unstable and forming haze, while I/4L5H does not. These differences in properties are attributable to the conformation of a single loop and the amino acid composition of its flanking regions. PMID:25463627

  3. Structure of haze forming proteins in white wines: Vitis vinifera thaumatin-like proteins.

    PubMed

    Marangon, Matteo; Van Sluyter, Steven C; Waters, Elizabeth J; Menz, Robert I

    2014-01-01

    Grape thaumatin-like proteins (TLPs) play roles in plant-pathogen interactions and can cause protein haze in white wine unless removed prior to bottling. Different isoforms of TLPs have different hazing potential and aggregation behavior. Here we present the elucidation of the molecular structures of three grape TLPs that display different hazing potential. The three TLPs have very similar structures despite belonging to two different classes (F2/4JRU is a thaumatin-like protein while I/4L5H and H2/4MBT are VVTL1), and having different unfolding temperatures (56 vs. 62°C), with protein F2/4JRU being heat unstable and forming haze, while I/4L5H does not. These differences in properties are attributable to the conformation of a single loop and the amino acid composition of its flanking regions.

  4. Statistical mechanics of protein structural transitions: Insights from the island model

    PubMed Central

    Kobayashi, Yukio

    2016-01-01

    The so-called island model of protein structural transition holds that hydrophobic interactions are the key to both the folding and function of proteins. Herein, the genesis and statistical mechanical basis of the island model of transitions are reviewed, by presenting the results of simulations of such transitions. Elucidating the physicochemical mechanism of protein structural formation is the foundation for understanding the hierarchical structure of life at the microscopic level. Based on the results obtained to date using the island model, remaining problems and future work in the field of protein structures are discussed, referencing Professor Saitô’s views on the hierarchic structure of science. PMID:28409078

  5. Tuning of protein-surfactant interaction to modify the resultant structure.

    PubMed

    Mehan, Sumit; Aswal, Vinod K; Kohlbrecher, Joachim

    2015-09-01

    Small-angle neutron scattering and dynamic light scattering studies have been carried out to examine the interaction of bovine serum albumin (BSA) protein with different surfactants under varying solution conditions. We show that the interaction of anionic BSA protein (pH 7) with surfactant and the resultant structure are strongly modified by the charge head group of the surfactant, ionic strength of the solution, and mixed surfactants. The protein-surfactant interaction is maximum when two components are oppositely charged, followed by components being similarly charged through the site-specific binding, and no interaction in the case of a nonionic surfactant. This interaction of protein with ionic surfactants is characterized by the fractal structure representing a bead-necklace structure of micellelike clusters adsorbed along the unfolded protein chain. The interaction is enhanced with ionic strength only in the case of site-specific binding of an anionic surfactant with an anionic protein, whereas it is almost unchanged for other complexes of cationic and nonionic surfactants with anionic proteins. Interestingly, the interaction of BSA protein with ionic surfactants is significantly suppressed in the presence of nonionic surfactant. These results with mixed surfactants thus can be used to fold back the unfolded protein as well as to prevent surfactant-induced protein unfolding. For different solution conditions, the results are interpreted in terms of a change in fractal dimension, the overall size of the protein-surfactant complex, and the number of micelles attached to the protein. The interplay of electrostatic and hydrophobic interactions is found to govern the resultant structure of complexes.

  6. Tuning of protein-surfactant interaction to modify the resultant structure

    NASA Astrophysics Data System (ADS)

    Mehan, Sumit; Aswal, Vinod K.; Kohlbrecher, Joachim

    2015-09-01

    Small-angle neutron scattering and dynamic light scattering studies have been carried out to examine the interaction of bovine serum albumin (BSA) protein with different surfactants under varying solution conditions. We show that the interaction of anionic BSA protein (pH 7) with surfactant and the resultant structure are strongly modified by the charge head group of the surfactant, ionic strength of the solution, and mixed surfactants. The protein-surfactant interaction is maximum when two components are oppositely charged, followed by components being similarly charged through the site-specific binding, and no interaction in the case of a nonionic surfactant. This interaction of protein with ionic surfactants is characterized by the fractal structure representing a bead-necklace structure of micellelike clusters adsorbed along the unfolded protein chain. The interaction is enhanced with ionic strength only in the case of site-specific binding of an anionic surfactant with an anionic protein, whereas it is almost unchanged for other complexes of cationic and nonionic surfactants with anionic proteins. Interestingly, the interaction of BSA protein with ionic surfactants is significantly suppressed in the presence of nonionic surfactant. These results with mixed surfactants thus can be used to fold back the unfolded protein as well as to prevent surfactant-induced protein unfolding. For different solution conditions, the results are interpreted in terms of a change in fractal dimension, the overall size of the protein-surfactant complex, and the number of micelles attached to the protein. The interplay of electrostatic and hydrophobic interactions is found to govern the resultant structure of complexes.

  7. Structure refinement of membrane proteins via molecular dynamics simulations.

    PubMed

    Dutagaci, Bercem; Heo, Lim; Feig, Michael

    2018-07-01

    A refinement protocol based on physics-based techniques established for water soluble proteins is tested for membrane protein structures. Initial structures were generated by homology modeling and sampled via molecular dynamics simulations in explicit lipid bilayer and aqueous solvent systems. Snapshots from the simulations were selected based on scoring with either knowledge-based or implicit membrane-based scoring functions and averaged to obtain refined models. The protocol resulted in consistent and significant refinement of the membrane protein structures similar to the performance of refinement methods for soluble proteins. Refinement success was similar between sampling in the presence of lipid bilayers and aqueous solvent but the presence of lipid bilayers may benefit the improvement of lipid-facing residues. Scoring with knowledge-based functions (DFIRE and RWplus) was found to be as good as scoring using implicit membrane-based scoring functions suggesting that differences in internal packing is more important than orientations relative to the membrane during the refinement of membrane protein homology models. © 2018 Wiley Periodicals, Inc.

  8. The Potato leafroll virus structural proteins manipulate overlapping, yet distinct protein interaction networks during infection

    USDA-ARS's Scientific Manuscript database

    Potato leafroll virus (PLRV) produces a readthrough protein (RTP) via translational readthrough of the coat protein amber stop codon. The RTP functions as a structural component of the virion and as a non-incorporated protein in concert with numerous insect and plant proteins to regulate virus movem...

  9. Pre-calculated protein structure alignments at the RCSB PDB website.

    PubMed

    Prlic, Andreas; Bliven, Spencer; Rose, Peter W; Bluhm, Wolfgang F; Bizon, Chris; Godzik, Adam; Bourne, Philip E

    2010-12-01

    With the continuous growth of the RCSB Protein Data Bank (PDB), providing an up-to-date systematic structure comparison of all protein structures poses an ever growing challenge. Here, we present a comparison tool for calculating both 1D protein sequence and 3D protein structure alignments. This tool supports various applications at the RCSB PDB website. First, a structure alignment web service calculates pairwise alignments. Second, a stand-alone application runs alignments locally and visualizes the results. Third, pre-calculated 3D structure comparisons for the whole PDB are provided and updated on a weekly basis. These three applications allow users to discover novel relationships between proteins available either at the RCSB PDB or provided by the user. A web user interface is available at http://www.rcsb.org/pdb/workbench/workbench.do. The source code is available under the LGPL license from http://www.biojava.org. A source bundle, prepared for local execution, is available from http://source.rcsb.org. Contact: andreas@sdsc.edu; pbourne@ucsd.edu.

  10. Overcoming bottlenecks in the membrane protein structural biology pipeline.

    PubMed

    Hardy, David; Bill, Roslyn M; Jawhari, Anass; Rothnie, Alice J

    2016-06-15

    Membrane proteins account for a third of the eukaryotic proteome, but are greatly under-represented in the Protein Data Bank. Unfortunately, recent technological advances in X-ray crystallography and EM cannot account for the poor solubility and stability of membrane protein samples. A limitation of conventional detergent-based methods is that detergent molecules destabilize membrane proteins, leading to their aggregation. The use of orthologues, mutants and fusion tags has helped improve protein stability, but at the expense of not working with the sequence of interest. Novel detergents such as glucose neopentyl glycol (GNG), maltose neopentyl glycol (MNG) and calixarene-based detergents can improve protein stability without compromising their solubilizing properties. Styrene maleic acid lipid particles (SMALPs) focus on retaining the native lipid bilayer of a membrane protein during purification and biophysical analysis. Overcoming bottlenecks in the membrane protein structural biology pipeline, primarily by maintaining protein stability, will facilitate the elucidation of many more membrane protein structures in the near future. © 2016 The Author(s). published by Portland Press Limited on behalf of the Biochemical Society.

  11. Protein local structure alignment under the discrete Fréchet distance.

    PubMed

    Zhu, Binhai

    2007-12-01

    Protein structure alignment is a fundamental problem in computational and structural biology. While there have been many experimental/heuristic methods and empirical results, very few results are known regarding the algorithmic/complexity aspects of the problem, especially on protein local structure alignment. A well-known measure to characterize the similarity of two polygonal chains is the Fréchet distance, and in protein-related research a related discrete Fréchet distance has been used recently. In this paper, following the recent work of Jiang et al., we investigate the protein local structural alignment problem using bounded discrete Fréchet distance. Given m proteins (or protein backbones, which are 3D polygonal chains), each of length O(n), our main results are summarized as follows:
    * If the number of proteins, m, is not part of the input, then the problem is NP-complete; moreover, under bounded discrete Fréchet distance it is NP-hard to approximate the maximum size common local structure within a factor of n^(1-ε). These results hold both when all the proteins are static and when translation/rotation are allowed.
    * If the number of proteins, m, is a constant, then there is a polynomial time solution for the problem.
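
    The discrete Fréchet distance that this abstract builds on can be computed for two static chains with a standard dynamic program over pairs of vertices. The sketch below is illustrative only: the function name `discrete_frechet` and the toy coordinates are assumptions, and it covers only the pairwise distance between two fixed chains, not the local alignment problem, the bounded variant, or the translation/rotation cases analysed in the paper.

```python
import math


def discrete_frechet(chain_p, chain_q):
    """Discrete Fréchet distance between two polygonal chains.

    Each chain is a non-empty sequence of 3D points (e.g. backbone atom
    coordinates). Runs in O(len(chain_p) * len(chain_q)) time.
    """
    n, m = len(chain_p), len(chain_q)
    if n == 0 or m == 0:
        raise ValueError("both chains must contain at least one point")
    # table[i][j] = distance for the prefixes chain_p[:i+1], chain_q[:j+1]
    table = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(chain_p[i], chain_q[j])
            if i == 0 and j == 0:
                table[i][j] = d
            elif i == 0:
                table[i][j] = max(table[0][j - 1], d)
            elif j == 0:
                table[i][j] = max(table[i - 1][0], d)
            else:
                best_previous = min(table[i - 1][j],
                                    table[i - 1][j - 1],
                                    table[i][j - 1])
                table[i][j] = max(best_previous, d)
    return table[n - 1][m - 1]


# Toy chains (hypothetical coordinates): two parallel segments 1 unit apart.
chain_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
chain_b = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (2.0, 1.0, 0.0)]
distance_ab = discrete_frechet(chain_a, chain_b)
```

    Each table entry takes the larger of the current vertex-pair distance and the cheapest of the three ways to have reached that pair, which is the usual "shortest leash" reading of the measure restricted to chain vertices.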

  12. A scoring function based on solvation thermodynamics for protein structure prediction

    PubMed Central

    Du, Shiqiao; Harano, Yuichi; Kinoshita, Masahiro; Sakurai, Minoru

    2012-01-01

    We predict protein structure using our recently developed free energy function for describing protein stability, which is focused on solvation thermodynamics. The function is combined with the current most reliable sampling methods, i.e., fragment assembly (FA) and comparative modeling (CM). The prediction is tested using 11 small proteins for which high-resolution crystal structures are available. For 8 of these proteins, sequence similarities are found in the database, and the prediction is performed with CM. Fairly accurate models with average Cα root mean square deviation (RMSD) ∼ 2.0 Å are successfully obtained for all cases. For the rest of the target proteins, we perform the prediction following FA protocols. For 2 cases, we obtain predicted models with an RMSD ∼ 3.0 Å as the best-scored structures. For the other case, the RMSD remains larger than 7 Å. For all the 11 target proteins, our scoring function identifies the experimentally determined native structure as the best structure. Starting from the predicted structure, replica exchange molecular dynamics is performed to further refine the structures. However, we are unable to improve its RMSD toward the experimental structure. The exhaustive sampling by coarse-grained normal mode analysis around the native structures reveals that our function has a linear correlation with RMSDs < 3.0 Å. These results suggest that the function is quite reliable for the protein structure prediction while the sampling method remains one of the major limiting factors in it. The aspects through which the methodology could further be improved are discussed. PMID:27493529
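
    The Cα RMSD values quoted in this abstract follow the usual root-mean-square formula over matched atoms. The sketch below shows only that formula, under the loudly stated assumption that the two coordinate sets are already optimally superposed and listed in the same residue order; the abstract does not say how superposition was performed, and the function name `ca_rmsd` and the coordinates are hypothetical.

```python
import math


def ca_rmsd(model_coords, native_coords):
    """Root mean square deviation between two matched sets of Cα atoms.

    Assumes both inputs are equal-length sequences of (x, y, z) tuples
    that have ALREADY been superposed; no fitting is done here.
    """
    if len(model_coords) != len(native_coords):
        raise ValueError("coordinate sets must have the same length")
    if not model_coords:
        raise ValueError("at least one atom is required")
    squared_sum = sum(
        math.dist(a, b) ** 2 for a, b in zip(model_coords, native_coords)
    )
    return math.sqrt(squared_sum / len(model_coords))


# Hypothetical three-residue example in ångströms (illustration only).
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(0.0, 2.0, 0.0), (3.8, 2.0, 0.0), (7.6, 2.0, 0.0)]
rmsd_value = ca_rmsd(model, native)
```

    In practice a least-squares superposition step would normally precede this calculation; it is omitted here to keep the example dependency-free.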

  13. Compact structure and proteins of pasta retard in vitro digestive evolution of branched starch molecular structure.

    PubMed

    Zou, Wei; Sissons, Mike; Warren, Frederick J; Gidley, Michael J; Gilbert, Robert G

    2016-11-05

    The roles that the compact structure and proteins in pasta play in retarding evolution of starch molecular structure during in vitro digestion are explored, using four types of cooked samples: whole pasta, pasta powder, semolina (with proteins) and extracted starch without proteins. These were subjected to in vitro digestion with porcine α-amylase, collecting samples at different times and characterizing the weight distribution of branched starch molecules using size-exclusion chromatography. Measurement of α-amylase activity showed that a protein (or proteins) from semolina or pasta powder interacted with α-amylase, causing reduced enzymatic activity and retarding digestion of branched starch molecules with hydrodynamic radius (Rh) < 100 nm; this protein(s) was susceptible to proteolysis. Thus the compact structure of pasta protects the starch and proteins in the interior of the whole pasta, reducing the enzymatic degradation of starch molecules, especially for molecules with Rh > 100 nm. Copyright © 2016 Elsevier Ltd. All rights reserved.

  14. Automated structure determination of proteins with the SAIL-FLYA NMR method.

    PubMed

    Takeda, Mitsuhiro; Ikeya, Teppei; Güntert, Peter; Kainosho, Masatsune

    2007-01-01

    The labeling of proteins with stable isotopes enhances the NMR method for the determination of 3D protein structures in solution. Stereo-array isotope labeling (SAIL) provides an optimal stereospecific and regiospecific pattern of stable isotopes that yields sharpened lines, spectral simplification without loss of information, and the ability to collect rapidly and evaluate fully automatically the structural restraints required to solve a high-quality solution structure for proteins up to twice as large as those that can be analyzed using conventional methods. Here, we describe a protocol for the preparation of SAIL proteins by cell-free methods, including the preparation of S30 extract and their automated structure analysis using the FLYA algorithm and the program CYANA. Once efficient cell-free expression of the unlabeled or uniformly labeled target protein has been achieved, the NMR sample preparation of a SAIL protein can be accomplished in 3 d. A fully automated FLYA structure calculation can be completed in 1 d on a powerful computer system.

  15. Crystal structure of secretory protein Hcp3 from Pseudomonas aeruginosa.

    PubMed

    Osipiuk, Jerzy; Xu, Xiaohui; Cui, Hong; Savchenko, Alexei; Edwards, Aled; Joachimiak, Andrzej

    2011-03-01

    The Type VI secretion pathway transports proteins across the cell envelope of Gram-negative bacteria. Pseudomonas aeruginosa, an opportunistic Gram-negative bacterial pathogen infecting humans, uses the type VI secretion pathway to export specific effector proteins crucial for its pathogenesis. The HSI-I virulence locus encodes several proteins that have been proposed to participate in protein transport, including the Hcp1 protein, which forms hexameric rings that assemble into nanotubes in vitro. Two Hcp1 paralogues have been identified in the P. aeruginosa genome, Hsp2 and Hcp3. Here, we present the structure of the Hcp3 protein from P. aeruginosa. The overall structure of the monomer resembles Hcp1 despite the lack of amino-acid sequence similarity between the two proteins. The monomers assemble into hexamers similar to Hcp1. However, instead of forming nanotubes in head-to-tail mode like Hcp1, Hcp3 stacks its rings in head-to-head mode forming double-ring structures.

  16. The synthesis of recombinant membrane proteins in yeast for structural studies.

    PubMed

    Routledge, Sarah J; Mikaliunaite, Lina; Patel, Anjana; Clare, Michelle; Cartwright, Stephanie P; Bawa, Zharain; Wilks, Martin D B; Low, Floren; Hardy, David; Rothnie, Alice J; Bill, Roslyn M

    2016-02-15

    Historically, recombinant membrane protein production has been a major challenge meaning that many fewer membrane protein structures have been published than those of soluble proteins. However, there has been a recent, almost exponential increase in the number of membrane protein structures being deposited in the Protein Data Bank. This suggests that empirical methods are now available that can ensure the required protein supply for these difficult targets. This review focuses on methods that are available for protein production in yeast, which is an important source of recombinant eukaryotic membrane proteins. We provide an overview of approaches to optimize the expression plasmid, host cell and culture conditions, as well as the extraction and purification of functional protein for crystallization trials in preparation for structural studies. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  17. Structure and Sequence Search on Aptamer-Protein Docking

    NASA Astrophysics Data System (ADS)

    Xiao, Jiajie; Bonin, Keith; Guthold, Martin; Salsbury, Freddie

    2015-03-01

    Interactions between proteins and deoxyribonucleic acid (DNA) play a significant role in living systems, especially through gene regulation. However, short nucleic acid sequences (aptamers) with specific binding affinity to specific proteins exhibit clinical potential as therapeutics. Our capillary and gel electrophoresis selection experiments show that specific sequences of aptamers can be selected that bind specific proteins. Computationally, given the experimentally-determined structure and sequence of a thrombin-binding aptamer, we can successfully dock the aptamer onto thrombin in agreement with experimental structures of the complex. In order to further study the conformational flexibility of this thrombin-binding aptamer and to potentially develop a predictive computational model of aptamer-binding, we use GPU-enabled molecular dynamics simulations both to examine the conformational flexibility of the aptamer in the absence of binding to thrombin and to determine our ability to fold an aptamer. This study should help further de-novo predictions of aptamer sequences by enabling the study of structural and sequence-dependent effects on aptamer-protein docking specificity.

  18. Identification of structural domains in proteins by a graph heuristic.

    PubMed

    Wernisch, L; Hunting, M; Wodak, S J

    1999-05-15

    A novel automatic procedure for identifying domains from protein atomic coordinates is presented. The procedure, termed STRUDL (STRUctural Domain Limits), does not take into account information on secondary structures and handles any number of domains made up of contiguous or non-contiguous chain segments. The core algorithm uses the Kernighan-Lin graph heuristic to partition the protein into residue sets which display minimum interactions between them. These interactions are deduced from the weighted Voronoi diagram. The generated partitions are accepted or rejected on the basis of optimized criteria, representing basic expected physical properties of structural domains. The graph heuristic approach is shown to be very effective: it approximates closely the exact solution provided by a branch and bound algorithm for a number of test proteins. In addition, the overall performance of STRUDL is assessed on a set of 787 representative proteins from the Protein Data Bank by comparison to domain definitions in the CATH protein classification. The domains assigned by STRUDL agree with the CATH assignments in at least 81% of the tested proteins. This result is comparable to that obtained previously using PUU (Holm and Sander, Proteins 1994;9:256-268), the only other available algorithm designed to identify domains with any number of non-contiguous chain segments. A detailed discussion of the structures for which our assignments differ from those in CATH brings to light some clear inconsistencies between the concept of structural domains based on minimizing inter-domain interactions and that of delimiting structural motifs that represent acceptable folding topologies or architectures. Considering both concepts as complementary and combining them in a layered approach might be the way forward.
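
    The partitioning step described in this abstract can be illustrated with a minimal sketch of one Kernighan-Lin pass on a toy residue contact graph. This is not the STRUDL implementation: the contact weights below are invented for illustration (STRUDL derives them from a weighted Voronoi diagram), the function names are hypothetical, and the full heuristic would repeat such passes until no pass improves the cut.

```python
from itertools import product

def weight(w, i, j):
    """Symmetric lookup of the interaction weight between residues i and j."""
    return w.get((min(i, j), max(i, j)), 0)

def cut_weight(w, a, b):
    """Total interaction weight crossing the partition (a, b)."""
    return sum(weight(w, i, j) for i, j in product(a, b))

def d_value(w, v, own, other):
    """External minus internal interaction weight of residue v."""
    external = sum(weight(w, v, u) for u in other)
    internal = sum(weight(w, v, u) for u in own if u != v)
    return external - internal

def kernighan_lin_pass(w, a, b):
    """One Kernighan-Lin pass: tentatively swap the best pairs, keep the best prefix."""
    cur_a, cur_b = set(a), set(b)      # tentative partition
    free_a, free_b = set(a), set(b)    # residues not yet locked
    swaps, gains = [], []
    while free_a and free_b:
        best = None
        for x, y in product(sorted(free_a), sorted(free_b)):
            g = (d_value(w, x, cur_a, cur_b) + d_value(w, y, cur_b, cur_a)
                 - 2 * weight(w, x, y))
            if best is None or g > best[0]:
                best = (g, x, y)
        g, x, y = best
        cur_a.remove(x); cur_a.add(y)
        cur_b.remove(y); cur_b.add(x)
        free_a.remove(x); free_b.remove(y)
        swaps.append((x, y)); gains.append(g)
    # Keep only the prefix of swaps with the largest cumulative gain.
    best_k, best_sum, running = 0, 0, 0
    for k, g in enumerate(gains, start=1):
        running += g
        if running > best_sum:
            best_k, best_sum = k, running
    new_a, new_b = set(a), set(b)
    for x, y in swaps[:best_k]:
        new_a.remove(x); new_a.add(y)
        new_b.remove(y); new_b.add(x)
    return new_a, new_b

# Toy graph (assumed weights): two tightly packed triads joined by one weak contact.
contacts = {(0, 1): 10, (0, 2): 10, (1, 2): 10,
            (3, 4): 10, (3, 5): 10, (4, 5): 10,
            (2, 3): 1}
part_a, part_b = kernighan_lin_pass(contacts, {0, 1, 3}, {2, 4, 5})
print(sorted(part_a), sorted(part_b), cut_weight(contacts, part_a, part_b))
```

    Starting from a deliberately poor split, a single pass moves residues 2 and 3 to the other side, so that only the weak contact crosses the boundary between the two candidate domains.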

  19. Evaluation of variability in high-resolution protein structures by global distance scoring.

    PubMed

    Anzai, Risa; Asami, Yoshiki; Inoue, Waka; Ueno, Hina; Yamada, Koya; Okada, Tetsuji

    2018-01-01

    Systematic analysis of the statistical and dynamical properties of proteins is critical to understanding cellular events. Extraction of biologically relevant information from a set of high-resolution structures is important because it can provide mechanistic details behind the functional properties of protein families, enabling rational comparison between families. Most of the current structural comparisons are pairwise-based, which hampers the global analysis of increasing contents in the Protein Data Bank. Additionally, pairing of protein structures introduces uncertainty with respect to reproducibility because it frequently accompanies other settings for superimposition. This study introduces intramolecular distance scoring for the global analysis of proteins, for each of which at least several high-resolution structures are available. As a pilot study, we have tested 300 human proteins and shown that the method can be used to comprehensively overview advances in each protein and protein family at the atomic level. This method, together with the interpretation of the model calculations, provides new criteria for understanding specific structural variation in a protein, enabling global comparison of the variability in proteins from different species.

  20. Protein Secondary Structures (alpha-helix and beta-sheet) at a Cellular Level and Protein Fractions in Relation to Rumen Degradation Behaviours of Protein: A New Approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yu, P.

    2007-01-01

    Studying the secondary structure of proteins leads to an understanding of the components that make up a whole protein, and such an understanding of the structure of the whole protein is often vital to understanding its digestive behaviour and nutritive value in animals. The main protein secondary structures are the α-helix and β-sheet. The percentage of these two structures in protein secondary structures influences protein nutritive value, quality and digestive behaviour. A high percentage of β-sheet structure may partly cause a low access to gastrointestinal digestive enzymes, which results in a low protein value. The objectives of the present study were to use advanced synchrotron-based Fourier transform IR (S-FTIR) microspectroscopy as a new approach to reveal the molecular chemistry of the protein secondary structures of feed tissues affected by heat-processing within intact tissue at a cellular level, and to quantify protein secondary structures using multicomponent peak modelling Gaussian and Lorentzian methods, in relation to protein digestive behaviours and nutritive value in the rumen, which was determined using the Cornell Net Carbohydrate Protein System. The synchrotron-based molecular chemistry research experiment was performed at the National Synchrotron Light Source at Brookhaven National Laboratory, US Department of Energy. The results showed that, with S-FTIR microspectroscopy, the molecular chemistry, ultrastructural chemical make-up and nutritive characteristics could be revealed at a high ultraspatial resolution (≈10 μm). S-FTIR microspectroscopy revealed that the secondary structure of protein differed between raw and roasted golden flaxseeds in terms of the percentages and ratio of α-helixes and β-sheets in the mid-IR range at the cellular level. By using multicomponent peak modelling, the results show that the roasting reduced (P < 0.05) the percentage of α-helixes (from 47.1% to 36.1%: S

  1. Bio-Inspired Bright Structurally Colored Colloidal Amorphous Array Enhanced by Controlling Thickness and Black Background.

    PubMed

    Iwata, Masanori; Teshima, Midori; Seki, Takahiro; Yoshioka, Shinya; Takeoka, Yukikazu

    2017-07-01

    Inspired by Steller's jay, which displays angle-independent structural colors, angle-independent structurally colored materials are created, which are composed of amorphous arrays of submicrometer-sized fine spherical silica colloidal particles. When the colloidal amorphous arrays are thick, they do not appear colorful but almost white. However, the saturation of the structural color can be increased by (i) appropriately controlling the thickness of the array and (ii) placing the black background substrate. This is similar in the case of the blue feather of Steller's jay. Based on the knowledge gained through the biomimicry of structural colored materials, colloidal amorphous arrays on the surface of a black particle as the core particle are also prepared as colorful photonic pigments. Moreover, a structural color on-off system is successfully built by controlling the background brightness of the colloidal amorphous arrays. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  2. Co-operative intra-protein structural response due to protein-protein complexation revealed through thermodynamic quantification: study of MDM2-p53 binding

    NASA Astrophysics Data System (ADS)

    Samanta, Sudipta; Mukherjee, Sanchita

    2017-10-01

    The p53 protein activation protects the organism from propagation of cells with damaged DNA having oncogenic mutations. In normal cells, activity of p53 is controlled by interaction with MDM2. The well understood p53-MDM2 interaction facilitates design of ligands that could potentially disrupt or prevent the complexation owing to its emergence as an important objective for cancer therapy. However, thermodynamic quantification of the p53-peptide induced structural changes of the MDM2-protein remains an area to be explored. This study attempts to understand the conformational free energy and entropy costs due to this complex formation from the histograms of dihedral angles generated from molecular dynamics simulations. Residue-specific quantification illustrates that hydrophobic residues of the protein contribute most to the conformational thermodynamic changes. Thermodynamic quantification of structural changes of the protein reveals that p53 binding provides a source of inter-element cooperativity among the protein secondary structural elements, where the most affected structural elements (α2 and α4), found at the binding site of the protein, affect faraway structural elements (β1 and Loop1) of the protein. The communication perhaps involves water mediated hydrogen bonded network formation. Further, we infer that in the inhibitory F19A mutation of p53, though Phe19 is important in the recognition process, it contributes less prominently to the stability of the complex. Collectively, this study provides vivid microscopic understanding of the interaction within the protein complex along with exploring mutation sites, which will contribute further to engineer the protein function and binding affinity.
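
    As a rough illustration of how an entropy cost might be read off a dihedral-angle histogram, the sketch below applies the Gibbs-Shannon formula S/k_B = -Σ p_i ln p_i to binned angles. The abstract does not state the exact estimator, so the formula choice, the 10-degree bin width, the function name and the sample angles are all assumptions made for illustration.

```python
import math

def histogram_entropy(angles_deg, n_bins=36):
    """Entropy (in units of k_B) of a dihedral-angle sample, from its histogram.

    Angles are in degrees within [-180, 180]; n_bins=36 gives 10-degree bins
    (an assumed bin width, not one taken from the study).
    """
    width = 360.0 / n_bins
    counts = [0] * n_bins
    for angle in angles_deg:
        index = int(((angle + 180.0) % 360.0) // width)
        counts[index] += 1
    total = len(angles_deg)
    return -sum((c / total) * math.log(c / total) for c in counts if c)

# Hypothetical samples: a side chain visiting three rotamers when free,
# and locked into a single rotamer when bound.
free_state = [-60, 60, 180, -60, 60, 180]
bound_state = [-60, -60, -60, -60, -60, -60]

# Entropy change on binding, in units of k_B (negative means an entropy cost).
delta_s = histogram_entropy(bound_state) - histogram_entropy(free_state)
print(delta_s)
```

    Summing such per-dihedral differences over a residue would give a residue-level estimate under the additional assumption that the dihedrals are independent.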

  3. Characterization of structural proteins of hirame rhabdovirus, HRV

    USGS Publications Warehouse

    Nishizawa, Toyohiko; Yoshimizu, Mamoru; Winton, James; Ahne, Winfried; Kimura, Takahisa

    1991-01-01

    Structural proteins of hirame rhabdovirus (HRV) were analyzed by SDS-polyacrylamide gel electrophoresis, western blotting, 2-dimensional gel electrophoresis, and Triton X-100 treatment. Purified HRV virions were composed of: polymerase (L), glycoprotein (G), nucleoprotein (N), and 2 matrix proteins (M1 and M2). Based upon their relative mobilities, the estimated molecular weights of the proteins were: L, 156 kDa; G, 68 kDa; N, 46.4 kDa; M1, 26.4 kDa; and M2, 19.9 kDa. The electrophoretic pattern formed by the structural proteins of HRV was clearly different from that formed by pike fry rhabdovirus, spring viremia of carp virus, eel virus of America, and eel virus European X which belong to the Vesiculovirus genus; however, it resembled the pattern formed by structural proteins of viral hemorrhagic septicemia virus (VHSV) and infectious hematopoietic necrosis virus (IHNV) which are members of the Lyssavirus genus. Among HRV, IHNV, and VHSV, differences were observed in the relative mobilities of the G, N, M1, and M2 proteins. Western blot analysis revealed that the G, N, and M2 proteins of HRV shared antigenic determinants with IHNV and VHSV, but not with any of the 4 fish vesiculoviruses tested. Cross-reactions between the M1 proteins of HRV, IHNV, or VHSV were not detected in this assay. Two-dimensional gel electrophoresis was used to show that HRV differed from IHNV or VHSV in the isoelectric point (pI) of the M1 and M2 proteins. In this system, 2 forms of the M1 protein of HRV and IHNV were observed. These subspecies of M1 had the same relative mobility but different pI values. Treatment of purified virions with 2% Triton X-100 in Tris buffer containing NaCl removed the G, M1, and M2 proteins of IHNV, but HRV virions were more stable under these conditions.

  4. (PS)2: protein structure prediction server version 3.0.

    PubMed

    Huang, Tsun-Tsao; Hwang, Jenn-Kang; Chen, Chu-Huang; Chu, Chih-Sheng; Lee, Chi-Wen; Chen, Chih-Chieh

    2015-07-01

    Protein complexes are involved in many biological processes. Examining coupling between subunits of a complex would be useful to understand the molecular basis of protein function. Here, our updated (PS)2 web server predicts the three-dimensional structures of protein complexes based on comparative modeling; furthermore, this server examines the coupling between subunits of the predicted complex by combining structural and evolutionary considerations. The predicted complex structure could be indicated and visualized by Java-based 3D graphics viewers and the structural and evolutionary profiles are shown and compared chain-by-chain. For each subunit, considerations with or without the packing contribution of other subunits cause the differences in similarities between structural and evolutionary profiles, and these differences imply which form, complex or monomeric, is preferred in the biological condition for the subunit. We believe that the (PS)2 server would be a useful tool for biologists who are interested not only in the structures of protein complexes but also in the coupling between subunits of the complexes. The (PS)2 server is freely available at http://ps2v3.life.nctu.edu.tw/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  5. MD simulations of papillomavirus DNA-E2 protein complexes hints at a protein structural code for DNA deformation.

    PubMed

    Falconi, M; Oteri, F; Eliseo, T; Cicero, D O; Desideri, A

    2008-08-01

    The structural dynamics of the DNA binding domains of the human papillomavirus strain 16 and the bovine papillomavirus strain 1, complexed with their DNA targets, has been investigated by modeling, molecular dynamics simulations, and nuclear magnetic resonance analysis. The simulations underline different dynamical features of the protein scaffolds and a different mechanical interaction of the two proteins with DNA. The two protein structures, although very similar, show differences in the relative mobility of secondary structure elements. Protein structural analyses, principal component analysis, and geometrical and energetic DNA analyses indicate that the two transcription factors utilize a different strategy in DNA recognition and deformation. Results show that the protein indirect DNA readout is not only attributable to the flexibility of the DNA molecule but is also finely tuned by the mechanical and dynamical properties of the protein scaffold involved in the interaction.

  6. Cloud prediction of protein structure and function with PredictProtein for Debian.

    PubMed

    Kaján, László; Yachdav, Guy; Vicedo, Esmeralda; Steinegger, Martin; Mirdita, Milot; Angermüller, Christof; Böhm, Ariane; Domke, Simon; Ertl, Julia; Mertes, Christian; Reisinger, Eva; Staniewski, Cedric; Rost, Burkhard

    2013-01-01

    We report the release of PredictProtein for the Debian operating system and derivatives, such as Ubuntu, Bio-Linux, and Cloud BioLinux. The PredictProtein suite is available as a standard set of open source Debian packages. The release covers the most popular prediction methods from the Rost Lab, including methods for the prediction of secondary structure and solvent accessibility (profphd), nuclear localization signals (predictnls), and intrinsically disordered regions (norsnet). We also present two case studies that successfully utilize PredictProtein packages for high performance computing in the cloud: the first analyzes protein disorder for whole organisms, and the second analyzes the effect of all possible single sequence variants in protein coding regions of the human genome.

  7. Dynamic New World: Refining Our View of Protein Structure, Function and Evolution

    PubMed Central

    Mannige, Ranjan V.

    2014-01-01

    Proteins are crucial to the functioning of all lifeforms. Traditional understanding posits that a single protein occupies a single structure (“fold”), which performs a single function. This view is radically challenged with the recognition that high structural dynamism—the capacity to be extra “floppy”—is more prevalent in functional proteins than previously assumed. As reviewed here, this dynamic take on proteins affects our understanding of protein “structure”, function, and evolution, and even gives us a glimpse into protein origination. Specifically, this review will discuss historical developments concerning protein structure, and important new relationships between dynamism and aspects of protein sequence, structure, binding modes, binding promiscuity, evolvability, and origination. Along the way, suggestions will be provided for how key parts of textbook definitions—that so far have excluded membership to intrinsically disordered proteins (IDPs)—could be modified to accommodate our more dynamic understanding of proteins. PMID:28250374

  8. Amaranth, quinoa and chia protein isolates: Physicochemical and structural properties.

    PubMed

    López, Débora N; Galante, Micaela; Robson, María; Boeris, Valeria; Spelzini, Darío

    2018-04-01

    An increasing use of vegetable protein is required to support the production of protein-rich foods which can replace animal proteins in the human diet. Amaranth, chia and quinoa seeds contain proteins which have biological and functional properties that provide nutritional benefits due to their reasonably well-balanced amino acid content. This review analyses these vegetable proteins and focuses on recent research on protein classification and isolation as well as structural characterization by means of fluorescence spectroscopy, surface hydrophobicity and differential scanning calorimetry. Isolation procedures have a profound influence on the structural properties of the proteins and, therefore, on their in vitro digestibility. The present article provides a comprehensive overview of the properties and characterization of these proteins. Copyright © 2017 Elsevier B.V. All rights reserved.

  9. Applying graph theory to protein structures: an atlas of coiled coils.

    PubMed

    Heal, Jack W; Bartlett, Gail J; Wood, Christopher W; Thomson, Andrew R; Woolfson, Derek N

    2018-05-02

    To understand protein structure, folding and function fully and to design proteins de novo reliably, we must learn from natural protein structures that have been characterised experimentally. The number of protein structures available is large and growing exponentially, which makes this task challenging. Indeed, computational resources are becoming increasingly important for classifying and analysing this resource. Here, we use tools from graph theory to define an atlas classification scheme for automatically categorising certain protein substructures. Focusing on the α-helical coiled coils, which are ubiquitous protein-structure and protein-protein interaction motifs, we present a suite of computational resources designed for analysing these assemblies. iSOCKET enables interactive analysis of side-chain packing within proteins to identify coiled coils automatically and with considerable user control. Applying a graph theory-based atlas classification scheme to structures identified by iSOCKET gives the Atlas of Coiled Coils, a fully automated, updated overview of extant coiled coils. The utility of this approach is illustrated with the first formal classification of an emerging subclass of coiled coils called α-helical barrels. Furthermore, in the Atlas, the known coiled-coil universe is presented alongside a partial enumeration of the 'dark matter' of coiled-coil structures; i.e., those coiled-coil architectures that are theoretically possible but have not been observed to date, and thus present defined targets for protein design. iSOCKET is available as part of the open-source GitHub repository associated with this work (https://github.com/woolfson-group/isocket). This repository also contains all the data generated when classifying the protein graphs. The Atlas of Coiled Coils is available at: http://coiledcoils.chm.bris.ac.uk/atlas/app.

  10. Recent Advances and Applications in Synchrotron X-Ray Protein Footprinting for Protein Structure and Dynamics Elucidation.

    PubMed

    Gupta, Sayan; Feng, Jun; Chance, Mark; Ralston, Corie

    2016-01-01

    Synchrotron X-ray Footprinting is a powerful in situ hydroxyl radical labeling method for analysis of protein structure, interactions, folding and conformation change in solution. In this method, water is ionized by high flux density broad band synchrotron X-rays to produce a steady-state concentration of hydroxyl radicals, which then react with solvent accessible side-chains. The resulting stable modification products are analyzed by liquid chromatography coupled to mass spectrometry. A comparative reactivity rate between known and unknown states of a protein provides local as well as global information on structural changes, which is then used to develop structural models for protein function and dynamics. In this review we describe the XF-MS method, its unique capabilities and its recent technical advances at the Advanced Light Source. We provide a comparison of other hydroxyl radical and mass spectrometry based methods with XF-MS. We also discuss some of the latest developments in its usage for studying bound water, transmembrane proteins and photosynthetic protein components, and the synergy of the method with other synchrotron based structural biology methods.

  11. Structure Prediction and Analysis of DNA Transposon and LINE Retrotransposon Proteins*

    PubMed Central

    Abrusán, György; Zhang, Yang; Szilágyi, András

    2013-01-01

    Despite the considerable amount of research on transposable elements, no large-scale structural analyses of the TE proteome have been performed so far. We predicted the structures of hundreds of proteins from a representative set of DNA and LINE transposable elements and used the obtained structural data to provide the first general structural characterization of TE proteins and to estimate the frequency of TE domestication and horizontal transfer events. We show that 1) ORF1 and Gag proteins of retrotransposons contain high amounts of structural disorder; thus, despite their very low conservation, the presence of disordered regions and probably their chaperone function is conserved. 2) The distribution of SCOP classes in DNA transposons and LINEs indicates that the proteins of DNA transposons are more ancient, containing folds that already existed when the first cellular organisms appeared. 3) DNA transposon proteins have lower contact order than randomly selected reference proteins, indicating rapid folding, most likely to avoid protein aggregation. 4) Structure-based searches for TE homologs indicate that the overall frequency of TE domestication events is low, whereas we found a relatively high number of cases where horizontal transfer, frequently involving parasites, is the most likely explanation for the observed homology. PMID:23530042
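
    The contact-order comparison in point 3 can be made concrete with a small sketch of relative contact order, CO = (1 / (L · N)) Σ |i - j|, summed over the N residue pairs in contact in a chain of L residues. The abstract does not give the contact definition used, so the one-point-per-residue coordinates, the distance cutoff and the function name are assumptions for illustration.

```python
import math

def relative_contact_order(coords, cutoff=6.0, min_sep=1):
    """Relative contact order from one representative 3D point per residue.

    coords  : list of (x, y, z) tuples, one per residue, in chain order
    cutoff  : distance (in the units of coords) below which two residues are
              counted as in contact (assumed value)
    min_sep : minimum sequence separation for a pair to be considered
    """
    n_res = len(coords)
    n_contacts = 0
    total_separation = 0
    for i in range(n_res):
        for j in range(i + min_sep, n_res):
            if math.dist(coords[i], coords[j]) <= cutoff:
                n_contacts += 1
                total_separation += j - i
    if n_contacts == 0:
        return 0.0
    return total_separation / (n_res * n_contacts)

# Toy chains: an extended strand has only local contacts, while a hairpin
# also brings its two ends together (a long-range contact).
extended = [(0, 0, 0), (4, 0, 0), (8, 0, 0), (12, 0, 0)]
hairpin = [(0, 0, 0), (4, 0, 0), (4, 4, 0), (0, 4, 0)]

co_extended = relative_contact_order(extended, cutoff=5.0)
co_hairpin = relative_contact_order(hairpin, cutoff=5.0)
print(co_extended, co_hairpin)
```

    A lower value means contacts are mostly local in sequence, which is the property the abstract associates with rapid folding.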

  12. Structure and Function of Lipopolysaccharide Binding Protein

    NASA Astrophysics Data System (ADS)

    Schumann, Ralf R.; Leong, Steven R.; Flaggs, Gail W.; Gray, Patrick W.; Wright, Samuel D.; Mathison, John C.; Tobias, Peter S.; Ulevitch, Richard J.

    1990-09-01

    The primary structure of lipopolysaccharide binding protein (LBP), a trace plasma protein that binds to the lipid A moiety of bacterial lipopolysaccharides (LPSs), was deduced by sequencing cloned complementary DNA. LBP shares sequence identity with another LPS binding protein found in granulocytes, bactericidal/permeability-increasing protein, and with cholesterol ester transport protein of the plasma. LBP may control the response to LPS under physiologic conditions by forming high-affinity complexes with LPS that bind to monocytes and macrophages, which then secrete tumor necrosis factor. The identification of this pathway for LPS-induced monocyte stimulation may aid in the development of treatments for diseases in which Gram-negative sepsis or endotoxemia are involved.

  13. Water polygons in high-resolution protein crystal structures.

    PubMed

    Lee, Jonas; Kim, Sung-Hou

    2009-07-01

    We have analyzed the interstitial water (ISW) structures in 1500 protein crystal structures deposited in the Protein Data Bank that have greater than 1.5 Å resolution with less than 90% sequence similarity with each other. We observed varieties of polygonal water structures composed of three to eight water molecules. These polygons may represent the time- and space-averaged structures of "stable" water oligomers present in liquid water, and their presence as well as relative population may be relevant in understanding physical properties of liquid water at a given temperature. On average, 13% of ISWs are localized enough to be visible by X-ray diffraction. Of those, an average of 78% are water molecules in the first water layer on the protein surface. Of the localized ISWs beyond the first layer, almost half form water polygons such as trigons, tetragons, as well as expected pentagons, hexagons, higher polygons, partial dodecahedrons, and disordered networks. Most of the octagons and nonagons are formed by fusion of smaller polygons. The trigons are most commonly observed. We suggest that our observation provides an experimental basis for including these water polygon structures in correlating and predicting various water properties in liquid state.

  14. The Effects of Lesson Screen Background Color on Declarative and Structural Knowledge

    ERIC Educational Resources Information Center

    Clariana, Roy B.; Prestera, Gustavo E.

    2009-01-01

    This experimental investigation replicates previous investigations of the effects of left margin screen background color hue to signal lesson sections on declarative knowledge and extends those investigations by adding a measure of structural knowledge. Participants (N = 80) were randomly assigned to receive 1 of 4 computer-based lesson treatments…

  15. Data-assisted protein structure modeling by global optimization in CASP12.

    PubMed

    Joo, Keehyoung; Heo, Seungryong; Joung, InSuk; Hong, Seung Hwan; Lee, Sung Jong; Lee, Jooyoung

    2018-03-01

    In CASP12, 2 types of data-assisted protein structure modeling were tested. Either SAXS experimental data or cross-linking experimental data was provided for a selected number of CASP12 targets that the CASP12 predictor could utilize for better protein structure modeling. We devised 2 separate energy terms for SAXS data and cross-linking data to drive the model structures into more native-like structures that satisfied the given experimental data as much as possible. In CASP11, we successfully performed protein structure modeling using simulated sparse and ambiguously assigned NOE data and/or correct residue-residue contact information, where the only energy term that folded the protein into its native structure was the term that originated from the given experimental data. However, the 2 types of experimental data provided in CASP12 were far from sufficient to fold the target protein into its native structure because SAXS data provides only the overall shape of the molecule and the cross-linking contact information provides only very low-resolution distance information. For this reason, we combined the SAXS or cross-linking energy term with our regular modeling energy function that includes both the template energy term and the de novo energy terms. By optimizing the newly formulated energy function, we obtained protein models that fit better with provided SAXS data than the X-ray structure of the target. However, the improvement of the model relative to the one modeled without the SAXS data was not significant. Consistent structural improvement was achieved by incorporating cross-linking data into the protein structure modeling. © 2018 Wiley Periodicals, Inc.

  16. WIWS: a protein structure bioinformatics Web service collection.

    PubMed

    Hekkelman, M L; Te Beek, T A H; Pettifer, S R; Thorne, D; Attwood, T K; Vriend, G

    2010-07-01

    The WHAT IF molecular-modelling and drug design program is widely distributed in the world of protein structure bioinformatics. Although originally designed as an interactive application, its highly modular design and inbuilt control language have recently enabled its deployment as a collection of programmatically accessible web services. We report here a collection of WHAT IF-based protein structure bioinformatics web services: these relate to structure quality, the use of symmetry in crystal structures, structure correction and optimization, adding hydrogens and optimizing hydrogen bonds and a series of geometric calculations. The freely accessible web services are based on the industry standard WS-I profile and the EMBRACE technical guidelines, and are available via both REST and SOAP paradigms. The web services run on a dedicated computational cluster; their function and availability are monitored daily.

  17. Tactile Teaching: Exploring Protein Structure/Function Using Physical Models

    ERIC Educational Resources Information Center

    Herman, Tim; Morris, Jennifer; Colton, Shannon; Batiza, Ann; Patrick, Michael; Franzen, Margaret; Goodsell, David S.

    2006-01-01

    The technology now exists to construct physical models of proteins based on atomic coordinates of solved structures. We review here our recent experiences in using physical models to teach concepts of protein structure and function at both the high school and the undergraduate levels. At the high school level, physical models are used in a…

  18. Validation of Molecular Dynamics Simulations for Prediction of Three-Dimensional Structures of Small Proteins.

    PubMed

    Kato, Koichi; Nakayoshi, Tomoki; Fukuyoshi, Shuichi; Kurimoto, Eiji; Oda, Akifumi

    2017-10-12

    Although various higher-order protein structure prediction methods have been developed, almost all of them were developed based on the three-dimensional (3D) structure information of known proteins. Here we predicted short protein structures by molecular dynamics (MD) simulations in which only Newton's equations of motion were used and 3D structural information of known proteins was not required. To evaluate the ability of MD simulation to predict protein structures, we calculated seven short test proteins (10-46 residues) in the denatured state and compared their predicted and experimental structures. The predicted structure for Trp-cage (20 residues) was close to the experimental structure after a 200-ns MD simulation. For proteins shorter or longer than Trp-cage, root-mean-square deviation values were larger than those for Trp-cage. However, secondary structures could be reproduced by MD simulations for proteins with 10-34 residues. Simulations by replica exchange MD were performed, but the results were similar to those from normal MD simulations. These results suggest that normal MD simulations can roughly predict short protein structures and 200-ns simulations are frequently sufficient for estimating the secondary structures of proteins (approximately 20 residues). Structural prediction methods using only fundamental physical laws are useful for investigating non-natural proteins, such as primitive proteins and artificial proteins for peptide-based drug delivery systems.
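
    The root-mean-square deviation used to compare predicted and experimental structures can be sketched as follows. This minimal version assumes the two coordinate sets list the same atoms in the same order and have already been superimposed; the optimal superposition step that is usually applied first is deliberately omitted, and the coordinates are invented for illustration.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-superimposed coordinate sets."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Hypothetical three-atom structures: the "predicted" model is the reference
# with every atom displaced by 5 length units.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 1.0, 0.0)]
predicted = [(x + 3.0, y + 4.0, z) for x, y, z in reference]

print(rmsd(reference, reference))
print(rmsd(reference, predicted))
```

    Because no superposition is done here, a rigid shift like the one above shows up as a non-zero deviation; after fitting, the two toy structures would be identical.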

  19. Fast large-scale clustering of protein structures using Gauss integrals.

    PubMed

    Harder, Tim; Borg, Mikael; Boomsma, Wouter; Røgen, Peter; Hamelryck, Thomas

    2012-02-15

    Clustering protein structures is an important task in structural bioinformatics. De novo structure prediction, for example, often involves a clustering step for finding the best prediction. Other applications include assigning proteins to fold families and analyzing molecular dynamics trajectories. We present Pleiades, a novel approach to clustering protein structures with a rigorous mathematical underpinning. The method approximates clustering based on the root mean square deviation by first mapping structures to Gauss integral vectors--which were introduced by Røgen and co-workers--and subsequently performing K-means clustering. Compared to current methods, Pleiades dramatically improves on the time needed to perform clustering, and can cluster a significantly larger number of structures, while providing state-of-the-art results. The number of low energy structures generated in a typical folding study, which is in the order of 50,000 structures, can be clustered within seconds to minutes.
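
    The second stage described above, K-means clustering of fixed-length vectors, can be sketched as follows. This is not the Pleiades implementation: the Gauss integral mapping is not shown, the input vectors are hypothetical stand-ins for it, and the farthest-point initialization is an assumption chosen here only to keep the example deterministic.

```python
import math


def farthest_point_init(vectors, k):
    """Pick k starting centroids: the first vector, then repeatedly the
    vector farthest from all centroids chosen so far (an assumed heuristic)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        nxt = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(vectors, k, iterations=100):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    centroids = farthest_point_init(vectors, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stable, so centroids are stable too
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical 2-D feature vectors standing in for Gauss integral vectors.
features = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
labels, centroids = kmeans(features, k=2)
print(labels)
```

    Clustering in a fixed-dimensional vector space avoids computing all pairwise structure comparisons, which is the source of the speed-up the abstract reports.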

  20. Vertebrate Membrane Proteins: Structure, Function, and Insights from Biophysical Approaches

    PubMed Central

    MÜLLER, DANIEL J.; WU, NAN; PALCZEWSKI, KRZYSZTOF

    2008-01-01

    Membrane proteins are key targets for pharmacological intervention because they are vital for cellular function. Here, we analyze recent progress made in the understanding of the structure and function of membrane proteins with a focus on rhodopsin and development of atomic force microscopy techniques to study biological membranes. Membrane proteins are compartmentalized to carry out extra- and intracellular processes. Biological membranes are densely populated with membrane proteins that occupy approximately 50% of their volume. In most cases membranes contain lipid rafts, protein patches, or paracrystalline formations that lack the higher-order symmetry that would allow them to be characterized by diffraction methods. Despite many technical difficulties, several crystal structures of membrane proteins that illustrate their internal structural organization have been determined. Moreover, high-resolution atomic force microscopy, near-field scanning optical microscopy, and other lower resolution techniques have been used to investigate these structures. Single-molecule force spectroscopy tracks interactions that stabilize membrane proteins and those that switch their functional state; this spectroscopy can be applied to locate a ligand-binding site. Recent development of this technique also reveals the energy landscape of a membrane protein, defining its folding, reaction pathways, and kinetics. Future development and application of novel approaches during the coming years should provide even greater insights to the understanding of biological membrane organization and function. PMID:18321962

  1. CONFOLD2: improved contact-driven ab initio protein structure modeling.

    PubMed

    Adhikari, Badri; Cheng, Jianlin

    2018-01-25

    Contact-guided protein structure prediction methods are becoming more and more successful because of the latest advances in residue-residue contact prediction. To support contact-driven structure prediction, effective tools that can quickly build tertiary structural models of good quality from predicted contacts need to be developed. We develop an improved contact-driven protein modelling method, CONFOLD2, and study how it may be effectively used for ab initio protein structure prediction with predicted contacts as input. It builds models using various subsets of input contacts to explore the fold space under the guidance of a soft square energy function, and then clusters the models to obtain the top five models. CONFOLD2 obtains an average reconstruction accuracy of 0.57 TM-score for the 150 proteins in the PSICOV contact prediction dataset. When benchmarked on the CASP11 contacts predicted using CONSIP2 and CASP12 contacts predicted using Raptor-X, CONFOLD2 achieves a mean TM-score of 0.41 on both datasets. CONFOLD2 allows the top five structural models for a protein sequence to be generated quickly when its secondary structure and contact predictions are at hand. The source code of CONFOLD2 is publicly available at https://github.com/multicom-toolbox/CONFOLD2/ .

  2. Structural and Functional Analyses of a Sterol Carrier Protein in Spodoptera litura

    PubMed Central

    Xu, Rui; Zheng, Sichun; He, Hongwu; Wan, Jian; Feng, Qili

    2014-01-01

    Background In insects, cholesterol is one of the membrane components in cells and a precursor of ecdysteroid biosynthesis. Because insects lack two key enzymes, squalene synthase and lanosterol synthase, in the cholesterol biosynthesis pathway, they cannot autonomously synthesize cholesterol de novo from simple compounds and therefore have to obtain sterols from their diet. Sterol carrier protein (SCP) is a cholesterol-binding protein responsible for cholesterol absorption and transport. Results In this study, a model of the three-dimensional structure of SlSCPx-2 in Spodoptera litura, a destructive polyphagous agricultural pest insect in tropical and subtropical areas, was constructed. Docking of sterol and fatty acid ligands to SlSCPx-2 and ANS fluorescent replacement assay showed that SlSCPx-2 was able to bind with relatively high affinities to cholesterol, stearic acid, linoleic acid, stigmasterol, oleic acid, palmitic acid and arachidonate, implying that SlSCPx may play an important role in absorption and transport of these cholesterol and fatty acids from host plants. Site-directed mutation assay of SlSCPx-2 suggests that amino acid residues F53, W66, F89, F110, I115, T128 and Q131 are critical for the ligand-binding activity of the SlSCPx-2 protein. Virtual ligand screening resulted in identification of several lead compounds which are potential inhibitors of SlSCPx-2. Bioassay for inhibitory effect of five selected compounds showed that AH-487/41731687, AG-664/14117324, AG-205/36813059 and AG-205/07775053 inhibited the growth of S. litura larvae. Conclusions Compounds AH-487/41731687, AG-664/14117324, AG-205/36813059 and AG-205/07775053 selected based on structural modeling showed binding affinity to SlSCPx-2 protein and inhibitory effect on the growth of S. litura larvae. PMID:24454688

  3. NIAS-Server: Neighbors Influence of Amino acids and Secondary Structures in Proteins.

    PubMed

    Borguesan, Bruno; Inostroza-Ponta, Mario; Dorn, Márcio

    2017-03-01

    The exponential growth in the number of experimentally determined three-dimensional protein structures provides new and relevant knowledge about the conformation of amino acids in proteins. Only a few probability densities of amino acids are publicly available for use in structure validation and prediction methods. NIAS (Neighbors Influence of Amino acids and Secondary structures) is a web-based tool used to extract information about conformational preferences of amino acid residues and secondary structures in experimentally determined protein templates. This information is useful, for example, to characterize folds and local motifs in proteins and molecular folding, and can help in the solution of complex problems such as protein structure prediction and protein design, among others. The NIAS-Server and supplementary data are available at http://sbcb.inf.ufrgs.br/nias .

  4. Heterogeneous nucleation of hydroxyapatite on protein: structural effect of silk sericin

    PubMed Central

    Takeuchi, Akari; Ohtsuki, Chikara; Miyazaki, Toshiki; Kamitakahara, Masanobu; Ogata, Shin-ichi; Yamazaki, Masao; Furutani, Yoshiaki; Kinoshita, Hisao; Tanihara, Masao

    2005-01-01

    Acidic proteins play an important role during mineral formation in biological systems, but the mechanism of mineral formation is far from understood. In this paper, we report on the relationship between the structure of a protein and hydroxyapatite deposition under biomimetic conditions. Sericin, a type of silk protein, was adopted as a suitable protein for studying structural effect on hydroxyapatite deposition, since it forms a hydroxyapatite layer on its surface in a metastable calcium phosphate solution, and its structure has been reported. Sericin effectively induced hydroxyapatite nucleation when it has high molecular weight and a β sheet structure. This indicates that the specific structure of a protein can effectively induce heterogeneous nucleation of hydroxyapatite in a biomimetic solution, i.e. a metastable calcium phosphate solution. This finding is useful in understanding biomineralization, as well as for the design of organic polymers that can effectively induce hydroxyapatite nucleation. PMID:16849195

  5. Structural Conservation of the Myoviridae Phage Tail Sheath Protein Fold

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aksyuk, Anastasia A.; Kurochkina, Lidia P.; Fokine, Andrei

    2012-02-21

    Bacteriophage phiKZ is a giant phage that infects Pseudomonas aeruginosa, a human pathogen. The phiKZ virion consists of a 1450 Å diameter icosahedral head and a 2000 Å-long contractile tail. The structure of the whole virus was previously reported, showing that its tail organization in the extended state is similar to the well-studied Myovirus bacteriophage T4 tail. The crystal structure of a tail sheath protein fragment of phiKZ was determined to 2.4 Å resolution. Furthermore, crystal structures of two prophage tail sheath proteins were determined to 1.9 and 3.3 Å resolution. Despite low sequence identity between these proteins, all of these structures have a similar fold. The crystal structure of the phiKZ tail sheath protein has been fitted into cryo-electron-microscopy reconstructions of the extended tail sheath and of a polysheath. The structural rearrangement of the phiKZ tail sheath contraction was found to be similar to that of phage T4.

  6. The value of protein structure classification information—Surveying the scientific literature

    PubMed Central

    Fox, Naomi K.; Brenner, Steven E.

    2015-01-01

    ABSTRACT The Structural Classification of Proteins (SCOP) and Class, Architecture, Topology, Homology (CATH) databases have been valuable resources for protein structure classification for over 20 years. Development of SCOP (version 1) concluded in June 2009 with SCOP 1.75. The SCOPe (SCOP–extended) database offers continued development of the classic SCOP hierarchy, adding over 33,000 structures. We have attempted to assess the impact of these two decade old resources and guide future development. To this end, we surveyed recent articles to learn how structure classification data are used. Of 571 articles published in 2012–2013 that cite SCOP, 439 actually use data from the resource. We found that the type of use was fairly evenly distributed among four top categories: A) study protein structure or evolution (27% of articles), B) train and/or benchmark algorithms (28% of articles), C) augment non‐SCOP datasets with SCOP classification (21% of articles), and D) examine the classification of one protein/a small set of proteins (22% of articles). Most articles described computational research, although 11% described purely experimental research, and a further 9% included both. We examined how CATH and SCOP were used in 158 articles that cited both databases: while some studies used only one dataset, the majority used data from both resources. Protein structure classification remains highly relevant for a diverse range of problems and settings. Proteins 2015; 83:2025–2038. © 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc. PMID:26313554

  7. Structural bioinformatics: methods, concepts and applications to blood coagulation proteins.

    PubMed

    Villoutreix, Bruno O

    2002-06-01

    Structural and theoretical analyses of proteins are central to the understanding of complex molecular mechanisms and are fundamental to the drug discovery process. Computational techniques yield useful insights into an ever-wider range of biomolecular systems. Protein three-dimensional structures and molecular functions can be predicted in some circumstances, while experimental structures can be analyzed in depth via such computational approaches. Non-covalent binding of biomolecules can be understood by considering structural, thermodynamic and kinetic issues, and theoretical simulations of such events can be attempted. The central role of electrostatic interactions with regard to protein function, structure and stability has been investigated and some electrostatic properties can be modeled theoretically. Computer methods thus help to prioritize, design, analyze and rationalize biochemical experiments. Cardiovascular diseases and associated blood coagulation disorders are leading causes of death worldwide. Blood coagulation involves more than 30 proteins that interact specifically with various degrees of affinity. Many of these molecules can also bind transiently to phospholipid surfaces. Numerous point mutations in the genes of coagulation proteins and regulators have been identified. Understanding the coagulation cascade, its regulation and the impact of mutations is required for the development of new therapies and diagnostic tools. In this review, we describe concepts and methods pertaining to the field of structural bioinformatics. We provide examples of applications of these approaches to blood coagulation proteins and show that such studies can give insights about molecular mechanisms contributing to cardiovascular disease susceptibility.

  8. The Structure and Function of Non-Collagenous Bone Proteins

    NASA Technical Reports Server (NTRS)

    Hook, Magnus

    1997-01-01

    The long-term goal for this program is to determine the structural and functional relationships of bone proteins and proteins that interact with bone. This information will be used to design useful pharmacological compounds that will have a beneficial effect in osteoporotic patients and in the osteoporotic-like effects experienced on long duration space missions. The first phase of this program, funded under a cooperative research agreement with NASA through the Texas Medical Center, aimed to develop powerful recombinant expression systems and purification methods for production of large amounts of target proteins. Proteins expressed in sufficient amount and purity would be characterized by a variety of structural methods, and made available for crystallization studies. In order to increase the likelihood of crystallization and subsequent high resolution solution of structures, we undertook to develop expression of normal and mutant forms of proteins by bacterial and mammalian cells. In addition to the main goals of this program, we would also be able to provide reagents for other related studies, including development of anti-fibrotic and anti-metastatic therapeutics.

  9. The history of the CATH structural classification of protein domains.

    PubMed

    Sillitoe, Ian; Dawson, Natalie; Thornton, Janet; Orengo, Christine

    2015-12-01

    This article presents a historical review of the protein structure classification database CATH. Together with the SCOP database, CATH remains comprehensive and reasonably up-to-date with the now more than 100,000 protein structures in the PDB. We review the expansion of the CATH and SCOP resources to capture predicted domain structures in the genome sequence data and to provide information on the likely functions of proteins mediated by their constituent domains. The establishment of comprehensive function annotation resources has also meant that domain families can be functionally annotated allowing insights into functional divergence and evolution within protein families. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.

  10. GRID: a high-resolution protein structure refinement algorithm.

    PubMed

    Chitsaz, Mohsen; Mayo, Stephen L

    2013-03-05

    The energy-based refinement of protein structures generated by fold prediction algorithms to atomic-level accuracy remains a major challenge in structural biology. Energy-based refinement is mainly dependent on two components: (1) sufficiently accurate force fields, and (2) efficient conformational space search algorithms. Focusing on the latter, we developed a high-resolution refinement algorithm called GRID. It takes a three-dimensional protein structure as input and, using an all-atom force field, attempts to improve the energy of the structure by systematically perturbing backbone dihedrals and side-chain rotamer conformations. We compare GRID to Backrub, a stochastic algorithm that has been shown to predict a significant fraction of the conformational changes that occur with point mutations. We applied GRID and Backrub to 10 high-resolution (≤ 2.8 Å) crystal structures from the Protein Data Bank and measured the energy improvements obtained and the computation times required to achieve them. GRID resulted in energy improvements that were significantly better than those attained by Backrub while expending about the same amount of computational resources. GRID resulted in relaxed structures that had slightly higher backbone RMSDs compared to Backrub relative to the starting crystal structures. The average RMSD was 0.25 ± 0.02 Å for GRID versus 0.14 ± 0.04 Å for Backrub. These relatively minor deviations indicate that both algorithms generate structures that retain their original topologies, as expected given the nature of the algorithms. Copyright © 2012 Wiley Periodicals, Inc.

  11. Ser/Thr Motifs in Transmembrane Proteins: Conservation Patterns and Effects on Local Protein Structure and Dynamics

    PubMed Central

    del Val, Coral; White, Stephen H.

    2014-01-01

    We combined systematic bioinformatics analyses and molecular dynamics simulations to assess the conservation patterns of Ser and Thr motifs in membrane proteins, and the effect of such motifs on the structure and dynamics of α-helical transmembrane (TM) segments. We find that Ser/Thr motifs are often present in β-barrel TM proteins. At least one Ser/Thr motif is present in almost half of the sequences of α-helical proteins analyzed here. The extensive bioinformatics analyses and inspection of protein structures led to the identification of molecular transporters with noticeable numbers of Ser/Thr motifs within the TM region. Given the energetic penalty for burying multiple Ser/Thr groups in the membrane hydrophobic core, the observation of transporters with multiple membrane-embedded Ser/Thr is intriguing and raises the question of how the presence of multiple Ser/Thr affects protein local structure and dynamics. Molecular dynamics simulations of four different Ser-containing model TM peptides indicate that backbone hydrogen bonding of membrane-buried Ser/Thr hydroxyl groups can significantly change the local structure and dynamics of the helix. Ser groups located close to the membrane interface can hydrogen bond to solvent water instead of protein backbone, leading to an enhanced local solvation of the peptide. PMID:22836667

  12. Non-Structural Proteins of Arthropod-Borne Bunyaviruses: Roles and Functions

    PubMed Central

    Eifan, Saleh; Schnettler, Esther; Dietrich, Isabelle; Kohl, Alain; Blomström, Anne-Lie

    2013-01-01

    Viruses within the Bunyaviridae family are tri-segmented, negative-stranded RNA viruses. The family includes several emerging and re-emerging viruses of humans, animals and plants, such as Rift Valley fever virus, Crimean-Congo hemorrhagic fever virus, La Crosse virus, Schmallenberg virus and tomato spotted wilt virus. Many bunyaviruses are arthropod-borne, so-called arboviruses. Depending on the genus, bunyaviruses encode, in addition to the RNA-dependent RNA polymerase and the different structural proteins, one or several non-structural proteins. These non-structural proteins are not always essential for virus growth and replication but can play an important role in viral pathogenesis through their interaction with the host innate immune system. In this review, we will summarize current knowledge and understanding of insect-borne bunyavirus non-structural protein function(s) in vertebrate, plant and arthropod. PMID:24100888

  13. Target Highlights in CASP9: Experimental Target Structures for the Critical Assessment of Techniques for Protein Structure Prediction

    PubMed Central

    Kryshtafovych, Andriy; Moult, John; Bartual, Sergio G.; Bazan, J. Fernando; Berman, Helen; Casteel, Darren E.; Christodoulou, Evangelos; Everett, John K.; Hausmann, Jens; Heidebrecht, Tatjana; Hills, Tanya; Hui, Raymond; Hunt, John F.; Jayaraman, Seetharaman; Joachimiak, Andrzej; Kennedy, Michael A.; Kim, Choel; Lingel, Andreas; Michalska, Karolina; Montelione, Gaetano T.; Otero, José M.; Perrakis, Anastassis; Pizarro, Juan C.; van Raaij, Mark J.; Ramelot, Theresa A.; Rousseau, Francois; Tong, Liang; Wernimont, Amy K.; Young, Jasmine; Schwede, Torsten

    2011-01-01

    One goal of the CASP Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction is to identify the current state of the art in protein structure prediction and modeling. A fundamental principle of CASP is blind prediction on a set of relevant protein targets, i.e. the participating computational methods are tested on a common set of experimental target proteins, for which the experimental structures are not known at the time of modeling. Therefore, the CASP experiment would not have been possible without broad support of the experimental protein structural biology community. In this manuscript, several experimental groups discuss the structures of the proteins which they provided as prediction targets for CASP9, highlighting structural and functional peculiarities of these structures: the long tail fibre protein gp37 from bacteriophage T4, the cyclic GMP-dependent protein kinase Iβ (PKGIβ) dimerization/docking domain, the ectodomain of the JTB (Jumping Translocation Breakpoint) transmembrane receptor, Autotaxin (ATX) in complex with an inhibitor, the DNA-Binding J-Binding Protein 1 (JBP1) domain essential for biosynthesis and maintenance of DNA base-J (β-D-glucosyl-hydroxymethyluracil) in Trypanosoma and Leishmania, a so far uncharacterized 73 residue domain from Ruminococcus gnavus with a fold typical for PDZ-like domains, a domain from the Phycobilisome (PBS) core-membrane linker (LCM) phycobiliprotein ApcE from Synechocystis, the Heat shock protein 90 (Hsp90) activators PFC0360w and PFC0270w from Plasmodium falciparum, and 2-oxo-3-deoxygalactonate kinase from Klebsiella pneumoniae. PMID:22020785

  14. Modeling Structure and Dynamics of Protein Complexes with SAXS Profiles

    PubMed Central

    Schneidman-Duhovny, Dina; Hammel, Michal

    2018-01-01

    Small-angle X-ray scattering (SAXS) is an increasingly common and useful technique for structural characterization of molecules in solution. A SAXS experiment determines the scattering intensity of a molecule as a function of spatial frequency, termed SAXS profile. SAXS profiles can be utilized in a variety of molecular modeling applications, such as comparing solution and crystal structures, structural characterization of flexible proteins, assembly of multi-protein complexes, and modeling of missing regions in the high-resolution structure. Here, we describe protocols for modeling atomic structures based on SAXS profiles. The first protocol is for comparing solution and crystal structures including modeling of missing regions and determination of the oligomeric state. The second protocol performs multi-state modeling by finding a set of conformations and their weights that fit the SAXS profile starting from a single-input structure. The third protocol is for protein-protein docking based on the SAXS profile of the complex. We describe the underlying software, followed by demonstrating their application on interleukin 33 (IL33) with its primary receptor ST2 and DNA ligase IV-XRCC4 complex. PMID:29605933

  15. On Ramachandran angles, closed strings and knots in protein structure

    NASA Astrophysics Data System (ADS)

    Chen, Si; Niemi, Antti J.

    2016-08-01

    The Ramachandran angles (φ, ψ) of a protein backbone form the vertices of a piecewise geodesic curve on the surface of a torus. When the ends of the curve are connected to each other similarly, by a geodesic, the result is a closed string that in general wraps around the torus a number of times both in the meridional and the longitudinal directions. The two wrapping numbers are global characteristics of the protein structure. A statistical analysis of the wrapping numbers in terms of crystallographic x-ray structures in the protein data bank (PDB) reveals that proteins have no net chirality in the φ direction but in the ψ direction, proteins prefer to display chirality. A comparison between the wrapping numbers and the concept of folding index discloses a non-linearity in their relationship. Thus these three integer valued invariants can be used in tandem, to scrutinize and classify the global loop structure of individual PDB proteins, in terms of the overall fold topology.
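
    One plausible reading of the wrapping numbers described above can be sketched in a few lines. The sketch assumes a flat torus, angles in degrees, and that each geodesic segment (including the one closing the string) takes the shortest signed rotation in each angle; the function names are illustrative and the paper's exact construction may differ.

```python
def signed_step(a, b):
    """Shortest signed rotation in degrees from angle a to angle b,
    mapped into the interval [-180, 180)."""
    return (b - a + 180.0) % 360.0 - 180.0


def wrapping_number(angles):
    """Net number of turns made by a closed sequence of angles.

    The sequence is closed by stepping from the last angle back to the first.
    """
    n = len(angles)
    total = sum(signed_step(angles[i], angles[(i + 1) % n]) for i in range(n))
    return round(total / 360.0)


def wrapping_numbers(phi_psi):
    """Wrapping numbers (w_phi, w_psi) for a list of (phi, psi) pairs."""
    phis = [phi for phi, _ in phi_psi]
    psis = [psi for _, psi in phi_psi]
    return wrapping_number(phis), wrapping_number(psis)


# Toy input: phi makes one full positive turn, psi only oscillates.
toy = [(0.0, 10.0), (120.0, 20.0), (-120.0, 15.0)]
print(wrapping_numbers(toy))
```

    Because every step is reduced to the shortest rotation, the summed rotation of a closed sequence is always a whole multiple of 360 degrees, which is why the result is an integer.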

  16. Effect of trehalose on protein structure

    PubMed Central

    Jain, Nishant Kumar; Roy, Ipsita

    2009-01-01

    Trehalose is a ubiquitous molecule that occurs in lower and higher life forms but not in mammals. Till about 40 years ago, trehalose was visualized as a storage molecule, aiding the release of glucose for carrying out cellular functions. This perception has now changed dramatically. The role of trehalose has expanded, and this molecule has now been implicated in a variety of situations. Trehalose is synthesized as a stress-responsive factor when cells are exposed to environmental stresses like heat, cold, oxidation, desiccation, and so forth. When unicellular organisms are exposed to stress, they adapt by synthesizing huge amounts of trehalose, which helps them in retaining cellular integrity. This is thought to occur by prevention of denaturation of proteins by trehalose, which would otherwise degrade under stress. This explanation may be rational, since recently, trehalose has been shown to slow down the rate of polyglutamine-mediated protein aggregation and the resultant pathogenesis by stabilizing an aggregation-prone model protein. In recent years, trehalose has also proved useful in the cryopreservation of sperm and stem cells and in the development of a highly reliable organ preservation solution. This review aims to highlight the changing perception of the role of trehalose over the last 10 years and to propose common mechanisms that may be involved in all the myriad ways in which trehalose stabilizes protein structures. These will take into account the structure of trehalose molecule and its interactions with its environment, and the explanations will focus on the role of trehalose in preventing protein denaturation. PMID:19177348

  17. RECURSIVE PROTEIN MODELING: A DIVIDE AND CONQUER STRATEGY FOR PROTEIN STRUCTURE PREDICTION AND ITS CASE STUDY IN CASP9

    PubMed Central

    CHENG, JIANLIN; EICKHOLT, JESSE; WANG, ZHENG; DENG, XIN

    2013-01-01

    After decades of research, protein structure prediction remains a very challenging problem. In order to address the different levels of complexity of structural modeling, two types of modeling techniques (template-based modeling and template-free modeling) have been developed. Template-based modeling can often generate a moderate- to high-resolution model when a similar, homologous template structure is found for a query protein but fails if no template or only incorrect templates are found. Template-free modeling, such as fragment-based assembly, may generate models of moderate resolution for small proteins of low topological complexity. Seldom have the two techniques been integrated together to improve protein modeling. Here we develop a recursive protein modeling approach to selectively and collaboratively apply template-based and template-free modeling methods to model template-covered (i.e. certain) and template-free (i.e. uncertain) regions of a protein. A preliminary implementation of the approach was tested on a number of hard modeling cases during the 9th Critical Assessment of Techniques for Protein Structure Prediction (CASP9) and successfully improved the quality of modeling in most of these cases. Recursive modeling can significantly reduce the complexity of protein structure modeling and integrate template-based and template-free modeling to improve the quality and efficiency of protein structure prediction. PMID:22809379

  18. Cloud Prediction of Protein Structure and Function with PredictProtein for Debian

    PubMed Central

    Kaján, László; Yachdav, Guy; Vicedo, Esmeralda; Steinegger, Martin; Mirdita, Milot; Angermüller, Christof; Böhm, Ariane; Domke, Simon; Ertl, Julia; Mertes, Christian; Reisinger, Eva; Rost, Burkhard

    2013-01-01

    We report the release of PredictProtein for the Debian operating system and derivatives, such as Ubuntu, Bio-Linux, and Cloud BioLinux. The PredictProtein suite is available as a standard set of open source Debian packages. The release covers the most popular prediction methods from the Rost Lab, including methods for the prediction of secondary structure and solvent accessibility (profphd), nuclear localization signals (predictnls), and intrinsically disordered regions (norsnet). We also present two case studies that successfully utilize PredictProtein packages for high performance computing in the cloud: the first analyzes protein disorder for whole organisms, and the second analyzes the effect of all possible single sequence variants in protein coding regions of the human genome. PMID:23971032

  19. Impact of genetic variation on three dimensional structure and function of proteins

    PubMed Central

    Bhattacharya, Roshni; Rose, Peter W.; Burley, Stephen K.

    2017-01-01

    The Protein Data Bank (PDB; http://wwpdb.org) was established in 1971 as the first open access digital data resource in biology with seven protein structures as its initial holdings. The global PDB archive now contains more than 126,000 experimentally determined atomic level three-dimensional (3D) structures of biological macromolecules (proteins, DNA, RNA), all of which are freely accessible via the Internet. Knowledge of the 3D structure of the gene product can help in understanding its function and role in disease. Of particular interest in the PDB archive are proteins for which 3D structures of genetic variant proteins have been determined, thus revealing atomic-level structural differences caused by the variation at the DNA level. Herein, we present a systematic and qualitative analysis of such cases. We observe a wide range of structural and functional changes caused by single amino acid differences, including changes in enzyme activity, aggregation propensity, structural stability, binding, and dissociation, some in the context of large assemblies. Structural comparison of wild type and mutated proteins, when both are available, provides insights into atomic-level structural differences caused by the genetic variation. PMID:28296894

  20. DNA nanotubes for NMR structure determination of membrane proteins.

    PubMed

    Bellot, Gaëtan; McClintock, Mark A; Chou, James J; Shih, William M

    2013-04-01

    Finding a way to determine the structures of integral membrane proteins using solution nuclear magnetic resonance (NMR) spectroscopy has proved to be challenging. A residual-dipolar-coupling-based refinement approach can be used to resolve the structure of membrane proteins up to 40 kDa in size, but to do this you need a weak-alignment medium that is detergent-resistant, and it has thus far been difficult to obtain such a medium suitable for weak alignment of membrane proteins. We describe here a protocol for robust, large-scale synthesis of detergent-resistant DNA nanotubes that can be assembled into dilute liquid crystals for application as weak-alignment media in solution NMR structure determination of membrane proteins in detergent micelles. The DNA nanotubes are heterodimers of 400-nm-long six-helix bundles, each self-assembled from an M13-based p7308 scaffold strand and >170 short oligonucleotide staple strands. Compatibility with proteins bearing considerable positive charge as well as modulation of molecular alignment, toward collection of linearly independent restraints, can be introduced by reducing the negative charge of DNA nanotubes using counter ions and small DNA-binding molecules. This detergent-resistant liquid-crystal medium offers a number of properties conducive for membrane protein alignment, including high-yield production, thermal stability, buffer compatibility and structural programmability. Production of sufficient nanotubes for four or five NMR experiments can be completed in 1 week by a single individual.

  1. Fusion proteins as alternate crystallization paths to difficult structure problems

    NASA Technical Reports Server (NTRS)

    Carter, Daniel C.; Rueker, Florian; Ho, Joseph X.; Lim, Kap; Keeling, Kim; Gilliland, Gary; Ji, Xinhua

    1994-01-01

    The three-dimensional structure of a peptide fusion product with glutathione transferase from Schistosoma japonicum (SjGST) has been solved by crystallographic methods to 2.5 Å resolution. Peptides or proteins can be fused to SjGST and expressed in a plasmid for rapid synthesis in Escherichia coli. Fusion proteins created by this commercial method can be purified rapidly by chromatography on immobilized glutathione. The potential utility of using SjGST fusion proteins as alternate paths to the crystallization and structure determination of proteins is demonstrated.

  2. Predicting the Effect of Mutations on Protein-Protein Binding Interactions through Structure-Based Interface Profiles

    PubMed Central

    Brender, Jeffrey R.; Zhang, Yang

    2015-01-01

    The formation of protein-protein complexes is essential for proteins to perform their physiological functions in the cell. Mutations that prevent the proper formation of the correct complexes can have serious consequences for the associated cellular processes. Since experimental determination of protein-protein binding affinity remains difficult when performed on a large scale, computational methods for predicting the consequences of mutations on binding affinity are highly desirable. We show that a scoring function based on interface structure profiles collected from analogous protein-protein interactions in the PDB is a powerful predictor of protein binding affinity changes upon mutation. As a standalone feature, the difference between the interface profile scores of the mutant and wild-type proteins has an accuracy equivalent to the best all-atom potentials, despite being two orders of magnitude faster once the profile has been constructed. Due to its unique sensitivity in collecting the evolutionary profiles of analogous binding interactions and the high speed of calculation, the interface profile score has additional advantages as a complementary feature to combine with physics-based potentials for improving the accuracy of composite scoring approaches. By incorporating the sequence-derived and residue-level coarse-grained potentials with the interface structure profile score, a composite model was constructed through random forest training, which generates a Pearson correlation coefficient >0.8 between the predicted and observed binding free-energy changes upon mutation. This accuracy is comparable to, or outperforms in most cases, the current best methods, but does not require high-resolution full-atomic models of the mutant structures. The binding interface profiling approach should find useful application in human-disease mutation recognition and protein interface design studies. PMID:26506533

  3. Sucralose Destabilization of Protein Structure.

    PubMed

    Chen, Lee; Shukla, Nimesh; Cho, Inha; Cohn, Erin; Taylor, Erika A; Othon, Christina M

    2015-04-16

    Sucralose is a commonly employed artificial sweetener that behaves very differently than its natural disaccharide counterpart, sucrose, in terms of its interaction with biomolecules. The presence of sucralose in solution is found to destabilize the native structure of two model protein systems: the globular protein bovine serum albumin and the enzyme staphylococcal nuclease. The melting temperature of these proteins decreases as a linear function of sucralose concentration. We correlate this destabilization to the increased polarity of the molecule. The strongly polar nature is manifested as a large dielectric friction exerted on the excited-state rotational diffusion of tryptophan using time-resolved fluorescence anisotropy. Tryptophan exhibits rotational diffusion proportional to the measured bulk viscosity for sucrose solutions over a wide range of concentrations, consistent with a Stokes-Einstein model. For sucralose solutions, however, the diffusion is dependent on the concentration, strongly diverging from the viscosity predictions, and results in heterogeneous rotational diffusion.
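
    As a rough illustration of the viscosity scaling mentioned in the abstract above, the sketch below evaluates the Stokes-Einstein-Debye rotational correlation time of a sphere, tau = eta * V / (k_B * T). It is not the authors' analysis: the function name, the 0.35 nm radius and the viscosity values are assumptions chosen only to show that, in this model, doubling the viscosity doubles the rotational correlation time.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def rotational_correlation_time(viscosity_pa_s, radius_m, temperature_k):
    """Stokes-Einstein-Debye rotational correlation time (seconds) of a sphere.

    tau = eta * V / (k_B * T), with V the volume of a sphere of the given radius.
    """
    volume = (4.0 / 3.0) * math.pi * radius_m ** 3
    return viscosity_pa_s * volume / (K_B * temperature_k)


# Hypothetical inputs: a probe of 0.35 nm radius at 25 degrees C,
# in a water-like solvent (1 mPa*s) and in one twice as viscous.
tau_water = rotational_correlation_time(1.0e-3, 0.35e-9, 298.15)
tau_viscous = rotational_correlation_time(2.0e-3, 0.35e-9, 298.15)
print(tau_viscous / tau_water)  # ratio of the two correlation times
```

    A measured correlation time that departs from this linear dependence on bulk viscosity is the kind of divergence the abstract reports for sucralose solutions.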

  4. Energetically Unfavorable Amide Conformations for N6-Acetyllysine Side Chains in Refined Protein Structures

    PubMed Central

    Genshaft, Alexander; Moser, Joe-Ann S.; D'Antonio, Edward L.; Bowman, Christine M.; Christianson, David W.

    2013-01-01

    The reversible acetylation of lysine to form N6-acetyllysine in the regulation of protein function is a hallmark of epigenetics. Acetylation of the positively charged amino group of the lysine side chain generates a neutral N-alkylacetamide moiety that serves as a molecular “switch” for the modulation of protein function and protein-protein interactions. We now report the analysis of 381 N6-acetyllysine side chain amide conformations as found in 79 protein crystal structures and 11 protein NMR structures deposited in the Protein Data Bank (PDB) of the Research Collaboratory for Structural Bioinformatics. We find that only 74.3% of N6-acetyllysine residues in protein crystal structures and 46.5% in protein NMR structures contain amide groups with energetically preferred trans or generously trans conformations. Surprisingly, 17.6% of N6-acetyllysine residues in protein crystal structures and 5.3% in protein NMR structures contain amide groups with energetically unfavorable cis or generously cis conformations. Even more surprisingly, 8.1% of N6-acetyllysine residues in protein crystal structures and 48.2% in NMR structures contain amide groups with energetically prohibitive twisted conformations that approach the transition state structure for cis-trans isomerization. In contrast, 109 unique N-alkylacetamide groups contained in 84 highly-accurate small molecule crystal structures retrieved from the Cambridge Structural Database exclusively adopt energetically preferred trans conformations. Therefore, we conclude that cis and twisted N6-acetyllysine amides in protein structures deposited in the PDB are erroneously modeled due to their energetically unfavorable or prohibitive conformations. PMID:23401043
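
    The trans, cis and twisted classes discussed above are defined by the dihedral angle about the amide C-N bond. The sketch below computes a dihedral from four atomic positions and bins it. The cutoffs (within 30 degrees of 180 for trans, within 30 degrees of 0 for cis) and the function names are assumptions for illustration only; the abstract does not state the thresholds the authors used.

```python
import numpy as np


def dihedral_deg(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points (rotation about p1-p2)."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def classify_amide(omega_deg, tolerance_deg=30.0):
    """Bin an amide dihedral; tolerance_deg is a hypothetical cutoff."""
    magnitude = abs(omega_deg)
    if magnitude >= 180.0 - tolerance_deg:
        return "trans"
    if magnitude <= tolerance_deg:
        return "cis"
    return "twisted"


# Idealized planar test geometries (not taken from any deposited structure).
trans_omega = dihedral_deg([0, 1, 0], [0, 0, 0], [1, 0, 0], [1, -1, 0])
cis_omega = dihedral_deg([0, 1, 0], [0, 0, 0], [1, 0, 0], [1, 1, 0])
twisted_omega = dihedral_deg([0, 1, 0], [0, 0, 0], [1, 0, 0], [1, 0, 1])
print(classify_amide(trans_omega), classify_amide(cis_omega), classify_amide(twisted_omega))
```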

  5. Identifying DNA-binding proteins using structural motifs and the electrostatic potential

    PubMed Central

    Shanahan, Hugh P.; Garcia, Mario A.; Jones, Susan; Thornton, Janet M.

    2004-01-01

    Robust methods to detect DNA-binding proteins from structures of unknown function are important for structural biology. This paper describes a method for identifying such proteins that (i) have a solvent accessible structural motif necessary for DNA-binding and (ii) have a positive electrostatic potential in the vicinity of the binding region. We focus on three structural motifs: helix–turn–helix (HTH), helix–hairpin–helix (HhH) and helix–loop–helix (HLH). We find that the combination of these variables detects 78% of proteins with an HTH motif, which is a substantial improvement over previous work based purely on structural templates and is comparable to more complex methods of identifying DNA-binding proteins. Similar true positive fractions are achieved for the HhH and HLH motifs. We see evidence of wide evolutionary diversity for DNA-binding proteins with an HTH motif, and much smaller diversity for those with an HhH or HLH motif. PMID:15356290

  6. RaptorX server: a resource for template-based protein structure modeling.

    PubMed

    Källberg, Morten; Margaryan, Gohar; Wang, Sheng; Ma, Jianzhu; Xu, Jinbo

    2014-01-01

    Assigning functional properties to a newly discovered protein is a key challenge in modern biology. To this end, computational modeling of the three-dimensional atomic arrangement of the amino acid chain is often crucial in determining the role of the protein in biological processes. We present a community-wide web-based protocol, RaptorX server ( http://raptorx.uchicago.edu ), for automated protein secondary structure prediction, template-based tertiary structure modeling, and probabilistic alignment sampling. Given a target sequence, RaptorX server is able to detect even remotely related template sequences by means of a novel nonlinear context-specific alignment potential and probabilistic consistency algorithm. Using the protocol presented here it is thus possible to obtain high-quality structural models for many target protein sequences when only distantly related protein domains have experimentally solved structures. At present, RaptorX server can perform secondary and tertiary structure prediction of a 200 amino acid target sequence in approximately 30 min.

  7. Structure and function of seed storage proteins in faba bean (Vicia faba L.).

    PubMed

    Liu, Yujiao; Wu, Xuexia; Hou, Wanwei; Li, Ping; Sha, Weichao; Tian, Yingying

    2017-05-01

    The protein subunit is the most important basic unit of protein, and its study can unravel the structure and function of seed storage proteins in faba bean. In this study, we identified six specific protein subunits in faba bean (cv. Qinghai 13) by combining liquid chromatography (LC), liquid chromatography-electrospray ionization tandem mass spectrometry (LC-ESI-MS/MS) and bioinformatics. The results suggested a diversity of seed storage proteins in faba bean, and a total of 16 proteins (four GroEL molecular chaperones and 12 plant-specific proteins) were identified from 97-, 96-, 64-, 47-, 42-, and 38-kD-specific protein subunits in faba bean based on the peptide sequence. We also analyzed the composition and abundance of the amino acids, the physicochemical characteristics, secondary structure, three-dimensional structure, transmembrane domain, and possible subcellular localization of these identified proteins in faba bean seed, and finally predicted their function and structure. The three-dimensional structures were generated based on homology modeling, and the protein function was analyzed based on the annotation from the non-redundant protein database (NR database, NCBI) and function analysis of optimal modeling. The objective of this study was to identify the seed storage proteins in faba bean and confirm the structure and function of these proteins. Our results can be useful for the study of protein nutrition and for achieving breeding goals for optimal protein quality in faba bean.

  8. Structural organization of G-protein-coupled receptors

    NASA Astrophysics Data System (ADS)

    Lomize, Andrei L.; Pogozheva, Irina D.; Mosberg, Henry I.

    1999-07-01

    Atomic-resolution structures of the transmembrane 7-α-helical domains of 26 G-protein-coupled receptors (GPCRs) (including opsins, cationic amine, melatonin, purine, chemokine, opioid, and glycoprotein hormone receptors and two related proteins, retinochrome and Duffy erythrocyte antigen) were calculated by distance geometry using interhelical hydrogen bonds formed by various proteins from the family and collectively applied as distance constraints, as described previously [Pogozheva et al., Biophys. J., 70 (1997) 1963]. The main structural features of the calculated GPCR models are described and illustrated by examples. Some of the features reflect physical interactions that are responsible for the structural stability of the transmembrane α-bundle: the formation of extensive networks of interhelical H-bonds and sulfur-aromatic clusters that are spatially organized as 'polarity gradients'; the close packing of side-chains throughout the transmembrane domain; and the formation of interhelical disulfide bonds in some receptors and a plausible Zn2+ binding center in retinochrome. Other features of the models are related to biological function and evolution of GPCRs: the formation of a common 'minicore' of 43 evolutionarily conserved residues; a multitude of correlated replacements throughout the transmembrane domain; an Na+-binding site in some receptors; and excellent complementarity of receptor binding pockets to many structurally dissimilar, conformationally constrained ligands, such as retinal, cyclic opioid peptides, and cationic amine ligands. The calculated models are in good agreement with numerous experimental data.

  9. Reflections on protein splicing: structures, functions and mechanisms

    PubMed Central

    Anraku, Yasuhiro; Satow, Yoshinori

    2009-01-01

    Twenty years ago, evidence that one gene produces two enzymes via protein splicing emerged from structural and expression studies of the VMA1 gene in Saccharomyces cerevisiae. VMA1 consists of a single open reading frame and contains two independent pieces of genetic information, for Vma1p (a catalytic 70-kDa subunit of the vacuolar H+-ATPase) and for VDE (a 50-kDa DNA endonuclease) as an in-frame spliced insert in the gene. Protein splicing is a posttranslational cellular process, in which an intervening polypeptide termed the VMA1 intein is self-catalytically excised from a nascent 120-kDa VMA1 precursor and two flanking polypeptides of the N- and C-exteins are ligated to produce the mature Vma1p. Subsequent studies have demonstrated that protein splicing is not unique to the VMA1 precursor and there are many operons in nature, which implement genetic information editing at protein level. To elucidate its structure-directed chemical mechanisms, a series of biochemical and crystal structural studies has been carried out with the use of various VMA1 recombinants. This article summarizes a VDE-mediated self-catalytic mechanism for protein splicing that is triggered and terminated solely via thiazolidine intermediates with tetrahedral configurations formed within the splicing sites where proton ingress and egress are driven by balanced protonation and deprotonation. PMID:19907126

  10. Detection of functionally important regions in "hypothetical proteins" of known structure.

    PubMed

    Nimrod, Guy; Schushan, Maya; Steinberg, David M; Ben-Tal, Nir

    2008-12-10

    Structural genomics initiatives provide ample structures of "hypothetical proteins" (i.e., proteins of unknown function) at an ever increasing rate. However, without function annotation, this structural goldmine is of little use to biologists who are interested in particular molecular systems. To this end, we used (an improved version of) the PatchFinder algorithm for the detection of functional regions on the protein surface, which could mediate its interactions with, e.g., substrates, ligands, and other proteins. Examination, using a data set of annotated proteins, showed that PatchFinder outperforms similar methods. We collected 757 structures of hypothetical proteins and their predicted functional regions in the N-Func database. Inspection of several of these regions demonstrated that they are useful for function prediction. For example, we suggested an interprotein interface and a putative nucleotide-binding site. A web-server implementation of PatchFinder and the N-Func database are available at http://patchfinder.tau.ac.il/.

  11. Supramolecular Structures with Blood Plasma Proteins, Sugars and Nanosilica

    NASA Astrophysics Data System (ADS)

    Turov, V. V.; Gun'ko, V. M.; Galagan, N. P.; Rugal, A. A.; Barvinchenko, V. M.; Gorbyk, P. P.

    Supramolecular structures with blood plasma proteins (albumin, immunoglobulin and fibrinogen (HPF)), protein/water/silica and protein/water/silica/sugar (glucose, fructose and saccharose) were studied by NMR, adsorption, IR and UV spectroscopy methods. Hydration parameters, amounts of weakly and strongly bound waters and interfacial energy (γ_S) were determined over a wide range of component concentrations. The γ_S(C_protein, C_silica) graphs were used to estimate the energy of protein-protein, protein-surface and particle-particle interactions. It was shown that the interfacial energy of self-association (γ_as) of protein molecules depends on the type of protein. A large fraction of water bound to proteins can be displaced by sugars, and the effect of disaccharide (saccharose) was greater than that of monosugars. Changes in the structural parameters of cavities in HPF molecules and complexes with HPF/silica nanoparticles filled by bound water were analysed using NMR-cryoporometry showing that interaction of proteins with silica leads to a significant decrease in the amounts of water bound to both protein and silica surfaces. Bionanocomposites with BSA/nanosilica/sugar can be used to influence states of living cells and tissues after cryopreservation or other treatments. It was shown that interaction of proteins with silica leads to a strong decrease in the volume of all types of internal cavities filled by water.

  12. T-RMSD: a web server for automated fine-grained protein structural classification.

    PubMed

    Magis, Cedrik; Di Tommaso, Paolo; Notredame, Cedric

    2013-07-01

    This article introduces the T-RMSD web server (tree-based on root-mean-square deviation), a service allowing the online computation of structure-based protein classification. It has been developed to address the relation between structural and functional similarity in proteins, and it allows a fine-grained structural clustering of a given protein family or group of structurally related proteins using distance RMSD (dRMSD) variations. These distances are computed between all pairs of equivalent residues, as defined by the ungapped columns within a given multiple sequence alignment. Using these generated distance matrices (one per equivalent position), T-RMSD produces a structural tree with support values for each cluster node, reminiscent of bootstrap values. These values, associated with the tree topology, allow a quantitative estimate of structural distances between proteins or groups of proteins defined by the tree topology. The clusters thus defined have been shown to be structurally and functionally informative. The T-RMSD web server is a free website open to all users and available at http://tcoffee.crg.cat/apps/tcoffee/do:trmsd.

  13. T-RMSD: a web server for automated fine-grained protein structural classification

    PubMed Central

    Magis, Cedrik; Di Tommaso, Paolo; Notredame, Cedric

    2013-01-01

    This article introduces the T-RMSD web server (tree-based on root-mean-square deviation), a service allowing the online computation of structure-based protein classification. It has been developed to address the relation between structural and functional similarity in proteins, and it allows a fine-grained structural clustering of a given protein family or group of structurally related proteins using distance RMSD (dRMSD) variations. These distances are computed between all pairs of equivalent residues, as defined by the ungapped columns within a given multiple sequence alignment. Using these generated distance matrices (one per equivalent position), T-RMSD produces a structural tree with support values for each cluster node, reminiscent of bootstrap values. These values, associated with the tree topology, allow a quantitative estimate of structural distances between proteins or groups of proteins defined by the tree topology. The clusters thus defined have been shown to be structurally and functionally informative. The T-RMSD web server is a free website open to all users and available at http://tcoffee.crg.cat/apps/tcoffee/do:trmsd. PMID:23716642
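
    The distance RMSD (dRMSD) named in the two T-RMSD records above compares intramolecular distances rather than superposed coordinates, so it needs no structural alignment step. The sketch below shows the plain dRMSD between two structures with the same number of equivalent residues. It is a minimal illustration under that assumption, not the T-RMSD server's code, and it omits the per-position matrices and tree building the abstract describes; the function names and coordinates are hypothetical.

```python
import numpy as np


def pairwise_distances(coords):
    """Return the N x N matrix of Euclidean distances between N points."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def drmsd(coords_a, coords_b):
    """Distance RMSD between two equally sized sets of equivalent residues."""
    dist_a = pairwise_distances(coords_a)
    dist_b = pairwise_distances(coords_b)
    upper = np.triu_indices(len(dist_a), k=1)  # each residue pair counted once
    return float(np.sqrt(np.mean((dist_a[upper] - dist_b[upper]) ** 2)))


# Toy coordinates: b is a rotated and translated copy of a, c is a scaled by 2.
a = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]
b = [[5, 5, 5], [5, 6, 5], [4, 5, 5]]
c = [[0, 0, 0], [2, 0, 0], [0, 2, 0]]
print(drmsd(a, b), drmsd(a, c))
```

    Because only internal distances enter the calculation, the rigidly moved copy b scores zero against a, while the stretched copy c does not.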

  14. Fundamental Characteristics of AAA+ Protein Family Structure and Function

    PubMed Central

    2016-01-01

    Many complex cellular events depend on multiprotein complexes known as molecular machines to efficiently couple the energy derived from adenosine triphosphate hydrolysis to the generation of mechanical force. Members of the AAA+ ATPase superfamily (ATPases Associated with various cellular Activities) are critical components of many molecular machines. AAA+ proteins are defined by conserved modules that precisely position the active site elements of two adjacent subunits to catalyze ATP hydrolysis. In many cases, AAA+ proteins form a ring structure that translocates a polymeric substrate through the central channel using specialized loops that project into the central channel. We discuss the major features of AAA+ protein structure and function with an emphasis on pivotal aspects elucidated with archaeal proteins. PMID:27703410

  15. Fundamental Characteristics of AAA+ Protein Family Structure and Function.

    PubMed

    Miller, Justin M; Enemark, Eric J

    2016-01-01

    Many complex cellular events depend on multiprotein complexes known as molecular machines to efficiently couple the energy derived from adenosine triphosphate hydrolysis to the generation of mechanical force. Members of the AAA+ ATPase superfamily (ATPases Associated with various cellular Activities) are critical components of many molecular machines. AAA+ proteins are defined by conserved modules that precisely position the active site elements of two adjacent subunits to catalyze ATP hydrolysis. In many cases, AAA+ proteins form a ring structure that translocates a polymeric substrate through the central channel using specialized loops that project into the central channel. We discuss the major features of AAA+ protein structure and function with an emphasis on pivotal aspects elucidated with archaeal proteins.

  16. HBNG: Graph theory based visualization of hydrogen bond networks in protein structures.

    PubMed

    Tiwari, Abhishek; Tiwari, Vivek

    2007-07-09

    HBNG is a graph theory based tool for visualization of hydrogen bond networks in 2D. Digraphs generated by HBNG facilitate visualization of cooperativity and anticooperativity chains and rings in protein structures. HBNG takes hydrogen bond list files (output from HBAT, HBEXPLORE, HBPLUS and STRIDE) as input, generates a DOT language script, and constructs digraphs using the freeware AT&T Graphviz tool. HBNG is useful in the enumeration of favorable topologies of hydrogen bond networks in protein structures and determining the effect of cooperativity and anticooperativity on protein stability and folding. HBNG can be applied to protein structure comparison and in the identification of secondary structural regions in protein structures. The program is available from the authors for non-commercial purposes.
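
    The step of turning a hydrogen bond list into a DOT language digraph can be sketched as follows. This is not HBNG itself: the input here is a hypothetical list of (donor, acceptor) label pairs rather than the HBAT, HBEXPLORE, HBPLUS or STRIDE output formats, and the function name and atom labels are made up for illustration.

```python
def hbonds_to_dot(hbonds, graph_name="hbond_network"):
    """Build a DOT digraph with one directed edge per donor -> acceptor pair."""
    lines = [f"digraph {graph_name} {{"]
    for donor, acceptor in hbonds:
        # Quote node names so labels containing ':' stay valid DOT identifiers.
        lines.append(f'    "{donor}" -> "{acceptor}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical hydrogen bonds forming a short donor-to-acceptor chain.
example_bonds = [("SER12:OG", "ASP45:OD1"), ("ASP45:N", "GLU50:OE2")]
dot_text = hbonds_to_dot(example_bonds)
print(dot_text)
```

    The resulting text can be saved to a file and rendered with the Graphviz `dot` program, for example `dot -Tpng network.dot -o network.png`.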

  17. Use of 13Cα Chemical-Shifts in Protein Structure Determination

    PubMed Central

    Vila, Jorge A.; Ripoll, Daniel R.; Scheraga, Harold A.

    2008-01-01

    A physics-based method, aimed at determining protein structures by using NOE-derived distances together with observed and computed 13C chemical shifts, is proposed. The approach makes use of 13Cα chemical shifts, computed at the density functional level of theory, to obtain torsional constraints for all backbone and side-chain torsional angles without making a priori use of the occupancy of any region of the Ramachandran map by the amino acid residues. The torsional constraints are not fixed but are changed dynamically in each step of the procedure, following an iterative self-consistent approach intended to identify a set of conformations for which the computed 13Cα chemical shifts match the experimental ones. A test is carried out on a 76-amino acid all-α-helical protein, namely the B. subtilis acyl carrier protein. It is shown that, starting from randomly generated conformations, the final protein models are more accurate than an existing NMR-derived structure model of this protein, in terms of both the agreement between predicted and observed 13Cα chemical shifts and some stereochemical quality indicators, and of similar accuracy to one of the protein models solved at a high level of resolution. The results provide evidence that this methodology can be used not only for structure determination but also for additional protein structure refinement of NMR-derived models deposited in the Protein Data Bank. PMID:17516673

  18. Treatment of Second-Order Structures of Proteins Using Oxygen Radio Frequency Plasma

    NASA Astrophysics Data System (ADS)

    Hayashi, Nobuya; Nakahigashi, Akari; Liu, Hao; Goto, Masaaki

    2010-08-01

    Decomposition characteristics of second-order structures of proteins are determined using an oxygen radio frequency (RF) plasma sterilizer in order to prevent infectious proteins from contaminating medical equipment in hospitals. The removal of casein protein as a test protein with a concentration of 50 mg/cm2 on the plane substrate requires approximately 8 h when singlet atomic oxygen is irradiated. The peak intensity of Fourier transform infrared spectroscopy (FTIR) spectra of the β-sheet structures decreases at approximately the same rate as those of the α-helix and first-order structures of proteins. Active oxygen has a sufficient oxidation energy to dissociate hydrogen bonds within the β-sheet structure.

  19. Advances in structural and functional analysis of membrane proteins by electron crystallography

    PubMed Central

    Wisedchaisri, Goragot; Reichow, Steve L.; Gonen, Tamir

    2011-01-01

    Summary Electron crystallography is a powerful technique for the study of membrane protein structure and function in the lipid environment. When well-ordered two-dimensional crystals are obtained the structure of both protein and lipid can be determined and lipid-protein interactions analyzed. Protons and ionic charges can be visualized by electron crystallography and the protein of interest can be captured for structural analysis in a variety of physiologically distinct states. This review highlights the strengths of electron crystallography and the momentum that is building up in automation and the development of high throughput tools and methods for structural and functional analysis of membrane proteins by electron crystallography. PMID:22000511

  20. Structure of human Niemann–Pick C1 protein

    PubMed Central

    Li, Xiaochun; Wang, Jiawei; Coutavas, Elias; Shi, Hang; Hao, Qi; Blobel, Günter

    2016-01-01

    Niemann–Pick C1 protein (NPC1) is a late-endosomal membrane protein involved in trafficking of LDL-derived cholesterol, Niemann–Pick disease type C, and Ebola virus infection. NPC1 contains 13 transmembrane segments (TMs), five of which are thought to represent a “sterol-sensing domain” (SSD). Although present also in other key regulatory proteins of cholesterol biosynthesis, uptake, and signaling, the structure and mechanism of action of the SSD are unknown. Here we report a crystal structure of a large fragment of human NPC1 at 3.6 Å resolution, which reveals internal twofold pseudosymmetry along TM 2–13 and two structurally homologous domains that protrude 60 Å into the endosomal lumen. Strikingly, NPC1's SSD forms a cavity that is accessible from both the luminal bilayer leaflet and the endosomal lumen; computational modeling suggests that this cavity is large enough to accommodate one cholesterol molecule. We propose a model for NPC1 function in cholesterol sensing and transport. PMID:27307437

  1. Local Crystalline Structure in an Amorphous Protein Dense Phase

    PubMed Central

    Greene, Daniel G.; Modla, Shannon; Wagner, Norman J.; Sandler, Stanley I.; Lenhoff, Abraham M.

    2015-01-01

    Proteins exhibit a variety of dense phases ranging from gels, aggregates, and precipitates to crystalline phases and dense liquids. Although the structure of the crystalline phase is known in atomistic detail, little attention has been paid to noncrystalline protein dense phases, and in many cases the structures of these phases are assumed to be fully amorphous. In this work, we used small-angle neutron scattering, electron microscopy, and electron tomography to measure the structure of ovalbumin precipitate particles salted out with ammonium sulfate. We found that the ovalbumin phase-separates into core-shell particles with a core radius of ∼2 μm and shell thickness of ∼0.5 μm. Within this shell region, nanostructures comprised of crystallites of ovalbumin self-assemble into a well-defined bicontinuous network with branches ∼12 nm thick. These results demonstrate that the protein gel is comprised in part of nanocrystalline protein. PMID:26488663

  2. A probabilistic model for detecting rigid domains in protein structures.

    PubMed

    Nguyen, Thach; Habeck, Michael

    2016-09-01

    Large-scale conformational changes in proteins are implicated in many important biological functions. These structural transitions can often be rationalized in terms of relative movements of rigid domains. There is a need for objective and automated methods that identify rigid domains in sets of protein structures showing alternative conformational states. We present a probabilistic model for detecting rigid-body movements in protein structures. Our model aims to approximate alternative conformational states by a few structural parts that are rigidly transformed under the action of a rotation and a translation. By using Bayesian inference and Markov chain Monte Carlo sampling, we estimate all parameters of the model, including a segmentation of the protein into rigid domains, the structures of the domains themselves, and the rigid transformations that generate the observed structures. We find that our Gibbs sampling algorithm can also estimate the optimal number of rigid domains with high efficiency and accuracy. We assess the power of our method on several thousand entries of the DynDom database and discuss applications to various complex biomolecular systems. The Python source code for protein ensemble analysis is available at: https://github.com/thachnguyen/motion_detection
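
    The rigid transformation at the heart of the model above is a rotation R plus a translation t applied to one structural part. As a minimal sketch of that building block only, the code below recovers R and t for a single domain by least-squares superposition (the Kabsch algorithm). This is a swapped-in, simpler technique, not the Bayesian model or Gibbs sampler of the paper, and the function name and test coordinates are hypothetical.

```python
import numpy as np


def kabsch(points_p, points_q):
    """Return (R, t) minimizing sum ||R @ p_i + t - q_i||^2 over paired points."""
    p = np.asarray(points_p, dtype=float)
    q = np.asarray(points_q, dtype=float)
    p_center, q_center = p.mean(axis=0), q.mean(axis=0)
    # Covariance of the centered point sets, then its singular value decomposition.
    h = (p - p_center).T @ (q - q_center)
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = q_center - rotation @ p_center
    return rotation, translation


# Toy domain: four points moved by a known 30 degree rotation about z plus a shift.
theta = np.radians(30.0)
true_rotation = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                          [np.sin(theta), np.cos(theta), 0.0],
                          [0.0, 0.0, 1.0]])
true_translation = np.array([1.0, -2.0, 0.5])
domain = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
moved = domain @ true_rotation.T + true_translation
est_rotation, est_translation = kabsch(domain, moved)
print(np.allclose(est_rotation, true_rotation))
```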

  3. PASS2: an automated database of protein alignments organised as structural superfamilies.

    PubMed

    Bhaduri, Anirban; Pugalenthi, Ganesan; Sowdhamini, Ramanathan

    2004-04-02

    The functional selection and three-dimensional structural constraints of proteins in nature often relate to the retention of significant sequence similarity between proteins of similar fold and function despite poor sequence identity. Organization of structure-based sequence alignments for distantly related proteins provides a map of the conserved and critical regions of the protein universe that is useful for the analysis of folding principles, for the evolutionary unification of protein families and for maximizing the information return from experimental structure determination. The Protein Alignment organised as Structural Superfamily (PASS2) database represents continuously updated, structural alignments for evolutionarily related, sequentially distant proteins. An automated and updated version of PASS2 is in direct correspondence with SCOP 1.63 and consists of sequences having identity below 40% among themselves. Protein domains have been grouped into 628 multi-member superfamilies and 566 single member superfamilies. Structure-based sequence alignments for the superfamilies have been obtained using COMPARER, while initial equivalencies have been derived from a preliminary superposition using LSQMAN or STAMP 4.0. The final sequence alignments have been annotated for structural features using JOY4.0. The database is supplemented with sequence relatives belonging to different genomes, conserved spatially interacting and structural motifs, probabilistic hidden Markov models of superfamilies based on the alignments and useful links to other databases. Probabilistic models and sensitive position specific profiles obtained from reliable superfamily alignments aid annotation of remote homologues and are useful tools in structural and functional genomics. PASS2 presents the phylogeny of its members both based on sequence and structural dissimilarities. Clustering of members allows us to understand diversification of the family members. The search engine has been

  4. Water polygons in high-resolution protein crystal structures

    PubMed Central

    Lee, Jonas; Kim, Sung-Hou

    2009-01-01

    We have analyzed the interstitial water (ISW) structures in 1500 protein crystal structures deposited in the Protein Data Bank that have greater than 1.5 Å resolution with less than 90% sequence similarity with each other. We observed varieties of polygonal water structures composed of three to eight water molecules. These polygons may represent the time- and space-averaged structures of “stable” water oligomers present in liquid water, and their presence as well as relative population may be relevant in understanding physical properties of liquid water at a given temperature. On average, 13% of ISWs are localized enough to be visible by X-ray diffraction. Of those, an average of 78% are water molecules in the first water layer on the protein surface. Of the localized ISWs beyond the first layer, almost half form water polygons such as trigons, tetragons, as well as expected pentagons, hexagons, higher polygons, partial dodecahedrons, and disordered networks. Most of the octagons and nanogons are formed by fusion of smaller polygons. The trigons are most commonly observed. We suggest that our observation provides an experimental basis for including these water polygon structures in correlating and predicting various water properties in liquid state. PMID:19551896

  5. Structural basis of a rationally rewired protein-protein interface critical to bacterial signaling

    PubMed Central

    Podgornaia, Anna I.; Casino, Patricia; Marina, Alberto; Laub, Michael T.

    2013-01-01

    Summary Two-component signal transduction systems typically involve a sensor histidine kinase that specifically phosphorylates a single, cognate response regulator. This protein-protein interaction relies on molecular recognition via a small set of residues in each protein. To better understand how these residues determine the specificity of kinase-substrate interactions, we rationally rewired the interaction interface of a Thermotoga maritima two-component system, HK853-RR468, to match that found in a different two-component system, E. coli PhoR-PhoB. The rewired proteins interacted robustly with each other, but no longer interacted with the parent proteins. Analysis of the crystal structures of the wild-type and mutant protein complexes, along with a systematic mutagenesis study, reveals how individual mutations contribute to the rewiring of interaction specificity. Our approach and conclusions have implications for studies of other protein-protein interactions, protein evolution, and the design of novel protein interfaces. PMID:23954504

  6. Perspective: Structural fluctuation of protein and Anfinsen's thermodynamic hypothesis

    NASA Astrophysics Data System (ADS)

    Hirata, Fumio; Sugita, Masatake; Yoshida, Masasuke; Akasaka, Kazuyuki

    2018-01-01

    The thermodynamic hypothesis, casually referred to as "Anfinsen's dogma," is described theoretically in terms of a concept of the structural fluctuation of protein or the first moment (average structure) and the second moment (variance and covariance) of the structural distribution. The new theoretical concept views the unfolding and refolding processes of protein as a shift of the structural distribution induced by a thermodynamic perturbation, with the variance-covariance matrix varying. Based on the theoretical concept, a method to characterize the mechanism of folding (or unfolding) is proposed. The transition state, if any, between two stable states is interpreted as a gap in the distribution, which is created due to an extensive reorganization of hydrogen bonds among back-bone atoms of protein and with water molecules in the course of conformational change. Further perspectives on applying the theory to computer-aided drug design and to materials science are briefly discussed.

  7. Modeling Protein Excited-state Structures from "Over-length" Chemical Cross-links.

    PubMed

    Ding, Yue-He; Gong, Zhou; Dong, Xu; Liu, Kan; Liu, Zhu; Liu, Chao; He, Si-Min; Dong, Meng-Qiu; Tang, Chun

    2017-01-27

    Chemical cross-linking coupled with mass spectrometry (CXMS) provides proximity information for the cross-linked residues and is used increasingly for modeling protein structures. However, experimentally identified cross-links are sometimes incompatible with the known structure of a protein, as the distance calculated between the cross-linked residues far exceeds the maximum length of the cross-linker. The discrepancies may persist even after eliminating potentially false cross-links and excluding intermolecular ones. Thus the "over-length" cross-links may arise from an alternative excited-state conformation of the protein. Here we present a method and associated software DynaXL for visualizing the ensemble structures of multidomain proteins based on intramolecular cross-links identified by mass spectrometry with high confidence. Representing the cross-linkers and cross-linking reactions explicitly, we show that the protein excited-state structure can be modeled with as few as two over-length cross-links. We demonstrate the generality of our method with three systems: calmodulin, enzyme I, and glutamine-binding protein, and we show that these proteins alternate between different conformations for interacting with other proteins and ligands. Taken together, the over-length chemical cross-links contain valuable information about protein dynamics, and our findings here illustrate the relationship between dynamic domain movement and protein function. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.
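
    As a rough illustration of the over-length criterion described in this abstract, the sketch below flags cross-links whose endpoint distance in a known structure exceeds the span a cross-linker can bridge. It is not the DynaXL method; the function name, the coordinates, the residue numbers, and the 24 Å cutoff are all hypothetical placeholders chosen for the example.

```python
import math

def over_length_links(coords, links, max_span):
    """Return (i, j, distance) for cross-links longer than max_span.

    coords   -- dict: residue number -> (x, y, z) in angstroms (hypothetical data)
    links    -- iterable of (residue_i, residue_j) pairs
    max_span -- assumed maximum distance the cross-linker can bridge
    """
    flagged = []
    for i, j in links:
        d = math.dist(coords[i], coords[j])  # Euclidean distance (Python 3.8+)
        if d > max_span:
            flagged.append((i, j, d))
    return flagged

# Made-up coordinates: one short link and one link far beyond the cutoff.
coords = {10: (0.0, 0.0, 0.0), 50: (3.0, 4.0, 0.0), 120: (30.0, 40.0, 0.0)}
links = [(10, 50), (10, 120)]
flagged = over_length_links(coords, links, max_span=24.0)
```

    Links returned by such a filter would be the candidates that, per the abstract, may report on an alternative conformation once false positives and intermolecular links are excluded.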

  8. Cross-over between discrete and continuous protein structure space: insights into automatic classification and networks of protein structures.

    PubMed

    Pascual-García, Alberto; Abia, David; Ortiz, Angel R; Bastolla, Ugo

    2009-03-01

    Structural classifications of proteins assume the existence of the fold, which is an intrinsic equivalence class of protein domains. Here, we test in which conditions such an equivalence class is compatible with objective similarity measures. We base our analysis on the transitive property of the equivalence relationship, requiring that similarity of A with B and B with C implies that A and C are also similar. Divergent gene evolution leads us to expect that the transitive property should approximately hold. However, if protein domains are a combination of recurrent short polypeptide fragments, as proposed by several authors, then similarity of partial fragments may violate the transitive property, favouring the continuous view of the protein structure space. We propose a measure to quantify the violations of the transitive property when a clustering algorithm joins elements into clusters, and we find out that such violations present a well defined and detectable cross-over point, from an approximately transitive regime at high structure similarity to a regime with large transitivity violations and large differences in length at low similarity. We argue that protein structure space is discrete and hierarchic classification is justified up to this cross-over point, whereas at lower similarities the structure space is continuous and it should be represented as a network. We have tested the qualitative behaviour of this measure, varying all the choices involved in the automatic classification procedure, i.e., domain decomposition, alignment algorithm, similarity score, and clustering algorithm, and we have found out that this behaviour is quite robust. The final classification depends on the chosen algorithms. We used the values of the clustering coefficient and the transitivity violations to select the optimal choices among those that we tested. Interestingly, this criterion also favours the agreement between automatic and expert classifications. As a domain set, we

  9. Protein folding, protein structure and the origin of life: Theoretical methods and solutions of dynamical problems

    NASA Technical Reports Server (NTRS)

    Weaver, D. L.

    1982-01-01

    Theoretical methods and solutions of the dynamics of protein folding, protein aggregation, protein structure, and the origin of life are discussed. The elements of a dynamic model representing the initial stages of protein folding are presented. The calculation and experimental determination of the model parameters are discussed. The use of computer simulation for modeling protein folding is considered.

  10. Building a Better Fragment Library for De Novo Protein Structure Prediction

    PubMed Central

    de Oliveira, Saulo H. P.; Shi, Jiye; Deane, Charlotte M.

    2015-01-01

    Fragment-based approaches are the current standard for de novo protein structure prediction. These approaches rely on accurate and reliable fragment libraries to generate good structural models. In this work, we describe a novel method for structure fragment library generation and its application in fragment-based de novo protein structure prediction. The importance of correct testing procedures in assessing the quality of fragment libraries is demonstrated. In particular, we highlight the exclusion of homologs to the target from the libraries to correctly simulate a de novo protein structure prediction scenario, something which surprisingly is not always done. We demonstrate that fragments presenting different predominant predicted secondary structures should be treated differently during the fragment library generation step and that exhaustive and random search strategies should both be used. This information was used to develop a novel method, Flib. On a validation set of 41 structurally diverse proteins, Flib libraries present both a higher precision and coverage than two of the state-of-the-art methods, NNMake and HHFrag. Flib also achieves better precision and coverage on the set of 275 protein domains used in the two previous experiments of the Critical Assessment of Structure Prediction (CASP9 and CASP10). We compared Flib libraries against NNMake libraries in a structure prediction context. Of the 13 cases in which a correct answer was generated, Flib models were more accurate than NNMake models for 10. Flib is available for download at: http://www.stats.ox.ac.uk/research/proteins/resources. PMID:25901595

  11. FragBag, an accurate representation of protein structure, retrieves structural neighbors from the entire PDB quickly and accurately.

    PubMed

    Budowski-Tal, Inbal; Nov, Yuval; Kolodny, Rachel

    2010-02-23

    Fast identification of protein structures that are similar to a specified query structure in the entire Protein Data Bank (PDB) is fundamental in structure and function prediction. We present FragBag: An ultrafast and accurate method for comparing protein structures. We describe a protein structure by the collection of its overlapping short contiguous backbone segments, and discretize this set using a library of fragments. Then, we succinctly represent the protein as a "bag-of-fragments" (a vector that counts the number of occurrences of each fragment) and measure the similarity between two structures by the similarity between their vectors. Our representation has two additional benefits: (i) it can be used to construct an inverted index, for implementing a fast structural search engine of the entire PDB, and (ii) one can specify a structure as a collection of substructures, without combining them into a single structure; this is valuable for structure prediction, when there are reliable predictions only of parts of the protein. We use receiver operating characteristic curve analysis to quantify the success of FragBag in identifying neighbor candidate sets in a dataset of over 2,900 structures. The gold standard is the set of neighbors found by six state of the art structural aligners. Our best FragBag library finds more accurate candidate sets than the three other filter methods: The SGM, PRIDE, and a method by Zotenko et al. More interestingly, FragBag performs on a par with the computationally expensive, yet highly trusted structural aligners STRUCTAL and CE.
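
    The bag-of-fragments idea in this abstract can be sketched in a few lines. The example assumes each backbone segment has already been assigned the index of its closest library fragment (that matching step is not shown), and it uses cosine similarity as one plausible vector similarity; the abstract does not say which measure FragBag uses, so both the function names and the choice of cosine are assumptions.

```python
import math
from collections import Counter

def bag_of_fragments(fragment_ids, library_size):
    """Count how often each library fragment occurs in one structure.

    fragment_ids -- library index assigned to each overlapping backbone
                    segment (assumed to be precomputed elsewhere)
    """
    counts = Counter(fragment_ids)
    return [counts.get(k, 0) for k in range(library_size)]

def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (an assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical fragment assignments for two structures, library of 3 fragments.
bag_a = bag_of_fragments([0, 1, 1, 2], library_size=3)
bag_b = bag_of_fragments([1, 0, 2, 1], library_size=3)
similarity = cosine_similarity(bag_a, bag_b)
```

    Because the vector ignores the order of segments along the chain, two structures with the same fragment content receive the same bag, which is what makes the representation cheap to compare and easy to index.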

  12. Acyl carrier protein structural classification and normal mode analysis

    PubMed Central

    Cantu, David C; Forrester, Michael J; Charov, Katherine; Reilly, Peter J

    2012-01-01

    All acyl carrier protein primary and tertiary structures were gathered into the ThYme database. They are classified into 16 families by amino acid sequence similarity, with members of the different families having sequences with statistically highly significant differences. These classifications are supported by tertiary structure superposition analysis. Tertiary structures from a number of families are very similar, suggesting that these families may come from a single distant ancestor. Normal vibrational mode analysis was conducted on experimentally determined freestanding structures, showing greater fluctuations at chain termini and loops than in most helices. Their modes overlap more so within families than between different families. The tertiary structures of three acyl carrier protein families that lacked any known structures were predicted as well. PMID:22374859

  13. Cell-free protein synthesis for structure determination by X-ray crystallography.

    PubMed

    Watanabe, Miki; Miyazono, Ken-ichi; Tanokura, Masaru; Sawasaki, Tatsuya; Endo, Yaeta; Kobayashi, Ichizo

    2010-01-01

    Structure determination has been difficult for those proteins that are toxic to the cells and cannot be prepared in a large amount in vivo. These proteins, even when biologically very interesting, tend to be left uncharacterized in the structural genomics projects. Their cell-free synthesis can bypass the toxicity problem. Among the various cell-free systems, the wheat-germ-based system is of special interest due to the following points: (1) Because the gene is placed under a plant translational signal, its toxic expression in a bacterial host is reduced. (2) It has only little codon preference and, especially, little discrimination between methionine and selenomethionine (SeMet), which allows easy preparation of selenomethionylated proteins for crystal structure determination by SAD and MAD methods. (3) Translation is uncoupled from transcription, so that the toxicity of the translation product on DNA and its transcription, if any, can be bypassed. We have shown that the wheat-germ-based cell-free protein synthesis is useful for X-ray crystallography of one of the 4-bp cutter restriction enzymes, which are expected to be very toxic to all forms of cells retaining the genome. Our report on its structure represents the first report of structure determination by X-ray crystallography using protein overexpressed with the wheat-germ-based cell-free protein expression system. This will be a method of choice for cytotoxic proteins when its cost is not a problem. Its use will become popular when the crystal structure determination technology has evolved to require only a tiny amount of protein.

  14. How the Sequence of a Gene Specifies Structural Symmetry in Proteins

    PubMed Central

    Shen, Xiaojuan; Huang, Tongcheng; Wang, Guanyu; Li, Guanglin

    2015-01-01

    Internal symmetry is commonly observed in the majority of fundamental protein folds. Meanwhile, sufficient evidence suggests that nascent polypeptide chains of proteins have the potential to start the co-translational folding process and this process allows mRNA to contain additional information on protein structure. In this paper, we study the relationship between gene sequences and protein structures from the viewpoint of symmetry to explore how gene sequences code for structural symmetry in proteins. We found that, for a set of two-fold symmetric proteins from left-handed beta-helix fold, intragenic symmetry always exists in their corresponding gene sequences. Meanwhile, codon usage bias and local mRNA structure might be involved in modulating translation speed for the formation of structural symmetry: a major decrease of local codon usage bias in the middle of the codon sequence can be identified as a common feature; and major or consecutive decreases in local mRNA folding energy near the boundaries of the symmetric substructures can also be observed. The results suggest that gene duplication and fusion may be an evolutionarily conserved process for this protein fold. In addition, the usage of rare codons and the formation of higher order of secondary structure near the boundaries of symmetric substructures might have coevolved as conserved mechanisms to slow down translation elongation and to facilitate effective folding of symmetric substructures. These findings provide valuable insights into our understanding of the mechanisms of translation and its evolution, as well as the design of proteins via symmetric modules. PMID:26641668

  15. Hill-Climbing search and diversification within an evolutionary approach to protein structure prediction.

    PubMed

    Chira, Camelia; Horvath, Dragos; Dumitrescu, D

    2011-07-30

    Proteins are complex structures made of amino acids having a fundamental role in the correct functioning of living cells. The structure of a protein is the result of the protein folding process. However, the general principles that govern the folding of natural proteins into a native structure are unknown. The problem of predicting a protein structure with minimum-energy starting from the unfolded amino acid sequence is a highly complex and important task in molecular and computational biology. Protein structure prediction has important applications in fields such as drug design and disease prediction. The protein structure prediction problem is NP-hard even in simplified lattice protein models. An evolutionary model based on hill-climbing genetic operators is proposed for protein structure prediction in the hydrophobic - polar (HP) model. Problem-specific search operators are implemented and applied using a steepest-ascent hill-climbing approach. Furthermore, the proposed model enforces an explicit diversification stage during the evolution in order to avoid local optima. The main features of the resulting evolutionary algorithm - hill-climbing mechanism and diversification strategy - are evaluated in a set of numerical experiments for the protein structure prediction problem to assess their impact on the efficiency of the search process. Furthermore, the emerging consolidated model is compared to relevant algorithms from the literature for a set of difficult bidimensional instances from lattice protein models. The results obtained by the proposed algorithm are promising and competitive with those of related methods.
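
    For readers unfamiliar with the HP model, the sketch below scores a conformation on a two-dimensional square lattice using the standard convention: each pair of hydrophobic (H) residues that are lattice neighbours but not consecutive in the chain contributes -1. This is only the objective that a search such as hill climbing would minimise, not the authors' algorithm; the move encoding (U, D, L, R) and the function name are assumptions made for the example.

```python
def hp_energy(sequence, moves):
    """Energy of a 2D HP-lattice conformation, or None if the chain overlaps.

    sequence -- string of 'H' and 'P', one character per residue
    moves    -- string of U/D/L/R steps, one fewer than the residues
                (an assumed encoding for this sketch)
    """
    step = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}
    positions = [(0, 0)]
    for m in moves:
        dx, dy = step[m]
        x, y = positions[-1]
        positions.append((x + dx, y + dy))
    if len(set(positions)) != len(positions):
        return None  # two residues share a lattice site: not self-avoiding
    index_at = {p: i for i, p in enumerate(positions)}
    contacts = 0
    for i, (x, y) in enumerate(positions):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each neighbouring pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts

folded = hp_energy("HPPH", "RUL")    # chain folded into a square
extended = hp_energy("HPPH", "RRR")  # fully extended chain
```

    In the folded example the two H residues at the chain ends become lattice neighbours, so the conformation scores lower than the extended one, which has no H-H contacts.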

  16. Illuminating structural proteins in viral "dark matter" with metaproteomics.

    PubMed

    Brum, Jennifer R; Ignacio-Espinoza, J Cesar; Kim, Eun-Hae; Trubl, Gareth; Jones, Robert M; Roux, Simon; VerBerkmoes, Nathan C; Rich, Virginia I; Sullivan, Matthew B

    2016-03-01

    Viruses are ecologically important, yet environmental virology is limited by dominance of unannotated genomic sequences representing taxonomic and functional "viral dark matter." Although recent analytical advances are rapidly improving taxonomic annotations, identifying functional dark matter remains problematic. Here, we apply paired metaproteomics and dsDNA-targeted metagenomics to identify 1,875 virion-associated proteins from the ocean. Over one-half of these proteins were newly functionally annotated and represent abundant and widespread viral metagenome-derived protein clusters (PCs). One primarily unannotated PC dominated the dataset, but structural modeling and genomic context identified this PC as a previously unidentified capsid protein from multiple uncultivated tailed virus families. Furthermore, four of the five most abundant PCs in the metaproteome represent capsid proteins containing the HK97-like protein fold previously found in many viruses that infect all three domains of life. The dominance of these proteins within our dataset, as well as their global distribution throughout the world's oceans and seas, supports prior hypotheses that this HK97-like protein fold is the most abundant biological structure on Earth. Together, these culture-independent analyses improve virion-associated protein annotations, facilitate the investigation of proteins within natural viral communities, and offer a high-throughput means of illuminating functional viral dark matter.

  17. Functional diversification of structurally alike NLR proteins in plants.

    PubMed

    Chakraborty, Joydeep; Jain, Akansha; Mukherjee, Dibya; Ghosh, Suchismita; Das, Sampa

    2018-04-01

    In the course of evolution many pathogens alter their effector molecules to modulate the host plants' metabolism and immune responses triggered upon proper recognition by the intracellular nucleotide-binding oligomerization domain containing leucine-rich repeat (NLR) proteins. Likewise, host plants have also evolved diversified NLR proteins as a survival strategy to win the battle against pathogen invasion. NLR proteins detect pathogen-derived effector proteins, leading to the activation of defense responses associated with programmed cell death (PCD). In this interactive process, genome structure and plasticity play a pivotal role in the development of innate immunity. Despite being quite conserved with similar biological functions in all eukaryotes, the intracellular NLR immune receptor proteins happen to be structurally distinct. Recent studies have made progress in identifying transcriptional regulatory complexes activated by NLR proteins. In this review, we attempt to decipher the intracellular NLR protein-mediated surveillance across evolutionarily diverse taxa, highlighting some of the recent updates on NLR protein compartmentalization, molecular interactions before and after activation along with insights into the finer role of these receptor proteins to combat invading pathogens upon their recognition. The latest information on NLR sensors, helpers and NLR proteins with integrated domains in the context of plant pathogen interactions is also discussed. Copyright © 2018 Elsevier B.V. All rights reserved.

  18. Illuminating structural proteins in viral "dark matter" with metaproteomics

    DOE PAGES

    Brum, Jennifer R.; Ignacio-Espinoza, J. Cesar; Kim, Eun-Hae; ...

    2016-02-16

    Viruses are ecologically important, yet environmental virology is limited by dominance of unannotated genomic sequences representing taxonomic and functional "viral dark matter." Although recent analytical advances are rapidly improving taxonomic annotations, identifying functional dark matter remains problematic. Here, we apply paired metaproteomics and dsDNA-targeted metagenomics to identify 1,875 virion-associated proteins from the ocean. Over one-half of these proteins were newly functionally annotated and represent abundant and widespread viral metagenome-derived protein clusters (PCs). One primarily unannotated PC dominated the dataset, but structural modeling and genomic context identified this PC as a previously unidentified capsid protein from multiple uncultivated tailed virus families. Furthermore, four of the five most abundant PCs in the metaproteome represent capsid proteins containing the HK97-like protein fold previously found in many viruses that infect all three domains of life. The dominance of these proteins within our dataset, as well as their global distribution throughout the world's oceans and seas, supports prior hypotheses that this HK97-like protein fold is the most abundant biological structure on Earth. Altogether, these culture-independent analyses improve virion-associated protein annotations, facilitate the investigation of proteins within natural viral communities, and offer a high-throughput means of illuminating functional viral dark matter.

  19. JavaProtein Dossier: a novel web-based data visualization tool for comprehensive analysis of protein structure

    PubMed Central

    Neshich, Goran; Rocchia, Walter; Mancini, Adauto L.; Yamagishi, Michel E. B.; Kuser, Paula R.; Fileto, Renato; Baudet, Christian; Pinto, Ivan P.; Montagner, Arnaldo J.; Palandrani, Juliana F.; Krauchenco, Joao N.; Torres, Renato C.; Souza, Savio; Togawa, Roberto C.; Higa, Roberto H.

    2004-01-01

    JavaProtein Dossier (JPD) is a new concept, database and visualization tool providing one of the largest collections of the physicochemical parameters describing proteins' structure, stability, function and interaction with other macromolecules. By collecting as many descriptors/parameters as possible within a single database, we can achieve a better use of the available data and information. Furthermore, data grouping allows us to generate different parameters with the potential to provide new insights into the sequence–structure–function relationship. In JPD, residue selection can be performed according to multiple criteria. JPD can simultaneously display and analyze all the physicochemical parameters of any pair of structures, using precalculated structural alignments, allowing direct parameter comparison at corresponding amino acid positions among homologous structures. In order to focus on the physicochemical (and consequently pharmacological) profile of proteins, visualization tools (showing the structure and structural parameters) also had to be optimized. Our response to this challenge was the use of Java technology with its exceptional level of interactivity. JPD is freely accessible (within the Gold Sting Suite) at http://sms.cbi.cnptia.embrapa.br, http://mirrors.rcsb.org/SMS, http://trantor.bioc.columbia.edu/SMS and http://www.es.embnet.org/SMS/ (Option: JavaProtein Dossier). PMID:15215458

  20. From protein structure to function via single crystal optical spectroscopy

    PubMed Central

    Ronda, Luca; Bruno, Stefano; Bettati, Stefano; Storici, Paola; Mozzarelli, Andrea

    2015-01-01

    The more than 100,000 protein structures determined by X-ray crystallography provide a wealth of information for the characterization of biological processes at the molecular level. However, several crystallographic “artifacts,” including conformational selection, crystallization conditions and radiation damage, may affect the quality and the interpretation of the electron density maps, thus limiting the relevance of structure determinations. Moreover, for most of these structures, no functional data have been obtained in the crystalline state, thus posing serious questions on their validity in inferring protein mechanisms. In order to solve these issues, spectroscopic methods have been applied for the determination of equilibrium and kinetic properties of proteins in the crystalline state. These methods are UV-vis spectrophotometry, spectrofluorimetry, IR, EPR, Raman, and resonance Raman spectroscopy. Some of these approaches have been implemented with on-line instruments at X-ray synchrotron beamlines. Here, we provide an overview of investigations predominantly carried out in our laboratory by single crystal polarized absorption UV-vis microspectrophotometry, the most applied technique for the functional characterization of proteins in the crystalline state. Studies on hemoglobins, pyridoxal 5′-phosphate dependent enzymes and green fluorescent protein in the crystalline state have addressed key biological issues, leading to either straightforward structure-function correlations or limitations to structure-based mechanisms. PMID:25988179

  1. PSPP: A Protein Structure Prediction Pipeline for Computing Clusters

    DTIC Science & Technology

    2009-07-01

    Evanseck JD, et al. (1998) All-atom empirical potential for molecular modeling and dynamics studies of proteins. Journal of Physical Chemistry B 102...dimensional (3-D) protein structures are critical for the understanding of molecular mechanisms of living systems. Traditionally, X-ray crystallography...disordered proteins are often responsible for molecular recognition, molecular assembly, protein modification, and entropic chain activities in organisms [26

  2. Structural mechanisms of chaperone mediated protein disaggregation

    PubMed Central

    Sousa, Rui

    2014-01-01

    The ClpB/Hsp104 and Hsp70 classes of molecular chaperones use ATP hydrolysis to dissociate protein aggregates and complexes, and to move proteins through membranes. ClpB/Hsp104 are members of the AAA+ family of proteins which form ring-shaped hexamers. Loops lining the pore in the ring engage substrate proteins as extended polypeptides. Interdomain rotations and conformational changes in these loops coupled to ATP hydrolysis unfold and pull proteins through the pore. This provides a mechanism that progressively disrupts local secondary and tertiary structure in substrates, allowing these chaperones to dissociate stable aggregates such as β-sheet rich prions or coiled coil SNARE complexes. While the ClpB/Hsp104 mechanism appears to embody a true power-stroke in which an ATP powered conformational change in one protein is directly coupled to movement or structural change in another, the mechanism of force generation by Hsp70s is distinct and less well understood. Both active power-stroke and purely passive mechanisms in which Hsp70 captures spontaneous fluctuations in a substrate have been proposed, while a third proposed mechanism, entropic pulling, may be able to generate forces larger than seen in ATP-driven molecular motors without the conformational coupling required for a power-stroke. The disaggregase activity of these chaperones is required for thermotolerance, but unrestrained protein complex/aggregate dissociation is potentially detrimental. Disaggregating chaperones are strongly auto-repressed, and are regulated by co-chaperones which recruit them to protein substrates and activate the disaggregases via mechanisms involving either sequential transfer of substrate from one chaperone to another and/or simultaneous interaction of substrate with multiple chaperones. By effectively subjecting substrates to multiple levels of selection by multiple chaperones, this may ensure that these potent disaggregases are only activated in the appropriate context. PMID

  3. The Puf family of RNA-binding proteins in plants: phylogeny, structural modeling, activity and subcellular localization

    PubMed Central

    2010-01-01

    Background Puf proteins have important roles in controlling gene expression at the post-transcriptional level by promoting RNA decay and repressing translation. The Pumilio homology domain (PUM-HD) is a conserved region within Puf proteins that binds to RNA with sequence specificity. Although Puf proteins have been well characterized in animal and fungal systems, little is known about the structural and functional characteristics of Puf-like proteins in plants. Results The Arabidopsis and rice genomes code for 26 and 19 Puf-like proteins, respectively, each possessing eight or fewer Puf repeats in their PUM-HD. Key amino acids in the PUM-HD of several of these proteins are conserved with those of animal and fungal homologs, whereas other plant Puf proteins demonstrate extensive variability in these amino acids. Three-dimensional modeling revealed that the predicted structure of this domain in plant Puf proteins provides a suitable surface for binding RNA. Electrophoretic gel mobility shift experiments showed that the Arabidopsis AtPum2 PUM-HD binds with high affinity to BoxB of the Drosophila Nanos Response Element I (NRE1) RNA, whereas a point mutation in the core of the NRE1 resulted in a significant reduction in binding affinity. Transient expression of several of the Arabidopsis Puf proteins as fluorescent protein fusions revealed a dynamic, punctate cytoplasmic pattern of localization for most of these proteins. The presence of predicted nuclear export signals and accumulation of AtPuf proteins in the nucleus after treatment of cells with leptomycin B demonstrated that shuttling of these proteins between the cytosol and nucleus is common among these proteins. In addition to the cytoplasmically enriched AtPum proteins, two AtPum proteins showed nuclear targeting with enrichment in the nucleolus. Conclusions The Puf family of RNA-binding proteins in plants consists of a greater number of members than any other model species studied to date. This, along with the

  4. A tool for calculating binding-site residues on proteins from PDB structures.

    PubMed

    Hu, Jing; Yan, Changhui

    2009-08-03

    In research on protein functional sites, researchers often need to identify binding-site residues on a protein. A commonly used strategy is to find a complex structure from the Protein Data Bank (PDB) that consists of the protein of interest and its interacting partner(s) and calculate binding-site residues based on the complex structure. However, since a protein may participate in multiple interactions, the binding-site residues calculated based on one complex structure usually do not reveal all binding sites on a protein. Researchers therefore need to find all PDB complexes that contain the protein of interest and combine the binding-site information gleaned from them. This process is very time-consuming. In particular, combining binding-site information obtained from different PDB structures requires tedious work to align protein sequences. The process becomes overwhelmingly difficult when researchers have a large set of proteins to analyze, which is usually the case in practice. In this study, we have developed a tool for calculating binding-site residues on proteins, TCBRP (http://yanbioinformatics.cs.usu.edu:8080/ppbindingsubmit). For an input protein, TCBRP can quickly find all binding-site residues on the protein by automatically combining the information obtained from all PDB structures that contain the protein of interest. Additionally, TCBRP presents the binding-site residues in different categories according to the interaction type. TCBRP also allows researchers to set the definition of binding-site residues. The developed tool is very useful for research on protein binding-site analysis and prediction.
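
    The combination step described in this record can be sketched in a few lines. This is only an illustration under stated assumptions, not TCBRP's implementation: a residue is counted as a binding-site residue if any of its atoms lies within a user-set distance cutoff of a partner atom, residue numbers are assumed to be already aligned across complexes (the sequence-alignment step the abstract calls tedious is skipped), and the function names, the 5 Å default, and the toy coordinates are all hypothetical.

```python
import math
from collections import defaultdict


def binding_site_residues(protein_atoms, partner_atoms, cutoff=5.0):
    """Residue ids having at least one atom within `cutoff` angstroms of a partner atom.

    protein_atoms: list of (residue_id, (x, y, z)); partner_atoms: list of (x, y, z).
    The distance-cutoff definition is an assumption made for this sketch.
    """
    site = set()
    for res_id, atom in protein_atoms:
        if res_id in site:
            continue  # residue already accepted via another atom
        if any(math.dist(atom, other) <= cutoff for other in partner_atoms):
            site.add(res_id)
    return site


def combine_sites(complexes, cutoff=5.0):
    """Union binding-site residues over several complexes, grouped by interaction type.

    complexes: list of (interaction_type, protein_atoms, partner_atoms).
    """
    combined = defaultdict(set)
    for kind, protein_atoms, partner_atoms in complexes:
        combined[kind] |= binding_site_residues(protein_atoms, partner_atoms, cutoff)
    return dict(combined)


# Hypothetical toy data: one atom per residue, two complexes of the same protein.
protein = [(10, (0.0, 0.0, 0.0)), (11, (3.8, 0.0, 0.0)), (12, (20.0, 0.0, 0.0))]
complexes = [
    ("protein-protein", protein, [(0.0, 4.0, 0.0)]),
    ("protein-DNA", protein, [(20.0, 3.0, 0.0)]),
]
sites = combine_sites(complexes)
print(sites)
```

    Keeping the cutoff as a parameter mirrors the abstract's point that the definition of a binding-site residue should be adjustable by the user.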

  5. A correlation between the cosmic microwave background and large-scale structure in the Universe.

    PubMed

    Boughn, Stephen; Crittenden, Robert

    2004-01-01

    Observations of distant supernovae and the fluctuations in the cosmic microwave background (CMB) indicate that the expansion of the Universe may be accelerating under the action of a 'cosmological constant' or some other form of 'dark energy'. This dark energy now appears to dominate the Universe and not only alters its expansion rate, but also affects the evolution of fluctuations in the density of matter, slowing down the gravitational collapse of material (into, for example, clusters of galaxies) in recent times. Additional fluctuations in the temperature of CMB photons are induced as they pass through large-scale structures and these fluctuations are necessarily correlated with the distribution of relatively nearby matter. Here we report the detection of correlations between recent CMB data and two probes of large-scale structure: the X-ray background and the distribution of radio galaxies. These correlations are consistent with those predicted by dark energy, indicating that we are seeing the imprint of dark energy on the growth of structure in the Universe.

  6. Structural basis of regulation and substrate specificity of protein kinase CK2 deduced from the modeling of protein-protein interactions

    PubMed Central

    Rekha, Nambudiry; Srinivasan, N

    2003-01-01

    Background Protein Kinase Casein Kinase 2 (PKCK2) is a ubiquitous Ser/Thr kinase expressed in all eukaryotes. It phosphorylates a number of proteins involved in various cellular processes. The PKCK2 holoenzyme is a catalytically active tetramer composed of two homologous or identical, constitutively active catalytic (α) subunits and two identical regulatory (β) subunits. The tetramer cannot phosphorylate some substrates that can be phosphorylated by PKCK2α in isolation. The present work explores the structural basis of this feature using computational analysis and modeling. Results We initially built a model of PKCK2α bound to a substrate peptide with a conformation identical to that of the substrates in the available crystal structures of other kinases complexed with substrates/pseudosubstrates. In this model, however, the acidic residue at the fourth position of the substrate consensus pattern, S/T-X-X-D/E where S/T is the phosphorylation site, did not interact with the active form of PKCK2α and is highly solvent exposed. Interaction of the acidic residue is observed if the substrate peptide adopts conformations as seen in β turns, α helices, or 3₁₀ helices. This type of conformation is observed and accommodated well by PKCK2α in calmodulin, where the phosphorylation site is at the central helix. PP2A carries sequence patterns for PKCK2α phosphorylation. While the possibility of PP2A being phosphorylated by PKCK2 has been raised in the literature, we use the model of PP2A to generate a model of the PP2A-PKCK2α complex. PKCK2β undergoes phosphorylation by the holoenzyme at the N-terminal region and is accommodated very well in the limited space available at the substrate-binding site of the holoenzyme, while the space is insufficient to accommodate the binding of PP2A or calmodulin in the holoenzyme. Conclusion Charge and shape complementarity seem to play a role in substrate recognition and binding to PKCK2α, along with the consensus pattern. The detailed
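
    The consensus pattern S/T-X-X-D/E quoted in this record is easy to scan for in a sequence. The sketch below is only an illustration of the pattern itself, not part of the authors' modeling work; the function name and the example peptide are hypothetical, and a sequence match says nothing about whether the site is structurally accessible, which is the point the abstract makes.

```python
import re

# S or T, any two residues, then D or E. The lookahead makes overlapping
# matches visible, since one residue can belong to several candidate sites.
CK2_CONSENSUS = re.compile(r"(?=([ST]..[DE]))")


def find_consensus_sites(sequence):
    """Return (1-based position of the S/T, matched 4-mer) for each S/T-X-X-D/E match."""
    return [(m.start() + 1, m.group(1)) for m in CK2_CONSENSUS.finditer(sequence)]


# Hypothetical peptide, used only to exercise the pattern.
hits = find_consensus_sites("MASGSEEDKTLLE")
print(hits)
```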

  7. Proton assisted recoupling and protein structure determination

    NASA Astrophysics Data System (ADS)

    de Paëpe, Gaël; Lewandowski, Józef R.; Loquet, Antoine; Böckmann, Anja; Griffin, Robert G.

    2008-12-01

    We introduce a homonuclear version of third spin assisted recoupling, a second-order mechanism that can be used for polarization transfer between 13C or 15N spins in magic angle spinning (MAS) NMR experiments, particularly at the high spinning frequencies employed in contemporary high field MAS experiments. The resulting sequence, which we refer to as proton assisted recoupling (PAR), relies on a cross-term between 1H-13C (or 1H-15N) couplings to mediate zero quantum 13C-13C (or 15N-15N) recoupling. In particular, using average Hamiltonian theory we derive an effective Hamiltonian for PAR and show that the transfer is mediated by trilinear terms of the form C1±C2∓HZ for 13C-13C recoupling experiments (or N1±N2∓HZ for 15N-15N). We use analytical and numerical simulations to explain the structure of the PAR optimization maps and to delineate the PAR matching conditions. We also detail the PAR polarization transfer dependence with respect to the local molecular geometry and explain the observed reduction in dipolar truncation. Finally, we demonstrate the utility of PAR in structural studies of proteins with 13C-13C spectra of uniformly 13C, 15N labeled microcrystalline Crh, an 85 amino acid model protein that forms a domain swapped dimer (MW=2×10.4 kDa). The spectra, which were acquired at high MAS frequencies (ωr/2π > 20 kHz) and magnetic fields (750-900 MHz 1H frequencies) using moderate rf fields, exhibit numerous cross peaks corresponding to long (up to 6-7 Å) 13C-13C distances which are particularly useful in protein structure determination. Using results from PAR spectra we calculate the structure of the Crh protein.

  8. Toward a structure determination method for biomineral-associated protein using combined solid-state NMR and computational structure prediction.

    PubMed

    Masica, David L; Ash, Jason T; Ndao, Moise; Drobny, Gary P; Gray, Jeffrey J

    2010-12-08

    Protein-biomineral interactions are paramount to materials production in biology, including the mineral phase of hard tissue. Unfortunately, the structure of biomineral-associated proteins cannot be determined by X-ray crystallography or solution nuclear magnetic resonance (NMR). Here we report a method for determining the structure of biomineral-associated proteins. The method combines solid-state NMR (ssNMR) and ssNMR-biased computational structure prediction. In addition, the algorithm is able to identify lattice geometries most compatible with ssNMR constraints, representing a quantitative, novel method for investigating crystal-face binding specificity. We use this method to determine most of the structure of human salivary statherin interacting with the mineral phase of tooth enamel. Computation and experiment converge on an ensemble of related structures and identify preferential binding at three crystal surfaces. The work represents a significant advance toward determining structure of biomineral-adsorbed protein using experimentally biased structure prediction. This method is generally applicable to proteins that can be chemically synthesized. Copyright © 2010 Elsevier Ltd. All rights reserved.

  9. Recognition of functional sites in protein structures.

    PubMed

    Shulman-Peleg, Alexandra; Nussinov, Ruth; Wolfson, Haim J

    2004-06-04

    Recognition of regions on the surface of one protein that are similar to a binding site of another is crucial for the prediction of molecular interactions and for functional classifications. We first describe a novel method, SiteEngine, that assumes no sequence or fold similarities and is able to recognize proteins that have similar binding sites and may perform similar functions. We achieve high efficiency and speed by introducing a low-resolution surface representation via chemically important surface points, by hashing triangles of physico-chemical properties and by application of hierarchical scoring schemes for a thorough exploration of global and local similarities. We proceed to rigorously apply this method to functional site recognition in three possible ways. First, we search for a given functional site on a large set of complete protein structures. Second, a potential functional site on a protein of interest is compared with known binding sites, to recognize similar features. Third, a complete protein structure is searched for the presence of an a priori unknown functional site, similar to known sites. Our method is robust and efficient enough to allow computationally demanding applications such as the first and the third. From the biological standpoint, the first application may identify secondary binding sites of drugs that may lead to side-effects. The third application finds new potential sites on the protein that may provide targets for drug design. Each of the three applications may aid in assigning a function and in classification of binding patterns. We highlight the advantages and disadvantages of each type of search, provide examples of large-scale searches of the entire Protein Data Bank and make functional predictions.
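
    The idea of hashing triangles of physico-chemical properties can be illustrated with a generic geometric-hashing sketch. This is not SiteEngine's algorithm or scoring scheme: the key construction, the 1 Å distance bins, the property labels, and the site names are assumptions made for illustration. Each triangle of labeled surface points is reduced to a key that does not change under rotation, translation, or point reordering, so sites can be compared without superimposing them first.

```python
import math
from collections import defaultdict
from itertools import combinations


def triangle_key(triangle, bin_size=1.0):
    """Pose-independent key: the sorted edges as (property, property, binned length)."""
    edges = []
    for (prop_a, xyz_a), (prop_b, xyz_b) in combinations(triangle, 2):
        length_bin = int(math.dist(xyz_a, xyz_b) // bin_size)
        edges.append((min(prop_a, prop_b), max(prop_a, prop_b), length_bin))
    return tuple(sorted(edges))


def build_table(sites, bin_size=1.0):
    """Hash every triangle of every known site: key -> set of site names."""
    table = defaultdict(set)
    for name, points in sites.items():
        for triangle in combinations(points, 3):
            table[triangle_key(triangle, bin_size)].add(name)
    return table


def vote(query_points, table, bin_size=1.0):
    """Count, per known site, how many query triangles hit the table."""
    votes = defaultdict(int)
    for triangle in combinations(query_points, 3):
        for name in table.get(triangle_key(triangle, bin_size), ()):
            votes[name] += 1
    return dict(votes)


# Hypothetical sites described by (property label, coordinates) surface points.
known = {
    "siteA": [("donor", (0.0, 0.0, 0.0)), ("acceptor", (3.5, 0.0, 0.0)),
              ("aromatic", (0.0, 4.5, 0.0))],
    "siteB": [("donor", (0.0, 0.0, 0.0)), ("donor", (6.5, 0.0, 0.0)),
              ("aliphatic", (0.0, 8.5, 0.0))],
}
# The query is siteA translated by (10, 10, 10) with its points reordered.
query = [("aromatic", (10.0, 14.5, 10.0)), ("donor", (10.0, 10.0, 10.0)),
         ("acceptor", (13.5, 10.0, 10.0))]
result = vote(query, build_table(known))
print(result)
```

    Hard distance bins can miss near matches whose edge lengths fall on either side of a bin boundary. A practical method would need some tolerance there, and the hierarchical scoring mentioned in the abstract would follow this cheap filtering step.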

  10. Advances in structural and functional analysis of membrane proteins by electron crystallography.

    PubMed

    Wisedchaisri, Goragot; Reichow, Steve L; Gonen, Tamir

    2011-10-12

    Electron crystallography is a powerful technique for the study of membrane protein structure and function in the lipid environment. When well-ordered two-dimensional crystals are obtained, the structure of both protein and lipid can be determined and lipid-protein interactions analyzed. Protons and ionic charges can be visualized by electron crystallography, and the protein of interest can be captured for structural analysis in a variety of physiologically distinct states. This review highlights the strengths of electron crystallography and the momentum that is building up in automation and the development of high throughput tools and methods for structural and functional analysis of membrane proteins by electron crystallography. Copyright © 2011 Elsevier Ltd. All rights reserved.

  11. Structural classification of small, disulfide-rich protein domains.

    PubMed

    Cheek, Sara; Krishna, S Sri; Grishin, Nick V

    2006-05-26

    Disulfide-rich domains are small protein domains whose global folds are stabilized primarily by the formation of disulfide bonds and, to a much lesser extent, by secondary structure and hydrophobic interactions. Disulfide-rich domains perform a wide variety of roles functioning as growth factors, toxins, enzyme inhibitors, hormones, pheromones, allergens, etc. These domains are commonly found both as independent (single-domain) proteins and as domains within larger polypeptides. Here, we present a comprehensive structural classification of approximately 3000 small, disulfide-rich protein domains. We find that these domains can be arranged into 41 fold groups on the basis of structural similarity. Our fold groups, which describe broader structural relationships than existing groupings of these domains, bring together representatives with previously unacknowledged similarities; 18 of the 41 fold groups include domains from several SCOP folds. Within the fold groups, the domains are assembled into families of homologs. We define 98 families of disulfide-rich domains, some of which include newly detected homologs, particularly among knottin-like domains. On the basis of this classification, we have examined cases of convergent and divergent evolution of functions performed by disulfide-rich proteins. Disulfide bonding patterns in these domains are also evaluated. Reducible disulfide bonding patterns are much less frequent, while symmetric disulfide bonding patterns are more common than expected from random considerations. Examples of variations in disulfide bonding patterns found within families and fold groups are discussed.

  12. Integrating Structure to Protein-Protein Interaction Networks That Drive Metastasis to Brain and Lung in Breast Cancer

    PubMed Central

    Engin, H. Billur; Guney, Emre; Keskin, Ozlem; Oliva, Baldo; Gursoy, Attila

    2013-01-01

    Blocking specific protein interactions can lead to human diseases. Accordingly, protein interactions and structural knowledge of the interacting surfaces of proteins (interfaces) have an important role in predicting the genotype-phenotype relationship. We have built the phenotype-specific sub-networks of protein-protein interactions (PPIs) involving the relevant genes responsible for lung and brain metastasis from primary tumor in breast cancer. First, we selected the PPIs most relevant to metastasis-causing genes (seed genes) by using the “guilt-by-association” principle. Then, we modeled structures of the interactions whose complex forms are not available in the Protein Data Bank (PDB). Finally, we mapped mutations to interface structures (real and modeled) in order to spot the interactions that might be manipulated by these mutations. Functional analyses performed on these sub-networks revealed a potential relationship between immune system-infectious diseases and lung metastasis progression, but this connection was not observed significantly in the brain metastasis. In addition, structural analyses showed that some PPI interfaces in both metastasis sub-networks originate from microbial proteins, which in turn were mostly related to cell adhesion. Cell adhesion is a key mechanism in metastasis; therefore, these PPIs may be involved in similar molecular pathways that are shared by infectious disease and metastasis. Finally, by mapping the mutations and amino acid variations onto the interface regions of the proteins in the metastasis sub-networks, we found evidence that some mutations are involved in the mechanisms differentiating the type of metastasis. PMID:24278371
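
    The “guilt-by-association” principle mentioned in this record can be shown with a deliberately simple scoring rule. The sketch below is an assumption for illustration and not the prioritization method used in the study: each non-seed protein is scored by the fraction of its direct interaction partners that are seed genes, and the network, the protein names, and the function name are hypothetical.

```python
from collections import defaultdict


def guilt_by_association(edges, seeds):
    """Rank non-seed nodes by the fraction of their neighbours that are seeds."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    scores = {
        node: len(nbrs & seeds) / len(nbrs)
        for node, nbrs in neighbours.items()
        if node not in seeds
    }
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy network: S1 and S2 are seed genes, P1..P4 are candidates.
edges = [("S1", "P1"), ("S2", "P1"), ("S1", "P2"), ("P2", "P3"), ("P3", "P4")]
ranking = guilt_by_association(edges, seeds={"S1", "S2"})
print(ranking)
```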

  13. DNAproDB: an interactive tool for structural analysis of DNA–protein complexes

    PubMed Central

    Sagendorf, Jared M.

    2017-01-01

    Many biological processes are mediated by complex interactions between DNA and proteins. Transcription factors, various polymerases, nucleases and histones recognize and bind DNA with different levels of binding specificity. To understand the physical mechanisms that allow proteins to recognize DNA and achieve their biological functions, it is important to analyze structures of DNA–protein complexes in detail. DNAproDB is a web-based interactive tool designed to help researchers study these complexes. DNAproDB provides an automated structure-processing pipeline that extracts structural features from DNA–protein complexes. The extracted features are organized in structured data files, which are easily parsed with any programming language or viewed in a browser. We processed a large number of DNA–protein complexes retrieved from the Protein Data Bank and created the DNAproDB database to store this data. Users can search the database by combining features of the DNA, protein or DNA–protein interactions at the interface. Additionally, users can upload their own structures for processing privately and securely. DNAproDB provides several interactive and customizable tools for creating visualizations of the DNA–protein interface at different levels of abstraction that can be exported as high quality figures. All functionality is documented and freely accessible at http://dnaprodb.usc.edu. PMID:28431131

  14. Structure-based drug design for G protein-coupled receptors.

    PubMed

    Congreve, Miles; Dias, João M; Marshall, Fiona H

    2014-01-01

    Our understanding of the structural biology of G protein-coupled receptors has undergone a transformation over the past 5 years. New protein-ligand complexes are described almost monthly in high profile journals. Appreciation of how small molecules and natural ligands bind to their receptors has the potential to impact enormously how medicinal chemists approach this major class of receptor targets. An outline of the key topics in this field and some recent examples of structure- and fragment-based drug design are described. A table is presented with example views of each G protein-coupled receptor for which there is a published X-ray structure, including interactions with small molecule antagonists, partial and full agonists. The possible implications of these new data for drug design are discussed. © 2014 Elsevier B.V. All rights reserved.

  15. Hidden Markov model-derived structural alphabet for proteins: the learning of protein local shapes captures sequence specificity.

    PubMed

    Camproux, A C; Tufféry, P

    2005-08-05

    Understanding and predicting protein structures depend on the complexity and the accuracy of the models used to represent them. We have recently set up a Hidden Markov Model to optimally compress protein three-dimensional conformations into a one-dimensional series of letters of a structural alphabet. Such a model learns simultaneously the shape of representative structural letters describing the local conformation and the logic of their connections, i.e. the transition matrix between the letters. Here, we move one step further and report some evidence that such a model of protein local architecture also captures some accurate amino acid features. All the letters have specific and distinct amino acid distributions. Moreover, we show that words of amino acids can have significant propensities for some letters. Perspectives point towards the prediction of the series of letters describing the structure of a protein from its amino acid sequence.
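
    Once such a model is trained, compressing a structure into a one-dimensional series of letters amounts to decoding the most probable hidden-state path. The sketch below shows standard Viterbi decoding for a toy discrete hidden Markov model. It is an illustration under stated assumptions, not the authors' model: the two letters, the discretized "short"/"long" observations, and every probability are hypothetical, and the published alphabet may use a different number of letters and a different kind of emission model.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Most probable hidden-state path of a discrete HMM, computed in log space."""
    # best[t][s] = (log-probability of the best path ending in s at step t, predecessor)
    best = [{s: (log_start[s] + log_emit[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        row = {}
        for s in states:
            prev, score = max(
                ((p, best[-1][p][0] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            row[s] = (score + log_emit[s][obs], prev)
        best.append(row)
    # Backtrack from the best final state through the stored predecessors.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for row in reversed(best[1:]):
        state = row[state][1]
        path.append(state)
    return "".join(reversed(path))


def logs(table):
    """Convert a dict of probabilities to log-probabilities."""
    return {key: math.log(value) for key, value in table.items()}


# Hypothetical two-letter alphabet with discretized local-geometry observations.
states = ["A", "B"]
log_start = logs({"A": 0.5, "B": 0.5})
log_trans = {"A": logs({"A": 0.9, "B": 0.1}), "B": logs({"A": 0.1, "B": 0.9})}
log_emit = {"A": logs({"short": 0.8, "long": 0.2}),
            "B": logs({"short": 0.2, "long": 0.8})}

letters = viterbi(["short", "short", "long", "long", "long"],
                  states, log_start, log_trans, log_emit)
print(letters)
```

    The transition table plays the role of the "logic of connections" between letters described in the abstract: with sticky transitions, the decoded series changes letter only where the observations clearly support it.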

  16. Prediction of Spontaneous Protein Deamidation from Sequence-Derived Secondary Structure and Intrinsic Disorder.

    PubMed

    Lorenzo, J Ramiro; Alonso, Leonardo G; Sánchez, Ignacio E

    2015-01-01

    Asparagine residues in proteins undergo spontaneous deamidation, a post-translational modification that may act as a molecular clock for the regulation of protein function and turnover. Asparagine deamidation is modulated by protein local sequence, secondary structure and hydrogen bonding. We present NGOME, an algorithm able to predict non-enzymatic deamidation of internal asparagine residues in proteins in the absence of structural data, using sequence-based predictions of secondary structure and intrinsic disorder. Compared to previous algorithms, NGOME does not require three-dimensional structures yet yields better predictions than available sequence-only methods. Four case studies of specific proteins show how NGOME may help the user identify deamidation-prone asparagine residues, often related to protein gain of function, protein degradation or protein misfolding in pathological processes. A fifth case study applies NGOME at a proteomic scale and unveils a correlation between asparagine deamidation and protein degradation in yeast. NGOME is freely available as a webserver at the National EMBnet node Argentina, URL: http://www.embnet.qb.fcen.uba.ar/ in the subpage "Protein and nucleic acid structure and sequence analysis".

  17. Models of protein-ligand crystal structures: trust, but verify.

    PubMed

    Deller, Marc C; Rupp, Bernhard

    2015-09-01

    X-ray crystallography provides the most accurate models of protein-ligand structures. These models serve as the foundation of many computational methods including structure prediction, molecular modelling, and structure-based drug design. The success of these computational methods ultimately depends on the quality of the underlying protein-ligand models. X-ray crystallography offers the unparalleled advantage of a clear mathematical formalism relating the experimental data to the protein-ligand model. In the case of X-ray crystallography, the primary experimental evidence is the electron density of the molecules forming the crystal. The first step in the generation of an accurate and precise crystallographic model is the interpretation of the electron density of the crystal, typically carried out by construction of an atomic model. The atomic model must then be validated for fit to the experimental electron density and also for agreement with prior expectations of stereochemistry. Stringent validation of protein-ligand models has become possible as a result of the mandatory deposition of primary diffraction data, and many computational tools are now available to aid in the validation process. Validation of protein-ligand complexes has revealed some instances of overenthusiastic interpretation of ligand density. Fundamental concepts and metrics of protein-ligand quality validation are discussed and we highlight software tools to assist in this process. It is essential that end users select high quality protein-ligand models for their computational and biological studies, and we provide an overview of how this can be achieved.

  18. High-Resolution Protein Structure Determination by Serial Femtosecond Crystallography

    PubMed Central

    Boutet, Sébastien; Lomb, Lukas; Williams, Garth J.; Barends, Thomas R. M.; Aquila, Andrew; Doak, R. Bruce; Weierstall, Uwe; DePonte, Daniel P.; Steinbrener, Jan; Shoeman, Robert L.; Messerschmidt, Marc; Barty, Anton; White, Thomas A.; Kassemeyer, Stephan; Kirian, Richard A.; Seibert, M. Marvin; Montanez, Paul A.; Kenney, Chris; Herbst, Ryan; Hart, Philip; Pines, Jack; Haller, Gunther; Gruner, Sol M.; Philipp, Hugh T.; Tate, Mark W.; Hromalik, Marianne; Koerner, Lucas J.; van Bakel, Niels; Morse, John; Ghonsalves, Wilfred; Arnlund, David; Bogan, Michael J.; Caleman, Carl; Fromme, Raimund; Hampton, Christina Y.; Hunter, Mark S.; Johansson, Linda C.; Katona, Gergely; Kupitz, Christopher; Liang, Mengning; Martin, Andrew V.; Nass, Karol; Redecke, Lars; Stellato, Francesco; Timneanu, Nicusor; Wang, Dingjie; Zatsepin, Nadia A.; Schafer, Donald; Defever, James; Neutze, Richard; Fromme, Petra; Spence, John C. H.; Chapman, Henry N.; Schlichting, Ilme

    2013-01-01

    Structure determination of proteins and other macromolecules has historically required the growth of high-quality crystals sufficiently large to diffract x-rays efficiently while withstanding radiation damage. We applied serial femtosecond crystallography (SFX) using an x-ray free-electron laser (XFEL) to obtain high-resolution structural information from microcrystals (less than 1 micrometer by 1 micrometer by 3 micrometers) of the well-characterized model protein lysozyme. The agreement with synchrotron data demonstrates the immediate relevance of SFX for analyzing the structure of the large group of difficult-to-crystallize molecules. PMID:22653729

  19. Common structural features of cholesterol binding sites in crystallized soluble proteins

    PubMed Central

    Bukiya, Anna N.; Dopico, Alejandro M.

    2017-01-01

    Cholesterol-protein interactions are essential for the architectural organization of cell membranes and for lipid metabolism. While cholesterol-sensing motifs in transmembrane proteins have been identified, little is known about cholesterol recognition by soluble proteins. We reviewed the structural characteristics of binding sites for cholesterol and cholesterol sulfate from crystallographic structures available in the Protein Data Bank. This analysis unveiled key features of cholesterol-binding sites that are present in either all or the majority of sites: i) the cholesterol molecule is generally positioned between protein domains that have an organized secondary structure; ii) the cholesterol hydroxyl/sulfo group is often partnered by Asn, Gln, and/or Tyr, while the hydrophobic part of cholesterol interacts with Leu, Ile, Val, and/or Phe; iii) cholesterol hydrogen-bonding partners are often found on α-helices, while amino acids that interact with cholesterol’s hydrophobic core have a slight preference for β-strands and secondary structure-lacking protein areas; iv) the steroid’s C21 and C26 constitute the “hot spots” most often seen for steroid-protein hydrophobic interactions; v) common “cold spots” are C8–C10, C13, and C17, at which contacts with the proteins were not detected. Several common features we identified for soluble protein-steroid interaction appear evolutionarily conserved. PMID:28420706

  20. Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures

    PubMed Central

    Manolakos, Elias S.

    2015-01-01

    Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub. PMID:26605332
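
    The efficiency quoted in this record follows from the usual definition of parallel efficiency as speedup divided by the number of cores. The short check below uses only the two figures stated in the abstract (a speedup of 42 on 48 cores); the function names are hypothetical, and the timing-based speedup helper is included only to make the definition explicit.

```python
def speedup(serial_time, parallel_time):
    """Speedup S = T_serial / T_parallel."""
    return serial_time / parallel_time


def parallel_efficiency(speedup_factor, n_cores):
    """Efficiency E = S / p, where p is the number of cores used."""
    return speedup_factor / n_cores


# Figures stated in the abstract: speedup factor 42 on the 48-core SCC.
efficiency = parallel_efficiency(42, 48)
print(efficiency, round(efficiency, 1))
```

    The exact ratio is 0.875, which rounds to the 0.9 reported in the abstract.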

  1. Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures.

    PubMed

    Sharma, Anuj; Manolakos, Elias S

    2015-01-01

    Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub.

  2. Solid state NMR: The essential technology for helical membrane protein structural characterization

    PubMed Central

    Cross, Timothy A.; Ekanayake, Vindana; Paulino, Joana; Wright, Anna

    2014-01-01

    NMR spectroscopy of helical membrane proteins has been very challenging on multiple fronts. The expression and purification of these proteins while maintaining functionality has consumed countless graduate student hours. Sample preparations have depended on whether solution or solid-state NMR spectroscopy was to be performed, and neither has been easy. In recent years it has become increasingly apparent that membrane mimic environments influence the structural result. Indeed, in these recent years we have rediscovered that Nobel laureate Christian Anfinsen did not say that protein structure was exclusively dictated by the amino acid sequence, but rather by the sequence in a given environment (Anfinsen, 1973) [106]. The environment matters: molecular interactions with the membrane environment are significant, and many examples of distorted, non-native membrane protein structures have recently been documented in the literature. However, solid-state NMR structures of helical membrane proteins in proteoliposomes and bilayers are proving to be native structures that permit a high resolution characterization of their functional states. Indeed, solid-state NMR is uniquely able to characterize helical membrane protein structures in lipid environments without detergents. Recent progress in expression, purification, reconstitution, sample preparation and in the solid-state NMR spectroscopy of both oriented samples and magic angle spinning samples has demonstrated that helical membrane protein structures can be achieved in a timely fashion. Indeed, this is a spectacular opportunity for the NMR community to have a major impact on biomedical research through the solid-state NMR spectroscopy of these proteins. PMID:24412099

  3. Solid state NMR: The essential technology for helical membrane protein structural characterization

    NASA Astrophysics Data System (ADS)

    Cross, Timothy A.; Ekanayake, Vindana; Paulino, Joana; Wright, Anna

    2014-02-01

    NMR spectroscopy of helical membrane proteins has been very challenging on multiple fronts. The expression and purification of these proteins while maintaining functionality has consumed countless graduate student hours. Sample preparations have depended on whether solution or solid-state NMR spectroscopy was to be performed, and neither has been easy. In recent years it has become increasingly apparent that membrane mimic environments influence the structural result. Indeed, in these recent years we have rediscovered that Nobel laureate Christian Anfinsen did not say that protein structure was exclusively dictated by the amino acid sequence, but rather by the sequence in a given environment (Anfinsen, 1973) [106]. The environment matters: molecular interactions with the membrane environment are significant, and many examples of distorted, non-native membrane protein structures have recently been documented in the literature. However, solid-state NMR structures of helical membrane proteins in proteoliposomes and bilayers are proving to be native structures that permit a high resolution characterization of their functional states. Indeed, solid-state NMR is uniquely able to characterize helical membrane protein structures in lipid environments without detergents. Recent progress in expression, purification, reconstitution, sample preparation and in the solid-state NMR spectroscopy of both oriented samples and magic angle spinning samples has demonstrated that helical membrane protein structures can be achieved in a timely fashion. Indeed, this is a spectacular opportunity for the NMR community to have a major impact on biomedical research through the solid-state NMR spectroscopy of these proteins.

  4. The value of protein structure classification information-Surveying the scientific literature

    DOE PAGES

    Fox, Naomi K.; Brenner, Steven E.; Chandonia, John-Marc

    2015-08-27

    The Structural Classification of Proteins (SCOP) and Class, Architecture, Topology, Homology (CATH) databases have been valuable resources for protein structure classification for over 20 years. Development of SCOP (version 1) concluded in June 2009 with SCOP 1.75. The SCOPe (SCOP-extended) database offers continued development of the classic SCOP hierarchy, adding over 33,000 structures. We have attempted to assess the impact of these two decade old resources and guide future development. To this end, we surveyed recent articles to learn how structure classification data are used. Of 571 articles published in 2012-2013 that cite SCOP, 439 actually use data from the resource. We found that the type of use was fairly evenly distributed among four top categories: A) study protein structure or evolution (27% of articles), B) train and/or benchmark algorithms (28% of articles), C) augment non-SCOP datasets with SCOP classification (21% of articles), and D) examine the classification of one protein/a small set of proteins (22% of articles). Most articles described computational research, although 11% described purely experimental research, and a further 9% included both. We examined how CATH and SCOP were used in 158 articles that cited both databases: while some studies used only one dataset, the majority used data from both resources. Protein structure classification remains highly relevant for a diverse range of problems and settings.

  5. The value of protein structure classification information-Surveying the scientific literature

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fox, Naomi K.; Brenner, Steven E.; Chandonia, John-Marc

    The Structural Classification of Proteins (SCOP) and Class, Architecture, Topology, Homology (CATH) databases have been valuable resources for protein structure classification for over 20 years. Development of SCOP (version 1) concluded in June 2009 with SCOP 1.75. The SCOPe (SCOP-extended) database offers continued development of the classic SCOP hierarchy, adding over 33,000 structures. We have attempted to assess the impact of these two decade old resources and guide future development. To this end, we surveyed recent articles to learn how structure classification data are used. Of 571 articles published in 2012-2013 that cite SCOP, 439 actually use data from the resource. We found that the type of use was fairly evenly distributed among four top categories: A) study protein structure or evolution (27% of articles), B) train and/or benchmark algorithms (28% of articles), C) augment non-SCOP datasets with SCOP classification (21% of articles), and D) examine the classification of one protein/a small set of proteins (22% of articles). Most articles described computational research, although 11% described purely experimental research, and a further 9% included both. We examined how CATH and SCOP were used in 158 articles that cited both databases: while some studies used only one dataset, the majority used data from both resources. Protein structure classification remains highly relevant for a diverse range of problems and settings.

  6. Identification of DNA-Binding Proteins Using Structural, Electrostatic and Evolutionary Features

    PubMed Central

    Nimrod, Guy; Szilágyi, András; Leslie, Christina; Ben-Tal, Nir

    2009-01-01

    Summary DNA binding proteins (DBPs) often take part in various crucial processes of the cell's life cycle. Therefore, the identification and characterization of these proteins are of great importance. We present here a random forests classifier for identifying DBPs among proteins with known three-dimensional structures. First, clusters of evolutionarily conserved regions (patches) on the protein's surface are detected using the PatchFinder algorithm; previous studies showed that these regions are typically the proteins' functionally important regions. Next, we train a classifier using features like the electrostatic potential, cluster-based amino acid conservation patterns and the secondary structure content of the patches, as well as features of the whole protein including its dipole moment. Using 10-fold cross validation on a dataset of 138 DNA-binding proteins and 110 proteins which do not bind DNA, the classifier achieved a sensitivity and a specificity of 0.90, which is overall better than the performance of previously published methods. Furthermore, when we tested 5 different methods on 11 new DBPs which did not appear in the original dataset, only our method annotated all correctly. The resulting classifier was applied to a collection of 757 proteins of known structure and unknown function. Of these proteins, 218 were predicted to bind DNA, and we anticipate that some of them interact with DNA using new structural motifs. The use of complementary computational tools supports the notion that at least some of them do bind DNA. PMID:19233205
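
    The sensitivity and specificity quoted in the abstract above are standard confusion-matrix ratios. A minimal sketch of how such figures are computed follows; the function name and the counts are hypothetical, chosen only to match the dataset sizes (138 binders, 110 non-binders), and are not results reported by the authors.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of true DNA binders recovered
    specificity = tn / (tn + fp)  # fraction of non-binders correctly rejected
    return sensitivity, specificity


# Hypothetical counts (assumption): 138 binders and 110 non-binders in total.
sens, spec = sensitivity_specificity(tp=124, fn=14, tn=99, fp=11)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

    In a 10-fold cross-validation setting, the counts would typically be pooled over the ten held-out folds before the ratios are taken.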

  7. The utility of protein structure as a predictor of site-wise dN/dS varies widely among HIV-1 proteins.

    PubMed

    Meyer, Austin G; Wilke, Claus O

    2015-10-06

    Protein structure acts as a general constraint on the evolution of viral proteins. One widely recognized structural constraint explaining evolutionary variation among sites is the relative solvent accessibility (RSA) of residues in the folded protein. In influenza virus, the distance from functional sites has been found to explain an additional portion of the evolutionary variation in the external antigenic proteins. However, to what extent RSA and distance from a reference site in the protein can be used more generally to explain protein adaptation in other viruses and in the different proteins of any given virus remains an open question. To address this question, we have carried out an analysis of the distribution and structural predictors of site-wise dN/dS in HIV-1. Our results indicate that the distribution of dN/dS in HIV follows a smooth gamma distribution, with no special enrichment or depletion of sites with dN/dS at or above one. The variation in dN/dS can be partially explained by RSA and distance from a reference site in the protein, but these structural constraints do not act uniformly among the different HIV-1 proteins. Structural constraints are highly predictive in just one of the three enzymes and one of three structural proteins in HIV-1. For these two proteins, the protease enzyme and the gp120 structural protein, structure explains between 30 and 40% of the variation in dN/dS. Finally, for the gp120 protein of the receptor-binding complex, we also find that glycosylation sites explain just 2% of the variation in dN/dS and do not explain gp120 evolution independently of either RSA or distance from the apical surface. © 2015 The Author(s).

  8. Ab Initio Protein Structure Prediction Using Chunk-TASSER

    PubMed Central

    Zhou, Hongyi; Skolnick, Jeffrey

    2007-01-01

    We have developed an ab initio protein structure prediction method called chunk-TASSER that uses ab initio folded supersecondary structure chunks of a given target as well as threading templates for obtaining contact potentials and distance restraints. The predicted chunks, selected on the basis of a new fragment comparison method, are folded by a fragment insertion method. Full-length models are built and refined by the TASSER methodology, which searches conformational space via parallel hyperbolic Monte Carlo. We employ an optimized reduced force field that includes knowledge-based statistical potentials and restraints derived from the chunks as well as threading templates. The method is tested on a dataset of 425 hard target proteins ≤250 amino acids in length. The average TM-scores of the best of top five models per target are 0.266, 0.336, and 0.362 by the threading algorithm SP3, original TASSER and chunk-TASSER, respectively. For a subset of 80 proteins with predicted α-helix content ≥50%, these averages are 0.284, 0.356, and 0.403, respectively. The percentages of proteins with the best of top five models having TM-score ≥0.4 (a statistically significant threshold for structural similarity) are 3.76, 20.94, and 28.94% by SP3, TASSER, and chunk-TASSER, respectively, overall, while for the subset of 80 predominantly helical proteins, these percentages are 2.50, 23.75, and 41.25%. Thus, chunk-TASSER shows a significant improvement over TASSER for modeling hard targets where no good template can be identified. We also tested chunk-TASSER on 21 medium/hard targets <200 amino-acids-long from CASP7. Chunk-TASSER is ∼11% (10%) better than TASSER for the total TM-score of the first (best of top five) models. Chunk-TASSER is fully automated and can be used in proteome scale protein structure prediction. PMID:17496016
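
    The TM-score used as the quality measure in the abstract above can be sketched as follows. This is a minimal illustration, assuming the standard definition with length-dependent scale d0 = 1.24 (L - 15)^(1/3) - 1.8 and assuming the per-residue distances come from one fixed superposition; the full measure takes the maximum over superpositions, which is not done here. The function name and inputs are hypothetical.

```python
def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    distances: distances in angstroms between aligned residue pairs.
    target_length: number of residues in the target (native) structure.
    """
    if target_length < 19:
        # Assumption of this sketch: d0 must be positive, which needs L >= 19.
        raise ValueError("target_length too small for the d0 formula")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length  # unaligned residues contribute zero


# A model that matches every residue of a 100-residue target exactly.
perfect = tm_score([0.0] * 100, 100)
# A model that matches only half of the residues exactly.
half = tm_score([0.0] * 50, 100)
print(perfect, half)
```

    Normalizing by the target length rather than the aligned length is what penalizes partial models.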

  9. CARd-3D: Carbon Distribution in 3D Structure Program for Globular Proteins

    PubMed Central

    Ekambaram, Rajasekaran; Kannaiyan, Akila; Marimuthu, Vijayasarathy; Swaminathan, Vinobha Chinnaiah; Renganathan, Senthil; Perumal, Ananda Gopu

    2014-01-01

    Spatial arrangement of carbon in protein structure is analyzed here. Particularly, the carbon fractions around individual atoms are compared. It is hoped that it follows the principle of 31.45% carbon around individual atoms. The results reveal that globular protein's atoms follow this principle. A comparative study on monomer versus dimer reveals that carbon is better distributed in the dimeric form than in the monomeric form. A similar study on solid versus liquid structures reveals that the liquid (NMR) structure has better carbon distribution than the corresponding solid (X-Ray) structure. The carbon fraction distributions in fiber and toxin proteins are compared. Fiber proteins follow the principle of carbon fraction distribution. At the same time, they have a broader spectrum of carbon distribution than globular proteins. The toxin protein follows an abnormal carbon fraction distribution. The carbon fraction distribution plays an important role in deciding the structure and shape of proteins. It is hoped to help in understanding protein folding and function. PMID:24748753

  10. Critical Features of Fragment Libraries for Protein Structure Prediction

    PubMed Central

    dos Santos, Karina Baptista

    2017-01-01

    The use of fragment libraries is a popular approach among protein structure prediction methods and has proven to substantially improve the quality of predicted structures. However, some vital aspects of a fragment library that influence the accuracy of modeling a native structure remain to be determined. This study investigates some of these features. Particularly, we analyze the effect of using secondary structure prediction guiding fragments selection, different fragments sizes and the effect of structural clustering of fragments within libraries. To have a clearer view of how these factors affect protein structure prediction, we isolated the process of model building by fragment assembly from some common limitations associated with prediction methods, e.g., imprecise energy functions and optimization algorithms, by employing an exact structure-based objective function under a greedy algorithm. Our results indicate that shorter fragments reproduce the native structure more accurately than the longer. Libraries composed of multiple fragment lengths generate even better structures, where longer fragments show to be more useful at the beginning of the simulations. The use of many different fragment sizes shows little improvement when compared to predictions carried out with libraries that comprise only three different fragment sizes. Models obtained from libraries built using only sequence similarity are, on average, better than those built with a secondary structure prediction bias. However, we found that the use of secondary structure prediction allows greater reduction of the search space, which is invaluable for prediction methods. The results of this study can be critical guidelines for the use of fragment libraries in protein structure prediction. PMID:28085928

  11. Critical Features of Fragment Libraries for Protein Structure Prediction.

    PubMed

    Trevizani, Raphael; Custódio, Fábio Lima; Dos Santos, Karina Baptista; Dardenne, Laurent Emmanuel

    2017-01-01

    The use of fragment libraries is a popular approach among protein structure prediction methods and has proven to substantially improve the quality of predicted structures. However, some vital aspects of a fragment library that influence the accuracy of modeling a native structure remain to be determined. This study investigates some of these features. Particularly, we analyze the effect of using secondary structure prediction guiding fragments selection, different fragments sizes and the effect of structural clustering of fragments within libraries. To have a clearer view of how these factors affect protein structure prediction, we isolated the process of model building by fragment assembly from some common limitations associated with prediction methods, e.g., imprecise energy functions and optimization algorithms, by employing an exact structure-based objective function under a greedy algorithm. Our results indicate that shorter fragments reproduce the native structure more accurately than the longer. Libraries composed of multiple fragment lengths generate even better structures, where longer fragments show to be more useful at the beginning of the simulations. The use of many different fragment sizes shows little improvement when compared to predictions carried out with libraries that comprise only three different fragment sizes. Models obtained from libraries built using only sequence similarity are, on average, better than those built with a secondary structure prediction bias. However, we found that the use of secondary structure prediction allows greater reduction of the search space, which is invaluable for prediction methods. The results of this study can be critical guidelines for the use of fragment libraries in protein structure prediction.

  12. Association of protein structure, protein and carbohydrate subfractions with bioenergy profiles and biodegradation functions in modeled forage

    NASA Astrophysics Data System (ADS)

    Ji, Cuiying; Zhang, Xuewei; Yu, Peiqiang

    2016-03-01

    The objectives of this study were to detect unique aspects and association of forage protein inherent structure, biological compounds, protein and carbohydrate subfractions, bioenergy profiles, and biodegradation features. In this study, common available alfalfa hay from two different sourced-origins (FSO vs. CSO) was used as a modeled forage for inherent structure profile, bioenergy, biodegradation and their association between their structure and bio-functions. The molecular spectral profiles were determined using non-invasive molecular spectroscopy. The parameters included: protein structure amide I group, amide II group and their ratios; protein subfractions (PA1, PA2, PB1, PB2, PC); carbohydrate fractions (CA1, CA2, CA3, CA4, CB1, CB2, CC); biodegradable and undegradable fractions of protein (RDPA2, RDPB1, RDPB2, RDP; RUPA2 RUPB1, RUPB2, RUPC, RUP); biodegradable and undegradable fractions of carbohydrate (RDCA4, RDCB1, RDCB2, RDCB3, RDCHO; RUCA4, RUCB1; RUCB2; RUCB3 RUCC, RUCHO) and bioenergy profiles (tdNDF, tdFA, tdCP, tdNFC, TDN1 ×, DE3 ×, ME3 ×, NEL3 ×; NEm, NEg). The results show differences in protein and carbohydrate (CHO) subfractions in the moderately degradable true protein fraction (PB1: 502 vs. 420 g/kg CP, P = 0.09), slowly degraded true protein fraction (PB2: 45 vs. 96 g/kg CP, P = 0.02), moderately degradable CHO fraction (CB2: 283 vs. 223 g/kg CHO, P = 0.06) and slowly degraded CHO fraction (CB3: 369 vs. 408 g/kg CHO) between the two sourced origins. As to biodegradable (RD) fractions of protein and CHO in rumen, there were differences in RD of PB1 (417 vs. 349 g/kg CP, P = 0.09), RD of PB2 (29 vs. 62 g/kg CP, P = 0.02), RD of CB2 (251 vs. 198 g/kg DM, P = 0.06), RD of CB3 (236 vs. 261 g/kg CHO, P = 0.08). As to bioenergy profile, there were differences in total digestible nutrient (TDN: 551 vs. 537 g/kg DM, P = 0.06), and metabolic bioenergy (P = 0.095). As to protein molecular structure, there were differences in protein structure 1st

  13. Elucidating Peptide and Protein Structure and Dynamics: UV Resonance Raman Spectroscopy

    PubMed Central

    Oladepo, Sulayman A.; Xiong, Kan; Hong, Zhenmin; Asher, Sanford A.

    2011-01-01

    UV resonance Raman spectroscopy (UVRR) is a powerful method that has the requisite selectivity and sensitivity to incisively monitor biomolecular structure and dynamics in solution. In this perspective, we highlight applications of UVRR for studying peptide and protein structure and the dynamics of protein and peptide folding. UVRR spectral monitors of protein secondary structure, such as the Amide III₃ band and the Cα-H band frequencies and intensities can be used to determine Ramachandran Ψ angle distributions for peptide bonds. These incisive, quantitative glimpses into conformation can be combined with kinetic T-jump methodologies to monitor the dynamics of biomolecular conformational transitions. The resulting UVRR structural insight is impressive in that it allows differentiation of, for example, different α-helix-like states that enable differentiating π- and 3₁₀-helix states from pure α-helices. These approaches can be used to determine the Gibbs free energy landscape of individual peptide bonds along the most important protein (un)folding coordinate. Future work will find spectral monitors that probe peptide bond activation barriers that control protein (un)folding mechanisms. In addition, UVRR studies of sidechain vibrations will probe the role of side chains in determining protein secondary, tertiary and quaternary structures. PMID:21379371

  14. Structure of the WipA protein reveals a novel tyrosine protein phosphatase effector from Legionella pneumophila.

    PubMed

    Pinotsis, Nikos; Waksman, Gabriel

    2017-06-02

    Legionnaires' disease is a severe form of pneumonia caused by the bacterium Legionella pneumophila. L. pneumophila pathogenicity relies on secretion of more than 300 effector proteins by a type IVb secretion system. Among these Legionella effectors, WipA has been primarily studied because of its dependence on a chaperone complex, IcmSW, for translocation through the secretion system, but its role in pathogenicity has remained unknown. In this study, we present the crystal structure of a large fragment of WipA, WipA435. Surprisingly, this structure revealed a serine/threonine phosphatase fold that unexpectedly targets tyrosine-phosphorylated peptides. The structure also revealed a sequence insertion that folds into an α-helical hairpin, the tip of which adopts a canonical coiled-coil structure. The purified protein was a dimer whose dimer interface involves interactions between the coiled coil of one WipA molecule and the phosphatase domain of another. Given the ubiquity of protein-protein interaction mediated by interactions between coiled-coils, we hypothesize that WipA can thereby transition from a homodimeric state to a heterodimeric state in which the coiled-coil region of WipA is engaged in a protein-protein interaction with a tyrosine-phosphorylated host target. In conclusion, these findings help advance our understanding of the molecular mechanisms of an effector involved in Legionella virulence and may inform approaches to elucidate the function of other effectors. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.

  15. Fast computational methods for predicting protein structure from primary amino acid sequence

    DOEpatents

    Agarwal, Pratul Kumar [Knoxville, TN]

    2011-07-19

    The present invention provides a method utilizing primary amino acid sequence of a protein, energy minimization, molecular dynamics and protein vibrational modes to predict three-dimensional structure of a protein. The present invention also determines possible intermediates in the protein folding pathway. The present invention has important applications to the design of novel drugs as well as protein engineering. The present invention predicts the three-dimensional structure of a protein independent of size of the protein, overcoming a significant limitation in the prior art.

  16. Sequence co-evolution gives 3D contacts and structures of protein complexes

    PubMed Central

    Hopf, Thomas A; Schärfe, Charlotta P I; Rodrigues, João P G L M; Green, Anna G; Kohlbacher, Oliver; Sander, Chris; Bonvin, Alexandre M J J; Marks, Debora S

    2014-01-01

    Protein–protein interactions are fundamental to many biological processes. Experimental screens have identified tens of thousands of interactions, and structural biology has provided detailed functional insight for select 3D protein complexes. An alternative rich source of information about protein interactions is the evolutionary sequence record. Building on earlier work, we show that analysis of correlated evolutionary sequence changes across proteins identifies residues that are close in space with sufficient accuracy to determine the three-dimensional structure of the protein complexes. We evaluate prediction performance in blinded tests on 76 complexes of known 3D structure, predict protein–protein contacts in 32 complexes of unknown structure, and demonstrate how evolutionary couplings can be used to distinguish between interacting and non-interacting protein pairs in a large complex. With the current growth of sequences, we expect that the method can be generalized to genome-wide elucidation of protein–protein interaction networks and used for interaction predictions at residue resolution. DOI: http://dx.doi.org/10.7554/eLife.03430.001 PMID:25255213

  17. PROFESS: a PROtein Function, Evolution, Structure and Sequence database

    PubMed Central

    Triplet, Thomas; Shortridge, Matthew D.; Griep, Mark A.; Stark, Jaime L.; Powers, Robert; Revesz, Peter

    2010-01-01

    The proliferation of biological databases and the easy access enabled by the Internet is having a beneficial impact on biological sciences and transforming the way research is conducted. There are ∼1100 molecular biology databases dispersed throughout the Internet. To assist in the functional, structural and evolutionary analysis of the abundant number of novel proteins continually identified from whole-genome sequencing, we introduce the PROFESS (PROtein Function, Evolution, Structure and Sequence) database. Our database is designed to be versatile and expandable and will not confine analysis to a pre-existing set of data relationships. A fundamental component of this approach is the development of an intuitive query system that incorporates a variety of similarity functions capable of generating data relationships not conceived during the creation of the database. The utility of PROFESS is demonstrated by the analysis of the structural drift of homologous proteins and the identification of potential pancreatic cancer therapeutic targets based on the observation of protein–protein interaction networks. Database URL: http://cse.unl.edu/∼profess/ PMID:20624718

  18. Sequence composition and environment effects on residue fluctuations in protein structures

    NASA Astrophysics Data System (ADS)

    Ruvinsky, Anatoly M.; Vakser, Ilya A.

    2010-10-01

    Structure fluctuations in proteins affect a broad range of cell phenomena, including stability of proteins and their fragments, allosteric transitions, and energy transfer. This study presents a statistical-thermodynamic analysis of the relationship between the sequence composition and the distribution of residue fluctuations in protein-protein complexes. A one-node-per-residue elastic network model accounting for the nonhomogeneous protein mass distribution and the interatomic interactions through the renormalized inter-residue potential is developed. Two factors, a protein mass distribution and a residue environment, were found to determine the scale of residue fluctuations. Surface residues undergo larger fluctuations than core residues, in agreement with experimental observations. Ranking residues over the normalized scale of fluctuations yields a distinct classification of amino acids into three groups: (i) highly fluctuating: Gly, Ala, Ser, Pro, and Asp; (ii) moderately fluctuating: Thr, Asn, Gln, Lys, Glu, Arg, Val, and Cys; and (iii) weakly fluctuating: Ile, Leu, Met, Phe, Tyr, Trp, and His. The structural instability in proteins possibly relates to the high content of the highly fluctuating residues and a deficiency of the weakly fluctuating residues in irregular secondary structure elements (loops), chameleon sequences, and disordered proteins. Strong correlation between residue fluctuations and the sequence composition of protein loops supports this hypothesis. Comparing fluctuations of binding site residues (interface residues) with other surface residues shows that, on average, the interface is more rigid than the rest of the protein surface and Gly, Ala, Ser, Cys, Leu, and Trp have a propensity to form more stable docking patches on the interface. The findings have broad implications for understanding mechanisms of protein association and stability of protein structures.

  19. How large B-factors can be in protein crystal structures.

    PubMed

    Carugo, Oliviero

    2018-02-23

    Protein crystal structures are potentially over-interpreted since they are routinely refined without any restraint on the upper limit of atomic B-factors. Consequently, some of their atoms, undetected in the electron density maps, are allowed to reach extremely large B-factors, even above 100 square Angstroms, and their final positions are purely speculative and not based on any experimental evidence. A strategy to define B-factor upper limits is described here, based on the analysis of protein crystal structures deposited in the Protein Data Bank prior to 2008, when the tendency to allow B-factors to inflate arbitrarily was limited. This B-factor upper limit (B_max) is determined by extrapolating the relationship between crystal structure average B-factor and percentage of crystal volume occupied by solvent (pcVol) to pcVol = 100%, when, ab absurdo, the crystal contains only liquid solvent, the structure of which is, by definition, undetectable in electron density maps. It is thus possible to highlight structures with average B-factors larger than B_max, which should be considered with caution by the users of the information deposited in the Protein Data Bank, in order to avoid scientifically deleterious over-interpretations.
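
    The extrapolation described in the abstract above can be illustrated with a small sketch. It assumes, purely for illustration, that the relationship between average B-factor and pcVol is modeled as a straight line fitted by least squares; the abstract does not state the functional form. The function names and data points are made up and are not values from the study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def b_max(pc_vol, avg_b):
    """Extrapolate the fitted line to pcVol = 100% (solvent-only crystal)."""
    slope, intercept = fit_line(pc_vol, avg_b)
    return slope * 100.0 + intercept


# Made-up data (assumption): solvent content in percent, average B in A^2.
pc_vol = [30.0, 40.0, 50.0, 60.0, 70.0]
avg_b = [25.0, 30.0, 35.0, 40.0, 45.0]
limit = b_max(pc_vol, avg_b)
print(f"B_max = {limit:.1f} A^2")
```

    Structures whose average B-factor exceeds the extrapolated limit would then be the ones flagged for cautious interpretation.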

  20. Structure of the Newcastle disease virus F protein in the post-fusion conformation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Swanson, Kurt; Wen, Xiaolin; Leser, George P.

    2010-11-17

    The paramyxovirus F protein is a class I viral membrane fusion protein which undergoes a significant refolding transition during virus entry. Previous studies of the Newcastle disease virus, human parainfluenza virus 3 and parainfluenza virus 5 F proteins revealed differences in the pre- and post-fusion structures. The NDV Queensland (Q) F structure lacked structural elements observed in the other two structures, which are key to the refolding and fusogenic activity of F. Here we present the NDV Australia-Victoria (AV) F protein post-fusion structure and provide EM evidence for its folding to a pre-fusion form. The NDV AV F structure contains heptad repeat elements missing in the previous NDV Q F structure, forming a post-fusion six-helix bundle (6HB) similar to the post-fusion hPIV3 F structure. Electrostatic and temperature factor analysis of the F structures points to regions of these proteins that may be functionally important in their membrane fusion activity.

  1. Structural Integrity of Proteins under Applied Bias during Solid-State Nanopore Translocation

    NASA Astrophysics Data System (ADS)

    Hasan, Mohammad R.; Khanzada, Raja Raheel; Mahmood, Mohammed A. I.; Ashfaq, Adnan; Iqbal, Samir M.

    2015-03-01

    The translocation behavior of proteins through solid-state nanopores can be used as a new way to detect and identify proteins. The ionic current through a nanopore that flows under applied bias gets perturbed when a biomolecule traverses the nanopore. It is important for a protein detection scheme to know of any changes in the three-dimensional structure of the molecule during the process. Here we report data on the structural integrity of a protein during translocation through a nanopore under different applied biases. Nanoscale Molecular Dynamics was used to establish a framework to study the changes in protein structures as these travelled across the nanopore. The analysis revealed the contributions of structural changes of the protein to its ionic current signature. As a model, the thrombin protein crystal structure was imported and positioned inside a 6 nm diameter pore in a 6 nm thick silicon nitride membrane. The protein was solvated in 1 M KCl at 295 K and the system was equilibrated for 20 ns to attain its minimum energy state. The simulation was performed at different electric fields from 0 to 1 kcal/(mol·Å·e). RMSD, radial distribution function, movement of the center of mass and velocity of the protein were calculated. The results showed linear increments in the velocity and perturbations in ionic current profile with increasing electric potential. Support Acknowledged from NSF through ECCS-1201878.

  2. Structure and expression of a novel compact myelin protein – Small VCP-interacting protein (SVIP)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Jiawen; Peng, Dungeng; Voehler, Markus

    2013-10-11

    Highlights: • SVIP (small p97/VCP-interacting protein) co-localizes with myelin basic protein (MBP) in compact myelin. • We determined that SVIP is an intrinsically disordered protein (IDP). • The helical content of SVIP increases dramatically during its interaction with negatively charged lipid membrane. • This study provides structural insight into interactions between SVIP and myelin membranes. -- Abstract: SVIP (small p97/VCP-interacting protein) was initially identified as one of many cofactors regulating the valosin containing protein (VCP), an AAA+ ATPase involved in endoplasmic-reticulum-associated protein degradation (ERAD). Our previous study showed that SVIP is expressed exclusively in the nervous system. In the present study, SVIP and VCP were seen to be co-localized in neuronal cell bodies. Interestingly, we also observed that SVIP co-localizes with myelin basic protein (MBP) in compact myelin, where VCP was absent. Furthermore, using nuclear magnetic resonance (NMR) and circular dichroism (CD) spectroscopic measurements, we determined that SVIP is an intrinsically disordered protein (IDP). However, upon binding to the surface of membranes containing a net negative charge, the helical content of SVIP increases dramatically. These findings provide structural insight into interactions between SVIP and myelin membranes.

  3. Structure Refinement of Protein Low Resolution Models Using the GNEIMO Constrained Dynamics Method

    PubMed Central

    Park, In-Hee; Gangupomu, Vamshi; Wagner, Jeffrey; Jain, Abhinandan; Vaidehi, Nagarajan

    2012-01-01

    The challenge in protein structure prediction using homology modeling is the lack of reliable methods to refine the low resolution homology models. Unconstrained all-atom molecular dynamics (MD) does not serve well for structure refinement due to its limited conformational search. We have developed and tested the constrained MD method, based on the Generalized Newton-Euler Inverse Mass Operator (GNEIMO) algorithm for protein structure refinement. In this method, the high-frequency degrees of freedom are replaced with hard holonomic constraints and a protein is modeled as a collection of rigid body clusters connected by flexible torsional hinges. This allows larger integration time steps and enhances the conformational search space. In this work, we have demonstrated the use of a constraint free GNEIMO method for protein structure refinement that starts from low-resolution decoy sets derived from homology methods. In the eight proteins with three decoys for each, we observed an improvement of ~2 Å in the RMSD to the known experimental structures of these proteins. The GNEIMO method also showed enrichment in the population density of native-like conformations. In addition, we demonstrated structural refinement using a “Freeze and Thaw” clustering scheme with the GNEIMO framework as a viable tool for enhancing localized conformational search. We have derived a robust protocol based on the GNEIMO replica exchange method for protein structure refinement that can be readily extended to other proteins and possibly applicable for high throughput protein structure refinement. PMID:22260550
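
    The RMSD figure quoted in the abstract above measures the average atomic displacement between a model and the experimental structure. A minimal sketch follows, assuming the two coordinate sets are already superimposed and list the same atoms in the same order; optimal superposition (for example by the Kabsch algorithm) is deliberately left out. The function name and coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-aligned coordinate lists.

    Each argument is a list of (x, y, z) tuples in angstroms.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Made-up two-atom example: every atom is displaced by 2 angstroms along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
value = rmsd(model, reference)
print(value)
```

    A refinement improvement would then be the difference between the RMSD of the starting decoy and that of the refined model, both measured against the same reference.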

  4. Origins of coevolution between residues distant in protein 3D structures

    PubMed Central

    Ovchinnikov, Sergey; Kamisetty, Hetunandan; Baker, David

    2017-01-01

    Residue pairs that directly coevolve in protein families are generally close in protein 3D structures. Here we study the exceptions to this general trend—directly coevolving residue pairs that are distant in protein structures—to determine the origins of evolutionary pressure on spatially distant residues and to understand the sources of error in contact-based structure prediction. Over a set of 4,000 protein families, we find that 25% of directly coevolving residue pairs are separated by more than 5 Å in protein structures and 3% by more than 15 Å. The majority (91%) of directly coevolving residue pairs in the 5–15 Å range are found to be in contact in at least one homologous structure—these exceptions arise from structural variation in the family in the region containing the residues. Thirty-five percent of the exceptions greater than 15 Å are at homo-oligomeric interfaces, 19% arise from family structural variation, and 27% are in repeat proteins likely reflecting alignment errors. Of the remaining long-range exceptions (<1% of the total number of coupled pairs), many can be attributed to close interactions in an oligomeric state. Overall, the results suggest that directly coevolving residue pairs not in repeat proteins are spatially proximal in at least one biologically relevant protein conformation within the family; we find little evidence for direct coupling between residues at spatially separated allosteric and functional sites or for increased direct coupling between residue pairs on putative allosteric pathways connecting them. PMID:28784799
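
    Illustrative note: the percentages in this abstract are fractions of coupled residue pairs whose spatial separation exceeds a distance cutoff. A minimal Python sketch of that bookkeeping follows, assuming the pair distances (in Å) have already been measured. The function name, the default cutoffs passed as parameters, and the sample distances are hypothetical illustrations, not data from the study.

```python
def fractions_beyond(distances, near=5.0, far=15.0):
    """Return the fraction of pair distances above `near` and above `far`.

    `distances` is a list of inter-residue distances in angstroms. The two
    cutoffs mirror the 5 and 15 angstrom thresholds mentioned in the abstract.
    """
    if not distances:
        raise ValueError("need at least one distance")
    n = len(distances)
    beyond_near = sum(1 for d in distances if d > near)
    beyond_far = sum(1 for d in distances if d > far)
    return beyond_near / n, beyond_far / n


# Made-up distances for four coupled pairs (illustration only).
sample = [3.0, 4.5, 6.0, 20.0]
print(fractions_beyond(sample))
```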

  5. G protein-coupled odorant receptors: From sequence to structure.

    PubMed

    de March, Claire A; Kim, Soo-Kyung; Antonczak, Serge; Goddard, William A; Golebiowski, Jérôme

    2015-09-01

    Odorant receptors (ORs) are the largest subfamily within class A G protein-coupled receptors (GPCRs). No experimental structural data of any OR is available to date and atomic-level insights are likely to be obtained by means of molecular modeling. In this article, we critically align sequences of ORs with those GPCRs for which a structure is available. Here, an alignment consistent with available site-directed mutagenesis data on various ORs is proposed. Using this alignment, the choice of the template is deemed rather minor for identifying residues that constitute the wall of the binding cavity or those involved in G protein recognition. © 2015 The Protein Society.

  6. Construction of Matryoshka-type structures from supercharged protein nanocages.

    PubMed

    Beck, Tobias; Tetter, Stephan; Künzle, Matthias; Hilvert, Donald

    2015-01-12

    Designing nanoscaled hierarchical structures with increasing levels of complexity is challenging. Here we show that electrostatic interactions between two complementarily supercharged protein nanocages can be effectively utilized to create nested Matryoshka-type structures. Cage-within-cage complexes containing spatially ordered iron oxide nanoparticles spontaneously self-assemble upon mixing positively supercharged ferritin compartments with AaLS-13, a larger shell-forming protein with a negatively supercharged lumen. Exploiting engineered Coulombic interactions and protein dynamics in this way opens up new avenues for creating hierarchically organized supramolecular assemblies for application as delivery vehicles, reaction chambers, and artificial organelles. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  7. Protein Corona in Response to Flow: Effect on Protein Concentration and Structure.

    PubMed

    Jayaram, Dhanya T; Pustulka, Samantha M; Mannino, Robert G; Lam, Wilbur A; Payne, Christine K

    2018-04-09

    Nanoparticles used in cellular applications encounter free serum proteins that adsorb onto the surface of the nanoparticle, forming a protein corona. This protein layer controls the interaction of nanoparticles with cells. For nanomedicine applications, it is important to consider how intravenous injection and the subsequent shear flow will affect the protein corona. Our goal was to determine if shear flow changed the composition of the protein corona and if these changes affected cellular binding. Colorimetric assays of protein concentration and gel electrophoresis demonstrate that polystyrene nanoparticles subjected to flow have a greater concentration of serum proteins adsorbed on the surface, especially plasminogen. Plasminogen, in the absence of nanoparticles, undergoes changes in structure in response to flow, characterized by fluorescence and circular dichroism spectroscopy. The protein-nanoparticle complexes formed from fetal bovine serum after flow had decreased cellular binding, as measured with flow cytometry. In addition to the relevance for nanomedicine, these results also highlight the technical challenges of protein corona studies. The composition of the protein corona was highly dependent on the initial mixing step: rocking, vortexing, or flow. Overall, these results reaffirm the importance of the protein corona in nanoparticle-cell interactions and point toward the challenges of predicting corona composition based on nanoparticle properties. Copyright © 2018 Biophysical Society. Published by Elsevier Inc. All rights reserved.

  8. Covalent Bonding of Chlorogenic Acid Induces Structural Modifications on Sunflower Proteins.

    PubMed

    Karefyllakis, Dimitris; Salakou, Stavroula; Bitter, J Harry; van der Goot, Atze J; Nikiforidis, Constantinos V

    2018-02-19

    Proteins and phenols coexist in the confined space of plant cells leading to reactions between them, which result in new covalently bonded complex molecules. Reactions of this kind have been widely observed during storage and processing of plant materials. However, the nature of the new complex molecules and their physicochemical properties are largely unknown. Therefore, we investigated the structural characteristics of covalently bonded complexes between sunflower protein isolate (SFPI, protein content 85 wt %) and the dominant phenol in the confined space of a sunflower seed cell (chlorogenic acid, CGA). It was shown that the efficiency of bond formation goes through a maximum as a function of the SFPI:CGA ratio. Moreover, the bonding of CGA with proteins resulted in changes in the secondary and tertiary structure of the protein. It was also shown that the phenol bound strongly to the protein, which resulted in new crosslinks between the polypeptide chains. As a result, secondary structures like α-helices and β-sheets diminished, which in turn resulted in more disordered domains and a subsequent modification of the tertiary structure of the proteins. These findings are relevant for establishing future protocols for extraction of high-quality proteins and phenols when utilizing plant material and offer insight into the impact of processing that these ingredients endure. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  9. Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring.

    PubMed

    Durston, Kirk K; Chiu, David Ky; Wong, Andrew Kc; Li, Gary Cl

    2012-07-13

    Much progress has been made in understanding the 3D structure of proteins using methods such as NMR and X-ray crystallography. The resulting 3D structures are extremely informative, but do not always reveal which sites and residues within the structure are of special importance. Recently, there are indications that multiple-residue, sub-domain structural relationships within the larger 3D consensus structure of a protein can be inferred from the analysis of the multiple sequence alignment data of a protein family. These intra-dependent clusters of associated sites are used to indicate hierarchical inter-residue relationships within the 3D structure. To reveal the patterns of associations among individual amino acids or sub-domain components within the structure, we apply a k-modes attribute (aligned site) clustering algorithm to the ubiquitin and transthyretin families in order to discover associations among groups of sites within the multiple sequence alignment. We then observe what these associations imply within the 3D structure of these two protein families. The k-modes site clustering algorithm we developed maximizes the intra-group interdependencies based on a normalized mutual information measure. The clusters formed correspond to sub-structural components or binding and interface locations. Applying this data-directed method to the ubiquitin and transthyretin protein family multiple sequence alignments as a test bed, we located numerous interesting associations of interdependent sites. These clusters were then arranged into cluster tree diagrams which revealed four structural sub-domains within the single domain structure of ubiquitin and a single large sub-domain within transthyretin associated with the interface among transthyretin monomers. In addition, several clusters of mutually interdependent sites were discovered for each protein family, each of which appear to play an important role in the molecular structure and/or function. Our results
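
    Illustrative note: the site clustering described in this abstract relies on a normalized mutual information measure between aligned sites. A minimal Python sketch of one such measure for two alignment columns follows. The normalization by joint entropy is an assumption of this sketch, since the abstract does not state which normalization the authors use, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(counts, n):
    """Shannon entropy (bits) of a discrete distribution given as counts."""
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def normalized_mi(col_x, col_y):
    """Mutual information of two alignment columns divided by joint entropy.

    `col_x` and `col_y` are equal-length sequences of residue symbols, one
    entry per aligned sequence. Returns 0.0 when both columns are invariant.
    """
    if len(col_x) != len(col_y) or len(col_x) == 0:
        raise ValueError("columns must be non-empty and equal in length")
    n = len(col_x)
    h_x = entropy(Counter(col_x), n)
    h_y = entropy(Counter(col_y), n)
    h_xy = entropy(Counter(zip(col_x, col_y)), n)
    mutual_information = h_x + h_y - h_xy
    return mutual_information / h_xy if h_xy > 0 else 0.0


# Perfectly co-varying columns versus independent columns (toy data).
print(normalized_mi("AAGG", "LLVV"))
print(normalized_mi("AAGG", "LVLV"))
```

    Under this normalization, perfectly co-varying columns score 1 and statistically independent columns score 0, which makes scores comparable across sites with different residue diversity.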

  10. What are the structural features that drive partitioning of proteins in aqueous two-phase systems?

    PubMed

    Wu, Zhonghua; Hu, Gang; Wang, Kui; Zaslavsky, Boris Yu; Kurgan, Lukasz; Uversky, Vladimir N

    2017-01-01

    Protein partitioning in aqueous two-phase systems (ATPSs) represents a convenient, inexpensive, and easy to scale-up protein separation technique. Since partition behavior of a protein dramatically depends on an ATPS composition, it would be highly beneficial to have reliable means for (even qualitative) prediction of partitioning of a target protein under different conditions. Our aim was to understand which structural features of proteins contribute to partitioning of a query protein in a given ATPS. We undertook a systematic empirical analysis of relations between 57 numerical structural descriptors derived from the corresponding amino acid sequences and crystal structures of 10 well-characterized proteins and the partition behavior of these proteins in 29 different ATPSs. This analysis revealed that just a few structural characteristics of proteins can accurately determine behavior of these proteins in a given ATPS. However, partition behavior of proteins in different ATPSs relies on different structural features. In other words, we could not find a unique set of protein structural features derived from their crystal structures that could be used for the description of the protein partition behavior of all proteins in all ATPSs analyzed in this study. We likely need to gain better insight into relationships between protein-solvent interactions and protein structure peculiarities, particularly given the limitations of the crystal structures used here, to be able to construct a model that accurately predicts protein partition behavior across all ATPSs. Copyright © 2016 Elsevier B.V. All rights reserved.

  11. Integrating protein structural dynamics and evolutionary analysis with Bio3D.

    PubMed

    Skjærven, Lars; Yao, Xin-Qiu; Scarabelli, Guido; Grant, Barry J

    2014-12-10

    Popular bioinformatics approaches for studying protein functional dynamics include comparisons of crystallographic structures, molecular dynamics simulations and normal mode analysis. However, determining how observed displacements and predicted motions from these traditionally separate analyses relate to each other, as well as to the evolution of sequence, structure and function within large protein families, remains a considerable challenge. This is in part due to the general lack of tools that integrate information of molecular structure, dynamics and evolution. Here, we describe the integration of new methodologies for evolutionary sequence, structure and simulation analysis into the Bio3D package. This major update includes unique high-throughput normal mode analysis for examining and contrasting the dynamics of related proteins with non-identical sequences and structures, as well as new methods for quantifying dynamical couplings and their residue-wise dissection from correlation network analysis. These new methodologies are integrated with major biomolecular databases as well as established methods for evolutionary sequence and comparative structural analysis. New functionality for directly comparing results derived from normal modes, molecular dynamics and principal component analysis of heterogeneous experimental structure distributions is also included. We demonstrate these integrated capabilities with example applications to dihydrofolate reductase and heterotrimeric G-protein families along with a discussion of the mechanistic insight provided in each case. The integration of structural dynamics and evolutionary analysis in Bio3D enables researchers to go beyond a prediction of single protein dynamics to investigate dynamical features across large protein families. The Bio3D package is distributed with full source code and extensive documentation as a platform independent R package under a GPL2 license from http://thegrantlab.org/bio3d/ .

  12. Identification of DNA-binding proteins using structural, electrostatic and evolutionary features.

    PubMed

    Nimrod, Guy; Szilágyi, András; Leslie, Christina; Ben-Tal, Nir

    2009-04-10

    DNA-binding proteins (DBPs) participate in various crucial processes in the life-cycle of the cells, and the identification and characterization of these proteins is of great importance. We present here a random forests classifier for identifying DBPs among proteins with known 3D structures. First, clusters of evolutionarily conserved regions (patches) on the surface of proteins were detected using the PatchFinder algorithm; earlier studies showed that these regions are typically the functionally important regions of proteins. Next, we trained a classifier using features like the electrostatic potential, cluster-based amino acid conservation patterns and the secondary structure content of the patches, as well as features of the whole protein, including its dipole moment. Using 10-fold cross-validation on a dataset of 138 DBPs and 110 proteins that do not bind DNA, the classifier achieved a sensitivity and a specificity of 0.90, which is overall better than the performance of published methods. Furthermore, when we tested five different methods on 11 new DBPs that did not appear in the original dataset, only our method annotated all correctly. The resulting classifier was applied to a collection of 757 proteins of known structure and unknown function. Of these proteins, 218 were predicted to bind DNA, and we anticipate that some of them interact with DNA using new structural motifs. The use of complementary computational tools supports the notion that at least some of them do bind DNA.
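
    Illustrative note: the sensitivity and specificity reported in this abstract are ratios derived from a binary confusion matrix. A minimal Python sketch of the two definitions follows. The function name and the counts in the example are hypothetical and are not the study's actual per-class results.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Compute sensitivity and specificity from confusion-matrix counts.

    tp: binders predicted as binders      fn: binders predicted as non-binders
    tn: non-binders predicted correctly   fp: non-binders predicted as binders
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("each class needs at least one example")
    sensitivity = tp / (tp + fn)  # fraction of true binders recovered
    specificity = tn / (tn + fp)  # fraction of non-binders correctly rejected
    return sensitivity, specificity


# Made-up counts for ten binders and ten non-binders (illustration only).
print(sensitivity_specificity(tp=9, fn=1, tn=8, fp=2))
```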

  13. Adaptability of Protein Structures to Enable Functional Interactions and Evolutionary Implications

    PubMed Central

    Haliloglu, Turkan; Bahar, Ivet

    2015-01-01

    Several studies in recent years have drawn attention to the ability of proteins to adapt to intermolecular interactions by conformational changes along structure-encoded collective modes of motions. These so-called soft modes, primarily driven by entropic effects, facilitate, if not enable, functional interactions. They represent excursions on the conformational space along principal low-ascent directions/paths away from the original free energy minimum, and they are accessible to the protein even prior to protein-protein/ligand interactions. An emerging concept from these studies is the evolution of structures or modular domains to favor such modes of motion that will be recruited or integrated for enabling functional interactions. Structural dynamics, including the allosteric switches in conformation that are often stabilized upon formation of complexes and multimeric assemblies, emerge as key properties that are evolutionarily maintained to accomplish biological activities, consistent with the paradigm sequence → structure → dynamics → function where ‘dynamics’ bridges structure and function. PMID:26254902

  14. Predicting protein structures with a multiplayer online game.

    PubMed

    Cooper, Seth; Khatib, Firas; Treuille, Adrien; Barbero, Janos; Lee, Jeehyung; Beenen, Michael; Leaver-Fay, Andrew; Baker, David; Popović, Zoran; Players, Foldit

    2010-08-05

    People exert large amounts of problem-solving effort playing computer games. Simple image- and text-recognition tasks have been successfully 'crowd-sourced' through games, but it is not clear if more complex scientific problems can be solved with human-directed computing. Protein structure prediction is one such problem: locating the biologically relevant native conformation of a protein is a formidable computational challenge given the very large size of the search space. Here we describe Foldit, a multiplayer online game that engages non-scientists in solving hard prediction problems. Foldit players interact with protein structures using direct manipulation tools and user-friendly versions of algorithms from the Rosetta structure prediction methodology, while they compete and collaborate to optimize the computed energy. We show that top-ranked Foldit players excel at solving challenging structure refinement problems in which substantial backbone rearrangements are necessary to achieve the burial of hydrophobic residues. Players working collaboratively develop a rich assortment of new strategies and algorithms; unlike computational approaches, they explore not only the conformational space but also the space of possible search strategies. The integration of human visual problem-solving and strategy development capabilities with traditional computational algorithms through interactive multiplayer games is a powerful new approach to solving computationally-limited scientific problems.

  15. Prediction of protein tertiary structure to low resolution: performance for a large and structurally diverse test set.

    PubMed

    Eyrich, V A; Standley, D M; Friesner, R A

    1999-05-14

    We report the tertiary structure predictions for 95 proteins ranging in size from 17 to 160 residues starting from known secondary structure. Predictions are obtained from global minimization of an empirical potential function followed by the application of a refined atomic overlap potential. The minimization strategy employed represents a variant of the Monte Carlo plus minimization scheme of Li and Scheraga applied to a reduced model of the protein chain. For all of the cases except beta-proteins larger than 75 residues, a native-like structure, usually 4-6 Å root-mean-square deviation from the native, is located. For beta-proteins larger than 75 residues, the energy gap between native-like structures and the lowest energy structures produced in the simulation is large, so that low RMSD structures are not generated starting from an unfolded state. This is attributed to the lack of an explicit hydrogen bond term in the potential function, which we hypothesize is necessary to stabilize large assemblies of beta-strands. Copyright 1999 Academic Press.
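
    Illustrative note: a Monte Carlo plus minimization scheme alternates a random perturbation, a local energy minimization, and a Metropolis acceptance test on the minimized energies. A minimal Python sketch of that loop on a toy one-dimensional landscape follows. The energy function, step sizes, temperature, and all function names are hypothetical stand-ins and do not reproduce the authors' reduced protein model or potential.

```python
import math
import random


def energy(x):
    """Toy rugged 1-D landscape standing in for an empirical potential."""
    return 0.1 * x * x + math.sin(3.0 * x)


def local_minimize(x, step=0.01, iters=500):
    """Crude gradient descent using a central-difference derivative."""
    h = 1e-5
    for _ in range(iters):
        grad = (energy(x + h) - energy(x - h)) / (2.0 * h)
        x -= step * grad
    return x


def mc_plus_minimization(x0, n_steps=200, kick=1.0, temperature=0.5, seed=0):
    """Perturb, minimize, then accept or reject with the Metropolis rule."""
    rng = random.Random(seed)
    current = local_minimize(x0)
    e_current = energy(current)
    best, e_best = current, e_current
    for _ in range(n_steps):
        trial = local_minimize(current + rng.uniform(-kick, kick))
        e_trial = energy(trial)
        # Metropolis criterion applied to the minimized energies.
        accept = e_trial <= e_current or (
            rng.random() < math.exp(-(e_trial - e_current) / temperature)
        )
        if accept:
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best


best_x, best_e = mc_plus_minimization(4.0)
print(best_x, best_e)
```

    Because every trial point is relaxed before the acceptance test, the walk effectively hops between local minima rather than sampling the raw landscape.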

  16. Protein structure refinement using a quantum mechanics-based chemical shielding predictor.

    PubMed

    Bratholm, Lars A; Jensen, Jan H

    2017-03-01

    The accurate prediction of protein chemical shifts using a quantum mechanics (QM)-based method has been the subject of intense research for more than 20 years but so far empirical methods for chemical shift prediction have proven more accurate. In this paper we show that a QM-based predictor of a protein backbone and CB chemical shifts (ProCS15, PeerJ , 2016, 3, e1344) is of comparable accuracy to empirical chemical shift predictors after chemical shift-based structural refinement that removes small structural errors. We present a method by which quantum chemistry based predictions of isotropic chemical shielding values (ProCS15) can be used to refine protein structures using Markov Chain Monte Carlo (MCMC) simulations, relating the chemical shielding values to the experimental chemical shifts probabilistically. Two kinds of MCMC structural refinement simulations were performed using force field geometry optimized X-ray structures as starting points: simulated annealing of the starting structure and constant temperature MCMC simulation followed by simulated annealing of a representative ensemble structure. Annealing of the CHARMM structure changes the CA-RMSD by an average of 0.4 Å but lowers the chemical shift RMSD by 1.0 and 0.7 ppm for CA and N. Conformational averaging has a relatively small effect (0.1-0.2 ppm) on the overall agreement with carbon chemical shifts but lowers the error for nitrogen chemical shifts by 0.4 ppm. If an amino acid specific offset is included the ProCS15 predicted chemical shifts have RMSD values relative to experiments that are comparable to popular empirical chemical shift predictors. The annealed representative ensemble structures differ in CA-RMSD relative to the initial structures by an average of 2.0 Å, with >2.0 Å difference for six proteins. In four of the cases, the largest structural differences arise in structurally flexible regions of the protein as determined by NMR, and in the remaining two cases, the large structural

  17. Students' understanding of primary and secondary protein structure: drawing secondary protein structure reveals student understanding better than simple recognition of structures.

    PubMed

    Harle, Marissa; Towns, Marcy H

    2013-01-01

    The interdisciplinary nature of biochemistry courses requires students to use both chemistry and biology knowledge to understand biochemical concepts. Research that has focused on external representations in biochemistry has uncovered student difficulties in comprehending and interpreting external representations in addition to a fragmented understanding of fundamental biochemistry concepts. This project focuses on students' understanding of primary and secondary protein structure and drawings (representations) of hydrogen-bonding in alpha helices and beta sheets. Analysis demonstrated that students can recognize and identify primary protein structure concepts when given a polypeptide. However, when asked to draw alpha helices and beta sheets and explain the role of hydrogen bonding in their drawings, students exhibited a fragmented understanding that lacked coherence. Faculty are encouraged to have students draw molecular level representations to make their mental models more explicit, complete, and coherent. This is in contrast to recognition and identification tasks, which do not adequately probe mental models and molecular level understanding. © 2013 by The International Union of Biochemistry and Molecular Biology.

  18. Dissecting protein function: an efficient protocol for identifying separation-of-function mutations that encode structurally stable proteins.

    PubMed

    Lubin, Johnathan W; Rao, Timsi; Mandell, Edward K; Wuttke, Deborah S; Lundblad, Victoria

    2013-03-01

    Mutations that confer the loss of a single biochemical property (separation-of-function mutations) can often uncover a previously unknown role for a protein in a particular biological process. However, most mutations are identified based on loss-of-function phenotypes, which cannot differentiate between separation-of-function alleles vs. mutations that encode unstable/unfolded proteins. An alternative approach is to use overexpression dominant-negative (ODN) phenotypes to identify mutant proteins that disrupt function in an otherwise wild-type strain when overexpressed. This is based on the assumption that such mutant proteins retain an overall structure that is comparable to that of the wild-type protein and are able to compete with the endogenous protein (Herskowitz 1987). To test this, the in vivo phenotypes of mutations in the Est3 telomerase subunit from Saccharomyces cerevisiae were compared with the in vitro secondary structure of these mutant proteins as analyzed by circular-dichroism spectroscopy, which demonstrates that ODN is a more sensitive assessment of protein stability than the commonly used method of monitoring protein levels from extracts. Reverse mutagenesis of EST3, which targeted different categories of amino acids, also showed that mutating highly conserved charged residues to the oppositely charged amino acid had an increased likelihood of generating a severely defective est3(-) mutation, which nevertheless encoded a structurally stable protein. These results suggest that charge-swap mutagenesis directed at a limited subset of highly conserved charged residues, combined with ODN screening to eliminate partially unfolded proteins, may provide a widely applicable and efficient strategy for generating separation-of-function mutations.

  19. Iterative non-sequential protein structural alignment.

    PubMed

    Salem, Saeed; Zaki, Mohammed J; Bystroff, Christopher

    2009-06-01

    Structural similarity between proteins gives us insights into their evolutionary relationships when there is low sequence similarity. In this paper, we present a novel approach called SNAP for non-sequential pair-wise structural alignment. Starting from an initial alignment, our approach iterates over a two-step process consisting of a superposition step and an alignment step, until convergence. We propose a novel greedy algorithm to construct both sequential and non-sequential alignments. The quality of SNAP alignments was assessed by comparing against the manually curated reference alignments in the challenging SISY and RIPC datasets. Moreover, when applied to a dataset of 4410 protein pairs selected from the CATH database, SNAP produced longer alignments with lower rmsd than several state-of-the-art alignment methods. Classification of folds using SNAP alignments was both highly sensitive and highly selective. The SNAP software, along with the datasets, is available online at http://www.cs.rpi.edu/~zaki/software/SNAP.

  20. Measuring and comparing structural fluctuation patterns in large protein datasets.

    PubMed

    Fuglebakk, Edvin; Echave, Julián; Reuter, Nathalie

    2012-10-01

    The function of a protein depends not only on its structure but also on its dynamics. This is at the basis of a large body of experimental and theoretical work on protein dynamics. Further insight into the dynamics-function relationship can be gained by studying the evolutionary divergence of protein motions. To investigate this, we need appropriate comparative dynamics methods. The most used dynamical similarity score is the correlation between the root mean square fluctuations (RMSF) of aligned residues. Despite its usefulness, RMSF is in general less evolutionarily conserved than the native structure. A fundamental issue is whether RMSF is not as conserved as structure because dynamics is less conserved or because RMSF is not the best property to use to study its conservation. We performed a systematic assessment of several scores that quantify the (dis)similarity between protein fluctuation patterns. We show that the best scores perform as well as or better than structural dissimilarity, as assessed by their consistency with the SCOP classification. We conclude that to uncover the full extent of the evolutionary conservation of protein fluctuation patterns, it is important to measure the directions of fluctuations and their correlations between sites. Contact: Nathalie.Reuter@mbi.uib.no. Supplementary data are available at Bioinformatics online.
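
    Illustrative note: the baseline similarity score named in this abstract is the correlation between RMSF profiles of aligned residues. A minimal Python sketch follows, assuming each protein is given as a list of conformations with residues already aligned one-to-one and frames already superposed. Using Pearson correlation is an assumption of this sketch, and the function names are hypothetical.

```python
import math


def rmsf(frames):
    """Per-residue root mean square fluctuation over a set of conformations.

    `frames` is a list of conformations; each conformation is a list of
    (x, y, z) tuples, one per residue, in a common reference frame.
    """
    n_frames = len(frames)
    n_res = len(frames[0])
    profile = []
    for i in range(n_res):
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        msd = sum(
            sum((f[i][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        profile.append(math.sqrt(msd))
    return profile


def pearson(a, b):
    """Pearson correlation between two equal-length numeric profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Two toy two-frame ensembles with three aligned residues each.
protein_a = [[(0, 0, 0), (0, 0, 0), (0, 0, 0)], [(2, 0, 0), (4, 0, 0), (6, 0, 0)]]
protein_b = [[(0, 0, 0), (0, 0, 0), (0, 0, 0)], [(0, 1, 0), (0, 2, 0), (0, 3, 0)]]
print(pearson(rmsf(protein_a), rmsf(protein_b)))
```

    The toy ensembles fluctuate along different axes yet yield proportional RMSF profiles, which shows how a magnitude-only score ignores the direction of motion, the limitation the abstract highlights.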