Sample records for domain structure sequence

  1. Predicting PDZ domain mediated protein interactions from structure

    PubMed Central

    2013-01-01

    Background PDZ domains are structural protein domains that recognize simple linear amino acid motifs, often at protein C-termini, and mediate protein-protein interactions (PPIs) in important biological processes, such as ion channel regulation, cell polarity and neural development. PDZ domain-peptide interaction predictors have been developed based on domain and peptide sequence information. Since domain structure is known to influence binding specificity, we hypothesized that structural information could be used to predict new interactions compared to sequence-based predictors. Results We developed a novel computational predictor of PDZ domain and C-terminal peptide interactions using a support vector machine trained with PDZ domain structure and peptide sequence information. Performance was estimated using extensive cross validation testing. We used the structure-based predictor to scan the human proteome for ligands of 218 PDZ domains and show that the predictions correspond to known PDZ domain-peptide interactions and PPIs in curated databases. The structure-based predictor is complementary to the sequence-based predictor, finding unique known and novel PPIs, and is less dependent on training–testing domain sequence similarity. We used a functional enrichment analysis of our hits to create a predicted map of PDZ domain biology. This map highlights PDZ domain involvement in diverse biological processes, some only found by the structure-based predictor. Based on this analysis, we predict novel PDZ domain involvement in xenobiotic metabolism and suggest new interactions for other processes including wound healing and Wnt signalling. Conclusions We built a structure-based predictor of PDZ domain-peptide interactions, which can be used to scan C-terminal proteomes for PDZ interactions. 
We also show that the structure-based predictor finds many known PDZ mediated PPIs in human that were not found by our previous sequence-based predictor and is less dependent on training–testing domain sequence similarity. Using both predictors, we defined a functional map of human PDZ domain biology and predict novel PDZ domain function. Users may access our structure-based and previous sequence-based predictors at http://webservice.baderlab.org/domains/POW. PMID:23336252

  2. Structure and DNA-Binding Sites of the SWI1 AT-rich Interaction Domain (ARID) Suggest Determinants for Sequence-Specific DNA Recognition

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kim, Suhkmann; Zhang, Ziming; Upchurch, Sean

    2004-04-16

    ARID is a homologous family of DNA-binding domains that occur in DNA binding proteins from a wide variety of species, ranging from yeast to nematodes, insects, mammals and plants. SWI1, a member of the SWI/SNF protein complex that is involved in chromatin remodeling during transcription, contains the ARID motif. The ARID domain of human SWI1 (also known as p270) does not select for a specific DNA sequence from a random sequence pool. The lack of sequence specificity shown by the SWI1 ARID domain stands in contrast to the other characterized ARID domains, which recognize specific AT-rich sequences. We have solved the three-dimensional structure of human SWI1 ARID using solution NMR methods. In addition, we have characterized non-specific DNA-binding by the SWI1 ARID domain. Results from this study indicate that a flexible long internal loop in the ARID motif is likely to be important for sequence specific DNA-recognition. The structure of the human SWI1 ARID domain also represents a distinct structural subfamily. Studies of ARID indicate that the boundary of the DNA binding structural and functional domains can extend beyond the sequence homologous region in a homologous family of proteins. Structural studies of homologous domains such as the ARID family of DNA-binding domains should provide information to better predict the boundary of structural and functional domains in structural genomic studies. Key Words: ARID, SWI1, NMR, structural genomics, protein-DNA interaction.

  3. Domain-specific learning of grammatical structure in musical and phonological sequences.

    PubMed

    Bly, Benjamin Martin; Carrión, Ricardo E; Rasch, Björn

    2009-01-01

    Artificial grammar learning depends on acquisition of abstract structural representations rather than domain-specific representational constraints, or so many studies tell us. Using an artificial grammar task, we compared learning performance in two stimulus domains in which respondents have differing tacit prior knowledge. We found that despite grammatically identical sequence structures, learning was better for harmonically related chord sequences than for letter name sequences or harmonically unrelated chord sequences. We also found transfer effects within the musical and letter name tasks, but not across the domains. We conclude that knowledge acquired in implicit learning depends not only on abstract features of structured stimuli, but that the learning of regularities is in some respects domain-specific and strongly linked to particular features of the stimulus domain.

  4. A fully automatic evolutionary classification of protein folds: Dali Domain Dictionary version 3

    PubMed Central

    Dietmann, Sabine; Park, Jong; Notredame, Cedric; Heger, Andreas; Lappe, Michael; Holm, Liisa

    2001-01-01

    The Dali Domain Dictionary (http://www.ebi.ac.uk/dali/domain) is a numerical taxonomy of all known structures in the Protein Data Bank (PDB). The taxonomy is derived fully automatically from measurements of structural, functional and sequence similarities. Here, we report the extension of the classification to match the traditional four hierarchical levels corresponding to: (i) supersecondary structural motifs (attractors in fold space), (ii) the topology of globular domains (fold types), (iii) remote homologues (functional families) and (iv) homologues with sequence identity above 25% (sequence families). The computational definitions of attractors and functional families are new. In September 2000, the Dali classification contained 10 531 PDB entries comprising 17 101 chains, which were partitioned into five attractor regions, 1375 fold types, 2582 functional families and 3724 domain sequence families. Sequence families were further associated with 99 582 unique homologous sequences in the HSSP database, which increases the number of effectively known structures several-fold. The resulting database contains the description of protein domain architecture, the definition of structural neighbours around each known structure, the definition of structurally conserved cores and a comprehensive library of explicit multiple alignments of distantly related protein families. PMID:11125048
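
    The fourth level above groups homologues whose sequence identity exceeds 25%. As a rough illustration of that kind of cutoff, the sketch below computes percent identity over two pre-aligned sequences and applies the threshold. The function names, the gap character and the choice to count only ungapped columns in the denominator are assumptions made for this example; the abstract does not say how the Dali classification computes identity.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap.

    Assumption: the denominator is the number of ungapped aligned columns.
    Other conventions (e.g. shorter-sequence length) give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def same_sequence_family(aligned_a: str, aligned_b: str,
                         threshold: float = 25.0) -> bool:
    """Hypothetical helper: True when identity is above the 25% cutoff."""
    return percent_identity(aligned_a, aligned_b) > threshold
```

    A pair that falls below the cutoff could still share a functional family or fold type at the higher levels of the hierarchy, which is why the sequence-family test is only the most fine-grained of the four levels.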

  5. The ASTRAL Compendium in 2004

    DOE R&D Accomplishments Database

    Chandonia, John-Marc; Hon, Gary; Walker, Nigel S.; Lo Conte, Loredana; Koehl, Patrice; Levitt, Michael; Brenner, Steven E.

    2003-09-15

    The ASTRAL compendium provides several databases and tools to aid in the analysis of protein structures, particularly through the use of their sequences. Partially derived from the SCOP database of protein structure domains, it includes sequences for each domain and other resources useful for studying these sequences and domain structures. The current release of ASTRAL contains 54,745 domains, more than three times as many as the initial release four years ago. ASTRAL has undergone major transformations in the past two years. In addition to several complete updates each year, ASTRAL is now updated on a weekly basis with preliminary classifications of domains from newly released PDB structures. These classifications are available as a stand-alone database, as well as available integrated into other ASTRAL databases such as representative subsets. To enhance the utility of ASTRAL to structural biologists, all SCOP domains are now made available as PDB-style coordinate files as well as sequences. In addition to sequences and representative subsets based on SCOP domains, sequences and subsets based on PDB chains are newly included in ASTRAL. Several search tools have been added to ASTRAL to facilitate retrieval of data by individual users and automated methods.

  6. Structural Studies of Geosmin Synthase, a Bifunctional Sesquiterpene Synthase with Alpha-Alpha Domain Architecture that Catalyzes a Unique Cyclization-Fragmentation Reaction Sequence

    PubMed Central

    Harris, Golda G.; Lombardi, Patrick M.; Pemberton, Travis A.; Matsui, Tsutomu; Weiss, Thomas M.; Cole, Kathryn E.; Köksal, Mustafa; Murphy, Frank V.; Vedula, L. Sangeetha; Chou, Wayne K.W.; Cane, David E.; Christianson, David W.

    2015-01-01

    Geosmin synthase from Streptomyces coelicolor (ScGS) catalyzes an unusual, metal-dependent terpenoid cyclization and fragmentation reaction sequence. Two distinct active sites are required for catalysis: the N-terminal domain catalyzes the ionization and cyclization of farnesyl diphosphate to form germacradienol and inorganic pyrophosphate (PPi), and the C-terminal domain catalyzes the protonation, cyclization, and fragmentation of germacradienol to form geosmin and acetone through a retro-Prins reaction. A unique αα domain architecture is predicted for ScGS based on amino acid sequence: each domain contains the metal-binding motifs typical of a class I terpenoid cyclase, and each domain requires Mg2+ for catalysis. Here, we report the X-ray crystal structure of the unliganded N-terminal domain of ScGS and the structure of its complex with 3 Mg2+ ions and alendronate. These structures highlight conformational changes required for active site closure and catalysis. Although neither full-length ScGS nor constructs of the C-terminal domain could be crystallized, homology models of the C-terminal domain were constructed based on ~36% sequence identity with the N-terminal domain. Small-angle X-ray scattering experiments yield low resolution molecular envelopes into which the N-terminal domain crystal structure and the C-terminal domain homology model were fit, suggesting possible αα domain architectures as frameworks for bifunctional catalysis. PMID:26598179

  7. PAT: predictor for structured units and its application for the optimization of target molecules for the generation of synthetic antibodies.

    PubMed

    Jeon, Jouhyun; Arnold, Roland; Singh, Fateh; Teyra, Joan; Braun, Tatjana; Kim, Philip M

    2016-04-01

    The identification of structured units in a protein sequence is an important first step for most biochemical studies. Importantly for this study, the identification of stable structured regions is a crucial first step to generate novel synthetic antibodies. While many approaches to find domains or predict structured regions exist, important limitations remain, such as the optimization of domain boundaries and the lack of identification of non-domain structured units. Moreover, no integrated tool exists to find and optimize structural domains within protein sequences. Here, we describe a new tool, PAT ( http://www.kimlab.org/software/pat ), that can efficiently identify both domains (with optimized boundaries) and non-domain putative structured units. PAT automatically analyzes various structural properties, evaluates the folding stability, and reports possible structural domains in a given protein sequence. For reliability evaluation of PAT, we applied PAT to identify antibody target molecules based on the notion that soluble and well-defined protein secondary and tertiary structures are appropriate target molecules for synthetic antibodies. PAT is an efficient and sensitive tool to identify structured units. A performance analysis shows that PAT can characterize structurally well-defined regions in a given sequence and outperforms other efforts to define reliable boundaries of domains. Specifically, PAT successfully identifies experimentally confirmed target molecules for antibody generation. PAT also offers the pre-calculated results of 20,210 human proteins to accelerate common queries. PAT can therefore help to investigate large-scale structured domains and improve the success rate for synthetic antibody generation.

  8. LenVarDB: database of length-variant protein domains.

    PubMed

    Mutt, Eshita; Mathew, Oommen K; Sowdhamini, Ramanathan

    2014-01-01

    Protein domains are functionally and structurally independent modules, which add to the functional variety of proteins. This array of functional diversity has been enabled by evolutionary changes, such as amino acid substitutions or insertions or deletions, occurring in these protein domains. Length variations (indels) can introduce changes at structural, functional and interaction levels. LenVarDB (freely available at http://caps.ncbs.res.in/lenvardb/) traces these length variations, starting from structure-based sequence alignments in our Protein Alignments organized as Structural Superfamilies (PASS2) database, across 731 structural classification of proteins (SCOP)-based protein domain superfamilies connected to 2 730 625 sequence homologues. Alignment of sequence homologues corresponding to a structural domain is available, starting from a structure-based sequence alignment of the superfamily. Orientation of the length-variant (indel) regions in protein domains can be visualized by mapping them on the structure and on the alignment. Knowledge about location of length variations within protein domains and their visual representation will be useful in predicting changes within structurally or functionally relevant sites, which may ultimately regulate protein function. Non-technical summary: Evolutionary changes bring about natural changes to proteins that may be found in many organisms. Such changes could be reflected as amino acid substitutions or insertions-deletions (indels) in protein sequences. LenVarDB is a database that provides an early overview of observed length variations that were set among 731 protein families and after examining >2 million sequences. Indels are followed up to observe if they are close to the active site such that they can affect the activity of proteins. Inclusion of such information can aid the design of bioengineering experiments.

  9. Exploring the limits of sequence and structure in a variant βγ-crystallin domain of the protein absent in melanoma-1 (AIM1)

    PubMed Central

    Aravind, Penmatsa; Wistow, Graeme; Sharma, Yogendra; Sankaranarayanan, Rajan

    2008-01-01

    βγ-Crystallins belong to a superfamily of proteins in prokaryotes and eukaryotes that are based on duplications of a characteristic, highly conserved Greek Key motif. Most members of the superfamily in vertebrates are structural proteins of the eye lens that contain four motifs arranged as two structural domains. Absent in melanoma-1 (AIM1), an unusual member of the superfamily whose expression is associated with suppression of malignancy in melanoma, contains 12 βγ-crystallin motifs in six domains. Some of these motifs diverge considerably from the canonical motif sequence. AIM1g1, the first βγ-crystallin domain of AIM1, is the most variant of βγ-crystallin domains currently known. In order to understand the limits of sequence variation on the structure, we report the crystal structure of AIM1g1 at 1.9Å resolution. In spite of having changes in key residues, the domain retains the overall βγ-crystallin fold. The domain also contains an unusual extended surface loop that significantly alters the shape of the domain and its charge profile. This structure illustrates the resilience of the βγ fold to considerable sequence changes and its remarkable ability to adapt for novel functions. PMID:18582473

  10. Structural diversity of domain superfamilies in the CATH database.

    PubMed

    Reeves, Gabrielle A; Dallman, Timothy J; Redfern, Oliver C; Akpor, Adrian; Orengo, Christine A

    2006-07-14

    The CATH database of domain structures has been used to explore the structural variation of homologous domains in 294 well populated domain structure superfamilies, each containing at least three sequence diverse relatives. Our analyses confirm some previously detected trends relating sequence divergence to structural variation but for a much larger dataset and in some superfamilies the new data reveal exceptional structural variation. Use of a new algorithm (2DSEC) to analyse variability in secondary structure compositions across a superfamily sheds new light on how structures evolve. 2DSEC detects inserted secondary structures that embellish the core of conserved secondary structures found throughout the superfamily. Analysis showed that for 56% of highly populated superfamilies (>9 sequence diverse relatives), there are twofold or more increases in the numbers of secondary structures in some relatives. In some families fivefold increases occur, sometimes modifying the fold of the domain. Manual inspection of secondary structure insertions or embellishments in 48 particularly variable superfamilies revealed that although these insertions were usually discontiguous in the sequence they were often co-located in 3D resulting in a larger structural motif that often modified the geometry of the active site or the surface conformation promoting diverse domain partnerships and protein interactions. These observations, supported by automatic analysis of all well populated CATH families, suggest that accretion of small secondary structure insertions may provide a simple mechanism for evolving new functions in diverse relatives. Some layered domain architectures (e.g. mainly-beta and alpha-beta sandwiches) that recur highly in the genomes more frequently exploit these types of embellishments to modify function. In these architectures, aggregation occurs most often at the edges, top or bottom of the beta-sheets. 
Information on structural variability across domain superfamilies has been made available through the CATH Dictionary of Homologous Structures (DHS).

  11. Automatic prediction of protein domains from sequence information using a hybrid learning system.

    PubMed

    Nagarajan, Niranjan; Yona, Golan

    2004-06-12

    We describe a novel method for detecting the domain structure of a protein from sequence information alone. The method is based on analyzing multiple sequence alignments that are derived from a database search. Multiple measures are defined to quantify the domain information content of each position along the sequence and are combined into a single predictor using a neural network. The output is further smoothed and post-processed using a probabilistic model to predict the most likely transition positions between domains. The method was assessed using the domain definitions in SCOP and CATH for proteins of known structure and was compared with several other existing methods. Our method performs well both in terms of accuracy and sensitivity. It improves significantly over the best methods available, even some of the semi-manual ones, while being fully automatic. Our method can also be used to suggest and verify domain partitions based on structural data. A few examples of predicted domain definitions and alternative partitions, as suggested by our method, are also discussed. An online domain-prediction server is available at http://biozon.org/tools/domains/
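
    The post-processing step described above (smooth a per-position score, then pick likely transition positions between domains) can be sketched in a simplified form. The example below is not the authors' method: it substitutes a plain moving average and threshold-based peak picking for their probabilistic model, and the function names, window size and threshold are all hypothetical. The input is assumed to be one boundary score per residue, such as a neural network might output.

```python
from typing import List, Sequence


def moving_average(scores: Sequence[float], window: int = 3) -> List[float]:
    """Smooth scores with a centered window, truncated at the sequence ends."""
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


def boundary_candidates(scores: Sequence[float], window: int = 3,
                        threshold: float = 0.5) -> List[int]:
    """Return 0-based positions that are local maxima of the smoothed score.

    Assumption: a position is a candidate domain boundary when its smoothed
    score is at least `threshold`, strictly above its left neighbour and not
    below its right neighbour. End positions are never reported.
    """
    smooth = moving_average(scores, window)
    peaks = []
    for i in range(1, len(smooth) - 1):
        if (smooth[i] >= threshold
                and smooth[i] > smooth[i - 1]
                and smooth[i] >= smooth[i + 1]):
            peaks.append(i)
    return peaks
```

    Smoothing before peak picking suppresses isolated high-scoring positions, so a boundary is reported only where several neighbouring residues agree. A probabilistic model, as used in the paper, could additionally weigh the plausibility of the resulting domain lengths.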

  12. Combining protein sequence, structure, and dynamics: A novel approach for functional evolution analysis of PAS domain superfamily.

    PubMed

    Dong, Zheng; Zhou, Hongyu; Tao, Peng

    2018-02-01

    PAS domains are widespread in archaea, bacteria, and eukaryota, and play important roles in various functions. In this study, we aim to explore functional evolutionary relationship among proteins in the PAS domain superfamily in view of the sequence-structure-dynamics-function relationship. We collected protein sequences and crystal structure data from RCSB Protein Data Bank of the PAS domain superfamily belonging to three biological functions (nucleotide binding, photoreceptor activity, and transferase activity). Protein sequences were aligned and then used to select sequence-conserved residues and build phylogenetic tree. Three-dimensional structure alignment was also applied to obtain structure-conserved residues. The protein dynamics were analyzed using elastic network model (ENM) and validated by molecular dynamics (MD) simulation. The result showed that the proteins with same function could be grouped by sequence similarity, and proteins in different functional groups displayed statistically significant difference in their vibrational patterns. Interestingly, in all three functional groups, conserved amino acid residues identified by sequence and structure conservation analysis generally have a lower fluctuation than other residues. In addition, the fluctuation of conserved residues in each biological function group was strongly correlated with the corresponding biological function. This research suggested a direct connection in which the protein sequences were related to various functions through structural dynamics. This is a new attempt to delineate functional evolution of proteins using the integrated information of sequence, structure, and dynamics. © 2017 The Protein Society.

  13. Structural genomics reveals EVE as a new ASCH/PUA-related domain

    PubMed Central

    Bertonati, Claudia; Punta, Marco; Fischer, Markus; Yachdav, Guy; Forouhar, Farhad; Zhou, Weihong; Kuzin, Alexander P.; Seetharaman, Jayaraman; Abashidze, Mariam; Ramelot, Theresa A.; Kennedy, Michael A.; Cort, John R.; Belachew, Adam; Hunt, John F.; Tong, Liang; Montelione, Gaetano T.; Rost, Burkhard

    2014-01-01

    Summary We report on several proteins recently solved by structural genomics consortia, in particular by the Northeast Structural Genomics consortium (NESG). The proteins considered in this study differ substantially in their sequences but they share a similar structural core, characterized by a pseudobarrel five-stranded beta sheet. This core corresponds to the PUA domain-like architecture in the SCOP database. By connecting sequence information with structural knowledge, we characterize a new subgroup of these proteins that we propose to be distinctly different from previously described PUA domain-like domains such as PUA proper or ASCH. We refer to these newly defined domains as EVE. Although EVE may have retained the ability of PUA domains to bind RNA, the available experimental and computational data suggests that both the details of its molecular function and its cellular function differ from those of other PUA domain-like domains. This study of EVE and its relatives illustrates how the combination of structure and genomics creates new insights by connecting a cornucopia of structures that map to the same evolutionary potential. Primary sequence information alone would have not been sufficient to reveal these evolutionary links. PMID:19191354

  14. Structural Genomics Reveals EVE as a New ASCH/PUA-Related Domain

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bertonati, C.; Punta, M; Fischer, M

    2008-01-01

    We report on several proteins recently solved by structural genomics consortia, in particular by the Northeast Structural Genomics consortium (NESG). The proteins considered in this study differ substantially in their sequences but they share a similar structural core, characterized by a pseudobarrel five-stranded beta sheet. This core corresponds to the PUA domain-like architecture in the SCOP database. By connecting sequence information with structural knowledge, we characterize a new subgroup of these proteins that we propose to be distinctly different from previously described PUA domain-like domains such as PUA proper or ASCH. We refer to these newly defined domains as EVE. Although EVE may have retained the ability of PUA domains to bind RNA, the available experimental and computational data suggests that both the details of its molecular function and its cellular function differ from those of other PUA domain-like domains. This study of EVE and its relatives illustrates how the combination of structure and genomics creates new insights by connecting a cornucopia of structures that map to the same evolutionary potential. Primary sequence information alone would have not been sufficient to reveal these evolutionary links.

  15. Structure and stability of the ankyrin domain of the Drosophila Notch receptor.

    PubMed

    Zweifel, Mark E; Leahy, Daniel J; Hughson, Frederick M; Barrick, Doug

    2003-11-01

    The Notch receptor contains a conserved ankyrin repeat domain that is required for Notch-mediated signal transduction. The ankyrin domain of Drosophila Notch contains six ankyrin sequence repeats previously identified as closely matching the ankyrin repeat consensus sequence, and a putative seventh C-terminal sequence repeat that exhibits lower similarity to the consensus sequence. To better understand the role of the Notch ankyrin domain in Notch-mediated signaling and to examine how structure is distributed among the seven ankyrin sequence repeats, we have determined the crystal structure of this domain to 2.0 angstroms resolution. The seventh, C-terminal, ankyrin sequence repeat adopts a regular ankyrin fold, but the first, N-terminal ankyrin repeat, which contains a 15-residue insertion, appears to be largely disordered. The structure reveals a substantial interface between ankyrin polypeptides, showing a high degree of shape and charge complementarity, which may be related to homotypic interactions suggested from indirect studies. However, the Notch ankyrin domain remains largely monomeric in solution, demonstrating that this interface alone is not sufficient to promote tight association. Using the structure, we have classified reported mutations within the Notch ankyrin domain that are known to disrupt signaling into those that affect buried residues and those restricted to surface residues. We show that the buried substitutions greatly decrease protein stability, whereas the surface substitutions have only a marginal effect on stability. The surface substitutions are thus likely to interfere with Notch signaling by disrupting specific Notch-effector interactions and map the sites of these interactions.

  16. Sequence and Secondary Structure of the Mitochondrial Small-Subunit rRNA V4, V6, and V9 Domains Reveal Highly Species-Specific Variations within the Genus Agrocybe

    PubMed Central

    Gonzalez, Patrice; Labarère, Jacques

    1998-01-01

    A comparative study of variable domains V4, V6, and V9 of the mitochondrial small-subunit (SSU) rRNA was carried out with the genus Agrocybe by PCR amplification of 42 wild isolates belonging to 10 species, Agrocybe aegerita, Agrocybe dura, Agrocybe chaxingu, Agrocybe erebia, Agrocybe firma, Agrocybe praecox, Agrocybe paludosa, Agrocybe pediades, Agrocybe alnetorum, and Agrocybe vervacti. Sequencing of the PCR products showed that the three domains in the isolates belonging to the same species were the same length and had the same sequence, while variations were found among the 10 species. Alignment of the sequences showed that nucleotide motifs encountered in the smallest sequence of each variable domain were also found in the largest sequence, indicating that the sequences evolved by insertion-deletion events. Determination of the secondary structure of each domain revealed that the insertion-deletion events commonly occurred in regions not directly involved in the secondary structure (i.e., the loops). Moreover, conserved sequences ranging from 4 to 25 nucleotides long were found at the beginning and end of each domain and could constitute genus-specific sequences. Comparisons of the V4, V6, and V9 secondary structures resulted in identification of the following four groups: (i) group I, which was characterized by the presence of additional P23-1 and P23-3 helices in the V4 domain and the lack of the P49-1 helix in V9 and included A. aegerita, A. chaxingu, and A. erebia; (ii) group II, which had the P23-3 helix in V4 and the P49-1 helix in V9 and included A. pediades; (iii) group III, which did not have additional helices in V4, had the P49-1 helix in V9 and included A. paludosa, A. firma, A. alnetorum, and A. praecox; and (iv) group IV, which lacked both the V4 additional helices and the P49-1 helix in V9 and included A. vervacti and A. dura. This grouping of species was supported by the structure of a consensus tree based on the variable domain sequences. 
The conservation of the sequences of the V4, V6, and V9 domains of the mitochondrial SSU rRNA within species and the high degree of interspecific variation found in the Agrocybe species studied open the way for these sequences to be used as specific molecular markers of the Basidiomycota. PMID:9797259

  17. Sequence and secondary structure of the mitochondrial small-subunit rRNA V4, V6, and V9 domains reveal highly species-specific variations within the genus Agrocybe.

    PubMed

    Gonzalez, P; Labarère, J

    1998-11-01

    A comparative study of variable domains V4, V6, and V9 of the mitochondrial small-subunit (SSU) rRNA was carried out with the genus Agrocybe by PCR amplification of 42 wild isolates belonging to 10 species, Agrocybe aegerita, Agrocybe dura, Agrocybe chaxingu, Agrocybe erebia, Agrocybe firma, Agrocybe praecox, Agrocybe paludosa, Agrocybe pediades, Agrocybe alnetorum, and Agrocybe vervacti. Sequencing of the PCR products showed that the three domains in the isolates belonging to the same species were the same length and had the same sequence, while variations were found among the 10 species. Alignment of the sequences showed that nucleotide motifs encountered in the smallest sequence of each variable domain were also found in the largest sequence, indicating that the sequences evolved by insertion-deletion events. Determination of the secondary structure of each domain revealed that the insertion-deletion events commonly occurred in regions not directly involved in the secondary structure (i.e., the loops). Moreover, conserved sequences ranging from 4 to 25 nucleotides long were found at the beginning and end of each domain and could constitute genus-specific sequences. Comparisons of the V4, V6, and V9 secondary structures resulted in identification of the following four groups: (i) group I, which was characterized by the presence of additional P23-1 and P23-3 helices in the V4 domain and the lack of the P49-1 helix in V9 and included A. aegerita, A. chaxingu, and A. erebia; (ii) group II, which had the P23-3 helix in V4 and the P49-1 helix in V9 and included A. pediades; (iii) group III, which did not have additional helices in V4, had the P49-1 helix in V9 and included A. paludosa, A. firma, A. alnetorum, and A. praecox; and (iv) group IV, which lacked both the V4 additional helices and the P49-1 helix in V9 and included A. vervacti and A. dura. This grouping of species was supported by the structure of a consensus tree based on the variable domain sequences. 
The conservation of the sequences of the V4, V6, and V9 domains of the mitochondrial SSU rRNA within species and the high degree of interspecific variation found in the Agrocybe species studied open the way for these sequences to be used as specific molecular markers of the Basidiomycota.

  18. Effect of the SH3-SH2 domain linker sequence on the structure of Hck kinase.

    PubMed

    Meiselbach, Heike; Sticht, Heinrich

    2011-08-01

    The coordination of activity in biological systems requires the existence of different signal transduction pathways that interact with one another and must be precisely regulated. The Src-family tyrosine kinases, which are found in many signaling pathways, differ in their physiological function despite their high overall structural similarity. In this context, the differences in the SH3-SH2 domain linkers might play a role for differential regulation, but the structural consequences of linker sequence remain poorly understood. We have therefore performed comparative molecular dynamics simulations of wildtype Hck and of a mutant Hck in which the SH3-SH2 domain linker is replaced by the corresponding sequence from the homologous kinase Lck. These simulations reveal that linker replacement not only affects the orientation of the SH3 domain itself, but also leads to an alternative conformation of the activation segment in the Hck kinase domain. The sequence of the SH3-SH2 domain linker thus exerts a remote effect on the active site geometry and might therefore play a role in modulating the structure of the inactive kinase or in fine-tuning the activation process itself.

  19. Nature of the protein universe

    PubMed Central

    Levitt, Michael

    2009-01-01

    The protein universe is the set of all proteins of all organisms. Here, all currently known sequences are analyzed in terms of families that have single-domain or multidomain architectures and whether they have a known three-dimensional structure. Growth of new single-domain families is very slow: Almost all growth comes from new multidomain architectures that are combinations of domains characterized by ≈15,000 sequence profiles. Single-domain families are mostly shared by the major groups of organisms, whereas multidomain architectures are specific and account for species diversity. There are known structures for a quarter of the single-domain families, and >70% of all sequences can be partially modeled thanks to their membership in these families. PMID:19541617

  20. Analysis of sequence repeats of proteins in the PDB.

    PubMed

    Mary Rajathei, David; Selvaraj, Samuel

    2013-12-01

    Internal repeats in protein sequences play a significant role in the evolution of protein structure and function. Applications of different bioinformatics tools help in the identification and characterization of these repeats. In the present study, we analyzed sequence repeats in a non-redundant set of proteins available in the Protein Data Bank (PDB). We used RADAR for detecting internal repeats in a protein, PDBeFOLD for assessing structural similarity, PDBsum for finding functional involvement and Pfam for domain assignment of the repeats in a protein. Through the analysis of sequence repeats, we found that the identity of the sequence repeats falls in the range of 20-40% and that the superimposed structures of most of the sequence repeats maintain a similar overall fold. Analysis of sequence repeats at the functional level reveals that most of the sequence repeats are involved in the function of the protein through functionally involved residues in the repeat regions. We also found that sequence repeats in single and two domain proteins often contained conserved sequence motifs for the function of the domain. Copyright © 2013 Elsevier Ltd. All rights reserved.

  1. The primary structure of stinging nettle (Urtica dioica) agglutinin. A two-domain member of the hevein family.

    PubMed

    Beintema, J J; Peumans, W J

    1992-03-09

    The primary structure of stinging nettle (Urtica dioica) agglutinin has been determined by sequence analysis of peptides obtained from three overlapping proteolytic digests. The sequence of 80 residues consists of two hevein-like domains with the same spacing of half-cystine residues and several other conserved residues as observed earlier in other proteins with hevein-like domains. The hinge region between the two domains is four residues longer than those between the four domains in cereal lectins like wheat germ agglutinin.

  2. The sequence, structure and evolutionary features of HOTAIR in mammals

    PubMed Central

    2011-01-01

    Background An increasing number of long noncoding RNAs (lncRNAs) have been identified recently. Different from all the others that function in cis to regulate local gene expression, the newly identified HOTAIR is located between HoxC11 and HoxC12 in the human genome and regulates HoxD expression in multiple tissues. Like the well-characterised lncRNA Xist, HOTAIR binds to polycomb proteins to methylate histones at multiple HoxD loci, but unlike Xist, many details of its structure and function, as well as the trans regulation, remain unclear. Moreover, HOTAIR is involved in the aberrant regulation of gene expression in cancer. Results To identify conserved domains in HOTAIR and study the phylogenetic distribution of this lncRNA, we searched the genomes of 10 mammalian and 3 non-mammalian vertebrates for matches to its 6 exons and the two conserved domains within the 1800 bp exon6 using Infernal. There was just one high-scoring hit for each mammal, but many low-scoring hits were found in both mammals and non-mammalian vertebrates. These hits and their flanking genes in four placental mammals and platypus were examined to determine whether HOTAIR contained elements shared by other lncRNAs. Several of the hits were within unknown transcripts or ncRNAs, many were within introns of, or antisense to, protein-coding genes, and conservation of the flanking genes was observed only between human and chimpanzee. Phylogenetic analysis revealed discrete evolutionary dynamics for orthologous sequences of HOTAIR exons. Exon1 at the 5' end and a domain in exon6 near the 3' end, which contain domains that bind to multiple proteins, have evolved faster in primates than in other mammals. Structures were predicted for exon1, two domains of exon6 and the full HOTAIR sequence. 
The sequence and structure of two fragments, in exon1 and the domain B of exon6 respectively, were identified to robustly occur in predicted structures of exon1, domain B of exon6 and the full HOTAIR in mammals. Conclusions HOTAIR exists in mammals, has poorly conserved sequences and considerably conserved structures, and has evolved faster than nearby HoxC genes. Exons of HOTAIR show distinct evolutionary features, and a 239 bp domain in the 1804 bp exon6 is especially conserved. These features, together with the absence of some exons and sequences in mouse, rat and kangaroo, suggest ab initio generation of HOTAIR in marsupials. Structure prediction identifies two fragments in the 5' end exon1 and the 3' end domain B of exon6, with sequence and structure invariably occurring in various predicted structures of exon1, the domain B of exon6 and the full HOTAIR. PMID:21496275

  3. Structure of a Burkholderia pseudomallei Trimeric Autotransporter Adhesin Head

    PubMed Central

    Edwards, Thomas E.; Phan, Isabelle; Abendroth, Jan; Dieterich, Shellie H.; Masoudi, Amir; Guo, Wenjin; Hewitt, Stephen N.; Kelley, Angela; Leibly, David; Brittnacher, Mitch J.; Staker, Bart L.; Miller, Samuel I.; Van Voorhis, Wesley C.; Myler, Peter J.; Stewart, Lance J.

    2010-01-01

    Background Pathogenic bacteria adhere to the host cell surface using a family of outer membrane proteins called Trimeric Autotransporter Adhesins (TAAs). Although TAAs are highly divergent in sequence and domain structure, they are all conceptually comprised of a C-terminal membrane anchoring domain and an N-terminal passenger domain. Passenger domains consist of a secretion sequence, a head region that facilitates binding to the host cell surface, and a stalk region. Methodology/Principal Findings Pathogenic species of Burkholderia contain an overabundance of TAAs, some of which have been shown to elicit an immune response in the host. To understand the structural basis for host cell adhesion, we solved a 1.35 Å resolution crystal structure of a BpaA TAA head domain from Burkholderia pseudomallei, the pathogen that causes melioidosis. The structure reveals a novel fold of an intricately intertwined trimer. The BpaA head is composed of structural elements that have been observed in other TAA head structures as well as several elements of previously unknown structure predicted from low sequence homology between TAAs. These elements are typically up to 40 amino acids long and are not domains, but rather modular structural elements that may be duplicated or omitted through evolution, creating molecular diversity among TAAs. Conclusions/Significance The modular nature of BpaA, as demonstrated by its head domain crystal structure, and of TAAs in general provides insights into evolution of pathogen-host adhesion and may provide an avenue for diagnostics. PMID:20862217

  4. Domain atrophy creates rare cases of functional partial protein domains.

    PubMed

    Prakash, Ananth; Bateman, Alex

    2015-04-30

    Protein domains display a range of structural diversity, with numerous additions and deletions of secondary structural elements between related domains. We have observed a small number of cases of surprising large-scale deletions of core elements of structural domains. We propose a new concept called domain atrophy, where protein domains lose a significant number of core structural elements. Here, we implement a new pipeline to systematically identify new cases of domain atrophy across all known protein sequences. The output of this pipeline was carefully checked by hand, which filtered out partial domain instances that were unlikely to represent true domain atrophy due to misannotations or un-annotated sequence fragments. We identify 75 cases of domain atrophy, of which eight cases are found in a three-dimensional protein structure and 67 cases have been inferred based on mapping to a known homologous structure. Domains with structural variations include ancient folds such as the TIM-barrel and Rossmann folds. Most of these domains are observed to show structural loss that does not affect their functional sites. Our analysis has significantly increased the known cases of domain atrophy. We discuss specific instances of domain atrophy and see that there has often been a compensatory mechanism that helps to maintain the stability of the partial domain. Our study indicates that although domain atrophy is an extremely rare phenomenon, protein domains under certain circumstances can tolerate extreme mutations giving rise to partial, but functional, domains.

  5. X-ray crystal structure of the passenger domain of plasmid encoded toxin (Pet), an autotransporter enterotoxin from enteroaggregative Escherichia coli (EAEC)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Domingo Meza-Aguilar, J.; Laboratorio de Patogenicidad Bacteriana, Unidad de Hemato Oncología e Investigación, Hospital Infantil de México Federico Gómez 06720, D.F.; Fromme, Petra

    Highlights: • X-ray crystal structure of the passenger domain of Plasmid encoded toxin at 2.3 Å. • Structural differences between Pet passenger domain and EspP protein are described. • High flexibility of the C-terminal beta helix is structurally assigned. - Abstract: Autotransporters (ATs) represent a superfamily of proteins produced by a variety of pathogenic bacteria, which include the pathogenic groups of Escherichia coli (E. coli) associated with gastrointestinal and urinary tract infections. We present the first X-ray structure, at 2.3 Å resolution, of the passenger domain from the Plasmid-encoded toxin (Pet), a 100 kDa protein which is a cause of acute diarrhea in both developing and industrialized countries. Pet is a cytoskeleton-altering toxin that induces loss of actin stress fibers. While Pet (pdb code: 4OM9) shows a sequence identity of only 50% compared to the closest related protein sequence, extracellular serine protease plasmid (EspP), the structural features of both proteins are conserved. A closer structural look reveals that Pet contains a β-pleated sheet at the sequence region of residues 181–190, whereas the corresponding structural domain in EspP consists of a coiled loop. Second, the Pet passenger domain features a more pronounced beta sheet between residues 135 and 143 compared to the structure of EspP.

  6. Identification and Structural Characterization of the ALIX-Binding Late Domains of Simian Immunodeficiency Virus SIV mac239 and SIV agmTan-1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Q Zhai; M Landesman; H Robinson

    2011-12-31

    Retroviral Gag proteins contain short late-domain motifs that recruit cellular ESCRT pathway proteins to facilitate virus budding. ALIX-binding late domains often contain the core consensus sequence YPX{sub n}L (where X{sub n} can vary in sequence and length). However, some simian immunodeficiency virus (SIV) Gag proteins lack this consensus sequence, yet still bind ALIX. We mapped divergent, ALIX-binding late domains within the p6{sup Gag} proteins of SIV{sub MAC239} ({sub 40}SREK{und P}YKE{und VT}ED{und L}LHLNSLF{sub 59}) and SIV{sub agmTan-1} ({sub 24}AAG{und A}YDP{und AR}KL{und L}EQYAKK{sub 41}). Crystal structures revealed that anchoring tyrosines (in lightface) and nearby hydrophobic residues (underlined) contact the ALIX V domain, revealing how lentiviruses employ a diverse family of late-domain sequences to bind ALIX and promote virus budding.

  7. Identification and Structural Characterization of the ALIX-Binding Late Domains of Simian Immunodeficiency Virus SIVmac239 and SIVagmTan-1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhai, Q.; Robinson, H.; Landesman, M. B.

    2011-01-01

    Retroviral Gag proteins contain short late-domain motifs that recruit cellular ESCRT pathway proteins to facilitate virus budding. ALIX-binding late domains often contain the core consensus sequence YPX{sub n}L (where X{sub n} can vary in sequence and length). However, some simian immunodeficiency virus (SIV) Gag proteins lack this consensus sequence, yet still bind ALIX. We mapped divergent, ALIX-binding late domains within the p6{sup Gag} proteins of SIV{sub mac239} ({sub 40}SREK{und P}YKE{und VT}ED{und L}LHLNSLF{sub 59}) and SIV{sub agmTan-1} ({sub 24}AAG{und A}YDP{und AR}KL{und L}EQYAKK{sub 41}). Crystal structures revealed that anchoring tyrosines (in lightface) and nearby hydrophobic residues (underlined) contact the ALIX V domain, revealing how lentiviruses employ a diverse family of late-domain sequences to bind ALIX and promote virus budding.
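
    The consensus test described in the two records above can be sketched as a simple pattern match. This is only an illustration, not the authors' method: the pattern assumes that X{sub n} means one to four arbitrary residues (the abstract says only that it varies in sequence and length), the two SIV sequences are the ones quoted in the abstract with the markup removed, and TOY_SEQ is a hypothetical sequence invented for the example.

```python
import re

# Assumption: X_n is modelled as 1 to 4 arbitrary residues. The abstract does
# not give a length range, so this bound is illustrative only.
YPXNL = re.compile(r"YP.{1,4}L")


def has_consensus(seq: str) -> bool:
    """Return True if the sequence contains a YPX_nL-like motif."""
    return YPXNL.search(seq) is not None


# Late-domain sequences quoted in the abstract, with the markup stripped.
SIV_MAC239 = "SREKPYKEVTEDLLHLNSLF"
SIV_AGMTAN1 = "AAGAYDPARKLLEQYAKK"

# Hypothetical sequence carrying the motif; not taken from any record here.
TOY_SEQ = "QSRLYPLTSLRS"

for name, seq in [("SIVmac239", SIV_MAC239),
                  ("SIVagmTan-1", SIV_AGMTAN1),
                  ("toy", TOY_SEQ)]:
    print(name, has_consensus(seq))
```

    Under this assumption neither SIV sequence matches, which is consistent with the abstract's statement that these late domains lack the consensus yet still bind ALIX.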

  8. Kinetoplast DNA minicircles of phloem-restricted Phytomonas associated with wilt diseases of coconut and oil palms have a two-domain structure.

    PubMed

    Dollet, M; Sturm, N R; Ahomadegbe, J C; Campbell, D A

    2001-11-27

    We report the cloning and sequencing of the first minicircle from a phloem-restricted, pathogenic Phytomonas sp. (Hart 1) isolated from a coconut palm with hartrot disease. The minicircle possessed a two-domain structure of two conserved regions, each containing three conserved sequence blocks (CSB). Based on the sequence around CSB 3 from Hart 1, PCR primers were designed to allow specific amplification of Phytomonas minicircles. This primer pair demonstrated specificity for at least six groups of plant trypanosomatids and did not amplify from insect trypanosomatids. The PCR results were consistent with a two-domain structure for other plant trypanosomatids.

  9. Late Cenozoic sedimentation and volcanism during transtensional deformation in Wingate Wash and the Owlshead Mountains, Death Valley

    USGS Publications Warehouse

    Luckow, H.G.; Pavlis, T.L.; Serpa, L.F.; Guest, B.; Wagner, D.L.; Snee, L.; Hensley, T.M.; Korjenkov, A.

    2005-01-01

    New 1:24,000 scale mapping, geochemical analyses of volcanic rocks, and Ar/Ar and tephrochronology analyses of the Wingate Wash, northern Owlshead Mountain and Southern Panamint Mountain region document a complex structural history constrained by syntectonic volcanism and sedimentation. In this study, the region is divided into five structural domains with distinct, but related, histories: (1) The southern Panamint domain is a structurally intact, gently south-tilted block dominated by a middle Miocene volcanic center recognized as localized hypabyssal intrusives surrounded by proximal facies pyroclastic rocks. This Miocene volcanic sequence is an unusual alkaline volcanic assemblage ranging from trachybasalt to rhyolite, but dominated by trachyandesite. The volcanic rocks are overlain in the southwestern Panamint Mountains by a younger (Late Miocene?) fanglomerate sequence. (2) An upper Wingate Wash domain is characterized by large areas of Quaternary cover and complex overprinting of older structure by Quaternary deformation. Quaternary structures record ~N-S shortening concurrent with ~E-W extension accommodated by systems of strike-slip and thrust faults. (3) A central Wingate Wash domain contains a complex structural history that is closely tied to the stratigraphic evolution. In this domain, a middle Miocene volcanic package contains two distinct assemblages; a lower sequence dominated by alkaline pyroclastic rocks similar to the southern Panamint sequence and an upper basaltic sequence of alkaline basalt and basanites. This volcanic sequence is in turn overlain by a coarse clastic sedimentary sequence that records the unroofing of adjacent ranges and development of ~N-S trending, west-tilted fault blocks. We refer to this sedimentary sequence as the Lost Lake assemblage.
(4) The lower Wingate Wash/northern Owlshead domain is characterized by a gently north-dipping stratigraphic sequence with an irregular unconformity at the base developed on granitic basement. The unconformity is locally overlain by channelized deposits of older Tertiary(?) red conglomerate, some of which predate the onset of extensive volcanism, but in most of the area is overlain by a moderately thick package of Middle Miocene trachybasalt, trachyandesitic, ash flows, lithic tuff, basaltic cinder, basanites, and dacitic pyroclastic, debris, and lahar flows with localized exposures of sedimentary rocks. The upper part of the Miocene stratigraphic sequence in this domain is comprised of coarse-grained clastic sediments that are apparently middle Miocene based on Ar/Ar dating of interbedded volcanic rocks. This sedimentary sequence, however, is lithologically indistinguishable from the structurally adjacent Late Miocene Lost Lake assemblage and a stratigraphically overlying Plio-Pleistocene alluvial fan; a relationship that handicaps tracing structures through this domain. This domain is also structurally complex and deformed by a series of northwest-southeast-striking, east-dipping, high-angle oblique, sinistral, normal faults that are cut by left-lateral strike-slip faults. The contact between the southern Panamint domain and the adjacent domains is a complex fault system that we interpret as a zone of Late Miocene distributed sinistral slip that is variably overprinted in different portions of the mapped area. The net sinistral slip across the Wingate Wash fault system is estimated at 7-9 km, based on offset of Proterozoic Crystal Springs Formation beneath the middle Miocene unconformity to as much as 15 km based on offset volcanic facies in Middle Miocene rocks.
To the south of Wingate Wash, the northern Owlshead Mountains are also cut by a sinistral, northwest-dipping, oblique normal fault, (referred to as the Filtonny Fault) with significant slip that separates the Lower Wingate Wash and central Owlshead domains. The Filtonny Fault may represent a young conjugate fault to the dextral Southern Death Valley fault system and may be the northwest

  10. Characterizing protein domain associations by Small-molecule ligand binding

    PubMed Central

    Li, Qingliang; Cheng, Tiejun; Wang, Yanli; Bryant, Stephen H.

    2012-01-01

    Background Protein domains are evolutionarily conserved building blocks for protein structure and function, which are conventionally identified based on protein sequence or structure similarity. Small molecule binding domains are of great importance for the recognition of small molecules in biological systems and drug development. Many small molecules, including drugs, have been increasingly identified to bind to multiple targets, leading to promiscuous interactions with protein domains. Thus, a large scale characterization of the protein domains and their associations with respect to small-molecule binding is of particular interest to system biology research, drug target identification, as well as drug repurposing. Methods We compiled a collection of 13,822 physical interactions of small molecules and protein domains derived from the Protein Data Bank (PDB) structures. Based on the chemical similarity of these small molecules, we characterized pairwise associations of the protein domains and further investigated their global associations from a network point of view. Results We found that protein domains, despite lack of similarity in sequence and structure, were comprehensively associated through binding the same or similar small-molecule ligands. Moreover, we identified modules in the domain network that consisted of closely related protein domains by sharing similar biochemical mechanisms, being involved in relevant biological pathways, or being regulated by the same cognate cofactors. Conclusions A novel protein domain relationship was identified in the context of small-molecule binding, which is complementary to those identified by traditional sequence-based or structure-based approaches. The protein domain network constructed in the present study provides a novel perspective for chemogenomic study and network pharmacology, as well as target identification for drug repurposing. PMID:23745168
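
    The idea of associating domains through the chemical similarity of their ligands can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not name a similarity measure, so the Tanimoto coefficient on set-style fingerprints is assumed here, and all domain names, ligand names, fingerprints and the threshold are hypothetical.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) coefficient of two fingerprint bit sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def domain_associations(ligands_by_domain, fingerprints, threshold):
    """Link two domains when their most similar ligand pair reaches threshold."""
    pairs = []
    for d1, d2 in combinations(sorted(ligands_by_domain), 2):
        best = max(
            (tanimoto(fingerprints[l1], fingerprints[l2])
             for l1 in ligands_by_domain[d1]
             for l2 in ligands_by_domain[d2]),
            default=0.0,
        )
        if best >= threshold:
            pairs.append((d1, d2, best))
    return pairs


# Hypothetical fingerprints: each ligand is a set of "on" feature indices.
FINGERPRINTS = {
    "lig1": frozenset({1, 2, 3, 4}),
    "lig2": frozenset({1, 2, 3, 5}),
    "lig3": frozenset({7, 8, 9}),
}
# Hypothetical domain-to-ligand binding data.
LIGANDS_BY_DOMAIN = {"domA": ["lig1"], "domB": ["lig2"], "domC": ["lig3"]}

ASSOCIATIONS = domain_associations(LIGANDS_BY_DOMAIN, FINGERPRINTS, 0.5)
print(ASSOCIATIONS)
```

    The resulting pairs can be read as the edges of a domain network; module detection on that network is outside the scope of this sketch.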

  11. Co-evolution of chitinases from maize and other cereals with secreted proteases from Pleosporineae fungi

    USDA-ARS's Scientific Manuscript database

    Plant class IV chitinases are composed of a carboxy-terminal chitinase domain that is attached, through a linker sequence, to a small amino-terminal domain that can be thought of as a structured peptide. While both the peptide-like domain and the chitinase domain share sequence homology throughout m...

  12. Structural analysis of key gap junction domains--Lessons from genome data and disease-linked mutants.

    PubMed

    Bai, Donglin

    2016-02-01

    A gap junction (GJ) channel is formed by docking of two GJ hemichannels and each of these hemichannels is a hexamer of connexins. All connexin genes have been identified in human, mouse, and rat genomes and their homologous genes in many other vertebrates are available in public databases. The protein sequences of these connexins align well with high sequence identity in the same connexin across different species. Domains in closely related connexins and several residues in all known connexins are also well-conserved. These conserved residues form signatures (also known as sequence logos) in these domains and are likely to play important biological functions. In this review, the sequence logos of individual connexins, groups of connexins with common ancestors, and all connexins are analyzed to visualize natural evolutionary variations and the hot spots for human disease-linked mutations. Several gap junction domains are homologous, likely forming similar structures essential for their function. The availability of a high resolution Cx26 GJ structure and the subsequently-derived homology structure models for other connexin GJ channels elevated our understanding of sequence logos at the three-dimensional GJ structure level, thus facilitating the understanding of how disease-linked connexin mutants might impair GJ structure and function. This knowledge will enable the design of complementary variants to rescue disease-linked mutants. Copyright © 2015 Elsevier Ltd. All rights reserved.

  13. Structural genomics analysis of uncharacterized protein families overrepresented in human gut bacteria identifies a novel glycoside hydrolase

    PubMed Central

    2014-01-01

    Background Bacteroides spp. form a significant part of our gut microbiome and are well known for optimized metabolism of diverse polysaccharides. Initial analysis of the archetypal Bacteroides thetaiotaomicron genome identified 172 glycosyl hydrolases and a large number of uncharacterized proteins associated with polysaccharide metabolism. Results BT_1012 from Bacteroides thetaiotaomicron VPI-5482 is a protein of unknown function and a member of a large protein family consisting entirely of uncharacterized proteins. Initial sequence analysis predicted that this protein has two domains, one at the N-terminus and one at the C-terminus. A PSI-BLAST search found over 150 full length and over 90 half size homologs consisting only of the N-terminal domain. The experimentally determined three-dimensional structure of the BT_1012 protein confirms its two-domain architecture, and structural analysis of both domains suggests their specific functions. The N-terminal domain is a putative catalytic domain with significant similarity to known glycoside hydrolases. The C-terminal domain has a beta-sandwich fold typically found in C-terminal domains of other glycosyl hydrolases; however, these domains are typically involved in substrate binding. We describe the structure of the BT_1012 protein and discuss its sequence-structure relationship and their possible functional implications. Conclusions Structural and sequence analyses of the BT_1012 protein identify it as a glycosyl hydrolase, expanding an already impressive catalog of enzymes involved in polysaccharide metabolism in Bacteroides spp. Based on this we have renamed the Pfam families representing the two domains found in the BT_1012 protein, PF13204 and PF12904, as putative glycoside hydrolase and glycoside hydrolase-associated C-terminal domain respectively. PMID:24742328

  14. X-Ray Crystal Structure of the passenger domain of Plasmid encoded toxin (Pet), an Autotransporter Enterotoxin from enteroaggregative Escherichia coli (EAEC)

    PubMed Central

    Meza-Aguilar, J. Domingo; Fromme, Petra; Torres-Larios, Alfredo; Mendoza-Hernández, Guillermo; Hernandez-Chiñas, Ulises; Monteros, Roberto A. Arreguin-Espinosa de los; Campos, Carlos A. Eslava; Fromme, Raimund

    2014-01-01

    Autotransporters (ATs) represent a superfamily of proteins produced by a variety of pathogenic bacteria, which include the pathogenic groups of Escherichia coli (E. coli) associated with gastrointestinal and urinary tract infections. We present the first X-ray structure, at 2.3 Å resolution, of the passenger domain from the Plasmid-encoded toxin (Pet), a 100 kDa protein which is a cause of acute diarrhea in both developing and industrialized countries. Pet is a cytoskeleton-altering toxin that induces loss of actin stress fibers. While Pet (pdb code: 4OM9) shows a sequence identity of only 50% compared to the closest related protein sequence, extracellular serine protease plasmid (EspP), the structural features of both proteins are conserved. A closer structural look reveals that Pet contains a β-pleated sheet at the sequence region of residues 181-190, whereas the corresponding structural domain in EspP consists of a coiled loop. Second, the Pet passenger domain features a more pronounced beta sheet between residues 135-143 compared to the structure of EspP. PMID:24530907

  15. A high level interface to SCOP and ASTRAL implemented in python.

    PubMed

    Casbon, James A; Crooks, Gavin E; Saqi, Mansoor A S

    2006-01-10

    Benchmarking algorithms in structural bioinformatics often involves the construction of datasets of proteins with given sequence and structural properties. The SCOP database is a manually curated structural classification which groups together proteins on the basis of structural similarity. The ASTRAL compendium provides non-redundant subsets of SCOP domains on the basis of sequence similarity such that no two domains in a given subset share more than a defined degree of sequence similarity. Taken together, these two resources provide a 'ground truth' for assessing structural bioinformatics algorithms. We present a small and easy to use API written in Python to enable construction of datasets from these resources. We have designed a set of Python modules to provide an abstraction of the SCOP and ASTRAL databases. The modules are designed to work as part of the Biopython distribution. Python users can now manipulate and use the SCOP hierarchy from within Python programs, and use ASTRAL to return sequences of domains in SCOP, as well as clustered representations of SCOP from ASTRAL. The modules make the analysis and generation of datasets for use in structural genomics easier and more principled.
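
    The non-redundancy property described above (no two domains in a subset share more than a defined degree of sequence similarity) can be sketched with a greedy filter. This is a toy illustration and not the ASTRAL procedure or the Biopython API: it assumes pre-aligned, equal-length sequences, uses plain position-wise identity as the similarity measure, and all domain names and sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Position-wise identity of two pre-aligned, equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def non_redundant_subset(domains: dict, max_identity: float) -> list:
    """Greedily keep domains that are at most max_identity % identical
    to every domain already kept (insertion order decides precedence)."""
    kept = []
    for name, seq in domains.items():
        if all(percent_identity(seq, domains[k]) <= max_identity for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy domains; dom_b differs from dom_a at one position of ten.
TOY_DOMAINS = {
    "dom_a": "ACDEFGHIKL",
    "dom_b": "ACDEFGHIKV",
    "dom_c": "MNPQRSTVWY",
}

print(non_redundant_subset(TOY_DOMAINS, 40.0))
print(non_redundant_subset(TOY_DOMAINS, 95.0))
```

    A greedy pass depends on input order, so a real benchmark set would normally fix a ranking of domains (for example by structure quality) before filtering; that ranking is left out of this sketch.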

  16. Structure and inhibition analysis of the mouse SAD-B C-terminal fragment.

    PubMed

    Ma, Hui; Wu, Jing-Xiang; Wang, Jue; Wang, Zhi-Xin; Wu, Jia-Wei

    2016-10-01

    The SAD (synapses of amphids defective) kinases, including SAD-A and SAD-B, play important roles in the regulation of neuronal development, cell cycle, and energy metabolism. Our recent study of mouse SAD-A identified a unique autoinhibitory sequence (AIS), which binds at the junction of the kinase domain (KD) and the ubiquitin-associated (UBA) domain and exerts autoregulation in cooperation with UBA. Here, we report the crystal structure of the mouse SAD-B C-terminal fragment including the AIS and the kinase-associated domain 1 (KA1) at 2.8 Å resolution. The KA1 domain is structurally conserved, while the isolated AIS sequence is highly flexible and solvent-accessible. Our biochemical studies indicated that the SAD-B AIS exerts the same autoinhibitory role as that in SAD-A. We believe that the flexible isolated AIS sequence is readily available for interaction with KD-UBA and thus inhibits SAD-B activity.

  17. ExDom: an integrated database for comparative analysis of the exon–intron structures of protein domains in eukaryotes

    PubMed Central

    Bhasi, Ashwini; Philip, Philge; Manikandan, Vinu; Senapathy, Periannan

    2009-01-01

    We have developed ExDom, a unique database for the comparative analysis of the exon–intron structures of 96 680 protein domains from seven eukaryotic organisms (Homo sapiens, Mus musculus, Bos taurus, Rattus norvegicus, Danio rerio, Gallus gallus and Arabidopsis thaliana). ExDom provides integrated access to exon-domain data through a sophisticated web interface which has the following analytical capabilities: (i) intergenomic and intragenomic comparative analysis of exon–intron structure of domains; (ii) color-coded graphical display of the domain architecture of proteins correlated with their corresponding exon-intron structures; (iii) graphical analysis of multiple sequence alignments of amino acid and coding nucleotide sequences of homologous protein domains from seven organisms; (iv) comparative graphical display of exon distributions within the tertiary structures of protein domains; and (v) visualization of exon–intron structures of alternative transcripts of a gene correlated to variations in the domain architecture of corresponding protein isoforms. These novel analytical features are highly suited for detailed investigations on the exon–intron structure of domains and make ExDom a powerful tool for exploring several key questions concerning the function, origin and evolution of genes and proteins. ExDom database is freely accessible at: http://66.170.16.154/ExDom/. PMID:18984624

  18. Modular protein domains: an engineering approach toward functional biomaterials.

    PubMed

    Lin, Charng-Yu; Liu, Julie C

    2016-08-01

    Protein domains and peptide sequences are a powerful tool for conferring specific functions to engineered biomaterials. Protein sequences with a wide variety of functionalities, including structure, bioactivity, protein-protein interactions, and stimuli responsiveness, have been identified, and advances in molecular biology continue to pinpoint new sequences. Protein domains can be combined to make recombinant proteins with multiple functionalities. The high fidelity of the protein translation machinery results in exquisite control over the sequence of recombinant proteins and the resulting properties of protein-based materials. In this review, we discuss protein domains and peptide sequences in the context of functional protein-based materials, composite materials, and their biological applications. Copyright © 2016 Elsevier Ltd. All rights reserved.

  19. ECOD: An Evolutionary Classification of Protein Domains

    PubMed Central

    Kinch, Lisa N.; Pei, Jimin; Shi, Shuoyong; Kim, Bong-Hyun; Grishin, Nick V.

    2014-01-01

    Understanding the evolution of a protein, including both close and distant relationships, often reveals insight into its structure and function. Fast and easy access to such up-to-date information facilitates research. We have developed a hierarchical evolutionary classification of all proteins with experimentally determined spatial structures, and presented it as an interactive and updatable online database. ECOD (Evolutionary Classification of protein Domains) is distinct from other structural classifications in that it groups domains primarily by evolutionary relationships (homology), rather than topology (or “fold”). This distinction highlights cases of homology between domains of differing topology to aid in understanding of protein structure evolution. ECOD uniquely emphasizes distantly related homologs that are difficult to detect, and thus catalogs the largest number of evolutionary links among structural domain classifications. Placing distant homologs together underscores the ancestral similarities of these proteins and draws attention to the most important regions of sequence and structure, as well as conserved functional sites. ECOD also recognizes closer sequence-based relationships between protein domains. Currently, approximately 100,000 protein structures are classified in ECOD into 9,000 sequence families clustered into close to 2,000 evolutionary groups. The classification is assisted by an automated pipeline that quickly and consistently classifies weekly releases of PDB structures and allows for continual updates. This synchronization with PDB uniquely distinguishes ECOD among all protein classifications. Finally, we present several case studies of homologous proteins not recorded in other classifications, illustrating the potential of how ECOD can be used to further biological and evolutionary studies. PMID:25474468

  20. ECOD: an evolutionary classification of protein domains.

    PubMed

    Cheng, Hua; Schaeffer, R Dustin; Liao, Yuxing; Kinch, Lisa N; Pei, Jimin; Shi, Shuoyong; Kim, Bong-Hyun; Grishin, Nick V

    2014-12-01

    Understanding the evolution of a protein, including both close and distant relationships, often reveals insight into its structure and function. Fast and easy access to such up-to-date information facilitates research. We have developed a hierarchical evolutionary classification of all proteins with experimentally determined spatial structures, and presented it as an interactive and updatable online database. ECOD (Evolutionary Classification of protein Domains) is distinct from other structural classifications in that it groups domains primarily by evolutionary relationships (homology), rather than topology (or "fold"). This distinction highlights cases of homology between domains of differing topology to aid in understanding of protein structure evolution. ECOD uniquely emphasizes distantly related homologs that are difficult to detect, and thus catalogs the largest number of evolutionary links among structural domain classifications. Placing distant homologs together underscores the ancestral similarities of these proteins and draws attention to the most important regions of sequence and structure, as well as conserved functional sites. ECOD also recognizes closer sequence-based relationships between protein domains. Currently, approximately 100,000 protein structures are classified in ECOD into 9,000 sequence families clustered into close to 2,000 evolutionary groups. The classification is assisted by an automated pipeline that quickly and consistently classifies weekly releases of PDB structures and allows for continual updates. This synchronization with PDB uniquely distinguishes ECOD among all protein classifications. Finally, we present several case studies of homologous proteins not recorded in other classifications, illustrating the potential of how ECOD can be used to further biological and evolutionary studies.

  1. Domain organization and crystal structure of the catalytic domain of E.coli RluF, a pseudouridine synthase that acts on 23S rRNA

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sunita, S.; Zhenxing, H.; Swaathi, J.

    2006-01-01

    Pseudouridine synthases catalyze the isomerization of uridine to pseudouridine (Ψ) in rRNA and tRNA. The pseudouridine synthase RluF from Escherichia coli (E.C. 4.2.1.70) modifies U2604 in 23S rRNA, and belongs to a large family of pseudouridine synthases present in all kingdoms of life. Here we report the domain architecture and crystal structure of the catalytic domain of E. coli RluF at 2.6 Å resolution. Limited proteolysis, mass spectrometry and N-terminal sequencing indicate that RluF has a distinct domain architecture, with the catalytic domain flanked at the N and C termini by additional domains connected to it by flexible linkers. The structure of the catalytic domain of RluF is similar to those of RsuA and TruB. RluF is a member of the RsuA sequence family of Ψ-synthases, along with RluB and RluE. Structural comparison of RluF with its closest structural homologues, RsuA and TruB, suggests possible functional roles for the N-terminal and C-terminal domains of RluF.

  2. Automated hierarchical classification of protein domain subfamilies based on functionally-divergent residue signatures

    PubMed Central

    2012-01-01

    Background The NCBI Conserved Domain Database (CDD) consists of a collection of multiple sequence alignments of protein domains that are at various stages of being manually curated into evolutionary hierarchies based on conserved and divergent sequence and structural features. These domain models are annotated to provide insights into the relationships between sequence, structure and function via web-based BLAST searches. Results Here we automate the generation of conserved domain (CD) hierarchies using a combination of heuristic and Markov chain Monte Carlo (MCMC) sampling procedures and starting from a (typically very large) multiple sequence alignment. This procedure relies on statistical criteria to define each hierarchy based on the conserved and divergent sequence patterns associated with protein functional-specialization. At the same time this facilitates the sequence and structural annotation of residues that are functionally important. These statistical criteria also provide a means to objectively assess the quality of CD hierarchies, a non-trivial task considering that the protein subgroups are often very distantly related—a situation in which standard phylogenetic methods can be unreliable. Our aim here is to automatically generate (typically sub-optimal) hierarchies that, based on statistical criteria and visual comparisons, are comparable to manually curated hierarchies; this serves as the first step toward the ultimate goal of obtaining optimal hierarchical classifications. A plot of runtimes for the most time-intensive (non-parallelizable) part of the algorithm indicates a nearly linear time complexity so that, even for the extremely large Rossmann fold protein class, results were obtained in about a day. Conclusions This approach automates the rapid creation of protein domain hierarchies and thus will eliminate one of the most time consuming aspects of conserved domain database curation. 
At the same time, it also facilitates protein domain annotation by identifying those pattern residues that most distinguish each protein domain subgroup from other related subgroups. PMID:22726767

  3. CATH-Gene3D: Generation of the Resource and Its Use in Obtaining Structural and Functional Annotations for Protein Sequences.

    PubMed

    Dawson, Natalie L; Sillitoe, Ian; Lees, Jonathan G; Lam, Su Datt; Orengo, Christine A

    2017-01-01

    This chapter describes the generation of the data in the CATH-Gene3D online resource and how it can be used to study protein domains and their evolutionary relationships. Methods will be presented for: comparing protein structures, recognizing homologs, predicting domain structures within protein sequences, and subclassifying superfamilies into functionally pure families, together with a guide on using the webpages.

  4. Sequence Analysis and Domain Motifs in the Porcine Skin Decorin Glycosaminoglycan Chain*

    PubMed Central

    Zhao, Xue; Yang, Bo; Solakylidirim, Kemal; Joo, Eun Ji; Toida, Toshihiko; Higashi, Kyohei; Linhardt, Robert J.; Li, Lingyun

    2013-01-01

    Decorin proteoglycan is comprised of a core protein containing a single O-linked dermatan sulfate/chondroitin sulfate glycosaminoglycan (GAG) chain. Although the sequence of the decorin core protein is determined by the gene encoding its structure, the structure of its GAG chain is determined in the Golgi. The recent application of modern MS to bikunin, a far simpler chondroitin sulfate proteoglycan, suggests that it has a single or small number of defined sequences. On this basis, a similar approach was undertaken to sequence the much larger and more structurally complex dermatan sulfate/chondroitin sulfate GAG chain of porcine skin decorin. This approach resulted in information on the consistency/variability of its linkage region at the reducing end of the GAG chain, its iduronic acid-rich domain, glucuronic acid-rich domain, and non-reducing end. A general motif for the porcine skin decorin GAG chain was established. A single small decorin GAG chain was sequenced using MS/MS analysis. The data obtained in the study suggest that the decorin GAG chain has a small or a limited number of sequences. PMID:23423381

  5. Matrix metalloproteinases: structures, evolution, and diversification.

    PubMed

    Massova, I; Kotra, L P; Fridman, R; Mobashery, S

    1998-09-01

    A comprehensive sequence alignment of 64 members of the family of matrix metalloproteinases (MMPs) for the entire sequences, and subsequently the catalytic and the hemopexin-like domains, has been performed. The 64 MMPs were selected from plants, invertebrates, and vertebrates. The analyses disclosed that as many as 23 distinct subfamilies of these proteins are known to exist. Information from the sequence alignments was correlated with structures, both crystallographic as well as computational, of the catalytic domains for the 23 representative members of the MMP family. A survey of the metal binding sites and two loops containing variable sequences of amino acids, which are important for substrate interactions, are discussed. The collective data support the proposal that the assembly of the domains into multidomain enzymes was likely to be an early evolutionary event. This was followed by diversification, perhaps in parallel among the MMPs, in a subsequent evolutionary time scale. Analysis indicates that a retrograde structure simplification may have accounted for the evolution of MMPs with simple domain constituents, such as matrilysin, from the larger and more elaborate enzymes.

  6. Molecular Structures and Functional Relationships in Clostridial Neurotoxins

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Swaminathan S.

    2011-12-01

    The seven serotypes of Clostridium botulinum neurotoxins (A-G) are the deadliest poison known to humans. They share significant sequence homology and hence possess similar structure-function relationships. Botulinum neurotoxins (BoNT) act via a four-step mechanism, viz., binding and internalization to neuronal cells, translocation of the catalytic domain into the cytosol and finally cleavage of one of the three soluble N-ethylmaleimide-sensitive factor attachment protein receptors (SNARE) causing blockage of neurotransmitter release leading to flaccid paralysis. Crystal structures of three holotoxins, BoNT/A, B and E, are available to date. Although the individual domains are remarkably similar, their domain organization is different. These structures have helped in correlating the structural and functional domains. This has led to the determination of structures of individual domains and combinations of them. Crystal structures of catalytic domains of all serotypes and several binding domains are now available. The catalytic domains are zinc endopeptidases and share significant sequence and structural homology. The active site architecture and the catalytic mechanism are similar although the binding mode of individual substrates may be different, dictating substrate specificity and peptide cleavage selectivity. Crystal structures of catalytic domains with substrate peptides provide clues to specificity and selectivity unique to BoNTs. Crystal structures of the receptor domain in complex with ganglioside or the protein receptor have provided information about the binding of botulinum neurotoxin to the neuronal cell. An overview of the structure-function relationship correlating the 3D structures with biochemical and biophysical data and how they can be used for structure-based drug discovery is presented here.

  7. Is there a domain-general cognitive structuring system? Evidence from structural priming across music, math, action descriptions, and language.

    PubMed

    Van de Cavey, Joris; Hartsuiker, Robert J

    2016-01-01

    Cognitive processing in many domains (e.g., sentence comprehension, music listening, and math solving) requires sequential information to be organized into an integrational structure. There appears to be some overlap in integrational processing across domains, as shown by cross-domain interference effects when, for example, linguistic and musical stimuli are jointly presented (Koelsch, Gunter, Wittfoth, & Sammler, 2005; Slevc, Rosenberg, & Patel, 2009). These findings support theories of overlapping resources for integrational processing across domains (cfr. SSIRH Patel, 2003; SWM, Kljajevic, 2010). However, there are some limitations to the studies mentioned above, such as the frequent use of unnaturalistic integrational difficulties. In recent years, the idea has arisen that evidence for domain-generality in structural processing might also be yielded through priming paradigms (cfr. Scheepers, 2003). The rationale behind this is that integrational processing across domains regularly requires the processing of dependencies across short or long distances in the sequence, involving respectively less or more syntactic working memory resources (cfr. SWM, Kljajevic, 2010), and such processing decisions might persist over time. However, whereas recent studies have shown suggestive priming of integrational structure between language and arithmetic (though often dependent on arithmetic performance, cfr. Scheepers et al., 2011; Scheepers & Sturt, 2014), it remains to be investigated to what extent we can also find evidence for priming in other domains, such as music and action (cfr. SWM, Kljajevic, 2010). Experiment 1a showed structural priming from the processing of musical sequences onto the position in the sentence structure (early or late) to which a relative clause was attached in subsequent sentence completion.
Importantly, Experiment 1b showed that a similar structural manipulation based on non-hierarchically ordered color sequences did not yield any priming effect, suggesting that the priming effect is not based on linear order, but integrational dependency. Finally, Experiment 2 presented primes in four domains (relative clause sentences, music, mathematics, and structured descriptions of actions), and consistently showed priming within and across domains. These findings provide clear evidence for domain-general structural processing mechanisms. Copyright © 2015 Elsevier B.V. All rights reserved.

  8. Structural basis of DNA target recognition by the B3 domain of Arabidopsis epigenome reader VAL1

    PubMed Central

    Sasnauskas, Giedrius; Kauneckaitė, Kotryna; Siksnys, Virginijus

    2018-01-01

    Arabidopsis thaliana requires a prolonged period of cold exposure during winter to initiate flowering in a process termed vernalization. Exposure to cold induces epigenetic silencing of the FLOWERING LOCUS C (FLC) gene by Polycomb group (PcG) proteins. A key role in this epigenetic switch is played by transcriptional repressors VAL1 and VAL2, which specifically recognize Sph/RY DNA sequences within FLC via B3 DNA binding domains, and mediate recruitment of PcG silencing machinery. To understand the structural mechanism of site-specific DNA recognition by VAL1, we have solved the crystal structure of VAL1 B3 domain (VAL1-B3) bound to a 12 bp oligoduplex containing the canonical Sph/RY DNA sequence 5′-CATGCA-3′/5′-TGCATG-3′. We find that VAL1-B3 makes H-bonds and van der Waals contacts to DNA bases of all six positions of the canonical Sph/RY element. In agreement with the structure, in vitro DNA binding studies show that VAL1-B3 does not tolerate substitutions at any position of the 5′-TGCATG-3′ sequence. The VAL1-B3–DNA structure presented here provides a structural model for understanding the specificity of plant B3 domains interacting with the Sph/RY and other DNA sequences. PMID:29660015
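
    As a quick check on the notation in this abstract, the two strands quoted for the canonical Sph/RY element, 5′-CATGCA-3′ and 5′-TGCATG-3′, are reverse complements of each other. The short Python sketch below is not from the paper; the helper name reverse_complement is our own, and it only handles the unambiguous bases A, C, G and T.

```python
# Illustrative helper (an assumption of this listing, not code from the paper).
# Maps each unambiguous DNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


top_strand = "CATGCA"  # 5'-CATGCA-3', as quoted in the abstract
bottom_strand = reverse_complement(top_strand)
print(bottom_strand)  # TGCATG
```

    Both strands are written 5′ to 3′, which is why the second sequence is the complement read in reverse.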

  9. Structure-Based Sequence Alignment of the Transmembrane Domains of All Human GPCRs: Phylogenetic, Structural and Functional Implications

    PubMed Central

    Cvicek, Vaclav; Goddard, William A.; Abrol, Ravinder

    2016-01-01

    The understanding of G-protein coupled receptors (GPCRs) is undergoing a revolution due to increased information about their signaling and the experimental determination of structures for more than 25 receptors. The availability of at least one receptor structure for each of the GPCR classes, well separated in sequence space, enables an integrated superfamily-wide analysis to identify signatures involving the role of conserved residues, conserved contacts, and downstream signaling in the context of receptor structures. In this study, we align the transmembrane (TM) domains of all experimental GPCR structures to maximize the conserved inter-helical contacts. The resulting superfamily-wide GpcR Sequence-Structure (GRoSS) alignment of the TM domains for all human GPCR sequences is sufficient to generate a phylogenetic tree that correctly distinguishes all different GPCR classes, suggesting that the class-level differences in the GPCR superfamily are encoded at least partly in the TM domains. The inter-helical contacts conserved across all GPCR classes describe the evolutionarily conserved GPCR structural fold. The corresponding structural alignment of the inactive and active conformations, available for a few GPCRs, identifies activation hot-spot residues in the TM domains that get rewired upon activation. Many GPCR mutations, known to alter receptor signaling and cause disease, are located at these conserved contact and activation hot-spot residue positions. The GRoSS alignment places the chemosensory receptor subfamilies for bitter taste (TAS2R) and pheromones (Vomeronasal, VN1R) in the rhodopsin family, known to contain the chemosensory olfactory receptor subfamily. The GRoSS alignment also enables the quantification of the structural variability in the TM regions of experimental structures, useful for homology modeling and structure prediction of receptors. 
Furthermore, this alignment identifies structurally and functionally important residues in all human GPCRs. These residues can be used to make testable hypotheses about the structural basis of receptor function and about the molecular basis of disease-associated single nucleotide polymorphisms. PMID:27028541

  10. The Classification of Protein Domains.

    PubMed

    Dawson, Natalie; Sillitoe, Ian; Marsden, Russell L; Orengo, Christine A

    2017-01-01

    The significant expansion in protein sequence and structure data that we are now witnessing brings with it a pressing need to bring order to the protein world. Such order enables us to gain insights into the evolution of proteins, their function and the extent to which the functional repertoire can vary across the three kingdoms of life. This has led to the creation of a wide range of protein family classifications that aim to group proteins based upon their evolutionary relationships. In this chapter we discuss the approaches and methods that are frequently used in the classification of proteins, with a specific emphasis on the classification of protein domains. The construction of both domain sequence and domain structure databases is considered and we show how the use of domain family annotations to assign structural and functional information is enhancing our understanding of genomes.

  11. Interaction of the p85 subunit of PI 3-kinase and its N-terminal SH2 domain with a PDGF receptor phosphorylation site: structural features and analysis of conformational changes.

    PubMed Central

    Panayotou, G; Bax, B; Gout, I; Federwisch, M; Wroblowski, B; Dhand, R; Fry, M J; Blundell, T L; Wollmer, A; Waterfield, M D

    1992-01-01

    Circular dichroism and fluorescence spectroscopy were used to investigate the structure of the p85 alpha subunit of the PI 3-kinase, a closely related p85 beta protein, and a recombinant SH2 domain-containing fragment of p85 alpha. Significant spectral changes, indicative of a conformational change, were observed on formation of a complex with a 17 residue peptide containing a phosphorylated tyrosine residue. The sequence of this peptide is identical to the sequence surrounding Tyr751 in the kinase-insert region of the platelet-derived growth factor beta-receptor (beta PDGFR). The rotational correlation times measured by fluorescence anisotropy decay indicated that phosphopeptide binding changed the shape of the SH2 domain-containing fragment. The CD and fluorescence spectroscopy data support the secondary structure prediction based on sequence analysis and provide evidence for flexible linker regions between the various domains of the p85 proteins. The significance of these results for SH2 domain-containing proteins is discussed. PMID:1330535

  12. Atomic interaction networks in the core of protein domains and their native folds.

    PubMed

    Soundararajan, Venkataramanan; Raman, Rahul; Raguram, S; Sasisekharan, V; Sasisekharan, Ram

    2010-02-23

    Vastly divergent sequences populate a majority of protein folds. In the quest to identify features that are conserved within protein domains belonging to the same fold, we set out to examine the entire protein universe on a fold-by-fold basis. We report that the atomic interaction network in the solvent-unexposed core of protein domains is fold-conserved, extraordinary sequence divergence notwithstanding. Further, we find that this feature, termed protein core atomic interaction network (or PCAIN), is significantly distinguishable across different folds, thus appearing to be a "signature" of a domain's native fold. As part of this study, we computed the PCAINs for 8698 representative protein domains from families across the 1018 known protein folds to construct our seed database and an automated framework was developed for PCAIN-based characterization of the protein fold universe. A test set of randomly selected domains that are not in the seed database was classified with over 97% accuracy, independent of sequence divergence. As an application of this novel fold signature, a PCAIN-based scoring scheme was developed for comparative (homology-based) structure prediction, with 1-2 angstroms (mean 1.61 Å) C(alpha) RMSD generally observed between computed structures and reference crystal structures. Our results are consistent across the full spectrum of test domains including those from recent CASP experiments and most notably in the 'twilight' and 'midnight' zones wherein <30% and <10% target-template sequence identity prevails (mean twilight RMSD of 1.69 Å). We further demonstrate the utility of the PCAIN protocol to derive biological insight into protein structure-function relationships, by modeling the structure of the YopM effector novel E3 ligase (NEL) domain from plague-causative bacterium Yersinia pestis and discussing its implications for host adaptive and innate immune modulation by the pathogen.
Considering the several high-throughput, sequence-identity-independent applications demonstrated in this work, we suggest that the PCAIN is a fundamental fold feature that could be a valuable addition to the arsenal of protein modeling and analysis tools.
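
    The C(alpha) RMSD figures quoted in this abstract measure the root-mean-square distance between matched C-alpha atoms of a computed model and a reference structure. A minimal sketch of that calculation follows, assuming the two coordinate lists are already matched atom-for-atom and already superimposed. The abstract does not say how superposition was done, the function name ca_rmsd is hypothetical, and the coordinates are made-up toy values rather than data from the study.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z).

    Assumption: the structures are already superimposed; this sketch does not
    perform any optimal superposition step.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy coordinates in angstroms (illustrative only): every atom is displaced
# by exactly 1 angstrom along z, so the RMSD is 1.0.
model = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (3.0, 0.0, 1.0)]
print(ca_rmsd(model, reference))  # 1.0
```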

  13. Atomic Interaction Networks in the Core of Protein Domains and Their Native Folds

    PubMed Central

    Soundararajan, Venkataramanan; Raman, Rahul; Raguram, S.; Sasisekharan, V.; Sasisekharan, Ram

    2010-01-01

    Vastly divergent sequences populate a majority of protein folds. In the quest to identify features that are conserved within protein domains belonging to the same fold, we set out to examine the entire protein universe on a fold-by-fold basis. We report that the atomic interaction network in the solvent-unexposed core of protein domains is fold-conserved, extraordinary sequence divergence notwithstanding. Further, we find that this feature, termed protein core atomic interaction network (or PCAIN), is significantly distinguishable across different folds, thus appearing to be a “signature” of a domain's native fold. As part of this study, we computed the PCAINs for 8698 representative protein domains from families across the 1018 known protein folds to construct our seed database and an automated framework was developed for PCAIN-based characterization of the protein fold universe. A test set of randomly selected domains that are not in the seed database was classified with over 97% accuracy, independent of sequence divergence. As an application of this novel fold signature, a PCAIN-based scoring scheme was developed for comparative (homology-based) structure prediction, with 1–2 angstroms (mean 1.61 Å) Cα RMSD generally observed between computed structures and reference crystal structures. Our results are consistent across the full spectrum of test domains including those from recent CASP experiments and most notably in the ‘twilight’ and ‘midnight’ zones wherein <30% and <10% target-template sequence identity prevails (mean twilight RMSD of 1.69 Å). We further demonstrate the utility of the PCAIN protocol to derive biological insight into protein structure-function relationships, by modeling the structure of the YopM effector novel E3 ligase (NEL) domain from plague-causative bacterium Yersinia pestis and discussing its implications for host adaptive and innate immune modulation by the pathogen.
Considering the several high-throughput, sequence-identity-independent applications demonstrated in this work, we suggest that the PCAIN is a fundamental fold feature that could be a valuable addition to the arsenal of protein modeling and analysis tools. PMID:20186337

  14. The PYRIN domain: A member of the death domain-fold superfamily

    PubMed Central

    Fairbrother, Wayne J.; Gordon, Nathaniel C.; Humke, Eric W.; O'Rourke, Karen M.; Starovasnik, Melissa A.; Yin, Jian-Ping; Dixit, Vishva M.

    2001-01-01

    PYRIN domains were identified recently as putative protein–protein interaction domains at the N-termini of several proteins thought to function in apoptotic and inflammatory signaling pathways. The ∼95 residue PYRIN domains have no statistically significant sequence homology to proteins with known three-dimensional structure. Using secondary structure prediction and potential-based fold recognition methods, however, the PYRIN domain is predicted to be a member of the six-helix bundle death domain-fold superfamily that includes death domains (DDs), death effector domains (DEDs), and caspase recruitment domains (CARDs). Members of the death domain-fold superfamily are well established mediators of protein–protein interactions found in many proteins involved in apoptosis and inflammation, indicating further that the PYRIN domains serve a similar function. An homology model of the PYRIN domain of CARD7/DEFCAP/NAC/NALP1, a member of the Apaf-1/Ced-4 family of proteins, was constructed using the three-dimensional structures of the FADD and p75 neurotrophin receptor DDs, and of the Apaf-1 and caspase-9 CARDs, as templates. Validation of the model using a variety of computational techniques indicates that the fold prediction is consistent with the sequence. Comparison of a circular dichroism spectrum of the PYRIN domain of CARD7/DEFCAP/NAC/NALP1 with spectra of several proteins known to adopt the death domain-fold provides experimental support for the structure prediction. PMID:11514682

  15. Crystal structure of Src-like adaptor protein 2 reveals close association of SH3 and SH2 domains through β-sheet formation.

    PubMed

    Wybenga-Groot, Leanne E; McGlade, C Jane

    2013-12-01

    The Src-like adaptor proteins (SLAP/SLAP2) are key components of Cbl-dependent downregulation of antigen receptor, cytokine receptor, and receptor tyrosine kinase signaling in hematopoietic cells. SLAP and SLAP2 consist of adjacent SH3 and SH2 domains that are most similar in sequence to Src family kinases (SFKs). Notably, the SH3-SH2 connector sequence is significantly shorter in SLAP/SLAP2 than in SFKs. To understand the structural implication of a short SH3-SH2 connector sequence, we solved the crystal structure of a protein encompassing the SH3 domain, SH3-SH2 connector, and SH2 domain of SLAP2 (SLAP2-32). While both domains adopt typical folds, the short SH3-SH2 connector places them in close association. Strand βe of the SH3 domain interacts with strand βA of the SH2 domain, resulting in the formation of a continuous β sheet that spans the length of the protein. Disruption of the SH3/SH2 interface through mutagenesis decreases SLAP2-32 stability in vitro, consistent with inter-domain binding being an important component of SLAP2 structure and function. The canonical peptide binding pockets of the SH3 and SH2 domains are fully accessible, in contrast to other protein structures that display direct interaction between SH3 and SH2 domains, in which either peptide binding surface is obstructed by the interaction. Our results reveal potential sites of novel interaction for SH3 and SH2 domains, and illustrate the adaptability of SH2 and SH3 domains in mediating interactions. As well, our results suggest that the SH3 and SH2 domains of SLAP2 function interdependently, with implications on their mode of substrate binding. © 2013.

  16. Molecular structures and functional relationships in clostridial neurotoxins.

    PubMed

    Swaminathan, Subramanyam

    2011-12-01

    The seven serotypes of Clostridium botulinum neurotoxins (A-G) are the deadliest poison known to humans. They share significant sequence homology and hence possess similar structure-function relationships. Botulinum neurotoxins (BoNT) act via a four-step mechanism, viz., binding and internalization to neuronal cells, translocation of the catalytic domain into the cytosol and finally cleavage of one of the three soluble N-ethylmaleimide-sensitive factor attachment protein receptors (SNARE) causing blockage of neurotransmitter release leading to flaccid paralysis. Crystal structures of three holotoxins, BoNT/A, B and E, are available to date. Although the individual domains are remarkably similar, their domain organization is different. These structures have helped in correlating the structural and functional domains. This has led to the determination of structures of individual domains and combinations of them. Crystal structures of catalytic domains of all serotypes and several binding domains are now available. The catalytic domains are zinc endopeptidases and share significant sequence and structural homology. The active site architecture and the catalytic mechanism are similar although the binding mode of individual substrates may be different, dictating substrate specificity and peptide cleavage selectivity. Crystal structures of catalytic domains with substrate peptides provide clues to specificity and selectivity unique to BoNTs. Crystal structures of the receptor domain in complex with ganglioside or the protein receptor have provided information about the binding of botulinum neurotoxin to the neuronal cell. An overview of the structure-function relationship correlating the 3D structures with biochemical and biophysical data and how they can be used for structure-based drug discovery is presented here. Journal compilation © 2011 FEBS. No claim to original US government works.

  17. Comparable contributions of structural-functional constraints and expression level to the rate of protein sequence evolution

    PubMed Central

    Wolf, Maxim Y; Wolf, Yuri I; Koonin, Eugene V

    2008-01-01

    Background Proteins show a broad range of evolutionary rates. Understanding the factors that are responsible for the characteristic rate of evolution of a given protein arguably is one of the major goals of evolutionary biology. A long-standing general assumption used to be that the evolution rate is, primarily, determined by the specific functional constraints that affect the given protein. These constraints were traditionally thought to depend both on the specific features of the protein's structure and its biological role. The advent of systems biology brought about new types of data, such as expression level and protein-protein interactions, and unexpectedly, a variety of correlations between protein evolution rate and these variables have been observed. The strongest connections by far were repeatedly seen between protein sequence evolution rate and the expression level of the respective gene. It has been hypothesized that this link is due to the selection for the robustness of the protein structure to mistranslation-induced misfolding that is particularly important for highly expressed proteins and is the dominant determinant of the sequence evolution rate. Results This work is an attempt to assess the relative contributions of protein domain structure and function, on the one hand, and expression level on the other hand, to the rate of sequence evolution. To this end, we performed a genome-wide analysis of the effect of the fusion of a pair of domains in multidomain proteins on the difference in the domain-specific evolutionary rates. The mistranslation-induced misfolding hypothesis would predict that, within multidomain proteins, fused domains, on average, should evolve at substantially closer rates than the same domains in different proteins because, within a multidomain protein, all domains are translated at the same rate.
We performed a comprehensive comparison of the evolutionary rates of mammalian and plant protein domains that are either joined in multidomain proteins or contained in distinct proteins. Substantial homogenization of evolutionary rates in multidomain proteins was, indeed, observed in both animals and plants, although highly significant differences between domain-specific rates remained. The contributions of the translation rate, as determined by the effect of the fusion of a pair of domains within a multidomain protein, and intrinsic, domain-specific structural-functional constraints appear to be comparable in magnitude. Conclusion Fusion of domains in a multidomain protein results in substantial homogenization of the domain-specific evolutionary rates but significant differences between domain-specific evolution rates remain. Thus, the rate of translation and intrinsic structural-functional constraints both exert sizable and comparable effects on sequence evolution. Reviewers This article was reviewed by Sergei Maslov, Dennis Vitkup, Claus Wilke (nominated by Orly Alter), and Allan Drummond (nominated by Joel Bader). For the full reviews, please go to the Reviewers' Reports section. PMID:18840284

  18. Structural and sequencing analysis of local target DNA recognition by MLV integrase.

    PubMed

    Aiyer, Sriram; Rossi, Paolo; Malani, Nirav; Schneider, William M; Chandar, Ashwin; Bushman, Frederic D; Montelione, Gaetano T; Roth, Monica J

    2015-06-23

    Target-site selection by retroviral integrase (IN) proteins profoundly affects viral pathogenesis. We describe the solution nuclear magnetic resonance structure of the Moloney murine leukemia virus IN (M-MLV) C-terminal domain (CTD) and a structural homology model of the catalytic core domain (CCD). In solution, the isolated MLV IN CTD adopts an SH3 domain fold flanked by a C-terminal unstructured tail. We generated a concordant MLV IN CCD structural model using SWISS-MODEL, MMM-tree and I-TASSER. Using the X-ray crystal structure of the prototype foamy virus IN target capture complex together with our MLV domain structures, residues within the CCD α2 helical region and the CTD β1-β2 loop were predicted to bind target DNA. The role of these residues was analyzed in vivo through point mutants and motif interchanges. Viable viruses with substitutions at the IN CCD α2 helical region and the CTD β1-β2 loop were tested for effects on integration target site selection. Next-generation sequencing and analysis of integration target sequences indicate that the CCD α2 helical region, in particular P187, interacts with the sequences distal to the scissile bonds whereas the CTD β1-β2 loop binds to residues proximal to it. These findings validate our structural model and disclose IN-DNA interactions relevant to target site selection. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. Structured prediction models for RNN based sequence labeling in clinical text.

    PubMed

    Jagannatha, Abhyuday N; Yu, Hong

    2016-11-01

    Sequence labeling is a widely used method for named entity recognition and information extraction from unstructured natural language data. In the clinical domain, one major application of sequence labeling involves extraction of medical entities such as medication, indication, and side-effects from Electronic Health Record narratives. Sequence labeling in this domain presents its own set of challenges and objectives. In this work we experimented with various CRF-based structured learning models with Recurrent Neural Networks. We extend the previously studied LSTM-CRF models with explicit modeling of pairwise potentials. We also propose an approximate version of skip-chain CRF inference with RNN potentials. We use these methodologies for structured prediction in order to improve the exact phrase detection of various medical entities.
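
    The role of pairwise potentials in a CRF output layer can be illustrated with a minimal sketch of exact Viterbi decoding for a linear-chain CRF. This is not the authors' implementation: the per-token (unary) scores are assumed to come from some RNN, the `viterbi` function name and the example scores are hypothetical, and the skip-chain approximation mentioned in the abstract is not covered.

```python
def viterbi(unary, pairwise):
    """Return the highest-scoring label path and its score.

    unary[t][k]    : score of label k at token t (assumed to come from an RNN).
    pairwise[j][k] : score of moving from label j to label k (pairwise potential).
    """
    n_labels = len(unary[0])
    best = list(unary[0])          # best score of any path ending in each label
    backptr = []                   # backptr[t-1][k] = best previous label for k at t

    for t in range(1, len(unary)):
        new_best, ptr = [], []
        for k in range(n_labels):
            # Pick the previous label that maximizes path score into label k.
            j = max(range(n_labels), key=lambda j: best[j] + pairwise[j][k])
            new_best.append(best[j] + pairwise[j][k] + unary[t][k])
            ptr.append(j)
        best = new_best
        backptr.append(ptr)

    # Trace the best path backwards from the best final label.
    last = max(range(n_labels), key=lambda k: best[k])
    path = [last]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, best[last]


# Hypothetical labels: 0 = O, 1 = B (begin entity), 2 = I (inside entity).
unary = [[2.0, 1.0, 0.0], [0.0, 0.5, 1.0], [1.0, 0.0, 0.9]]
pairwise = [[0, 0, -5],   # O -> I is strongly penalized
            [0, 0, 1],    # B -> I is rewarded
            [0, 0, 1]]    # I -> I is rewarded
path, score = viterbi(unary, pairwise)
print(path, score)
```

    In this toy example the first token would be labeled O on its unary score alone, but the pairwise penalty on O followed by I makes the jointly best path start with B, which is the kind of exact-phrase consistency that pairwise potentials are meant to encourage.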

  20. Structured prediction models for RNN based sequence labeling in clinical text

    PubMed Central

    Jagannatha, Abhyuday N; Yu, Hong

    2016-01-01

    Sequence labeling is a widely used method for named entity recognition and information extraction from unstructured natural language data. In the clinical domain, one major application of sequence labeling involves extraction of medical entities such as medication, indication, and side-effects from Electronic Health Record narratives. Sequence labeling in this domain presents its own set of challenges and objectives. In this work we experimented with various CRF-based structured learning models with Recurrent Neural Networks. We extend the previously studied LSTM-CRF models with explicit modeling of pairwise potentials. We also propose an approximate version of skip-chain CRF inference with RNN potentials. We use these methodologies for structured prediction in order to improve the exact phrase detection of various medical entities. PMID:28004040

  1. Using Molecular Dynamics Simulations as an Aid in the Prediction of Domain Swapping of Computationally Designed Protein Variants.

    PubMed

    Mou, Yun; Huang, Po-Ssu; Thomas, Leonard M; Mayo, Stephen L

    2015-08-14

    In standard implementations of computational protein design, a positive-design approach is used to predict sequences that will be stable on a given backbone structure. Possible competing states are typically not considered, primarily because appropriate structural models are not available. One potential competing state, the domain-swapped dimer, is especially compelling because it is often nearly identical with its monomeric counterpart, differing by just a few mutations in a hinge region. Molecular dynamics (MD) simulations provide a computational method to sample different conformational states of a structure. Here, we tested whether MD simulations could be used as a post-design screening tool to identify sequence mutations leading to domain-swapped dimers. We hypothesized that a successful computationally designed sequence would have backbone structure and dynamics characteristics similar to that of the input structure and that, in contrast, domain-swapped dimers would exhibit increased backbone flexibility and/or altered structure in the hinge-loop region to accommodate the large conformational change required for domain swapping. While attempting to engineer a homodimer from a 51-amino-acid fragment of the monomeric protein engrailed homeodomain (ENH), we had instead generated a domain-swapped dimer (ENH_DsD). MD simulations on these proteins showed increased B-factors derived from MD simulation in the hinge loop of the ENH_DsD domain-swapped dimer relative to monomeric ENH. Two point mutants of ENH_DsD designed to recover the monomeric fold were then tested with an MD simulation protocol. The MD simulations suggested that one of these mutants would adopt the target monomeric structure, which was subsequently confirmed by X-ray crystallography. Copyright © 2015. Published by Elsevier Ltd.

  2. Unique Structural Features and Sequence Motifs of Proline Utilization A (PutA)

    PubMed Central

    Singh, Ranjan K.; Tanner, John J.

    2013-01-01

    Proline utilization A proteins (PutAs) are bifunctional enzymes that catalyze the oxidation of proline to glutamate using spatially separated proline dehydrogenase and pyrroline-5-carboxylate dehydrogenase active sites. Here we use the crystal structure of the minimalist PutA from Bradyrhizobium japonicum (BjPutA) along with sequence analysis to identify unique structural features of PutAs. This analysis shows that PutAs have secondary structural elements and domains not found in the related monofunctional enzymes. Some of these extra features are predicted to be important for substrate channeling in BjPutA. Multiple sequence alignment analysis shows that some PutAs have a 17-residue conserved motif in the C-terminal 20–30 residues of the polypeptide chain. The BjPutA structure shows that this motif helps seal the internal substrate-channeling cavity from the bulk medium. Finally, it is shown that some PutAs have a 100–200 residue domain of unknown function in the C-terminus that is not found in minimalist PutAs. Remote homology detection suggests that this domain is homologous to the oligomerization beta-hairpin and Rossmann fold domain of BjPutA. PMID:22201760

  3. Temperature inducible β-sheet structure in the transactivation domains of retroviral regulatory proteins of the Rev family

    NASA Astrophysics Data System (ADS)

    Thumb, Werner; Graf, Christine; Parslow, Tristram; Schneider, Rainer; Auer, Manfred

    1999-11-01

    The interaction of the human immunodeficiency virus type 1 (HIV-1) regulatory protein Rev with cellular cofactors is crucial for the viral life cycle. The HIV-1 Rev transactivation domain is functionally interchangeable with analogous regions of Rev proteins of other retroviruses, suggesting common folding patterns. In order to obtain experimental evidence for similar structural features mediating protein-protein contacts we investigated activation domain peptides from HIV-1, HIV-2, VISNA virus, feline immunodeficiency virus (FIV) and equine infectious anemia virus (EIAV) by CD spectroscopy, secondary structure prediction and sequence analysis. Although different in polarity and hydrophobicity, all peptides showed a similar behavior with respect to solution conformation, concentration dependence and variations in ionic strength and pH. Temperature studies revealed an unusual induction of β-structure with rising temperatures in all activation domain peptides. The high stability of β-structure in this region was demonstrated in three different peptides of the activation domain of HIV-1 Rev in solutions containing 40% hexafluoropropanol, a reagent usually known to induce α-helix into amino acid sequences. Sequence alignments revealed similarities between the polar effector domains from FIV and EIAV and the leucine rich (hydrophobic) effector domains found in HIV-1, HIV-2 and VISNA. Studies on activation domain peptides of two dominant negative HIV-1 Rev mutants, M10 and M32, pointed towards different reasons for the biological behavior. Whereas the peptide containing the M10 mutation (L78E79→D78L79) showed wild-type structure, the M32 mutant peptide (L78L81L83→A78A81A83) revealed a different protein fold to be the reason for the disturbed binding to cellular cofactors. 
From our data, we conclude that the activation domains of Rev proteins from different viral origins adopt a similar fold and that a β-structural element is involved in binding to a cellular cofactor.

  4. Assignment of protein sequences to existing domain and family classification systems: Pfam and the PDB.

    PubMed

    Xu, Qifang; Dunbrack, Roland L

    2012-11-01

    Automating the assignment of existing domain and protein family classifications to new sets of sequences is an important task. Current methods often miss assignments because remote relationships fail to achieve statistical significance. Some assignments are not as long as the actual domain definitions because local alignment methods often cut alignments short. Long insertions in query sequences often erroneously result in two copies of the domain assigned to the query. Divergent repeat sequences in proteins are often missed. We have developed a multilevel procedure to produce nearly complete assignments of protein families of an existing classification system to a large set of sequences. We apply this to the task of assigning Pfam domains to sequences and structures in the Protein Data Bank (PDB). We found that HHsearch alignments frequently scored more remotely related Pfams in Pfam clans higher than closely related Pfams, thus, leading to erroneous assignment at the Pfam family level. A greedy algorithm allowing for partial overlaps was, thus, applied first to sequence/HMM alignments, then HMM-HMM alignments and then structure alignments, taking care to join partial alignments split by large insertions into single-domain assignments. Additional assignment of repeat Pfams with weaker E-values was allowed after stronger assignments of the repeat HMM. Our database of assignments, presented in a database called PDBfam, contains Pfams for 99.4% of chains >50 residues. The Pfam assignment data in PDBfam are available at http://dunbrack2.fccc.edu/ProtCid/PDBfam, which can be searched by PDB codes and Pfam identifiers. They will be updated regularly.
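
    The greedy step described above can be sketched as follows. This is a simplified illustration only, not the PDBfam procedure: it assumes hits are ranked purely by E-value, uses a hypothetical `max_overlap` threshold of 10 residues, and does not join alignments split by insertions or handle repeat families. The family names in the example are made up.

```python
def greedy_assign(hits, max_overlap=10):
    """Greedily pick non-conflicting domain hits, best E-value first.

    hits: list of (family, start, end, evalue) with 1-based inclusive coordinates.
    max_overlap: assumed number of residues two accepted hits may share.
    """
    chosen = []
    for hit in sorted(hits, key=lambda h: h[3]):       # strongest hits first
        _, start, end, _ = hit
        conflicts = False
        for _, s, e, _ in chosen:
            overlap = min(end, e) - max(start, s) + 1  # <= 0 means no overlap
            if overlap > max_overlap:
                conflicts = True
                break
        if not conflicts:
            chosen.append(hit)
    return sorted(chosen, key=lambda h: h[1])          # report in sequence order


# Hypothetical hits on one chain.
hits = [
    ("PF_A", 1, 100, 1e-30),
    ("PF_B", 95, 200, 1e-20),   # overlaps PF_A by 6 residues: tolerated
    ("PF_C", 50, 150, 1e-10),   # overlaps PF_A by 51 residues: rejected
]
for family, start, end, evalue in greedy_assign(hits):
    print(family, start, end, evalue)
```

    Allowing a small overlap keeps adjacent domains whose alignments run slightly into each other, while still rejecting a weaker hit that covers most of an already assigned region.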

  5. Atomic resolution structure of cucurmosin, a novel type 1 ribosome-inactivating protein from the sarcocarp of Cucurbita moschata

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hou, Xiaomin; Meehan, Edward J.; Xie, Jieming

    2008-10-27

    A novel type 1 ribosome-inactivating protein (RIP) designated cucurmosin was isolated from the sarcocarp of Cucurbita moschata (pumpkin). Besides rRNA N-glycosidase activity, cucurmosin exhibits strong cytotoxicities to three cancer cell lines of both human and murine origins, but low toxicity to normal cells. Plant genomic DNA extracted from the tender leaves was amplified by PCR between primers based on the N-terminal sequence and X-ray sequence of the C-terminal. The complete mature protein sequence was obtained from N-terminal protein sequencing and partial DNA sequencing, confirmed by high resolution crystal structure analysis. The crystal structure of cucurmosin has been determined at 1.04 Å, a resolution that has never been achieved before for any RIP. The structure contains two domains: a large N-terminal domain composed of seven α-helices and eight β-strands, and a smaller C-terminal domain consisting of three α-helices and two β-strands. The high resolution structure established a glycosylation pattern of GlcNAc2Man3Xyl. Asn225 was identified as a glycosylation site. Residues Tyr70, Tyr109, Glu158 and Arg161 define the active site of cucurmosin as an RNA N-glycosidase. The structural basis of cytotoxicity difference between cucurmosin and trichosanthin is discussed.

  6. Using structural knowledge in the protein data bank to inform the search for potential host-microbe protein interactions in sequence space: application to Mycobacterium tuberculosis.

    PubMed

    Mahajan, Gaurang; Mande, Shekhar C

    2017-04-04

    A comprehensive map of the human-M. tuberculosis (MTB) protein interactome would help fill the gaps in our understanding of the disease, and computational prediction can aid and complement experimental studies towards this end. Several sequence-based in silico approaches tap the existing data on experimentally validated protein-protein interactions (PPIs); these PPIs serve as templates from which novel interactions between pathogen and host are inferred. Such comparative approaches typically make use of local sequence alignment, which, in the absence of structural details about the interfaces mediating the template interactions, could lead to incorrect inferences, particularly when multi-domain proteins are involved. We propose leveraging the domain-domain interaction (DDI) information in PDB complexes to score and prioritize candidate PPIs between host and pathogen proteomes based on targeted sequence-level comparisons. Our method picks out a small set of human-MTB protein pairs as candidates for physical interactions, and the use of functional meta-data suggests that some of them could contribute to the in vivo molecular cross-talk between pathogen and host that regulates the course of the infection. Further, we present numerical data for Pfam domain families that highlights interaction specificity on the domain level. Not every instance of a pair of domains, for which interaction evidence has been found in a few instances (i.e. structures), is likely to functionally interact. Our sorting approach scores candidates according to how "distant" they are in sequence space from known examples of DDIs (templates). Thus, it provides a natural way to deal with the heterogeneity in domain-level interactions. Our method represents a more informed application of local alignment to the sequence-based search for potential human-microbial interactions that uses available PPI data as a prior. 
Our approach is somewhat limited in its sensitivity by the restricted size and diversity of the template dataset, but, given the rapid accumulation of solved protein complex structures, its scope and utility are expected to keep steadily improving.

  7. Phylogenetic profiles reveal structural/functional determinants of TRPC3 signal-sensing antennae

    PubMed Central

    Ko, Kyung Dae; Bhardwaj, Gaurav; Hong, Yoojin; Chang, Gue Su; Kiselyov, Kirill

    2009-01-01

    Biochemical assessment of channel structure/function is incredibly challenging. Developing computational tools that provide these data would enable translational research, accelerating mechanistic experimentation for the bench scientist studying ion channels. Starting with the premise that protein sequence encodes information about structure, function and evolution (SF&E), we developed a unified framework for inferring SF&E from sequence information using a knowledge-based approach. The Gestalt Domain Detection Algorithm-Basic Local Alignment Tool (GDDA-BLAST) provides phylogenetic profiles that can model, ab initio, SF&E relationships of biological sequences at the whole protein, single domain and single-amino acid level. In our recent paper, we have applied GDDA-BLAST analysis to study canonical TRP (TRPC) channels and empirically validated predicted lipid-binding and trafficking activities contained within the TRPC3 TRP_2 domain of unknown function. Overall, our in silico, in vitro, and in vivo experiments support a model in which TRPC3 has signal-sensing antennae which are adorned with lipid-binding, trafficking and calmodulin regulatory domains. In this Addendum, we correlate our functional domain analysis with the cryo-EM structure of TRPC3. In addition, we synthesize recent studies with our new findings to provide a refined model on the mechanism(s) of TRPC3 activation/deactivation. PMID:19704910

  8. Structure-Based Phylogenetic Analysis of the Lipocalin Superfamily.

    PubMed

    Lakshmi, Balasubramanian; Mishra, Madhulika; Srinivasan, Narayanaswamy; Archunan, Govindaraju

    2015-01-01

    Lipocalins constitute a superfamily of extracellular proteins that are found in all three kingdoms of life. Although very divergent in their sequences and functions, they show remarkable similarity in 3-D structures. Lipocalins bind and transport small hydrophobic molecules. Earlier sequence-based phylogenetic studies of lipocalins highlighted that they have a long evolutionary history. However the molecular and structural basis of their functional diversity is not completely understood. The main objective of the present study is to understand functional diversity of the lipocalins using a structure-based phylogenetic approach. The present study with 39 protein domains from the lipocalin superfamily suggests that the clusters of lipocalins obtained by structure-based phylogeny correspond well with the functional diversity. The detailed analysis on each of the clusters and sub-clusters reveals that the 39 lipocalin domains cluster based on their mode of ligand binding though the clustering was performed on the basis of gross domain structure. The outliers in the phylogenetic tree are often from single member families. Also structure-based phylogenetic approach has provided pointers to assign putative function for the domains of unknown function in lipocalin family. The approach employed in the present study can be used in the future for the functional identification of new lipocalin proteins and may be extended to other protein families where members show poor sequence similarity but high structural similarity.

  9. Conservation of the Human Integrin-Type Beta-Propeller Domain in Bacteria

    PubMed Central

    Chouhan, Bhanupratap; Denesyuk, Alexander; Heino, Jyrki; Johnson, Mark S.; Denessiouk, Konstantin

    2011-01-01

    Integrins are heterodimeric cell-surface receptors with key functions in cell-cell and cell-matrix adhesion. Integrin α and β subunits are present throughout the metazoans, but it is unclear whether the subunits predate the origin of multicellular organisms. Several component domains have been detected in bacteria, one of which, a specific 7-bladed β-propeller domain, is a unique feature of the integrin α subunits. Here, we describe a structure-derived motif, which incorporates key features of each blade from the X-ray structures of human αIIbβ3 and αVβ3, includes elements of the FG-GAP/Cage and Ca2+-binding motifs, and is specific only for the metazoan integrin domains. Separately, we searched all available sequences from bacteria and unicellular eukaryotic organisms for metazoan integrin-type β-propeller domains, requiring seven repeats, corresponding to the seven blades of the β-propeller domain, with the newly found structure-derived motif present in every repeat. As a result, among 47 available genomes of unicellular eukaryotes we could not find a single instance of seven repeats with the motif. Several sequences contained three repeats, a predicted transmembrane segment, and a short cytoplasmic motif associated with some integrins, but otherwise differ from the metazoan integrin α subunits. Among the available bacterial sequences, we found five examples containing seven sequential metazoan integrin-specific motifs within the seven repeats. The motifs differ in having one Ca2+-binding site per repeat, whereas metazoan integrins have three or four sites. The bacterial sequences are more conserved in terms of motif conservation and loop length, suggesting that the structure is more regular and compact than those example structures from human integrins. 
Although the bacterial examples are not full-length integrins, the full-length metazoan-type 7-bladed β-propeller domains are present, and sometimes two tandem copies are found. PMID:22022374

  10. The structure of cell adhesion molecule uvomorulin. Insights into the molecular mechanism of Ca2+-dependent cell adhesion.

    PubMed Central

    Ringwald, M; Schuh, R; Vestweber, D; Eistetter, H; Lottspeich, F; Engel, J; Dölz, R; Jähnig, F; Epplen, J; Mayer, S

    1987-01-01

    We have determined the amino acid sequence of the Ca2+-dependent cell adhesion molecule uvomorulin as it appears on the cell surface. The extracellular part of the molecule exhibits three internally repeated domains of 112 residues which are most likely generated by gene duplication. Each of the repeated domains contains two highly conserved units which could represent putative Ca2+-binding sites. Secondary structure predictions suggest that the putative Ca2+-binding units are located in external loops at the surface of the protein. The protein sequence exhibits a single membrane-spanning region and a cytoplasmic domain. Sequence comparison reveals extensive homology to the chicken L-CAM. Both uvomorulin and L-CAM are identical in 65% of their entire amino acid sequence suggesting a common origin for both CAMs. PMID:3501370

  11. The Bologna Annotation Resource (BAR 3.0): improving protein functional annotation

    PubMed Central

    Casadio, Rita

    2017-01-01

    BAR 3.0 updates our server BAR (Bologna Annotation Resource) for predicting protein structural and functional features from sequence. We increase data volume, query capabilities and information conveyed to the user. The core of BAR 3.0 is a graph-based clustering procedure of UniProtKB sequences, following strict pairwise similarity criteria (sequence identity ≥40% with alignment coverage ≥90%). Each cluster contains the available annotation downloaded from UniProtKB, GO, PFAM and PDB. After statistical validation, GO terms and PFAM domains are cluster-specific and annotate new sequences entering the cluster after satisfying similarity constraints. BAR 3.0 includes 28 869 663 sequences in 1 361 773 clusters, of which 22.2% (22 241 661 sequences) and 47.4% (24 555 055 sequences) have at least one validated GO term and one PFAM domain, respectively. 1.4% of the clusters (36% of all sequences) include PDB structures and the cluster is associated to a hidden Markov model that allows building template-target alignment suitable for structural modeling. Some other 3 399 026 sequences are singletons. BAR 3.0 offers an improved search interface, allowing queries by UniProtKB-accession, Fasta sequence, GO-term, PFAM-domain, organism, PDB and ligand/s. When evaluated on the CAFA2 targets, BAR 3.0 largely outperforms our previous version and scores among state-of-the-art methods. BAR 3.0 is publicly available and accessible at http://bar.biocomp.unibo.it/bar3. PMID:28453653
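
    One plausible reading of the graph-based clustering step is sketched below: sequences are nodes, an edge joins two sequences whose pairwise alignment meets both thresholds (identity ≥40%, coverage ≥90%), and clusters are the connected components. Treating clusters as connected components is an assumption made here for illustration, and the `cluster_by_similarity` function and the example pairs are hypothetical, not part of the BAR 3.0 server.

```python
from collections import defaultdict


def cluster_by_similarity(pairs, min_identity=40.0, min_coverage=90.0):
    """Group sequence IDs into connected components of the similarity graph.

    pairs: iterable of (id_a, id_b, identity_pct, coverage_pct) tuples.
    Returns a list of clusters, each a sorted list of sequence IDs.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, coverage in pairs:
        root_a, root_b = find(a), find(b)   # registers both IDs, even if unlinked
        if identity >= min_identity and coverage >= min_coverage:
            if root_a != root_b:
                parent[root_b] = root_a     # merge the two clusters

    clusters = defaultdict(set)
    for seq_id in list(parent):
        clusters[find(seq_id)].add(seq_id)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


# Hypothetical pairwise alignment results.
pairs = [
    ("A", "B", 55.0, 95.0),   # passes both thresholds
    ("B", "C", 42.0, 91.0),   # passes both thresholds
    ("C", "D", 80.0, 50.0),   # coverage too low
    ("E", "F", 30.0, 99.0),   # identity too low
]
print(cluster_by_similarity(pairs))
```

    Sequences that never pass both thresholds with any partner end up alone in their cluster, which corresponds to the singletons mentioned in the abstract.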

  12. Ermelin, an endoplasmic reticulum transmembrane protein, contains the novel HELP domain conserved in eukaryotes.

    PubMed

    Suzuki, Akiko; Endo, Takeshi

    2002-02-06

    We have cloned a cDNA encoding a novel protein referred to as ermelin from mouse C2 skeletal muscle cells. This protein contained six hydrophobic amino acid stretches corresponding to transmembrane domains, two histidine-rich sequences, and a sequence homologous to the fusion peptides of certain fusion proteins. Ermelin also contained a novel modular sequence, designated as HELP domain, which was highly conserved among eukaryotes, from yeast to higher plants and animals. All these HELP domain-containing proteins, including mouse KE4, Drosophila Catsup, and Arabidopsis IAR1, possessed multipass transmembrane domains and histidine-rich sequences. Ermelin was predominantly expressed in brain and testis, and induced during neuronal differentiation of N1E-115 neuroblastoma cells but downregulated during myogenic differentiation of C2 cells. The mRNA was accumulated in hippocampus and cerebellum of brain and central areas of seminiferous tubules in testis. Epitope-tagging experiments located ermelin and KE4 to a network structure throughout the cytoplasm. Staining with the fluorescent dye DiOC(6)(3) identified this structure as the endoplasmic reticulum. These results suggest that at least some, if not all, of the HELP domain-containing proteins are multipass endoplasmic reticulum membrane proteins with functions conserved among eukaryotes.

  13. Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring

    PubMed Central

    2012-01-01

    Background Much progress has been made in understanding the 3D structure of proteins using methods such as NMR and X-ray crystallography. The resulting 3D structures are extremely informative, but do not always reveal which sites and residues within the structure are of special importance. Recently, there are indications that multiple-residue, sub-domain structural relationships within the larger 3D consensus structure of a protein can be inferred from the analysis of the multiple sequence alignment data of a protein family. These intra-dependent clusters of associated sites are used to indicate hierarchical inter-residue relationships within the 3D structure. To reveal the patterns of associations among individual amino acids or sub-domain components within the structure, we apply a k-modes attribute (aligned site) clustering algorithm to the ubiquitin and transthyretin families in order to discover associations among groups of sites within the multiple sequence alignment. We then observe what these associations imply within the 3D structure of these two protein families. Results The k-modes site clustering algorithm we developed maximizes the intra-group interdependencies based on a normalized mutual information measure. The clusters formed correspond to sub-structural components or binding and interface locations. Applying this data-directed method to the ubiquitin and transthyretin protein family multiple sequence alignments as a test bed, we located numerous interesting associations of interdependent sites. These clusters were then arranged into cluster tree diagrams which revealed four structural sub-domains within the single domain structure of ubiquitin and a single large sub-domain within transthyretin associated with the interface among transthyretin monomers. In addition, several clusters of mutually interdependent sites were discovered for each protein family, each of which appear to play an important role in the molecular structure and/or function. 
Conclusions Our results demonstrate that the method we present here using a k-modes site clustering algorithm based on interdependency evaluation among sites obtained from a sequence alignment of homologous proteins can provide significant insights into the complex, hierarchical inter-residue structural relationships within the 3D structure of a protein family. PMID:22793672
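
    The interdependency measure at the heart of the site clustering can be illustrated with a short sketch that computes a normalized mutual information between two columns of a multiple sequence alignment. The abstract does not state which normalization is used; dividing by the joint entropy is an assumption made here, and the function names and toy columns are hypothetical.

```python
import math
from collections import Counter


def entropy(symbols):
    """Shannon entropy (bits) of the empirical distribution of symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def normalized_mutual_information(col_x, col_y):
    """MI(X;Y) / H(X,Y) for two aligned columns; 0 = independent, 1 = fully dependent.

    col_x, col_y: equal-length sequences of residues, one per aligned protein.
    Normalizing by the joint entropy is an assumed choice.
    """
    if len(col_x) != len(col_y):
        raise ValueError("columns must come from the same alignment")
    h_x = entropy(col_x)
    h_y = entropy(col_y)
    h_xy = entropy(list(zip(col_x, col_y)))
    if h_xy == 0:
        return 0.0  # both columns are fully conserved: nothing to share
    return (h_x + h_y - h_xy) / h_xy


# Toy columns from a four-sequence alignment.
print(normalized_mutual_information("AAGG", "CCTT"))  # sites vary together
print(normalized_mutual_information("AAGG", "CTCT"))  # sites vary independently
```

    A k-modes style site clustering would then group columns so that each column has high values of this measure with the representative (mode) column of its cluster.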

  14. Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring.

    PubMed

    Durston, Kirk K; Chiu, David Ky; Wong, Andrew Kc; Li, Gary Cl

    2012-07-13

    Much progress has been made in understanding the 3D structure of proteins using methods such as NMR and X-ray crystallography. The resulting 3D structures are extremely informative, but do not always reveal which sites and residues within the structure are of special importance. Recently, there are indications that multiple-residue, sub-domain structural relationships within the larger 3D consensus structure of a protein can be inferred from the analysis of the multiple sequence alignment data of a protein family. These intra-dependent clusters of associated sites are used to indicate hierarchical inter-residue relationships within the 3D structure. To reveal the patterns of associations among individual amino acids or sub-domain components within the structure, we apply a k-modes attribute (aligned site) clustering algorithm to the ubiquitin and transthyretin families in order to discover associations among groups of sites within the multiple sequence alignment. We then observe what these associations imply within the 3D structure of these two protein families. The k-modes site clustering algorithm we developed maximizes the intra-group interdependencies based on a normalized mutual information measure. The clusters formed correspond to sub-structural components or binding and interface locations. Applying this data-directed method to the ubiquitin and transthyretin protein family multiple sequence alignments as a test bed, we located numerous interesting associations of interdependent sites. These clusters were then arranged into cluster tree diagrams which revealed four structural sub-domains within the single domain structure of ubiquitin and a single large sub-domain within transthyretin associated with the interface among transthyretin monomers. In addition, several clusters of mutually interdependent sites were discovered for each protein family, each of which appear to play an important role in the molecular structure and/or function. 
Our results demonstrate that the method we present here using a k-modes site clustering algorithm based on interdependency evaluation among sites obtained from a sequence alignment of homologous proteins can provide significant insights into the complex, hierarchical inter-residue structural relationships within the 3D structure of a protein family.

  15. Mutation of domain III and domain VI in L gene conserved domain of Nipah virus

    NASA Astrophysics Data System (ADS)

    Jalani, Siti Aishah; Ibrahim, Nazlina

    2016-11-01

    Nipah virus (NiV) is the etiologic agent of a respiratory illness and causes fatal encephalitis in humans. The NiV L protein subunit is thought to be responsible for the majority of the enzymatic activities involved in viral transcription and replication. The L protein, which is the viral RNA-dependent RNA polymerase, has high sequence homology among negative-sense RNA viruses. In negative-stranded RNA viruses, six conserved domains (domains I-VI) have been identified by sequence alignment. The domains are separated by variable regions, which suggests that the structure consists of concatenated functional domains. To directly address the roles of domains III and VI, site-directed mutations were constructed by substituting bases at positions 2497, 2500, 5528 and 5532. Each mutated L gene can be used in future studies to test its expression by in vitro translation.

  16. Use of designed sequences in protein structure recognition.

    PubMed

    Kumar, Gayatri; Mudgal, Richa; Srinivasan, Narayanaswamy; Sandhya, Sankaran

    2018-05-09

    Knowledge of the protein structure is a pre-requisite for improved understanding of molecular function. The gap in the sequence-structure space has increased in the post-genomic era. Grouping related protein sequences into families can aid in narrowing the gap. In the Pfam database, structure description is provided for part or full-length proteins of 7726 families. For the remaining 52% of the families, information on 3-D structure is not yet available. We use computationally designed sequences that are intermediately related to two protein domain families, which are already known to share the same fold. These strategically designed sequences enable detection of distant relationships and here, we have employed them for the purpose of structure recognition of protein families of yet unknown structure. We first measured the success rate of our approach using a dataset of protein families of known fold and achieved a success rate of 88%. Next, for 1392 families of yet unknown structure, we made structural assignments for part/full length of the proteins. Fold associations for 423 domains of unknown function (DUFs) are provided as a step towards functional annotation. The results indicate that knowledge-based filling of gaps in protein sequence space is a lucrative approach for structure recognition. Such sequences assist in traversal through protein sequence space and effectively function as 'linkers', where natural linkers between distant proteins are unavailable. This article was reviewed by Oliviero Carugo, Christine Orengo and Srikrishna Subramanian.

  17. Comparative analyses of putative toxin gene homologs from an Old World viper, Daboia russelii

    PubMed Central

    Krishnan, Neeraja M.

    2017-01-01

    Availability of snake genome sequences has opened up exciting areas of research on comparative genomics and gene diversity. One of the challenges in studying snake genomes is the acquisition of biological material from live animals, especially from the venomous ones, making the process cumbersome and time-consuming. Here, we report comparative sequence analyses of putative toxin gene homologs from Russell’s viper (Daboia russelii) using whole-genome sequencing data obtained from shed skin. When compared with the major venom proteins in Russell’s viper studied previously, we found 45–100% sequence similarity between the venom proteins and their putative homologs in the skin. Additionally, comparative analyses of 20 putative toxin gene family homologs provided evidence of unique sequence motifs in nerve growth factor (NGF), platelet derived growth factor (PDGF), Kunitz/Bovine pancreatic trypsin inhibitor (Kunitz BPTI), cysteine-rich secretory proteins, antigen 5, and pathogenesis-related 1 proteins (CAP) and cysteine-rich secretory protein (CRISP). In those derived proteins, we identified V11 and T35 in the NGF domain; F23 and A29 in the PDGF domain; N69, K2 and A5 in the CAP domain; and Q17 in the CRISP domain to be responsible for differences in the largest pockets across the protein domain structures in crotalines, viperines and elapids from the in silico structure-based analysis. Similarly, residues F10, Y11 and E20 appear to play an important role in the protein structures across the Kunitz protein domain of viperids and elapids. Our study highlights the usefulness of shed skin in obtaining good quality high-molecular weight DNA for comparative genomic studies, and provides evidence towards the unique features and evolution of putative venom gene homologs in vipers. PMID:29230357

  18. The crystal structure of a bacterial Sufu-like protein defines a novel group of bacterial proteins that are similar to the N-terminal domain of human Sufu

    PubMed Central

    Das, Debanu; Finn, Robert D; Abdubek, Polat; Astakhova, Tamara; Axelrod, Herbert L; Bakolitsa, Constantina; Cai, Xiaohui; Carlton, Dennis; Chen, Connie; Chiu, Hsiu-Ju; Chiu, Michelle; Clayton, Thomas; Deller, Marc C; Duan, Lian; Ellrott, Kyle; Farr, Carol L; Feuerhelm, Julie; Grant, Joanna C; Grzechnik, Anna; Han, Gye Won; Jaroszewski, Lukasz; Jin, Kevin K; Klock, Heath E; Knuth, Mark W; Kozbial, Piotr; Sri Krishna, S; Kumar, Abhinav; Lam, Winnie W; Marciano, David; Miller, Mitchell D; Morse, Andrew T; Nigoghossian, Edward; Nopakun, Amanda; Okach, Linda; Puckett, Christina; Reyes, Ron; Tien, Henry J; Trame, Christine B; van den Bedem, Henry; Weekes, Dana; Wooten, Tiffany; Xu, Qingping; Yeh, Andrew; Zhou, Jiadong; Hodgson, Keith O; Wooley, John; Elsliger, Marc-André; Deacon, Ashley M; Godzik, Adam; Lesley, Scott A; Wilson, Ian A

    2010-01-01

    Sufu (Suppressor of Fused), a two-domain protein, plays a critical role in regulating Hedgehog signaling and is conserved from flies to humans. A few bacterial Sufu-like proteins have previously been identified based on sequence similarity to the N-terminal domain of eukaryotic Sufu proteins, but none have been structurally or biochemically characterized and their function in bacteria is unknown. We have determined the crystal structure of a more distantly related Sufu-like homolog, NGO1391 from Neisseria gonorrhoeae, at 1.4 Å resolution, which provides the first biophysical characterization of a bacterial Sufu-like protein. The structure revealed a striking similarity to the N-terminal domain of human Sufu (r.m.s.d. of 2.6 Å over 93% of the NGO1391 protein), despite an extremely low sequence identity of ∼15%. Subsequent sequence analysis revealed that NGO1391 defines a new subset of smaller, Sufu-like proteins that are present in ∼200 bacterial species and has resulted in expansion of the SUFU (PF05076) family in Pfam. PMID:20836087

  19. Molecular modelling of the Norrie disease protein predicts a cystine knot growth factor tertiary structure.

    PubMed

    Meitinger, T; Meindl, A; Bork, P; Rost, B; Sander, C; Haasemann, M; Murken, J

    1993-12-01

    The X-linked gene for Norrie disease, which is characterized by blindness, deafness and mental retardation, has been cloned recently. This gene has been thought to code for a putative extracellular factor; its predicted amino acid sequence is homologous to the C-terminal domain of diverse extracellular proteins. Sequence pattern searches and three-dimensional modelling now suggest that the Norrie disease protein (NDP) has a tertiary structure similar to that of transforming growth factor beta (TGF beta). Our model identifies NDP as a member of an emerging family of growth factors containing a cystine knot motif, with direct implications for the physiological role of NDP. The model also sheds light on sequence related domains such as the C-terminal domain of mucins and of von Willebrand factor.

  20. Identification of two allelic IgG1 C(H) coding regions (Cgamma1) of cat.

    PubMed

    Kanai, T H; Ueda, S; Nakamura, T

    2000-01-31

    Two types of cDNA encoding IgG1 heavy chain (gamma1) were isolated from a single domestic short-hair cat. Sequence analysis indicated a higher level of similarity of these Cgamma1 sequences to human Cgamma1 sequence (76.9 and 77.0%) than to mouse sequence (70.0 and 69.7%) at the nucleotide level. Predicted primary structures of both the feline Cgamma1 genes, designated as Cgamma1a and Cgamma1b, were similar to that of human Cgamma1 gene, for instance, as to the size of constant domains, the presence of six conserved cysteine residues involved in formation of the domain structure, and the location of a conserved N-linked glycosylation site. Sequence comparison between the two alleles showed that 7 out of 10 nucleotide differences were within the C(H)3 domain coding region, all leading to nonsynonymous changes in amino acid residues. Partial sequence analysis of genomic clones showed three nucleotide substitutions between the two Cgamma1 alleles in the intron between the C(H)2 and C(H)3 domain coding regions. In 12 domestic short-hair cats used in this study, the frequency of Cgamma1a allele (62.5%) was higher than that of the Cgamma1b allele (37.5%).

  1. Assignment of protein sequences to existing domain and family classification systems: Pfam and the PDB

    PubMed Central

    Dunbrack, Roland L.

    2012-01-01

    Motivation: Automating the assignment of existing domain and protein family classifications to new sets of sequences is an important task. Current methods often miss assignments because remote relationships fail to achieve statistical significance. Some assignments are not as long as the actual domain definitions because local alignment methods often cut alignments short. Long insertions in query sequences often erroneously result in two copies of the domain assigned to the query. Divergent repeat sequences in proteins are often missed. Results: We have developed a multilevel procedure to produce nearly complete assignments of protein families of an existing classification system to a large set of sequences. We apply this to the task of assigning Pfam domains to sequences and structures in the Protein Data Bank (PDB). We found that HHsearch alignments frequently scored more remotely related Pfams in Pfam clans higher than closely related Pfams, thus, leading to erroneous assignment at the Pfam family level. A greedy algorithm allowing for partial overlaps was, thus, applied first to sequence/HMM alignments, then HMM–HMM alignments and then structure alignments, taking care to join partial alignments split by large insertions into single-domain assignments. Additional assignment of repeat Pfams with weaker E-values was allowed after stronger assignments of the repeat HMM. Our database of assignments, presented in a database called PDBfam, contains Pfams for 99.4% of chains >50 residues. Availability: The Pfam assignment data in PDBfam are available at http://dunbrack2.fccc.edu/ProtCid/PDBfam, which can be searched by PDB codes and Pfam identifiers. They will be updated regularly. Contact: Roland.Dunbracks@fccc.edu PMID:22942020
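
The greedy step described above can be sketched as follows. This is a minimal illustration, not the PDBfam procedure itself: the `Hit` record, the family names, and the `max_overlap` threshold of 10 residues are all hypothetical, and the real pipeline also joins split alignments and handles repeats, which is omitted here.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    family: str    # hypothetical family label
    start: int     # 1-based, inclusive
    end: int       # inclusive
    evalue: float


def overlap(a, b):
    """Number of residues shared by two hits on the same query."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def greedy_assign(hits, max_overlap=10):
    """Accept hits from best to worst E-value, tolerating small overlaps.

    Assumption: max_overlap is an illustrative threshold, not a value
    taken from the source.
    """
    accepted = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        if all(overlap(hit, kept) <= max_overlap for kept in accepted):
            accepted.append(hit)
    return sorted(accepted, key=lambda h: h.start)


hits = [
    Hit("FamA", 1, 100, 1e-30),
    Hit("FamB", 95, 200, 1e-20),  # overlaps FamA by 6 residues: tolerated
    Hit("FamC", 50, 150, 1e-10),  # overlaps FamA by 51 residues: rejected
]
assigned = [h.family for h in greedy_assign(hits)]
```

Running the same function first on sequence/HMM hits, then on HMM-HMM hits and structure-alignment hits, with the already accepted list carried forward, would mirror the multilevel order the abstract describes.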

  2. From Structure to Function: A Comprehensive Compendium of Tools to Unveil Protein Domains and Understand Their Role in Cytokinesis.

    PubMed

    Rincon, Sergio A; Paoletti, Anne

    2016-01-01

    Unveiling the function of a novel protein is a challenging task that requires careful experimental design. Yeast cytokinesis is a conserved process that involves modular structural and regulatory proteins. For such proteins, an important step is to identify their domains and structural organization. Here we briefly discuss a collection of methods commonly used for sequence alignment and prediction of protein structure that represent powerful tools for the identification of homologous domains and the design of structure-function approaches to test experimentally the function of multi-domain proteins such as those implicated in yeast cytokinesis.

  3. Isolation and in silico analysis of a novel H+-pyrophosphatase gene orthologue from the halophytic grass Leptochloa fusca

    NASA Astrophysics Data System (ADS)

    Rauf, Muhammad; Saeed, Nasir A.; Habib, Imran; Ahmed, Moddassir; Shahzad, Khurram; Mansoor, Shahid; Ali, Rashid

    2017-02-01

    Structure prediction can provide information about the function and active sites of a protein, which helps in designing new functional proteins. H+-pyrophosphatase is a transmembrane protein involved in establishing the proton motive force for active transport of Na+ across the membrane by Na+/H+ antiporters. A novel full-length H+-pyrophosphatase gene was isolated from the halophytic grass Leptochloa fusca using RT-PCR and RACE. The full-length LfVP1 gene sequence of 2292 nucleotides encodes a protein of 764 amino acids. DNA and protein sequences were characterized using bioinformatics tools. Various important potential sites were predicted by the PROSITE webserver. Primary structural analysis showed LfVP1 to be a stable protein, and the grand average of hydropathy (GRAVY) indicated that the LfVP1 protein has good hydrosolubility. Secondary structure analysis showed that the LfVP1 protein sequence contains a significant proportion of alpha helix and random coil. Protein membrane topology suggested the presence of 14 transmembrane domains and the presence of a catalytic domain in TM3. The three-dimensional structure predicted from the LfVP1 protein sequence also indicated the presence of 14 transmembrane domains, and the hydrophobicity surface model showed amino acid hydrophobicity. The Ramachandran plot showed that 98% of amino acid residues were predicted in the favored region.
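
As a reminder of what the GRAVY value measures, here is a minimal sketch: it is the mean Kyte-Doolittle hydropathy over all residues, with negative values indicating a more hydrophilic sequence. The two short peptides are made-up examples, not fragments of LfVP1, and the abstract does not state which tool computed its value.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy: summed residue scores / sequence length."""
    sequence = sequence.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


# Hypothetical peptides chosen only to show the sign convention.
hydrophobic_score = gravy("AILV")   # positive: hydrophobic
hydrophilic_score = gravy("KDER")   # negative: hydrophilic

# Consistency check on the figures in the abstract: 2292 nt / 3 = 764 codons.
codons = 2292 // 3
```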

  4. Using Common Spatial Distributions of Atoms to Relate Functionally Divergent Influenza Virus N10 and N11 Protein Structures to Functionally Characterized Neuraminidase Structures, Toxin Cell Entry Domains, and Non-Influenza Virus Cell Entry Domains

    PubMed Central

    Weininger, Arthur; Weininger, Susan

    2015-01-01

    The ability to identify the functional correlates of structural and sequence variation in proteins is a critical capability. We related structures of influenza A N10 and N11 proteins that have no established function to structures of proteins with known function by identifying spatially conserved atoms. We identified atoms with common distributed spatial occupancy in PDB structures of N10 protein, N11 protein, an influenza A neuraminidase, an influenza B neuraminidase, and a bacterial neuraminidase. By superposing these spatially conserved atoms, we aligned the structures and associated molecules. We report spatially and sequence invariant residues in the aligned structures. Spatially invariant residues in the N6 and influenza B neuraminidase active sites were found in previously unidentified spatially equivalent sites in the N10 and N11 proteins. We found the corresponding secondary and tertiary structures of the aligned proteins to be largely identical despite significant sequence divergence. We found structural precedent in known non-neuraminidase structures for residues exhibiting structural and sequence divergence in the aligned structures. In N10 protein, we identified staphylococcal enterotoxin I-like domains. In N11 protein, we identified hepatitis E E2S-like domains, SARS spike protein-like domains, and toxin components shared by alpha-bungarotoxin, staphylococcal enterotoxin I, anthrax lethal factor, clostridium botulinum neurotoxin, and clostridium tetanus toxin. The presence of active site components common to the N6, influenza B, and S. pneumoniae neuraminidases in the N10 and N11 proteins, combined with the absence of apparent neuraminidase function, suggests that the role of neuraminidases in H17N10 and H18N11 emerging influenza A viruses may have changed. 
The presentation of E2S-like, SARS spike protein-like, or toxin-like domains by the N10 and N11 proteins in these emerging viruses may indicate that H17N10 and H18N11 sialidase-facilitated cell entry has been supplemented or replaced by sialidase-independent receptor binding to an expanded cell population that may include neurons and T-cells. PMID:25706124

  5. A versatile palindromic amphipathic repeat coding sequence horizontally distributed among diverse bacterial and eucaryotic microbes

    PubMed Central

    2010-01-01

    Background Intragenic tandem repeats occur throughout all domains of life and impart functional and structural variability to diverse translation products. Repeat proteins confer distinctive surface phenotypes to many unicellular organisms, including those with minimal genomes such as the wall-less bacterial monoderms, Mollicutes. One such repeat pattern in this clade is distributed in a manner suggesting its exchange by horizontal gene transfer (HGT). Expanding genome sequence databases reveal the pattern in a widening range of bacteria, and recently among eucaryotic microbes. We examined the genomic flux and consequences of the motif by determining its distribution, predicted structural features and association with membrane-targeted proteins. Results Using a refined hidden Markov model, we document a 25-residue protein sequence motif tandemly arrayed in variable-number repeats in ORFs lacking assigned functions. It appears sporadically in unicellular microbes from disparate bacterial and eucaryotic clades, representing diverse lifestyles and ecological niches that include host parasitic, marine and extreme environments. Tracts of the repeats predict a malleable configuration of recurring domains, with conserved hydrophobic residues forming an amphipathic secondary structure in which hydrophilic residues endow extensive sequence variation. Many ORFs with these domains also have membrane-targeting sequences that predict assorted topologies; others may comprise reservoirs of sequence variants. We demonstrate expressed variants among surface lipoproteins that distinguish closely related animal pathogens belonging to a subgroup of the Mollicutes. DNA sequences encoding the tandem domains display dyad symmetry. Moreover, in some taxa the domains occur in ORFs selectively associated with mobile elements. 
These features, a punctate phylogenetic distribution, and different patterns of dispersal in genomes of related taxa, suggest that the repeat may be disseminated by HGT and intra-genomic shuffling. Conclusions We describe novel features of PARCELs (Palindromic Amphipathic Repeat Coding ELements), a set of widely distributed repeat protein domains and coding sequences that were likely acquired through HGT by diverse unicellular microbes, further mobilized and diversified within genomes, and co-opted for expression in the membrane proteome of some taxa. Disseminated by multiple gene-centric vehicles, ORFs harboring these elements enhance accessory gene pools as part of the "mobilome" connecting genomes of various clades, in taxa sharing common niches. PMID:20626840

  6. A de novo redesign of the WW domain

    PubMed Central

    Kraemer-Pecore, Christina M.; Lecomte, Juliette T.J.; Desjarlais, John R.

    2003-01-01

    We have used a sequence prediction algorithm and a novel sampling method to design protein sequences for the WW domain, a small β-sheet motif. The procedure, referred to as SPANS, designs sequences to be compatible with an ensemble of closely related polypeptide backbones, mimicking the inherent flexibility of proteins. Two designed sequences (termed SPANS-WW1 and SPANS-WW2), using only naturally occurring l-amino acids, were selected for study and the corresponding polypeptides were prepared in Escherichia coli. Circular dichroism data suggested that both purified polypeptides adopted secondary structure features related to those of the target without the aid of disulfide bridges or bound cofactors. The structure exhibited by SPANS-WW2 melted cooperatively by raising the temperature of the solution. Further analysis of this polypeptide by proton nuclear magnetic resonance spectroscopy demonstrated that at 5°C, it folds into a structure closely resembling a natural WW domain. This achievement constitutes one of a small number of successful de novo protein designs through fully automated computational methods and highlights the feasibility of including backbone flexibility in the design strategy. PMID:14500877

  7. A de novo redesign of the WW domain.

    PubMed

    Kraemer-Pecore, Christina M; Lecomte, Juliette T J; Desjarlais, John R

    2003-10-01

    We have used a sequence prediction algorithm and a novel sampling method to design protein sequences for the WW domain, a small beta-sheet motif. The procedure, referred to as SPANS, designs sequences to be compatible with an ensemble of closely related polypeptide backbones, mimicking the inherent flexibility of proteins. Two designed sequences (termed SPANS-WW1 and SPANS-WW2), using only naturally occurring L-amino acids, were selected for study and the corresponding polypeptides were prepared in Escherichia coli. Circular dichroism data suggested that both purified polypeptides adopted secondary structure features related to those of the target without the aid of disulfide bridges or bound cofactors. The structure exhibited by SPANS-WW2 melted cooperatively by raising the temperature of the solution. Further analysis of this polypeptide by proton nuclear magnetic resonance spectroscopy demonstrated that at 5 degrees C, it folds into a structure closely resembling a natural WW domain. This achievement constitutes one of a small number of successful de novo protein designs through fully automated computational methods and highlights the feasibility of including backbone flexibility in the design strategy.

  8. Structure, replication efficiency and fragility of yeast ARS elements.

    PubMed

    Dhar, Manoj K; Sehgal, Shelly; Kaul, Sanjana

    2012-05-01

    DNA replication in eukaryotes initiates at specific sites known as origins of replication, or replicators. These replication origins occur throughout the genome, though the propensity of their occurrence depends on the type of organism. In eukaryotes, zones of initiation of replication spanning from about 100 to 50,000 base pairs have been reported. The characteristics of eukaryotic replication origins are best understood in the budding yeast Saccharomyces cerevisiae, where some autonomously replicating sequences, or ARS elements, confer origin activity. ARS elements are short DNA sequences of a few hundred base pairs, identified by their efficiency at initiating a replication event when cloned in a plasmid. ARS elements, although structurally diverse, maintain a basic structure composed of three domains, A, B and C. Domain A is comprised of a consensus sequence designated ACS (ARS consensus sequence), while the B domain has the DNA unwinding element and the C domain is important for DNA-protein interactions. Although there are ∼400 ARS elements in the yeast genome, not all of them are active origins of replication. Different groups within the genus Saccharomyces have ARS elements as components of replication origin. The present paper provides a comprehensive review of various aspects of ARSs, starting from their structural conservation to sequence thermodynamics. All significant and conserved functional sequence motifs within different types of ARS elements have been extensively described. Issues like silencing at ARSs, their inherent fragility and factors governing their replication efficiency have also been addressed. Progress in understanding crucial components associated with the replication machinery and timing at these ARS elements is discussed in the section entitled "The replicon revisited". Copyright © 2012 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.

  9. A comparative analysis of the foamy and ortho virus capsid structures reveals an ancient domain duplication.

    PubMed

    Taylor, William R; Stoye, Jonathan P; Taylor, Ian A

    2017-04-04

    The Spumaretrovirinae (foamy viruses) and the Orthoretrovirinae (e.g. HIV) share many similarities both in genome structure and the sequences of the core viral encoded proteins, such as the aspartyl protease and reverse transcriptase. Similarity in the gag region of the genome is less obvious at the sequence level but has been illuminated by the recent solution of the foamy virus capsid (CA) structure. This revealed a clear structural similarity to the orthoretrovirus capsids but with marked differences that left uncertainty in the relationship between the two domains that comprise the structure. We have applied protein structure comparison methods in order to try and resolve this ambiguous relationship. These included both the DALI method and the SAP method, with rigorous statistical tests applied to the results of both methods. For this, we employed collections of artificial fold 'decoys' (generated from the pair of native structures being compared) to provide a customised background distribution for each comparison, thus allowing significance levels to be estimated. We have shown that the relationship of the two domains conforms to a simple linear correspondence rather than a domain transposition. These similarities suggest that the origin of both viral capsids was a common ancestor with a double domain structure. In addition, we show that there is also a significant structural similarity between the amino and carboxy domains in both the foamy and ortho viruses. These results indicate that, as well as the duplication of the double domain capsid, there may have been an even more ancient gene-duplication that preceded the double domain structure. In addition, our structure comparison methodology demonstrates a general approach to problems where the components have a high intrinsic level of similarity.
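
The idea of a customised decoy background can be sketched in a few lines. This is only an illustration under stated assumptions: the scores are invented, higher is taken to mean more similar, and the z-score plus add-one empirical p-value shown here stand in for whatever statistics were actually applied to the DALI and SAP results.

```python
from statistics import mean, stdev


def decoy_significance(native_score, decoy_scores):
    """Compare a native comparison score with scores from decoy comparisons.

    Returns (z, p): the z-score against the decoy distribution and an
    empirical p-value with an add-one correction so that p is never zero.
    Assumption: higher scores mean greater structural similarity.
    """
    mu = mean(decoy_scores)
    sigma = stdev(decoy_scores)
    z = (native_score - mu) / sigma
    at_least_as_good = sum(1 for s in decoy_scores if s >= native_score)
    p = (at_least_as_good + 1) / (len(decoy_scores) + 1)
    return z, p


# Hypothetical scores: one native comparison and eight decoy comparisons.
decoys = [10, 12, 11, 9, 13, 10, 11, 12]
z, p = decoy_significance(20, decoys)
```

Because the decoys are generated from the same pair of structures, the background reflects that pair's size and composition, which is what makes the estimate specific to each comparison.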

  10. Towards comprehensive structural motif mining for better fold annotation in the "twilight zone" of sequence dissimilarity

    PubMed Central

    Jia, Yi; Huan, Jun; Buhr, Vincent; Zhang, Jintao; Carayannopoulos, Leonidas N

    2009-01-01

    Background Automatic identification of structure fingerprints from a group of diverse protein structures is challenging, especially for proteins whose divergent amino acid sequences may fall into the "twilight-" or "midnight-" zones where pair-wise sequence identities to known sequences fall below 25% and sequence-based functional annotations often fail. Results Here we report a novel graph database mining method and demonstrate its application to protein structure pattern identification and structure classification. The biologic motivation of our study is to recognize common structure patterns in "immunoevasins", proteins mediating virus evasion of host immune defense. Our experimental study, using both viral and non-viral proteins, demonstrates the efficiency and efficacy of the proposed method. Conclusion We present a theoretic framework, offer a practical software implementation for incorporating prior domain knowledge, such as substitution matrices as studied here, and devise an efficient algorithm to identify approximate matched frequent subgraphs. By doing so, we significantly expanded the analytical power of sophisticated data mining algorithms in dealing with large volume of complicated and noisy protein structure data. And without loss of generality, choice of appropriate compatibility matrices allows our method to be easily employed in domains where subgraph labels have some uncertainty. PMID:19208148

  11. Crystal Structure of the HEAT Domain from the Pre-mRNA Processing Factor Symplekin

    PubMed Central

    Kennedy, Sarah A.; Frazier, Monica L.; Steiniger, Mindy; Mast, Ann M.; Marzluff, William F.; Redinbo, Matthew R.

    2009-01-01

    The majority of eukaryotic pre-mRNAs are processed by 3′-end cleavage and polyadenylation, although in metazoa the replication-dependent histone mRNAs are processed by 3′-end cleavage but not polyadenylation. The macromolecular complex responsible for processing both canonical and histone pre-mRNAs contains the ~1,160-residue protein Symplekin. Secondary structural prediction algorithms identified putative HEAT domains in the 300 N-terminal residues of all Symplekins of known sequence. The structure and dynamics of this domain were investigated to begin elucidating the role Symplekin plays in mRNA maturation. The crystal structure of the Drosophila melanogaster Symplekin HEAT domain was determined to 2.4 Å resolution using SAD phasing methods. The structure exhibits 5 canonical HEAT repeats along with an extended 31 amino acid loop (loop 8) between the fourth and fifth repeat that is conserved within closely related Symplekin sequences. Molecular dynamics simulations of this domain show that the presence of loop 8 dampens correlated and anticorrelated motion in the HEAT domain, therefore providing a neutral surface for potential protein-protein interactions. HEAT domains are often employed for such macromolecular contacts. The Symplekin HEAT region not only structurally aligns with several established scaffolding proteins, but also has been reported to contact proteins essential for regulating 3′-end processing. Taken together, these data support the conclusion that the Symplekin HEAT domain serves as a scaffold for protein-protein interactions essential to the mRNA maturation process. PMID:19576221

  12. The Caenorhabditis elegans gene unc-89, required for muscle M-line assembly, encodes a giant modular protein composed of Ig and signal transduction domains

    PubMed Central

    1996-01-01

    Mutations in the Caenorhabditis elegans gene unc-89 result in nematodes having disorganized muscle structure in which thick filaments are not organized into A-bands, and there are no M-lines. Beginning with a partial cDNA from the C. elegans sequencing project, we have cloned and sequenced the unc-89 gene. An unc-89 allele, st515, was found to contain an 84-bp deletion and a 10-bp duplication, resulting in an in-frame stop codon within predicted unc-89 coding sequence. Analysis of the complete coding sequence for unc-89 predicts a novel 6,632 amino acid polypeptide consisting of sequence motifs which have been implicated in protein-protein interactions. UNC-89 begins with 67 residues of unique sequence, followed by SH3, dbl/CDC24, and PH domains, 7 immunoglobulin (Ig) domains, a putative KSP-containing multiphosphorylation domain, and ends with 46 Ig domains. A polyclonal antiserum raised to a portion of unc-89 encoded sequence reacts to a twitchin-sized polypeptide from wild type, but truncated polypeptides from st515 and from the amber allele e2338. By immunofluorescent microscopy, this antiserum localizes to the middle of A-bands, consistent with UNC-89 being a structural component of the M-line. Previous studies indicate that myofilament lattice assembly begins with positional cues laid down in the basement membrane and muscle cell membrane. We propose that the intracellular protein UNC-89 responds to these signals, localizes, and then participates in assembling an M-line. PMID:8603916

  13. Structure-Templated Predictions of Novel Protein Interactions from Sequence Information

    PubMed Central

    Betel, Doron; Breitkreuz, Kevin E; Isserlin, Ruth; Dewar-Darch, Danielle; Tyers, Mike; Hogue, Christopher W. V

    2007-01-01

    The multitude of functions performed in the cell are largely controlled by a set of carefully orchestrated protein interactions often facilitated by specific binding of conserved domains in the interacting proteins. Interacting domains commonly exhibit distinct binding specificity to short and conserved recognition peptides called binding profiles. Although many conserved domains are known in nature, only a few have well-characterized binding profiles. Here, we describe a novel predictive method known as domain–motif interactions from structural topology (D-MIST) for elucidating the binding profiles of interacting domains. A set of domains and their corresponding binding profiles were derived from extant protein structures and protein interaction data and then used to predict novel protein interactions in yeast. A number of the predicted interactions were verified experimentally, including new interactions of the mitotic exit network, RNA polymerases, nucleotide metabolism enzymes, and the chaperone complex. These results demonstrate that new protein interactions can be predicted exclusively from sequence information. PMID:17892321

  14. The Bologna Annotation Resource (BAR 3.0): improving protein functional annotation.

    PubMed

    Profiti, Giuseppe; Martelli, Pier Luigi; Casadio, Rita

    2017-07-03

    BAR 3.0 updates our server BAR (Bologna Annotation Resource) for predicting protein structural and functional features from sequence. We increase data volume, query capabilities and information conveyed to the user. The core of BAR 3.0 is a graph-based clustering procedure of UniProtKB sequences, following strict pairwise similarity criteria (sequence identity ≥40% with alignment coverage ≥90%). Each cluster contains the available annotation downloaded from UniProtKB, GO, PFAM and PDB. After statistical validation, GO terms and PFAM domains are cluster-specific and annotate new sequences entering the cluster after satisfying similarity constraints. BAR 3.0 includes 28 869 663 sequences in 1 361 773 clusters, of which 22.2% (22 241 661 sequences) and 47.4% (24 555 055 sequences) have at least one validated GO term and one PFAM domain, respectively. 1.4% of the clusters (36% of all sequences) include PDB structures and the cluster is associated to a hidden Markov model that allows building template-target alignment suitable for structural modeling. Some other 3 399 026 sequences are singletons. BAR 3.0 offers an improved search interface, allowing queries by UniProtKB-accession, Fasta sequence, GO-term, PFAM-domain, organism, PDB and ligand/s. When evaluated on the CAFA2 targets, BAR 3.0 largely outperforms our previous version and scores among state-of-the-art methods. BAR 3.0 is publicly available and accessible at http://bar.biocomp.unibo.it/bar3. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
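
The pairwise criteria quoted above (sequence identity ≥40% with alignment coverage ≥90%) lend themselves to a small sketch of graph-based clustering. The abstract does not say how clusters are extracted from the similarity graph, so taking connected components with a union-find structure is an assumption here, and the sequence identifiers and scores are hypothetical.

```python
def cluster_sequences(ids, pairs, min_identity=40.0, min_coverage=90.0):
    """Group sequences linked by pairs that satisfy both thresholds.

    pairs: iterable of (id_a, id_b, percent_identity, percent_coverage).
    Assumption: clusters are connected components of the similarity graph.
    """
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, coverage in pairs:
        if identity >= min_identity and coverage >= min_coverage:
            parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical pairwise alignment results.
ids = ["s1", "s2", "s3", "s4"]
pairs = [
    ("s1", "s2", 55.0, 95.0),  # passes both thresholds
    ("s2", "s3", 42.0, 91.0),  # passes both thresholds
    ("s3", "s4", 80.0, 60.0),  # high identity but coverage too low
]
clusters = cluster_sequences(ids, pairs)
```

In this toy input `s4` ends up alone, which corresponds to what the abstract calls a singleton.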

  15. Crystal Structure of the Nipah Virus Phosphoprotein Tetramerization Domain

    PubMed Central

    Bruhn, Jessica F.; Barnett, Katherine C.; Bibby, Jaclyn; Thomas, Jens M. H.; Keegan, Ronan M.; Rigden, Daniel J.; Bornholdt, Zachary A.

    2014-01-01

    The Nipah virus phosphoprotein (P) is multimeric and tethers the viral polymerase to the nucleocapsid. We present the crystal structure of the multimerization domain of Nipah virus P: a long, parallel, tetrameric, coiled coil with a small, α-helical cap structure. Across the paramyxoviruses, these domains share little sequence identity yet are similar in length and structural organization, suggesting a common requirement for scaffolding or spatial organization of the functions of P in the virus life cycle. PMID:24155387

  16. Development of Specific Inhibitors for Breast Cancer-Associated Variants of ErbB2

    DTIC Science & Technology

    2015-10-01

    Produce ErbB2 structures for drug-lead identification Months 1-12 Milestone #2: Production of computationally-derived pdb files of the structures of...crystallographic structures of the kinase domain of ErbB2 and its close relative EGFR (ErbB1). The kinase domains of ErbB2 and EGFR are highly...homologous as indicated by a sequence identity of ~ 78%. There are two currently available crystallographic structures of the ErbB2 kinase domain. One is

  17. Phylogenomics and sequence-structure-function relationships in the GmrSD family of Type IV restriction enzymes.

    PubMed

    Machnicka, Magdalena A; Kaminska, Katarzyna H; Dunin-Horkawicz, Stanislaw; Bujnicki, Janusz M

    2015-10-23

    GmrSD is a modification-dependent restriction endonuclease that specifically targets and cleaves glucosylated hydroxymethylcytosine (glc-HMC) modified DNA. It is encoded either as two separate single-domain GmrS and GmrD proteins or as a single protein carrying both domains. Previous studies suggested that GmrS acts as an endonuclease and NTPase whereas GmrD binds DNA. In this work we applied homology detection, sequence conservation analysis, fold recognition and homology modeling methods to study sequence-structure-function relationships in the GmrSD restriction endonuclease family. We also analyzed the phylogeny and genomic context of the family members. Results of our comparative genomics study show that GmrS exhibits similarity to proteins from the ParB/Srx fold, which can have both NTPase and nuclease activity. In contrast to the previous studies, though, we attribute the nuclease activity also to GmrD, as we found it to contain the HNH endonuclease motif. We revealed residues potentially important for structure and function in both domains. Moreover, we found that GmrSD systems exist predominantly as a fused, double-domain form rather than as a heterodimer and that their homologs are often encoded in regions enriched in defense and gene mobility-related elements. Finally, phylogenetic reconstructions of GmrS and GmrD domains revealed that they coevolved and that only a few GmrSD systems appear to be assembled from distantly related GmrS and GmrD components. Our study provides insight into sequence-structure-function relationships in the as yet poorly characterized family of Type IV restriction enzymes. Comparative genomics allowed us to propose a possible role for the GmrD domain in the function of the GmrSD enzyme and possible active sites of both the GmrS and GmrD domains. The presented results can guide further experimental characterization of these enzymes.

  18. Cellulase Linkers Are Optimized Based on Domain Type and Function: Insights from Sequence Analysis, Biophysical Measurements, and Molecular Simulation

    PubMed Central

    Sammond, Deanne W.; Payne, Christina M.; Brunecky, Roman; Himmel, Michael E.; Crowley, Michael F.; Beckham, Gregg T.

    2012-01-01

    Cellulase enzymes deconstruct cellulose to glucose, and are often comprised of glycosylated linkers connecting glycoside hydrolases (GHs) to carbohydrate-binding modules (CBMs). Although linker modifications can alter cellulase activity, the functional role of linkers beyond domain connectivity remains unknown. Here we investigate cellulase linkers connecting GH Family 6 or 7 catalytic domains to Family 1 or 2 CBMs, from both bacterial and eukaryotic cellulases to identify conserved characteristics potentially related to function. Sequence analysis suggests that the linker lengths between structured domains are optimized based on the GH domain and CBM type, such that linker length may be important for activity. Longer linkers are observed in eukaryotic GH Family 6 cellulases compared to GH Family 7 cellulases. Bacterial GH Family 6 cellulases are found with structured domains in either N to C terminal order, and similar linker lengths suggest there is no effect of domain order on length. O-glycosylation is uniformly distributed across linkers, suggesting that glycans are required along entire linker lengths for proteolysis protection and, as suggested by simulation, for extension. Sequence comparisons show that proline content for bacterial linkers is more than double that observed in eukaryotic linkers, but with fewer putative O-glycan sites, suggesting alternative methods for extension. Conversely, near linker termini where linkers connect to structured domains, O-glycosylation sites are observed less frequently, whereas glycines are more prevalent, suggesting the need for flexibility to achieve proper domain orientations. Putative N-glycosylation sites are quite rare in cellulase linkers, while an N-P motif, which strongly disfavors the attachment of N-glycans, is commonly observed. These results suggest that linkers exhibit features that are likely tailored for optimal function, despite possessing low sequence identity. This study suggests that cellulase linkers may exhibit function in enzyme action, and highlights the need for additional studies to elucidate cellulase linker functions. PMID:23139804

  19. Domain fusion analysis by applying relational algebra to protein sequence and domain databases

    PubMed Central

    Truong, Kevin; Ikura, Mitsuhiko

    2003-01-01

    Background Domain fusion analysis is a useful method to predict functionally linked proteins that may be involved in direct protein-protein interactions or in the same metabolic or signaling pathway. As separate domain databases like BLOCKS, PROSITE, Pfam, SMART, PRINTS-S, ProDom, TIGRFAMs, and amalgamated domain databases like InterPro continue to grow in size and quality, a computational method to perform domain fusion analysis that leverages these efforts will become increasingly powerful. Results This paper proposes a computational method employing relational algebra to find domain fusions in protein sequence databases. The feasibility of this method was illustrated on the SWISS-PROT+TrEMBL sequence database using domain predictions from the Pfam HMM (hidden Markov model) database. We identified 235 and 189 putative functionally linked protein partners in H. sapiens and S. cerevisiae, respectively. From scientific literature, we were able to confirm many of these functional linkages, while the remainder offer testable experimental hypotheses. Results can be viewed at . Conclusion As the analysis can be computed quickly on any relational database that supports standard SQL (structured query language), it can be dynamically updated along with the sequence and domain databases, thereby improving the quality of predictions over time. PMID:12734020
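
    A minimal sketch of how a domain fusion query might look in standard SQL is given here, run through Python's built-in sqlite3 module. The table `domain_hit`, its columns, and all protein, organism and domain names are hypothetical; the abstract does not give the authors' schema or queries, so this only illustrates the general idea: two separate proteins in a target organism are reported as putatively linked when a single "composite" protein in another organism carries a domain from each.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per predicted domain occurrence in a protein.
conn.execute("CREATE TABLE domain_hit (protein TEXT, organism TEXT, domain TEXT)")
conn.executemany(
    "INSERT INTO domain_hit VALUES (?, ?, ?)",
    [
        ("compX", "orgB", "DomA"),  # composite protein carrying both domains
        ("compX", "orgB", "DomB"),
        ("p1", "orgA", "DomA"),     # component protein with DomA only
        ("p2", "orgA", "DomB"),     # component protein with DomB only
        ("p3", "orgA", "DomC"),     # unrelated protein
    ],
)

# c1/c2: two different domains on the same composite protein.
# a/b:   two distinct target-organism proteins, each with one of those
#        domains and lacking the other.
QUERY = """
SELECT DISTINCT a.protein, b.protein, c1.protein
FROM domain_hit AS c1
JOIN domain_hit AS c2 ON c2.protein = c1.protein AND c1.domain < c2.domain
JOIN domain_hit AS a  ON a.domain = c1.domain
JOIN domain_hit AS b  ON b.domain = c2.domain AND b.organism = a.organism
WHERE a.organism = ?
  AND c1.organism <> a.organism
  AND a.protein <> b.protein
  AND NOT EXISTS (SELECT 1 FROM domain_hit AS x
                  WHERE x.protein = a.protein AND x.domain = c2.domain)
  AND NOT EXISTS (SELECT 1 FROM domain_hit AS y
                  WHERE y.protein = b.protein AND y.domain = c1.domain)
ORDER BY a.protein, b.protein, c1.protein
"""
linked = conn.execute(QUERY, ("orgA",)).fetchall()
print(linked)
```

    Because the whole analysis is a single declarative query, it can simply be re-run whenever the underlying sequence and domain tables are refreshed, which is the updating property the abstract emphasizes.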

  20. Structural changes in the BH3 domain of SOUL protein upon interaction with the anti-apoptotic protein Bcl-xL

    PubMed Central

    Ambrosi, Emmanuele; Capaldi, Stefano; Bovi, Michele; Saccomani, Gianmaria; Perduca, Massimiliano; Monaco, Hugo L.

    2011-01-01

    The SOUL protein is known to induce apoptosis by provoking the mitochondrial permeability transition, and a sequence homologous with the BH3 (Bcl-2 homology 3) domains has recently been identified in the protein, thus making it a potential new member of the BH3-only protein family. In the present study, we provide NMR, SPR (surface plasmon resonance) and crystallographic evidence that a peptide spanning residues 147–172 in SOUL interacts with the anti-apoptotic protein Bcl-xL. We have crystallized SOUL alone and the complex of its BH3 domain peptide with Bcl-xL, and solved their three-dimensional structures. The SOUL monomer is a single domain organized as a distorted β-barrel with eight anti-parallel strands and two α-helices. The BH3 domain extends across 15 residues at the end of the second helix and eight amino acids in the chain following it. There are important structural differences in the BH3 domain in the intact SOUL molecule and the same sequence bound to Bcl-xL. PMID:21639858

  1. Ketide Synthase (KS) Domain Prediction and Analysis of Iterative Type II PKS Gene in Marine Sponge-Associated Actinobacteria Producing Biosurfactants and Antimicrobial Agents

    PubMed Central

    Selvin, Joseph; Sathiyanarayanan, Ganesan; Lipton, Anuj N.; Al-Dhabi, Naif Abdullah; Valan Arasu, Mariadhas; Kiran, George S.

    2016-01-01

    The important biological macromolecules, such as lipopeptide and glycolipid biosurfactant producing marine actinobacteria were analyzed and their potential linkage between type II polyketide synthase (PKS) genes was explored. A unique feature of type II PKS genes is their high amino acid (AA) sequence homology and conserved gene organization. These enzymes mediate the biosynthesis of polyketide natural products with enormous structural complexity and chemical nature by combinatorial use of various domains. Therefore, deciphering the order of AA sequence encoded by PKS domains tailored the chemical structure of polyketide analogs still remains a great challenge. The present work deals with an in vitro and in silico analysis of PKS type II genes from five actinobacterial species to correlate KS domain architecture and structural features. Our present analysis reveals the unique protein domain organization of iterative type II PKS and KS domain of marine actinobacteria. The findings of this study would have implications in metabolic pathway reconstruction and design of semi-synthetic genomes to achieve rational design of novel natural products. PMID:26903957

  2. Sequence analyses reveal that a TPR-DP module, surrounded by recombinable flanking introns, could be at the origin of eukaryotic Hop and Hip TPR-DP domains and prokaryotic GerD proteins.

    PubMed

    Hernández Torres, Jorge; Papandreou, Nikolaos; Chomilier, Jacques

    2009-05-01

    The co-chaperone Hop [heat shock protein (HSP) organising protein] is known to bind both Hsp70 and Hsp90. Hop comprises three repeats of a tetratricopeptide repeat (TPR) domain, each consisting of three TPR motifs. The first and last TPR domains are followed by a domain containing several dipeptide (DP) repeats called the DP domain. These analyses suggest that the hop genes result from successive recombination events of an ancestral TPR-DP module. From a hydrophobic cluster analysis of homologous Hop protein sequences derived from gene families, we can postulate that shifts in the open reading frames are at the origin of the present sequences. Moreover, these shifts can be related to the presence or absence of biological function. We propose to extend the family of Hop co-chaperones into the kingdom of bacteria, as several structurally related genes have been identified by hydrophobic cluster analysis. We also provide evidence of common structural characteristics between hop and hip genes, suggesting a shared precursor of ancestral TPR-DP domains.

  3. The Replication Focus Targeting Sequence (RFTS) Domain Is a DNA-competitive Inhibitor of Dnmt1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Syeda, Farisa; Fagan, Rebecca L.; Wean, Matthew

    Dnmt1 (DNA methyltransferase 1) is the principal enzyme responsible for maintenance of cytosine methylation at CpG dinucleotides in the mammalian genome. The N-terminal replication focus targeting sequence (RFTS) domain of Dnmt1 has been implicated in subcellular localization, protein association, and catalytic function. However, progress in understanding its function has been limited by the lack of assays for and a structure of this domain. Here, we show that the naked DNA- and polynucleosome-binding activities of Dnmt1 are inhibited by the RFTS domain, which functions by virtue of binding the catalytic domain to the exclusion of DNA. Kinetic analysis with a fluorogenic DNA substrate established the RFTS domain as a 600-fold inhibitor of Dnmt1 enzymatic activity. The crystal structure of the RFTS domain reveals a novel fold and supports a mechanism in which an RFTS-targeted Dnmt1-binding protein, such as Uhrf1, may activate Dnmt1 for DNA binding.

  4. Structural characterization of the N-terminal mineral modification domains from the molluscan crystal-modulating biomineralization proteins, AP7 and AP24.

    PubMed

    Wustman, Brandon A; Morse, Daniel E; Evans, John Spencer

    2004-08-05

    The AP7 and AP24 proteins represent a class of mineral-interaction polypeptides that are found in the aragonite-containing nacre layer of mollusk shell (H. rufescens). These proteins have been shown to preferentially interfere with calcium carbonate mineral growth in vitro. It is believed that both proteins play an important role in aragonite polymorph selection in the mollusk shell. Previously, we demonstrated the 1-30 amino acid (AA) N-terminal sequences of AP7 and AP24 represent mineral interaction/modification domains in both proteins, as evidenced by their ability to frustrate calcium carbonate crystal growth at step edge regions. In this present report, using free N-terminal, C(alpha)-amide "capped" synthetic polypeptides representing the 1-30 AA regions of AP7 (AP7-1 polypeptide) and AP24 (AP24-1 polypeptide) and NMR spectroscopy, we confirm that both N-terminal sequences possess putative Ca (II) interaction polyanionic sequence regions (2 x -DD- in AP7-1, -DDDED- in AP24-1) that are random coil-like in structure. However, with regard to the remaining sequences regions, each polypeptide features unique structural differences. AP7-1 possesses an extended beta-strand or polyproline type II-like structure within the A11-M10, S12-V13, and S28-I27 sequence regions, with the remaining sequence regions adopting a random-coil-like structure, a trait common to other polyelectrolyte mineral-associated polypeptide sequences. Conversely, AP24-1 possesses random coil-like structure within A1-S9 and Q14-N16 sequence regions, and evidence for turn-like, bend, or loop conformation within the G10-N13, Q17-N24, and M29-F30 sequence regions, similar to the structures identified within the putative elastomeric proteins Lustrin A and sea urchin spicule matrix proteins. The similarities and differences in AP7 and AP24 N-terminal domain structure are discussed with regard to joint AP7-AP24 protein modification of calcium carbonate growth. Copyright 2004 Wiley Periodicals, Inc.

  5. Formation of highly stable chimeric trimers by fusion of an adenovirus fiber shaft fragment with the foldon domain of bacteriophage t4 fibritin.

    PubMed

    Papanikolopoulou, Katerina; Forge, Vincent; Goeltz, Pierrette; Mitraki, Anna

    2004-03-05

    The folding of beta-structured, fibrous proteins is a largely unexplored area. A class of such proteins is used by viruses as adhesins, and recent studies revealed novel beta-structured motifs for them. We have been studying the folding and assembly of adenovirus fibers that consist of a globular C-terminal domain, a central fibrous shaft, and an N-terminal part that attaches to the viral capsid. The globular C-terminal, or "head" domain, has been postulated to be necessary for the trimerization of the fiber and might act as a registration signal that directs its correct folding and assembly. In this work, we replaced the head of the fiber by the trimerization domain of the bacteriophage T4 fibritin, termed "foldon." Two chimeric proteins, comprising the foldon domain connected at the C-terminal end of four fiber shaft repeats with or without the use of a natural linker sequence, fold into highly stable, SDS-resistant trimers. The structural signatures of the chimeric proteins as seen by CD and infrared spectroscopy are reported. The results suggest that the foldon domain can successfully replace the fiber head domain in ensuring correct trimerization of the shaft sequences. Biological implications and implications for engineering highly stable, beta-structured nanorods are discussed.

  6. Structures of Bacterial Biosynthetic Arginine Decarboxylases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    F Forouhar; S Lew; J Seetharaman

    2011-12-31

    Biosynthetic arginine decarboxylase (ADC; also known as SpeA) plays an important role in the biosynthesis of polyamines from arginine in bacteria and plants. SpeA is a pyridoxal-5'-phosphate (PLP)-dependent enzyme and shares weak sequence homology with several other PLP-dependent decarboxylases. Here, the crystal structure of PLP-bound SpeA from Campylobacter jejuni is reported at 3.0 Å resolution and that of Escherichia coli SpeA in complex with a sulfate ion is reported at 3.1 Å resolution. The structure of the SpeA monomer contains two large domains, an N-terminal TIM-barrel domain followed by a β-sandwich domain, as well as two smaller helical domains. The TIM-barrel and β-sandwich domains share structural homology with several other PLP-dependent decarboxylases, even though the sequence conservation among these enzymes is less than 25%. A similar tetramer is observed for both C. jejuni and E. coli SpeA, composed of two dimers of tightly associated monomers. The active site of SpeA is located at the interface of this dimer and is formed by residues from the TIM-barrel domain of one monomer and a highly conserved loop in the β-sandwich domain of the other monomer. The PLP cofactor is recognized by hydrogen-bonding, π-stacking and van der Waals interactions.

  7. MOCASSIN-prot: A multi-objective clustering approach for protein similarity networks

    USDA-ARS's Scientific Manuscript database

    Motivation: Proteins often include multiple conserved domains. Various evolutionary events including duplication and loss of domains, domain shuffling, as well as sequence divergence contribute to generating complexities in protein structures, and consequently, in their functions. The evolutionary h...

  8. A domain-centric solution to functional genomics via dcGO Predictor

    PubMed Central

    2013-01-01

    Background Computational/manual annotations of protein functions are one of the first routes to making sense of a newly sequenced genome. Protein domain predictions form an essential part of this annotation process. This is due to the natural modularity of proteins with domains as structural, evolutionary and functional units. Sometimes two, three, or more adjacent domains (called supra-domains) are the operational unit responsible for a function, e.g. via a binding site at the interface. These supra-domains have contributed to functional diversification in higher organisms. Traditionally functional ontologies have been applied to individual proteins, rather than families of related domains and supra-domains. We expect, however, that to some extent functional signals can be carried by protein domains and supra-domains, and consequently used in function prediction and functional genomics. Results Here we present a domain-centric Gene Ontology (dcGO) perspective. We generalize a framework for automatically inferring ontological terms associated with domains and supra-domains from full-length sequence annotations. This general framework has been applied specifically to primary protein-level annotations from UniProtKB-GOA, generating GO term associations with SCOP domains and supra-domains. The resulting 'dcGO Predictor' can be used to provide functional annotation to protein sequences. The functional annotation of sequences in the Critical Assessment of Function Annotation (CAFA) has been used as a valuable opportunity to validate our method and to be assessed by the community. The functional annotation of all completely sequenced genomes has demonstrated the potential for domain-centric GO enrichment analysis to yield functional insights into newly sequenced or yet-to-be-annotated genomes. This generalized framework we have presented has also been applied to other domain classifications such as InterPro and Pfam, and other ontologies such as mammalian phenotype and disease ontology. The dcGO and its predictor are available at http://supfam.org/SUPERFAMILY/dcGO including an enrichment analysis tool. Conclusions As functional units, domains offer a unique perspective on function prediction regardless of whether proteins are multi-domain or single-domain. The 'dcGO Predictor' holds great promise for contributing to a domain-centric functional understanding of genomes in the next generation sequencing era. PMID:23514627

  9. Molecular Cloning of Drebrin: Progress and Perspectives.

    PubMed

    Kojima, Nobuhiko

    2017-01-01

    Chicken drebrin isoforms were first identified in the optic tectum of developing brain. Although the time course of protein expression was different in each drebrin isoform, the similarity between their protein structures was suggested by biochemical analysis of purified protein. To determine their protein structures, the cloning of drebrin cDNAs was conducted. Comparison between the cDNA sequences shows that all drebrin cDNAs are identical except that the internal insertion sequences are present or absent in their sequences. Chicken drebrins are now classified into three isoforms, namely, drebrins E1, E2, and A. Genomic cloning demonstrated that the three isoforms are generated by alternative splicing of individual exons encoding the insertion sequences from a single drebrin gene. The mechanism should be precisely regulated in a cell-type-specific and developmental stage-specific fashion. Drebrin protein is well conserved in various vertebrate species, although mammalian drebrin has only two isoforms, namely, drebrin E and drebrin A, whereas chicken drebrin has three. Drebrin belongs to an actin-depolymerizing factor homology (ADF-H) domain protein family. Besides the ADF-H domain, drebrin has other domains, including the actin-binding domain and Homer-binding motifs. The diversity of protein isoforms and the multiple domains of drebrin could interact differentially with the actin cytoskeleton and other intracellular proteins and regulate diverse cellular processes.

  10. Solution structure of telomere binding domain of AtTRB2 derived from Arabidopsis thaliana

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yun, Ji-Hye; Lee, Won Kyung; Kim, Heeyoun

    Highlights: • We have determined solution structure of Myb domain of AtTRB2. • The Myb domain of AtTRB2 is located in the N-terminal region. • The Myb domain of AtTRB2 binds to plant telomeric DNA without fourth helix. • Helix 2 and 3 of the Myb domain of AtTRB2 are involved in DNA recognition. • AtTRB2 is a novel protein distinguished from other known plant TBP. - Abstract: Telomere homeostasis is regulated by telomere-associated proteins, and the Myb domain is well conserved for telomere binding. AtTRB2 is a member of the SMH (Single-Myb-Histone)-like family in Arabidopsis thaliana, having an N-terminal Myb domain, which is responsible for DNA binding. The Myb domain of AtTRB2 contains three α-helices and loops for DNA binding, which is unusual given that other plant telomere-binding proteins have an additional fourth helix that is essential for DNA binding. To understand the structural role for telomeric DNA binding of AtTRB2, we determined the solution structure of the Myb domain of AtTRB2 (AtTRB2(1–64)) using nuclear magnetic resonance (NMR) spectroscopy. In addition, the inter-molecular interaction between AtTRB2(1–64) and telomeric DNA has been characterized by the electrophoretic mobility shift assay (EMSA) and NMR titration analyses for both plant (TTTAGGG)n and human (TTAGGG)n telomere sequences. Data revealed that Trp28, Arg29, and Val47 residues located in Helix 2 and Helix 3 are crucial for DNA binding, which are well conserved among other plant telomere binding proteins. We concluded that although AtTRB2 is devoid of the additional fourth helix in the Myb-extension domain, it is able to bind to plant telomeric repeat sequences as well as human telomeric repeat sequences.

  11. Re-refinement of the spliceosomal U4 snRNP core-domain structure

    PubMed Central

    Li, Jade; Leung, Adelaine K.; Kondo, Yasushi; Oubridge, Chris; Nagai, Kiyoshi

    2016-01-01

    The core domain of small nuclear ribonucleoprotein (snRNP), comprised of a ring of seven paralogous proteins bound around a single-stranded RNA sequence, functions as the assembly nucleus in the maturation of U1, U2, U4 and U5 spliceosomal snRNPs. The structure of the human U4 snRNP core domain was initially solved at 3.6 Å resolution by experimental phasing using data with tetartohedral twinning. Molecular replacement from this model followed by density modification using untwinned data recently led to a structure of the minimal U1 snRNP at 3.3 Å resolution. With the latter structure providing a search model for molecular replacement, the U4 core-domain structure has now been re-refined. The U4 Sm site-sequence AAUUUUU has been shown to bind to the seven Sm proteins SmF–SmE–SmG–SmD3–SmB–SmD1–SmD2 in an identical manner as the U1 Sm-site sequence AAUUUGU, except in SmD1 where the bound U replaces G. The progression from the initial to the re-refined structure exemplifies a tortuous route to accuracy: where well diffracting crystals of complex assemblies are initially unavailable, the early model errors are rectified by exploiting preliminary interpretations in further experiments involving homologous structures. New insights are obtained from the more accurate model. PMID:26894541

  12. Homology modeling of a Class A GPCR in the inactive conformation: A quantitative analysis of the correlation between model/template sequence identity and model accuracy.

    PubMed

    Costanzi, Stefano; Skorski, Matthew; Deplano, Alessandro; Habermehl, Brett; Mendoza, Mary; Wang, Keyun; Biederman, Michelle; Dawson, Jessica; Gao, Jia

    2016-11-01

    With the present work we quantitatively studied the modellability of the inactive state of Class A G protein-coupled receptors (GPCRs). Specifically, we constructed models of one of the Class A GPCRs for which structures solved in the inactive state are available, namely the β2AR, using as templates each of the other class members for which structures solved in the inactive state are also available. Our results showed a detectable linear correlation between model accuracy and model/template sequence identity. This suggests that the likely accuracy of the homology models that can be built for a given receptor can be generally forecasted on the basis of the available templates. We also probed whether sequence alignments that allow for the presence of gaps within the transmembrane domains to account for structural irregularities afford better models than the classical alignment procedures that do not allow for the presence of gaps within such domains. As our results indicated, although the overall differences are very subtle, the inclusion of internal gaps within the transmembrane domains has a noticeable beneficial effect on the local structural accuracy of the domain in question. Copyright © 2016 Elsevier Inc. All rights reserved.

  13. Domain Organization in Clostridium botulinum Neurotoxin Type E is Unique: Its Implication in Faster Translocation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kumaran, D.; Eswaramoorthy, S; Furey, W

    2009-01-01

    Clostridium botulinum produces seven antigenically distinct neurotoxins [C. botulinum neurotoxins (BoNTs) A-G] sharing a significant sequence homology. Based on sequence and functional similarity, it was believed that their three-dimensional structures will also be similar. Indeed, the crystal structures of BoNTs A and B exhibit similar fold and domain association where the translocation domain is flanked on either side by binding and catalytic domains. Here, we report the crystal structure of BoNT E holotoxin and show that the domain association is different and unique, although the individual domains are similar to those of BoNTs A and B. In BoNT E, both the binding domain and the catalytic domain are on the same side of the translocation domain, and all three have mutual interfaces. This unique association may have an effect on the rate of translocation, with the molecule strategically positioned in the vesicle for quick entry into cytosol. Botulism, the disease caused by BoNT E, sets in faster than any other serotype because of its speedy internalization and translocation, and the present structure offers a credible explanation. We propose that the translocation domain in other BoNTs follows a two-step process to attain translocation-competent conformation as in BoNT E. We also suggest that this translocation-competent conformation in BoNT E is a probable reason for its faster toxic rate compared to BoNT A. However, this needs further experimental elucidation.

  14. MMDB: Entrez’s 3D-structure database

    PubMed Central

    Wang, Yanli; Anderson, John B.; Chen, Jie; Geer, Lewis Y.; He, Siqian; Hurwitz, David I.; Liebert, Cynthia A.; Madej, Thomas; Marchler, Gabriele H.; Marchler-Bauer, Aron; Panchenko, Anna R.; Shoemaker, Benjamin A.; Song, James S.; Thiessen, Paul A.; Yamashita, Roxanne A.; Bryant, Stephen H.

    2002-01-01

    Three-dimensional structures are now known within many protein families and it is quite likely, in searching a sequence database, that one will encounter a homolog with known structure. The goal of Entrez’s 3D-structure database is to make this information, and the functional annotation it can provide, easily accessible to molecular biologists. To this end Entrez’s search engine provides three powerful features. (i) Sequence and structure neighbors; one may select all sequences similar to one of interest, for example, and link to any known 3D structures. (ii) Links between databases; one may search by term matching in MEDLINE, for example, and link to 3D structures reported in these articles. (iii) Sequence and structure visualization; identifying a homolog with known structure, one may view molecular-graphic and alignment displays, to infer approximate 3D structure. In this article we focus on two features of Entrez’s Molecular Modeling Database (MMDB) not described previously: links from individual biopolymer chains within 3D structures to a systematic taxonomy of organisms represented in molecular databases, and links from individual chains (and compact 3D domains within them) to structure neighbors, other chains (and 3D domains) with similar 3D structure. MMDB may be accessed at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Structure. PMID:11752307

  15. Probing the electrostatics and pharmacologic modulation of sequence-specific binding by the DNA-binding domain of the ETS-family transcription factor PU.1: a binding affinity and kinetics investigation

    PubMed Central

    Munde, Manoj; Poon, Gregory M. K.; Wilson, W. David

    2013-01-01

    Members of the ETS family of transcription factors regulate a functionally diverse array of genes. All ETS proteins share a structurally-conserved but sequence-divergent DNA-binding domain, known as the ETS domain. Although the structure and thermodynamics of the ETS-DNA complexes are well known, little is known about the kinetics of sequence recognition, a facet that offers potential insight into its molecular mechanism. We have characterized DNA binding by the ETS domain of PU.1 by biosensor-surface plasmon resonance (SPR). SPR analysis revealed a striking kinetic profile for DNA binding by the PU.1 ETS domain. At low salt concentrations, it binds high-affinity cognate DNA with a very slow association rate constant (≤10⁵ M⁻¹ s⁻¹), compensated by a correspondingly small dissociation rate constant. The kinetics are strongly salt-dependent but mutually balance to produce a relatively weak dependence in the equilibrium constant. This profile contrasts sharply with reported data for other ETS domains (e.g., Ets-1, TEL) for which high-affinity binding is driven by rapid association (>10⁷ M⁻¹ s⁻¹). We interpret this difference in terms of the hydration properties of ETS-DNA binding and propose that at least two mechanisms of sequence recognition are employed by this family of DNA-binding domain. Additionally, we use SPR to demonstrate the potential for pharmacological inhibition of sequence-specific ETS-DNA binding, using the minor groove-binding distamycin as a model compound. Our work establishes SPR as a valuable technique for extending our understanding of the molecular mechanisms of ETS-DNA interactions as well as developing potential small-molecule agents for biotechnological and therapeutic purposes. PMID:23416556

  16. Identification and Analysis of Novel Amino-Acid Sequence Repeats in Bacillus anthracis str. Ames Proteome Using Computational Tools

    PubMed Central

    Hemalatha, G. R.; Rao, D. Satyanarayana; Guruprasad, L.

    2007-01-01

    We have identified four repeats and ten domains that are novel in proteins encoded by the Bacillus anthracis str. Ames proteome using automated in silico methods. A “repeat” corresponds to a region of fewer than 55 amino acid residues that occurs more than once in the protein sequence and is sometimes present in tandem. A “domain” corresponds to a conserved region of more than 55 amino acid residues that may be present as single or multiple copies in the protein sequence. These correspond to (1) 57-amino-acid-residue PxV domain, (2) 122-amino-acid-residue FxF domain, (3) 111-amino-acid-residue YEFF domain, (4) 109-amino-acid-residue IMxxH domain, (5) 103-amino-acid-residue VxxT domain, (6) 84-amino-acid-residue ExW domain, (7) 104-amino-acid-residue NTGFIG domain, (8) 36-amino-acid-residue NxGK repeat, (9) 95-amino-acid-residue VYV domain, (10) 75-amino-acid-residue KEWE domain, (11) 59-amino-acid-residue AFL domain, (12) 53-amino-acid-residue RIDVK repeat, (13) (a) 41-amino-acid-residue AGQF repeat and (b) 42-amino-acid-residue GSAL repeat. A repeat or domain type is characterized by specific conserved sequence motifs. We discuss the presence of these repeats and domains in proteins from other genomes and their probable secondary structure. PMID:17538688

  17. Prediction of Ras-effector interactions using position energy matrices.

    PubMed

    Kiel, Christina; Serrano, Luis

    2007-09-01

    One of the more challenging problems in biology is to determine the cellular protein interaction network. Progress has been made to predict protein-protein interactions based on structural information, assuming that structurally similar proteins interact in a similar way. In a previous publication, we have determined a genome-wide Ras-effector interaction network based on homology models, with a high accuracy of predicting binding and non-binding domains. However, for a prediction on a genome-wide scale, homology modelling is a time-consuming process. Therefore, we here successfully developed a faster method using position energy matrices, where based on different Ras-effector X-ray template structures, all amino acids in the effector binding domain are sequentially mutated to all other amino acid residues and the effect on binding energy is calculated. Those pre-calculated matrices can then be used to score for binding any Ras or effector sequences. Based on position energy matrices, the sequences of putative Ras-binding domains can be scanned quickly to calculate an energy sum value. By calibrating energy sum values using quantitative experimental binding data, thresholds can be defined and thus non-binding domains can be excluded quickly. Sequences which have energy sum values above this threshold are considered to be potential binding domains, and could be further analysed using homology modelling. This prediction method could be applied to other protein families sharing conserved interaction types, in order to determine large-scale cellular protein interaction networks quickly. Thus, it could have an important impact on future in silico structural genomics approaches, in particular with regard to increasing structural proteomics efforts, aiming to determine all possible domain folds and interaction types. All matrices are deposited in the ADAN database (http://adan-embl.ibmc.umh.es/). Supplementary data are available at Bioinformatics online.
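
    The scoring step in this abstract (sum pre-calculated per-position energies over a sequence, then compare with a calibrated threshold) can be sketched as follows. This is a toy illustration under stated assumptions: the matrix values, the three-position length, the threshold, and the function names are all hypothetical, and the "above threshold" convention simply follows the abstract's wording rather than the ADAN matrices themselves.

```python
# Hypothetical position energy matrix: one dict per aligned position,
# mapping residue -> energy contribution (arbitrary units, made-up values).
PEM = [
    {"A": 0.0, "K": 1.2, "E": -0.8},
    {"A": 0.1, "K": 0.9, "E": -1.1},
    {"A": 0.0, "K": -0.3, "E": 0.7},
]

# Hypothetical cut-off; the paper calibrates thresholds on binding data.
THRESHOLD = 1.0


def energy_sum(seq, pem=PEM, default=0.0):
    """Sum the per-position contributions for an aligned sequence."""
    if len(seq) != len(pem):
        raise ValueError("sequence must have one residue per matrix position")
    # Residues missing from a column fall back to `default`.
    return sum(column.get(res, default) for res, column in zip(seq, pem))


def is_candidate(seq, threshold=THRESHOLD):
    """Flag a sequence whose energy sum lies above the threshold."""
    return energy_sum(seq) > threshold


candidates = [s for s in ("KKE", "EEK", "AAA") if is_candidate(s)]
```

    Because each sequence costs only one table lookup per position, a scan of this kind is far cheaper than building a homology model for every putative domain, which is the speed-up the abstract describes.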

  18. Structural analyses of the CRISPR protein Csc2 reveal the RNA-binding interface of the type I-D Cas7 family.

    PubMed

    Hrle, Ajla; Maier, Lisa-Katharina; Sharma, Kundan; Ebert, Judith; Basquin, Claire; Urlaub, Henning; Marchfelder, Anita; Conti, Elena

    2014-01-01

    Upon pathogen invasion, bacteria and archaea activate an RNA-interference-like mechanism termed CRISPR (clustered regularly interspaced short palindromic repeats). A large family of Cas (CRISPR-associated) proteins mediates the different stages of this sophisticated immune response. Bioinformatic studies have classified the Cas proteins into families, according to their sequences and respective functions. These range from the insertion of the foreign genetic elements into the host genome to the activation of the interference machinery as well as target degradation upon attack. Cas7 family proteins are central to the type I and type III interference machineries as they constitute the backbone of the large interference complexes. Here we report the crystal structure of Thermofilum pendens Csc2, a Cas7 family protein of type I-D. We found that Csc2 forms a core RRM-like domain, flanked by three peripheral insertion domains: a lid domain, a Zinc-binding domain and a helical domain. Comparison with other Cas7 family proteins reveals a set of similar structural features both in the core and in the peripheral domains, despite the absence of significant sequence similarity. T. pendens Csc2 binds single-stranded RNA in vitro in a sequence-independent manner. Using a crosslinking - mass-spectrometry approach, we mapped the RNA-binding surface to a positively charged surface patch on T. pendens Csc2. Thus our analysis of the key structural and functional features of T. pendens Csc2 highlights recurring themes and evolutionary relationships in type I and type III Cas proteins.

  19. Identification of novel protein domains required for the expression of an active dehydratase fragment from a polyunsaturated fatty acid synthase.

    PubMed

    Oyola-Robles, Delise; Gay, Darren C; Trujillo, Uldaeliz; Sánchez-Parés, John M; Bermúdez, Mei-Ling; Rivera-Díaz, Mónica; Carballeira, Néstor M; Baerga-Ortiz, Abel

    2013-07-01

    Polyunsaturated fatty acids (PUFAs) are made in some strains of deep-sea bacteria by multidomain proteins that catalyze condensation, ketoreduction, dehydration, and enoyl-reduction. In this work, we have used the Udwary-Merski Algorithm sequence analysis tool to define the boundaries that enclose the dehydratase (DH) domains in a PUFA multienzyme. Sequence analysis revealed the presence of four areas of high structure in a region that was previously thought to contain only two DH domains as defined by FabA-homology. The expression of the protein fragment containing all four protein domains resulted in an active enzyme, while shorter protein fragments were not soluble. The tetradomain fragment was capable of catalyzing the conversion of crotonyl-CoA to β-hydroxybutyryl-CoA efficiently, as shown by UV absorbance change as well as by chromatographic retention of reaction products. Sequence alignments showed that the two novel domains contain as much sequence conservation as the FabA-homology domains, suggesting that they too may play a functional role in the overall reaction. Structure predictions revealed that all domains belong to the hotdog protein family: two of them contain the active site His70 residue present in FabA-like DHs, while the remaining two do not. Replacing the active site His residues in both FabA domains for Ala abolished the activity of the tetradomain fragment, indicating that the DH activity is contained within the FabA-homology regions. Taken together, these results provide a first glimpse into a rare arrangement of DH domains which constitute a defining feature of the PUFA synthases.

  20. Identification of novel protein domains required for the expression of an active dehydratase fragment from a polyunsaturated fatty acid synthase

    PubMed Central

    Oyola-Robles, Delise; Gay, Darren C; Trujillo, Uldaeliz; Sánchez-Parés, John M; Bermúdez, Mei-Ling; Rivera-Díaz, Mónica; Carballeira, Néstor M; Baerga-Ortiz, Abel

    2013-01-01

    Polyunsaturated fatty acids (PUFAs) are made in some strains of deep-sea bacteria by multidomain proteins that catalyze condensation, ketoreduction, dehydration, and enoyl-reduction. In this work, we have used the Udwary-Merski Algorithm sequence analysis tool to define the boundaries that enclose the dehydratase (DH) domains in a PUFA multienzyme. Sequence analysis revealed the presence of four areas of high structure in a region that was previously thought to contain only two DH domains as defined by FabA-homology. The expression of the protein fragment containing all four protein domains resulted in an active enzyme, while shorter protein fragments were not soluble. The tetradomain fragment was capable of catalyzing the conversion of crotonyl-CoA to β-hydroxybutyryl-CoA efficiently, as shown by UV absorbance change as well as by chromatographic retention of reaction products. Sequence alignments showed that the two novel domains contain as much sequence conservation as the FabA-homology domains, suggesting that they too may play a functional role in the overall reaction. Structure predictions revealed that all domains belong to the hotdog protein family: two of them contain the active site His70 residue present in FabA-like DHs, while the remaining two do not. Replacing the active site His residues in both FabA domains for Ala abolished the activity of the tetradomain fragment, indicating that the DH activity is contained within the FabA-homology regions. Taken together, these results provide a first glimpse into a rare arrangement of DH domains which constitute a defining feature of the PUFA synthases. PMID:23696301

  1. Proteins with an Euonymus lectin-like domain are ubiquitous in Embryophyta

    PubMed Central

    2009-01-01

    Background Cloning of the Euonymus lectin led to the discovery of a novel domain that also occurs in some stress-induced plant proteins. The distribution and the diversity of proteins with an Euonymus lectin (EUL) domain were investigated using detailed analysis of sequences in publicly accessible genome and transcriptome databases. Results Comprehensive in silico analyses indicate that the recently identified Euonymus europaeus lectin domain represents a conserved structural unit of a novel family of putative carbohydrate-binding proteins, which will further be referred to as the Euonymus lectin (EUL) family. The EUL domain is widespread among plants. Analysis of retrieved sequences revealed that some sequences consist of a single EUL domain linked to an unrelated N-terminal domain whereas others comprise two in tandem arrayed EUL domains. A new classification system for these lectins is proposed based on the overall domain architecture. Evolutionary relationships among the sequences with EUL domains are discussed. Conclusion The identification of the EUL family provides the first evidence for the occurrence in terrestrial plants of a highly conserved plant specific domain. The widespread distribution of the EUL domain strikingly contrasts the more limited or even narrow distribution of most other lectin domains found in plants. The apparent omnipresence of the EUL domain is indicative for a universal role of this lectin domain in plants. Although there is unambiguous evidence that several EUL domains possess carbohydrate-binding activity further research is required to corroborate the carbohydrate-binding properties of different members of the EUL family. PMID:19930663

  2. 1H and 15N NMR resonance assignments and secondary structure of titin type I domains.

    PubMed

    Muhle-Goll, C; Nilges, M; Pastore, A

    1997-01-01

    Titin/connectin is a giant muscle protein with a highly modular architecture consisting of multiple repeats of two sequence motifs, named type I and type II. Type I modules have been suggested to be intracellular members of the fibronectin type III (Fn3) domain family. Along the titin sequence they are exclusively present in the region of the molecule located in the sarcomere A-band. This region has been shown to interact with myosin and C-protein. One of the most noticeable features of type I modules is that they are particularly rich in semiconserved prolines, since these residues account for about 8% of their sequence. We have determined the secondary structure of a representative type I domain (A71) by 15N and 1H NMR. We show that the type I domains of titin have the Fn3 fold as proposed, consisting of a three- and a four-stranded beta-sheet. When the two sheets are placed on top of each other to form the beta-sandwich characteristic of the Fn3 fold, 8 out of 10 prolines are found on the same side of the molecule and form an exposed hydrophobic patch. This suggests that the semiconserved prolines might be relevant for the function of type I modules, providing a surface for binding to other A-band proteins. The secondary structure of A71 was structurally aligned to other extracellular Fn3 modules of known 3D structure. The alignment shows that titin type I modules have closest similarity to the first Fn3 domain of Drosophila neuroglian.

  3. Common fold in helix–hairpin–helix proteins

    PubMed Central

    Shao, Xuguang; Grishin, Nick V.

    2000-01-01

    Helix–hairpin–helix (HhH) is a widespread motif involved in non-sequence-specific DNA binding. The majority of HhH motifs function as DNA-binding modules, however, some of them are used to mediate protein–protein interactions or have acquired enzymatic activity by incorporating catalytic residues (DNA glycosylases). From sequence and structural analysis of HhH-containing proteins we conclude that most HhH motifs are integrated as a part of a five-helical domain, termed (HhH)2 domain here. It typically consists of two consecutive HhH motifs that are linked by a connector helix and displays pseudo-2-fold symmetry. (HhH)2 domains show clear structural integrity and a conserved hydrophobic core composed of seven residues, one residue from each α-helix and each hairpin, and deserves recognition as a distinct protein fold. In addition to known HhH in the structures of RuvA, RadA, MutY and DNA-polymerases, we have detected new HhH motifs in sterile alpha motif and barrier-to-autointegration factor domains, the α-subunit of Escherichia coli RNA-polymerase, DNA-helicase PcrA and DNA glycosylases. Statistically significant sequence similarity of HhH motifs and pronounced structural conservation argue for homology between (HhH)2 domains in different protein families. Our analysis helps to clarify how non-symmetric protein motifs bind to the double helix of DNA through the formation of a pseudo-2-fold symmetric (HhH)2 functional unit. PMID:10908318

  4. Prediction of a common beta-propeller catalytic domain for fructosyltransferases of different origin and substrate specificity.

    PubMed

    Pons, T; Hernández, L; Batista, F R; Chinea, G

    2000-11-01

    The three-dimensional (3D) structure of fructan biosynthetic enzymes is still unknown. Here, we have explored folding similarities between reported microbial and plant enzymes that catalyze transfructosylation reactions. A sequence-structure compatibility search using TOPITS, SDP, 3D-PSSM, and SAM-T98 programs identified a beta-propeller fold with scores above the confidence threshold that indicate a structurally conserved catalytic domain in fructosyltransferases (FTFs) of diverse origin and substrate specificity. The predicted fold appeared related to that of neuraminidase and sialidase, of glycoside hydrolase families 33 and 34, respectively. The most reliable structural model was obtained using the crystal structure of neuraminidase (Protein Data Bank file: 5nn9) as template, and it is consistent with the location of previously identified functional residues of bacterial levansucrases (Batista et al., 1999; Song & Jacques, 1999). The sequence-sequence analysis presented here reinforces the recent inclusion of fungal and plant FTFs into glycoside hydrolase family 32, and suggests a modified sequence pattern H-x(2)-[PTV]-x(4)-[LIVMA]-[NSCAYG]-[DE]-P-[NDSC][GA]3 for this family.
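
    A sequence pattern of this kind can be scanned against protein sequences by translating it into a regular expression. The sketch below is a minimal, hypothetical converter for a subset of PROSITE-style syntax (single residues, `x`, `[...]`, `{...}`, and `(n)` or `(n,m)` repeat counts). It is applied only to the unambiguous leading part of the pattern in standard PROSITE spacing, because the tail of the pattern as printed in this record is unclear. The test sequences are invented.

```python
import re

# One pattern element: residue, x, [allowed] or {forbidden}, with an
# optional (n) or (n,m) repeat count.
_ELEMENT = re.compile(
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
)


def prosite_to_regex(pattern: str) -> str:
    """Translate a dash-separated PROSITE-style pattern into a regex string."""
    parts = []
    for element in pattern.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except these
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


# Leading part of the pattern only; the tail is left out (see note above).
PREFIX = "H-x(2)-[PTV]-x(4)-[LIVMA]-[NSCAYG]-[DE]-P"
regex = re.compile(prosite_to_regex(PREFIX))

hit = regex.search("MKHAAPAAAALNDPGG")   # invented sequence that fits
miss = regex.search("MKHAAPAAAALNKPGG")  # K is not allowed at the [DE] position
```

    A real scan would use the complete, verified family pattern; this sketch only shows the mechanics of the translation.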

  5. Prediction of a common beta-propeller catalytic domain for fructosyltransferases of different origin and substrate specificity.

    PubMed Central

    Pons, T.; Hernández, L.; Batista, F. R.; Chinea, G.

    2000-01-01

    The three-dimensional (3D) structure of fructan biosynthetic enzymes is still unknown. Here, we have explored folding similarities between reported microbial and plant enzymes that catalyze transfructosylation reactions. A sequence-structure compatibility search using TOPITS, SDP, 3D-PSSM, and SAM-T98 programs identified a beta-propeller fold with scores above the confidence threshold that indicate a structurally conserved catalytic domain in fructosyltransferases (FTFs) of diverse origin and substrate specificity. The predicted fold appeared related to that of neuraminidase and sialidase, of glycoside hydrolase families 33 and 34, respectively. The most reliable structural model was obtained using the crystal structure of neuraminidase (Protein Data Bank file: 5nn9) as template, and it is consistent with the location of previously identified functional residues of bacterial levansucrases (Batista et al., 1999; Song & Jacques, 1999). The sequence-sequence analysis presented here reinforces the recent inclusion of fungal and plant FTFs into glycoside hydrolase family 32, and suggests a modified sequence pattern H-x(2)-[PTV]-x(4)-[LIVMA]-[NSCAYG]-[DE]-P-[NDSC][GA]3 for this family. PMID:11305239

  6. PASS2: an automated database of protein alignments organised as structural superfamilies.

    PubMed

    Bhaduri, Anirban; Pugalenthi, Ganesan; Sowdhamini, Ramanathan

    2004-04-02

    The functional selection and three-dimensional structural constraints of proteins in nature often relate to the retention of significant sequence similarity between proteins of similar fold and function despite poor sequence identity. Organization of structure-based sequence alignments for distantly related proteins provides a map of the conserved and critical regions of the protein universe that is useful for the analysis of folding principles, for the evolutionary unification of protein families and for maximizing the information return from experimental structure determination. The Protein Alignment organised as Structural Superfamily (PASS2) database represents continuously updated, structural alignments for evolutionarily related, sequentially distant proteins. An automated and updated version of PASS2 is in direct correspondence with SCOP 1.63 and consists of sequences having identity below 40% among themselves. Protein domains have been grouped into 628 multi-member superfamilies and 566 single member superfamilies. Structure-based sequence alignments for the superfamilies have been obtained using COMPARER, while initial equivalencies have been derived from a preliminary superposition using LSQMAN or STAMP 4.0. The final sequence alignments have been annotated for structural features using JOY4.0. The database is supplemented with sequence relatives belonging to different genomes, conserved spatially interacting and structural motifs, probabilistic hidden Markov models of superfamilies based on the alignments and useful links to other databases. Probabilistic models and sensitive position specific profiles obtained from reliable superfamily alignments aid annotation of remote homologues and are useful tools in structural and functional genomics. PASS2 presents the phylogeny of its members both based on sequence and structural dissimilarities. Clustering of members allows us to understand diversification of the family members. The search engine has been improved for simpler browsing of the database. The database resolves alignments among the structural domains consisting of evolutionarily diverged set of sequences. Availability of reliable sequence alignments of distantly related proteins despite poor sequence identity and single-member superfamilies permit better sampling of structures in libraries for fold recognition of new sequences and for the understanding of protein structure-function relationships of individual superfamilies. PASS2 is accessible at http://www.ncbs.res.in/~faculty/mini/campass/pass2.html

  7. Comprehensive comparative analysis and identification of RNA-binding protein domains: multi-class classification and feature selection.

    PubMed

    Jahandideh, Samad; Srinivasasainagendra, Vinodh; Zhi, Degui

    2012-11-07

    RNA-protein interaction plays an important role in various cellular processes, such as protein synthesis, gene regulation, post-transcriptional gene regulation, alternative splicing, and infections by RNA viruses. In this study, using Gene Ontology Annotated (GOA) and Structural Classification of Proteins (SCOP) databases an automatic procedure was designed to capture structurally solved RNA-binding protein domains in different subclasses. Subsequently, we applied tuned multi-class SVM (TMCSVM), Random Forest (RF), and multi-class ℓ1/ℓq-regularized logistic regression (MCRLR) for analysis and classifying RNA-binding protein domains based on a comprehensive set of sequence and structural features. In this study, we compared prediction accuracy of three different state-of-the-art predictor methods. From our results, TMCSVM outperforms the other methods and suggests the potential of TMCSVM as a useful tool for facilitating the multi-class prediction of RNA-binding protein domains. On the other hand, MCRLR by elucidating importance of features for their contribution in predictive accuracy of RNA-binding protein domains subclasses, helps us to provide some biological insights into the roles of sequences and structures in protein-RNA interactions.

  8. New approaches to high-throughput structure characterization of SH3 complexes: the example of Myosin-3 and Myosin-5 SH3 domains from S. cerevisiae.

    PubMed

    Musi, Valeria; Birdsall, Berry; Fernandez-Ballester, Gregorio; Guerrini, Remo; Salvatori, Severo; Serrano, Luis; Pastore, Annalisa

    2006-04-01

    SH3 domains are small protein modules that are involved in protein-protein interactions in several essential metabolic pathways. The availability of the complete genome and the limited number of clearly identifiable SH3 domains make the yeast Saccharomyces cerevisiae an ideal proteomic-based model system to investigate the structural rules dictating the SH3-mediated protein interactions and to develop new tools to assist these studies. In the present work, we have determined the solution structure of the SH3 domain from Myo3 and modeled by homology that of the highly homologous Myo5, two myosins implicated in actin polymerization. We have then implemented an integrated approach that makes use of experimental and computational methods to characterize their binding properties. While accommodating their targets in the classical groove, the two domains have selectivity in both orientation and sequence specificity of the target peptides. From our study, we propose a consensus sequence that may provide a useful guideline to identify new natural partners and suggest a strategy of more general applicability that may be of use in other structural proteomic studies.

  9. HCV IRES domain IIb affects the configuration of coding RNA in the 40S subunit's decoding groove

    PubMed Central

    Filbin, Megan E.; Kieft, Jeffrey S.

    2011-01-01

    Hepatitis C virus (HCV) uses a structured internal ribosome entry site (IRES) RNA to recruit the translation machinery to the viral RNA and begin protein synthesis without the ribosomal scanning process required for canonical translation initiation. Different IRES structural domains are used in this process, which begins with direct binding of the 40S ribosomal subunit to the IRES RNA and involves specific manipulation of the translational machinery. We have found that upon initial 40S subunit binding, the stem–loop domain of the IRES that contains the start codon unwinds and adopts a stable configuration within the subunit's decoding groove. This configuration depends on the sequence and structure of a different stem–loop domain (domain IIb) located far from the start codon in sequence, but spatially proximal in the IRES•40S complex. Mutation of domain IIb results in misconfiguration of the HCV RNA in the decoding groove that includes changes in the placement of the AUG start codon, and a substantial decrease in the ability of the IRES to initiate translation. Our results show that two distal regions of the IRES are structurally communicating at the initial step of 40S subunit binding and suggest that this is an important step in driving protein synthesis. PMID:21606179

  10. HCV IRES domain IIb affects the configuration of coding RNA in the 40S subunit's decoding groove.

    PubMed

    Filbin, Megan E; Kieft, Jeffrey S

    2011-07-01

    Hepatitis C virus (HCV) uses a structured internal ribosome entry site (IRES) RNA to recruit the translation machinery to the viral RNA and begin protein synthesis without the ribosomal scanning process required for canonical translation initiation. Different IRES structural domains are used in this process, which begins with direct binding of the 40S ribosomal subunit to the IRES RNA and involves specific manipulation of the translational machinery. We have found that upon initial 40S subunit binding, the stem-loop domain of the IRES that contains the start codon unwinds and adopts a stable configuration within the subunit's decoding groove. This configuration depends on the sequence and structure of a different stem-loop domain (domain IIb) located far from the start codon in sequence, but spatially proximal in the IRES•40S complex. Mutation of domain IIb results in misconfiguration of the HCV RNA in the decoding groove that includes changes in the placement of the AUG start codon, and a substantial decrease in the ability of the IRES to initiate translation. Our results show that two distal regions of the IRES are structurally communicating at the initial step of 40S subunit binding and suggest that this is an important step in driving protein synthesis.

  11. Domain fusion analysis by applying relational algebra to protein sequence and domain databases.

    PubMed

    Truong, Kevin; Ikura, Mitsuhiko

    2003-05-06

    Domain fusion analysis is a useful method to predict functionally linked proteins that may be involved in direct protein-protein interactions or in the same metabolic or signaling pathway. As separate domain databases like BLOCKS, PROSITE, Pfam, SMART, PRINTS-S, ProDom, TIGRFAMs, and amalgamated domain databases like InterPro continue to grow in size and quality, a computational method to perform domain fusion analysis that leverages these efforts will become increasingly powerful. This paper proposes a computational method employing relational algebra to find domain fusions in protein sequence databases. The feasibility of this method was illustrated on the SWISS-PROT+TrEMBL sequence database using domain predictions from the Pfam HMM (hidden Markov model) database. We identified 235 and 189 putative functionally linked protein partners in H. sapiens and S. cerevisiae, respectively. From scientific literature, we were able to confirm many of these functional linkages, while the remainder offer testable experimental hypotheses. Results can be viewed at http://calcium.uhnres.utoronto.ca/pi. As the analysis can be computed quickly on any relational database that supports standard SQL (structured query language), it can be dynamically updated along with the sequence and domain databases, thereby improving the quality of predictions over time.
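
    The idea of expressing domain fusion analysis as relational joins can be sketched with an in-memory SQLite database. Everything here is an illustrative assumption rather than the authors' schema or query: the `protein_domain` table, its columns, the protein, organism, and domain names, and the join logic. The query links two proteins from the same organism when their distinct domains are found together in a third, composite protein.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE protein_domain (protein TEXT, organism TEXT, domain TEXT)"
)
# Hypothetical annotations: F1 is a composite protein carrying DomA and DomB.
conn.executemany(
    "INSERT INTO protein_domain VALUES (?, ?, ?)",
    [
        ("P1", "yeast", "DomA"),
        ("P2", "yeast", "DomB"),
        ("P3", "yeast", "DomC"),
        ("F1", "ecoli", "DomA"),
        ("F1", "ecoli", "DomB"),
    ],
)

# Self-joins: a and b are separate proteins in one organism; f1 and f2 are
# two domain rows of the same composite protein covering both domains.
FUSION_QUERY = """
SELECT DISTINCT a.protein, b.protein, f1.protein
FROM protein_domain AS a
JOIN protein_domain AS b
  ON a.organism = b.organism
 AND a.protein < b.protein
 AND a.domain <> b.domain
JOIN protein_domain AS f1 ON f1.domain = a.domain
JOIN protein_domain AS f2 ON f2.domain = b.domain
                         AND f2.protein = f1.protein
WHERE f1.protein NOT IN (a.protein, b.protein)
ORDER BY a.protein, b.protein, f1.protein
"""

# Each row: (protein 1, protein 2, composite protein supporting the link).
linked = conn.execute(FUSION_QUERY).fetchall()
```

    A production analysis would need further filters, for example excluding promiscuous domains or proteins that already contain both domains, but the joins show why the computation can be re-run cheaply whenever the underlying tables are updated.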

  12. Properties and structure of a low-potential, penta-heme cytochrome c 552 from a thermophilic purple sulfur photosynthetic bacterium Thermochromatium tepidum.

    PubMed

    Chen, Jing-Hua; Yu, Long-Jiang; Boussac, Alain; Wang-Otomo, Zheng-Yu; Kuang, Tingyun; Shen, Jian-Ren

    2018-04-24

    The thermophilic purple sulfur bacterium Thermochromatium tepidum possesses four main water-soluble redox proteins involved in the electron transfer behavior. Crystal structures have been reported for three of them: a high potential iron-sulfur protein, cytochrome c', and one of two low-potential cytochrome c 552 (which is a flavocytochrome c). In this study, we purified another low-potential cytochrome c 552 (LPC), determined its N-terminal amino acid sequence and the whole gene sequence, characterized it with absorption and electron paramagnetic spectroscopy, and solved its high-resolution crystal structure. This novel cytochrome was found to contain five c-type hemes. The overall fold of LPC consists of two distinct domains, one is the five heme-containing domain and the other one is an Ig-like domain. This provides a representative example for the structures of multiheme cytochromes containing an odd number of hemes, although the structures of multiheme cytochromes with an even number of hemes are frequently seen in the PDB database. Comparison of the sequence and structure of LPC with other proteins in the databases revealed several characteristic features which may be important for its functioning. Based on the results obtained, we discuss the possible intracellular function of this LPC in Tch. tepidum.

  13. Structure of the Head of the Bartonella Adhesin BadA

    PubMed Central

    Szczesny, Pawel; Linke, Dirk; Ursinus, Astrid; Bär, Kerstin; Schwarz, Heinz; Riess, Tanja M.; Kempf, Volkhard A. J.; Lupas, Andrei N.; Martin, Jörg; Zeth, Kornelius

    2008-01-01

    Trimeric autotransporter adhesins (TAAs) are a major class of proteins by which pathogenic proteobacteria adhere to their hosts. Prominent examples include Yersinia YadA, Haemophilus Hia and Hsf, Moraxella UspA1 and A2, and Neisseria NadA. TAAs also occur in symbiotic and environmental species and presumably represent a general solution to the problem of adhesion in proteobacteria. The general structure of TAAs follows a head-stalk-anchor architecture, where the heads are the primary mediators of attachment and autoagglutination. In the major adhesin of Bartonella henselae, BadA, the head consists of three domains, the N-terminal of which shows strong sequence similarity to the head of Yersinia YadA. The two other domains were not recognizably similar to any protein of known structure. We therefore determined their crystal structure to a resolution of 1.1 Å. Both domains are β-prisms, the N-terminal one formed by interleaved, five-stranded β-meanders parallel to the trimer axis and the C-terminal one by five-stranded β-meanders orthogonal to the axis. Despite the absence of statistically significant sequence similarity, the two domains are structurally similar to domains from Haemophilus Hia, albeit in permuted order. Thus, the BadA head appears to be a chimera of domains seen in two other TAAs, YadA and Hia, highlighting the combinatorial evolutionary strategy taken by pathogens. PMID:18688279

  14. Sequence analyses reveal that a TPR–DP module, surrounded by recombinable flanking introns, could be at the origin of eukaryotic Hop and Hip TPR–DP domains and prokaryotic GerD proteins

    PubMed Central

    Papandreou, Nikolaos; Chomilier, Jacques

    2008-01-01

    The co-chaperone Hop [heat shock protein (HSP) organising protein] is known to bind both Hsp70 and Hsp90. Hop comprises three repeats of a tetratricopeptide repeat (TPR) domain, each consisting of three TPR motifs. The first and last TPR domains are followed by a domain containing several dipeptide (DP) repeats called the DP domain. These analyses suggest that the hop genes result from successive recombination events of an ancestral TPR–DP module. From a hydrophobic cluster analysis of homologous Hop protein sequences derived from gene families, we can postulate that shifts in the open reading frames are at the origin of the present sequences. Moreover, these shifts can be related to the presence or absence of biological function. We propose to extend the family of Hop co-chaperones into the kingdom of bacteria, as several structurally related genes have been identified by hydrophobic cluster analysis. We also provide evidence of common structural characteristics between hop and hip genes, suggesting a shared precursor of ancestral TPR–DP domains. Electronic supplementary material The online version of this article (doi:10.1007/s12192-008-0083-8) contains supplementary material, which is available to authorized users. PMID:18987995

  15. Evolutionary versatility of eukaryotic protein domains revealed by their bigram networks

    PubMed Central

    2011-01-01

    Background Protein domains are globular structures of independently folded polypeptides that exert catalytic or binding activities. Their sequences are recognized as evolutionary units that, through genome recombination, constitute protein repertoires of linkage patterns. Via mutations, domains acquire modified functions that contribute to the fitness of cells and organisms. Recent studies have addressed the evolutionary selection that may have shaped the functions of individual domains and the emergence of particular domain combinations, which led to new cellular functions in multi-cellular animals. This study focuses on modeling domain linkage globally and investigates evolutionary implications that may be revealed by novel computational analysis. Results A survey of 77 completely sequenced eukaryotic genomes implies a potential hierarchical and modular organization of biological functions in most living organisms. Domains in a genome or multiple genomes are modeled as a network of hetero-duplex covalent linkages, termed bigrams. A novel computational technique is introduced to decompose such networks, whereby the notion of domain "networking versatility" is derived and measured. The most and least "versatile" domains (termed "core domains" and "peripheral domains" respectively) are examined both computationally via sequence conservation measures and experimentally using selected domains. Our study suggests that such a versatility measure extracted from the bigram networks correlates with the adaptivity of domains during evolution, where the network core domains are highly adaptive, significantly contrasting the network peripheral domains. Conclusions Domain recombination has played a major part in the evolution of eukaryotes attributing to genome complexity. From a system point of view, as the results of selection and constant refinement, networks of domain linkage are structured in a hierarchical modular fashion. 
    Domains with a high degree of networking versatility appear to be evolutionarily adaptive, potentially through functional innovations. Domain bigram networks are informative as a model of biological functions. The networking versatility indices extracted from such networks for individual domains reflect the strength of evolutionary selection that the domains have experienced. PMID:21849086
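
    The bigram model described in this abstract can be illustrated with a minimal sketch. It assumes that a "hetero-duplex" bigram means an adjacent pair of two different domains within one protein, and it treats pairs as unordered. The abstract does not specify the authors' network decomposition or their versatility index, so the sketch counts distinct linkage partners per domain purely as a crude stand-in. The function names and the toy architectures are hypothetical.

```python
from collections import defaultdict


def domain_bigrams(architectures):
    """Return the set of unordered hetero-domain bigrams found in the proteins.

    `architectures` maps a protein name to its N-to-C ordered list of domain names.
    """
    bigrams = set()
    for domains in architectures.values():
        for left, right in zip(domains, domains[1:]):
            if left != right:  # assumption: tandem repeats of one domain are not bigrams
                bigrams.add(frozenset((left, right)))
    return bigrams


def partner_counts(bigrams):
    """Count distinct linkage partners per domain (node degree in the bigram network)."""
    partners = defaultdict(set)
    for pair in bigrams:
        a, b = tuple(pair)
        partners[a].add(b)
        partners[b].add(a)
    return {domain: len(linked) for domain, linked in partners.items()}


# Hypothetical toy architectures, not taken from the study.
toy = {
    "protA": ["SH3", "SH2", "Kinase"],
    "protB": ["SH2", "SH2", "PTP"],
    "protC": ["PH", "SH2"],
}
counts = partner_counts(domain_bigrams(toy))
```

    In this toy input SH2 is linked to four different partners and every other domain to one, so a degree count of this kind would rank SH2 as the most widely combined domain.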

  16. Evolutionary versatility of eukaryotic protein domains revealed by their bigram networks.

    PubMed

    Xie, Xueying; Jin, Jing; Mao, Yongyi

    2011-08-18

    Protein domains are globular structures of independently folded polypeptides that exert catalytic or binding activities. Their sequences are recognized as evolutionary units that, through genome recombination, constitute protein repertoires of linkage patterns. Via mutations, domains acquire modified functions that contribute to the fitness of cells and organisms. Recent studies have addressed the evolutionary selection that may have shaped the functions of individual domains and the emergence of particular domain combinations, which led to new cellular functions in multi-cellular animals. This study focuses on modeling domain linkage globally and investigates evolutionary implications that may be revealed by novel computational analysis. A survey of 77 completely sequenced eukaryotic genomes implies a potential hierarchical and modular organization of biological functions in most living organisms. Domains in a genome or multiple genomes are modeled as a network of hetero-duplex covalent linkages, termed bigrams. A novel computational technique is introduced to decompose such networks, whereby the notion of domain "networking versatility" is derived and measured. The most and least "versatile" domains (termed "core domains" and "peripheral domains" respectively) are examined both computationally via sequence conservation measures and experimentally using selected domains. Our study suggests that such a versatility measure extracted from the bigram networks correlates with the adaptivity of domains during evolution, where the network core domains are highly adaptive, significantly contrasting the network peripheral domains. Domain recombination has played a major part in the evolution of eukaryotes attributing to genome complexity. From a system point of view, as the results of selection and constant refinement, networks of domain linkage are structured in a hierarchical modular fashion. 
    Domains with a high degree of networking versatility appear to be evolutionarily adaptive, potentially through functional innovations. Domain bigram networks are informative as a model of biological functions. The networking versatility indices extracted from such networks for individual domains reflect the strength of evolutionary selection that the domains have experienced.

  17. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Helander, Sara; Montecchio, Meri; Lemak, Alexander

    Highlights: • We describe the structure of a novel fold in FKBP25 and HectD1. • The new fold is named the Basic Tilted Helix Bundle (BTHB) domain. • A conserved basic surface patch is presented, suggesting a functional role. - Abstract: In this paper, we describe the structure of an N-terminal domain motif in nuclear-localized FKBP25(1–73), a member of the FKBP family, together with the structure of a sequence-related subdomain of the E3 ubiquitin ligase HectD1 that we show belongs to the same fold. This motif adopts a compact 5-helix bundle which we name the Basic Tilted Helix Bundle (BTHB) domain. A positively charged surface patch, structurally centered around the tilted helix H4, is present in both FKBP25 and HectD1 and is conserved in both proteins, suggesting a conserved functional role. We provide detailed comparative analysis of the structures of the two proteins and their sequence similarities, and analysis of the interaction of the proposed FKBP25 binding protein YY1. We suggest that the basic motif in BTHB is involved in the observed DNA binding of FKBP25, and that the function of this domain can be affected by regulatory YY1 binding and/or interactions with adjacent domains.

  18. Proline-poor hydrophobic domains modulate the assembly and material properties of polymeric elastin.

    PubMed

    Muiznieks, Lisa D; Reichheld, Sean E; Sitarz, Eva E; Miao, Ming; Keeley, Fred W

    2015-10-01

    Elastin is a self-assembling extracellular matrix protein that provides elasticity to tissues. For entropic elastomers such as elastin, conformational disorder of the monomer building block, even in the polymeric form, is essential for elastomeric recoil. The highly hydrophobic monomer employs a range of strategies for maintaining disorder and flexibility within hydrophobic domains, particularly involving a minimum compositional threshold of proline and glycine residues. However, the native sequence of hydrophobic elastin domain 30 is uncharacteristically proline-poor and, as an isolated polypeptide, is susceptible to formation of amyloid-like structures comprised of stacked β-sheet. Here we investigated the biophysical and mechanical properties of multiple sets of elastin-like polypeptides designed with different numbers of proline-poor domain 30 from human or rat tropoelastins. We compared the contributions of these proline-poor hydrophobic sequences to self-assembly through characterization of phase separation, and to the tensile properties of cross-linked, polymeric materials. We demonstrate that length of hydrophobic domains and propensity to form β-structure, both affecting polypeptide chain flexibility and cross-link density, play key roles in modulating elastin mechanical properties. This study advances the understanding of elastin sequence-structure-function relationships, and provides new insights that will directly support rational approaches to the design of biomaterials with defined suites of mechanical properties. © 2015 Wiley Periodicals, Inc.

  19. A galaxy of folds.

    PubMed

    Alva, Vikram; Remmert, Michael; Biegert, Andreas; Lupas, Andrei N; Söding, Johannes

    2010-01-01

    Many protein classification systems capture homologous relationships by grouping domains into families and superfamilies on the basis of sequence similarity. Superfamilies with similar 3D structures are further grouped into folds. In the absence of discernable sequence similarity, these structural similarities were long thought to have originated independently, by convergent evolution. However, the growth of databases and advances in sequence comparison methods have led to the discovery of many distant evolutionary relationships that transcend the boundaries of superfamilies and folds. To investigate the contributions of convergent versus divergent evolution in the origin of protein folds, we clustered representative domains of known structure by their sequence similarity, treating them as point masses in a virtual 2D space which attract or repel each other depending on their pairwise sequence similarities. As expected, families in the same superfamily form tight clusters. But often, superfamilies of the same fold are linked with each other, suggesting that the entire fold evolved from an ancient prototype. Strikingly, some links connect superfamilies with different folds. They arise from modular peptide fragments of between 20 and 40 residues that co-occur in the connected folds in disparate structural contexts. These may be descendants of an ancestral pool of peptide modules that evolved as cofactors in the RNA world and from which the first folded proteins arose by amplification and recombination. Our galaxy of folds summarizes, in a single image, most known and many yet undescribed homologous relationships between protein superfamilies, providing new insights into the evolution of protein domains.
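
    The clustering idea in this abstract, in which domains are treated as points in a 2D space that attract or repel according to pairwise similarity, can be sketched with a generic force-directed layout. This is only an illustration under stated assumptions: the force law (spring-like attraction scaled by similarity plus a 1/distance repulsion between all pairs), the step size, the similarity scores and the function names are all hypothetical and are not taken from the paper.

```python
import math
import random


def layout(similarity, steps=300, step_size=0.05, seed=0):
    """Place items in 2D so that similar pairs attract and all pairs weakly repel.

    `similarity` maps (name_a, name_b) tuples to a non-negative score;
    pairs that are missing from the mapping are treated as similarity 0.
    """
    rng = random.Random(seed)
    names = sorted({name for pair in similarity for name in pair})
    pos = {name: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for name in names}
    for _ in range(steps):
        force = {name: [0.0, 0.0] for name in names}
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                dx = pos[b][0] - pos[a][0]
                dy = pos[b][1] - pos[a][1]
                dist = math.hypot(dx, dy) or 1e-9  # avoid division by zero
                score = similarity.get((a, b), similarity.get((b, a), 0.0))
                # Positive magnitude pulls the pair together, negative pushes it apart.
                magnitude = score * dist - 0.1 / dist
                fx = magnitude * dx / dist
                fy = magnitude * dy / dist
                force[a][0] += fx
                force[a][1] += fy
                force[b][0] -= fx
                force[b][1] -= fy
        for name in names:
            pos[name][0] += step_size * force[name][0]
            pos[name][1] += step_size * force[name][1]
    return pos


def distance(pos, a, b):
    """Euclidean distance between two placed items."""
    return math.hypot(pos[a][0] - pos[b][0], pos[a][1] - pos[b][1])


# Hypothetical scores: A and B are similar, C and D are similar, nothing else is.
positions = layout({("A", "B"): 1.0, ("C", "D"): 1.0})
```

    With these toy scores the similar pairs settle close together while the two pairs drift apart, which is the qualitative behaviour the abstract describes for families within a superfamily.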

  20. Basic Tilted Helix Bundle - a new protein fold in human FKBP25/FKBP3 and HectD1.

    PubMed

    Helander, Sara; Montecchio, Meri; Lemak, Alexander; Farès, Christophe; Almlöf, Jonas; Yi, Yanjun; Yee, Adelinda; Arrowsmith, Cheryl; DhePaganon, Sirano; Sunnerhagen, Maria

    2014-04-25

    In this paper, we describe the structure of an N-terminal domain motif in nuclear-localized FKBP25(1-73), a member of the FKBP family, together with the structure of a sequence-related subdomain of the E3 ubiquitin ligase HectD1 that we show belongs to the same fold. This motif adopts a compact 5-helix bundle which we name the Basic Tilted Helix Bundle (BTHB) domain. A positively charged surface patch, structurally centered around the tilted helix H4, is present in both FKBP25 and HectD1 and is conserved in both proteins, suggesting a conserved functional role. We provide detailed comparative analysis of the structures of the two proteins and their sequence similarities, and analysis of the interaction of the proposed FKBP25 binding protein YY1. We suggest that the basic motif in BTHB is involved in the observed DNA binding of FKBP25, and that the function of this domain can be affected by regulatory YY1 binding and/or interactions with adjacent domains. Copyright © 2014 Elsevier Inc. All rights reserved.

  1. Functional characterization of Arabidopsis thaliana transthyretin-like protein.

    PubMed

    Pessoa, João; Sárkány, Zsuzsa; Ferreira-da-Silva, Frederico; Martins, Sónia; Almeida, Maria R; Li, Jianming; Damas, Ana M

    2010-02-18

    Arabidopsis thaliana transthyretin-like (TTL) protein is a potential substrate in the brassinosteroid signalling cascade, having a role that moderates plant growth. Moreover, sequence homology revealed two sequence domains similar to 2-oxo-4-hydroxy-4-carboxy-5-ureidoimidazoline (OHCU) decarboxylase (N-terminal domain) and 5-hydroxyisourate (5-HIU) hydrolase (C-terminal domain). TTL is a member of the transthyretin-related protein family (TRP), which comprises a number of proteins with sequence homology to transthyretin (TTR) and the characteristic C-terminal sequence motif Tyr-Arg-Gly-Ser. TRPs are single domain proteins that form tetrameric structures with 5-HIU hydrolase activity. Experimental evidence is needed to establish whether TTL is a tetrameric protein formed by the association of the 5-HIU hydrolase domains and, if so, whether the structural arrangement allows for OHCU decarboxylase activity. This work reports the biochemical and functional characterization of TTL. The TTL gene was cloned and the protein expressed and purified for biochemical and functional characterization. The results show that TTL is composed of four subunits, with a moderately elongated shape. We also found evidence for 5-HIU hydrolase and OHCU decarboxylase activities in vitro, in the full-length protein. The Arabidopsis thaliana transthyretin-like (TTL) protein is a tetrameric bifunctional enzyme, since it has 5-HIU hydrolase and OHCU decarboxylase activities, which were simultaneously observed in vitro.

  2. Genomic structure of the human D-site binding protein (DBP) gene

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shutler, G.; Glassco, T.; Kang, Xiaolin

    1996-06-15

    The human gene for the D-Site Binding Protein (DBP) has been sequenced and characterized. This gene is a member of the b/ZIP family of transcription factors and is one of three genes forming the PAR sub-family. DBP has been implicated in the diurnal regulation of a variety of liver-specific genes. Examination of the genomic structure of DBP reveals that the gene is divided into four exons and is contained within a relatively compact region of approximately 6 kb. These exons appear to correspond to functional divisions of the DBP protein. Exon 1 contains a long 5′ UTR, and conservation between the rat and the human genes of the presence of small open reading frames within this region suggests that it may play a role in translational control. Exon 2 contains a limited region of similarity to the other PAR domain genes, which may be part of a potential activation domain. Exon 3 contains the PAR domain and differs by only 1 of 71 amino acids between rat and human. Exon 4, containing both the basic and the leucine zipper domains, is likewise highly conserved. The overall degree of homology between the rat and the human cDNA sequences is 82% for the nucleic acid sequence and 92% for the protein sequence. Comparison of the rat and human proximal promoters reveals extensive sequence conservation, with two previously characterized DNA binding sites being conserved at the functional and sequence levels. 31 refs., 4 figs.

  3. Crystal Structure of the Bovine lactadherin C2 Domain, a Membrane Binding Motif, Shows Similarity to the C2 Domains of Factor V and Factor VIII

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lin, L.

    2007-01-01

    Lactadherin, a glycoprotein secreted by a variety of cell types, contains two EGF domains and two C domains with sequence homology to the C domains of blood coagulation proteins factor V and factor VIII. Like these proteins, lactadherin binds to phosphatidylserine (PS)-containing membranes with high affinity. We determined the crystal structure of the bovine lactadherin C2 domain (residues 1 to 158) at 2.4 Å. The lactadherin C2 structure is similar to the C2 domains of factors V and VIII (rmsd of Cα atoms of 0.9 Å and 1.2 Å, and sequence identities of 43% and 38%, respectively). The lactadherin C2 domain has a discoidin-like fold containing two β-sheets of five and three antiparallel β-strands packed against one another. The N and C termini are linked by a disulfide bridge between Cys1 and Cys158. One β-turn and two loops containing solvent-exposed hydrophobic residues extend from the C2 domain β-sandwich core. In analogy with the C2 domains of factors V and VIII, some or all of these solvent-exposed hydrophobic residues, Trp26, Leu28, Phe31, and Phe81, likely participate in membrane binding. The C2 domain of lactadherin may serve as a marker of cell surface phosphatidylserine exposure and may have potential as a unique anti-thrombotic agent.

  4. Crystal Structure of the Bovine lactadherin C2 Domain, a Membrane Binding Motif, Shows Similarity of the C2 Domains of Factor V and Factor VIII

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lin, L.; Huai, Q.; Huang, M.

    2007-01-01

    Lactadherin, a glycoprotein secreted by a variety of cell types, contains two EGF domains and two C domains with sequence homology to the C domains of blood coagulation proteins factor V and factor VIII. Like these proteins, lactadherin binds to phosphatidylserine (PS)-containing membranes with high affinity. We determined the crystal structure of the bovine lactadherin C2 domain (residues 1 to 158) at 2.4 Angstroms. The lactadherin C2 structure is similar to the C2 domains of factors V and VIII (rmsd of Cα atoms of 0.9 Angstroms and 1.2 Angstroms, and sequence identities of 43% and 38%, respectively). The lactadherin C2 domain has a discoidin-like fold containing two β-sheets of five and three antiparallel β-strands packed against one another. The N and C termini are linked by a disulfide bridge between Cys1 and Cys158. One β-turn and two loops containing solvent-exposed hydrophobic residues extend from the C2 domain β-sandwich core. In analogy with the C2 domains of factors V and VIII, some or all of these solvent-exposed hydrophobic residues, Trp26, Leu28, Phe31, and Phe81, likely participate in membrane binding. The C2 domain of lactadherin may serve as a marker of cell surface phosphatidylserine exposure and may have potential as a unique anti-thrombotic agent.

  5. Detecting Coevolution in and among Protein Domains

    PubMed Central

    Yeang, Chen-Hsiang; Haussler, David

    2007-01-01

    Correlated changes of nucleic or amino acids have provided strong information about the structures and interactions of molecules. Despite the rich literature in coevolutionary sequence analysis, previous methods often have to trade off between generality, simplicity, phylogenetic information, and specific knowledge about interactions. Furthermore, despite the evidence of coevolution in selected protein families, a comprehensive screening of coevolution among all protein domains is still lacking. We propose an augmented continuous-time Markov process model for sequence coevolution. The model can handle different types of interactions, incorporate phylogenetic information and sequence substitution, has only one extra free parameter, and requires no knowledge about interaction rules. We employ this model to large-scale screenings on the entire protein domain database (Pfam). Strikingly, with 0.1 trillion tests executed, the majority of the inferred coevolving protein domains are functionally related, and the coevolving amino acid residues are spatially coupled. Moreover, many of the coevolving positions are located at functionally important sites of proteins/protein complexes, such as the subunit linkers of superoxide dismutase, the tRNA binding sites of ribosomes, the DNA binding region of RNA polymerase, and the active and ligand binding sites of various enzymes. The results suggest sequence coevolution manifests structural and functional constraints of proteins. The intricate relations between sequence coevolution and various selective constraints are worth pursuing at a deeper level. PMID:17983264

  6. Do pattern recognition skills transfer across sports? A preliminary analysis.

    PubMed

    Smeeton, Nicholas J; Ward, Paul; Williams, A Mark

    2004-02-01

    The ability to recognize patterns of play is fundamental to performance in team sports. While typically assumed to be domain-specific, pattern recognition skills may transfer from one sport to another if similarities exist in the perceptual features and their relations and/or the strategies used to encode and retrieve relevant information. A transfer paradigm was employed to compare skilled and less skilled soccer, field hockey and volleyball players' pattern recognition skills. Participants viewed structured and unstructured action sequences from each sport, half of which were randomly represented with clips not previously seen. The task was to identify previously viewed action sequences quickly and accurately. Transfer of pattern recognition skill was dependent on the participant's skill, sport practised, nature of the task and degree of structure. The skilled soccer and hockey players were quicker than the skilled volleyball players at recognizing structured soccer and hockey action sequences. Performance differences were not observed on the structured volleyball trials between the skilled soccer, field hockey and volleyball players. The skilled field hockey and soccer players were able to transfer perceptual information or strategies between their respective sports. The less skilled participants' results were less clear. Implications for domain-specific expertise, transfer and diversity across domains are discussed.

  7. Structural Determination of Functional Domains in Early B-cell Factor (EBF) Family of Transcription Factors Reveals Similarities to Rel DNA-binding Proteins and a Novel Dimerization Motif

    PubMed Central

    Siponen, Marina I.; Wisniewska, Magdalena; Lehtiö, Lari; Johansson, Ida; Svensson, Linda; Raszewski, Grzegorz; Nilsson, Lennart; Sigvardsson, Mikael; Berglund, Helena

    2010-01-01

    The early B-cell factor (EBF) transcription factors are central regulators of development in several organs and tissues. This protein family shows low sequence similarity to other protein families, which is why structural information for the functional domains of these proteins is crucial to understand their biochemical features. We have used a modular approach to determine the crystal structures of the structured domains in the EBF family. The DNA binding domain reveals a striking resemblance to the DNA binding domains of the Rel homology superfamily of transcription factors but contains a unique zinc binding structure, termed zinc knuckle. Further, the EBF proteins contain an IPT/TIG domain and an atypical helix-loop-helix domain with a novel type of dimerization motif. The data presented here provide insights into unique structural features of the EBF proteins and open possibilities for detailed molecular investigations of this important transcription factor family. PMID:20592035

  8. De novo identification of highly diverged protein repeats by probabilistic consistency.

    PubMed

    Biegert, A; Söding, J

    2008-03-15

    An estimated 25% of all eukaryotic proteins contain repeats, which underlines the importance of duplication for evolving new protein functions. Internal repeats often correspond to structural or functional units in proteins. Methods capable of identifying diverged repeated segments or domains at the sequence level can therefore assist in predicting domain structures, inferring hypotheses about function and mechanism, and investigating the evolution of proteins from smaller fragments. We present HHrepID, a method for the de novo identification of repeats in protein sequences. It is able to detect the sequence signature of structural repeats in many proteins that have not yet been known to possess internal sequence symmetry, such as outer membrane beta-barrels. HHrepID uses HMM-HMM comparison to exploit evolutionary information in the form of multiple sequence alignments of homologs. In contrast to a previous method, the new method (1) generates a multiple alignment of repeats; (2) utilizes the transitive nature of homology through a novel merging procedure with fully probabilistic treatment of alignments; (3) improves alignment quality through an algorithm that maximizes the expected accuracy; (4) is able to identify different kinds of repeats within complex architectures by a probabilistic domain boundary detection method and (5) improves sensitivity through a new approach to assess statistical significance. Server: http://toolkit.tuebingen.mpg.de/hhrepid; Executables: ftp://ftp.tuebingen.mpg.de/pub/protevo/HHrepID

  9. Modeling of DNA local parameters predicts encrypted architectural motifs in Xenopus laevis ribosomal gene promoter.

    PubMed

    Roux-Rouquie, M; Marilley, M

    2000-09-15

    We have modeled local DNA sequence parameters to search for DNA architectural motifs involved in transcription regulation and promotion within the Xenopus laevis ribosomal gene promoter and the intergenic spacer (IGS) sequences. The IGS was found to be shaped into distinct topological domains. First, intrinsic bends split the IGS into domains of common but different helical features. Local parameters at inter-domain junctions exhibit a high variability with respect to intrinsic curvature, bendability and thermal stability. Secondly, the repeated sequence blocks of the IGS exhibit right-handed supercoiled structures which could be related to their enhancer properties. Thirdly, the gene promoter presents both inherent curvature and minor groove narrowing which may be viewed as motifs of a structural code for protein recognition and binding. Such pre-existing deformations could simply be remodeled during the binding of the transcription complex. Alternatively, these deformations could pre-shape the promoter in such a way that further remodeling is facilitated. Mutations shown to abolish promoter curvature as well as intrinsic minor groove narrowing, in a variant which maintained full transcriptional activity, bring circumstantial evidence for structurally-preorganized motifs in relation to transcription regulation and promotion. Using well documented X. laevis rDNA regulatory sequences we showed that computer modeling may be of invaluable assistance in assessing encrypted architectural motifs. The evidence of these DNA topological motifs with respect to the concept of structural code is discussed.

  10. FIST: a sensory domain for diverse signal transduction pathways in prokaryotes and ubiquitin signaling in eukaryotes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Borziak, Kirill; Jouline, Igor B

    2007-01-01

    Motivation: Sensory domains that are conserved among Bacteria, Archaea and Eucarya are important detectors of common signals detected by living cells. Due to their high sequence divergence, sensory domains are difficult to identify. We systematically look for novel sensory domains using sensitive profile-based searches initiated with regions of signal transduction proteins where no known domains can be identified by current domain models. Results: Using profile searches followed by multiple sequence alignment, structure prediction, and domain architecture analysis, we have identified a novel sensory domain termed FIST, which is present in signal transduction proteins from Bacteria, Archaea and Eucarya. Remote similarity to a known ligand-binding fold and chromosomal proximity of FIST-encoding genes to those coding for proteins involved in amino acid metabolism and transport suggest that FIST domains bind small ligands, such as amino acids.

  11. The history of the CATH structural classification of protein domains.

    PubMed

    Sillitoe, Ian; Dawson, Natalie; Thornton, Janet; Orengo, Christine

    2015-12-01

    This article presents a historical review of the protein structure classification database CATH. Together with the SCOP database, CATH remains comprehensive and reasonably up-to-date with the now more than 100,000 protein structures in the PDB. We review the expansion of the CATH and SCOP resources to capture predicted domain structures in the genome sequence data and to provide information on the likely functions of proteins mediated by their constituent domains. The establishment of comprehensive function annotation resources has also meant that domain families can be functionally annotated allowing insights into functional divergence and evolution within protein families. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.

  12. From Binding-Induced Dynamic Effects in SH3 Structures to Evolutionary Conserved Sectors.

    PubMed

    Zafra Ruano, Ana; Cilia, Elisa; Couceiro, José R; Ruiz Sanz, Javier; Schymkowitz, Joost; Rousseau, Frederic; Luque, Irene; Lenaerts, Tom

    2016-05-01

    Src Homology 3 domains are ubiquitous small interaction modules known to act as docking sites and regulatory elements in a wide range of proteins. Prior experimental NMR work on the SH3 domain of Src showed that ligand binding induces long-range dynamic changes consistent with an induced fit mechanism. The identification of the residues that participate in this mechanism produces a chart that allows for the exploration of the regulatory role of such domains in the activity of the encompassing protein. Here we show that a computational approach focusing on the changes in side chain dynamics through ligand binding identifies equivalent long-range effects in the Src SH3 domain. Mutation of a subset of the predicted residues elicits long-range effects on the binding energetics, emphasizing the relevance of these positions in the definition of intramolecular cooperative networks of signal transduction in this domain. We find further support for this mechanism through the analysis of seven other publicly available SH3 domain structures of which the sequences represent diverse SH3 classes. By comparing the eight predictions, we find that, in addition to a dynamic pathway that is relatively conserved throughout all SH3 domains, there are dynamic aspects specific to each domain and homologous subgroups. Our work shows for the first time, from a structural perspective, which transduction mechanisms are common between a subset of closely related and distal SH3 domains, while at the same time highlighting the differences in signal transduction that make each family member unique. These results resolve the missing link between structural predictions of dynamic changes and the domain sectors recently identified for SH3 domains through sequence analysis.

  13. From Binding-Induced Dynamic Effects in SH3 Structures to Evolutionary Conserved Sectors

    PubMed Central

    Ruiz Sanz, Javier; Schymkowitz, Joost; Rousseau, Frederic

    2016-01-01

    Src Homology 3 domains are ubiquitous small interaction modules known to act as docking sites and regulatory elements in a wide range of proteins. Prior experimental NMR work on the SH3 domain of Src showed that ligand binding induces long-range dynamic changes consistent with an induced fit mechanism. The identification of the residues that participate in this mechanism produces a chart that allows for the exploration of the regulatory role of such domains in the activity of the encompassing protein. Here we show that a computational approach focusing on the changes in side chain dynamics through ligand binding identifies equivalent long-range effects in the Src SH3 domain. Mutation of a subset of the predicted residues elicits long-range effects on the binding energetics, emphasizing the relevance of these positions in the definition of intramolecular cooperative networks of signal transduction in this domain. We find further support for this mechanism through the analysis of seven other publicly available SH3 domain structures of which the sequences represent diverse SH3 classes. By comparing the eight predictions, we find that, in addition to a dynamic pathway that is relatively conserved throughout all SH3 domains, there are dynamic aspects specific to each domain and homologous subgroups. Our work shows for the first time, from a structural perspective, which transduction mechanisms are common between a subset of closely related and distal SH3 domains, while at the same time highlighting the differences in signal transduction that make each family member unique. These results resolve the missing link between structural predictions of dynamic changes and the domain sectors recently identified for SH3 domains through sequence analysis. PMID:27213566

  14. A proteome view of structural, functional, and taxonomic characteristics of major protein domain clusters.

    PubMed

    Sun, Chia-Tsen; Chiang, Austin W T; Hwang, Ming-Jing

    2017-10-27

    Proteome-scale bioinformatics research is increasingly conducted as the number of completely sequenced genomes increases, but analysis of protein domains (PDs) usually relies on similarity in their amino acid sequences and/or three-dimensional structures. Here, we present results from a bi-clustering analysis on presence/absence data for 6,580 unique PDs in 2,134 species with a sequenced genome, thus covering a complete set of proteins, for the three superkingdoms of life, Bacteria, Archaea, and Eukarya. Our analysis revealed eight distinctive PD clusters, which, following an analysis of enrichment of Gene Ontology functions and CATH classification of protein structures, were shown to exhibit structural and functional properties that are taxa-characteristic. For example, the largest cluster is ubiquitous in all three superkingdoms, constituting a set of 1,472 persistent domains created early in evolution and retained in living organisms and characterized by basic cellular functions and ancient structural architectures, while an Archaea and Eukarya bi-superkingdom cluster suggests its PDs may have existed in the ancestor of the two superkingdoms, and others are single superkingdom- or taxa (e.g. Fungi)-specific. These results increase our appreciation of PD diversity and our knowledge of how PDs are used in species, with implications for species evolution.

  15. Purification of SOCS (Suppressor of Cytokine Signaling) SH2 Domains for Structural and Functional Studies.

    PubMed

    Liau, Nicholas P D; Laktyushin, Artem; Babon, Jeffrey J

    2017-01-01

    Src Homology 2 (SH2) domains are protein domains which have a high binding affinity for specific amino acid sequences containing a phosphorylated tyrosine residue. The Suppressors of Cytokine Signaling (SOCS) proteins use an SH2 domain to bind to components of certain cytokine signaling pathways to downregulate the signaling cascade. The recombinantly produced SH2 domains of various SOCS proteins have been used to undertake structural and functional studies elucidating the method of how such targeting occurs. Here, we describe the protocol for the recombinant production and purification of SOCS SH2 domains, with an emphasis on SOCS3.

  16. Discovering Prerequisite Structure of Skills through Probabilistic Association Rules Mining

    ERIC Educational Resources Information Center

    Chen, Yang; Wuillemin, Pierre-Henr; Labat, Jean-Marc

    2015-01-01

    Estimating the prerequisite structure of skills is a crucial issue in domain modeling. Students usually learn skills in sequence since the preliminary skills need to be learned prior to the complex skills. The prerequisite relations between skills underlie the design of learning sequence and adaptation strategies for tutoring systems. The…

  17. The impact of p53 protein core domain structural alteration on ovarian cancer survival.

    PubMed

    Rose, Stephen L; Robertson, Andrew D; Goodheart, Michael J; Smith, Brian J; DeYoung, Barry R; Buller, Richard E

    2003-09-15

    Although survival with a p53 missense mutation is highly variable, p53-null mutation is an independent adverse prognostic factor for advanced stage ovarian cancer. By evaluating ovarian cancer survival based upon a structure function analysis of the p53 protein, we tested the hypothesis that not all missense mutations are equivalent. The p53 gene was sequenced from 267 consecutive ovarian cancers. The effect of individual missense mutations on p53 structure was analyzed using the International Agency for Research on Cancer p53 Mutational Database, which specifies the effects of p53 mutations on p53 core domain structure. Mutations in the p53 core domain were classified as either explained or not explained in structural or functional terms by their predicted effects on protein folding, protein-DNA contacts, or mutation in highly conserved residues. Null mutations were classified by their mechanism of origin. Mutations were sequenced from 125 tumors. Effects of 62 of the 82 missense mutations (76%) could be explained by alterations in the p53 protein. Twenty-three (28%) of the explained mutations occurred in highly conserved regions of the p53 core protein. Twenty-two nonsense point mutations and 21 frameshift null mutations were sequenced. Survival was independent of missense mutation type and mechanism of null mutation. The hypothesis that not all missense mutations are equivalent is, therefore, rejected. Furthermore, p53 core domain structural alteration secondary to missense point mutation is not functionally equivalent to a p53-null mutation. The poor prognosis associated with p53-null mutation is independent of the mutation mechanism.

  18. When Global Structure "Explains Away" Local Grammar: A Bayesian Account of Rule-Induction in Tone Sequences

    ERIC Educational Resources Information Center

    Dawson, Colin; Gerken, LouAnn

    2011-01-01

    While many constraints on learning must be relatively experience-independent, past experience provides a rich source of guidance for subsequent learning. Discovering structure in some domain can inform a learner's future hypotheses about that domain. If a general property accounts for particular sub-patterns, a rational learner should not…

  19. SH2-catalytic domain linker heterogeneity influences allosteric coupling across the SFK family.

    PubMed

    Register, A C; Leonard, Stephen E; Maly, Dustin J

    2014-11-11

    Src-family kinases (SFKs) make up a family of nine homologous multidomain tyrosine kinases whose misregulation is responsible for human disease (cancer, diabetes, inflammation, etc.). Despite overall sequence homology and identical domain architecture, differences in SH3 and SH2 regulatory domain accessibility and ability to allosterically autoinhibit the ATP-binding site have been observed for the prototypical SFKs Src and Hck. Biochemical and structural studies indicate that the SH2-catalytic domain (SH2-CD) linker, the intramolecular binding epitope for SFK SH3 domains, is responsible for allosterically coupling SH3 domain engagement to autoinhibition of the ATP-binding site through the conformation of the αC helix. As a relatively unconserved region between SFK family members, SH2-CD linker sequence variability across the SFK family is likely a source of nonredundant cellular functions between individual SFKs via its effect on the availability of SH3 and SH2 domains for intermolecular interactions and post-translational modification. Using a combination of SFKs engineered with enhanced or weakened regulatory domain intramolecular interactions and conformation-selective inhibitors that report αC helix conformation, this study explores how SH2-CD sequence heterogeneity affects allosteric coupling across the SFK family by examining Lyn, Fyn1, and Fyn2. Analyses of Fyn1 and Fyn2, isoforms that are identical but for a 50-residue sequence spanning the SH2-CD linker, demonstrate that SH2-CD linker sequence differences can have profound effects on allosteric coupling between otherwise identical kinases. Most notably, a dampened allosteric connection between the SH3 domain and αC helix leads to greater autoinhibitory phosphorylation by Csk, illustrating the complex effects of SH2-CD linker sequence on cellular function.

  20. Structure of the Response Regulator PhoP from Mycobacterium tuberculosis Reveals a Dimer Through the Receiver Domain

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    S Menon; S Wang

    The PhoP protein from Mycobacterium tuberculosis is a response regulator of the OmpR/PhoB subfamily, whose structure consists of an N-terminal receiver domain and a C-terminal DNA-binding domain. How the DNA-binding activities are regulated by phosphorylation of the receiver domain remains unclear due to a lack of structural information on the full-length proteins. Here we report the crystal structure of the full-length PhoP of M. tuberculosis. Unlike other known structures of full-length proteins of the same subfamily, PhoP forms a dimer through its receiver domain with the dimer interface involving α4-β5-α5, a common interface for activated receiver domain dimers. However, the switch residues, Thr99 and Tyr118, are in a conformation resembling those of nonactivated receiver domains. The Tyr118 side chain is involved in the dimer interface interactions. The receiver domain is tethered to the DNA-binding domain through a flexible linker and does not impose structural constraints on the DNA-binding domain. This structure suggests that phosphorylation likely facilitates/stabilizes receiver domain dimerization, bringing the DNA-binding domains to close proximity, thereby increasing their binding affinity for direct repeat DNA sequences.

  1. OST-HTH: a novel predicted RNA-binding domain

    PubMed Central

    2010-01-01

    Background The mechanism by which the arthropod Oskar and vertebrate TDRD5/TDRD7 proteins nucleate or organize structurally related ribonucleoprotein (RNP) complexes, the polar granule and nuage, is poorly understood. Using sequence profile searches we identify a novel domain in these proteins that is widely conserved across eukaryotes and bacteria. Results Using contextual information from domain architectures, sequence-structure superpositions and available functional information we predict that this domain is likely to adopt the winged helix-turn-helix fold and bind RNA with a potential specificity for dsRNA. We show that in eukaryotes this domain is often combined in the same polypeptide with protein-protein- or lipid- interaction domains that might play a role in anchoring these proteins to specific cytoskeletal structures. Conclusions Thus, proteins with this domain might have a key role in the recognition and localization of dsRNA, including miRNAs, rasiRNAs and piRNAs hybridized to their targets. In other cases, this domain is fused to ubiquitin-binding, E3 ligase and ubiquitin-like domains indicating a previously under-appreciated role for ubiquitination in regulating the assembly and stability of nuage-like RNP complexes. Both bacteria and eukaryotes encode a conserved family of proteins that combines this predicted RNA-binding domain with a previously uncharacterized domain (DUF88). We present evidence that it is an RNAse belonging to the superfamily that includes the 5'->3' nucleases, PIN and NYN domains and might be recruited to degrade certain RNAs. Reviewers This article was reviewed by Sandor Pongor and Arcady Mushegian. PMID:20302647

  2. Coiled-coil length: Size does matter.

    PubMed

    Surkont, Jaroslaw; Diekmann, Yoan; Ryder, Pearl V; Pereira-Leal, Jose B

    2015-12-01

    Protein evolution is governed by processes that alter primary sequence but also the length of proteins. Protein length may change in different ways, but insertions, deletions and duplications are the most common. An optimal protein size is a trade-off between sequence extension, which may change protein stability or lead to acquisition of a new function, and shrinkage that decreases metabolic cost of protein synthesis. Despite the general tendency for length conservation across orthologous proteins, the propensity to accept insertions and deletions is heterogeneous along the sequence. For example, protein regions rich in repetitive peptide motifs are well known to extensively vary their length across species. Here, we analyze length conservation of coiled-coils, domains formed by a ubiquitous, repetitive peptide motif present in all domains of life, that frequently plays a structural role in the cell. We observed that, despite the repetitive nature, the length of coiled-coil domains is generally highly conserved throughout the tree of life, even when the remaining parts of the protein change, including globular domains. Length conservation is independent of primary amino acid sequence variation, and represents a conservation of domain physical size. This suggests that the conservation of domain size is due to functional constraints. © 2015 Wiley Periodicals, Inc.

  3. Cross-Species Analyses Identify the BNIP-2 and Cdc42GAP Homology (BCH) Domain as a Distinct Functional Subclass of the CRAL_TRIO/Sec14 Superfamily

    PubMed Central

    Gupta, Anjali Bansal; Wee, Liang En; Zhou, Yi Ting; Hortsch, Michael; Low, Boon Chuan

    2012-01-01

    The CRAL_TRIO protein domain, which is unique to the Sec14 protein superfamily, binds to a diverse set of small lipophilic ligands. Similar domains are found in a range of different proteins including neurofibromatosis type-1, a Ras GTPase-activating Protein (RasGAP) and Rho guanine nucleotide exchange factors (RhoGEFs). Proteins containing this structural protein domain exhibit a low sequence similarity and ligand specificity while maintaining an overall characteristic three-dimensional structure. We have previously demonstrated that the BNIP-2 and Cdc42GAP Homology (BCH) protein domain, which shares a low sequence homology with the CRAL_TRIO domain, can serve as a regulatory scaffold that binds to Rho, RhoGEFs and RhoGAPs to control various cell signalling processes. In this work, we investigate 175 BCH domain-containing proteins from a wide range of different organisms. A phylogenetic analysis with ∼100 CRAL_TRIO and similar domains from eight representative species indicates a clear distinction of BCH-containing proteins as a novel subclass within the CRAL_TRIO/Sec14 superfamily. BCH-containing proteins contain a hallmark sequence motif R(R/K)h(R/K)(R/K)NL(R/K)xhhhhHPs (‘h’ is a large and hydrophobic residue and ‘s’ is a small and weakly polar residue) and can be further subdivided into three unique subtypes associated with BNIP-2-N, macro- and RhoGAP-type protein domains. A previously unknown group of genes encoding ‘BCH-only’ domains is also identified in plants and arthropod species. Based on an analysis of their gene-structure and their protein domain context we hypothesize that BCH domain-containing genes evolved through gene duplication, intron insertions and domain swapping events. Furthermore, we explore the point of divergence between BCH and CRAL_TRIO proteins in relation to their ability to bind small GTPases, GAPs and GEFs and lipid ligands. Our study suggests a need for a more extensive analysis of previously uncharacterized BCH, ‘BCH-like’ and CRAL_TRIO-containing proteins and their significance in regulating signaling events involving small GTPases. PMID:22479462
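
    The hallmark motif quoted in the abstract above, R(R/K)h(R/K)(R/K)NL(R/K)xhhhhHPs, can be turned into a pattern search. The following is a minimal sketch, not the authors' method: the abstract does not list which residues count as 'h' (large and hydrophobic) or 's' (small and weakly polar), so the two residue sets below are assumptions, and the function names and the test sequence are hypothetical.

```python
import re

# Assumed residue classes; the abstract names the classes but not their members.
LARGE_HYDROPHOBIC = "FILMVWY"   # stands in for 'h'
SMALL_WEAKLY_POLAR = "AGST"     # stands in for 's'

BCH_MOTIF = "R(R/K)h(R/K)(R/K)NL(R/K)xhhhhHPs"


def motif_to_regex(motif):
    """Translate the abstract's motif notation into a compiled regex."""
    pattern = motif.replace("(R/K)", "[RK]")   # either arginine or lysine
    pattern = pattern.replace("x", ".")        # any residue
    pattern = pattern.replace("h", "[" + LARGE_HYDROPHOBIC + "]")
    pattern = pattern.replace("s", "[" + SMALL_WEAKLY_POLAR + "]")
    return re.compile(pattern)


BCH_REGEX = motif_to_regex(BCH_MOTIF)


def find_motif(sequence):
    """Return (start, matched_text) for each motif hit in a protein sequence."""
    return [(m.start(), m.group()) for m in BCH_REGEX.finditer(sequence.upper())]


# Hypothetical sequence built to contain one motif instance.
print(find_motif("MAARKLRKNLRALLVIHPSGG"))
```

    Changing the two residue sets is enough to test a stricter or looser reading of the motif; the translation step itself stays the same.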

  4. Structures of Human Pumilio with Noncognate RNAs Reveal Molecular Mechanisms for Binding Promiscuity

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gupta, Y.; Nair, D.; Wharton, R.

    2008-01-01

    Pumilio is a founder member of the evolutionarily conserved Puf family of RNA-binding proteins that control a number of physiological processes in eukaryotes. A structure of human Pumilio (hPum) Puf domain bound to a Drosophila regulatory sequence showed that each Puf repeat recognizes a single nucleotide. Puf domains in general bind promiscuously to a large set of degenerate sequences, but the structural basis for this promiscuity has been unclear. Here, we describe the structures of hPum Puf domain complexed to two noncognate RNAs, CycBreverse and Puf5. In each complex, one of the nucleotides is ejected from the binding surface, in effect, acting as a 'spacer.' The complexes also reveal the plasticity of several Puf repeats, which recognize noncanonical nucleotides. Together, these complexes provide a molecular basis for recognition of degenerate binding sites, which significantly increases the number of mRNAs targeted for regulation by Puf proteins in vivo.

  5. Functional and Structural Characterization of Novel Type of Linker Connecting Capsid and Nucleocapsid Protein Domains in Murine Leukemia Virus.

    PubMed

    Doležal, Michal; Hadravová, Romana; Kožíšek, Milan; Bednárová, Lucie; Langerová, Hana; Ruml, Tomáš; Rumlová, Michaela

    2016-09-23

    The assembly of immature retroviral particles is initiated in the cytoplasm by the binding of the structural polyprotein precursor Gag with viral genomic RNA. The protein interactions necessary for assembly are mediated predominantly by the capsid (CA) and nucleocapsid (NC) domains, which have conserved structures. In contrast, the structural arrangement of the CA-NC connecting region differs between retroviral species. In HIV-1 and Rous sarcoma virus, this region forms a rod-like structure that separates the CA and NC domains, whereas in Mason-Pfizer monkey virus, this region is densely packed, thus holding the CA and NC domains in close proximity. Interestingly, the sequence connecting the CA and NC domains in gammaretroviruses, such as murine leukemia virus (MLV), is unique. The sequence is called a charged assembly helix (CAH) due to a high number of positively and negatively charged residues. Although both computational and deletion analyses suggested that the MLV CAH forms a helical conformation, no structural or biochemical data supporting this hypothesis have been published. Using an in vitro assembly assay, alanine scanning mutagenesis, and biophysical techniques (circular dichroism, NMR, microcalorimetry, and electrophoretic mobility shift assay), we have characterized the structure and function of the MLV CAH. We provide experimental evidence that the MLV CAH belongs to a group of charged, E(R/K)-rich, single α-helices. This is the first single α-helix motif identified in viral proteins. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.

  6. Supplementary motor area as key structure for domain-general sequence processing: A unified account.

    PubMed

    Cona, Giorgia; Semenza, Carlo

    2017-01-01

    The Supplementary Motor Area (SMA) is considered as an anatomically and functionally heterogeneous region and is implicated in several functions. We propose that SMA plays a crucial role in domain-general sequence processes, contributing to the integration of sequential elements into higher-order representations regardless of the nature of such elements (e.g., motor, temporal, spatial, numerical, linguistic, etc.). This review emphasizes the domain-general involvement of the SMA, as this region has been found to support sequence operations in a variety of cognitive domains that, albeit different, share an inherent sequence processing. These include action, time and spatial processing, numerical cognition, music and language processing, and working memory. In this light, we reviewed and synthesized recent neuroimaging, stimulation and electrophysiological studies in order to compare and reconcile the distinct sources of data by proposing a unifying account for the role of the SMA. We also discussed the differential contribution of the pre-SMA and SMA-proper in sequence operations, and possible neural mechanisms by which such operations are executed. Copyright © 2016 Elsevier Ltd. All rights reserved.

  7. Classification of protein quaternary structure by functional domain composition

    PubMed Central

    Yu, Xiaojing; Wang, Chuan; Li, Yixue

    2006-01-01

    Background The number and the arrangement of subunits that form a protein are referred to as quaternary structure. Quaternary structure is an important protein attribute that is closely related to its function. Proteins with quaternary structure are called oligomeric proteins. Oligomeric proteins are involved in various biological processes, such as metabolism, signal transduction, and chromosome replication. Thus, it is highly desirable to develop some computational methods to automatically classify the quaternary structure of proteins from their sequences. Results To explore this problem, we adopted an approach based on the functional domain composition of proteins. Every protein was represented by a vector calculated from the domains in the PFAM database. The nearest neighbor algorithm (NNA) was used for classifying the quaternary structure of proteins from this information. The jackknife cross-validation test was performed on the non-redundant protein dataset in which the sequence identity was less than 25%. The overall success rate obtained is 75.17%. Additionally, to demonstrate the effectiveness of this method, we predicted the proteins in an independent dataset and achieved an overall success rate of 84.11%. Conclusion Compared with the amino acid composition method and Blast, the results indicate that the domain composition approach may be a more effective and promising high-throughput method in dealing with this complicated problem in bioinformatics. PMID:16584572
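
    The approach described in the abstract above (a domain-composition vector per protein, nearest-neighbor classification, jackknife testing) can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how the vector is calculated or which distance is used, so binary presence/absence vectors and cosine similarity are assumptions here, and the domain names, class labels, and function names are all hypothetical.

```python
import math


def domain_vector(domains, vocabulary):
    """Binary presence/absence vector over a fixed domain vocabulary (assumed encoding)."""
    return [1 if name in domains else 0 for name in vocabulary]


def cosine_similarity(u, v):
    """Cosine similarity; an assumed measure, not confirmed by the abstract."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbor_label(query, training):
    """Return the label of the training vector most similar to the query."""
    best_vector, best_label = max(
        training, key=lambda item: cosine_similarity(query, item[0])
    )
    return best_label


def jackknife_accuracy(samples):
    """Leave each sample out in turn and classify it from the remaining samples."""
    correct = 0
    for i, (vector, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if nearest_neighbor_label(vector, rest) == label:
            correct += 1
    return correct / len(samples)


# Hypothetical toy data: four proteins, four made-up domains, two classes.
VOCABULARY = ["domA", "domB", "domC", "domD"]
SAMPLES = [
    (domain_vector({"domA", "domB"}, VOCABULARY), "monomer"),
    (domain_vector({"domA"}, VOCABULARY), "monomer"),
    (domain_vector({"domC", "domD"}, VOCABULARY), "dimer"),
    (domain_vector({"domC"}, VOCABULARY), "dimer"),
]

print(jackknife_accuracy(SAMPLES))
```

    The jackknife loop never lets a protein vote for itself, which is why the reported success rate in such a test is an estimate of performance on unseen proteins rather than a fit to the training data.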

  8. SH2/SH3 signaling proteins.

    PubMed

    Schlessinger, J

    1994-02-01

    SH2 and SH3 domains are small protein modules that mediate protein-protein interactions in signal transduction pathways that are activated by protein tyrosine kinases. SH2 domains bind to short phosphotyrosine-containing sequences in growth factor receptors and other phosphoproteins. SH3 domains bind to target proteins through sequences containing proline and hydrophobic amino acids. SH2 and SH3 domain containing proteins, such as Grb2 and phospholipase C gamma, utilize these modules in order to link receptor and cytoplasmic protein tyrosine kinases to the Ras signaling pathway and to phosphatidylinositol hydrolysis, respectively. The three-dimensional structures of several SH2 and SH3 domains have been determined by NMR and X-ray crystallography, and the molecular basis of their specificity is beginning to be unveiled.

  9. Improvement of training set structure in fusion data cleaning using Time-Domain Global Similarity method

    NASA Astrophysics Data System (ADS)

    Liu, J.; Lan, T.; Qin, H.

    2017-10-01

    Traditional data cleaning identifies dirty data by classifying original data sequences, which is a class-imbalanced problem since the proportion of incorrect data is much less than the proportion of correct data for most diagnostic systems in Magnetic Confinement Fusion (MCF) devices. When using machine learning algorithms to classify diagnostic data based on a class-imbalanced training set, most classifiers are biased towards the major class and show very poor classification rates on the minor class. By transforming the direct classification problem about original data sequences into a classification problem about the physical similarity between data sequences, the class-balanced effect of the Time-Domain Global Similarity (TDGS) method on training set structure is investigated in this paper. Meanwhile, the impact of improved training set structure on the data cleaning performance of the TDGS method is demonstrated with an application example in the EAST POlarimetry-INTerferometry (POINT) system.

  10. Classification and Lineage Tracing of SH2 Domains Throughout Eukaryotes.

    PubMed

    Liu, Bernard A

    2017-01-01

    Today there exists a rapidly expanding number of sequenced genomes. Cataloging protein interaction domains such as the Src Homology 2 (SH2) domain across these various genomes can be accomplished with ease due to existing algorithms and prediction models. An evolutionary analysis of SH2 domains provides a step towards understanding how SH2 proteins integrated with existing signaling networks to position phosphotyrosine signaling as a crucial driver of robust cellular communication networks in metazoans. However, organizing and tracing SH2 domains across organisms and understanding their evolutionary trajectory remains a challenge. This chapter describes several methodologies towards analyzing the evolutionary trajectory of SH2 domains including a global SH2 domain classification system, which facilitates annotation of new SH2 sequences essential for tracing the lineage of SH2 domains throughout eukaryote evolution. This classification utilizes a combination of sequence homology, protein domain architecture and the boundary positions between introns and exons within the SH2 domain or genes encoding these domains. Discrete SH2 families can then be traced across various genomes to provide insight into their origins. Furthermore, additional methods for examining potential mechanisms for divergence of SH2 domains from structural changes to alterations in the protein domain content and genome duplication will be discussed. Therefore a better understanding of SH2 domain evolution may enhance our insight into the emergence of phosphotyrosine signaling and the expansion of protein interaction domains.

  11. An Internal Signal Sequence Directs Intramembrane Proteolysis of a Cellular Immunoglobulin Domain Protein

    PubMed Central

    Robakis, Thalia; Bak, Beata; Lin, Shu-huei; Bernard, Daniel J.; Scheiffele, Peter

    2008-01-01

    Precursor proteolysis is a crucial mechanism for regulating protein structure and function. Signal peptidase (SP) is an enzyme with a well defined role in cleaving N-terminal signal sequences but no demonstrated function in the proteolysis of cellular precursor proteins. We provide evidence that SP mediates intraprotein cleavage of IgSF1, a large cellular Ig domain protein that is processed into two separate Ig domain proteins. In addition, our results suggest the involvement of signal peptide peptidase (SPP), an intramembrane protease, which acts on substrates that have been previously cleaved by SP. We show that IgSF1 is processed through sequential proteolysis by SP and SPP. Cleavage is directed by an internal signal sequence and generates two separate Ig domain proteins from a polytopic precursor. Our findings suggest that SP and SPP function are not restricted to N-terminal signal sequence cleavage but also contribute to the processing of cellular transmembrane proteins. PMID:18981173

  12. Conserved Features in the Structure, Mechanism, and Biogenesis of the Inverse Autotransporter Protein Family

    PubMed Central

    Heinz, Eva; Stubenrauch, Christopher J.; Grinter, Rhys; Croft, Nathan P.; Purcell, Anthony W.; Strugnell, Richard A.; Dougan, Gordon; Lithgow, Trevor

    2016-01-01

    The bacterial cell surface proteins intimin and invasin are virulence factors that share a common domain structure and bind selectively to host cell receptors in the course of bacterial pathogenesis. The β-barrel domains of intimin and invasin show significant sequence and structural similarities. Conversely, a variety of proteins with sometimes limited sequence similarity have also been annotated as “intimin-like” and “invasin” in genome datasets, while other recent work on apparently unrelated virulence-associated proteins ultimately revealed similarities to intimin and invasin. Here we characterize the sequence and structural relationships across this complex protein family. Surprisingly, intimins and invasins represent a very small minority of the sequence diversity in what has been previously the “intimin/invasin protein family”. Analysis of the assembly pathway for expression of the classic intimin, EaeA, and a characteristic example of the most prevalent members of the group, FdeC, revealed a dependence on the translocation and assembly module as a common feature for both these proteins. While the majority of the sequences in the grouping are most similar to FdeC, a further and widespread group is two-partner secretion systems that use the β-barrel domain as the delivery device for secretion of a variety of virulence factors. This comprehensive analysis supports the adoption of the “inverse autotransporter protein family” as the most accurate nomenclature for the family and, in turn, has important consequences for our overall understanding of the Type V secretion systems of bacterial pathogens. PMID:27190006

  13. Directed evolution of the TALE N-terminal domain for recognition of all 5' bases.

    PubMed

    Lamb, Brian M; Mercer, Andrew C; Barbas, Carlos F

    2013-11-01

    Transcription activator-like effector (TALE) proteins can be designed to bind virtually any DNA sequence. General guidelines for design of TALE DNA-binding domains suggest that the 5'-most base of the DNA sequence bound by the TALE (the N0 base) should be a thymine. We quantified the N0 requirement by analysis of the activities of TALE transcription factors (TALE-TF), TALE recombinases (TALE-R) and TALE nucleases (TALENs) with each DNA base at this position. In the absence of a 5' T, we observed decreases in TALE activity up to >1000-fold in TALE-TF activity, up to 100-fold in TALE-R activity and up to 10-fold reduction in TALEN activity compared with target sequences containing a 5' T. To develop TALE architectures that recognize all possible N0 bases, we used structure-guided library design coupled with TALE-R activity selections to evolve novel TALE N-terminal domains to accommodate any N0 base. A G-selective domain and broadly reactive domains were isolated and characterized. The engineered TALE domains selected in the TALE-R format demonstrated modularity and were active in TALE-TF and TALEN architectures. Evolved N-terminal domains provide effective and unconstrained TALE-based targeting of any DNA sequence as TALE binding proteins and designer enzymes.

  14. Origins and Structural Properties of Novel and De Novo Protein Domains During Insect Evolution.

    PubMed

    Klasberg, Steffen; Bitard-Feildel, Tristan; Callebaut, Isabelle; Bornberg-Bauer, Erich

    2018-05-26

    Over long time scales, protein evolution is characterised by modular rearrangements of protein domains. Such rearrangements are mainly caused by gene duplication, fusion and terminal losses. To better understand domain emergence mechanisms we investigated 32 insect genomes covering a speciation gradient ranging from ~2 to ~390 my. We use established domain models and foldable domains delineated by Hydrophobic-Cluster-Analysis (HCA), which does not require homologous sequences, to also identify domains which have likely arisen de novo, i.e. from previously non-coding DNA. Our results indicate that most novel domains emerge terminally as they originate from ORF extensions while fewer arise in middle arrangements, resulting from exonisation of intronic or intergenic regions. Many novel domains rapidly migrate between terminal or middle positions and single- and multi-domain arrangements. Young domains, such as most HCA defined domains, are under strong selection pressure as they show signals of purifying selection. De novo domains, linked to ancient domains or defined by HCA, have higher degrees of intrinsic disorder and disorder-to-order transition upon binding than ancient domains. However, the corresponding DNA sequences of the novel domains of de novo origins could only rarely be found in sister genomes. We conclude that novel domains are often recruited by other proteins and undergo important structural modifications shortly after their emergence, but evolve too fast to be characterised by cross-species comparisons alone. This article is protected by copyright. All rights reserved.

  15. Identification of a "glycine-loop"-like coiled structure in the 34 AA Pro,Gly,Met repeat domain of the biomineral-associated protein, PM27.

    PubMed

    Wustman, Brandon A; Santos, Rudolpho; Zhang, Bo; Evans, John Spencer

    2002-12-05

    Fracture resistance in biomineralized structures has been linked to the presence of proteins, some of which possess sequences that are associated with elastic behavior. One such protein superfamily, the Pro,Gly-rich sea urchin intracrystalline spicule matrix proteins, form protein-protein supramolecular assemblies that modify the microstructure and fracture-resistant properties of the calcium carbonate mineral phase within embryonic sea urchin spicules and adult sea urchin spines. In this report, we detail the identification of a repetitive keratin-like "glycine-loop"- or coil-like structure within the 34-AA (AA: amino acid) N-terminal domain, (PGMG)(8)PG, of the spicule matrix protein, PM27. The identification of this repetitive structural motif was accomplished using two capped model peptides: a 9-AA sequence, GPGMGPGMG, and a 34-AA peptide representing the entire motif. Using CD, NMR spectrometry, and molecular dynamics simulated annealing/minimization simulations, we have determined that the 9-AA model peptide adopts a loop-like structure at pH 7.4. The structure of the 34-AA polypeptide resembles a coil structure consisting of repeating loop motifs that do not exhibit long-range ordering. Given that loop structures have been associated with protein elastic behavior and protein motion, it is plausible that the 34-AA Pro,Gly,Met repeat sequence motif in PM27 represents a putative elastic or mobile domain. Copyright 2002 Wiley Periodicals, Inc.

  16. Molecular recognition of the Tes LIM2-3 domains by the actin-related protein Arp7A.

    PubMed

    Boëda, Batiste; Knowles, Phillip P; Briggs, David C; Murray-Rust, Judith; Soriano, Erika; Garvalov, Boyan K; McDonald, Neil Q; Way, Michael

    2011-04-01

    Actin-related proteins (Arps) are a highly conserved family of proteins that have extensive sequence and structural similarity to actin. All characterized Arps are components of large multimeric complexes associated with chromatin or the cytoskeleton. In addition, the human genome encodes five conserved but largely uncharacterized "orphan" Arps, which appear to be mostly testis-specific. Here we show that Arp7A, which has 43% sequence identity with β-actin, forms a complex with the cytoskeletal proteins Tes and Mena in the subacrosomal layer of round spermatids. The N-terminal 65-residue extension to the actin-like fold of Arp7A interacts directly with Tes. The crystal structure of the 1-65(Arp7A)·LIM2-3(Tes)·EVH1(Mena) complex reveals that residues 28-49 of Arp7A contact the LIM2-3 domains of Tes. Two alanine residues from Arp7A that occupy equivalent apolar pockets in both LIM domains as well as an intervening GPAK linker that binds the LIM2-3 junction are critical for the Arp7A-Tes interaction. Equivalent occupied apolar pockets are also seen in the tandem LIM domain structures of LMO4 and Lhx3 bound to unrelated ligands. Our results indicate that apolar pocket interactions are a common feature of tandem LIM domain interactions, but ligand specificity is principally determined by the linker sequence.

  17. Characterization of the ligand-binding site of the transferrin receptor in Trypanosoma brucei demonstrates a structural relationship with the N-terminal domain of the variant surface glycoprotein.

    PubMed

    Salmon, D; Hanocq-Quertier, J; Paturiaux-Hanocq, F; Pays, A; Tebabi, P; Nolan, D P; Michel, A; Pays, E

    1997-12-15

    The Trypanosoma brucei transferrin (Tf) receptor is a heterodimer encoded by ESAG7 and ESAG6, two genes contained in the different polycistronic transcription units of the variant surface glycoprotein (VSG) gene. The sequence of ESAG7/6 differs slightly between different units, so that receptors with different affinities for Tf are expressed alternatively following transcriptional switching of VSG expression sites during antigenic variation of the parasite. Based on the sequence homology between pESAG7/6 and the N-terminal domain of VSGs, it can be predicted that the four blocks containing the major sequence differences between pESAG7 and pESAG6 form surface-exposed loops and generate the ligand-binding site. The exchange of a few amino acids in this region between pESAG6s encoded by different VSG units greatly increased the affinity for bovine Tf. Similar changes in other regions were ineffective, while mutations predicted to alter the VSG-like structure abolished the binding. Chimeric proteins containing the N-terminal dimerization domain of VSG and the C-terminal half of either pESAG7 or pESAG6, which contains the ligand-binding domain, can form heterodimers that bind Tf. Taken together, these data provide evidence that the T. brucei Tf receptor is structurally related to the N-terminal domain of the VSG and that the ligand-binding site corresponds to the exposed surface loops of the protein.

  18. Thermal and chemical denaturation of the BRCT functional module of human 53BP1.

    PubMed

    Thanassoulas, Angelos; Nomikos, Michail; Theodoridou, Maria; Stavros, Philemon; Mastellos, Dimitris; Nounesis, George

    2011-10-01

    BRCTs are protein-docking modules involved in eukaryotic DNA repair. They are characterized by low sequence homology with generally well-conserved structure organization. In a considerable number of proteins, a pair of BRCT structural repeats occurs, connected with inter-BRCT linkers, variable in length, sequence and structure. Linkers may separate and control the relative position of BRCT domains as well as protect and stabilize the hydrophobic inter-BRCT interface region. Their vital role in protein function has been demonstrated by recent findings associating missense mutations in the inter-repeat linker region of the BRCT domain of BRCA1 (BRCA1-BRCT) to hereditary breast/ovarian cancer. The interaction of 53BP1 with the core domain of the p53 tumor suppressor involves the C-terminal BRCT repeat as well as the inter-BRCT linker of the tandem BRCT domain of 53BP1 (53BP1-BRCT). High-accuracy differential scanning calorimetry (DSC) and circular dichroism (CD) have been employed to characterize the heat-induced unfolding of 53BP1-BRCT domain. The calorimetric results provide evidence for unfolding to an intermediate, only partly unfolded state, which, based on the CD results, retains the secondary structural characteristics of the native protein. A direct comparison with the corresponding thermal processes for BRCA1-BRCT and BARD1-BRCT provides evidence that the observed behavior is analogous to BRCA1-BRCT even though the two domains differ substantially in the linker structure. Moreover, chemical denaturation experiments of the untagged 53BP1-BRCT and comparison with BRCA1 and BARD1 BRCTs show that no clear association can be drawn between the structural organization of the inter-BRCT linkers and the overall stability of the BRCT domains. Copyright © 2011 Elsevier B.V. All rights reserved.

  19. Sequence Tolerance of a Highly Stable Single Domain Antibody: Comparison of Computational and Experimental Profiles

    DTIC Science & Technology

    2016-09-09

    evaluating 18 mutants using either the A or B conformer is only r = ~ 0.2. Given the poor performance of approximating the observed experimental ...1    Sequence Tolerance of a Highly Stable Single Domain Antibody: Comparison of Computational and Experimental Profiles Mark A. Olson,1 Patricia...unusually high thermal stability is explored by a combined computational and experimental study. Starting with the crystallographic structure

  20. Protein family clustering for structural genomics.

    PubMed

    Yan, Yongpan; Moult, John

    2005-10-28

    A major goal of structural genomics is the provision of a structural template for a large fraction of protein domains. The magnitude of this task depends on the number and nature of protein sequence families. With a large number of bacterial genomes now fully sequenced, it is possible to obtain improved estimates of the number and diversity of families in that kingdom. We have used an automated clustering procedure to group all sequences in a set of genomes into protein families. Bench-marking shows the clustering method is sensitive at detecting remote family members, and has a low level of false positives. This comprehensive protein family set has been used to address the following questions. (1) What is the structure coverage for currently known families? (2) How will the number of known apparent families grow as more genomes are sequenced? (3) What is a practical strategy for maximizing structure coverage in future? Our study indicates that approximately 20% of known families with three or more members currently have a representative structure. The study indicates also that the number of apparent protein families will be considerably larger than previously thought: We estimate that, by the criteria of this work, there will be about 250,000 protein families when 1000 microbial genomes have been sequenced. However, the vast majority of these families will be small, and it will be possible to obtain structural templates for 70-80% of protein domains with an achievable number of representative structures, by systematically sampling the larger families.
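
    The abstract does not say which clustering procedure was used. As a minimal sketch only, the idea of grouping sequences into families and then asking what fraction of families with three or more members has a representative structure can be illustrated with single-linkage clustering over a list of similar pairs. The sequence IDs, the similar pairs, and the set of sequences with known structures below are all hypothetical, and single linkage is swapped in here for illustration; it is not the authors' method.

```python
def cluster_families(ids, similar_pairs):
    """Single-linkage clustering with union-find: any two sequences joined by
    a chain of 'similar' pairs end up in the same family."""
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    families = {}
    for i in ids:
        families.setdefault(find(i), []).append(i)
    # Largest families first, ties broken alphabetically.
    return sorted((sorted(m) for m in families.values()),
                  key=lambda m: (-len(m), m))


def structure_coverage(families, with_structure, min_size=3):
    """Fraction of families of at least min_size members that contain
    at least one sequence with a known structure."""
    large = [f for f in families if len(f) >= min_size]
    if not large:
        return 0.0
    covered = sum(1 for f in large if any(s in with_structure for s in f))
    return covered / len(large)


# Hypothetical toy input: seven sequences, pairs judged similar by some
# upstream comparison, and one sequence with a solved structure.
ids = ["p1", "p2", "p3", "p4", "p5", "p6", "p7"]
pairs = [("p1", "p2"), ("p2", "p3"), ("p4", "p5"), ("p5", "p6")]
families = cluster_families(ids, pairs)
print(families)
print(structure_coverage(families, with_structure={"p2"}))
```

    Single linkage can chain unrelated sequences together through intermediate hits, which is one reason a real family-clustering pipeline would likely need stricter criteria than this sketch uses.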

  1. Classification of proteins with shared motifs and internal repeats in the ECOD database

    PubMed Central

    Kinch, Lisa N.; Liao, Yuxing

    2016-01-01

    Abstract Proteins and their domains evolve by a set of events commonly including the duplication and divergence of small motifs. The presence of short repetitive regions in domains has generally constituted a difficult case for structural domain classifications and their hierarchies. We developed the Evolutionary Classification Of protein Domains (ECOD) in part to implement a new schema for the classification of these types of proteins. Here we document the ways in which ECOD classifies proteins with small internal repeats, widespread functional motifs, and assemblies of small domain‐like fragments in its evolutionary schema. We illustrate the ways in which the structural genomics project impacted the classification and characterization of new structural domains and sequence families over the decade. PMID:26833690

  2. Cache domains that are homologous to, but different from PAS domains comprise the largest superfamily of extracellular sensors in prokaryotes

    DOE PAGES

    Upadhyay, Amit A.; Fleetwood, Aaron D.; Adebali, Ogun; ...

    2016-04-06

    Cellular receptors usually contain a designated sensory domain that recognizes the signal. Per/Arnt/Sim (PAS) domains are ubiquitous sensors in thousands of species ranging from bacteria to humans. Although PAS domains were described as intracellular sensors, recent structural studies revealed PAS-like domains in extracytoplasmic regions in several transmembrane receptors. However, these structurally defined extracellular PAS-like domains do not match sequence-derived PAS domain models, and thus their distribution across the genomic landscape remains largely unknown. Here we show that structurally defined extracellular PAS-like domains belong to the Cache superfamily, which is homologous to, but distinct from the PAS superfamily. Our newly built computational models enabled identification of Cache domains in tens of thousands of signal transduction proteins including those from important pathogens and model organisms. Moreover, we show that Cache domains comprise the dominant mode of extracellular sensing in prokaryotes.

  3. Cache domains that are homologous to, but different from PAS domains comprise the largest superfamily of extracellular sensors in prokaryotes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Upadhyay, Amit A.; Fleetwood, Aaron D.; Adebali, Ogun

    Cellular receptors usually contain a designated sensory domain that recognizes the signal. Per/Arnt/Sim (PAS) domains are ubiquitous sensors in thousands of species ranging from bacteria to humans. Although PAS domains were described as intracellular sensors, recent structural studies revealed PAS-like domains in extracytoplasmic regions in several transmembrane receptors. However, these structurally defined extracellular PAS-like domains do not match sequence-derived PAS domain models, and thus their distribution across the genomic landscape remains largely unknown. Here we show that structurally defined extracellular PAS-like domains belong to the Cache superfamily, which is homologous to, but distinct from the PAS superfamily. Our newly built computational models enabled identification of Cache domains in tens of thousands of signal transduction proteins including those from important pathogens and model organisms. Moreover, we show that Cache domains comprise the dominant mode of extracellular sensing in prokaryotes.

  4. Order within disorder: Aggrecan chondroitin sulphate-attachment region provides new structural insights into protein sequences classified as disordered

    PubMed Central

    Jowitt, Thomas A; Murdoch, Alan D; Baldock, Clair; Berry, Richard; Day, Joanna M; Hardingham, Timothy E

    2010-01-01

    Structural investigation of proteins containing large stretches of sequences without predicted secondary structure is the focus of much increased attention. Here, we have produced an unglycosylated 30 kDa peptide from the chondroitin sulphate (CS)-attachment region of human aggrecan (CS-peptide), which was predicted to be intrinsically disordered and compared its structure with the adjacent aggrecan G3 domain. Biophysical analyses, including analytical ultracentrifugation, light scattering, and circular dichroism showed that the CS-peptide had an elongated and stiffened conformation in contrast to the globular G3 domain. The results suggested that it contained significant secondary structure, which was sensitive to urea, and we propose that the CS-peptide forms an elongated wormlike molecule based on a dynamic range of energetically equivalent secondary structures stabilized by hydrogen bonds. The dimensions of the structure predicted from small-angle X-ray scattering analysis were compatible with EM images of fully glycosylated aggrecan and a partly glycosylated aggrecan CS2-G3 construct. The semiordered structure identified in CS-peptide was not predicted by common structural algorithms and identified a potentially distinct class of semiordered structure within sequences currently identified as disordered. Sequence comparisons suggested some evidence for comparable structures in proteins encoded by other genes (PRG4, MUC5B, and CBP). The function of these semiordered sequences may serve to spatially position attached folded modules and/or to present polypeptides for modification, such as glycosylation, and to provide templates for the multiple pleiotropic interactions proposed for disordered proteins. Proteins 2010. © 2010 Wiley-Liss, Inc. PMID:20806220

  5. Sequence-To-Conformation Relationships of Disordered Regions Tethered to Folded Domains of Proteins.

    PubMed

    Mittal, Anuradha; Holehouse, Alex S; Cohan, Megan C; Pappu, Rohit V

    2018-05-12

    Intrinsically disordered proteins and regions (IDPs / IDRs) are characterized by well-defined sequence-to-conformation relationships (SCRs). These relationships refer to the sequence-specific preferences for average sizes, shapes, residue-specific secondary structure propensities, and amplitudes of multiscale conformational fluctuations. SCRs are discerned from the sequence-specific conformational ensembles of IDPs. A vast majority of IDPs are actually tethered to folded domains (FDs). This raises the question of whether or not SCRs inferred for IDPs are applicable to IDRs tethered to folded domains. Here, we use atomistic simulations based on a well-established forcefield paradigm and an enhanced sampling method to obtain comparative assessments of SCRs for thirteen archetypal IDRs modeled as autonomous units, as C-terminal tails connected to folded domains, and as linkers between pairs of folded domains. Our studies uncover a set of general observations regarding context-independent versus context-dependent SCRs of IDRs. SCRs are minimally perturbed upon tethering to folded domains if the IDRs are deficient in charged residues and for polyampholytic IDRs where the oppositely charged residues within the sequence of the IDR are separated into distinct blocks. In contrast, the interplay between IDRs and tethered folded domains has a significant modulatory effect on SCRs if the IDRs have intermediate fractions of charged residues or if they have sequence-intrinsic conformational preferences for canonical random coils. Our findings suggest that IDRs with context-independent SCRs might be independent evolutionary modules whereas IDRs with context-dependent intrinsic SCRs might co-evolve with the FDs to which they are tethered. Copyright © 2018. Published by Elsevier Ltd.

  6. Domain structure sequence in ferroelectric Pb(Zr0.2Ti0.8)O3 thin film on MgO

    NASA Astrophysics Data System (ADS)

    Janolin, Pierre-Eymeric; Fraisse, Bernard; Dkhil, Brahim; Le Marrec, Françoise; Ringgaard, Erling

    2007-04-01

    The structural evolution of a polydomain ferroelectric Pb(Zr0.2Ti0.8)O3 film was studied by temperature-dependent x-ray diffraction. Two critical temperatures were evidenced: T*=740 K, corresponding to a change in the domain structure (a/c/a/c to a1/a2/a1/a2), and TCfilm=825 K, where the film undergoes a ferroelectric-paraelectric phase transition. The film remains tetragonal on the whole range of temperature investigated. The evolutions of the domain structure and lattice parameters were found to be in very good agreement with the calculated domain stability map and theoretical temperature-misfit strain phase diagram, respectively.

  7. Sequence characterization of S100A8 gene reveals structural differences of protein and transcriptional factor binding sites in water buffalo and yak.

    PubMed

    Kathiravan, P; Goyal, S; Kataria, R S; Mishra, B P; Jayakumar, S; Joshi, B K

    2011-01-01

    The present study was undertaken to characterize the structure of S100A8 gene and its promoter in water buffalo and yak. Sequence data of 2.067 kb, 2.071 kb, and 2.052 kb with respect to complete S100A8 gene including 5' flanking region was generated in river buffalo, swamp buffalo, and yak, respectively. BLAST analysis of coding DNA sequences (CDS) of S100A8 gene revealed 95% homology of buffalo sequence with cattle, 85% with pig and horse, 83% with dog, 72-73% with murines, and around 79% with primates and humans. Phylogenetic analysis of predicted CDS revealed distinct clustering of murines, primates, and domestic animals with bovines and bubalines forming a subcluster among farm animals. In silico translation of predicted CDS revealed a sequence of 89 amino acids with 7 amino acid changes between cattle and buffalo and 2 changes between cattle and yak. The search for Pfam family revealed the N-terminal calcium binding domain and the noncanonical EF hand domain in the carboxy terminus, with more variations being observed in the N-terminal domain among different species. Two amino acid changes observed in carboxy terminal EF hand domain resulted in altered secondary structure of yak S100A8 protein. Analysis of S100A8 gene promoter revealed 14 putative motifs for transcriptional factor binding sites. Two putative motifs viz. C/EBP and v-Myb were found to be absent in swamp buffalo as compared to river buffalo and cattle. Differences in the structure of S100A8 protein and the transcriptional factor binding sites identified in the present study need to be analyzed further for their functional significance in yak and swamp buffalo respectively. Copyright © Taylor & Francis Group, LLC
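
    The amino acid changes reported above can be turned into percent identities with simple arithmetic. The sketch below assumes, since the abstract does not say so explicitly, that the 7 and 2 changes are substitutions in an ungapped alignment over the full 89 residues. The function names are illustrative only.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions with identical residues in two sequences that are
    already aligned to equal length (no gap handling)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def identity_from_changes(length: int, changes: int) -> float:
    """Percent identity implied by a number of substitutions over a length."""
    return 100.0 * (length - changes) / length


# Figures taken from the abstract: an 89-residue protein with 7 changes
# between cattle and buffalo and 2 changes between cattle and yak.
cattle_vs_buffalo = identity_from_changes(89, 7)  # 82 of 89 positions match
cattle_vs_yak = identity_from_changes(89, 2)      # 87 of 89 positions match
print(f"cattle vs buffalo: {cattle_vs_buffalo:.1f}%")  # 92.1%
print(f"cattle vs yak:     {cattle_vs_yak:.1f}%")      # 97.8%
```

    These protein-level values are not directly comparable with the 95% figure quoted for the coding DNA sequence, because synonymous nucleotide changes do not alter the protein and one amino acid change can involve more than one nucleotide.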

  8. A comprehensive analysis of the Omp85/TpsB protein superfamily structural diversity, taxonomic occurrence, and evolution

    PubMed Central

    Heinz, Eva; Lithgow, Trevor

    2014-01-01

    Members of the Omp85/TpsB protein superfamily are ubiquitously distributed in Gram-negative bacteria, and function in protein translocation (e.g., FhaC) or the assembly of outer membrane proteins (e.g., BamA). Several recent findings are suggestive of a further level of variation in the superfamily, including the identification of the novel membrane protein assembly factor TamA and protein translocase PlpD. To investigate the diversity and the causal evolutionary events, we undertook a comprehensive comparative sequence analysis of the Omp85/TpsB proteins. A total of 10 protein subfamilies were apparent, distinguished in their domain structure and sequence signatures. In addition to the proteins FhaC, BamA, and TamA, for which structural and functional information is available, are families of proteins with so far undescribed domain architectures linked to the Omp85 β-barrel domain. This study brings a classification structure to a dynamic protein superfamily of high interest given its essential function for Gram-negative bacteria as well as its diverse domain architecture, and we discuss several scenarios of putative functions of these so far undescribed proteins. PMID:25101071

  9. Systematic Analysis of Primary Sequence Domain Segments for the Discrimination Between Class C GPCR Subtypes.

    PubMed

    König, Caroline; Alquézar, René; Vellido, Alfredo; Giraldo, Jesús

    2018-03-01

    G-protein-coupled receptors (GPCRs) are a large and diverse super-family of eukaryotic cell membrane proteins that play an important physiological role as transmitters of extracellular signal. In this paper, we investigate Class C, a member of this super-family that has attracted much attention in pharmacology. The limited knowledge about the complete 3D crystal structure of Class C receptors makes necessary the use of their primary amino acid sequences for analytical purposes. Here, we provide a systematic analysis of distinct receptor sequence segments with regard to their ability to differentiate between seven class C GPCR subtypes according to their topological location in the extracellular, transmembrane, or intracellular domains. We build on the results from the previous research that provided preliminary evidence of the potential use of separated domains of complete class C GPCR sequences as the basis for subtype classification. The use of the extracellular N-terminus domain alone was shown to result in a minor decrease in subtype discrimination in comparison with the complete sequence, despite discarding much of the sequence information. In this paper, we describe the use of Support Vector Machine-based classification models to evaluate the subtype-discriminating capacity of the specific topological sequence segments.

  10. AIDA: ab initio domain assembly for automated multi-domain protein structure prediction and domain–domain interaction prediction

    PubMed Central

    Xu, Dong; Jaroszewski, Lukasz; Li, Zhanwen; Godzik, Adam

    2015-01-01

    Motivation: Most proteins consist of multiple domains, independent structural and evolutionary units that are often reshuffled in genomic rearrangements to form new protein architectures. Template-based modeling methods can often detect homologous templates for individual domains, but templates that could be used to model the entire query protein are often not available. Results: We have developed a fast docking algorithm ab initio domain assembly (AIDA) for assembling multi-domain protein structures, guided by the ab initio folding potential. This approach can be extended to discontinuous domains (i.e. domains with ‘inserted’ domains). When tested on experimentally solved structures of multi-domain proteins, the relative domain positions were accurately found among top 5000 models in 86% of cases. AIDA server can use domain assignments provided by the user or predict them from the provided sequence. The latter approach is particularly useful for automated protein structure prediction servers. The blind test consisting of 95 CASP10 targets shows that domain boundaries could be successfully determined for 97% of targets. Availability and implementation: The AIDA package as well as the benchmark sets used here are available for download at http://ffas.burnham.org/AIDA/. Contact: adam@sanfordburnham.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25701568

  11. Identification of linker regions and domain borders of the transcription activator protein NtrC from Escherichia coli by limited proteolysis, in-gel digestion, and mass spectrometry.

    PubMed

    Bantscheff, M; Weiss, V; Glocker, M O

    1999-08-24

    We have developed a mass spectrometry based method for the identification of linker regions and domain borders in multidomain proteins. This approach combines limited proteolysis and in-gel proteolytic digestions and was applied to the determination of linkers in the transcription factor NtrC from Escherichia coli. Limited proteolysis of NtrC with thermolysin and papain revealed that initial digestion yielded two major bands in SDS-PAGE that were identified by mass spectrometry as the R-domain and the still covalently linked OC-domains. Subsequent steps in limited proteolysis afforded further cleavage of the OC-fragment into the O- and the C-domain at accessible amino acid residues. Mass spectrometric identification of the tryptic/thermolytic peptides obtained after in-gel total proteolysis of the SDS-PAGE-separated domains determined the domain borders and showed that the protease accessible linker between R- and O-domain comprised amino acids Val-131 and Gln-132 within the "Q-linker" in agreement with papain and subtilisin digestion. The region between amino acid residues Thr-389 and Gln-396 marked the hitherto unknown linker sequence that connects the O- with the C-domain. High abundances of proline-, alanine-, serine-, and glutamic acid residues were found in this linker structure (PASE-linker) of related NtrC response regulator proteins. While R- and C-domains remained stable under the applied limited proteolysis conditions, the O-domain was further truncated yielding a core fragment that comprised the sequence from Ile-140 to Arg-320. ATPase activity was lost after separation of the R-domain from the OC-fragment. However, binding of OC- and C- fragments to specific DNA was observed by characteristic band-shifts in migration retardation assays, indicating intact tertiary structures of the C-domain. 
    The outlined strategy proved to be highly efficient and afforded lead information of tertiary structural features necessary for protein design and engineering and for structure-function studies.

  12. Predicting domain-domain interaction based on domain profiles with feature selection and support vector machines

    PubMed Central

    2010-01-01

    Background Protein-protein interaction (PPI) plays essential roles in cellular functions. The cost, time and other limitations associated with the current experimental methods have motivated the development of computational methods for predicting PPIs. As protein interactions generally occur via domains instead of the whole molecules, predicting domain-domain interaction (DDI) is an important step toward PPI prediction. Computational methods developed so far have utilized information from various sources at different levels, from primary sequences, to molecular structures, to evolutionary profiles. Results In this paper, we propose a computational method to predict DDI using support vector machines (SVMs), based on domains represented as interaction profile hidden Markov models (ipHMM) where interacting residues in domains are explicitly modeled according to the three dimensional structural information available at the Protein Data Bank (PDB). Features about the domains are extracted first as the Fisher scores derived from the ipHMM and then selected using singular value decomposition (SVD). Domain pairs are represented by concatenating their selected feature vectors, and classified by a support vector machine trained on these feature vectors. The method is tested by leave-one-out cross validation experiments with a set of interacting protein pairs adopted from the 3DID database. The prediction accuracy has shown significant improvement as compared to InterPreTS (Interaction Prediction through Tertiary Structure), an existing method for PPI prediction that also uses the sequences and complexes of known 3D structure. Conclusions We show that domain-domain interaction prediction can be significantly enhanced by exploiting information inherent in the domain profiles via feature selection based on Fisher scores, singular value decomposition and supervised learning based on support vector machines. 
    Datasets and source code are freely available on the web at http://liao.cis.udel.edu/pub/svdsvm. Implemented in Matlab and supported on Linux and MS Windows. PMID:21034480
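
    Two steps named in the abstract, representing a domain pair by concatenating the two domains' feature vectors and evaluating by leave-one-out cross validation, can be sketched as follows. This is an illustration under stated assumptions, not the published pipeline: the feature values are made up rather than Fisher scores from an ipHMM, no SVD selection is performed, and a nearest-centroid classifier is swapped in for the support vector machine so that the example needs only the standard library.

```python
from math import dist


def pair_features(dom_a, dom_b):
    """Represent a domain pair by concatenating the two feature vectors."""
    return list(dom_a) + list(dom_b)


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier (not an SVM): predict the label whose training
    centroid is closest to x in Euclidean distance."""
    centroids = {}
    for label in set(train_y):
        members = [v for v, y in zip(train_x, train_y) if y == label]
        centroids[label] = centroid(members)
    return min(centroids, key=lambda label: dist(centroids[label], x))


def leave_one_out_accuracy(xs, ys):
    """Hold out each example once, train on the rest, and score the
    prediction for the held-out example."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Hypothetical two-dimensional feature vectors for six domains.
domains = {
    "d1": [1.0, 0.9], "d2": [1.1, 1.0], "d3": [0.9, 1.2],
    "d4": [0.0, 0.1], "d5": [0.1, 0.0], "d6": [0.2, 0.1],
}
# Hypothetical labels: 1 = interacting pair, 0 = non-interacting pair.
labelled_pairs = [
    ("d1", "d2", 1), ("d2", "d3", 1), ("d1", "d3", 1),
    ("d4", "d5", 0), ("d5", "d6", 0), ("d4", "d6", 0),
]
xs = [pair_features(domains[a], domains[b]) for a, b, _ in labelled_pairs]
ys = [label for _, _, label in labelled_pairs]
print(leave_one_out_accuracy(xs, ys))
```

    Concatenation makes the representation order-dependent, so a pair (A, B) and its reverse (B, A) map to different vectors; whether and how the original method handles this is not stated in the abstract.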

  13. HMMerThread: detecting remote, functional conserved domains in entire genomes by combining relaxed sequence-database searches with fold recognition.

    PubMed

    Bradshaw, Charles Richard; Surendranath, Vineeth; Henschel, Robert; Mueller, Matthias Stefan; Habermann, Bianca Hermine

    2011-03-10

    Conserved domains in proteins are one of the major sources of functional information for experimental design and genome-level annotation. Though search tools for conserved domain databases such as Hidden Markov Models (HMMs) are sensitive in detecting conserved domains in proteins when they share sufficient sequence similarity, they tend to miss more divergent family members, as they lack a reliable statistical framework for the detection of low sequence similarity. We have developed a greatly improved HMMerThread algorithm that can detect remotely conserved domains in highly divergent sequences. HMMerThread combines relaxed conserved domain searches with fold recognition to eliminate false positive, sequence-based identifications. With an accuracy of 90%, our software is able to automatically predict highly divergent members of conserved domain families with an associated 3-dimensional structure. We give additional confidence to our predictions by validation across species. We have run HMMerThread searches on eight proteomes including human and present a rich resource of remotely conserved domains, which adds significantly to the functional annotation of entire proteomes. We find ∼4500 cross-species validated, remotely conserved domain predictions in the human proteome alone. As an example, we find a DNA-binding domain in the C-terminal part of the A-kinase anchor protein 10 (AKAP10), a PKA adaptor that has been implicated in cardiac arrhythmias and premature cardiac death, which upon stress likely translocates from mitochondria to the nucleus/nucleolus. Based on our prediction, we propose that with this HLH-domain, AKAP10 is involved in the transcriptional control of stress response. Further remotely conserved domains we discuss are examples from areas such as sporulation, chromosome segregation and signalling during immune response. 
    The HMMerThread algorithm is able to automatically detect the presence of remotely conserved domains in proteins based on weak sequence similarity. Our predictions open up new avenues for biological and medical studies. Genome-wide HMMerThread domains are available at http://vm1-hmmerthread.age.mpg.de.

  14. HMMerThread: Detecting Remote, Functional Conserved Domains in Entire Genomes by Combining Relaxed Sequence-Database Searches with Fold Recognition

    PubMed Central

    Bradshaw, Charles Richard; Surendranath, Vineeth; Henschel, Robert; Mueller, Matthias Stefan; Habermann, Bianca Hermine

    2011-01-01

    Conserved domains in proteins are one of the major sources of functional information for experimental design and genome-level annotation. Though search tools for conserved domain databases such as Hidden Markov Models (HMMs) are sensitive in detecting conserved domains in proteins when they share sufficient sequence similarity, they tend to miss more divergent family members, as they lack a reliable statistical framework for the detection of low sequence similarity. We have developed a greatly improved HMMerThread algorithm that can detect remotely conserved domains in highly divergent sequences. HMMerThread combines relaxed conserved domain searches with fold recognition to eliminate false positive, sequence-based identifications. With an accuracy of 90%, our software is able to automatically predict highly divergent members of conserved domain families with an associated 3-dimensional structure. We give additional confidence to our predictions by validation across species. We have run HMMerThread searches on eight proteomes including human and present a rich resource of remotely conserved domains, which adds significantly to the functional annotation of entire proteomes. We find ∼4500 cross-species validated, remotely conserved domain predictions in the human proteome alone. As an example, we find a DNA-binding domain in the C-terminal part of the A-kinase anchor protein 10 (AKAP10), a PKA adaptor that has been implicated in cardiac arrhythmias and premature cardiac death, which upon stress likely translocates from mitochondria to the nucleus/nucleolus. Based on our prediction, we propose that with this HLH-domain, AKAP10 is involved in the transcriptional control of stress response. Further remotely conserved domains we discuss are examples from areas such as sporulation, chromosome segregation and signalling during immune response. 
The HMMerThread algorithm is able to automatically detect the presence of remotely conserved domains in proteins based on weak sequence similarity. Our predictions open up new avenues for biological and medical studies. Genome-wide HMMerThread domains are available at http://vm1-hmmerthread.age.mpg.de. PMID:21423752
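    The abstract above describes a two-stage idea: a relaxed conserved-domain search followed by fold recognition that discards false positives. The sketch below only illustrates that filtering logic in Python. It is not the HMMerThread implementation; the `DomainHit` record, the `filter_remote_hits` function and both E-value cutoffs are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class DomainHit:
    """One sequence-based domain hit (hypothetical record layout)."""
    protein: str
    domain: str
    evalue: float


def filter_remote_hits(domain_hits, fold_predictions,
                       relaxed_evalue=10.0, strict_evalue=1e-3):
    """Keep weak domain hits only if fold recognition agrees.

    domain_hits      -- iterable of DomainHit from a relaxed domain search
    fold_predictions -- dict mapping protein -> set of domains supported
                        by an independent fold-recognition step
    Both thresholds are illustrative assumptions, not published values.
    """
    kept = []
    for hit in domain_hits:
        if hit.evalue > relaxed_evalue:
            continue  # too weak even for the relaxed search
        if hit.evalue <= strict_evalue:
            continue  # already a confident hit, not a "remote" one
        if hit.domain in fold_predictions.get(hit.protein, set()):
            kept.append(hit)  # weak sequence signal confirmed by structure
    return kept


hits = [
    DomainHit("P1", "HLH", 0.5),    # weak hit, confirmed by fold recognition
    DomainHit("P1", "SH3", 2.0),    # weak hit, not confirmed
    DomainHit("P2", "HLH", 1e-10),  # strong hit, not remote
    DomainHit("P3", "WW", 50.0),    # beyond the relaxed cutoff
]
folds = {"P1": {"HLH"}, "P2": {"HLH"}, "P3": {"WW"}}
remote = filter_remote_hits(hits, folds)
print(remote)
```

    Requiring agreement between two independent signals is what lets the first stage run with a permissive cutoff without flooding the output with false positives.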

  15. SeqHound: biological sequence and structure database as a platform for bioinformatics research

    PubMed Central

    2002-01-01

    Background SeqHound has been developed as an integrated biological sequence, taxonomy, annotation and 3-D structure database system. It provides a high-performance server platform for bioinformatics research in a locally-hosted environment. Results SeqHound is based on the National Center for Biotechnology Information data model and programming tools. It offers daily updated contents of all Entrez sequence databases in addition to 3-D structural data and information about sequence redundancies, sequence neighbours, taxonomy, complete genomes, functional annotation including Gene Ontology terms and literature links to PubMed. SeqHound is accessible via a web server through a Perl, C or C++ remote API or an optimized local API. It provides functionality necessary to retrieve specialized subsets of sequences, structures and structural domains. Sequences may be retrieved in FASTA, GenBank, ASN.1 and XML formats. Structures are available in ASN.1, XML and PDB formats. Emphasis has been placed on complete genomes, taxonomy, domain and functional annotation as well as 3-D structural functionality in the API, while fielded text indexing functionality remains under development. SeqHound also offers a streamlined WWW interface for simple web-user queries. Conclusions The system has proven useful in several published bioinformatics projects such as the BIND database and offers a cost-effective infrastructure for research. SeqHound will continue to develop and be provided as a service of the Blueprint Initiative at the Samuel Lunenfeld Research Institute. The source code and examples are available under the terms of the GNU public license at the Sourceforge site http://sourceforge.net/projects/slritools/ in the SLRI Toolkit. PMID:12401134

  16. Structural studies of MFE-1: the 1.9 A crystal structure of the dehydrogenase part of rat peroxisomal MFE-1.

    PubMed

    Taskinen, Jukka P; Kiema, Tiila R; Hiltunen, J Kalervo; Wierenga, Rik K

    2006-01-27

    The 1.9 A structure of the C-terminal dehydrogenase part of the rat peroxisomal monomeric multifunctional enzyme type 1 (MFE-1) has been determined. In this construct (residues 260-722 and referred to as MFE1-DH) the N-terminal hydratase part of MFE-1 has been deleted. The structure of MFE1-DH shows that it consists of an N-terminal helix, followed by a Rossmann-fold domain (domain C), followed by two tightly associated helical domains (domains D and E), which have similar topology. The structure of MFE1-DH is compared with the two known homologous structures: human mitochondrial 3-hydroxyacyl-CoA dehydrogenase (HAD; sequence identity is 33%) (which is dimeric and monofunctional) and with the dimeric multifunctional alpha-chain (alphaFOM; sequence identity is 28%) of the bacterial fatty acid beta-oxidation alpha2beta2-multienzyme complex. Like MFE-1, alphaFOM has an N-terminal hydratase part and a C-terminal dehydrogenase part, and the structure comparisons show that the N-terminal helix of MFE1-DH corresponds to the alphaFOM linker helix, located between its hydratase and dehydrogenase part. It is also shown that this helix corresponds to the C-terminal helix-10 of the hydratase/isomerase superfamily, suggesting that functionally it belongs to the N-terminal hydratase part of MFE-1.

  17. Using the structure-function linkage database to characterize functional domains in enzymes.

    PubMed

    Brown, Shoshana; Babbitt, Patricia

    2014-12-12

    The Structure-Function Linkage Database (SFLD; http://sfld.rbvi.ucsf.edu/) is a Web-accessible database designed to link enzyme sequence, structure, and functional information. This unit describes the protocols by which a user may query the database to predict the function of uncharacterized enzymes and to correct misannotated functional assignments. The information in this unit is especially useful in helping a user discriminate functional capabilities of a sequence that is only distantly related to characterized sequences in publicly available databases. Copyright © 2014 John Wiley & Sons, Inc.

  18. Structural basis of Bloom syndrome (BS) causing mutations in the BLM helicase domain.

    PubMed Central

    Rong, S. B.; Väliaho, J.; Vihinen, M.

    2000-01-01

    BACKGROUND: Bloom syndrome (BS) is characterized by mutations within the BLM gene. The Bloom syndrome protein (BLM) has similarity to the RecQ subfamily of DNA helicases, which contain seven conserved helicase domains and share significant sequence and structural similarity with the Rep and PcrA DNA helicases. We modeled the three-dimensional structure of the BLM helicase domain to analyze the structural basis of BS-causing mutations. MATERIALS AND METHODS: The sequence alignment was performed for RecQ DNA helicases and Rep and PcrA helicases. The crystal structure of PcrA helicase (PDB entry 3PJR) was used as the template for modeling the BLM helicase domain. The model was used to infer the function of BLM and to analyze the effect of the mutations. RESULTS: The structural model with good stereochemistry of the BLM helicase domain contains two subdomains, 1A and 2A. The electrostatic potential of the model is highly negative over most of the surface, except for the cleft between subdomains 1A and 2A which is similar to the template protein. The ATP-binding site is located inside the model between subdomains 1A and 2A; whereas, the DNA-binding region is situated at the surface cleft, with positive potential between 1A and 2A. CONCLUSIONS: The three-dimensional structure of the BLM helicase domain was modeled and applied to interpret BS-causing mutations. The mutation I841T is likely to weaken DNA binding, while the mutations C891R, C901Y, and Q672R presumably disturb the ATP binding. In addition, other critical positions are discussed. PMID:10965492

  19. Modeling of DNA local parameters predicts encrypted architectural motifs in Xenopus laevis ribosomal gene promoter

    PubMed Central

    Roux-Rouquie, Magali; Marilley, Monique

    2000-01-01

    We have modeled local DNA sequence parameters to search for DNA architectural motifs involved in transcription regulation and promotion within the Xenopus laevis ribosomal gene promoter and the intergenic spacer (IGS) sequences. The IGS was found to be shaped into distinct topological domains. First, intrinsic bends split the IGS into domains of common but different helical features. Local parameters at inter-domain junctions exhibit a high variability with respect to intrinsic curvature, bendability and thermal stability. Secondly, the repeated sequence blocks of the IGS exhibit right-handed supercoiled structures which could be related to their enhancer properties. Thirdly, the gene promoter presents both inherent curvature and minor groove narrowing which may be viewed as motifs of a structural code for protein recognition and binding. Such pre-existing deformations could simply be remodeled during the binding of the transcription complex. Alternatively, these deformations could pre-shape the promoter in such a way that further remodeling is facilitated. Mutations shown to abolish promoter curvature as well as intrinsic minor groove narrowing, in a variant which maintained full transcriptional activity, bring circumstantial evidence for structurally-preorganized motifs in relation to transcription regulation and promotion. Using well documented X.laevis rDNA regulatory sequences we showed that computer modeling may be of invaluable assistance in assessing encrypted architectural motifs. The evidence of these DNA topological motifs with respect to the concept of structural code is discussed. PMID:10982860

  20. The structure and dynamics of tandem WW domains in a negative regulator of notch signaling, Suppressor of deltex.

    PubMed

    Fedoroff, Oleg Y; Townson, Sharon A; Golovanov, Alexander P; Baron, Martin; Avis, Johanna M

    2004-08-13

    WW domains mediate protein recognition, usually through binding to proline-rich sequences. In many proteins, WW domains occur in tandem arrays. Whether or how individual domains within such arrays cooperate to recognize biological partners is, as yet, poorly characterized. An important question is whether functional diversity of different WW domain proteins is reflected in the structural organization and ligand interaction mechanisms of their multiple domains. We have determined the solution structure and dynamics of a pair of WW domains (WW3-4) from a Drosophila Nedd4 family protein called Suppressor of deltex (Su(dx)), a regulator of Notch receptor signaling. We find that the binding of a type 1 PPPY ligand to WW3 stabilizes the structure with effects propagating to the WW4 domain, a domain that is not active for ligand binding. Both WW domains adopt the characteristic triple-stranded beta-sheet structure, and significantly, this is the first example of a WW domain structure to include a domain (WW4) lacking the second conserved Trp (replaced by Phe). The domains are connected by a flexible linker, which allows a hinge-like motion of domains that may be important for the recognition of functionally relevant targets. Our results contrast markedly with those of the only previously determined three-dimensional structure of tandem WW domains, that of the rigidly oriented WW domain pair from the RNA-splicing factor Prp40. Our data illustrate that arrays of WW domains can exhibit a variety of higher order structures and ligand interaction mechanisms.

  1. [NMR structure and dynamics of the chimeric protein SH3-F2].

    PubMed

    Kutyshenko, V P; Gushchina, L V; Khristoforov, V S; Prokhorov, D A; Timchenko, M A; Kudrevatykh, Iu A; Fediukina, D V; Filimonov, V V

    2010-01-01

    To further elucidate the structural and dynamic principles of protein self-organization and protein-ligand interactions, a new chimeric protein, SH3-F2, was designed and a genetically engineered construct was created. The SH3-F2 amino acid sequence consists of the polyproline ligand mgAPPLPPYSA, a GG linker and the sequence of the spectrin SH3 domain circular permutant S19-P20s. The structural and dynamic properties of the protein were studied by high-resolution NMR. According to the NMR data, the tertiary structure of the chimeric protein SH3-F2 has the topology typical of SH3 domains in complex with a ligand, which forms a polyproline type II helix located in the conserved binding region in orientation II. The polyproline ligand lies close to the protein globule and is stabilized by hydrophobic interactions. However, the interaction between the ligand and the part of the globule corresponding to the SH3 domain is limited, because analysis of the protein's dynamic characteristics points to low-amplitude, high-frequency tumbling of the ligand relative to the slow intramolecular motions of the main globule. The constructed chimera makes it possible to carry out further structural and thermodynamic investigations of polyproline helix properties and its interaction with regulatory domains.

  2. Structure of a Spumaretrovirus Gag Central Domain Reveals an Ancient Retroviral Capsid

    PubMed Central

    Dutta, Moumita; Pollard, Dominic J.; Goldstone, David C.; Ramos, Andres; Müllers, Erik; Stirnnagel, Kristin; Stanke, Nicole; Lindemann, Dirk; Taylor, William R.; Rosenthal, Peter B.

    2016-01-01

    The Spumaretrovirinae, or foamy viruses (FVs), are complex retroviruses that infect many species of monkey and ape. Despite little sequence homology, FV and orthoretroviral Gag proteins perform equivalent functions, including genome packaging, virion assembly, trafficking and membrane targeting. However, there is a paucity of structural information for FVs and it is unclear how disparate FV and orthoretroviral Gag molecules share the same function. To probe the functional overlap of FV and orthoretroviral Gag we have determined the structure of a central region of Gag from the Prototype FV (PFV). The structure comprises two all-α-helical domains, NtDCEN and CtDCEN, which, although they have no sequence similarity, share the same core fold as the N- (NtDCA) and C-terminal domains (CtDCA) of archetypal orthoretroviral capsid protein (CA). Moreover, structural comparisons with orthoretroviral CA align PFV NtDCEN and CtDCEN with NtDCA and CtDCA respectively. Further in vitro and functional virological assays reveal that residues making inter-domain NtDCEN-CtDCEN interactions are required for PFV capsid assembly and that intact capsid is required for PFV reverse transcription. These data provide the first information that relates the Gag proteins of Spuma and Orthoretrovirinae and suggest a common ancestor for both lineages containing an ancient CA fold. PMID:27829070

  3. Structure of the first representative of Pfam family PF09410 (DUF2006) reveals a structural signature of the calycin superfamily that suggests a role in lipid metabolism

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chiu, Hsiu-Ju; Bakolitsa, Constantina; Skerra, Arne

    The first structural representative of the domain of unknown function DUF2006 family, also known as Pfam family PF09410, comprises a lipocalin-like fold with domain duplication. The finding of the calycin signature in the N-terminal domain, combined with remote sequence similarity to two other protein families (PF07143 and PF08622) implicated in isoprenoid metabolism and the oxidative stress response, supports an involvement in lipid metabolism. Clusters of conserved residues that interact with ligand mimetics suggest that the binding and regulation sites map to the N-terminal domain and to the interdomain interface, respectively.

  4. Prediction of Protein Structure by Template-Based Modeling Combined with the UNRES Force Field.

    PubMed

    Krupa, Paweł; Mozolewska, Magdalena A; Joo, Keehyoung; Lee, Jooyoung; Czaplewski, Cezary; Liwo, Adam

    2015-06-22

    A new approach to the prediction of protein structures that uses distance and backbone virtual-bond dihedral angle restraints derived from template-based models and simulations with the united residue (UNRES) force field is proposed. The approach combines the accuracy and reliability of template-based methods for the segments of the target sequence with high similarity to those having known structures with the ability of UNRES to pack the domains correctly. Multiplexed replica-exchange molecular dynamics with restraints derived from template-based models of a given target, in which each restraint is weighted according to the accuracy of the prediction of the corresponding section of the molecule, is used to search the conformational space, and the weighted histogram analysis method and cluster analysis are applied to determine the families of the most probable conformations, from which candidate predictions are selected. To test the capability of the method to recover template-based models from restraints, five single-domain proteins with structures that have been well-predicted by template-based methods were used; it was found that the resulting structures were of the same quality as the best of the original models. To assess whether the new approach can improve template-based predictions with incorrectly predicted domain packing, four such targets were selected from the CASP10 targets; for three of them the new approach resulted in significantly better predictions compared with the original template-based models. The new approach can be used to predict the structures of proteins for which good templates can be found for sections of the sequence or an overall good template can be found for the entire sequence but the prediction quality is remarkably weaker in putative domain-linker regions.

  5. The gene for stinging nettle lectin (Urtica dioica agglutinin) encodes both a lectin and a chitinase.

    PubMed

    Lerner, D R; Raikhel, N V

    1992-06-05

    Chitin-binding proteins are present in a wide range of plant species, including both monocots and dicots, even though these plants contain no chitin. To investigate the relationship between in vitro antifungal and insecticidal activities of chitin-binding proteins and their unknown endogenous functions, the stinging nettle lectin (Urtica dioica agglutinin, UDA) cDNA was cloned using a synthetic gene as the probe. The nettle lectin cDNA clone contained an open reading frame encoding 374 amino acids. Analysis of the deduced amino acid sequence revealed a 21-amino acid putative signal sequence and the 86 amino acids encoding the two chitin-binding domains of nettle lectin. These domains were fused to a 19-amino acid "spacer" domain and a 244-amino acid carboxyl extension with partial identity to a chitinase catalytic domain. The authenticity of the cDNA clone was confirmed by deduced amino acid sequence identity with sequence data obtained from tryptic digests, RNA gel blot, and polymerase chain reaction analyses. RNA gel blot analysis also showed the nettle lectin message was present primarily in rhizomes and inflorescence (with immature seeds) but not in leaves or stems. Chitinase enzymatic activity was found when the chitinase-like domain alone or the chitinase-like domain with the chitin-binding domains were expressed in Escherichia coli. This is the first example of a chitin-binding protein with both a duplication of the 43-amino acid chitin-binding domain and a fusion of the chitin-binding domains to a structurally unrelated domain, the chitinase domain.

  6. FOLD-EM: automated fold recognition in medium- and low-resolution (4-15 Å) electron density maps.

    PubMed

    Saha, Mitul; Morais, Marc C

    2012-12-15

    Owing to the size and complexity of large multi-component biological assemblies, the most tractable approach to determining their atomic structure is often to fit high-resolution radiographic or nuclear magnetic resonance structures of isolated components into lower resolution electron density maps of the larger assembly obtained using cryo-electron microscopy (cryo-EM). This hybrid approach to structure determination requires that an atomic resolution structure of each component, or a suitable homolog, is available. If neither is available, then the amount of structural information regarding that component is limited by the resolution of the cryo-EM map. However, even if a suitable homolog cannot be identified using sequence analysis, a search for structural homologs should still be performed because structural homology often persists throughout evolution even when sequence homology is undetectable. As macromolecules can often be described as a collection of independently folded domains, one way of searching for structural homologs would be to systematically fit representative domain structures from a protein domain database into the medium/low resolution cryo-EM map and return the best fits. Taken together, the best fitting non-overlapping structures would constitute a 'mosaic' backbone model of the assembly that could aid map interpretation and illuminate biological function. Using the computational principles of the Scale-Invariant Feature Transform (SIFT), we have developed FOLD-EM, a computational tool that can identify folded macromolecular domains in medium to low resolution (4-15 Å) electron density maps and return a model of the constituent polypeptides in a fully automated fashion. As a by-product, FOLD-EM can also do flexible multi-domain fitting that may provide insight into conformational changes that occur in macromolecular assemblies.

  7. MOCASSIN-prot: a multi-objective clustering approach for protein similarity networks.

    PubMed

    Keel, Brittney N; Deng, Bo; Moriyama, Etsuko N

    2018-04-15

    Proteins often include multiple conserved domains. Various evolutionary events including duplication and loss of domains, domain shuffling, as well as sequence divergence contribute to generating complexities in protein structures, and consequently, in their functions. The evolutionary history of proteins is hence best modeled through networks that incorporate information both from the sequence divergence and the domain content. Here, a game-theoretic approach proposed for protein network construction is adapted into the framework of multi-objective optimization, and extended to incorporate a clustering refinement procedure. The new method, MOCASSIN-prot, was applied to cluster multi-domain proteins from ten genomes. The performance of MOCASSIN-prot was compared against two protein clustering methods, Markov clustering (TRIBE-MCL) and spectral clustering (SCPS). We showed that compared to these two methods, MOCASSIN-prot, which uses both domain composition and quantitative sequence similarity information, generates fewer false positives. It achieves more functionally coherent protein clusters and better differentiates protein families. MOCASSIN-prot, implemented in Perl and Matlab, is freely available at http://bioinfolab.unl.edu/emlab/MOCASSINprot. Contact: emoriyama2@unl.edu. Supplementary data are available at Bioinformatics online.

  8. Crystal structure of tandem type III fibronectin domains from Drosophila neuroglian at 2.0 A.

    PubMed

    Huber, A H; Wang, Y M; Bieber, A J; Bjorkman, P J

    1994-04-01

    We report the crystal structure of two adjacent fibronectin type III repeats from the Drosophila neural cell adhesion molecule neuroglian. Each domain consists of two antiparallel beta sheets and is folded topologically identically to single fibronectin type III domains from the extracellular matrix proteins tenascin and fibronectin. Beta bulges and left-handed polyproline II helices disrupt the regular beta sheet structure of both neuroglian domains. The hydrophobic interdomain interface includes a metal-binding site, presumably involved in stabilizing the relative orientation between domains and predicted by sequence comparison to be present in the vertebrate homolog molecule L1. The neuroglian domains are related by a near perfect 2-fold screw axis along the longest molecular dimension. Using this relationship, a model for arrays of tandem fibronectin type III repeats in neuroglian and other molecules is proposed.

  9. The structure of the nucleoprotein binding domain of lyssavirus phosphoprotein reveals a structural relationship between the N-RNA binding domains of Rhabdoviridae and Paramyxoviridae.

    PubMed

    Delmas, Olivier; Assenberg, Rene; Grimes, Jonathan M; Bourhy, Hervé

    2010-01-01

    The phosphoprotein P of non-segmented negative-sense RNA viruses is an essential component of the replication and transcription complex and acts as a co-factor for the viral RNA-dependent RNA polymerase. P recruits the viral polymerase to the nucleoprotein-bound viral RNA (N-RNA) via an interaction between its C-terminal domain and the N-RNA complex. We have obtained the structure of the C-terminal domain of P of Mokola virus (MOKV), a lyssavirus that belongs to the Rhabdoviridae family and mapped at the amino acid level the crucial positions involved in interaction with N and in the formation of the viral replication complex. Comparison of the N-RNA binding domains of P solved to date suggests that the N-RNA binding domains are structurally conserved among paramyxoviruses and rhabdoviruses in spite of low sequence conservation. We also review the numerous other functions of this domain and more generally of the phosphoprotein.

  10. Secondary structure prediction for complete rDNA sequences (18S, 5.8S, and 28S rDNA) of Demodex folliculorum, and comparison of divergent domains structures across Acari.

    PubMed

    Zhao, Ya-E; Wang, Zheng-Hang; Xu, Yang; Wu, Li-Ping; Hu, Li

    2013-10-01

    According to base pairing, the rRNA folds into corresponding secondary structures, which contain additional phylogenetic information. On the basis of sequencing for complete rDNA sequences (18S, ITS1, 5.8S, ITS2 and 28S rDNA) of Demodex, we predicted the secondary structure of the complete rDNA sequence (18S, 5.8S, and 28S rDNA) of Demodex folliculorum, which was in concordance with that of the main arthropod lineages in past studies. Together with the sequence data from GenBank, we also predicted the secondary structures of divergent domains in SSU rRNA of 51 species and in LSU rRNA of 43 species from four superfamilies in Acari (Cheyletoidea, Tetranychoidea, Analgoidea and Ixodoidea). The multiple alignment among the four superfamilies in Acari showed that insertions from Tetranychoidea SSU rRNA formed two newly proposed helices, and helix c3-2b of LSU rRNA was absent in Demodex (Cheyletoidea) taxa. Generally speaking, LSU rRNA presented more remarkable differences than SSU rRNA did, mainly in D2, D3, D5, D7a, D7b, D8 and D10. Copyright © 2013 Elsevier Inc. All rights reserved.

  11. Comparison between TRF2 and TRF1 of their telomeric DNA-bound structures and DNA-binding activities

    PubMed Central

    Hanaoka, Shingo; Nagadoi, Aritaka; Nishimura, Yoshifumi

    2005-01-01

    Mammalian telomeres consist of long tandem arrays of double-stranded telomeric TTAGGG repeats packaged by the telomeric DNA-binding proteins TRF1 and TRF2. Both contain a similar C-terminal Myb domain that mediates sequence-specific binding to telomeric DNA. In a DNA complex of TRF1, only the single Myb-like domain consisting of three helices can bind specifically to double-stranded telomeric DNA. TRF2 also binds to double-stranded telomeric DNA. Although the DNA binding mode of TRF2 is likely identical to that of TRF1, TRF2 plays an important role in the t-loop formation that protects the ends of telomeres. Here, to clarify the details of the double-stranded telomeric DNA-binding modes of TRF1 and TRF2, we determined the solution structure of the DNA-binding domain of human TRF2 bound to telomeric DNA; it consists of three helices, and like TRF1, the third helix recognizes TAGGG sequence in the major groove of DNA with the N-terminal arm locating in the minor groove. However, small but significant differences are observed; in contrast to the minor groove recognition of TRF1, in which an arginine residue recognizes the TT sequence, a lysine residue of TRF2 interacts with the TT part. We examined the telomeric DNA-binding activities of both DNA-binding domains of TRF1 and TRF2 and found that TRF1 binds more strongly than TRF2. Based on the structural differences of both domains, we created several mutants of the DNA-binding domain of TRF2 with stronger binding activities compared to the wild-type TRF2. PMID:15608118
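    The abstract above refers to tandem arrays of the telomeric TTAGGG repeat. As a small illustration of what such an array looks like at the sequence level, the following Python sketch locates tandem runs of a repeat unit in a DNA string. The function name `find_telomeric_arrays`, the `min_repeats` parameter and the example sequence are assumptions made for illustration and do not come from the paper.

```python
import re


def find_telomeric_arrays(seq, unit="TTAGGG", min_repeats=2):
    """Return (start, end, n_repeats) for each tandem run of `unit`.

    Coordinates are 0-based and half-open, as with Python slices.
    """
    seq = seq.upper()
    # Match `unit` repeated at least `min_repeats` times back to back.
    pattern = re.compile(r"(?:%s){%d,}" % (re.escape(unit), min_repeats))
    return [
        (m.start(), m.end(), (m.end() - m.start()) // len(unit))
        for m in pattern.finditer(seq)
    ]


# Made-up sequence: three tandem repeats, a spacer, then a single repeat.
example = "ACGT" + "TTAGGG" * 3 + "CCC" + "TTAGGG"
arrays = find_telomeric_arrays(example)
print(arrays)  # [(4, 22, 3)]
```

    The isolated single repeat at the end is not reported because it falls below `min_repeats`.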

  12. Hidden Structural Codes in Protein Intrinsic Disorder.

    PubMed

    Borkosky, Silvia S; Camporeale, Gabriela; Chemes, Lucía B; Risso, Marikena; Noval, María Gabriela; Sánchez, Ignacio E; Alonso, Leonardo G; de Prat Gay, Gonzalo

    2017-10-17

    Intrinsic disorder is a major structural category in biology, accounting for more than 30% of coding regions across the domains of life, yet consists of conformational ensembles in equilibrium, a major challenge in protein chemistry. Anciently evolved papillomavirus genomes constitute an unparalleled case for sequence to structure-function correlation in cases in which there are no folded structures. E7, the major transforming oncoprotein of human papillomaviruses, is a paradigmatic example among the intrinsically disordered proteins. Analysis of a large number of sequences of the same viral protein allowed for the identification of a handful of residues with absolute conservation, scattered along the sequence of its N-terminal intrinsically disordered domain, which intriguingly are mostly leucine residues. Mutation of these led to a pronounced increase in both α-helix and β-sheet structural content, reflected by drastic effects on equilibrium propensities and oligomerization kinetics, and uncovers the existence of local structural elements that oppose canonical folding. These folding relays suggest the existence of yet undefined hidden structural codes behind intrinsic disorder in this model protein. Thus, evolution pinpoints conformational hot spots that could not have been identified by direct experimental methods for analyzing or perturbing the equilibrium of an intrinsically disordered protein ensemble.

  13. Sequence-structure mapping errors in the PDB: OB-fold domains

    PubMed Central

    Venclovas, Česlovas; Ginalski, Krzysztof; Kang, Chulhee

    2004-01-01

    The Protein Data Bank (PDB) is the single most important repository of structural data for proteins and other biologically relevant molecules. Therefore, it is critically important to keep the PDB data, as much as possible, error-free. In this study, we have analyzed PDB crystal structures possessing oligonucleotide/oligosaccharide binding (OB)-fold, one of the highly populated folds, for the presence of sequence-structure mapping errors. Using energy-based structure quality assessment coupled with sequence analyses, we have found that there are at least five OB-structures in the PDB that have regions where sequences have been incorrectly mapped onto the structure. We have demonstrated that the combination of these computation techniques is effective not only in detecting sequence-structure mapping errors, but also in providing guidance to correct them. Namely, we have used results of computational analysis to direct a revision of X-ray data for one of the PDB entries containing a fairly inconspicuous sequence-structure mapping error. The revised structure has been deposited with the PDB. We suggest use of computational energy assessment and sequence analysis techniques to facilitate structure determination when homologs having known structure are available to use as a reference. Such computational analysis may be useful in either guiding the sequence-structure assignment process or verifying the sequence mapping within poorly defined regions. PMID:15133161

  14. Concomitant prediction of function and fold at the domain level with GO-based profiles.

    PubMed

    Lopez, Daniel; Pazos, Florencio

    2013-01-01

    Predicting the function of newly sequenced proteins is crucial due to the pace at which these raw sequences are being obtained. Almost all resources for predicting protein function assign functional terms to whole chains, and do not distinguish which particular domain is responsible for the allocated function. This is not a limitation of the methodologies themselves but it is due to the fact that in the databases of functional annotations these methods use for transferring functional terms to new proteins, these annotations are done on a whole-chain basis. Nevertheless, domains are the basic evolutionary and often functional units of proteins. In many cases, the domains of a protein chain have distinct molecular functions, independent from each other. For that reason resources with functional annotations at the domain level, as well as methodologies for predicting function for individual domains adapted to these resources are required. We present a methodology for predicting the molecular function of individual domains, based on a previously developed database of functional annotations at the domain level. The approach, which we show outperforms a standard method based on sequence searches in assigning function, concomitantly predicts the structural fold of the domains and can give hints on the functionally important residues associated with the predicted function.

  15. Evaluating the efficacy of a structure-derived amino acid substitution matrix in detecting protein homologs by BLAST and PSI-BLAST.

    PubMed

    Goonesekere, Nalin Cw

    2009-01-01

    The large numbers of protein sequences generated by whole genome sequencing projects require rapid and accurate methods of annotation. The detection of homology through computational sequence analysis is a powerful tool in determining the complex evolutionary and functional relationships that exist between proteins. Homology search algorithms employ amino acid substitution matrices to detect similarity between protein sequences. The substitution matrices in common use today are constructed using sequences aligned without reference to protein structure. Here we present amino acid substitution matrices constructed from the alignment of a large number of protein domain structures from the Structural Classification of Proteins (SCOP) database. We show that when incorporated into the homology search algorithms BLAST and PSI-BLAST, the structure-based substitution matrices enhance the efficacy of detecting remote homologs.
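
    As a rough illustration of how a homology search uses a substitution matrix, the sketch below scores an ungapped alignment by summing per-position substitution scores. The three-letter alphabet, the score values in TOY_MATRIX, and the function names are all hypothetical; they are not taken from the SCOP-derived matrices described above or from any published matrix.

```python
# Toy substitution scores for a three-letter alphabet. The values are
# invented for illustration and do not come from any published matrix.
TOY_MATRIX = {
    ("A", "A"): 4, ("A", "L"): -1, ("A", "V"): 0,
    ("L", "L"): 4, ("L", "V"): 1,
    ("V", "V"): 4,
}


def pair_score(a, b, matrix=TOY_MATRIX):
    """Look up the score for residues a and b, treating the matrix as symmetric."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]


def ungapped_score(seq1, seq2, matrix=TOY_MATRIX):
    """Sum substitution scores over the columns of an ungapped alignment."""
    if len(seq1) != len(seq2):
        raise ValueError("ungapped alignment requires equal-length sequences")
    return sum(pair_score(a, b, matrix) for a, b in zip(seq1, seq2))


if __name__ == "__main__":
    print(ungapped_score("ALV", "ALV"))  # identical sequences
    print(ungapped_score("ALV", "VLA"))  # two substituted positions
```

    Swapping in a different matrix changes which substitutions are rewarded, which is the lever a structure-derived matrix would act on in a search tool.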

  16. Did Convergent Protein Evolution Enable Phytoplasmas to Generate 'Zombie Plants'?

    PubMed

    Rümpler, Florian; Gramzow, Lydia; Theißen, Günter; Melzer, Rainer

    2015-12-01

    Phytoplasmas are pathogenic bacteria that reprogram plant development such that leaf-like structures instead of floral organs develop. Infected plants are sterile and mainly serve to propagate phytoplasmas and thus have been termed 'zombie plants'. The developmental reprogramming relies on specific interactions of the phytoplasma protein SAP54 with a small subset of MADS-domain transcription factors. Here, we propose that SAP54 folds into a structure that is similar to that of the K-domain, a protein-protein interaction domain of MADS-domain proteins. We suggest that undergoing convergent structural and sequence evolution, SAP54 evolved to mimic the K-domain. Given the high specificity of resulting developmental alterations, phytoplasmas might be used to study flower development in genetically intractable plants. Copyright © 2015 Elsevier Ltd. All rights reserved.

  17. A structural portrait of the PDZ domain family.

    PubMed

    Ernst, Andreas; Appleton, Brent A; Ivarsson, Ylva; Zhang, Yingnan; Gfeller, David; Wiesmann, Christian; Sidhu, Sachdev S

    2014-10-23

    PDZ (PSD-95/Discs-large/ZO1) domains are interaction modules that typically bind to specific C-terminal sequences of partner proteins and assemble signaling complexes in multicellular organisms. We have analyzed the existing database of PDZ domain structures in the context of a specificity tree based on binding specificities defined by peptide-phage binding selections. We have identified 16 structures of PDZ domains in complex with high-affinity ligands and have elucidated four additional structures to assemble a structural database that covers most of the branches of the PDZ specificity tree. A detailed comparison of the structures reveals features that are responsible for the diverse specificities across the PDZ domain family. Specificity differences can be explained by differences in PDZ residues that are in contact with the peptide ligands, but these contacts involve both side-chain and main-chain interactions. Most PDZ domains bind peptides in a canonical conformation in which the ligand main chain adopts an extended β-strand conformation by interacting in an antiparallel fashion with a PDZ β-strand. However, a subset of PDZ domains bind peptides with a bent main-chain conformation and the specificities of these non-canonical domains could not be explained based on canonical structures. Our analysis provides a structural portrait of the PDZ domain family, which serves as a guide in understanding the structural basis for the diverse specificities across the family. Copyright © 2014 Elsevier Ltd. All rights reserved.

  18. A chitin deacetylase of Podospora anserina has two functional chitin binding domains and a unique mode of action.

    PubMed

    Hoßbach, Janina; Bußwinkel, Franziska; Kranz, Andreas; Wattjes, Jasper; Cord-Landwehr, Stefan; Moerschbacher, Bruno M

    2018-03-01

    Chitosan is a structurally diverse biopolymer that is commercially derived from chitin by chemical processing, but chitin deacetylases (CDAs) potentially offer a sustainable and more controllable approach allowing the production of chitosans with tailored structures and biological activities. We investigated the CDA from Podospora anserina (PaCDA) which is closely related to Colletotrichum lindemuthianum CDA in the catalytic domain, but unique in having two chitin-binding domains. We produced recombinant PaCDA in Hansenula polymorpha for biochemical characterization and found that the catalytic domain of PaCDA is also functionally similar to C. lindemuthianum CDA, though differing in detail. When studying the enzyme's mode of action on chitin oligomers by quantitative mass-spectrometric sequencing, we found almost all possible sequences up to full deacetylation but with a clear preference for specific products. Deletion muteins lacking one or both CBDs confirmed their proposed function in supporting the enzymatic conversion of the insoluble substrate colloidal chitin. Copyright © 2017. Published by Elsevier Ltd.

  19. Genome3D: a UK collaborative project to annotate genomic sequences with predicted 3D structures based on SCOP and CATH domains.

    PubMed

    Lewis, Tony E; Sillitoe, Ian; Andreeva, Antonina; Blundell, Tom L; Buchan, Daniel W A; Chothia, Cyrus; Cuff, Alison; Dana, Jose M; Filippis, Ioannis; Gough, Julian; Hunter, Sarah; Jones, David T; Kelley, Lawrence A; Kleywegt, Gerard J; Minneci, Federico; Mitchell, Alex; Murzin, Alexey G; Ochoa-Montaño, Bernardo; Rackham, Owen J L; Smith, James; Sternberg, Michael J E; Velankar, Sameer; Yeats, Corin; Orengo, Christine

    2013-01-01

    Genome3D, available at http://www.genome3d.eu, is a new collaborative project that integrates UK-based structural resources to provide a unique perspective on sequence-structure-function relationships. Leading structure prediction resources (DomSerf, FUGUE, Gene3D, pDomTHREADER, Phyre and SUPERFAMILY) provide annotations for UniProt sequences to indicate the locations of structural domains (structural annotations) and their 3D structures (structural models). Structural annotations and 3D model predictions are currently available for three model genomes (Homo sapiens, E. coli and baker's yeast), and the project will extend to other genomes in the near future. As these resources exploit different strategies for predicting structures, the main aim of Genome3D is to enable comparisons between all the resources so that biologists can see where predictions agree and are therefore more trusted. Furthermore, as these methods differ in whether they build their predictions using CATH or SCOP, Genome3D also contains the first official mapping between these two databases. This has identified pairs of similar superfamilies from the two resources at various degrees of consensus (532 bronze pairs, 527 silver pairs and 370 gold pairs).

  20. Directed evolution of the TALE N-terminal domain for recognition of all 5′ bases

    PubMed Central

    Lamb, Brian M.; Mercer, Andrew C.; Barbas, Carlos F.

    2013-01-01

    Transcription activator-like effector (TALE) proteins can be designed to bind virtually any DNA sequence. General guidelines for design of TALE DNA-binding domains suggest that the 5′-most base of the DNA sequence bound by the TALE (the N0 base) should be a thymine. We quantified the N0 requirement by analysis of the activities of TALE transcription factors (TALE-TF), TALE recombinases (TALE-R) and TALE nucleases (TALENs) with each DNA base at this position. In the absence of a 5′ T, we observed decreases in TALE activity up to >1000-fold in TALE-TF activity, up to 100-fold in TALE-R activity and up to 10-fold reduction in TALEN activity compared with target sequences containing a 5′ T. To develop TALE architectures that recognize all possible N0 bases, we used structure-guided library design coupled with TALE-R activity selections to evolve novel TALE N-terminal domains to accommodate any N0 base. A G-selective domain and broadly reactive domains were isolated and characterized. The engineered TALE domains selected in the TALE-R format demonstrated modularity and were active in TALE-TF and TALEN architectures. Evolved N-terminal domains provide effective and unconstrained TALE-based targeting of any DNA sequence as TALE binding proteins and designer enzymes. PMID:23980031

  1. Solution structure of a DNA mimicking motif of an RNA aptamer against transcription factor AML1 Runt domain.

    PubMed

    Nomura, Yusuke; Tanaka, Yoichiro; Fukunaga, Jun-ichi; Fujiwara, Kazuya; Chiba, Manabu; Iibuchi, Hiroaki; Tanaka, Taku; Nakamura, Yoshikazu; Kawai, Gota; Kozu, Tomoko; Sakamoto, Taiichi

    2013-12-01

    AML1/RUNX1 is an essential transcription factor involved in the differentiation of hematopoietic cells. AML1 binds to the Runt-binding double-stranded DNA element (RDE) of target genes through its N-terminal Runt domain. In a previous study, we obtained RNA aptamers against the AML1 Runt domain by systematic evolution of ligands by exponential enrichment and revealed that RNA aptamers exhibit higher affinity for the Runt domain than that for RDE and possess the 5'-GCGMGNN-3' and 5'-N'N'CCAC-3' conserved motif (M: A or C; N and N' form Watson-Crick base pairs) that is important for Runt domain binding. In this study, to understand the structural basis of recognition of the Runt domain by the aptamer motif, the solution structure of a 22-mer RNA was determined using nuclear magnetic resonance. The motif contains the AH(+)-C mismatch and base triple and adopts an unusual backbone structure. Structural analysis of the aptamer motif indicated that the aptamer binds to the Runt domain by mimicking the RDE sequence and structure. Our data should enhance the understanding of the structural basis of DNA mimicry by RNA molecules.

  2. Predicting the tolerated sequences for proteins and protein interfaces using RosettaBackrub flexible backbone design.

    PubMed

    Smith, Colin A; Kortemme, Tanja

    2011-01-01

    Predicting the set of sequences that are tolerated by a protein or protein interface, while maintaining a desired function, is useful for characterizing protein interaction specificity and for computationally designing sequence libraries to engineer proteins with new functions. Here we provide a general method, a detailed set of protocols, and several benchmarks and analyses for estimating tolerated sequences using flexible backbone protein design implemented in the Rosetta molecular modeling software suite. The input to the method is at least one experimentally determined three-dimensional protein structure or high-quality model. The starting structure(s) are expanded or refined into a conformational ensemble using Monte Carlo simulations consisting of backrub backbone and side chain moves in Rosetta. The method then uses a combination of simulated annealing and genetic algorithm optimization methods to enrich for low-energy sequences for the individual members of the ensemble. To emphasize certain functional requirements (e.g. forming a binding interface), interactions between and within parts of the structure (e.g. domains) can be reweighted in the scoring function. Results from each backbone structure are merged together to create a single estimate for the tolerated sequence space. We provide an extensive description of the protocol and its parameters, all source code, example analysis scripts and three tests applying this method to finding sequences predicted to stabilize proteins or protein interfaces. The generality of this method makes many other applications possible, for example stabilizing interactions with small molecules, DNA, or RNA. Through the use of within-domain reweighting and/or multistate design, it may also be possible to use this method to find sequences that stabilize particular protein conformations or binding interactions over others.

  3. Characterization of a novel domain ‘GATE’ in the ABC protein DrrA and its role in drug efflux by the DrrAB complex

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Han; Rahman, Sadia; Li, Wen

    2015-03-27

    A novel domain, GATE (Glycine-loop And Transducer Element), is identified in the ABC protein DrrA. This domain shows sequence and structural conservation among close homologs of DrrA as well as distantly-related ABC proteins. Among the highly conserved residues in this domain are three glycines, G215, G221 and G231, of which G215 was found to be critical for stable expression of the DrrAB complex. Other conserved residues, including E201, G221, K227 and G231, were found to be critical for the catalytic and transport functions of the DrrAB transporter. Structural analysis of both the previously published crystal structure of the DrrA homolog MalK and the modeled structure of DrrA showed that G215 makes close contacts with residues in and around the Walker A motif, suggesting that these interactions may be critical for maintaining the integrity of the ATP binding pocket as well as the complex. It is also shown that G215A or K227R mutation diminishes some of the atomic interactions essential for ATP catalysis and overall transport function. Therefore, based on both the biochemical and structural analyses, it is proposed that the GATE domain, located outside of the previously identified ATP binding and hydrolysis motifs, is an additional element involved in ATP catalysis. - Highlights: • A novel domain ‘GATE’ is identified in the ABC protein DrrA. • GATE shows high sequence and structural conservation among diverse ABC proteins. • GATE is located outside of the previously studied ATP binding and hydrolysis motifs. • Conserved GATE residues are critical for stability of DrrAB and for ATP catalysis.

  4. Russell body inducing threshold depends on the variable domain sequences of individual human IgG clones and the cellular protein homeostasis.

    PubMed

    Stoops, Janelle; Byrd, Samantha; Hasegawa, Haruki

    2012-10-01

    Russell bodies are intracellular aggregates of immunoglobulins. Although the mechanism of Russell body biogenesis has been extensively studied by using truncated mutant heavy chains, the importance of the variable domain sequences in this process and in immunoglobulin biosynthesis remains largely unknown. Using a panel of structurally and functionally normal human immunoglobulin Gs, we show that individual immunoglobulin G clones possess distinctive Russell body inducing propensities that can surface differently under normal and abnormal cellular conditions. Russell body inducing predisposition unique to each immunoglobulin G clone was corroborated by the intrinsic physicochemical properties encoded in the heavy chain variable domain/light chain variable domain sequence combinations that define each immunoglobulin G clone. While the sequence based intrinsic factors predispose certain immunoglobulin G clones to be more prone to induce Russell bodies, extrinsic factors such as stressful cell culture conditions also play roles in unmasking Russell body propensity from immunoglobulin G clones that are normally refractory to developing Russell bodies. By taking advantage of heterologous expression systems, we dissected the roles of individual subunit chains in Russell body formation and examined the effect of non-cognate subunit chain pair co-expression on Russell body forming propensity. The results suggest that the properties embedded in the variable domain of individual light chain clones and their compatibility with the partnering heavy chain variable domain sequences underscore the efficiency of immunoglobulin G biosynthesis, the threshold for Russell body induction, and the level of immunoglobulin G secretion. We propose that an interplay between the unique properties encoded in variable domain sequences and the state of protein homeostasis determines whether an immunoglobulin G expressing cell will develop the Russell body phenotype in a dynamic cellular setting. Copyright © 2012 Elsevier B.V. All rights reserved.

  5. Molecular gene organisation and secondary structure of the mitochondrial large subunit ribosomal RNA from the cultivated Basidiomycota Agrocybe aegerita: a 13 kb gene possessing six unusual nucleotide extensions and eight introns.

    PubMed

    Gonzalez, P; Barroso, G; Labarère, J

    1999-04-01

    The complete gene sequence and secondary structure of the mitochondrial LSU rRNA from the cultivated Basidiomycota Agrocybe aegerita was derived by chromosome walking. The A. aegerita LSU rRNA gene (13 526 nt) is, to date, the longest described, owing to the highest number of introns (eight) and the occurrence of six long nucleotide extensions. Seven introns belong to group I, while the intronic sequence i5 constitutes the first typical group II intron reported in a fungal mitochondrial LSU rDNA. As with most fungal LSU rDNA introns reported to date, four introns (i5-i8) are distributed in domain V associated with the peptidyl-transferase activity. One intron (i1) is located in domain I, and three (i2-i4) in domain II. The introns i2-i8 possess homologies with other fungal, algal or protozoan introns located at the same position in LSU rDNAs. One of them (i6) is located at the same insertion site as most Ascomycota or algae LSU introns, suggesting a possible inheritance from a common ancestor. In contrast, intron i1 is located at a so-far unreported insertion site. Among the six unusual nucleotide extensions, five are located in domain I and one in domain V. This is the first report of a mitochondrial LSU rRNA gene sequence and secondary structure for the whole Basidiomycota division.

  6. Use of Limited Proteolysis and Mutagenesis To Identify Folding Domains and Sequence Motifs Critical for Wax Ester Synthase/Acyl Coenzyme A:Diacylglycerol Acyltransferase Activity

    PubMed Central

    Villa, Juan A.; Cabezas, Matilde; de la Cruz, Fernando

    2014-01-01

    Triacylglycerols and wax esters are synthesized as energy storage molecules by some proteobacteria and actinobacteria under stress. The enzyme responsible for neutral lipid accumulation is the bifunctional wax ester synthase/acyl-coenzyme A (CoA):diacylglycerol acyltransferase (WS/DGAT). Structural modeling of WS/DGAT suggests that it can adopt an acyl-CoA-dependent acyltransferase fold with the N-terminal and C-terminal domains connected by a helical linker, an architecture demonstrated experimentally by limited proteolysis. Moreover, we found that both domains form an active complex when coexpressed as independent polypeptides. The structural prediction and sequence alignment of different WS/DGAT proteins indicated catalytically important motifs in the enzyme. Their role was probed by measuring the activities of a series of alanine scanning mutants. Our study underscores the structural understanding of this protein family and paves the way for their modification to improve the production of neutral lipids. PMID:24296496

  7. Studies of the structure-activity relationships of peptides and proteins involved in growth and development based on their three-dimensional structures.

    PubMed

    Nagata, Koji

    2010-01-01

    Peptides and proteins with similar amino acid sequences can have different biological functions. Knowledge of their three-dimensional molecular structures is critically important in identifying their functional determinants. In this review, I describe the results of our and other groups' structure-based functional characterization of insect insulin-like peptides, a crustacean hyperglycemic hormone-family peptide, a mammalian epidermal growth factor-family protein, and an intracellular signaling domain that recognizes proline-rich sequence.

  8. A thermophilic mini-chaperonin contains a conserved polypeptide-binding surface: combined crystallographic and NMR studies of the GroEL Apical Domain with implications for substrate interactions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hua, Q. X. H.; Dementieva, I. S. D.; Walsh, M. A. W.

    2001-02-23

    A homologue of the Escherichia coli GroEL apical domain was obtained from the thermophilic eubacterium Thermus thermophilus. The domains share 70% sequence identity (101 out of 145 residues). The thermal stability of the T. thermophilus apical domain (Tm > 100 °C as evaluated by circular dichroism) is at least 35 °C greater than that of the E. coli apical domain (Tm = 65 °C). The crystal structure of a selenomethionine-substituted apical domain from T. thermophilus was determined to a resolution of 1.78 Å using multiwavelength-anomalous-diffraction phasing. The structure is similar to that of the E. coli apical domain (root-mean-square deviation 0.45 Å based on main-chain atoms). The thermophilic structure contains seven additional salt bridges of which four contain charge-stabilized hydrogen bonds. Only one of the additional salt bridges would face the 'Anfinsen cage' in GroEL. High temperatures were exploited to map sites of interactions between the apical domain and molten globules. NMR footprints of apical domain-protein complexes were obtained at elevated temperature using ¹⁵N-¹H correlation spectra of ¹⁵N-labeled apical domain. Footprints employing two polypeptides unrelated in sequence or structure (an insulin monomer and the SRY high-mobility-group box, each partially unfolded at 50 °C) are essentially the same and consistent with the peptide-binding surface previously defined in E. coli GroEL and its apical domain-peptide complexes. An additional part of this surface comprising a short N-terminal α-helix is observed. The extended footprint rationalizes mutagenesis studies of intact GroEL in which point mutations affecting substrate binding were found outside the 'classical' peptide-binding site. Our results demonstrate structural conservation of the apical domain among GroEL homologues and conservation of an extended non-polar surface recognizing diverse polypeptides.
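
    The sequence identity quoted in this abstract is a simple ratio of identical positions to aligned positions. A minimal sketch of that arithmetic follows; the function name percent_identity is hypothetical, and the only inputs are the two counts stated in the abstract (101 identical residues out of 145).

```python
def percent_identity(identical, aligned_length):
    """Return sequence identity as a percentage of aligned positions."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return 100.0 * identical / aligned_length


if __name__ == "__main__":
    # Counts quoted in the abstract: 101 identical out of 145 residues.
    value = percent_identity(101, 145)
    print(f"{value:.2f}% (rounds to {round(value)}%)")
```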

  9. Cognitive representation of "musical fractals": Processing hierarchy and recursion in the auditory domain.

    PubMed

    Martins, Mauricio Dias; Gingras, Bruno; Puig-Waldmueller, Estela; Fitch, W Tecumseh

    2017-04-01

    The human ability to process hierarchical structures has been a longstanding research topic. However, the nature of the cognitive machinery underlying this faculty remains controversial. Recursion, the ability to embed structures within structures of the same kind, has been proposed as a key component of our ability to parse and generate complex hierarchies. Here, we investigated the cognitive representation of both recursive and iterative processes in the auditory domain. The experiment used a two-alternative forced-choice paradigm: participants were exposed to three-step processes in which pure-tone sequences were built either through recursive or iterative processes, and had to choose the correct completion. Foils were constructed according to generative processes that did not match the previous steps. Both musicians and non-musicians were able to represent recursion in the auditory domain, although musicians performed better. We also observed that general 'musical' aptitudes played a role in both recursion and iteration, although the influence of musical training was somehow independent from melodic memory. Moreover, unlike iteration, recursion in audition was well correlated with its non-auditory (recursive) analogues in the visual and action sequencing domains. These results suggest that the cognitive machinery involved in establishing recursive representations is domain-general, even though this machinery requires access to information resulting from domain-specific processes. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  10. Structural insight into the specificity of the B3 DNA-binding domains provided by the co-crystal structure of the C-terminal fragment of BfiI restriction enzyme

    PubMed Central

    Golovenko, Dmitrij; Manakova, Elena; Zakrys, Linas; Zaremba, Mindaugas; Sasnauskas, Giedrius; Gražulis, Saulius; Siksnys, Virginijus

    2014-01-01

    The B3 DNA-binding domains (DBDs) of plant transcription factors (TF) and DBDs of EcoRII and BfiI restriction endonucleases (EcoRII-N and BfiI-C) share a common structural fold, classified as the DNA-binding pseudobarrel. The B3 DBDs in the plant TFs recognize a diverse set of target sequences. The only available co-crystal structure of the B3-like DBD is that of EcoRII-N (recognition sequence 5′-CCTGG-3′). In order to understand the structural and molecular mechanisms of specificity of B3 DBDs, we have solved the crystal structure of BfiI-C (recognition sequence 5′-ACTGGG-3′) complexed with 12-bp cognate oligoduplex. Structural comparison of BfiI-C–DNA and EcoRII-N–DNA complexes reveals a conserved DNA-binding mode and a conserved pattern of interactions with the phosphodiester backbone. The determinants of the target specificity are located in the loops that emanate from the conserved structural core. The BfiI-C–DNA structure presented here expands a range of templates for modeling of the DNA-bound complexes of the B3 family of plant TFs. PMID:24423868

  11. Reciprocal Influence of Protein Domains in the Cold-Adapted Acyl Aminoacyl Peptidase from Sporosarcina psychrophila

    PubMed Central

    Parravicini, Federica; Natalello, Antonino; Papaleo, Elena; De Gioia, Luca; Doglia, Silvia Maria; Lotti, Marina; Brocca, Stefania

    2013-01-01

    Acyl aminoacyl peptidases are two-domain proteins composed of a C-terminal catalytic α/β-hydrolase domain and an N-terminal β-propeller domain connected through a structural element that is at the N-terminus in sequence but participates in the 3D structure of the C-domain. We investigated the structural and functional interplay between the two domains and the bridge structure (in this case a single helix named α1-helix) in the cold-adapted enzyme from Sporosarcina psychrophila (SpAAP) using both protein variants in which entire domains were deleted and proteins carrying substitutions in the α1-helix. We found that in this enzyme the inter-domain connection dramatically affects the stability of both the whole enzyme and the β-propeller. The α1-helix is required for the stability of the intact protein, as in other enzymes of the same family; however, in this psychrophilic enzyme only, it destabilizes the isolated β-propeller. A single charged residue (E10) in the α1-helix plays a major role in the stability of the whole structure. Overall, a strict interaction of the SpAAP domains seems to be mandatory for the preservation of their reciprocal structural integrity and may attest to their co-evolution. PMID:23457536

  12. Reciprocal influence of protein domains in the cold-adapted acyl aminoacyl peptidase from Sporosarcina psychrophila.

    PubMed

    Parravicini, Federica; Natalello, Antonino; Papaleo, Elena; De Gioia, Luca; Doglia, Silvia Maria; Lotti, Marina; Brocca, Stefania

    2013-01-01

    Acyl aminoacyl peptidases are two-domain proteins composed of a C-terminal catalytic α/β-hydrolase domain and an N-terminal β-propeller domain connected through a structural element that is at the N-terminus in sequence but participates in the 3D structure of the C-domain. We investigated the structural and functional interplay between the two domains and the bridge structure (in this case a single helix named α1-helix) in the cold-adapted enzyme from Sporosarcina psychrophila (SpAAP) using both protein variants in which entire domains were deleted and proteins carrying substitutions in the α1-helix. We found that in this enzyme the inter-domain connection dramatically affects the stability of both the whole enzyme and the β-propeller. The α1-helix is required for the stability of the intact protein, as in other enzymes of the same family; however, in this psychrophilic enzyme only, it destabilizes the isolated β-propeller. A single charged residue (E10) in the α1-helix plays a major role in the stability of the whole structure. Overall, a strict interaction of the SpAAP domains seems to be mandatory for the preservation of their reciprocal structural integrity and may attest to their co-evolution.

  13. Predictive and comparative analysis of Ebolavirus proteins

    PubMed Central

    Cong, Qian; Pei, Jimin; Grishin, Nick V

    2015-01-01

    Ebolavirus is the pathogen for Ebola Hemorrhagic Fever (EHF). This disease exhibits a high fatality rate and has recently reached a historically epidemic proportion in West Africa. Out of the 5 known Ebolavirus species, only Reston ebolavirus has lost human pathogenicity, while retaining the ability to cause EHF in long-tailed macaque. Significant efforts have been spent to determine the three-dimensional (3D) structures of Ebolavirus proteins, to study their interaction with host proteins, and to identify the functional motifs in these viral proteins. Here, in light of these experimental results, we apply computational analysis to predict the 3D structures and functional sites for Ebolavirus protein domains with unknown structure, including a zinc-finger domain of VP30, the RNA-dependent RNA polymerase catalytic domain and a methyltransferase domain of protein L. In addition, we compare sequences of proteins that interact with Ebolavirus proteins from RESTV-resistant primates with those from RESTV-susceptible monkeys. The host proteins that interact with GP and VP35 show an elevated level of sequence divergence between the RESTV-resistant and RESTV-susceptible species, suggesting that they may be responsible for host specificity. Meanwhile, we detect variable positions in protein sequences that are likely associated with the loss of human pathogenicity in RESTV, map them onto the 3D structures and compare their positions to known functional sites. VP35 and VP30 are significantly enriched in these potential pathogenicity determinants and the clustering of such positions on the surfaces of VP35 and GP suggests possible uncharacterized interaction sites with host proteins that contribute to the virulence of Ebolavirus. PMID:26158395

  14. Predictive and comparative analysis of Ebolavirus proteins.

    PubMed

    Cong, Qian; Pei, Jimin; Grishin, Nick V

    2015-01-01

    Ebolavirus is the pathogen for Ebola Hemorrhagic Fever (EHF). This disease exhibits a high fatality rate and has recently reached a historically epidemic proportion in West Africa. Out of the 5 known Ebolavirus species, only Reston ebolavirus has lost human pathogenicity, while retaining the ability to cause EHF in long-tailed macaque. Significant efforts have been spent to determine the three-dimensional (3D) structures of Ebolavirus proteins, to study their interaction with host proteins, and to identify the functional motifs in these viral proteins. Here, in light of these experimental results, we apply computational analysis to predict the 3D structures and functional sites for Ebolavirus protein domains with unknown structure, including a zinc-finger domain of VP30, the RNA-dependent RNA polymerase catalytic domain and a methyltransferase domain of protein L. In addition, we compare sequences of proteins that interact with Ebolavirus proteins from RESTV-resistant primates with those from RESTV-susceptible monkeys. The host proteins that interact with GP and VP35 show an elevated level of sequence divergence between the RESTV-resistant and RESTV-susceptible species, suggesting that they may be responsible for host specificity. Meanwhile, we detect variable positions in protein sequences that are likely associated with the loss of human pathogenicity in RESTV, map them onto the 3D structures and compare their positions to known functional sites. VP35 and VP30 are significantly enriched in these potential pathogenicity determinants and the clustering of such positions on the surfaces of VP35 and GP suggests possible uncharacterized interaction sites with host proteins that contribute to the virulence of Ebolavirus.

  15. Structure of the two-domain hexameric APS kinase from Thiobacillus denitrificans: structural basis for the absence of ATP sulfurylase activity

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gay, Sean C.; Segel, Irwin H.; Fisher, Andrew J., E-mail: fisher@chem.ucdavis.edu

    2009-10-01

    APS kinase from Thiobacillus denitrificans contains an inactive N-terminal ATP sulfurylase domain. The structure presented unveils the first hexameric assembly for an APS kinase, and reveals that structural changes in the N-terminal domain disrupt the ATP sulfurylase active site thus prohibiting activity. The Tbd-0210 gene of the chemolithotrophic bacterium Thiobacillus denitrificans is annotated to encode a 60.5 kDa bifunctional enzyme with ATP sulfurylase and APS kinase activity. This putative bifunctional enzyme was cloned, expressed and structurally characterized. The 2.95 Å resolution X-ray crystal structure reported here revealed a hexameric assembly with D3 symmetry. Each subunit contains a large N-terminal sulfurylase-like domain and a C-terminal APS kinase domain reminiscent of the two-domain fungal ATP sulfurylases of Penicillium chrysogenum and Saccharomyces cerevisiae, which also exhibit a hexameric assembly. However, the T. denitrificans enzyme exhibits numerous structural and sequence differences in the N-terminal domain that render it inactive with respect to ATP sulfurylase activity. Surprisingly, the C-terminal domain does indeed display APS kinase activity, indicating that this gene product is a true APS kinase. Therefore, these results provide the first structural insights into a unique hexameric APS kinase that contains a nonfunctional ATP sulfurylase-like domain of unknown function.

  16. Protein structure recognition: From eigenvector analysis to structural threading method

    NASA Astrophysics Data System (ADS)

    Cao, Haibo

    In this work, we try to understand the protein folding problem using pair-wise hydrophobic interaction as the dominant interaction for the protein folding process. We found a strong correlation between amino acid sequence and the corresponding native structure of the protein. Applications of this correlation discussed in this dissertation include domain partition and a new structural threading method, as well as the performance of this method in the CASP5 competition. In the first part, we give a brief introduction to the protein folding problem. Essential background and progress from other research groups are discussed, including interactions among amino acid residues, the lattice HP model, and the designability principle. In the second part, we try to establish the correlation between amino acid sequence and the corresponding native structure of the protein. This correlation was observed in our eigenvector study of the protein contact matrix. We believe the correlation is universal, thus it can be used in automatic partition of protein structures into folding domains. In the third part, we discuss a threading method based on the correlation between amino acid sequence and the dominant eigenvector of the structure contact matrix. A mathematically straightforward iteration scheme provides a self-consistent optimum global sequence-structure alignment. The computational efficiency of this method makes it possible to search whole protein structure databases for structural homology without relying on sequence similarity. The sensitivity and specificity of this method are discussed, along with a case of blind test prediction. In the appendix, we list the overall performance of this threading method in the CASP5 blind test in comparison with other existing approaches.
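
    The contact-matrix eigenvector mentioned in this abstract can be illustrated with a minimal sketch. It is not the dissertation's code: the abstract does not define a contact, so the 8 Å distance cutoff, the function names, and the use of a shifted power iteration are all assumptions made for illustration.

```python
import math

def contact_matrix(coords, cutoff=8.0):
    """Binary contact matrix: 1.0 when two distinct residues lie within
    `cutoff` of each other. The 8.0 default is an assumed threshold."""
    n = len(coords)
    return [[1.0 if i != j and math.dist(coords[i], coords[j]) <= cutoff else 0.0
             for j in range(n)] for i in range(n)]

def dominant_eigenvector(matrix, iterations=200):
    """Approximate the eigenvector of the largest eigenvalue by power iteration.

    Each step multiplies by (matrix + identity) rather than matrix alone.
    The shift leaves the eigenvectors unchanged but avoids the oscillation
    that plain power iteration shows on chain-like (bipartite) contact maps.
    """
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [v[i] + sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Toy input (hypothetical coordinates): three residues on a line, 5 units apart,
# so only neighbouring residues are in contact.
toy_coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
toy_vector = dominant_eigenvector(contact_matrix(toy_coords, cutoff=6.0))
```

    In this toy chain the middle residue has the most contacts and receives the largest eigenvector component, which is the kind of per-residue profile a threading method could compare against a sequence-derived profile.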

  17. Prevalence of the F-type lectin domain.

    PubMed

    Bishnoi, Ritika; Khatri, Indu; Subramanian, Srikrishna; Ramya, T N C

    2015-08-01

    F-type lectins are fucolectins with characteristic fucose and calcium-binding sequence motifs and a unique lectin fold (the "F-type" fold). F-type lectins are phylogenetically widespread with selective distribution. Several eukaryotic F-type lectins have been biochemically and structurally characterized, and the F-type lectin domain (FLD) has also been studied in the bacterial proteins, Streptococcus mitis lectinolysin and Streptococcus pneumoniae SP2159. However, there is little knowledge about the extent of occurrence of FLDs and their domain organization, especially, in bacteria. We have now mined the extensive genomic sequence information available in the public databases with sensitive sequence search techniques in order to exhaustively survey prokaryotic and eukaryotic FLDs. We report 437 FLD sequence clusters (clustered at 80% sequence identity) from eukaryotic, eubacterial and viral proteins. Domain architectures are diverse but mostly conserved in closely related organisms, and domain organizations of bacterial FLD-containing proteins are very different from their eukaryotic counterparts, suggesting unique specialization of FLDs to suit different requirements. Several atypical phylogenetic associations hint at lateral transfer. Among eukaryotes, we observe an expansion of FLDs in terms of occurrence and domain organization diversity in the taxa Mollusca, Hemichordata and Branchiostomi, perhaps coinciding with greater emphasis on innate immune strategies in these organisms. The naturally occurring FLDs with diverse domain organizations that we have identified here will be useful for future studies aimed at creating designer molecular platforms for directing desired biological activities to fucosylated glycoconjugates in target niches. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  18. Improvement in Protein Domain Identification Is Reached by Breaking Consensus, with the Agreement of Many Profiles and Domain Co-occurrence

    PubMed Central

    Bernardes, Juliana; Zaverucha, Gerson; Vaquero, Catherine; Carbone, Alessandra

    2016-01-01

    Traditional protein annotation methods describe known domains with probabilistic models representing consensus among homologous domain sequences. However, when relevant signals become too weak to be identified by a global consensus, attempts for annotation fail. Here we address the fundamental question of domain identification for highly divergent proteins. By using high performance computing, we demonstrate that the limits of state-of-the-art annotation methods can be bypassed. We design a new strategy based on the observation that many structural and functional protein constraints are not globally conserved through all species but might be locally conserved in separate clades. We propose a novel exploitation of the large amount of data available: 1. for each known protein domain, several probabilistic clade-centered models are constructed from a large and differentiated panel of homologous sequences, 2. a decision-making protocol combines outcomes obtained from multiple models, 3. a multi-criteria optimization algorithm finds the most likely protein architecture. The method is evaluated for domain and architecture prediction over several datasets and statistical testing hypotheses. Its performance is compared against HMMScan and HHblits, two widely used search methods based on sequence-profile and profile-profile comparison. Due to their closeness to actual protein sequences, clade-centered models are shown to be more specific and functionally predictive than the broadly used consensus models. Based on them, we improved annotation of Plasmodium falciparum protein sequences on a scale not previously possible. We successfully predict at least one domain for 72% of P. falciparum proteins against 63% achieved previously, corresponding to 30% of improvement over the total number of Pfam domain predictions on the whole genome. 
The method is applicable to any genome and opens new avenues to tackle evolutionary questions such as the reconstruction of ancient domain duplications, the reconstruction of the history of protein architectures, and the estimation of protein domain age. Website and software: http://www.lcqb.upmc.fr/CLADE. PMID:27472895

  19. The identification of complete domains within protein sequences using accurate E-values for semi-global alignment

    PubMed Central

    Kann, Maricel G.; Sheetlin, Sergey L.; Park, Yonil; Bryant, Stephen H.; Spouge, John L.

    2007-01-01

    The sequencing of complete genomes has created a pressing need for automated annotation of gene function. Because domains are the basic units of protein function and evolution, a gene can be annotated from a domain database by aligning domains to the corresponding protein sequence. Ideally, complete domains are aligned to protein subsequences, in a ‘semi-global alignment’. Local alignment, which aligns pieces of domains to subsequences, is common in high-throughput annotation applications, however. It is a mature technique, with the heuristics and accurate E-values required for screening large databases and evaluating the screening results. Hidden Markov models (HMMs) provide an alternative theoretical framework for semi-global alignment, but their use is limited because they lack heuristic acceleration and accurate E-values. Our new tool, GLOBAL, overcomes some limitations of previous semi-global HMMs: it has accurate E-values and the possibility of the heuristic acceleration required for high-throughput applications. Moreover, according to a standard of truth based on protein structure, two semi-global HMM alignment tools (GLOBAL and HMMer) had comparable performance in identifying complete domains, but distinctly outperformed two tools based on local alignment. When searching for complete protein domains, therefore, GLOBAL avoids disadvantages commonly associated with HMMs, yet maintains their superior retrieval performance. PMID:17596268
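
    The difference between semi-global and local alignment described in this abstract can be sketched with a small dynamic program. This is an illustration only, not the GLOBAL tool: it uses plain sequences instead of HMMs, computes no E-values, and the function name and the +1/-1/-1 scoring scheme are assumptions.

```python
def semiglobal_score(domain, protein, match=1, mismatch=-1, gap=-1):
    """Best score for aligning ALL of `domain` to some stretch of `protein`.

    Unaligned protein residues before and after the domain cost nothing,
    but every domain residue must be matched, mismatched, or gapped.
    """
    # Row 0: skipping any prefix of the protein is free.
    prev = [0] * (len(protein) + 1)
    for i in range(1, len(domain) + 1):
        # Column 0: domain residues with no protein partner are gapped.
        curr = [prev[0] + gap]
        for j in range(1, len(protein) + 1):
            sub = match if domain[i - 1] == protein[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align the two residues
                            prev[j] + gap,       # gap in the protein
                            curr[j - 1] + gap))  # gap in the domain
        prev = curr
    # The domain may end anywhere in the protein, so any column is allowed.
    return max(prev)
```

    With these toy scores, a complete domain embedded in a longer protein keeps its full score, while a protein that contains only part of the domain is penalised for the missing residues. A local alignment would instead report the partial match at full score, which is the behaviour the abstract contrasts with complete-domain identification.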

  20. Functional annotation by sequence-weighted structure alignments: statistical analysis and case studies from the Protein 3000 structural genomics project in Japan.

    PubMed

    Standley, Daron M; Toh, Hiroyuki; Nakamura, Haruki

    2008-09-01

    A method to functionally annotate structural genomics targets, based on a novel structural alignment scoring function, is proposed. In the proposed score, position-specific scoring matrices are used to weight structurally aligned residue pairs to highlight evolutionarily conserved motifs. The functional form of the score is first optimized for discriminating domains belonging to the same Pfam family from domains belonging to different families but the same CATH or SCOP superfamily. In the optimization stage, we consider four standard weighting functions as well as our own, the "maximum substitution probability," and combinations of these functions. The optimized score achieves an area of 0.87 under the receiver-operating characteristic curve with respect to identifying Pfam families within a sequence-unique benchmark set of domain pairs. Confidence measures are then derived from the benchmark distribution of true-positive scores. The alignment method is next applied to the task of functionally annotating 230 query proteins released to the public as part of the Protein 3000 structural genomics project in Japan. Of these queries, 78 were found to align to templates with the same Pfam family as the query or had sequence identities ≥ 30%. Another 49 queries were found to match more distantly related templates. Within this group, the template predicted by our method to be the closest functional relative was often not the most structurally similar. Several nontrivial cases are discussed in detail. Finally, 103 queries matched templates at the fold level, but not the family or superfamily level, and remain functionally uncharacterized. 2008 Wiley-Liss, Inc.

  1. Structural Characterization of the Predominant Family of Histidine Kinase Sensor Domains

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Z.; Hendrickson, W

    2010-01-01

    Histidine kinase (HK) receptors are used ubiquitously by bacteria to monitor environmental changes, and they are also prevalent in plants, fungi, and other protists. Typical HK receptors have an extracellular sensor portion that detects a signal, usually a chemical ligand, and an intracellular transmitter portion that includes both the kinase domain itself and the site for histidine phosphorylation. While kinase domains are highly conserved, sensor domains are diverse. HK receptors function as dimers, but the molecular mechanism for signal transduction across cell membranes remains obscure. In this study, eight crystal structures were determined from five sensor domains representative of the most populated family, family HK1, found in a bioinformatic analysis of predicted sensor domains from transmembrane HKs. Each structure contains an inserted repeat of PhoQ/DcuS/CitA (PDC) domains, and similarity between sequence and structure is correlated across these and other double-PDC sensor proteins. Three of the five sensors crystallize as dimers that appear to be physiologically relevant, and comparisons between ligated structures and apo-state structures provide insights into signal transmission. Some HK1 family proteins prove to be sensors for chemotaxis proteins or diguanylate cyclase receptors, implying a combinatorial molecular evolution.

  2. Implication of the cause of differences in 3D structures of proteins with high sequence identity based on analyses of amino acid sequences and 3D structures.

    PubMed

    Matsuoka, Masanari; Sugita, Masatake; Kikuchi, Takeshi

    2014-09-18

    Proteins that share a high sequence homology while exhibiting drastically different 3D structures are investigated in this study. Recently, artificial proteins related to the sequences of the GA and IgG binding GB domains of human serum albumin have been designed. These artificial proteins, referred to as GA and GB, share 98% amino acid sequence identity but exhibit different 3D structures, namely, a 3α bundle versus a 4β + α structure. Discriminating between their 3D structures based on their amino acid sequences is a very difficult problem. In the present work, in addition to using bioinformatics techniques, an analysis based on inter-residue average distance statistics is used to address this problem. It was hard to distinguish which structure a given sequence would take only with the results of ordinary analyses like BLAST and conservation analyses. However, in addition to these analyses, with the analysis based on the inter-residue average distance statistics and our sequence tendency analysis, we could infer which part would play an important role in its structural formation. The results suggest possible determinants of the different 3D structures for sequences with high sequence identity. The possibility of discriminating between the 3D structures based on the given sequences is also discussed.

  3. Properties of the intracellular transient receptor potential (TRP) channel in yeast, Yvc1.

    PubMed

    Chang, Yiming; Schlenstedt, Gabriel; Flockerzi, Veit; Beck, Andreas

    2010-05-17

    Transient receptor potential (TRP) channels are found among mammals, flies, worms, ciliates, Chlamydomonas, and yeast but are absent in plants. These channels are believed to be tetramers of proteins containing six transmembrane domains (TMs). Their primary structures are diverse with sequence similarities only in some short amino acid sequence motifs mainly within sequences covering TM5, TM6, and adjacent domains. In the yeast genome, there is one gene encoding a TRP-like sequence. This protein forms an ion channel in the vacuolar membrane and is therefore called Yvc1 for yeast vacuolar conductance 1. In the following we summarize its prominent features. Copyright 2009 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.

  4. Primary structure and cellular localization of chicken brain myosin-V (p190), an unconventional myosin with calmodulin light chains

    PubMed Central

    1992-01-01

    Recent biochemical studies of p190, a calmodulin (CM)-binding protein purified from vertebrate brain, have demonstrated that this protein, purified as a complex with bound CM, shares a number of properties with myosins (Espindola, F. S., E. M. Espreafico, M. V. Coelho, A. R. Martins, F. R. C. Costa, M. S. Mooseker, and R. E. Larson. 1992. J. Cell Biol. 118:359-368). To determine whether or not p190 was a member of the myosin family of proteins, a set of overlapping cDNAs encoding the full-length protein sequence of chicken brain p190 was isolated and sequenced. Verification that the deduced primary structure was that of p190 was demonstrated through microsequence analysis of a cyanogen bromide peptide generated from chick brain p190. The deduced primary structure of chicken brain p190 revealed that this 1,830-amino acid (aa) (212,509-D) protein is a member of a novel structural class of unconventional myosins that includes the gene products encoded by the dilute locus of mouse and the MYO2 gene of Saccharomyces cerevisiae. We have named the p190-CM complex "myosin-V" based on the results of a detailed sequence comparison of the head domains of 29 myosin heavy chains (hc), which has revealed that this myosin, based on head structure, is the fifth of six distinct structural classes of myosin to be described thus far. Like the presumed products of the mouse dilute and yeast MYO2 genes, the head domain of chicken myosin-V hc (aa 1-764) is linked to a "neck" domain (aa 765-909) consisting of six tandem repeats of an approximately 23-aa "IQ-motif." All known myosins contain at least one such motif at their head-tail junctions; these IQ-motifs may function as calmodulin or light chain binding sites. The tail domain of chicken myosin-V consists of an initial 511 aa predicted to form several segments of coiled-coil alpha helix followed by a terminal 410-aa globular domain (aa, 1,421-1,830). 
Interestingly, a portion of the tail domain (aa, 1,094-1,830) shares 58% amino acid sequence identity with a 723-aa protein from mouse brain reported to be a glutamic acid decarboxylase. The neck region of chicken myosin-V, which contains the IQ-motifs, was demonstrated to contain the binding sites for CM by analyzing CM binding to bacterially expressed fusion proteins containing the head, neck, and tail domains. Immunolocalization of myosin-V in brain and in cultured cells revealed an unusual distribution for this myosin in both neurons and nonneuronal cells.(ABSTRACT TRUNCATED AT 400 WORDS) PMID:1469047

  5. Structural basis of DNA sequence recognition by the response regulator PhoP in Mycobacterium tuberculosis.

    PubMed

    He, Xiaoyuan; Wang, Liqin; Wang, Shuishu

    2016-04-15

    The transcriptional regulator PhoP is an essential virulence factor in Mycobacterium tuberculosis, and it presents a target for the development of new anti-tuberculosis drugs and attenuated tuberculosis vaccine strains. PhoP binds to DNA as a highly cooperative dimer by recognizing direct repeats of 7-bp motifs with a 4-bp spacer. To elucidate the PhoP-DNA binding mechanism, we determined the crystal structure of the PhoP-DNA complex. The structure revealed a tandem PhoP dimer that bound to the direct repeat. The surprising tandem arrangement of the receiver domains allowed the four domains of the PhoP dimer to form a compact structure, accounting for the strict requirement of a 4-bp spacer and the highly cooperative binding of the dimer. The PhoP-DNA interactions exclusively involved the effector domain. The sequence-recognition helix made contact with the bases of the 7-bp motif in the major groove, and the wing interacted with the adjacent minor groove. The structure provides a starting point for the elucidation of the mechanism by which PhoP regulates the virulence of M. tuberculosis and guides the design of screening platforms for PhoP inhibitors.
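
    The binding-site layout described in this abstract (two 7-bp motifs in direct repeat, separated by a 4-bp spacer) can be turned into a simple sequence scan. This is a hedged sketch rather than a PhoP site predictor: the function name is hypothetical, the example DNA string is an invented toy sequence and not a real PhoP site, and real recognition tolerates sequence variation that a mismatch count only crudely approximates.

```python
def find_direct_repeats(seq, motif_len=7, spacer=4, max_mismatches=0):
    """Return (position, first_copy, second_copy) for each pair of
    `motif_len`-bp windows separated by `spacer` bp that differ at no
    more than `max_mismatches` positions."""
    hits = []
    offset = motif_len + spacer  # distance between the starts of the two copies
    for i in range(len(seq) - offset - motif_len + 1):
        first = seq[i:i + motif_len]
        second = seq[i + offset:i + offset + motif_len]
        mismatches = sum(a != b for a, b in zip(first, second))
        if mismatches <= max_mismatches:
            hits.append((i, first, second))
    return hits

# Invented toy sequence: a 7-bp unit, a 4-bp spacer, then the same unit again.
toy_dna = "GG" + "TCACAGC" + "AAAA" + "TCACAGC" + "TT"
exact_hits = find_direct_repeats(toy_dna)
```

    Fixing the spacer at 4 bp mirrors the strict spacing requirement that the structure explains; raising `max_mismatches` relaxes the scan to imperfect repeats at the cost of more spurious hits.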

  6. Structural and functional characterization of a cell cycle associated HDAC1/2 complex reveals the structural basis for complex assembly and nucleosome targeting

    PubMed Central

    Itoh, Toshimasa; Fairall, Louise; Muskett, Frederick W.; Milano, Charles P.; Watson, Peter J.; Arnaudo, Nadia; Saleh, Almutasem; Millard, Christopher J.; El-Mezgueldi, Mohammed; Martino, Fabrizio; Schwabe, John W.R.

    2015-01-01

    Recent proteomic studies have identified a novel histone deacetylase complex that is upregulated during mitosis and is associated with cyclin A. This complex is conserved from nematodes to man and contains histone deacetylases 1 and 2, the MIDEAS corepressor protein and a protein called DNTTIP1 whose function was hitherto poorly understood. Here, we report the structures of two domains from DNTTIP1. The amino-terminal region forms a tight dimerization domain with a novel structural fold that interacts with and mediates assembly of the HDAC1:MIDEAS complex. The carboxy-terminal domain of DNTTIP1 has a structure related to the SKI/SNO/DAC domain, despite lacking obvious sequence homology. We show that this domain in DNTTIP1 mediates interaction with both DNA and nucleosomes. Thus, DNTTIP1 acts as a dimeric chromatin binding module in the HDAC1:MIDEAS corepressor complex. PMID:25653165

  7. The Domain of Cognition: An Alternative to Bloom's Cognitive Domain within the Framework of an Information Processing Model.

    ERIC Educational Resources Information Center

    Stahl, Robert J.; Murphy, Gary T.

    Weaknesses in the structure, levels, and sequence of Bloom's taxonomy of cognitive domains emphasize the need for both a new model of how individual learners process information and a new taxonomy of the different levels of memory, thinking, and learning. Both the model and the taxonomy should be consistent with current research findings. The…

  8. Contribution of the first K-homology domain of poly(C)-binding protein 1 to its affinity and specificity for C-rich oligonucleotides

    PubMed Central

    Yoga, Yano M. K.; Traore, Daouda A. K.; Sidiqi, Mahjooba; Szeto, Chris; Pendini, Nicole R.; Barker, Andrew; Leedman, Peter J.; Wilce, Jacqueline A.; Wilce, Matthew C. J.

    2012-01-01

    Poly-C-binding proteins are triple KH (hnRNP K homology) domain proteins with specificity for single stranded C-rich RNA and DNA. They play diverse roles in the regulation of protein expression at both transcriptional and translational levels. Here, we analyse the contributions of individual αCP1 KH domains to binding C-rich oligonucleotides using biophysical and structural methods. Using surface plasmon resonance (SPR), we demonstrate that KH1 makes the most stable interactions with both RNA and DNA, KH3 binds with intermediate affinity and KH2 only interacts detectably with DNA. The crystal structure of KH1 bound to a 5′-CCCTCCCT-3′ DNA sequence shows a 2:1 protein:DNA stoichiometry and demonstrates a molecular arrangement of KH domains bound to immediately adjacent oligonucleotide target sites. SPR experiments with a series of poly-C sequences reveal that cytosine is preferred at all four positions in the oligonucleotide binding cleft and that a C-tetrad binds KH1 with 10 times higher affinity than a C-triplet. The basis for this high affinity interaction is finally detailed with the structure determination of a KH1.W.C54S mutant bound to a 5′-ACCCCA-3′ DNA sequence. Together, these data establish the lead role of KH1 in oligonucleotide binding by αCP1 and reveal the molecular basis of its specificity for a C-rich tetrad. PMID:22344691

  9. Contribution of the first K-homology domain of poly(C)-binding protein 1 to its affinity and specificity for C-rich oligonucleotides.

    PubMed

    Yoga, Yano M K; Traore, Daouda A K; Sidiqi, Mahjooba; Szeto, Chris; Pendini, Nicole R; Barker, Andrew; Leedman, Peter J; Wilce, Jacqueline A; Wilce, Matthew C J

    2012-06-01

    Poly-C-binding proteins are triple KH (hnRNP K homology) domain proteins with specificity for single stranded C-rich RNA and DNA. They play diverse roles in the regulation of protein expression at both transcriptional and translational levels. Here, we analyse the contributions of individual αCP1 KH domains to binding C-rich oligonucleotides using biophysical and structural methods. Using surface plasmon resonance (SPR), we demonstrate that KH1 makes the most stable interactions with both RNA and DNA, KH3 binds with intermediate affinity and KH2 only interacts detectably with DNA. The crystal structure of KH1 bound to a 5'-CCCTCCCT-3' DNA sequence shows a 2:1 protein:DNA stoichiometry and demonstrates a molecular arrangement of KH domains bound to immediately adjacent oligonucleotide target sites. SPR experiments with a series of poly-C sequences reveal that cytosine is preferred at all four positions in the oligonucleotide binding cleft and that a C-tetrad binds KH1 with 10 times higher affinity than a C-triplet. The basis for this high affinity interaction is finally detailed with the structure determination of a KH1.W.C54S mutant bound to a 5'-ACCCCA-3' DNA sequence. Together, these data establish the lead role of KH1 in oligonucleotide binding by αCP1 and reveal the molecular basis of its specificity for a C-rich tetrad.

  10. Prelude and Fugue, predicting local protein structure, early folding regions and structural weaknesses.

    PubMed

    Kwasigroch, Jean Marc; Rooman, Marianne

    2006-07-15

    Prelude&Fugue are bioinformatics tools aiming at predicting the local 3D structure of a protein from its amino acid sequence in terms of seven backbone torsion angle domains, using database-derived potentials. Prelude(&Fugue) computes all lowest free energy conformations of a protein or protein region, ranked by increasing energy, and possibly satisfying some interresidue distance constraints specified by the user. (Prelude&)Fugue detects sequence regions whose predicted structure is significantly preferred relative to other conformations in the absence of tertiary interactions. These programs can be used for predicting secondary structure, tertiary structure of short peptides, flickering early folding sequences and peptides that adopt a preferred conformation in solution. They can also be used for detecting structural weaknesses, i.e. sequence regions that are not optimal with respect to the tertiary fold. http://babylone.ulb.ac.be/Prelude_and_Fugue.

  11. Expansion of divergent SEA domains in cell surface proteins and nucleoporin 54.

    PubMed

    Pei, Jimin; Grishin, Nick V

    2017-03-01

    SEA (sea urchin sperm protein, enterokinase, agrin) domains, many of which possess autoproteolysis activity, have been found in a number of cell surface and secreted proteins. Despite high sequence divergence, SEA domains were also proposed to be present in dystroglycan based on a conserved autoproteolysis motif and receptor-type protein phosphatase IA-2 based on structural similarity. The presence of a SEA domain adjacent to the transmembrane segment appears to be a recurring theme in quite a number of type I transmembrane proteins on the cell surface, such as MUC1, dystroglycan, IA-2, and Notch receptors. By comparative sequence and structural analyses, we identified dystroglycan-like proteins with SEA domains in Capsaspora owczarzaki of the Filasterea group, one of the closest single-cell relatives of metazoans. We also detected novel and divergent SEA domains in a variety of cell surface proteins such as EpCAM, α/ε-sarcoglycan, PTPRR, collectrin/Tmem27, amnionless, CD34, KIAA0319, fibrocystin-like protein, and a number of cadherins. While these proteins are mostly from metazoans or their single cell relatives such as choanoflagellates and Filasterea, fibrocystin-like proteins with SEA domains were found in several other eukaryotic lineages including green algae, Alveolata, Euglenozoa, and Haptophyta, suggesting an ancient evolutionary origin. In addition, the intracellular protein Nucleoporin 54 (Nup54) acquired a divergent SEA domain in choanoflagellates and metazoans. © 2016 The Protein Society.

  12. A traditional evolutionary history of foot-and-mouth disease viruses in Southeast Asia challenged by analyses of non-structural protein coding sequences

    USDA-ARS's Scientific Manuscript database

    Molecular epidemiology and evolution of foot-and-mouth disease virus (FMDV) are widely studied using genomic sequences encoding VP1, the capsid protein containing the most relevant antigenic domains. Although sequencing of the full viral genome is not used as a routine diagnostic or surveillance too...

  13. Prediction of glycolipid-binding domains from the amino acid sequence of lipid raft-associated proteins: application to HpaA, a protein involved in the adhesion of Helicobacter pylori to gastrointestinal cells.

    PubMed

    Fantini, Jacques; Garmy, Nicolas; Yahi, Nouara

    2006-09-12

    Protein-glycolipid interactions mediate the attachment of various pathogens to the host cell surface as well as the association of numerous cellular proteins with lipid rafts. Thus, it is of primary importance to identify the protein domains involved in glycolipid recognition. Using structure similarity searches, we could identify a common glycolipid-binding domain in the three-dimensional structure of several proteins known to interact with lipid rafts. Yet the three-dimensional structure of most raft-targeted proteins is still unknown. In the present study, we have identified a glycolipid-binding domain in the amino acid sequence of a bacterial adhesin (Helicobacter pylori adhesin A, HpaA). The prediction was based on the major properties of the glycolipid-binding domains previously characterized by structural searches. A short (15-mer) synthetic peptide corresponding to this putative glycolipid-binding domain was synthesized, and we studied its interaction with glycolipid monolayers at the air-water interface. The synthetic HpaA peptide recognized LacCer but not Gb3. This glycolipid specificity was in line with that of the whole bacterium. Molecular modeling studies gave some insights into this high selectivity of interaction. It also suggested that Phe147 in HpaA played a key role in LacCer recognition, through sugar-aromatic CH-pi stacking interactions with the hydrophobic side of the galactose ring of LacCer. Correspondingly, the replacement of Phe147 with Ala strongly affected LacCer recognition, whereas substitution with Trp did not. Our method could be used to identify glycolipid-binding domains in microbial and cellular proteins interacting with lipid shells, rafts, and other specialized membrane microdomains.

  14. Polarized light microscopy study on the reentrant phase transition in a (Ba1–xKx)Fe2As2 single crystal with x = 0.24

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Yong; Tanatar, Makariy A.; Timmons, Erik

    In this study, a sequence of structural/magnetic transitions on cooling is reported in the literature for hole-doped iron-based superconductor (Ba1–xKx)Fe2As2 with x = 0.24. By using polarized light microscopy, we directly observe the formation of orthorhombic domains in (Ba1–xKx)Fe2As2 (x = 0.24) single crystal below a temperature of simultaneous structural/magnetic transition T_N ~ 80 K. The structural domains vanish below ~30 K, but reappear below T = 15 K. Our results are consistent with reentrance transformation sequence from high-temperature tetragonal (HTT) to low temperature orthorhombic (LTO1) structure at T_N ~ 80 K, LTO1 to low temperature tetragonal (LTT) structure at T_c ~ 25 K, and LTT to low temperature orthorhombic (LTO2) structure at T ~ 15 K.

  15. Polarized light microscopy study on the reentrant phase transition in a (Ba1–xKx)Fe2As2 single crystal with x = 0.24

    DOE PAGES

    Liu, Yong; Tanatar, Makariy A.; Timmons, Erik; ...

    2016-11-09

    A sequence of structural/magnetic transitions on cooling is reported in the literature for the hole-doped iron-based superconductor (Ba1–xKx)Fe2As2 with x = 0.24. By using polarized light microscopy, we directly observe the formation of orthorhombic domains in a (Ba1–xKx)Fe2As2 (x = 0.24) single crystal below the temperature of the simultaneous structural/magnetic transition, T_N ~ 80 K. The structural domains vanish below ~30 K, but reappear below T = 15 K. Our results are consistent with a reentrant transformation sequence from the high-temperature tetragonal (HTT) to the low-temperature orthorhombic (LTO1) structure at T_N ~ 80 K, from LTO1 to the low-temperature tetragonal (LTT) structure at T_c ~ 25 K, and from LTT to the low-temperature orthorhombic (LTO2) structure at T ~ 15 K.

  16. Skilled memory in expert figure skaters.

    PubMed

    Deakin, J M; Allard, F

    1991-01-01

    The present studies extend skilled-memory theory to a domain involving the performance of motor sequences. Skilled figure skaters were better able than their less skilled counterparts to perform short skating sequences that were choreographed, rather than randomly constructed. Expert skaters encoded sequences for performance very differently from the way in which they encoded sequences that were verbally presented for verbal recall. Tasks interpolated between sequence and recall showed no significant influence on recall accuracy, implicating long-term memory in skating memory. There was little evidence for the use of retrieval structures when skaters learned the brief sequences used throughout these studies. Finally, expert skaters were able to judge the similarity of two skating elements faster than less skilled skaters, indicating a faster access to semantic memory for experts. The data indicate that skaters show many of the same skilled-memory characteristics as have been described in other skill domains involving memorization, such as digit span and memory for dinner orders.

  17. Combining Structural Modeling with Ensemble Machine Learning to Accurately Predict Protein Fold Stability and Binding Affinity Effects upon Mutation

    PubMed Central

    Garcia Lopez, Sebastian; Kim, Philip M.

    2014-01-01

    Advances in sequencing have led to a rapid accumulation of mutations, some of which are associated with diseases. However, to draw mechanistic conclusions, a biochemical understanding of these mutations is necessary. For coding mutations, accurate prediction of significant changes in either the stability of proteins or their affinity to their binding partners is required. Traditional methods have used semi-empirical force fields, while newer methods employ machine learning of sequence and structural features. Here, we show how combining both of these approaches leads to a marked boost in accuracy. We introduce ELASPIC, a novel ensemble machine learning approach that is able to predict stability effects upon mutation in both domain cores and domain-domain interfaces. We combine semi-empirical energy terms, sequence conservation, and a wide variety of molecular details with a Stochastic Gradient Boosting of Decision Trees (SGB-DT) algorithm. The accuracy of our predictions surpasses existing methods by a considerable margin, achieving correlation coefficients of 0.77 for stability and 0.75 for affinity predictions. Notably, we integrated homology modeling to enable proteome-wide prediction and show that accurate prediction on modeled structures is possible. Lastly, ELASPIC showed significant differences between various types of disease-associated mutations, as well as between disease and common neutral mutations. Unlike pure sequence-based prediction methods that try to predict phenotypic effects of mutations, our predictions unravel the molecular details governing protein instability, and help us better understand the molecular causes of diseases. PMID:25243403

  18. Intramolecular interactions regulate SAP97 binding to GKAP

    PubMed Central

    Wu, Hongju; Reissner, Carsten; Kuhlendahl, Sven; Coblentz, Blake; Reuver, Susanne; Kindler, Stefan; Gundelfinger, Eckart D.; Garner, Craig C.

    2000-01-01

    Membrane-associated guanylate kinase homologs (MAGUKs) are multidomain proteins found to be central organizers of cellular junctions. In this study, we examined the molecular mechanisms that regulate the interaction of the MAGUK SAP97 with its GUK domain binding partner GKAP (GUK-associated protein). The GKAP–GUK interaction is regulated by a series of intramolecular interactions. Specifically, the association of the Src homology 3 (SH3) domain and sequences situated between the SH3 and GUK domains with the GUK domain was found to interfere with GKAP binding. In contrast, N-terminal sequences that precede the first PDZ domain in SAP97 facilitated GKAP binding via their association with the SH3 domain. Utilizing crystal structure data available for PDZ, SH3 and GUK domains, molecular models of SAP97 were generated. These models revealed that SAP97 can exist in a compact U-shaped conformation in which the N-terminal domain folds back and interacts with the SH3 and GUK domains. These models support the biochemical data and provide new insights into how intramolecular interactions may regulate the association of SAP97 with its binding partners. PMID:11060025

  19. Epitope mapping of the domains of human angiotensin converting enzyme.

    PubMed

    Kugaevskaya, Elena V; Kolesanova, Ekaterina F; Kozin, Sergey A; Veselovsky, Alexander V; Dedinsky, Ilya R; Elisseeva, Yulia E

    2006-06-01

    Somatic angiotensin converting enzyme (sACE) contains in its single chain two homologous domains (called N- and C-domains), each bearing a functional zinc-dependent active site. The present study aims to define the differences between the two sACE domains and to localize experimentally revealed antigenic determinants (B-epitopes) in the recently determined three-dimensional structure of testicular ACE (tACE). The predicted linear antigenic determinants of human sACE were determined by the peptide scanning ("PEPSCAN") approach. An essential difference was demonstrated between the locations of the epitopes in the N- and C-domains. Comparison of the arrangement of epitopes in the human domains with the corresponding sequences of some mammalian sACEs made it possible to classify the revealed antigenic determinants as variable or conserved areas. The location of antigenic determinants with respect to various structural elements and to functionally important sites of the human sACE C-domain was estimated. The majority of antigenic sites of the C-domain were located at the irregular elements and at the boundaries of secondary structure elements. The data show structural differences between the sACE domains. The experimentally revealed antigenic determinants were in agreement with the recently determined crystal structure of tACE. This opens potential new applications in producing mono-specific and group-specific antipeptide antibodies.

  20. Evolution of EF-hand calcium-modulated proteins. IV. Exon shuffling did not determine the domain compositions of EF-hand proteins

    NASA Technical Reports Server (NTRS)

    Kretsinger, R. H.; Nakayama, S.

    1993-01-01

    In the previous three reports in this series we demonstrated that the EF-hand family of proteins evolved by a complex pattern of gene duplication, transposition, and splicing. The dendrograms based on exon sequences are nearly identical to those based on protein sequences for troponin C, the essential light chain myosin, the regulatory light chain, and calpain. This validates both the computational methods and the dendrograms for these subfamilies. The proposal of congruence for calmodulin, troponin C, essential light chain, and regulatory light chain was confirmed. There are, however, significant differences in the calmodulin dendrograms computed from DNA and from protein sequences. In this study we find that introns are distributed throughout the EF-hand domain and the interdomain regions. Further, dendrograms based on intron type and distribution bear little resemblance to those based on protein or on DNA sequences. We conclude that introns are inserted, and probably deleted, with relatively high frequency. Further, in the EF-hand family exons do not correspond to structural domains and exon shuffling played little if any role in the evolution of this widely distributed homolog family. Calmodulin has had a turbulent evolution. Its dendrograms based on protein sequence, exon sequence, 3'-tail sequence, intron sequences, and intron positions all show significant differences.

  1. A Bioinformatic Strategy for the Detection, Classification and Analysis of Bacterial Autotransporters

    PubMed Central

    Celik, Nermin; Webb, Chaille T.; Leyton, Denisse L.; Holt, Kathryn E.; Heinz, Eva; Gorrell, Rebecca; Kwok, Terry; Naderer, Thomas; Strugnell, Richard A.; Speed, Terence P.; Teasdale, Rohan D.; Likić, Vladimir A.; Lithgow, Trevor

    2012-01-01

    Autotransporters are secreted proteins that are assembled into the outer membrane of bacterial cells. The passenger domains of autotransporters are crucial for bacterial pathogenesis, with some remaining attached to the bacterial surface while others are released by proteolysis. An enigma remains as to whether autotransporters should be considered a class of secretion system, or simply a class of substrate with peculiar requirements for their secretion. We sought to establish a sensitive search protocol that could identify and characterize diverse autotransporters from bacterial genome sequence data. The new sequence analysis pipeline identified more than 1500 autotransporter sequences from diverse bacteria, including numerous species of Chlamydiales and Fusobacteria as well as all classes of Proteobacteria. Interrogation of the proteins revealed that there are numerous classes of passenger domains beyond the known proteases, adhesins and esterases. In addition, the barrel-domain, a characteristic feature of autotransporters, was found to be composed of seven conserved sequence segments that can be arranged in multiple ways in the tertiary structure of the assembled autotransporter. One of these conserved motifs overlays the targeting information required for autotransporters to reach the outer membrane. Another conserved and diagnostic motif maps to the linker region between the passenger domain and barrel-domain, indicating it as an important feature in the assembly of autotransporters. PMID:22905239

  2. Comparison of S. cerevisiae F-BAR domain structures reveals a conserved inositol phosphate binding site

    PubMed Central

    Moravcevic, Katarina; Alvarado, Diego; Schmitz, Karl R.; Kenniston, Jon A.; Mendrola, Jeannine M.; Ferguson, Kathryn M.; Lemmon, Mark A.

    2015-01-01

    SUMMARY F-BAR domains control membrane interactions in endocytosis, cytokinesis, and cell signaling. Although generally thought to bind curved membranes containing negatively charged phospholipids, numerous functional studies argue that differences in lipid-binding selectivities of F-BAR domains are functionally important. Here, we compare membrane-binding properties of the S. cerevisiae F-BAR domains in vitro and in vivo. Whereas some F-BAR domains (such as Bzz1p and Hof1p F-BARs) bind equally well to all phospholipids, the F-BAR domain from the RhoGAP Rgd1p preferentially binds phosphoinositides. We determined X-ray crystal structures of F-BAR domains from Hof1p and Rgd1p, the latter bound to an inositol phosphate. The structures explain phospholipid-binding selectivity differences, and reveal an F-BAR phosphoinositide binding site that is fully conserved in a mammalian RhoGAP called Gmip, and is partly retained in certain other F-BAR domains. Our findings reveal previously unappreciated determinants of F-BAR domain lipid-binding specificity, and provide a basis for its prediction from sequence. PMID:25620000

  3. Dimer formation through domain swapping in the crystal structure of the Grb2-SH2-Ac-pYVNV complex.

    PubMed

    Schiering, N; Casale, E; Caccia, P; Giordano, P; Battistini, C

    2000-11-07

    Src homology 2 (SH2) domains are key modules in intracellular signal transduction. They link activated cell surface receptors to downstream targets by binding to phosphotyrosine-containing sequence motifs. The crystal structure of a Grb2-SH2 domain-phosphopeptide complex was determined at 2.4 Å resolution. The asymmetric unit contains four polypeptide chains. There is an unexpected domain swap so that individual chains do not adopt a closed SH2 fold. Instead, reorganization of the EF loop leads to an open, nonglobular fold, which associates with an equivalent partner to generate an intertwined dimer. As in previously reported crystal structures of canonical Grb2-SH2 domain-peptide complexes, each of the four hybrid SH2 domains in the two domain-swapped dimers binds the phosphopeptide in a type I beta-turn conformation. This report is the first to describe domain swapping for an SH2 domain. While in vivo evidence of dimerization of Grb2 exists, our SH2 dimer is metastable and a physiological role of this new form of dimer formation remains to be demonstrated.

  4. Supra-domains: evolutionary units larger than single protein domains.

    PubMed

    Vogel, Christine; Berzuini, Carlo; Bashton, Matthew; Gough, Julian; Teichmann, Sarah A

    2004-02-20

    Domains are the evolutionary units that comprise proteins, and most proteins are built from more than one domain. Domains can be shuffled by recombination to create proteins with new arrangements of domains. Using structural domain assignments, we examined the combinations of domains in the proteins of 131 completely sequenced organisms. We found two-domain and three-domain combinations that recur in different protein contexts with different partner domains. The domains within these combinations have a particular functional and spatial relationship. These units are larger than individual domains and we term them "supra-domains". Amongst the supra-domains, we identified some 1400 (1203 two-domain and 166 three-domain) combinations that are statistically significantly over-represented relative to the occurrence and versatility of the individual component domains. Over one-third of all structurally assigned multi-domain proteins contain these over-represented supra-domains. This means that investigation of the structural and functional relationships of the domains forming these popular combinations would be particularly useful for an understanding of multi-domain protein function and evolution as well as for genome annotation. These and other supra-domains were analysed for their versatility, duplication, their distribution across the three kingdoms of life and their functional classes. By examining the three-dimensional structures of several examples of supra-domains in different biological processes, we identify two basic types of spatial relationships between the component domains: the combined function of the two domains is such that either the geometry of the two domains is crucial and there is a tight constraint on the interface, or the precise orientation of the domains is less important and they are spatially separate. Frequently, the role of the supra-domain becomes clear only once the three-dimensional structure is known. 
    Since this is the case for only a quarter of the supra-domains, we provide a list of the most important unknown supra-domains as potential targets for structural genomics projects.
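
    The idea of a two-domain combination that recurs "in different protein contexts with different partner domains" can be illustrated with a minimal sketch. The protein names, domain labels and the threshold of two distinct flanking contexts are hypothetical, chosen only for illustration. The sketch does not reproduce the statistical over-representation test described in the abstract.

```python
from collections import defaultdict

# Hypothetical domain architectures in N- to C-terminal order.
# Protein and domain names are illustrative, not taken from the study.
architectures = {
    "protA": ["d1", "d2", "d3"],
    "protB": ["d4", "d1", "d2"],
    "protC": ["d1", "d2"],
    "protD": ["d3", "d5"],
}


def recurring_pairs(archs, min_contexts=2):
    """Return adjacent domain pairs seen with at least `min_contexts`
    distinct flanking partner domains (assumed, simplified criterion)."""
    contexts = defaultdict(set)
    for domains in archs.values():
        for i in range(len(domains) - 1):
            pair = (domains[i], domains[i + 1])
            ctx = contexts[pair]  # creates the entry even if no partner exists
            if i > 0:
                ctx.add(("N", domains[i - 1]))  # partner on the N-terminal side
            if i + 2 < len(domains):
                ctx.add(("C", domains[i + 2]))  # partner on the C-terminal side
    return {pair: ctx for pair, ctx in contexts.items() if len(ctx) >= min_contexts}


result = recurring_pairs(architectures)
for pair, ctx in result.items():
    print(pair, sorted(ctx))
```

    In this toy input only the pair (d1, d2) appears with two different partners (d3 on one side, d4 on the other), so it is the only candidate returned.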

  5. Structurally detailed coarse-grained model for Sec-facilitated co-translational protein translocation and membrane integration

    PubMed Central

    Miller, Thomas F.

    2017-01-01

    We present a coarse-grained simulation model that is capable of simulating the minute-timescale dynamics of protein translocation and membrane integration via the Sec translocon, while retaining sufficient chemical and structural detail to capture many of the sequence-specific interactions that drive these processes. The model includes accurate geometric representations of the ribosome and Sec translocon, obtained directly from experimental structures, and interactions parameterized from nearly 200 μs of residue-based coarse-grained molecular dynamics simulations. A protocol for mapping amino-acid sequences to coarse-grained beads enables the direct simulation of trajectories for the co-translational insertion of arbitrary polypeptide sequences into the Sec translocon. The model reproduces experimentally observed features of membrane protein integration, including the efficiency with which polypeptide domains integrate into the membrane, the variation in integration efficiency upon single amino-acid mutations, and the orientation of transmembrane domains. The central advantage of the model is that it connects sequence-level protein features to biological observables and timescales, enabling direct simulation for the mechanistic analysis of co-translational integration and for the engineering of membrane proteins with enhanced membrane integration efficiency. PMID:28328943

  6. Transcriptome wide identification and characterization of starch branching enzyme in finger millet.

    PubMed

    Tyagi, Rajhans; Tiwari, Apoorv; Garg, Vijay Kumar; Gupta, Sanjay

    2017-01-01

    Starch-branching enzymes (SBEs) are one of the four major enzyme classes involved in starch biosynthesis in plants and play an important role in determining the structure and physical properties of starch granules. Multiple SBEs are involved in starch biosynthesis in plants. Finger millet is a calcium-rich, important cereal crop belonging to the grass family, and transcriptome data from its developing spikes are available at NCBI. This study aimed to identify the gene sequence of the starch branching enzyme, annotate it, and submit it for further use. The rice SBE sequence was taken as the reference, and different in silico tools were used to characterize the sequence. Four domains were found in the finger millet starch branching enzyme: the alpha amylase catalytic domain from 925 to 2172 with E value 0, the N-terminal Early set domain from 634 to 915 with E value 1.62e-42, the alpha amylase C-terminal all-beta domain from 2224 to 2511 with E value 5.80e-24, and the 1,4-alpha-glucan-branching enzyme domain from 421 to 2517 with E value 0. Major binding interactions with GLC (alpha-D-glucose), CA (calcium ion), GOL (glycerol), TRS (2-amino-2-hydroxymethylpropane-1,3-diol), MG (magnesium ion) and FLC (citrate anion) were found with different residues. A phylogenetic study of the finger millet SBE with six species of the grass family showed two clusters, A and B. In cluster A, finger millet grouped with Oryza sativa, Setaria italica, Sorghum bicolor and Zea mays, while cluster B was formed by Triticum aestivum and Brachypodium distachyon. The nucleotide sequence of the finger millet SBE was submitted to NCBI with the accession no. KY648913, and the protein structure of the finger millet SBE was submitted to PMDB with the PMDB id PM0080938.
    This research presents a comparative overview of Finger millet SBE and includes their properties, structural and functional characteristics, and recent developments on their post-translational regulation.
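
    As a quick consistency check on the domain coordinates quoted above, the following sketch computes each span's length and tests whether the three shorter domains fall inside the 421 to 2517 span. It assumes the coordinates are inclusive positions on one sequence, which the abstract does not state. The dictionary keys are shorthand labels introduced here.

```python
# Domain coordinates as quoted in the abstract, as (start, end) pairs.
# Assumption: positions are inclusive and refer to the same sequence.
domains = {
    "alpha_amylase_catalytic": (925, 2172),
    "n_terminal_early_set": (634, 915),
    "alpha_amylase_c_terminal_all_beta": (2224, 2511),
    "glucan_branching_enzyme": (421, 2517),
}


def span_length(start, end):
    """Length of an inclusive coordinate span."""
    return end - start + 1


def contains(outer, inner):
    """True if the inclusive span `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


lengths = {name: span_length(*span) for name, span in domains.items()}
full = domains["glucan_branching_enzyme"]
nested = {name: contains(full, span) for name, span in domains.items()}

for name in domains:
    print(f"{name}: length {lengths[name]}, inside 421-2517: {nested[name]}")
```

    Under the inclusive-coordinate assumption, all three shorter domains lie within the 421 to 2517 span and do not overlap one another.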

  7. Dynamic anticipatory processing of hierarchical sequential events: a common role for Broca's area and ventral premotor cortex across domains?

    PubMed

    Fiebach, Christian J; Schubotz, Ricarda I

    2006-05-01

    This paper proposes a domain-general model for the functional contribution of ventral premotor cortex (PMv) and adjacent Broca's area to perceptual, cognitive, and motor processing. We propose to understand this frontal region as a highly flexible sequence processor, with the PMv mapping sequential events onto stored structural templates and Broca's Area involved in more complex, hierarchical or hypersequential processing. This proposal is supported by reference to previous functional neuroimaging studies investigating abstract sequence processing and syntactic processing.

  8. Comparative analysis of barophily-related amino acid content in protein domains of Pyrococcus abyssi and Pyrococcus furiosus.

    PubMed

    Yafremava, Liudmila S; Di Giulio, Massimo; Caetano-Anollés, Gustavo

    2013-01-01

    Amino acid substitution patterns between the nonbarophilic Pyrococcus furiosus and its barophilic relative P. abyssi confirm that hydrostatic pressure asymmetry indices reflect the extent to which amino acids are preferred by barophilic archaeal organisms. Substitution patterns in entire protein sequences, shared protein domains defined at fold superfamily level, domains in homologous sequence pairs, and domains of very ancient and very recent origin now provide further clues about the environment that led to the genetic code and diversified life. The pyrococcal proteomes are very similar and share a very early ancestor. Relative amino acid abundance analyses showed that biases in the use of amino acids are due to their shared fold superfamilies. Within these repertoires, only two of the five amino acids that are preferentially barophilic, aspartic acid and arginine, displayed this preference significantly and consistently across structure and in domains appearing in the ancestor. The more primordial asparagine, lysine and threonine displayed a consistent preference for nonbarophily across structure and in the ancestor. Since barophilic preferences are already evident in ancient domains that are at least ~3 billion years old, we conclude that barophily is a very ancient trait that unfolded concurrently with genetic idiosyncrasies in convergence towards a universal code.

  9. The Janus Kinase (JAK) FERM and SH2 Domains: Bringing Specificity to JAK-Receptor Interactions.

    PubMed

    Ferrao, Ryan; Lupardus, Patrick J

    2017-01-01

    The Janus kinases (JAKs) are non-receptor tyrosine kinases essential for signaling in response to cytokines and interferons and thereby control many essential functions in growth, development, and immune regulation. JAKs are unique among tyrosine kinases for their constitutive yet non-covalent association with class I and II cytokine receptors, which upon cytokine binding bring together two JAKs to create an active signaling complex. JAK association with cytokine receptors is facilitated by N-terminal FERM and SH2 domains, both of which are classical mediators of peptide interactions. Together, the JAK FERM and SH2 domains mediate a bipartite interaction with two distinct receptor peptide motifs, the proline-rich "Box1" and hydrophobic "Box2," which are present in the intracellular domain of cytokine receptors. While the general sidechain chemistry of Box1 and Box2 peptides is conserved between receptors, they share very weak primary sequence homology, making it impossible to posit why certain JAKs preferentially interact with and signal through specific subsets of cytokine receptors. Here, we review the structure and function of the JAK FERM and SH2 domains in light of several recent studies that reveal their atomic structure and elucidate interaction mechanisms with both the Box1 and Box2 receptor motifs. These crystal structures demonstrate how evolution has repurposed the JAK FERM and SH2 domains into a receptor-binding module that facilitates interactions with multiple receptors possessing diverse primary sequences.

  10. A Highly Organized Structure Mediating Nuclear Localization of a Myb2 Transcription Factor in the Protozoan Parasite Trichomonas vaginalis

    PubMed Central

    Chu, Chien-Hsin; Chang, Lung-Chun; Hsu, Hong-Ming; Wei, Shu-Yi; Liu, Hsing-Wei; Lee, Yu; Kuo, Chung-Chi; Indra, Dharmu; Chen, Chinpan; Ong, Shiou-Jeng; Tai, Jung-Hsiang

    2011-01-01

    Nuclear proteins usually contain specific peptide sequences, referred to as nuclear localization signals (NLSs), for nuclear import. These signals remain unexplored in the protozoan pathogen, Trichomonas vaginalis. The nuclear import of a Myb2 transcription factor was studied here using immunodetection of a hemagglutinin-tagged Myb2 overexpressed in the parasite. The tagged Myb2 was localized to the nucleus as punctate signals. With mutations of its polybasic sequences, 48KKQK51 and 61KR62, Myb2 was localized to the nucleus, but the signal was diffusive. When fused to a C-terminal non-nuclear protein, the Myb2 sequence spanning amino acid (aa) residues 48 to 143, which is embedded within the R2R3 DNA-binding domain (aa 40 to 156), was essential and sufficient for efficient nuclear import of a bacterial tetracycline repressor (TetR), and yet the transport efficiency was reduced with an additional fusion of a firefly luciferase to TetR, while classical NLSs from the simian virus 40 T-antigen had no function in this assay system. Myb2 nuclear import and DNA-binding activity were substantially perturbed with mutation of a conserved isoleucine (I74) in helix 2 to proline that altered secondary structure and ternary folding of the R2R3 domain. Disruption of DNA-binding activity alone by point mutation of a lysine residue, K51, preceding the structural domain had little effect on Myb2 nuclear localization, suggesting that nuclear translocation of Myb2, which requires an ordered structural domain, is independent of its DNA binding activity. These findings provide useful information for testing whether myriad Mybs in the parasite use a common module to regulate nuclear import. PMID:22021237

  11. Mechanistic insights into phosphoprotein-binding FHA domains.

    PubMed

    Liang, Xiangyang; Van Doren, Steven R

    2008-08-01

    FHA domains are protein modules that switch signals in diverse biological pathways by monitoring the phosphorylation of threonine residues of target proteins. As part of the effort to gain insight into cellular avoidance of cancer, FHA domains involved in the cellular response to DNA damage have been especially well-characterized. The complete protein where the FHA domain resides and the interaction partners determine the nature of the signaling. Thus, a key biochemical question is how do FHA domains pick out their partners from among thousands of alternatives in the cell? This Account discusses the structure, affinity, and specificity of FHA domains and the formation of their functional structure. Although FHA domains share sequence identity at only five loop residues, they all fold into a beta-sandwich of two beta-sheets. The conserved arginine and serine of the recognition loops recognize the phosphorylation of the threonine targeted. Side chains emanating from loops that join beta-strand 4 with 5, 6 with 7, or 10 with 11 make specific contacts with amino acids of the ligand that tailor sequence preferences. Many FHA domains choose a partner in extended conformation, somewhat according to the residue three after the phosphothreonine in sequence (pT + 3 position). One group of FHA domains chooses a short carboxylate-containing side chain at pT + 3. Another group chooses a long, branched aliphatic side chain. A third group prefers other hydrophobic or uncharged polar side chains at pT + 3. However, another FHA domain instead chooses on the basis of pT - 2, pT - 3, and pT + 1 positions. An FHA domain from a marker of human cancer instead chooses a much longer protein fragment that adds a beta-strand to its beta-sheet and that presents hydrophobic residues from a novel helix to the usual recognition surface.
    This novel recognition site and more remote sites for the binding of other types of protein partners were predicted for the entire family of FHA domains by a bioinformatics approach. The phosphopeptide-dependent dynamics of an FHA domain, SH2 domain, and PTB domain suggest a common theme: rigid, preformed binding surfaces support van der Waals contacts that provide favorable binding enthalpy. Despite the lack of pronounced conformational changes in FHA domains linked to binding events, more subtle adjustments may be possible. In the one FHA domain tested, phosphothreonine peptide binding is accompanied by increased flexibility just outside the binding site and increased rigidity across the beta-sandwich. The folding of the same FHA domain progresses through near-native intermediates that stabilize the recognition loops in the center of the phosphoprotein-binding surface; this may promote rigidity in the interface and affinity for targets phosphorylated on threonine.

  12. The Proteome Folding Project: Proteome-scale prediction of structure and function

    PubMed Central

    Drew, Kevin; Winters, Patrick; Butterfoss, Glenn L.; Berstis, Viktors; Uplinger, Keith; Armstrong, Jonathan; Riffle, Michael; Schweighofer, Erik; Bovermann, Bill; Goodlett, David R.; Davis, Trisha N.; Shasha, Dennis; Malmström, Lars; Bonneau, Richard

    2011-01-01

    The incompleteness of proteome structure and function annotation is a critical problem for biologists and, in particular, severely limits interpretation of high-throughput and next-generation experiments. We have developed a proteome annotation pipeline based on structure prediction, where function and structure annotations are generated using an integration of sequence comparison, fold recognition, and grid-computing-enabled de novo structure prediction. We predict protein domain boundaries and three-dimensional (3D) structures for protein domains from 94 genomes (including human, Arabidopsis, rice, mouse, fly, yeast, Escherichia coli, and worm). De novo structure predictions were distributed on a grid of more than 1.5 million CPUs worldwide (World Community Grid). We generated significant numbers of new confident fold annotations (9% of domains that are otherwise unannotated in these genomes). We demonstrate that predicted structures can be combined with annotations from the Gene Ontology database to predict new and more specific molecular functions. PMID:21824995

  13. Two different domains of the luciferase gene in the heterotrophic dinoflagellate Noctiluca scintillans occur as two separate genes in photosynthetic species

    PubMed Central

    Liu, Liyun; Hastings, J. Woodland

    2007-01-01

    Noctiluca scintillans, a heterotrophic unarmored unicellular bioluminescent dinoflagellate, occurs widely in the oceans, often as a bloom. Molecular phylogenetic analysis based on 18S ribosomal DNA sequences consistently has placed this species on the basal branch of dinoflagellates. Here, we report that the structural organization of its luciferase gene is strikingly different from that of the seven luminous species previously characterized, all of which are photosynthetic. The Noctiluca gene codes for a polypeptide that consists of two distinct but contiguous domains. One, which is located in the N-terminal portion, is shorter than but similar in sequence to the individual domains of the three-domain luciferases found in all other luminous dinoflagellates studied. The other, situated in the C-terminal part, has sequence similarity to the luciferin-binding protein of the luminous dinoflagellate Lingulodinium polyedrum, encoded there by a separate gene. Western analysis shows that the native protein has the same size (≈100 kDa) as the heterologously expressed polypeptide, indicating that it is not a polyprotein. Thus, sequences found in two proteins in the L. polyedrum bioluminescence system are present in a single polypeptide in Noctiluca. PMID:17130452

  14. The complete mitochondrial genome of Pholis nebulosus (Perciformes: Pholidae).

    PubMed

    Wang, Zhongquan; Qin, Kaili; Liu, Jingxi; Song, Na; Han, Zhiqiang; Gao, Tianxiang

    2016-11-01

    In this study, the complete mitochondrial genome (mitogenome) sequence of Pholis nebulosus has been determined by long polymerase chain reaction and primer-walking methods. The mitogenome is a circular molecule of 16 524 bp in length, including the typical structure of 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes and 2 non-coding regions (L-strand replication origin and control region), the gene contents of which are identical to those observed in most bony fishes. Within the control region, we identified the termination-associated sequence domain (TAS), and the conserved sequence block domain (CSB-F, CSB-E, CSB-D, CSB-C, CSB-B, CSB-A, CSB-1, CSB-2, CSB-3).

  15. Slowing Translation between Protein Domains by Increasing Affinity between mRNAs and the Ribosomal Anti-Shine-Dalgarno Sequence Improves Solubility.

    PubMed

    Vasquez, Kevin A; Hatridge, Taylor A; Curtis, Nicholas C; Contreras, Lydia M

    2016-02-19

    Recent studies have demonstrated that effective protein production requires coordination of multiple cotranslational cellular processes, which are heavily affected by translation timing. Until recently, protein engineering has focused on codon optimization to maximize protein production rates, mostly considering the effect of tRNA abundance. However, as it relates to complex multidomain proteins, it has been hypothesized that strategic translational pauses between domains and between distinct individual structural motifs can prevent interactions between nascent chain fragments that generate kinetically trapped misfolded peptides and thereby enhance protein yields. In this study, we introduce synthetic transient pauses between structural domains in a heterologous model protein based on designed patterns of affinity between the mRNA and the anti-Shine-Dalgarno (aSD) sequence on the ribosome. We demonstrate that optimizing translation attenuation at domain boundaries can predictably affect solubility patterns in bacteria. Exploration of the affinity space showed that modifying less than 1% of the nucleotides (on a small 12 amino acid linker) can vary soluble protein yields up to ∼7-fold without altering the primary sequence of the protein. In the context of longer linkers, where a larger number of distinct structural motifs can fold outside the ribosome, optimal synonymous codon variations resulted in an additional 2.1-fold increase in solubility, relative to that of nonoptimized linkers of the same length. While rational construction of 54 linkers of various affinities showed a significant correlation between protein solubility and predicted affinity, only weaker correlations were observed between tRNA abundance and protein solubility. We also demonstrate that naturally occurring high-affinity clusters are present between structural domains of β-galactosidase, one of Escherichia coli's largest native proteins. 
Interdomain ribosomal affinity is an important factor that has not previously been explored in the context of protein engineering.

  16. Quantitative theory of hydrophobic effect as a driving force of protein structure

    PubMed Central

    Perunov, Nikolay; England, Jeremy L

    2014-01-01

    Various studies suggest that the hydrophobic effect plays a major role in driving the folding of proteins. In the past, however, it has been challenging to translate this understanding into a predictive, quantitative theory of how the full pattern of sequence hydrophobicity in a protein shapes functionally important features of its tertiary structure. Here, we extend and apply such a phenomenological theory of the sequence-structure relationship in globular protein domains, which had previously been applied to the study of allosteric motion. In an effort to optimize parameters for the model, we first analyze the patterns of backbone burial found in single-domain crystal structures, and discover that classic hydrophobicity scales derived from bulk physicochemical properties of amino acids are already nearly optimal for prediction of burial using the model. Subsequently, we apply the model to studying structural fluctuations in proteins and establish a means of identifying ligand-binding and protein–protein interaction sites using this approach. PMID:24408023
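
    As an illustration of what a classic hydrophobicity scale looks like in practice, the sketch below computes a sliding-window hydropathy profile. It is not the burial model described in the abstract, which does not say which scales were tested; the Kyte-Doolittle scale is used here only as one widely cited example, and the function name and window size are assumptions.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
# Chosen for illustration only; the abstract does not name a specific scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=5):
    """Return the mean hydropathy of every window along the sequence."""
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KD[aa] for aa in sequence.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical peptide: a hydrophobic stretch followed by a charged one.
    for position, score in enumerate(hydropathy_profile("IIIIIDDDDD"), start=1):
        print(position, round(score, 2))
```

    High window averages mark stretches that would be expected to favour burial, and low averages mark stretches that would be expected to stay solvent-exposed.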

  17. Membrane and Protein Interactions of the Pleckstrin Homology Domain Superfamily

    PubMed Central

    Lenoir, Marc; Kufareva, Irina; Abagyan, Ruben; Overduin, Michael

    2015-01-01

    The human genome encodes about 285 proteins that contain at least one annotated pleckstrin homology (PH) domain. As the first phosphoinositide binding module domain to be discovered, the PH domain recruits diverse protein architectures to cellular membranes. PH domains constitute one of the largest protein superfamilies, and have diverged to regulate many different signaling proteins and modules such as Dbl homology (DH) and Tec homology (TH) domains. The ligands of approximately 70 PH domains have been validated by binding assays and complexed structures, allowing meaningful extrapolation across the entire superfamily. Here the Membrane Optimal Docking Area (MODA) program is used at a genome-wide level to identify all membrane docking PH structures and map their lipid-binding determinants. In addition to the linear sequence motifs which are employed for phosphoinositide recognition, the three dimensional structural features that allow peripheral membrane domains to approach and insert into the bilayer are pinpointed and can be predicted ab initio. The analysis shows that conserved structural surfaces distinguish which PH domains associate with membrane from those that do not. Moreover, the results indicate that lipid-binding PH domains can be classified into different functional subgroups based on the type of membrane insertion elements they project towards the bilayer. PMID:26512702

  18. Structural and functional properties of hemoglobins from unicellular organisms as revealed by resonance Raman spectroscopy.

    PubMed

    Egawa, Tsuyoshi; Yeh, Syun-Ru

    2005-01-01

    Hemoglobins have been discovered in organisms from virtually all kingdoms. Their presence in unicellular organisms suggests that the gene for hemoglobin is very ancient and that the hemoglobins must have functions other than oxygen transport, in view of the fact that O2 delivery is a diffusion-controlled process in these organisms. Based on sequence alignment, three groups of hemoglobins have been characterized in unicellular organisms. The group-one hemoglobins, termed truncated hemoglobins, consist of proteins with 110-140 amino acid residues and a novel two-over-two alpha-helical sandwich motif. The group-two hemoglobins, termed flavohemoglobins, consist of a hemoglobin domain, with a classical three-over-three alpha-helical sandwich motif, and a flavin-containing reductase domain that is covalently attached to it. The group-three hemoglobins consist of myoglobin-like proteins that have high sequence homology and structural similarity to the hemoglobin domain of flavohemoglobins. In this review, recent resonance Raman studies of each group of these proteins are presented. Their implications are discussed in the context of the structural and functional properties of these novel hemoglobins.

  19. A Conserved Structural Module Regulates Transcriptional Responses to Diverse Stress Signals in Bacteria

    PubMed Central

    Campbell, Elizabeth A.; Greenwell, Roger; Anthony, Jennifer R.; Wang, Sheng; Lim, Lionel; Das, Kalyan; Sofia, Heidi J.; Donohue, Timothy J.; Darst, Seth A.

    2008-01-01

    SUMMARY A transcriptional response to singlet oxygen in Rhodobacter sphaeroides is controlled by the group IV σ factor σE and its cognate anti-σ ChrR. Crystal structures of the σE/ChrR complex reveal a modular, two-domain architecture for ChrR. The ChrR N-terminal anti-σ domain (ASD) binds a Zn2+ ion, contacts σE, and is sufficient to inhibit σE-dependent transcription. The ChrR C-terminal domain adopts a cupin fold, can coordinate an additional Zn2+, and is required for the transcriptional response to singlet oxygen. Structure-based sequence analyses predict that the ASD defines a common structural fold among predicted group IV antiσs. These ASDs are fused to diverse C-terminal domains that are likely involved in responding to specific environmental signals that control the activity of their cognate σ factor. PMID:17803943

  20. IFI16 Preferentially Binds to DNA with Quadruplex Structure and Enhances DNA Quadruplex Formation.

    PubMed

    Hároníková, Lucia; Coufal, Jan; Kejnovská, Iva; Jagelská, Eva B; Fojta, Miroslav; Dvořáková, Petra; Muller, Petr; Vojtesek, Borivoj; Brázda, Václav

    2016-01-01

    Interferon-inducible protein 16 (IFI16) is a member of the HIN-200 protein family, containing two HIN domains and one PYRIN domain. IFI16 acts as a sensor of viral and bacterial DNA and is important for innate immune responses. IFI16 binds DNA and binding has been described to be DNA length-dependent, but a preference for supercoiled DNA has also been demonstrated. Here we report a specific preference of IFI16 for binding to quadruplex DNA compared to other DNA structures. IFI16 binds to quadruplex DNA with significantly higher affinity than to the same sequence in double stranded DNA. By circular dichroism (CD) spectroscopy we also demonstrated the ability of IFI16 to stabilize quadruplex structures with quadruplex-forming oligonucleotides derived from human telomere (HTEL) sequences and the MYC promoter. A novel H/D exchange mass spectrometry approach was developed to assess protein interactions with quadruplex DNA. Quadruplex DNA changed the IFI16 deuteration profile in parts of the PYRIN domain (aa 0-80) and in structurally identical parts of both HIN domains (aa 271-302 and aa 586-617) compared to single stranded or double stranded DNAs, supporting the preferential affinity of IFI16 for structured DNA. Our results reveal the importance of quadruplex DNA structure in IFI16 binding and improve our understanding of how IFI16 senses DNA. IFI16 selectivity for quadruplex structure provides a mechanistic framework for IFI16 in immunity and cellular processes including DNA damage responses and cell proliferation.

  1. Annotation of Protein Domains Reveals Remarkable Conservation in the Functional Make up of Proteomes Across Superkingdoms

    PubMed Central

    Nasir, Arshan; Naeem, Aisha; Khan, Muhammad Jawad; Lopez-Nicora, Horacio D.; Caetano-Anollés, Gustavo

    2011-01-01

    The functional repertoire of a cell is largely embodied in its proteome, the collection of proteins encoded in the genome of an organism. The molecular functions of proteins are the direct consequence of their structure and structure can be inferred from sequence using hidden Markov models of structural recognition. Here we analyze the functional annotation of protein domain structures in almost a thousand sequenced genomes, exploring the functional and structural diversity of proteomes. We find there is a remarkable conservation in the distribution of domains with respect to the molecular functions they perform in the three superkingdoms of life. In general, most of the protein repertoire is spent in functions related to metabolic processes but there are significant differences in the usage of domains for regulatory and extra-cellular processes both within and between superkingdoms. Our results support the hypotheses that the proteomes of superkingdom Eukarya evolved via genome expansion mechanisms that were directed towards innovating new domain architectures for regulatory and extra/intracellular process functions needed for example to maintain the integrity of multicellular structure or to interact with environmental biotic and abiotic factors (e.g., cell signaling and adhesion, immune responses, and toxin production). Proteomes of microbial superkingdoms Archaea and Bacteria retained fewer numbers of domains and maintained simple and smaller protein repertoires. Viruses appear to play an important role in the evolution of superkingdoms. We finally identify few genomic outliers that deviate significantly from the conserved functional design. These include Nanoarchaeum equitans, proteobacterial symbionts of insects with extremely reduced genomes, Tenericutes and Guillardia theta. 
These organisms spend most of their domains on information functions, including translation and transcription, rather than on metabolism and harbor a domain repertoire characteristic of parasitic organisms. In contrast, the functional repertoire of the proteomes of the Planctomycetes-Verrucomicrobia-Chlamydiae superphylum was no different than the rest of bacteria, failing to support claims of them representing a separate superkingdom. In turn, Protista and Bacteria shared similar functional distribution patterns suggesting an ancestral evolutionary link between these groups. PMID:24710297

  2. Four distinct types of E.C. 1.2.1.30 enzymes can catalyze the reduction of carboxylic acids to aldehydes.

    PubMed

    Stolterfoht, Holly; Schwendenwein, Daniel; Sensen, Christoph W; Rudroff, Florian; Winkler, Margit

    2017-09-10

    Increasing demand for chemicals from renewable resources calls for the development of new biotechnological methods for the reduction of oxidized bio-based compounds. Enzymatic carboxylate reduction is highly selective, both in terms of chemo- and product selectivity, but not many carboxylate reductase enzymes (CARs) have been identified on the sequence level to date. Thus far, their phylogeny is unexplored and very little is known about their structure-function-relationship. CARs minimally contain an adenylation domain, a phosphopantetheinylation domain and a reductase domain. We have recently identified new enzymes of fungal origin, using similarity searches against genomic sequences from organisms in which aldehydes were detected upon incubation with carboxylic acids. Analysis of sequences with known CAR functionality and CAR enzymes recently identified in our laboratory suggests that the three-domain architecture mentioned above is modular. The construction of a distance tree with a subsequent 1000-replicate bootstrap analysis showed that the CAR sequences included in our study fall into four distinct subgroups (one of bacterial origin and three of fungal origin, respectively), each with a bootstrap value of 100%. The multiple sequence alignment of all experimentally confirmed CAR protein sequences revealed fingerprint sequences of residues which are likely to be involved in substrate and co-substrate binding and one of the three catalytic substeps, respectively. The fingerprint sequences broaden our understanding of the amino acids that might be essential for the reduction of organic acids to the corresponding aldehydes in CAR proteins. Copyright © 2017 Elsevier B.V. All rights reserved.

  3. Identifying and reducing error in cluster-expansion approximations of protein energies.

    PubMed

    Hahn, Seungsoo; Ashenberg, Orr; Grigoryan, Gevorg; Keating, Amy E

    2010-12-01

    Protein design involves searching a vast space for sequences that are compatible with a defined structure. This can pose significant computational challenges. Cluster expansion is a technique that can accelerate the evaluation of protein energies by generating a simple functional relationship between sequence and energy. The method consists of several steps. First, for a given protein structure, a training set of sequences with known energies is generated. Next, this training set is used to expand energy as a function of clusters consisting of single residues, residue pairs, and higher order terms, if required. The accuracy of the sequence-based expansion is monitored and improved using cross-validation testing and iterative inclusion of additional clusters. As a trade-off for evaluation speed, the cluster-expansion approximation causes prediction errors, which can be reduced by including more training sequences, including higher order terms in the expansion, and/or reducing the sequence space described by the cluster expansion. This article analyzes the sources of error and introduces a method whereby accuracy can be improved by judiciously reducing the described sequence space. The method is applied to describe the sequence-stability relationship for several protein structures: coiled-coil dimers and trimers, a PDZ domain, and T4 lysozyme as examples with computationally derived energies, and SH3 domains in amphiphysin-1 and endophilin-1 as examples where the expanded pseudo-energies are obtained from experiments. Our open-source software package Cluster Expansion Version 1.0 allows users to expand their own energy function of interest and thereby apply cluster expansion to custom problems in protein design. © 2010 Wiley Periodicals, Inc.
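
    The idea of expanding energy in single-residue, pair, and higher-order clusters can be sketched on a toy problem. The example below is not the Cluster Expansion Version 1.0 package: it assumes a two-letter alphabet encoded as -1/+1, a made-up energy function, and a fit over every possible sequence rather than a sampled training set. All names are hypothetical.

```python
from itertools import combinations, product
from math import prod


def toy_energy(spins):
    """Hypothetical 'true' energy with one-, two- and three-body terms."""
    s0, s1, s2, s3 = spins
    return 1.0 - 0.5 * s0 + 0.8 * s1 * s2 + 0.3 * s0 * s2 * s3


def fit_cluster_expansion(energy, n_sites, max_order):
    """Fit one coefficient per cluster of at most max_order sites.

    With -1/+1 site variables, products over clusters are orthonormal under
    a uniform average over all sequences, so each coefficient is simply the
    average of energy times the cluster product.
    """
    sequences = list(product((-1, 1), repeat=n_sites))
    coeffs = {}
    for order in range(max_order + 1):
        for cluster in combinations(range(n_sites), order):
            coeffs[cluster] = sum(
                energy(s) * prod(s[i] for i in cluster) for s in sequences
            ) / len(sequences)
    return coeffs


def predict(coeffs, spins):
    """Evaluate the truncated expansion for one sequence."""
    return sum(j * prod(spins[i] for i in cluster) for cluster, j in coeffs.items())


def max_error(coeffs, energy, n_sites):
    """Largest absolute prediction error over all sequences."""
    return max(
        abs(predict(coeffs, s) - energy(s))
        for s in product((-1, 1), repeat=n_sites)
    )


if __name__ == "__main__":
    for order in (1, 2, 3):
        fitted = fit_cluster_expansion(toy_energy, 4, order)
        print(order, round(max_error(fitted, toy_energy, 4), 6))
```

    In this toy setting the error of the pair-level expansion equals the size of the omitted three-body term, which mirrors the trade-off described above: truncation buys evaluation speed at the cost of prediction error, and adding higher-order clusters removes that error.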

  4. Sequence Alignment to Predict Across Species Susceptibility ...

    EPA Pesticide Factsheets

    Conservation of a molecular target across species can be used as a line-of-evidence to predict the likelihood of chemical susceptibility. The web-based Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool was developed to simplify, streamline, and quantitatively assess protein sequence/structural similarity across taxonomic groups as a means to predict relative intrinsic susceptibility. The intent of the tool is to allow for evaluation of any potential protein target, so it is amenable to variable degrees of protein characterization, depending on available information about the chemical/protein interaction and the molecular target itself. To allow for flexibility in the analysis, a layered strategy was adopted for the tool. The first level of the SeqAPASS analysis compares primary amino acid sequences to a query sequence, calculating a metric for sequence similarity (including detection of candidate orthologs), the second level evaluates sequence similarity within selected domains (e.g., ligand-binding domain, DNA binding domain), and the third level of analysis compares individual amino acid residue positions identified as being of importance for protein conformation and/or ligand binding upon chemical perturbation. Each level of the SeqAPASS analysis provides increasing evidence to apply toward rapid, screening-level assessments of probable cross species susceptibility. Such analyses can support prioritization of chemicals for further ev

  5. Efficient HIV-1 inhibition by a 16 nt-long RNA aptamer designed by combining in vitro selection and in silico optimisation strategies

    PubMed Central

    Sánchez-Luque, Francisco J.; Stich, Michael; Manrubia, Susanna; Briones, Carlos; Berzal-Herranz, Alfredo

    2014-01-01

    The human immunodeficiency virus type-1 (HIV-1) genome contains multiple, highly conserved structural RNA domains that play key roles in essential viral processes. Interference with the function of these RNA domains either by disrupting their structures or by blocking their interaction with viral or cellular factors may seriously compromise HIV-1 viability. RNA aptamers are amongst the most promising synthetic molecules able to interact with structural domains of viral genomes. However, aptamer shortening up to their minimal active domain is usually necessary for scaling up production, which requires very time-consuming, trial-and-error approaches. Here we report on the in vitro selection of 64 nt-long specific aptamers against the complete 5′-untranslated region of HIV-1 genome, which inhibit more than 75% of HIV-1 production in a human cell line. The analysis of the selected sequences and structures allowed for the identification of a highly conserved 16 nt-long stem-loop motif containing a common 8 nt-long apical loop. Based on this result, an in silico designed 16 nt-long RNA aptamer, termed RNApt16, was synthesized, with sequence 5′-CCCCGGCAAGGAGGGG-3′. The HIV-1 inhibition efficiency of such an aptamer was close to 85%, thus constituting the shortest RNA molecule so far described that efficiently interferes with HIV-1 replication. PMID:25175101
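
    The reported RNApt16 sequence can be checked against the stated motif with a few lines of code. The abstract gives only the sequence, its 16 nt length and the 8 nt apical loop; that the four nucleotides on each side of the loop pair with one another is an inference from those numbers, and the restriction to Watson-Crick pairs is an assumption of this sketch. The function name is hypothetical.

```python
# Watson-Crick pairs only; G-U wobble pairs are deliberately left out.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def hairpin_stem(seq, loop_len):
    """Pair the two flanks of a central loop from the outside inwards.

    Returns a list of (5' position, 3' position, base, base) using 1-based
    positions, or None if the flanks are unequal or any position fails to pair.
    """
    stem_len, remainder = divmod(len(seq) - loop_len, 2)
    if remainder or stem_len < 1:
        return None
    stem = []
    for k in range(stem_len):
        i, j = k, len(seq) - 1 - k
        if (seq[i], seq[j]) not in PAIRS:
            return None
        stem.append((i + 1, j + 1, seq[i], seq[j]))
    return stem


RNAPT16 = "CCCCGGCAAGGAGGGG"  # sequence as given in the abstract, 5' to 3'

if __name__ == "__main__":
    print("length:", len(RNAPT16))
    print("loop:", RNAPT16[4:12])
    print("stem:", hairpin_stem(RNAPT16, loop_len=8))
```

    Under these assumptions the sequence is consistent with a four-base-pair C-G stem closing an eight-nucleotide loop.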

  6. The Popeye Domain Containing Genes and Their Function as cAMP Effector Proteins in Striated Muscle.

    PubMed

    Brand, Thomas

    2018-03-13

    The Popeye domain containing (POPDC) genes encode transmembrane proteins, which are abundantly expressed in striated muscle cells. Hallmarks of the POPDC proteins are the presence of three transmembrane domains and the Popeye domain, which makes up a large part of the cytoplasmic portion of the protein and functions as a cAMP-binding domain. Interestingly, despite the prediction of structural similarity between the Popeye domain and other cAMP binding domains, at the protein sequence level they strongly differ from each other suggesting an independent evolutionary origin of POPDC proteins. Loss-of-function experiments in zebrafish and mouse established an important role of POPDC proteins for cardiac conduction and heart rate adaptation after stress. Loss-of-function mutations in patients have been associated with limb-girdle muscular dystrophy and AV-block. These data suggest an important role of these proteins in the maintenance of structure and function of striated muscle cells.

  7. Structure of the N-terminal domain of the protein Expansion: an ‘Expansion’ to the Smad MH2 fold

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Beich-Frandsen, Mads; Aragón, Eric; Llimargas, Marta

    2015-04-01

    Expansion is a modular protein that is conserved in protostomes. The first structure of the N-terminal domain of Expansion has been determined at 1.6 Å resolution and the new Nα-MH2 domain was found to belong to the Smad/FHA superfamily of structures. Gene-expression changes observed in Drosophila embryos after inducing the transcription factor Tramtrack led to the identification of the protein Expansion. Expansion contains an N-terminal domain similar in sequence to the MH2 domain characteristic of Smad proteins, which are the central mediators of the effects of the TGF-β signalling pathway. Apart from Smads and Expansion, no other type of protein belonging to the known kingdoms of life contains MH2 domains. To compare the Expansion and Smad MH2 domains, the crystal structure of the Expansion domain was determined at 1.6 Å resolution, the first structure of a non-Smad MH2 domain to be characterized to date. The structure displays the main features of the canonical MH2 fold with two main differences: the addition of an α-helical region and the remodelling of a protein-interaction site that is conserved in the MH2 domain of Smads. Owing to these differences, the new domain was referred to as Nα-MH2. Despite the presence of the Nα-MH2 domain, Expansion does not participate in TGF-β signalling; instead, it is required for other activities specific to the protostome phyla. Based on the structural similarities to the MH2 fold, it is proposed that the Nα-MH2 domain should be classified as a new member of the Smad/FHA superfamily.

  8. Structural and Histone Binding Ability Characterizations of Human PWWP Domains

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Hong; Zeng, Hong; Lam, Robert

    2013-09-25

    The PWWP domain was first identified as a structural motif of 100-130 amino acids in the WHSC1 protein and predicted to be a protein-protein interaction domain. It belongs to the Tudor domain 'Royal Family', which consists of Tudor, chromodomain, MBT and PWWP domains. While Tudor, chromodomain and MBT domains have long been known to bind methylated histones, PWWP was shown to exhibit histone binding ability only recently. The PWWP domain has been shown to be a DNA binding domain, but sequence analysis and previous structural studies show that the PWWP domain exhibits significant similarity to other 'Royal Family' members, implying that the PWWP domain has the potential to bind histones. In order to further explore the function of the PWWP domain, we used the protein family approach to determine the crystal structures of the PWWP domains from seven different human proteins. Our fluorescence polarization binding studies show that PWWP domains have weak histone binding ability, which is also confirmed by our NMR titration experiments. Furthermore, we determined the crystal structures of the BRPF1 PWWP domain in complex with H3K36me3, and HDGF2 PWWP domain in complex with H3K79me3 and H4K20me3. PWWP proteins constitute a new family of methyl lysine histone binders. The PWWP domain consists of three motifs: a canonical β-barrel core, an insertion motif between the second and third β-strands and a C-terminal α-helix bundle. Both the canonical β-barrel core and the insertion motif are directly involved in histone binding. The PWWP domain has been previously shown to be a DNA binding domain. Therefore, the PWWP domain exhibits dual functions: binding both DNA and methyllysine histones.

  9. Domain organizations of modular extracellular matrix proteins and their evolution.

    PubMed

    Engel, J

    1996-11-01

    Multidomain proteins which are composed of modular units are a rather recent invention of evolution. Domains are defined as autonomously folding regions of a protein, and many of them are similar in sequence and structure, indicating common ancestry. Their modular nature is emphasized by frequent repetitions in identical or in different proteins and by a large number of different combinations with other domains. The extracellular matrix is perhaps the largest biological system composed of modular mosaic proteins, and its astonishing complexity and diversity are based on them. A cluster of minireviews on modular proteins is being published in Matrix Biology. These deal with the evolution of modular proteins, the three-dimensional structure of domains and the ways in which these interact in a multidomain protein. They discuss structure-function relationships in calcium binding domains, collagen helices, alpha-helical coiled-coil domains and C-lectins. The present minireview is focused on some general aspects and serves as an introduction to the cluster.

  10. A multi-objective optimization approach accurately resolves protein domain architectures

    PubMed Central

    Bernardes, J.S.; Vieira, F.R.J.; Zaverucha, G.; Carbone, A.

    2016-01-01

    Motivation: Given a protein sequence and a number of potential domains matching it, what are the domain content and the most likely domain architecture for the sequence? This problem is of fundamental importance in protein annotation, constituting one of the main steps of all predictive annotation strategies. On the other hand, when potential domains are several and in conflict because of overlapping domain boundaries, finding a solution for the problem might become difficult. An accurate prediction of the domain architecture of a multi-domain protein provides important information for function prediction, comparative genomics and molecular evolution. Results: We developed DAMA (Domain Annotation by a Multi-objective Approach), a novel approach that identifies architectures through a multi-objective optimization algorithm combining scores of domain matches, previously observed multi-domain co-occurrence and domain overlapping. DAMA has been validated on a known benchmark dataset based on CATH structural domain assignments and on the set of Plasmodium falciparum proteins. When compared with existing tools on both datasets, it outperforms all of them. Availability and implementation: DAMA software is implemented in C++ and the source code can be found at http://www.lcqb.upmc.fr/DAMA. Contact: juliana.silva_bernardes@upmc.fr or alessandra.carbone@lip6.fr Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26458889
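
    The core difficulty described above, choosing among candidate domain hits whose boundaries overlap, can be illustrated with a much simpler single-objective stand-in. The sketch below is not the DAMA algorithm: it ignores domain co-occurrence, forbids any overlap, and simply picks the non-overlapping set of hits with the highest total score by dynamic programming (weighted interval scheduling). The hit names, coordinates and scores are invented.

```python
from bisect import bisect_right


def best_architecture(hits):
    """Pick non-overlapping hits with the highest total score.

    hits: iterable of (start, end, name, score) with inclusive integer
    residue coordinates. Returns (total_score, names in sequence order).
    """
    hits = sorted(hits, key=lambda h: h[1])  # order by end position
    ends = [h[1] for h in hits]
    # best[k] holds the optimum using only the first k hits.
    best = [(0.0, [])]
    for k, (start, end, name, score) in enumerate(hits):
        # Count earlier hits that end strictly before this hit starts.
        p = bisect_right(ends, start - 1, 0, k)
        take = (best[p][0] + score, best[p][1] + [name])
        skip = best[k]
        best.append(take if take[0] > skip[0] else skip)
    return best[-1]


if __name__ == "__main__":
    # Hypothetical candidate domain matches on one protein.
    candidates = [
        (1, 100, "domA", 50.0),
        (80, 200, "domB", 60.0),
        (110, 220, "domC", 45.0),
        (230, 300, "domD", 20.0),
    ]
    print(best_architecture(candidates))
```

    Here the single highest-scoring hit (domB) is dropped because it conflicts with two hits that together score more, which is the kind of trade-off an architecture-level method has to resolve.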

  11. Investigation of interaction between Pax-5 isoforms and thioredoxin using de novo modelling methods.

    PubMed

    Cuperlovic-Culf, Miroslava; Robichaud, Gilles A; Nardini, Michel; Ouellette, Rodney J

    2003-01-01

    Pax-5 transcription factor plays a crucial role in B-cell development, activation and differentiation. In murine B-cells four different isoforms of Pax-5 have been identified, and their role in the regulation of the activity of the wild-type protein was revealed although still not fully understood. Using theoretical methods, we investigated the properties of one region of the Pax-5e and Pax-5d isoforms (named UDE domain) and we present a possible theoretical model for the interaction of this domain with thioredoxin that has been previously postulated based on the experimental results. Domain UDE (MW 4.8 kDa) is characterised by an extremely high ratio of positively charged residues (8) in comparison to negatively charged amino acids (3), as well as unusually large concentrations of prolines (11.6%) and cysteines (4.7%). This is indicative of its role in protein-protein interaction. The experimental 3D structure for either UDE domain or for any analogous sequence is not yet available, and therefore we resorted to various bioinformatics methods in order to predict the secondary and 3D structure from the primary sequence of UDE. Physicochemical properties of the predicted UDE structure gave more indication about possibilities for UDE-thioredoxin binding. In addition, UDE domain was shown to have both sequence and structure analogous to a segment of NAD-reducing hydrogenase HOXS a subunit which is believed to interact with thioredoxin. These studies showed that the UDE domain in Pax-5d and Pax-5e represents an ideal binding site for thioredoxin and we developed a model of UDE-TRX complex with two disulphide bridges. The active site of thioredoxin remained exposed after binding to UDE in this model and therefore binding of thioredoxin to Pax-5d could explain the unexpectedly high resistance of this isoform to oxidation. 
The complex between thioredoxin and Pax-5e can be a method for transportation of thioredoxin into the nucleus and also into the vicinity of Pax-5a, explaining the observed activator role of Pax-5e.

  12. The structure of the catalytic domain of a plant cellulose synthase and its assembly into dimers

    DOE PAGES

    Olek, Anna T.; Rayon, Catherine; Makowski, Lee; ...

    2014-07-10

    Cellulose microfibrils are para-crystalline arrays of several dozen linear (1→4)-β-d-glucan chains synthesized at the surface of the cell membrane by large, multimeric complexes of synthase proteins. Recombinant catalytic domains of rice (Oryza sativa) CesA8 cellulose synthase form dimers reversibly as the fundamental scaffold units of architecture in the synthase complex. Specificity of binding to UDP and UDP-Glc indicates a properly folded protein, and binding kinetics indicate that each monomer independently synthesizes single glucan chains of cellulose, i.e., two chains per dimer pair. In contrast to structure modeling predictions, solution x-ray scattering studies demonstrate that the monomer is a two-domain, elongated structure, with the smaller domain coupling two monomers into a dimer. The catalytic core of the monomer is accommodated only near its center, with the plant-specific sequences occupying the small domain and an extension distal to the catalytic domain. This configuration is in stark contrast to the domain organization obtained in predicted structures of plant CesA. As a result, the arrangement of the catalytic domain within the CesA monomer and dimer provides a foundation for constructing structural models of the synthase complex and defining the relationship between the rosette structure and the cellulose microfibrils they synthesize.

  13. The structure of the catalytic domain of a plant cellulose synthase and its assembly into dimers.

    PubMed

    Olek, Anna T; Rayon, Catherine; Makowski, Lee; Kim, Hyung Rae; Ciesielski, Peter; Badger, John; Paul, Lake N; Ghosh, Subhangi; Kihara, Daisuke; Crowley, Michael; Himmel, Michael E; Bolin, Jeffrey T; Carpita, Nicholas C

    2014-07-01

    Cellulose microfibrils are para-crystalline arrays of several dozen linear (1→4)-β-d-glucan chains synthesized at the surface of the cell membrane by large, multimeric complexes of synthase proteins. Recombinant catalytic domains of rice (Oryza sativa) CesA8 cellulose synthase form dimers reversibly as the fundamental scaffold units of architecture in the synthase complex. Specificity of binding to UDP and UDP-Glc indicates a properly folded protein, and binding kinetics indicate that each monomer independently synthesizes single glucan chains of cellulose, i.e., two chains per dimer pair. In contrast to structure modeling predictions, solution x-ray scattering studies demonstrate that the monomer is a two-domain, elongated structure, with the smaller domain coupling two monomers into a dimer. The catalytic core of the monomer is accommodated only near its center, with the plant-specific sequences occupying the small domain and an extension distal to the catalytic domain. This configuration is in stark contrast to the domain organization obtained in predicted structures of plant CesA. The arrangement of the catalytic domain within the CesA monomer and dimer provides a foundation for constructing structural models of the synthase complex and defining the relationship between the rosette structure and the cellulose microfibrils they synthesize. © 2014 American Society of Plant Biologists. All rights reserved.

  14. A β-solenoid model of the Pmel17 repeat domain: insights to the formation of functional amyloid fibrils

    NASA Astrophysics Data System (ADS)

    Louros, Nikolaos N.; Baltoumas, Fotis A.; Hamodrakas, Stavros J.; Iconomidou, Vassiliki A.

    2016-02-01

    Pmel17 is a multidomain protein involved in biosynthesis of melanin. This process is facilitated by the formation of Pmel17 amyloid fibrils that serve as a scaffold, important for pigment deposition in melanosomes. A specific luminal domain of human Pmel17, containing 10 tandem imperfect repeats, designated as repeat domain (RPT), forms amyloid fibrils in a pH-controlled mechanism in vitro and has been proposed to be essential for the formation of the fibrillar matrix. Currently, no three-dimensional structure has been resolved for the RPT domain of Pmel17. Here, we examine the structure of the RPT domain by performing sequence threading. The resulting model was subjected to energy minimization and validated through extensive molecular dynamics simulations. Structural analysis indicated that the RPT model exhibits several distinct properties of β-solenoid structures, which have been proposed to be polymerizing components of amyloid fibrils. The derived model is stabilized by an extensive network of hydrogen bonds generated by stacking of highly conserved polar residues of the RPT domain. Furthermore, the key role of invariant glutamate residues is proposed, supporting a pH-dependent mechanism for RPT domain assembly. Conclusively, our work attempts to provide structural insights into the RPT domain structure and to elucidate its contribution to Pmel17 amyloid fibril formation.

  15. SH2 domains: modulators of nonreceptor tyrosine kinase activity.

    PubMed

    Filippakopoulos, Panagis; Müller, Susanne; Knapp, Stefan

    2009-12-01

    The Src homology 2 (SH2) domain is a sequence-specific phosphotyrosine-binding module present in many signaling molecules. In cytoplasmic tyrosine kinases, the SH2 domain is located N-terminally to the catalytic kinase domain (SH1) where it mediates cellular localization, substrate recruitment, and regulation of kinase activity. Initially, structural studies established a role of the SH2 domain stabilizing the inactive state of Src family members. However, biochemical characterization showed that the presence of the SH2 domain is frequently required for catalytic activity, suggesting a crucial function stabilizing the active state of many nonreceptor tyrosine kinases. Recently, the structure of the SH2-kinase domain of Fes revealed that the SH2 domain stabilizes the active kinase conformation by direct interactions with the regulatory helix alphaC. Stabilizing interactions between the SH2 and the kinase domains have also been observed in the structures of active Csk and Abl. Interestingly, mutations in the SH2 domain found in human disease can be explained by SH2 domain destabilization or incorrect positioning of the SH2. Here we summarize our understanding of mechanisms that lead to tyrosine kinase activation by direct interactions mediated by the SH2 domain and discuss how mutations in the SH2 domain trigger kinase inactivation.

  16. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Yanfeng; Zheng, Yi; Qin, Ling

    Beta-hydroxyacid dehydrogenase (β-HAD) genes have been identified in all sequenced genomes of eukaryotes and prokaryotes. Their gene products catalyze the NAD+- or NADP+-dependent oxidation of various β-hydroxy acid substrates into their corresponding semialdehyde. In many fungal and bacterial genomes, multiple β-HAD genes are observed leading to the hypothesis that these gene products may have unique, uncharacterized metabolic roles specific to their species. The genomes of Geobacter sulfurreducens and Geobacter metallireducens each contain two potential β-HAD genes. The protein sequences of one pair of these genes, Gs-βHAD (Q74DE4) and Gm-βHAD (Q39R98), have 65% sequence identity and 77% sequence similarity with each other. Both proteins reduce succinic semialdehyde, a metabolite of the GABA shunt. To further explore the structural and functional characteristics of these two β-HADs with a potentially unique substrate specificity, crystal structures for Gs-βHAD and Gm-βHAD in complex with NADP+ were determined to a resolution of 1.89 Å and 2.07 Å, respectively. The structures of both proteins are similar, composed of 14 α-helices and nine β-strands organized into two domains. Domain One (1-165) adopts a typical Rossmann fold composed of two α/β units: a six-strand parallel β-sheet surrounded by six α-helices (α1 – α6) followed by a mixed three-strand β-sheet surrounded by two α-helices (α7 and α8). Domain Two (166-287) is composed of a bundle of seven α-helices (α9 – α14). Four functional regions conserved in all β-HADs are spatially located near each other at the interdomain cleft in both Gs-βHAD and Gm-βHAD with a buried molecule of NADP+. The structural features of Gs-βHAD and Gm-βHAD are described in relation to the four conserved consensus sequences characteristic of β-HADs and the potential biochemical importance of these enzymes as an alternative pathway for the degradation of succinic semialdehyde.

  17. C-terminal retroviral-type zinc finger domain from the HIV-1 nucleocapsid protein is structurally similar to the N-terminal zinc finger domain

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    South, T.L.; Blake, P.R.; Hare, D.R.

    Two-dimensional NMR spectroscopic and computational methods were employed for the structure determination of an 18-residue peptide with the amino acid sequence of the C-terminal retroviral-type (r.t.) zinc finger domain from the nucleocapsid protein (NCP) of HIV-1 (Zn(HIV1-F2)). Unlike results obtained for the first retroviral-type zinc finger peptide, Zn(HIV1-F1), broad signals indicative of conformational lability were observed in the ¹H NMR spectrum of Zn(HIV1-F2) at 25 °C. The NMR signals narrowed upon cooling to −2 °C, enabling complete ¹H NMR signal assignment via standard two-dimensional (2D) NMR methods. Distance restraints obtained from qualitative analysis of 2D nuclear Overhauser effect (NOESY) data were used to generate 30 distance geometry (DG) structures with penalties in the range 0.02-0.03 Å². All structures were qualitatively consistent with the experimental NOESY spectrum based on comparisons with 2D NOESY back-calculated spectra. These results indicate that the r.t. zinc finger sequences observed in retroviral NCPs, simple plant virus coat proteins, and in a human single-stranded nucleic acid binding protein share a common structural motif.

  18. Crystal structure of P58(IPK) TPR fragment reveals the mechanism for its molecular chaperone activity in UPR

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tao, Jiahui; Petrova, Kseniya; Ron, David

    2010-05-25

    P58(IPK) might function as an endoplasmic reticulum molecular chaperone to maintain protein folding homeostasis during unfolded protein responses. P58(IPK) contains nine tetratricopeptide repeat (TPR) motifs and a C-terminal J-domain within its primary sequence. To investigate the mechanism by which P58(IPK) functions to promote protein folding within the endoplasmic reticulum, we have determined the crystal structure of P58(IPK) TPR fragment to 2.5 Å resolution by the SAD method. The crystal structure of P58(IPK) revealed three domains (I-III) with similar folds and each domain contains three TPR motifs. An ELISA assay indicated that P58(IPK) acts as a molecular chaperone by interacting with misfolded proteins such as luciferase and rhodanese. The P58(IPK) structure reveals a conserved hydrophobic patch located in domain I that might be involved in binding the misfolded polypeptides. Structure-based mutagenesis for the conserved hydrophobic residues located in domain I significantly reduced the molecular chaperone activity of P58(IPK).

  19. Cloning and analysis of DnaJ family members in the silkworm, Bombyx mori.

    PubMed

    Li, Yinü; Bu, Cuiyu; Li, Tiantian; Wang, Shibao; Jiang, Feng; Yi, Yongzhu; Yang, Huipeng; Zhang, Zhifang

    2016-01-15

    Heat shock proteins (Hsps), which are involved in a variety of critical biological functions, including protein folding, degradation, and translocation and macromolecule assembly, act as molecular chaperones during periods of stress by binding to other proteins. Using expressed sequence tag (EST) and silkworm (Bombyx mori) transcriptome databases, we identified 27 cDNA sequences encoding the conserved J domain, which is found in DnaJ-type Hsps. Of the 27 J domain-containing sequences, 25 were complete cDNA sequences. We divided them into three types according to the number and presence of conserved domains. By analyzing the gene structures, intron numbers, and conserved domains and constructing a phylogenetic tree, we found that the DnaJ family had undergone convergent evolution, obtaining new domains to expand the diversity of its family members. The acquisition of the new DnaJ domains most likely occurred prior to the evolutionary divergence of prokaryotes and eukaryotes. The expression of DnaJ genes in the silkworm was generally higher in the fat body. The tissue distribution of DnaJ1 proteins was detected by western blotting, demonstrating that in the fifth-instar larvae, the DnaJ1 proteins were expressed at their highest levels in hemocytes, followed by the fat body and head. We also found that the DnaJ1 transcripts were likely differentially translated in different tissues. Using immunofluorescence cytochemistry, we revealed that in the blood cells, DnaJ1 was mainly localized in the cytoplasm. Copyright © 2015 Elsevier B.V. All rights reserved.

  20. C-terminal activating and inhibitory domains determine the transactivation potential of BSAP (Pax-5), Pax-2 and Pax-8.

    PubMed Central

    Dörfler, P; Busslinger, M

    1996-01-01

    Pax-5 encodes the transcription factor BSAP which plays an essential role in early B cell development and midbrain patterning. In this study we have analysed the structural requirements for transcriptional activation by BSAP. In vitro mutagenesis and transient transfection experiments indicate that the C-terminal serine/threonine/proline-rich region of BSAP contains a potent transactivation domain of 55 amino acids which is active from promoter and enhancer positions. This transactivation domain was found to be inactivated by a naturally occurring frameshift mutation in one PAX-5 allele of the acute lymphoblastic leukemia cell line REH. The function of the transactivation domain is negatively regulated by adjacent sequences from the extreme C-terminus. The activating and inhibitory domains function together as an independent regulatory module in different cell types as shown by fusion to the GAL4 DNA binding domain. The same arrangement of positively and negatively acting sequences has been conserved in the mammalian Pax-2 and Pax-8, the zebrafish Pax-b as well as the sea urchin Pax-258 proteins. These data demonstrate that the transcriptional competence of a subfamily of Pax proteins is determined by a C-terminal regulatory module composed of activating and inhibitory sequences. PMID:8617244

  1. Transformation of BCC and B2 High Temperature Phases to HCP and Orthorhombic Structures in the Ti-Al-Nb System. Part I: Microstructural Predictions Based on a Subgroup Relation Between Phases

    PubMed Central

    Bendersky, L. A.; Roytburd, A.; Boettinger, W. J.

    1993-01-01

    Possible paths for the constant composition coherent transformation of BCC or B2 high temperature phases to low temperature HCP or Orthorhombic phases in the Ti-Al-Nb system are analyzed using a sequence of crystallographic structural relationships developed from subgroup symmetry relations. Symmetry elements lost in each step of the sequence determine the possibilities for variants of the low symmetry phase and domains that can be present in the microstructure. The orientation of interdomain interfaces is determined by requiring the existence of a strain-free interface between the domains. Polydomain structures are also determined that minimize elastic energy. Microstructural predictions are made for comparison to experimental results given by Bendersky and Boettinger [J. Res. Natl. Inst. Stand. Technol. 98, 585 (1993)]. PMID:28053487

  2. Structure of an Antibody in Complex with Its Mucin Domain Linear Epitope That Is Protective against Ebola Virus

    PubMed Central

    Olal, Daniel; Kuehne, Ana I.; Bale, Shridhar; Halfmann, Peter; Hashiguchi, Takao; Fusco, Marnie L.; Lee, Jeffrey E.; King, Liam B.; Kawaoka, Yoshihiro; Dye, John M.

    2012-01-01

    Antibody 14G7 is protective against lethal Ebola virus challenge and recognizes a distinct linear epitope in the prominent mucin-like domain of the Ebola virus glycoprotein GP. The structure of 14G7 in complex with its linear peptide epitope has now been determined to 2.8 Å. The structure shows that this GP sequence forms a tandem β-hairpin structure that binds deeply into a cleft in the antibody-combining site. A key threonine at the apex of one turn is critical for antibody interaction and is conserved among all Ebola viruses. This work provides further insight into the mechanism of protection by antibodies that target the protruding, highly accessible mucin-like domain of Ebola virus and the structural framework for understanding and characterizing candidate immunotherapeutics. PMID:22171276

  3. Structure of an antibody in complex with its mucin domain linear epitope that is protective against Ebola virus.

    PubMed

    Olal, Daniel; Kuehne, Ana I; Bale, Shridhar; Halfmann, Peter; Hashiguchi, Takao; Fusco, Marnie L; Lee, Jeffrey E; King, Liam B; Kawaoka, Yoshihiro; Dye, John M; Saphire, Erica Ollmann

    2012-03-01

    Antibody 14G7 is protective against lethal Ebola virus challenge and recognizes a distinct linear epitope in the prominent mucin-like domain of the Ebola virus glycoprotein GP. The structure of 14G7 in complex with its linear peptide epitope has now been determined to 2.8 Å. The structure shows that this GP sequence forms a tandem β-hairpin structure that binds deeply into a cleft in the antibody-combining site. A key threonine at the apex of one turn is critical for antibody interaction and is conserved among all Ebola viruses. This work provides further insight into the mechanism of protection by antibodies that target the protruding, highly accessible mucin-like domain of Ebola virus and the structural framework for understanding and characterizing candidate immunotherapeutics.

  4. Structural and evolutionary relationships of "AT-less" type I polyketide synthase ketosynthases.

    PubMed

    Lohman, Jeremy R; Ma, Ming; Osipiuk, Jerzy; Nocek, Boguslaw; Kim, Youngchang; Chang, Changsoo; Cuff, Marianne; Mack, Jamey; Bigelow, Lance; Li, Hui; Endres, Michael; Babnigg, Gyorgy; Joachimiak, Andrzej; Phillips, George N; Shen, Ben

    2015-10-13

    Acyltransferase (AT)-less type I polyketide synthases (PKSs) break the type I PKS paradigm. They lack the integrated AT domains within their modules and instead use a discrete AT that acts in trans, whereas a type I PKS module minimally contains AT, acyl carrier protein (ACP), and ketosynthase (KS) domains. Structures of canonical type I PKS KS-AT didomains reveal structured linkers that connect the two domains. AT-less type I PKS KSs have remnants of these linkers, which have been hypothesized to be AT docking domains. Natural products produced by AT-less type I PKSs are very complex because of an increased representation of unique modifying domains. AT-less type I PKS KSs possess substrate specificity and fall into phylogenetic clades that correlate with their substrates, whereas canonical type I PKS KSs are monophyletic. We have solved crystal structures of seven AT-less type I PKS KS domains that represent various sequence clusters, revealing insight into the large structural and subtle amino acid residue differences that lead to unique active site topologies and substrate specificities. One set of structures represents a larger group of KS domains from both canonical and AT-less type I PKSs that accept amino acid-containing substrates. One structure has a partial AT-domain, revealing the structural consequences of a type I PKS KS evolving into an AT-less type I PKS KS. These structures highlight the structural diversity within the AT-less type I PKS KS family, and most important, provide a unique opportunity to study the molecular evolution of substrate specificity within the type I PKSs.

  5. Structural and evolutionary relationships of “AT-less” type I polyketide synthase ketosynthases

    PubMed Central

    Lohman, Jeremy R.; Ma, Ming; Osipiuk, Jerzy; Nocek, Boguslaw; Kim, Youngchang; Chang, Changsoo; Cuff, Marianne; Mack, Jamey; Bigelow, Lance; Li, Hui; Endres, Michael; Babnigg, Gyorgy; Joachimiak, Andrzej; Phillips, George N.; Shen, Ben

    2015-01-01

    Acyltransferase (AT)-less type I polyketide synthases (PKSs) break the type I PKS paradigm. They lack the integrated AT domains within their modules and instead use a discrete AT that acts in trans, whereas a type I PKS module minimally contains AT, acyl carrier protein (ACP), and ketosynthase (KS) domains. Structures of canonical type I PKS KS-AT didomains reveal structured linkers that connect the two domains. AT-less type I PKS KSs have remnants of these linkers, which have been hypothesized to be AT docking domains. Natural products produced by AT-less type I PKSs are very complex because of an increased representation of unique modifying domains. AT-less type I PKS KSs possess substrate specificity and fall into phylogenetic clades that correlate with their substrates, whereas canonical type I PKS KSs are monophyletic. We have solved crystal structures of seven AT-less type I PKS KS domains that represent various sequence clusters, revealing insight into the large structural and subtle amino acid residue differences that lead to unique active site topologies and substrate specificities. One set of structures represents a larger group of KS domains from both canonical and AT-less type I PKSs that accept amino acid-containing substrates. One structure has a partial AT-domain, revealing the structural consequences of a type I PKS KS evolving into an AT-less type I PKS KS. These structures highlight the structural diversity within the AT-less type I PKS KS family, and most important, provide a unique opportunity to study the molecular evolution of substrate specificity within the type I PKSs. PMID:26420866

  6. Phylogenetic and specificity studies of two-domain GNA-related lectins: generation of multispecificity through domain duplication and divergent evolution

    PubMed Central

    Van Damme, Els J. M.; Nakamura-Tsuruta, Sachiko; Smith, David F.; Ongenaert, Maté; Winter, Harry C.; Rougé, Pierre; Goldstein, Irwin J.; Mo, Hanqing; Kominami, Junko; Culerrier, Raphaël; Barre, Annick; Hirabayashi, Jun; Peumans, Willy J.

    2007-01-01

    A re-investigation of the occurrence and taxonomic distribution of proteins built up of protomers consisting of two tandem arrayed domains equivalent to the GNA [Galanthus nivalis (snowdrop) agglutinin] revealed that these are widespread among monocotyledonous plants. Phylogenetic analysis of the available sequences indicated that these proteins do not represent a monophylogenetic group but most probably result from multiple independent domain duplication/in tandem insertion events. To corroborate the relationship between inter-domain sequence divergence and the widening of specificity range, a detailed comparative analysis was made of the sequences and specificity of a set of two-domain GNA-related lectins. Glycan microarray analyses, frontal affinity chromatography and surface plasmon resonance measurements demonstrated that the two-domain GNA-related lectins acquired a marked diversity in carbohydrate-binding specificity that strikingly contrasts the canonical exclusive specificity of their single domain counterparts towards mannose. Moreover, it appears that most two-domain GNA-related lectins interact with both high mannose and complex N-glycans and that this dual specificity relies on the simultaneous presence of at least two different independently acting binding sites. The combined phylogenetic, specificity and structural data strongly suggest that plants used domain duplication followed by divergent evolution as a mechanism to generate multispecific lectins from a single mannose-binding domain. Taking into account that the shift in specificity of some binding sites from high mannose to complex type N-glycans implies that the two-domain GNA-related lectins are primarily directed against typical animal glycans, it is tempting to speculate that plants developed two-domain GNA-related lectins for defence purposes. PMID:17288538

  7. Adaptive frequency-domain equalization in digital coherent optical receivers.

    PubMed

    Faruk, Md Saifuddin; Kikuchi, Kazuro

    2011-06-20

    We propose a novel frequency-domain adaptive equalizer in digital coherent optical receivers, which can reduce computational complexity of the conventional time-domain adaptive equalizer based on finite-impulse-response (FIR) filters. The proposed equalizer can operate on the input sequence sampled by free-running analog-to-digital converters (ADCs) at the rate of two samples per symbol; therefore, the arbitrary initial sampling phase of ADCs can be adjusted so that the best symbol-spaced sequence is produced. The equalizer can also be configured in the butterfly structure, which enables demultiplexing of polarization tributaries apart from equalization of linear transmission impairments. The performance of the proposed equalization scheme is verified by 40-Gbits/s dual-polarization quadrature phase-shift keying (QPSK) transmission experiments.
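
    The complexity argument in this abstract rests on a standard fact: filtering a block of samples by multiplying spectra gives the same output as a time-domain FIR filter. The sketch below illustrates only that fixed-tap step with the overlap-save method; it is not the authors' adaptive algorithm, and the function names (`fft`, `ifft`, `fir_direct`, `fir_overlap_save`), the block size and the tap values are all illustrative assumptions.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(spectrum):
    """Inverse FFT with the 1/N normalisation."""
    n = len(spectrum)
    return [v / n for v in fft(spectrum, inverse=True)]


def fir_direct(x, h):
    """Time-domain FIR filter: y[n] = sum_k h[k] * x[n - k]."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


def fir_overlap_save(x, h, n_fft=16):
    """Same filter computed block by block in the frequency domain."""
    m = len(h)
    step = n_fft - m + 1  # new output samples produced per block
    if step < 1 or n_fft & (n_fft - 1):
        raise ValueError("n_fft must be a power of two and at least len(h)")
    taps = fft(list(h) + [0] * (n_fft - m))  # filter spectrum, computed once
    padded = [0] * (m - 1) + list(x)  # history for the first block
    out = []
    for start in range(0, len(x), step):
        block = padded[start:start + n_fft]
        block += [0] * (n_fft - len(block))
        y = ifft([a * b for a, b in zip(fft(block), taps)])
        out.extend(y[m - 1:])  # first m-1 samples are circular wrap-around
    return out[:len(x)]


# Hypothetical complex-valued samples and taps, chosen only for illustration.
signal = [complex(i % 5 - 2, (3 * i) % 7 - 3) for i in range(50)]
taps = [0.5, -0.25 + 0.1j, 0.125, 0.05j]
error = max(
    abs(a - b)
    for a, b in zip(fir_direct(signal, taps), fir_overlap_save(signal, taps))
)
print("largest difference between the two filters:", error)
```

    The direct form costs one multiplication per tap per output sample, whereas the block form costs a few FFTs per block, which is where the saving for long filters comes from.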

  8. Nodal domains of a non-separable problem—the right-angled isosceles triangle

    NASA Astrophysics Data System (ADS)

    Aronovitch, Amit; Band, Ram; Fajman, David; Gnutzmann, Sven

    2012-03-01

    We study the nodal set of eigenfunctions of the Laplace operator on the right-angled isosceles triangle. A local analysis of the nodal pattern provides an algorithm for computing the number νn of nodal domains for any eigenfunction. In addition, an exact recursive formula for the number of nodal domains is found to reproduce all existing data. Eventually, we use the recursion formula to analyse a large sequence of nodal counts statistically. Our analysis shows that the distribution of nodal counts for this triangular shape has a much richer structure than the known cases of regular separable shapes or completely irregular shapes. Furthermore, we demonstrate that the nodal count sequence contains information about the periodic orbits of the corresponding classical ray dynamics.

  9. Consolidation of glycosyl hydrolase family 30: a dual domain 4/7 hydrolase family consisting of two structurally distinct groups

    Treesearch

    Franz J. St John; Javier M. Gonzalez; Edwin Pozharski

    2010-01-01

    In this work glycosyl hydrolase (GH) family 30 (GH30) is analyzed and shown to consist of its currently classified member sequences as well as several homologous sequence groups currently assigned within family GH5. A large scale amino acid sequence alignment and a phylogenetic tree were generated and GH30 groups and subgroups were designated. A partial rearrangement...

  10. The Pekin duck programmed death-ligand 1: cDNA cloning, genomic structure, molecular characterization and mRNA expression analysis.

    PubMed

    Yao, Q; Fischer, K P; Tyrrell, D L; Gutfreund, K S

    2015-04-01

    Programmed death ligand-1 (PD-L1) plays an important role in the attenuation of adaptive immune responses in higher vertebrates. Here, we describe the identification of the Pekin duck PD-L1 orthologue (duPD-L1) and its gene structure. The duPD-L1 cDNA encodes a 311-amino acid protein that has an amino acid identity of 78% and 42% with chicken and human PD-L1, respectively. Mapping of the duPD-L1 cDNA with duck genomic sequences revealed an exonic structure of its coding sequence similar to those of other vertebrates but lacked a noncoding exon 1. Homology modelling of the duPD-L1 extracellular domain was compatible with the tandem IgV-like and IgC-like IgSF domain structure of human PD-L1 (PDB ID: 3BIS). Residues known to be important for receptor binding of human PD-L1 were mostly conserved in duPD-L1 within the N-terminus and the G sheet, and partially conserved within the F sheet but not within sheets C and C'. DuPD-L1 mRNA was constitutively expressed in all tissues examined with highest expression levels in lung and spleen and very low levels of expression in muscle, kidney and brain. Mitogen stimulation of duck peripheral blood mononuclear cells transiently increased duPD-L1 mRNA expression. Our observations demonstrate evolutionary conservation of the exonic structure of its coding sequence, the extracellular domain structure and residues implicated in receptor binding, but the role of the longer cytoplasmic tail in avian PD-L1 proteins remains to be determined. © 2014 John Wiley & Sons Ltd.

  11. BioSAVE: display of scored annotation within a sequence context.

    PubMed

    Pollock, Richard F; Adryan, Boris

    2008-03-20

    Visualization of sequence annotation is a common feature in many bioinformatics tools. For many applications it is desirable to restrict the display of such annotation according to a score cutoff, as biological interpretation can be difficult in the presence of the entire data. Unfortunately, many visualisation solutions are somewhat static in the way they handle such score cutoffs. We present BioSAVE, a sequence annotation viewer with on-the-fly selection of visualisation thresholds for each feature. BioSAVE is a versatile OS X program for visual display of scored features (annotation) within a sequence context. The program reads sequence and additional supplementary annotation data (e.g., position weight matrix matches, conservation scores, structural domains) from a variety of commonly used file formats and displays them graphically. Onscreen controls then allow for live customisation of these graphics, including on-the-fly selection of visualisation thresholds for each feature. Possible applications of the program include display of transcription factor binding sites in a genomic context or the visualisation of structural domain assignments in protein sequences and many more. The dynamic visualisation of these annotations is useful, e.g., for the determination of cutoff values of predicted features to match experimental data. Program, source code and exemplary files are freely available at the BioSAVE homepage.
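
    The idea of a per-feature score cutoff can be sketched in a few lines. This is a minimal illustration, not BioSAVE's code or file format: the `Feature` record, the `visible_features` function, the feature-type names and the scores are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Feature:
    """One scored annotation on a sequence (hypothetical record layout)."""
    kind: str     # feature type, e.g. "pwm_match"
    start: int    # first position covered
    end: int      # last position covered
    score: float  # score assigned by the prediction tool


def visible_features(
    features: Iterable[Feature],
    cutoffs: Dict[str, float],
    default_cutoff: float = 0.0,
) -> List[Feature]:
    """Keep features whose score reaches the cutoff set for their type."""
    return [
        f for f in features
        if f.score >= cutoffs.get(f.kind, default_cutoff)
    ]


annotations = [
    Feature("pwm_match", 10, 18, 0.92),
    Feature("pwm_match", 40, 48, 0.55),
    Feature("conservation", 5, 60, 0.30),
    Feature("domain", 100, 180, 12.0),
]

# Changing one threshold and re-filtering mimics moving an on-screen control.
for pwm_cutoff in (0.5, 0.8, 0.95):
    shown = visible_features(annotations, {"pwm_match": pwm_cutoff})
    print(pwm_cutoff, [(f.kind, f.start) for f in shown])
```

    Keeping the thresholds in a dictionary keyed by feature type lets each type be adjusted independently, which is the behaviour the abstract describes.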

  12. BioSAVE: Display of scored annotation within a sequence context

    PubMed Central

    Pollock, Richard F; Adryan, Boris

    2008-01-01

    Background Visualization of sequence annotation is a common feature in many bioinformatics tools. For many applications it is desirable to restrict the display of such annotation according to a score cutoff, as biological interpretation can be difficult in the presence of the entire data. Unfortunately, many visualisation solutions are somewhat static in the way they handle such score cutoffs. Results We present BioSAVE, a sequence annotation viewer with on-the-fly selection of visualisation thresholds for each feature. BioSAVE is a versatile OS X program for visual display of scored features (annotation) within a sequence context. The program reads sequence and additional supplementary annotation data (e.g., position weight matrix matches, conservation scores, structural domains) from a variety of commonly used file formats and displays them graphically. Onscreen controls then allow for live customisation of these graphics, including on-the-fly selection of visualisation thresholds for each feature. Conclusion Possible applications of the program include display of transcription factor binding sites in a genomic context or the visualisation of structural domain assignments in protein sequences and many more. The dynamic visualisation of these annotations is useful, e.g., for the determination of cutoff values of predicted features to match experimental data. Program, source code and exemplary files are freely available at the BioSAVE homepage. PMID:18366701

  13. Formin homology 2 domains occur in multiple contexts in angiosperms

    PubMed Central

    Cvrčková, Fatima; Novotný, Marian; Pícková, Denisa; Žárský, Viktor

    2004-01-01

    Background Involvement of conservative molecular modules and cellular mechanisms in the widely diversified processes of eukaryotic cell morphogenesis leads to the intriguing question: how do similar proteins contribute to dissimilar morphogenetic outputs? Formins (FH2 proteins) play a central part in the control of actin organization and dynamics, providing a good example of evolutionarily versatile use of a conserved protein domain in the context of a variety of lineage-specific structural and signalling interactions. Results In order to identify possible plant-specific sequence features within the FH2 protein family, we performed a detailed analysis of angiosperm formin-related sequences available in public databases, with particular focus on the complete Arabidopsis genome and the nearly finished rice genome sequence. This has led to revision of the current annotation of half of the 22 Arabidopsis formin-related genes. Comparative analysis of the two plant genomes revealed a good conservation of the previously described two subfamilies of plant formins (Class I and Class II), as well as several subfamilies within them that appear to predate the separation of monocot and dicot plants. Moreover, a number of plant Class II formins share an additional conserved domain, related to the protein phosphatase/tensin/auxilin fold. However, considerable inter-species variability sets limits to generalization of any functional conclusions reached on a single species such as Arabidopsis. Conclusions The plant-specific domain context of the conserved FH2 domain, as well as plant-specific features of the domain itself, may reflect distinct functional requirements in plant cells. The variability of formin structures found in plants far exceeds that known from both fungi and metazoans, suggesting a possible contribution of FH2 proteins in the evolution of the plant type of multicellularity. PMID:15256004

  14. The structure of a conserved Piezo channel domain reveals a novel beta sandwich fold

    PubMed Central

    Kamajaya, Aron; Kaiser, Jens; Lee, Jonas; Reid, Michelle; Rees, Douglas C.

    2014-01-01

    Summary Piezo has recently been identified as a family of eukaryotic mechanosensitive channels composed of subunits containing over 2000 amino acids, without recognizable sequence similarity to other channels. Here, we present the crystal structure of a large, conserved extramembrane domain located just before the last predicted transmembrane helix of C. elegans PIEZO, which adopts a novel beta sandwich fold. The structure was also determined of a point mutation located on a conserved surface at the position equivalent to the human PIEZO1 mutation found in Dehydrated Hereditary Stomatocytosis (DHS) patients (M2225R). While the point mutation does not change the overall domain structure, it does alter the surface electrostatic potential that may perturb interactions with a yet-to-be identified ligand or protein. The lack of structural similarity between this domain and any previously characterized fold, including those of eukaryotic and bacterial channels, highlights the distinctive nature of the Piezo family of eukaryotic mechanosensitive channels. PMID:25242456

  15. The structure of a conserved piezo channel domain reveals a topologically distinct β sandwich fold.

    PubMed

    Kamajaya, Aron; Kaiser, Jens T; Lee, Jonas; Reid, Michelle; Rees, Douglas C

    2014-10-07

    Piezo has recently been identified as a family of eukaryotic mechanosensitive channels composed of subunits containing over 2,000 amino acids, without recognizable sequence similarity to other channels. Here, we present the crystal structure of a large, conserved extramembrane domain located just before the last predicted transmembrane helix of C. elegans PIEZO, which adopts a topologically distinct β sandwich fold. The structure was also determined of a point mutation located on a conserved surface at the position equivalent to the human PIEZO1 mutation found in dehydrated hereditary stomatocytosis patients (M2225R). While the point mutation does not change the overall domain structure, it does alter the surface electrostatic potential that may perturb interactions with a yet-to-be-identified ligand or protein. The lack of structural similarity between this domain and any previously characterized fold, including those of eukaryotic and bacterial channels, highlights the distinctive nature of the Piezo family of eukaryotic mechanosensitive channels. Copyright © 2014 Elsevier Ltd. All rights reserved.

  16. Choosing non-redundant representative subsets of protein sequence data sets using submodular optimization.

    PubMed

    Libbrecht, Maxwell W; Bilmes, Jeffrey A; Noble, William Stafford

    2018-04-01

    Selecting a non-redundant representative subset of sequences is a common step in many bioinformatics workflows, such as the creation of non-redundant training sets for sequence and structural models or selection of "operational taxonomic units" from metagenomics data. Previous methods for this task, such as CD-HIT, PISCES, and UCLUST, apply a heuristic threshold-based algorithm that has no theoretical guarantees. We propose a new approach based on submodular optimization. Submodular optimization, a discrete analogue to continuous convex optimization, has been used with great success for other representative set selection problems. We demonstrate that the submodular optimization approach results in representative protein sequence subsets with greater structural diversity than sets chosen by existing methods, using as a gold standard the SCOPe library of protein domain structures. In this setting, submodular optimization consistently yields protein sequence subsets that include more SCOPe domain families than sets of the same size selected by competing approaches. We also show how the optimization framework allows us to design a mixture objective function that performs well for both large and small representative sets. The framework we describe is the best possible in polynomial time (under some assumptions), and it is flexible and intuitive because it applies a suite of generic methods to optimize one of a variety of objective functions. © 2018 Wiley Periodicals, Inc.
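
    Illustrative note on the record above: the abstract does not name the objective function, so the sketch below assumes a facility-location objective (each sequence is credited with its similarity to the closest chosen representative) and maximizes it greedily. For monotone submodular objectives, greedy selection is known to reach at least a (1 - 1/e) fraction of the optimum. The function name `greedy_select` and the toy similarity matrix are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def greedy_select(sim: Sequence[Sequence[float]], k: int) -> List[int]:
    """Greedily pick up to k representatives under a facility-location objective.

    sim[i][j] is an assumed non-negative similarity between sequences i and j.
    The objective is sum_i max_{j in chosen} sim[i][j], which is monotone submodular.
    """
    n = len(sim)
    chosen: List[int] = []
    # best_cover[i] = similarity of item i to its closest representative so far
    best_cover = [0.0] * n
    for _ in range(min(k, n)):
        best_gain, best_j = -1.0, -1
        for j in range(n):
            if j in chosen:
                continue
            # Marginal gain of adding j: how much each item's coverage improves
            gain = sum(max(sim[i][j] - best_cover[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_gain, best_j = gain, j
        chosen.append(best_j)
        best_cover = [max(best_cover[i], sim[i][best_j]) for i in range(n)]
    return chosen


# Toy data (hypothetical): sequences 0-1 are near-duplicates, as are 2-3.
toy_sim = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
]
selected = greedy_select(toy_sim, 2)
print(selected)
```

    With two representatives requested, the second pick comes from the cluster not yet covered, because adding another near-duplicate of an already chosen sequence yields almost no marginal gain. This is the diminishing-returns property that makes the objective submodular.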

  17. Proline Restricts Loop I Conformation of the High Affinity WW Domain from Human Nedd4-1 to a Ligand Binding-Competent Type I β-Turn.

    PubMed

    Schulte, Marianne; Panwalkar, Vineet; Freischem, Stefan; Willbold, Dieter; Dingley, Andrew J

    2018-04-19

    Sequence alignment of the four WW domains from human Nedd4-1 (neuronal precursor cell expressed developmentally down-regulated gene 4-1) reveals that the highest sequence diversity exists in loop I. Three residues in this type I β-turn interact with the PPxY motif of the human epithelial Na+ channel (hENaC) subunits, indicating that peptide affinity is defined by the loop I sequence. The third WW domain (WW3*) has the highest ligand affinity and unlike the other three hNedd4-1 WW domains or other WW domains studied contains the highly statistically preferred proline at the (i + 1) position found in β-turns. In this report, molecular dynamics simulations and experimental data were combined to characterize loop I stability and dynamics. Exchange of the proline to the equivalent residue in WW4 (Thr) results in the presence of a predominantly open seven residue Ω loop rather than the type I β-turn conformation for the wild-type apo-WW3*. In the presence of the ligand, the structure of the mutated loop I is locked into a type I β-turn. Thus, proline in loop I ensures a stable peptide binding-competent β-turn conformation, indicating that amino acid sequence modulates local flexibility to tune binding preferences and stability of dynamic interaction motifs.

  18. Structure of adenovirus bound to cellular receptor car

    DOEpatents

    Freimuth, Paul I.

    2004-05-18

    Disclosed is a mutant adenovirus which has a genome comprising one or more mutations in sequences which encode the fiber protein knob domain wherein the mutation causes the encoded viral particle to have significantly weakened binding affinity for CARD1 relative to wild-type adenovirus. Such mutations may be in sequences which encode either the AB loop, or the HI loop of the fiber protein knob domain. Specific residues and mutations are described. Also disclosed is a method for generating a mutant adenovirus which is characterized by a receptor binding affinity or specificity which differs substantially from wild type. In the method, residues of the adenovirus fiber protein knob domain which are predicted to alter D1 binding when mutated, are identified from the crystal structure coordinates of the AD12knob:CAR-D1 complex. A mutation which alters one or more of the identified residues is introduced into the genome of the adenovirus to generate a mutant adenovirus. Whether or not the mutant produced exhibits altered adenovirus-CAR binding properties is then determined.

  19. Oxyanion Induced Variations in Domain Structure for Amorphous Cobalt Oxide Oxygen Evolving Catalysts, Resolved by X-ray Pair Distribution Function Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kwon, Gihan; Kokhan, Oleksandr; Han, Ali

    Amorphous thin film oxygen evolving catalysts, OECs, of first-row transition metals show promise to serve as self-assembling photoanode materials in solar-driven, photoelectrochemical 'artificial leaf' devices. This report demonstrates the ability to use high-energy X-ray scattering and atomic pair distribution function analysis, PDF, to resolve structure in amorphous metal oxide catalyst films. The analysis is applied here to resolve domain structure differences induced by oxyanion substitution during the electrochemical assembly of amorphous cobalt oxide catalyst films, Co-OEC. PDF patterns for Co-OEC films formed using phosphate, Pi, methylphosphate, MPi, and borate, Bi, electrolyte buffers show that the resulting domains vary in size following the sequence Pi < MPi < Bi. The increases in domain size for CoMPi and CoBi were found to be correlated with increases in the contributions from bilayer and trilayer stacked domains having structures intermediate between those of the LiCoOO and CoO(OH) mineral forms. The lattice structures and offset stacking of adjacent layers in the partially stacked CoMPi and CoBi domains were best matched to those in the LiCoOO layered structure. The results demonstrate the ability of PDF analysis to elucidate features of domain size, structure, defect content and mesoscale organization for amorphous metal oxide catalysts that are not readily accessed by other X-ray techniques. Finally, PDF structure analysis is shown to provide a way to characterize domain structures in different forms of amorphous oxide catalysts, and hence provide an opportunity to investigate correlations between domain structure and catalytic activity.

  20. Oxyanion Induced Variations in Domain Structure for Amorphous Cobalt Oxide Oxygen Evolving Catalysts, Resolved by X-ray Pair Distribution Function Analysis

    DOE PAGES

    Kwon, Gihan; Kokhan, Oleksandr; Han, Ali; ...

    2015-12-01

    Amorphous thin film oxygen evolving catalysts, OECs, of first-row transition metals show promise to serve as self-assembling photoanode materials in solar-driven, photoelectrochemical 'artificial leaf' devices. This report demonstrates the ability to use high-energy X-ray scattering and atomic pair distribution function analysis, PDF, to resolve structure in amorphous metal oxide catalyst films. The analysis is applied here to resolve domain structure differences induced by oxyanion substitution during the electrochemical assembly of amorphous cobalt oxide catalyst films, Co-OEC. PDF patterns for Co-OEC films formed using phosphate, Pi, methylphosphate, MPi, and borate, Bi, electrolyte buffers show that the resulting domains vary in size following the sequence Pi < MPi < Bi. The increases in domain size for CoMPi and CoBi were found to be correlated with increases in the contributions from bilayer and trilayer stacked domains having structures intermediate between those of the LiCoOO and CoO(OH) mineral forms. The lattice structures and offset stacking of adjacent layers in the partially stacked CoMPi and CoBi domains were best matched to those in the LiCoOO layered structure. The results demonstrate the ability of PDF analysis to elucidate features of domain size, structure, defect content and mesoscale organization for amorphous metal oxide catalysts that are not readily accessed by other X-ray techniques. Finally, PDF structure analysis is shown to provide a way to characterize domain structures in different forms of amorphous oxide catalysts, and hence provide an opportunity to investigate correlations between domain structure and catalytic activity.

  1. The Enigmatic Origin of Papillomavirus Protein Domains

    PubMed Central

    Kirsip, Heleri; Gaston, Kevin

    2017-01-01

    Almost a century has passed since the discovery of papillomaviruses. A few decades of research have given a wealth of information on the molecular biology of papillomaviruses. Several excellent studies have been performed looking at the long- and short-term evolution of these viruses. However, when and how papillomaviruses originate is still a mystery. In this study, we systematically searched the (sequenced) biosphere to find distant homologs of papillomaviral protein domains. Our data show that, even including structural information, which allows us to find deeper evolutionary relationships compared to sequence-only based methods, only half of the protein domains in papillomaviruses have relatives in the rest of the biosphere. We show that the major capsid protein L1 and the replication protein E1 have relatives in several viral families, sharing three protein domains with Polyomaviridae and Parvoviridae. However, only the E1 replication protein has connections with cellular organisms. Most likely, the papillomavirus ancestor is of marine origin, a biotope that is not very well sequenced at the present time. Nevertheless, there is no evidence as to how papillomaviruses originated and how they became vertebrate and epithelium specific. PMID:28832519

  2. The Enigmatic Origin of Papillomavirus Protein Domains.

    PubMed

    Puustusmaa, Mikk; Kirsip, Heleri; Gaston, Kevin; Abroi, Aare

    2017-08-23

    Almost a century has passed since the discovery of papillomaviruses. A few decades of research have given a wealth of information on the molecular biology of papillomaviruses. Several excellent studies have been performed looking at the long- and short-term evolution of these viruses. However, when and how papillomaviruses originate is still a mystery. In this study, we systematically searched the (sequenced) biosphere to find distant homologs of papillomaviral protein domains. Our data show that, even including structural information, which allows us to find deeper evolutionary relationships compared to sequence-only based methods, only half of the protein domains in papillomaviruses have relatives in the rest of the biosphere. We show that the major capsid protein L1 and the replication protein E1 have relatives in several viral families, sharing three protein domains with Polyomaviridae and Parvoviridae. However, only the E1 replication protein has connections with cellular organisms. Most likely, the papillomavirus ancestor is of marine origin, a biotope that is not very well sequenced at the present time. Nevertheless, there is no evidence as to how papillomaviruses originated and how they became vertebrate and epithelium specific.

  3. Specific and Modular Binding Code for Cytosine Recognition in Pumilio/FBF (PUF) RNA-binding Domains

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dong, Shuyun; Wang, Yang; Cassidy-Amstutz, Caleb

    2011-10-28

    Pumilio/fem-3 mRNA-binding factor (PUF) proteins possess a recognition code for bases A, U, and G, allowing designed RNA sequence specificity of their modular Pumilio (PUM) repeats. However, recognition side chains in a PUM repeat for cytosine are unknown. Here we report identification of a cytosine-recognition code by screening random amino acid combinations at conserved RNA recognition positions using a yeast three-hybrid system. This C-recognition code is specific and modular as specificity can be transferred to different positions in the RNA recognition sequence. A crystal structure of a modified PUF domain reveals specific contacts between an arginine side chain and the cytosine base. We applied the C-recognition code to design PUF domains that recognize targets with multiple cytosines and to generate engineered splicing factors that modulate alternative splicing. Finally, we identified a divergent yeast PUF protein, Nop9p, that may recognize natural target RNAs with cytosine. This work deepens our understanding of natural PUF protein target recognition and expands the ability to engineer PUF domains to recognize any RNA sequence.

  4. Identification of an Electrostatic Ruler Motif for Sequence-Specific Binding of Collagenase to Collagen.

    PubMed

    Subramanian, Sundar Raman; Singam, Ettayapuram Ramaprasad Azhagiya; Berinski, Michael; Subramanian, Venkatesan; Wade, Rebecca C

    2016-08-25

    Sequence-specific cleavage of collagen by mammalian collagenase plays a pivotal role in cell function. Collagenases are matrix metalloproteinases that cleave the peptide bond at a specific position on fibrillar collagen. The collagenase Hemopexin-like (HPX) domain has been proposed to be responsible for substrate recognition, but the mechanism by which collagenases identify the cleavage site on fibrillar collagen is not clearly understood. In this study, Brownian dynamics simulations coupled with atomic-detail and coarse-grained molecular dynamics simulations were performed to dock matrix metalloproteinase-1 (MMP-1) on a collagen IIIα1 triple helical peptide. We find that the HPX domain recognizes the collagen triple helix at a conserved R-X11-R motif C-terminal to the cleavage site to which the HPX domain of collagen is guided electrostatically. The binding of the HPX domain between the two arginine residues is energetically stabilized by hydrophobic contacts with collagen. From the simulations and analysis of the sequences and structural flexibility of collagen and collagenase, a mechanistic scheme by which MMP-1 can recognize and bind collagen for proteolysis is proposed.

  5. Non-3D domain swapped crystal structure of truncated zebrafish alphaA crystallin

    PubMed Central

    Laganowsky, A; Eisenberg, D

    2010-01-01

    In previous work on truncated alpha crystallins (Laganowsky et al., Protein Sci 2010; 19:1031–1043), we determined crystal structures of the alpha crystallin core, a seven beta-stranded immunoglobulin-like domain, with its conserved C-terminal extension. These extensions swap into neighboring cores forming oligomeric assemblies. The extension is palindromic in sequence, binding in either of two directions. Here, we report the crystal structure of a truncated alphaA crystallin (AAC) from zebrafish (Danio rerio) revealing C-terminal extensions in a non three-dimensional (3D) domain swapped, “closed” state. The extension is quasi-palindromic, bound within its own zebrafish core domain, lying in the opposite direction to that of bovine AAC, which is bound within an adjacent core domain (Laganowsky et al., Protein Sci 2010; 19:1031–1043). Our findings establish that the C-terminal extension of alpha crystallin proteins can be either 3D domain swapped or non-3D domain swapped. This duality provides another molecular mechanism for alpha crystallin proteins to maintain the polydispersity that is crucial for eye lens transparency. PMID:20669149

  6. Comparison of Saccharomyces cerevisiae F-BAR domain structures reveals a conserved inositol phosphate binding site.

    PubMed

    Moravcevic, Katarina; Alvarado, Diego; Schmitz, Karl R; Kenniston, Jon A; Mendrola, Jeannine M; Ferguson, Kathryn M; Lemmon, Mark A

    2015-02-03

    F-BAR domains control membrane interactions in endocytosis, cytokinesis, and cell signaling. Although they are generally thought to bind curved membranes containing negatively charged phospholipids, numerous functional studies argue that differences in lipid-binding selectivities of F-BAR domains are functionally important. Here, we compare membrane-binding properties of the Saccharomyces cerevisiae F-BAR domains in vitro and in vivo. Whereas some F-BAR domains (such as Bzz1p and Hof1p F-BARs) bind equally well to all phospholipids, the F-BAR domain from the RhoGAP Rgd1p preferentially binds phosphoinositides. We determined X-ray crystal structures of F-BAR domains from Hof1p and Rgd1p, the latter bound to an inositol phosphate. The structures explain phospholipid-binding selectivity differences and reveal an F-BAR phosphoinositide binding site that is fully conserved in a mammalian RhoGAP called Gmip and is partly retained in certain other F-BAR domains. Our findings reveal previously unappreciated determinants of F-BAR domain lipid-binding specificity and provide a basis for its prediction from sequence. Copyright © 2015 Elsevier Ltd. All rights reserved.

  7. Comparison of Saccharomyces cerevisiae F-BAR Domain Structures Reveals a Conserved Inositol Phosphate Binding Site

    DOE PAGES

    Moravcevic, Katarina; Alvarado, Diego; Schmitz, Karl R.; ...

    2015-01-22

    F-BAR domains control membrane interactions in endocytosis, cytokinesis, and cell signaling. Although they are generally thought to bind curved membranes containing negatively charged phospholipids, numerous functional studies argue that differences in lipid-binding selectivities of F-BAR domains are functionally important. Here in this paper, we compare membrane-binding properties of the Saccharomyces cerevisiae F-BAR domains in vitro and in vivo. Whereas some F-BAR domains (such as Bzz1p and Hof1p F-BARs) bind equally well to all phospholipids, the F-BAR domain from the RhoGAP Rgd1p preferentially binds phosphoinositides. We determined X-ray crystal structures of F-BAR domains from Hof1p and Rgd1p, the latter bound to an inositol phosphate. The structures explain phospholipid-binding selectivity differences and reveal an F-BAR phosphoinositide binding site that is fully conserved in a mammalian RhoGAP called Gmip and is partly retained in certain other F-BAR domains. In conclusion, our findings reveal previously unappreciated determinants of F-BAR domain lipid-binding specificity and provide a basis for its prediction from sequence.

  8. Genome-wide analysis of putative peroxiredoxin in unicellular and filamentous cyanobacteria.

    PubMed

    Cui, Hongli; Wang, Yipeng; Wang, Yinchu; Qin, Song

    2012-11-16

    Cyanobacteria are photoautotrophic prokaryotes with wide variations in genome sizes and ecological habitats. Peroxiredoxin (PRX) is an important protein that plays essential roles in protecting own cells against reactive oxygen species (ROS). PRXs have been identified from mammals, fungi and higher plants. However, knowledge on cyanobacterial PRXs still remains obscure. With the availability of 37 sequenced cyanobacterial genomes, we performed a comprehensive comparative analysis of PRXs and explored their diversity, distribution, domain structure and evolution. Overall 244 putative prx genes were identified, which were abundant in filamentous diazotrophic cyanobacteria, Acaryochloris marina MBIC 11017, and unicellular cyanobacteria inhabiting freshwater and hot-springs, while poor in all Prochlorococcus and marine Synechococcus strains. Among these putative genes, 25 open reading frames (ORFs) encoding hypothetical proteins were identified as prx gene family members and the others were already annotated as prx genes. All 244 putative PRXs were classified into five major subfamilies (1-Cys, 2-Cys, BCP, PRX5_like, and PRX-like) according to their domain structures. The catalytic motifs of the cyanobacterial PRXs were similar to those of eukaryotic PRXs and highly conserved in all but the PRX-like subfamily. Classical motif (CXXC) of thioredoxin was detected in protein sequences from the PRX-like subfamily. Phylogenetic tree constructed of catalytic domains coincided well with the domain structures of PRXs and the phylogenies based on 16s rRNA. The distribution of genes encoding PRXs in different unicellular and filamentous cyanobacteria especially those sub-families like PRX-like or 1-Cys PRX correlate with the genome size, eco-physiology, and physiological properties of the organisms. Cyanobacterial and eukaryotic PRXs share similar conserved motifs, indicating that cyanobacteria adopt similar catalytic mechanisms as eukaryotes. 
All cyanobacterial PRX proteins share highly similar structures, implying that these genes may originate from a common ancestor. In this study, a general framework of the sequence-structure-function connections of the PRXs was revealed, which may facilitate functional investigations of PRXs in various organisms.

  9. Genome-wide analysis of putative peroxiredoxin in unicellular and filamentous cyanobacteria

    PubMed Central

    2012-01-01

    Background Cyanobacteria are photoautotrophic prokaryotes with wide variations in genome sizes and ecological habitats. Peroxiredoxin (PRX) is an important protein that plays essential roles in protecting own cells against reactive oxygen species (ROS). PRXs have been identified from mammals, fungi and higher plants. However, knowledge on cyanobacterial PRXs still remains obscure. With the availability of 37 sequenced cyanobacterial genomes, we performed a comprehensive comparative analysis of PRXs and explored their diversity, distribution, domain structure and evolution. Results Overall 244 putative prx genes were identified, which were abundant in filamentous diazotrophic cyanobacteria, Acaryochloris marina MBIC 11017, and unicellular cyanobacteria inhabiting freshwater and hot-springs, while poor in all Prochlorococcus and marine Synechococcus strains. Among these putative genes, 25 open reading frames (ORFs) encoding hypothetical proteins were identified as prx gene family members and the others were already annotated as prx genes. All 244 putative PRXs were classified into five major subfamilies (1-Cys, 2-Cys, BCP, PRX5_like, and PRX-like) according to their domain structures. The catalytic motifs of the cyanobacterial PRXs were similar to those of eukaryotic PRXs and highly conserved in all but the PRX-like subfamily. Classical motif (CXXC) of thioredoxin was detected in protein sequences from the PRX-like subfamily. Phylogenetic tree constructed of catalytic domains coincided well with the domain structures of PRXs and the phylogenies based on 16s rRNA. Conclusions The distribution of genes encoding PRXs in different unicellular and filamentous cyanobacteria especially those sub-families like PRX-like or 1-Cys PRX correlate with the genome size, eco-physiology, and physiological properties of the organisms. Cyanobacterial and eukaryotic PRXs share similar conserved motifs, indicating that cyanobacteria adopt similar catalytic mechanisms as eukaryotes. 
All cyanobacterial PRX proteins share highly similar structures, implying that these genes may originate from a common ancestor. In this study, a general framework of the sequence-structure-function connections of the PRXs was revealed, which may facilitate functional investigations of PRXs in various organisms. PMID:23157370

  10. Taxonomic distribution, repeats, and functions of the S1 domain-containing proteins as members of the OB-fold family.

    PubMed

    Deryusheva, Evgeniia I; Machulin, Andrey V; Selivanova, Olga M; Galzitskaya, Oxana V

    2017-04-01

    Proteins of the nucleic acid-binding proteins superfamily perform such functions as processing, transport, storage, stretching, translation, and degradation of RNA. It is one of the 16 superfamilies containing the OB-fold in protein structures. Here, we have analyzed the superfamily of nucleic acid-binding proteins (the number of sequences exceeds 200,000) and obtained that this superfamily prevalently consists of proteins containing the cold shock DNA-binding domain (ca. 131,000 protein sequences). Proteins containing the S1 domain compose 57% from the cold shock DNA-binding domain family. Furthermore, we have found that the S1 domain was identified mainly in the bacterial proteins (ca. 83%) compared to the eukaryotic and archaeal proteins, which are available in the UniProt database. We have found that the number of multiple repeats of S1 domain in the S1 domain-containing proteins depends on the taxonomic affiliation. All archaeal proteins contain one copy of the S1 domain, while the number of repeats in the eukaryotic proteins varies between 1 and 15 and correlates with the protein size. In the bacterial proteins, the number of repeats is no more than 6, regardless of the protein size. The large variation of the repeat number of S1 domain as one of the structural variants of the OB-fold is a distinctive feature of S1 domain-containing proteins. Proteins from the other families and superfamilies have either one OB-fold or change slightly the repeat numbers. On the whole, it can be supposed that the repeat number is a vital for multifunctional activity of the S1 domain-containing proteins. Proteins 2017; 85:602-613. © 2016 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.

  11. Insights into Strand Exchange in BTB Domain Dimers from the Crystal Structures of FAZF and Miz1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stogios, Peter J.; Cuesta-Seijo, Jose Antonio; Chen, Lu

    2010-09-22

    The BTB domain is a widely distributed protein-protein interaction motif that is often found at the N-terminus of zinc finger transcription factors. Previous crystal structures of BTB domains have revealed tightly interwound homodimers, with the N-terminus from one chain forming a two-stranded anti-parallel β-sheet with a strand from the other chain. We have solved the crystal structures of the BTB domains from Fanconi anemia zinc finger (FAZF) and Miz1 (Myc-interacting zinc finger 1) to resolutions of 2.0 Å and 2.6 Å, respectively. Unlike previous examples of BTB domain structures, the FAZF BTB domain is a nonswapped dimer, with each N-terminal β-strand associated with its own chain. As a result, the dimerization interface in the FAZF BTB domain is about half as large as in the domain-swapped dimers. The Miz1 BTB domain resembles a typical swapped BTB dimer, although it has a shorter N-terminus that is not able to form the interchain sheet. Using cysteine cross-linking, we confirmed that the promyelocytic leukemia zinc finger (PLZF) BTB dimer is strand exchanged in solution, while the FAZF BTB dimer is not. A phylogenic tree of the BTB fold based on both sequence and structural features shows that the common ancestor of the BTB domain in BTB-ZF (bric a brac, tramtrack, broad-complex zinc finger) proteins was a domain-swapped dimer. The differences in the N-termini seen in the FAZF and Miz1 BTB domains appear to be more recent developments in the structural evolution of the domain.

  12. Crystal Structures of Poly(ADP-ribose) Polymerase-1 (PARP-1) Zinc Fingers Bound to DNA

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    M Langelier; J Planck; S Roy

    2011-12-31

    Poly(ADP-ribose) polymerase-1 (PARP-1) has two homologous zinc finger domains, Zn1 and Zn2, that bind to a variety of DNA structures to stimulate poly(ADP-ribose) synthesis activity and to mediate PARP-1 interaction with chromatin. The structural basis for interaction with DNA is unknown, which limits our understanding of PARP-1 regulation and involvement in DNA repair and transcription. Here, we have determined crystal structures for the individual Zn1 and Zn2 domains in complex with a DNA double strand break, providing the first views of PARP-1 zinc fingers bound to DNA. The Zn1-DNA and Zn2-DNA structures establish a novel, bipartite mode of sequence-independent DNA interaction that engages a continuous region of the phosphodiester backbone and the hydrophobic faces of exposed nucleotide bases. Biochemical and cell biological analysis indicate that the Zn1 and Zn2 domains perform distinct functions. The Zn2 domain exhibits high binding affinity to DNA compared with the Zn1 domain. However, the Zn1 domain is essential for DNA-dependent PARP-1 activity in vitro and in vivo, whereas the Zn2 domain is not strictly required. Structural differences between the Zn1-DNA and Zn2-DNA complexes, combined with mutational and structural analysis, indicate that a specialized region of the Zn1 domain is re-configured through the hydrophobic interaction with exposed nucleotide bases to initiate PARP-1 activation.

  13. The structural analysis of shark IgNAR antibodies reveals evolutionary principles of immunoglobulins.

    PubMed

    Feige, Matthias J; Gräwert, Melissa A; Marcinowski, Moritz; Hennig, Janosch; Behnke, Julia; Ausländer, David; Herold, Eva M; Peschek, Jirka; Castro, Caitlin D; Flajnik, Martin; Hendershot, Linda M; Sattler, Michael; Groll, Michael; Buchner, Johannes

    2014-06-03

    Sharks and other cartilaginous fish are the phylogenetically oldest living organisms that rely on antibodies as part of their adaptive immune system. They produce the immunoglobulin new antigen receptor (IgNAR), a homodimeric heavy chain-only antibody, as a major part of their humoral adaptive immune response. Here, we report the atomic resolution structure of the IgNAR constant domains and a structural model of this heavy chain-only antibody. We find that despite low sequence conservation, the basic Ig fold of modern antibodies is already present in the evolutionary ancient shark IgNAR domains, highlighting key structural determinants of the ubiquitous Ig fold. In contrast, structural differences between human and shark antibody domains explain the high stability of several IgNAR domains and allowed us to engineer human antibodies for increased stability and secretion efficiency. We identified two constant domains, C1 and C3, that act as dimerization modules within IgNAR. Together with the individual domain structures and small-angle X-ray scattering, this allowed us to develop a structural model of the complete IgNAR molecule. Its constant region exhibits an elongated shape with flexibility and a characteristic kink in the middle. Despite the lack of a canonical hinge region, the variable domains are spaced appropriately wide for binding to multiple antigens. Thus, the shark IgNAR domains already display the well-known Ig fold, but apart from that, this heavy chain-only antibody employs unique ways for dimerization and positioning of functional modules.

  14. Structure of adenovirus bound to cellular receptor car

    DOEpatents

    Freimuth, Paul I.

    2007-01-02

    Disclosed is a mutant CAR-DI-binding adenovirus which has a genome comprising one or more mutations in sequences which encode the fiber protein knob domain wherein the mutation causes the encoded viral particle to have a significantly weakened binding affinity for CAR-DI relative to wild-type adenovirus. Such mutations may be in sequences which encode either the AB loop, or the HI loop of the fiber protein knob domain. Specific residues and mutations are described. Also disclosed is a method for generating a mutant adenovirus which is characterized by a receptor binding affinity or specificity which differs substantially from wild type.

  15. The evolutionary history of protein fold families and proteomes confirms that the archaeal ancestor is more ancient than the ancestors of other superkingdoms

    PubMed Central

    2012-01-01

    Background The entire evolutionary history of life can be studied using myriad sequences generated by genomic research. This includes the appearance of the first cells and of superkingdoms Archaea, Bacteria, and Eukarya. However, the use of molecular sequence information for deep phylogenetic analyses is limited by mutational saturation, differential evolutionary rates, lack of sequence site independence, and other biological and technical constraints. In contrast, protein structures are evolutionary modules that are highly conserved and diverse enough to enable deep historical exploration. Results Here we build phylogenies that describe the evolution of proteins and proteomes. These phylogenetic trees are derived from a genomic census of protein domains defined at the fold family (FF) level of structural classification. Phylogenomic trees of FF structures were reconstructed from genomic abundance levels of 2,397 FFs in 420 proteomes of free-living organisms. These trees defined timelines of domain appearance, with time spanning from the origin of proteins to the present. Timelines are divided into five different evolutionary phases according to patterns of sharing of FFs among superkingdoms: (1) a primordial protein world, (2) reductive evolution and the rise of Archaea, (3) the rise of Bacteria from the common ancestor of Bacteria and Eukarya and early development of the three superkingdoms, (4) the rise of Eukarya and widespread organismal diversification, and (5) eukaryal diversification. The relative ancestry of the FFs shows that reductive evolution by domain loss is dominant in the first three phases and is responsible for both the diversification of life from a universal cellular ancestor and the appearance of superkingdoms. On the other hand, domain gains are predominant in the last two phases and are responsible for organismal diversification, especially in Bacteria and Eukarya. 
Conclusions The evolution of functions that are associated with corresponding FFs along the timeline reveals that primordial metabolic domains evolved earlier than informational domains involved in translation and transcription, supporting the metabolism-first hypothesis rather than the RNA world scenario. In addition, phylogenomic trees of proteomes reconstructed from FFs appearing in each of the five phases of the protein world show that trees reconstructed from ancient domain structures were consistently rooted in archaeal lineages, supporting the proposal that the archaeal ancestor is more ancient than the ancestors of other superkingdoms. PMID:22284070

  16. A new family of β-helix proteins with similarities to the polysaccharide lyases

    DOE PAGES

    Close, Devin W.; D'Angelo, Sara; Bradbury, Andrew R. M.

    2014-09-27

    Microorganisms that degrade biomass produce diverse assortments of carbohydrate-active enzymes and binding modules. Despite tremendous advances in the genomic sequencing of these organisms, many genes do not have an ascribed function owing to low sequence identity to genes that have been annotated. Consequently, biochemical and structural characterization of genes with unknown function is required to complement the rapidly growing pool of genomic sequencing data. A protein with previously unknown function (Cthe_2159) was recently isolated in a genome-wide screen using phage display to identify cellulose-binding protein domains from the biomass-degrading bacterium Clostridium thermocellum. Here, the crystal structure of Cthe_2159 is presented and it is shown that it is a unique right-handed parallel β-helix protein. Despite very low sequence identity to known β-helix or carbohydrate-active proteins, Cthe_2159 displays structural features that are very similar to those of polysaccharide lyase (PL) families 1, 3, 6 and 9. Cthe_2159 is conserved across bacteria and some archaea and is a member of the domain of unknown function family DUF4353. This suggests that Cthe_2159 is the first representative of a previously unknown family of cellulose and/or acid-sugar binding β-helix proteins that share structural similarities with PLs. More importantly, these results demonstrate how functional annotation by biochemical and structural analysis remains a critical tool in the characterization of new gene products.

  17. A new family of β-helix proteins with similarities to the polysaccharide lyases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Close, Devin W.; D'Angelo, Sara; Bradbury, Andrew R. M.

    Microorganisms that degrade biomass produce diverse assortments of carbohydrate-active enzymes and binding modules. Despite tremendous advances in the genomic sequencing of these organisms, many genes do not have an ascribed function owing to low sequence identity to genes that have been annotated. Consequently, biochemical and structural characterization of genes with unknown function is required to complement the rapidly growing pool of genomic sequencing data. A protein with previously unknown function (Cthe_2159) was recently isolated in a genome-wide screen using phage display to identify cellulose-binding protein domains from the biomass-degrading bacterium Clostridium thermocellum. Here, the crystal structure of Cthe_2159 is presented and it is shown that it is a unique right-handed parallel β-helix protein. Despite very low sequence identity to known β-helix or carbohydrate-active proteins, Cthe_2159 displays structural features that are very similar to those of polysaccharide lyase (PL) families 1, 3, 6 and 9. Cthe_2159 is conserved across bacteria and some archaea and is a member of the domain of unknown function family DUF4353. This suggests that Cthe_2159 is the first representative of a previously unknown family of cellulose and/or acid-sugar binding β-helix proteins that share structural similarities with PLs. More importantly, these results demonstrate how functional annotation by biochemical and structural analysis remains a critical tool in the characterization of new gene products.

  18. Hydrophobic cluster analysis of G protein-coupled receptors: a powerful tool to derive structural and functional information from 2D-representation of protein sequences.

    PubMed

    Lentes, K U; Mathieu, E; Bischoff, R; Rasmussen, U B; Pavirani, A

    1993-01-01

    Current methods for comparative analyses of protein sequences are 1D-alignments of amino acid sequences based on the maximization of amino acid identity (homology) and the prediction of secondary structure elements. This approach has a major drawback once the amino acid identity drops below 20-25%, since maximization of a homology score does not take into account any structural information. A new technique called Hydrophobic Cluster Analysis (HCA) has been developed by Lemesle-Varloot et al. (Biochimie 72, 555-574, 1990). This consists of comparing several sequences simultaneously and combining homology detection with secondary structure analysis. HCA is primarily based on the detection and comparison of structural segments constituting the hydrophobic core of globular protein domains, with or without transmembrane domains. We have applied HCA to the analysis of different families of G-protein coupled receptors, such as catecholamine receptors as well as peptide hormone receptors. Utilizing HCA, the thrombin receptor, a new and as yet unique member of the family of G-protein coupled receptors, can be clearly classified as being closely related to the family of neuropeptide receptors rather than to the catecholamine receptors, for which the shape of the hydrophobic clusters and the length of their third cytoplasmic loop are very different. Furthermore, the potential of HCA to predict relationships between new putative and already characterized members of this family of receptors will be presented.

  19. Preservation of protein clefts in comparative models.

    PubMed

    Piedra, David; Lois, Sergi; de la Cruz, Xavier

    2008-01-16

    Comparative, or homology, modelling of protein structures is the most widely used prediction method when the target protein has homologues of known structure. Given that the quality of a model may vary greatly, several studies have been devoted to identifying the factors that influence modelling results. These studies usually consider the protein as a whole, and only a few provide a separate discussion of the behaviour of biologically relevant features of the protein. Given the value of the latter for many applications, here we extended previous work by analysing the preservation of native protein clefts in homology models. We chose to examine clefts because of their role in protein function/structure, as they are usually the locus of protein-protein interactions, host the enzymes' active site, or, in the case of protein domains, can also be the locus of domain-domain interactions that lead to the structure of the whole protein. We studied how the largest cleft of a protein varies in comparative models. To this end, we analysed a set of 53507 homology models that cover the whole sequence identity range, with a special emphasis on medium and low similarities. More precisely, we examined how cleft quality, measured using six complementary parameters related to both global shape and local atomic environment, depends on the sequence identity between target and template proteins. In addition to this general analysis, we also explored the impact of a number of factors on cleft quality, and found that the relationship between quality and sequence identity varies depending on cleft rank amongst the set of protein clefts (when ordered according to size), and number of aligned residues. We have examined cleft quality in homology models at a range of sequence identity levels. Our results provide a detailed view of how quality is affected by distinct parameters and thus may help the user of comparative modelling to determine the final quality and applicability of his/her cleft models. In addition, the large variability in model quality that we observed within each sequence bin, with good models present even at low sequence identities (between 20% and 30%), indicates that properly developed identification methods could be used to recover good cleft models in this sequence range.
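
    The dependence of model quality on target-template sequence identity described above presupposes a way to compute identity and assign each model to a bin. The following is a minimal sketch, assuming identity is counted over aligned (non-gap) columns of a pairwise alignment and that bins are 10 percentage points wide. The function names, the gap character, the choice of denominator and the bin width are all illustrative assumptions, not taken from the study.

```python
def percent_identity(aligned_target: str, aligned_template: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap.

    Assumption: the denominator is the number of aligned residue pairs;
    other conventions (e.g. dividing by the shorter sequence length) exist.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        aligned += 1
        if a.upper() == b.upper():
            identical += 1
    if aligned == 0:
        return 0.0
    return 100.0 * identical / aligned


def identity_bin(pid: float, width: float = 10.0) -> tuple:
    """Return the (low, high) bounds of the bin containing pid.

    Bins are half-open [low, high), except that 100% is placed in the last bin.
    """
    low = min(int(pid // width) * width, 100.0 - width)
    return (low, low + width)


if __name__ == "__main__":
    pid = percent_identity("ACDE-FG", "ACDQWFG")
    print(round(pid, 1), identity_bin(pid))
```

    Under these assumptions, a model built at 25% identity falls in the 20-30% bin, the range in which the abstract reports that good cleft models can still be found.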

  20. The selectivity of receptor tyrosine kinase signaling is controlled by a secondary SH2 domain binding site.

    PubMed

    Bae, Jae Hyun; Lew, Erin Denise; Yuzawa, Satoru; Tomé, Francisco; Lax, Irit; Schlessinger, Joseph

    2009-08-07

    SH2 domain-mediated interactions represent a crucial step in transmembrane signaling by receptor tyrosine kinases. SH2 domains recognize phosphotyrosine (pY) in the context of particular sequence motifs in receptor phosphorylation sites. However, the modest binding affinity of SH2 domains to pY containing peptides may not account for and likely represents an oversimplified mechanism for regulation of selectivity of signaling pathways in living cells. Here we describe the crystal structure of the activated tyrosine kinase domain of FGFR1 in complex with a phospholipase Cgamma fragment. The structural and biochemical data and experiments with cultured cells show that the selectivity of phospholipase Cgamma binding and signaling via activated FGFR1 are determined by interactions between a secondary binding site on an SH2 domain and a region in FGFR1 kinase domain in a phosphorylation independent manner. These experiments reveal a mechanism for how SH2 domain selectivity is regulated in vivo to mediate a specific cellular process.

  1. The FOXP2 forkhead domain binds to a variety of DNA sequences with different rates and affinities.

    PubMed

    Webb, Helen; Steeb, Olga; Blane, Ashleigh; Rotherham, Lia; Aron, Shaun; Machanick, Philip; Dirr, Heini; Fanucchi, Sylvia

    2017-07-01

    FOXP2 is a member of the P subfamily of FOX transcription factors, the DNA-binding domain of which is the winged helix forkhead domain (FHD). In this work we show that the FOXP2 FHD is able to bind to various DNA sequences, including a novel sequence identified in this work, with different affinities and rates as detected using surface plasmon resonance. Combining the experimental work with molecular docking, we show that high-affinity sequences remain bound to the protein for longer, form a greater number of interactions with the protein and induce a greater structural change in the protein than low-affinity sequences. We propose a binding model for the FOXP2 FHD that involves three types of binding sequence: low affinity sites which allow for rapid scanning of the genome by the protein in a partially unstructured state; moderate affinity sites which serve to locate the protein near target sites and high-affinity sites which secure the protein to the DNA and induce a conformational change necessary for functional binding and the possible initiation of downstream transcriptional events. © The Authors 2017. Published by Oxford University Press on behalf of the Japanese Biochemical Society. All rights reserved.

  2. Megabase replication domains along the human genome: relation to chromatin structure and genome organisation.

    PubMed

    Audit, Benjamin; Zaghloul, Lamia; Baker, Antoine; Arneodo, Alain; Chen, Chun-Long; d'Aubenton-Carafa, Yves; Thermes, Claude

    2013-01-01

    In higher eukaryotes, the absence of specific sequence motifs marking the origins of replication has been a serious hindrance to the understanding of (i) the mechanisms that regulate the spatio-temporal replication program, and (ii) the links between origin activation, chromatin structure and transcription. In this chapter, we review the partitioning of the human genome into megabase-sized replication domains delineated as N-shaped motifs in the strand compositional asymmetry profiles. They collectively span 28.3% of the genome and are bordered by more than 1,000 putative replication origins. We recapitulate the comparison of this partition of the human genome with high-resolution experimental data that confirms that replication domain borders are likely to be preferential replication initiation zones in the germline. In addition, we highlight the specific distribution of experimental and numerical chromatin marks along replication domains. Domain borders correspond to particular open chromatin regions, possibly encoded in the DNA sequence, and around which replication and transcription are highly coordinated. These regions also present a high evolutionary breakpoint density, suggesting that susceptibility to breakage might be linked to local open chromatin fiber state. Altogether, this chapter presents a compartmentalization of the human genome into replication domains that are landmarks of the human genome organization and are likely to play a key role in genome dynamics during evolution and in pathological situations.

  3. A cluster of diagnostic Hsp68 amino acid sites that are identified in Drosophila from the melanogaster species group are concentrated around beta-sheet residues involved with substrate binding.

    PubMed

    Kellett, Mark; McKechnie, Stephen W

    2005-04-01

    The coding region of the hsp68 gene has been amplified, cloned, and sequenced from 10 Drosophila species, 5 from the melanogaster subgroup and 5 from the montium subgroup. When the predicted amino acid sequences are compared with available Hsp70 sequences, patterns of conservation suggest that the C-terminal region should be subdivided according to predominant secondary structure. Conservation levels between Hsp68 and Hsp70 proteins were high in the N-terminal ATPase and adjacent beta-sheet domains, medium in the alpha-helix domain, and low in the C-terminal mobile domain (78%, 72%, 41%, and 21% identity, respectively). A number of amino acid sites were found to be "diagnostic" for Hsp68 (28 of approximately 635 residues). A few of these occur in the ATPase domain (385 residues) but most (75%) are concentrated in the beta-sheet and alpha-helix domains (34% of the protein) with none in the short mobile domain. Five of the diagnostic sites in the beta-sheet domain are clustered around, but not coincident with, functional sites known to be involved in substrate binding. Nearly all of the Hsp70 family length variation occurs in the mobile domain. Within montium subgroup species, 2 nearly identical hsp68 PCR products that differed in length are either different alleles or products of an ancestral hsp68 duplication.

  4. Structural basis for different phosphoinositide specificities of the PX domains of sorting nexins regulating G-protein signaling.

    PubMed

    Mas, Caroline; Norwood, Suzanne J; Bugarcic, Andrea; Kinna, Genevieve; Leneva, Natalya; Kovtun, Oleksiy; Ghai, Rajesh; Ona Yanez, Lorena E; Davis, Jasmine L; Teasdale, Rohan D; Collins, Brett M

    2014-10-10

    Sorting nexins (SNXs) or phox homology (PX) domain containing proteins are central regulators of cell trafficking and signaling. A subfamily of PX domain proteins possesses two unique PX-associated domains, as well as a regulator of G protein-coupled receptor signaling (RGS) domain that attenuates Gαs-coupled G protein-coupled receptor signaling. Here we delineate the structural organization of these RGS-PX proteins, revealing a protein family with a modular architecture that is conserved in all eukaryotes. The one exception to this is mammalian SNX19, which lacks the typical RGS structure but preserves all other domains. The PX domain is a sensor of membrane phosphoinositide lipids and we find that specific sequence alterations in the PX domains of the mammalian RGS-PX proteins, SNX13, SNX14, SNX19, and SNX25, confer differential phosphoinositide binding preferences. Although SNX13 and SNX19 PX domains bind the early endosomal lipid phosphatidylinositol 3-phosphate, SNX14 shows no membrane binding at all. Crystal structures of the SNX19 and SNX14 PX domains reveal key differences, with alterations in SNX14 leading to closure of the binding pocket to prevent phosphoinositide association. Our findings suggest a role for alternative membrane interactions in spatial control of RGS-PX proteins in cell signaling and trafficking. © 2014 by The American Society for Biochemistry and Molecular Biology, Inc.

  5. The auto-inhibitory state of Rho guanine nucleotide exchange factor ARHGEF5/TIM can be relieved by targeting its SH3 domain with rationally designed peptide aptamers.

    PubMed

    He, Ping; Tan, De-Li; Liu, Hong-Xiang; Lv, Feng-Lin; Wu, Wei

    2015-04-01

    The short isoform of Rho guanine nucleotide exchange factor ARHGEF5 is known as TIM, which plays diverse roles in, for example, tumorigenesis, neuronal development and Src-induced podosome formation through the activation of its substrates, the Rho family of GTPases. The activation is auto-inhibited by a putative helix N-terminal to the DH domain of TIM, which is stabilized by the intramolecular interaction of the C-terminal SH3 domain with a poly-proline sequence between the putative helix and the DH domain. In this study, we systematically investigated the structural basis, energetic landscape and biological implication underlying TIM auto-inhibition by using atomistic molecular dynamics simulations and binding free energy analysis. The computational study revealed that the binding of the SH3 domain to the poly-proline sequence is the prerequisite for the stabilization of TIM auto-inhibition. Thus, it is suggested that targeting the SH3 domain with competitors of the poly-proline sequence would be a promising strategy to relieve the auto-inhibitory state of TIM. On this basis, we rationally designed a number of peptide aptamers for competitively inhibiting the SH3 domain based on the modeled TIM structure and computationally generated data. Peptide binding tests and guanine nucleotide exchange analysis confirmed that these designed peptides can both bind to the SH3 domain potently and activate the TIM-catalyzed RhoA exchange reaction effectively. Interestingly, a positive correlation between the peptide affinity and induced exchange activity was observed. In addition, separate mutation to Ala of three conserved residues in a designed peptide (Pro49, Pro52 and Lys54, which are required for peptide recognition by the SH3 domain) would completely abolish the ability of this peptide to activate TIM. All these come together to suggest an intrinsic relationship between peptide binding to the SH3 domain and the activation of TIM. Copyright © 2015 Elsevier B.V. and Société Française de Biochimie et Biologie Moléculaire (SFBBM). All rights reserved.

  6. Conservation and diversification of Msx protein in metazoan evolution.

    PubMed

    Takahashi, Hirokazu; Kamiya, Akiko; Ishiguro, Akira; Suzuki, Atsushi C; Saitou, Naruya; Toyoda, Atsushi; Aruga, Jun

    2008-01-01

    Msx (/msh) family genes encode homeodomain (HD) proteins that control ontogeny in many animal species. We compared the structures of Msx genes from a wide range of Metazoa (Porifera, Cnidaria, Nematoda, Arthropoda, Tardigrada, Platyhelminthes, Mollusca, Brachiopoda, Annelida, Echiura, Echinodermata, Hemichordata, and Chordata) to gain an understanding of the role of these genes in phylogeny. Exon-intron boundary analysis suggested that the position of the intron located N-terminally to the HDs was widely conserved in all the genes examined, including those of cnidarians. Amino acid (aa) sequence comparison revealed 3 new evolutionarily conserved domains, as well as very strong conservation of the HDs. Two of the three domains were associated with Groucho-like protein binding in both a vertebrate and a cnidarian Msx homolog, suggesting that the interaction between Groucho-like proteins and Msx proteins was established in eumetazoan ancestors. Pairwise comparison among the collected HDs and their C-flanking aa sequences revealed that the degree of sequence conservation varied depending on the animal taxa from which the sequences were derived. Highly conserved Msx genes were identified in the Vertebrata, Cephalochordata, Hemichordata, Echinodermata, Mollusca, Brachiopoda, and Anthozoa. The wide distribution of the conserved sequences in the animal phylogenetic tree suggested that metazoan ancestors had already acquired a set of conserved domains of the current Msx family genes. Interestingly, although strongly conserved sequences were recovered from the Vertebrata, Cephalochordata, and Anthozoa, the sequences from the Urochordata and Hydrozoa showed weak conservation. Because the Vertebrata-Cephalochordata-Urochordata and Anthozoa-Hydrozoa represent sister groups in the Chordata and Cnidaria, respectively, Msx sequence diversification may have occurred differentially in the course of evolution. 
We speculate that selective loss of the conserved domains in Msx family proteins contributed to the diversification of animal body organization.

  7. A generalized analysis of hydrophobic and loop clusters within globular protein sequences

    PubMed Central

    Eudes, Richard; Le Tuan, Khanh; Delettré, Jean; Mornon, Jean-Paul; Callebaut, Isabelle

    2007-01-01

    Background Hydrophobic Cluster Analysis (HCA) is an efficient way to compare highly divergent sequences through the implicit secondary structure information directly derived from hydrophobic clusters. However, its efficiency and application are currently limited by the need for user expertise. In order to help the analysis of HCA plots, we report here the structural preferences of hydrophobic cluster species, which are frequently encountered in globular domains of proteins. These species are characterized only by their hydrophobic/non-hydrophobic dichotomy. This analysis has been extended to loop-forming clusters, using an appropriate loop alphabet. Results The structural behavior of hydrophobic cluster species, which are typical of protein globular domains, was investigated within banks of experimental structures, considered at different levels of sequence redundancy. The 294 most frequent hydrophobic cluster species were analyzed with regard to their association with the different secondary structures (frequencies of association with secondary structures and secondary structure propensities). Hydrophobic cluster species are predominantly associated with regular secondary structures, and a large part (60%) reveals preferences for α-helices or β-strands. Moreover, the analysis of the hydrophobic cluster amino acid composition generally allows for finer prediction of the regular secondary structure associated with the considered cluster within a cluster species. We also investigated the behavior of loop-forming clusters, using a "PGDNS" alphabet. These loop clusters do not overlap with hydrophobic clusters and are highly associated with coils. Finally, the structural information contained in the hydrophobic structural words, as deduced from experimental structures, was compared to the PSI-PRED predictions, revealing that β-strands and especially α-helices are generally over-predicted within the limits of typical β and α hydrophobic clusters. Conclusion The dictionary of hydrophobic clusters described here can help the HCA user to interpret and compare the HCA plots of globular protein sequences, and provides an original fundamental insight into the structural bricks of protein folds. Moreover, the novel loop cluster analysis brings additional information for secondary structure prediction on the whole sequence through a generalized cluster analysis (GCA), and not only on regular secondary structures. Such information lays the foundations for developing a new and original tool for secondary structure prediction. PMID:17210072
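
    The hydrophobic/non-hydrophobic dichotomy and the "PGDNS" loop alphabet mentioned above can be illustrated with a simplified one-dimensional sketch. It assumes the hydrophobic alphabet V, I, L, F, M, Y, W and a rule that a proline, or a run of at least four non-hydrophobic residues, separates clusters; both are assumptions made for illustration and are not stated in the abstract. Real HCA operates on a two-dimensional helical net, which this sketch does not reproduce, and all function names are hypothetical.

```python
HYDROPHOBIC = set("VILFMYW")  # assumed hydrophobic alphabet (not given in the abstract)
LOOP = set("PGDNS")           # loop alphabet named in the abstract
CONNECTIVITY = 4              # assumed: >= 4 non-hydrophobic residues split two clusters


def binary_code(seq: str) -> str:
    """Encode a sequence as 1 (hydrophobic) / 0 (non-hydrophobic)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq.upper())


def loop_code(seq: str) -> str:
    """Encode a sequence as 1 (loop-alphabet residue) / 0 (other)."""
    return "".join("1" if aa in LOOP else "0" for aa in seq.upper())


def hydrophobic_clusters(seq: str) -> list:
    """Return (start, end_exclusive, binary_pattern) for each 1D cluster.

    A cluster starts and ends on a hydrophobic residue. It is closed when a
    proline is met or when more than CONNECTIVITY positions have passed since
    the last hydrophobic residue.
    """
    seq = seq.upper()
    code = binary_code(seq)
    clusters = []
    start = last = None  # first and most recent hydrophobic index of the open cluster
    for i, aa in enumerate(seq):
        breaker = aa == "P" or (last is not None and i - last > CONNECTIVITY)
        if start is not None and breaker:
            clusters.append((start, last + 1, code[start:last + 1]))
            start = last = None
        if aa in HYDROPHOBIC:
            if start is None:
                start = i
            last = i
    if start is not None:
        clusters.append((start, last + 1, code[start:last + 1]))
    return clusters


if __name__ == "__main__":
    example = "LVKAIGDSSTKFWEPLL"  # made-up sequence, for illustration only
    print(binary_code(example))
    print(hydrophobic_clusters(example))
```

    In this encoding, each distinct binary pattern (for example 11001) would correspond to one cluster "species" in the sense of the abstract, since species are characterized only by which positions are hydrophobic.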

  8. Structure of the TPR Domain of AIP: Lack of Client Protein Interaction with the C-Terminal α-7 Helix of the TPR Domain of AIP Is Sufficient for Pituitary Adenoma Predisposition

    PubMed Central

    Morgan, Rhodri M. L.; Hernández-Ramírez, Laura C.; Trivellin, Giampaolo; Zhou, Lihong; Roe, S. Mark; Korbonits, Márta; Prodromou, Chrisostomos

    2012-01-01

    Mutations of the aryl hydrocarbon receptor interacting protein (AIP) have been associated with familial isolated pituitary adenomas predisposing to young-onset acromegaly and gigantism. The precise tumorigenic mechanism is not well understood as AIP interacts with a large number of independent proteins as well as three chaperone systems, HSP90, HSP70 and TOMM20. We have determined the structure of the TPR domain of AIP at high resolution, which has allowed a detailed analysis of how disease-associated mutations impact on the structural integrity of the TPR domain. A subset of C-terminal α-7 helix (Cα-7h) mutations, R304* (nonsense mutation), R304Q, Q307* and R325Q, a known site for AhR and PDE4A5 client-protein interaction, occur beyond those that interact with the conserved MEEVD and EDDVE sequences of HSP90 and TOMM20. These C-terminal AIP mutations appear to only disrupt client-protein binding to the Cα-7h, while chaperone binding remains unaffected, suggesting that failure of client-protein interaction with the Cα-7h is sufficient to predispose to pituitary adenoma. We have also identified a molecular switch in the AIP TPR-domain that allows recognition of both the conserved HSP90 motif, MEEVD, and the equivalent sequence (EDDVE) of TOMM20. PMID:23300914

  9. An intact SAM-dependent methyltransferase fold is encoded by the human endothelin-converting enzyme-2 gene

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tempel, W.; Wu, H.; Dombrovsky, L.

    2010-08-17

    A recent survey of protein expression patterns in patients with Alzheimer's disease (AD) has identified ece2 (chromosome: 3; Locations: 3q27.1) as the most significantly downregulated gene within the tested group. ece2 encodes endothelin-converting enzyme ECE2, a metalloprotease with a role in neuropeptide processing. Deficiency in the highly homologous ECE1 has earlier been linked to increased levels of AD-related β-amyloid peptide in mice, consistent with a role for ECE in the degradation of that peptide. Initially, ECE2 was presumed to resemble ECE1, in that it comprises a single transmembrane region of ~20 residues flanked by a small amino-terminal cytosolic segment and a carboxy-terminal lumenal peptidase domain. The carboxy-terminal domain has significant sequence similarity to both neutral endopeptidase, for which an X-ray structure has been determined, and Kell blood group protein. After their initial discovery, multiple isoforms of ECE1 and ECE2 were discovered, generated by alternative splicing of multiple exons. The originally described ece2 transcript, RefSeq NM_174046, contains the amino-terminal cytosolic portion followed by the transmembrane region and peptidase domain (Fig. 1, isoform B). Another ece2 transcript, available from the Mammalian Gene Collection under MGC2408 (Fig. 1, isoform C), RefSeq accession NM_032331, is predicted to be translated into a 255 residue peptide with low but detectable sequence similarity to known S-adenosyl-L-methionine (SAM)-dependent methyltransferases (SAM-MTs), such as the hypothetical protein TT1324 from Thermus thermophilus, PDB code 2GS9, which shares 30% amino acid sequence identity with ECE2 over 138 residues of the sequence. Intriguingly, another 'elongated' ece2 transcript (Fig. 1, isoform A) (RefSeq NM_014693) contains an amino-terminal portion of the putative SAM-MT domain, the transmembrane domain, and the protease domain. This suggests the possibility for coexistence of the putative SAM-MT and protease domains in a single polypeptide and their transmembrane interplay. Although sequence conservation across the SAM-MT family is weak, the structural fold is highly conserved. The most conserved part of this fold is the SAM-binding subdomain, which is shared between MGC2408 and hypothetical protein TT1324. Typically, the SAM-binding subdomain is flanked by a variable N-terminal extension and, at the C-terminus, by a substrate-binding subdomain, which varies enormously in size but preserves a conserved topology with three antiparallel β-strands. The 'elongated' transcript of ece2 lacks this substrate-binding subdomain. To test the hypothesis that the 255 residue ece2 gene product MGC2408 represents a complete SAM-MT fold, we have determined a crystal structure of this protein in the presence of SAH.

  10. ST proteins, a new family of plant tandem repeat proteins with a DUF2775 domain mainly found in Fabaceae and Asteraceae.

    PubMed

    Albornos, Lucía; Martín, Ignacio; Iglesias, Rebeca; Jiménez, Teresa; Labrador, Emilia; Dopico, Berta

    2012-11-07

    Many proteins with tandem repeats in their sequence have been described and classified according to the length of the repeats: (I) Repeats of short oligopeptides (from 2 to 20 amino acids), including structural cell wall proteins and arabinogalactan proteins. (II) Repeats that range in length from 20 to 40 residues, including proteins with a well-established three-dimensional structure often involved in mediating protein-protein interactions. (III) Longer repeats in the order of 100 amino acids that constitute structurally and functionally independent units. Here we analyse ShooT specific (ST) proteins, a family of proteins with tandem repeats of unknown function that were first found in Leguminosae, and their possible similarities to other proteins with tandem repeats. ST protein sequences were only found in dicotyledonous plants, limited to several plant families, mainly the Fabaceae and the Asteraceae. ST mRNAs accumulate mainly in the roots and under biotic interactions. Most ST proteins have one or several Domain(s) of Unknown Function 2775 (DUF2775). All deduced ST proteins have a signal peptide, indicating that these proteins enter the secretory pathway, and the mature proteins have tandem repeat oligopeptides that share a hexapeptide (E/D)FEPRP followed by 4 partially conserved amino acids, which could determine a putative N-glycosylation signal, and a fully conserved tyrosine. In a phylogenetic tree, the sequences cluster according to taxonomic group. A possible involvement in symbiosis and abiotic stress as well as in plant cell elongation is suggested, although different STs could play different roles in plant development. We describe a new family of proteins called ST whose presence is limited to the plant kingdom, specifically to a few families of dicotyledonous plants. They present 20 to 40 amino acid tandem repeat sequences with different characteristics (signal peptide, DUF2775 domain, conservative repeat regions) from the described group of 20 to 40 amino acid tandem repeat proteins and also from known cell wall proteins with repeat sequences. Several putative roles in plant physiology can be inferred from the characteristics found.

  11. ST proteins, a new family of plant tandem repeat proteins with a DUF2775 domain mainly found in Fabaceae and Asteraceae

    PubMed Central

    2012-01-01

    Background Many proteins with tandem repeats in their sequence have been described and classified according to the length of the repeats: I) Repeats of short oligopeptides (from 2 to 20 amino acids), including structural cell wall proteins and arabinogalactan proteins. II) Repeats that range in length from 20 to 40 residues, including proteins with a well-established three-dimensional structure often involved in mediating protein-protein interactions. III) Longer repeats in the order of 100 amino acids that constitute structurally and functionally independent units. Here we analyse ShooT specific (ST) proteins, a family of proteins with tandem repeats of unknown function that were first found in Leguminosae, and their possible similarities to other proteins with tandem repeats. Results ST protein sequences were only found in dicotyledonous plants, limited to several plant families, mainly the Fabaceae and the Asteraceae. ST mRNAs accumulate mainly in the roots and under biotic interactions. Most ST proteins have one or several Domain(s) of Unknown Function 2775 (DUF2775). All deduced ST proteins have a signal peptide, indicating that these proteins enter the secretory pathway, and the mature proteins have tandem repeat oligopeptides that share a hexapeptide (E/D)FEPRP followed by 4 partially conserved amino acids, which could determine a putative N-glycosylation signal, and a fully conserved tyrosine. In a phylogenetic tree, the sequences clade according to taxonomic group. A possible involvement in symbiosis and abiotic stress as well as in plant cell elongation is suggested, although different STs could play different roles in plant development. Conclusions We describe a new family of proteins called ST whose presence is limited to the plant kingdom, specifically to a few families of dicotyledonous plants. They present 20 to 40 amino acid tandem repeat sequences with different characteristics (signal peptide, DUF2775 domain, conservative repeat regions) from the described group of 20 to 40 amino acid tandem repeat proteins and also from known cell wall proteins with repeat sequences. Several putative roles in plant physiology can be inferred from the characteristics found. PMID:23134664
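
    As an illustration of the shared hexapeptide (E/D)FEPRP described in this record, the following minimal sketch scans a protein sequence for that motif with a regular expression. The function name and the toy sequence are hypothetical and are not taken from the paper; the sketch only locates the hexapeptide and does not model the four partially conserved residues or the conserved tyrosine that follow it.

```python
import re

# Hexapeptide shared by the ST tandem repeats, as written in the abstract: (E/D)FEPRP
HEXAPEPTIDE = re.compile(r"[ED]FEPRP")


def find_hexapeptides(sequence):
    """Return the 0-based start position of every (E/D)FEPRP match."""
    return [match.start() for match in HEXAPEPTIDE.finditer(sequence.upper())]


# Hypothetical toy sequence for illustration only; not a real ST protein.
toy_sequence = "MKTLAEFEPRPNGSAYAADFEPRPSLTVYGG"
print(find_hexapeptides(toy_sequence))
```

    Counting the matches and the spacing between their start positions gives a rough view of how many tandem repeats a candidate sequence carries and how long each repeat is.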

  12. A nonadaptive origin of a beneficial trait: in silico selection for free energy of folding leads to the neutral emergence of mutational robustness in single domain proteins.

    PubMed

    Pagan, Rafael F; Massey, Steven E

    2014-02-01

    Proteins are regarded as being robust to the deleterious effects of mutations. Here, the neutral emergence of mutational robustness in a population of single domain proteins is explored using computer simulations. A pairwise contact model was used to calculate the ΔG of folding (ΔG folding) using the three dimensional protein structure of leech eglin C. A random amino acid sequence with low mutational robustness, defined as the average ΔΔG resulting from a point mutation (ΔΔG average), was threaded onto the structure. A population of 1,000 threaded sequences was evolved under selection for stability, using an upper and lower energy threshold. Under these conditions, mutational robustness increased over time in the most common sequence in the population. In contrast, when the wild type sequence was used it did not show an increase in robustness. This implies that the emergence of mutational robustness is sequence specific and that wild type sequences may be close to maximal robustness. In addition, an inverse relationship between ΔΔG average and protein stability is shown, resulting partly from a larger average effect of point mutations in more stable proteins. The emergence of mutational robustness was also observed in the Escherichia coli colE1 Rop and human CD59 proteins, implying that the property may be common in single domain proteins under certain simulation conditions. The results indicate that at least a portion of mutational robustness in small globular proteins might have arisen by a process of neutral emergence, and could be an example of a beneficial trait that has not been directly selected for, termed a "pseudaptation."
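
    The robustness measure in this record, the average ΔΔG over point mutations of a sequence threaded onto a fixed structure, can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' model: the hydrophobic set, the contact potential (-1 for a hydrophobic pair, 0 otherwise) and the contact list are all hypothetical stand-ins for the pairwise contact model and the eglin C structure used in the paper.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWC")  # assumption: toy hydrophobic set


def contact_energy(a, b):
    """Toy pairwise potential (assumption): -1 for a hydrophobic pair, else 0."""
    return -1.0 if a in HYDROPHOBIC and b in HYDROPHOBIC else 0.0


def folding_energy(seq, contacts):
    """Sum the pairwise energies over the residue pairs in contact in the fixed structure."""
    return sum(contact_energy(seq[i], seq[j]) for i, j in contacts)


def average_ddg(seq, contacts):
    """Average change in folding energy over all single point mutations of seq."""
    base = folding_energy(seq, contacts)
    ddgs = []
    for pos, aa in product(range(len(seq)), AMINO_ACIDS):
        if aa == seq[pos]:
            continue  # skip the unmutated residue
        mutant = seq[:pos] + aa + seq[pos + 1:]
        ddgs.append(folding_energy(mutant, contacts) - base)
    return sum(ddgs) / len(ddgs)


# Hypothetical four-residue chain with two contacts, for illustration only.
print(average_ddg("ALGV", [(0, 1), (1, 3)]))
```

    In this toy potential a lower average ΔΔG means that point mutations destabilise the fold less on average, which corresponds to higher mutational robustness in the sense used in the abstract.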

  13. [Family of ribosomal proteins S1 contains unique conservative domain].

    PubMed

    Deriusheva, E I; Machulin, A V; Selivanova, O M; Serdiuk, I N

    2010-01-01

    Ribosomal protein S1 varies in length among bacteria, from 111 amino acid residues (Spiroplasma kunkelii) to 863 (Treponema pallidum). Traditionally, and for lack of a three-dimensional structure of this protein, its architecture is represented as repeating S1 domains, the number of which depends on the protein's length. The number of domains and their boundaries are recorded in specialized databases such as SMART, Pfam and PROSITE; however, for the same protein these data may differ considerably. To determine the number of domains and their boundaries, a new approach based on analysis of predicted secondary structure (PsiPred) was used. This approach revealed structural domains in the amino acid sequences of S1 proteins, with the number of domains varying from one to six. Alignment of S1 proteins containing different numbers of domains with the S1 RNA-binding domain of Escherichia coli PNPase showed that, in the family of ribosomal proteins S1, one domain has maximal homology with the S1 domain from PNPase. This conserved domain shifts position along the polypeptide chain and is located, in proteins containing different numbers of domains, according to a specific pattern. In this domain, as in the S1 domain from PNPase, residues Phe-19, Phe-22, His-34, Asp-64 and Arg-68 are clustered on the surface and form the RNA-binding site.

  14. Music and language perception: expectations, structural integration, and cognitive sequencing.

    PubMed

    Tillmann, Barbara

    2012-10-01

    Music can be described as sequences of events that are structured in pitch and time. Studying music processing provides insight into how complex event sequences are learned, perceived, and represented by the brain. Given the temporal nature of sound, expectations, structural integration, and cognitive sequencing are central in music perception (i.e., which sounds are most likely to come next and at what moment should they occur?). This paper focuses on similarities in music and language cognition research, showing that music cognition research provides insight into the understanding of not only music processing but also language processing and the processing of other structured stimuli. The hypothesis of shared resources between music and language processing and of domain-general dynamic attention has motivated the development of research to test music as a means to stimulate sensory, cognitive, and motor processes. Copyright © 2012 Cognitive Science Society, Inc.

  15. RaptorX server: a resource for template-based protein structure modeling.

    PubMed

    Källberg, Morten; Margaryan, Gohar; Wang, Sheng; Ma, Jianzhu; Xu, Jinbo

    2014-01-01

    Assigning functional properties to a newly discovered protein is a key challenge in modern biology. To this end, computational modeling of the three-dimensional atomic arrangement of the amino acid chain is often crucial in determining the role of the protein in biological processes. We present a community-wide web-based protocol, RaptorX server (http://raptorx.uchicago.edu), for automated protein secondary structure prediction, template-based tertiary structure modeling, and probabilistic alignment sampling. Given a target sequence, RaptorX server is able to detect even remotely related template sequences by means of a novel nonlinear context-specific alignment potential and probabilistic consistency algorithm. Using the protocol presented here it is thus possible to obtain high-quality structural models for many target protein sequences when only distantly related protein domains have experimentally solved structures. At present, RaptorX server can perform secondary and tertiary structure prediction of a 200 amino acid target sequence in approximately 30 min.

  16. The complete nucleotide sequence of the glnALG operon of Escherichia coli K12.

    PubMed Central

    Miranda-Ríos, J; Sánchez-Pescador, R; Urdea, M; Covarrubias, A A

    1987-01-01

    The nucleotide sequence of the E. coli glnALG operon has been determined. The glnL (ntrB) and glnG (ntrC) genes show high homology, at the nucleotide and amino acid levels, with the corresponding genes of Klebsiella pneumoniae. The predicted amino acid sequence for glutamine synthetase allowed us to locate some of the enzyme domains. The structure of this operon is discussed. PMID:2882477

  17. Spontaneous Unfolding-Refolding of Fibronectin Type III Domains Assayed by Thiol Exchange

    PubMed Central

    Shah, Riddhi; Ohashi, Tomoo; Erickson, Harold P.; Oas, Terrence G.

    2017-01-01

    Globular proteins are not permanently folded but spontaneously unfold and refold on time scales that can span orders of magnitude for different proteins. A longstanding debate in the protein-folding field is whether unfolding rates or folding rates correlate to the stability of a protein. In the present study, we have determined the unfolding and folding kinetics of 10 FNIII domains. FNIII domains are one of the most common protein folds and are present in 2% of animal proteins. FNIII domains are ideal for this study because they have an identical seven-strand β-sandwich structure, but they vary widely in sequence and thermodynamic stability. We assayed thermodynamic stability of each domain by equilibrium denaturation in urea. We then assayed the kinetics of domain opening and closing by a technique known as thiol exchange. For this we introduced a buried Cys at the identical location in each FNIII domain and measured the kinetics of labeling with DTNB over a range of urea concentrations. A global fit of the kinetics data gave the kinetics of spontaneous unfolding and refolding in zero urea. We found that the folding rates were relatively similar, ∼0.1–1 s−1, for the different domains. The unfolding rates varied widely and correlated with thermodynamic stability. Our study is the first to address this question using a set of domains that are structurally homologous but evolved with widely varying sequence identity and thermodynamic stability. These data add new evidence that thermodynamic stability correlates primarily with unfolding rate rather than folding rate. The study also has implications for the question of whether opening of FNIII domains contributes to the stretching of fibronectin matrix fibrils. PMID:27909052
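
    The link between rates and stability reported in this record can be made explicit under a simple two-state folding model. The two-state assumption and the symbols below are introduced here for illustration and are not stated in the abstract: with unfolding rate k_u and folding rate k_f in zero urea, the equilibrium constant and free energy of unfolding would be

```latex
K_u = \frac{k_u}{k_f}, \qquad
\Delta G_{\mathrm{unfold}} = -RT \ln K_u = RT \ln\frac{k_f}{k_u}
% If k_f is roughly constant across domains, a tenfold decrease in k_u changes stability by
\Delta\Delta G_{\mathrm{unfold}} = RT \ln 10 \approx 1.36\ \mathrm{kcal\,mol^{-1}} \quad (T = 298\ \mathrm{K})
```

    Under this assumption, domains with similar folding rates (about 0.1 to 1 s−1 here) can differ in stability only through their unfolding rates, which is consistent with the correlation the authors report.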

  18. Two amino acid residues confer different binding affinities of Abelson family kinase SRC homology 2 domains for phosphorylated cortactin.

    PubMed

    Gifford, Stacey M; Liu, Weizhi; Mader, Christopher C; Halo, Tiffany L; Machida, Kazuya; Boggon, Titus J; Koleske, Anthony J

    2014-07-11

    The closely related Abl family kinases, Arg and Abl, play important non-redundant roles in the regulation of cell morphogenesis and motility. Despite similar N-terminal sequences, Arg and Abl interact with different substrates and binding partners with varying affinities. This selectivity may be due to slight differences in amino acid sequence leading to differential interactions with target proteins. We report that the Arg Src homology (SH) 2 domain binds two specific phosphotyrosines on cortactin, a known Abl/Arg substrate, with over 10-fold higher affinity than the Abl SH2 domain. We show that this significant affinity difference is due to the substitution of arginine 161 and serine 187 in Abl to leucine 207 and threonine 233 in Arg, respectively. We constructed Abl SH2 domains with R161L and S187T mutations alone and in combination and find that these substitutions are sufficient to convert the low affinity Abl SH2 domain to a higher affinity "Arg-like" SH2 domain in binding to a phospho-cortactin peptide. We crystallized the Arg SH2 domain for structural comparison to existing crystal structures of the Abl SH2 domain. We show that these two residues are important determinants of Arg and Abl SH2 domain binding specificity. Finally, we expressed Arg containing an "Abl-like" low affinity mutant Arg SH2 domain (L207R/T233S) and find that this mutant, although properly localized to the cell periphery, does not support wild type levels of cell edge protrusion. Together, these observations indicate that these two amino acid positions confer different binding affinities and cellular functions on the distinct Abl family kinases. © 2014 by The American Society for Biochemistry and Molecular Biology, Inc.

  19. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zweifel, M.; Leahy, D.; Barrick, D.

    Deltex is a cytosolic effector of Notch signaling thought to bind through its N-terminal domain to the Notch receptor. Here we report the structure of the Drosophila Deltex N-terminal domain, which contains two tandem WWE sequence repeats. The WWE repeats, which adopt a novel fold, are related by an approximate two-fold axis of rotation. Although the WWE repeats are structurally distinct, they interact extensively and form a deep cleft at their junction that appears well suited for ligand binding. The two repeats are thermodynamically coupled; this coupling is mediated in part by a conserved segment that is immediately C-terminal to the second WWE domain. We demonstrate that although the Deltex WWE tandem is monomeric in solution, it forms a heterodimer with the ankyrin domain of the Notch receptor. These results provide structural and functional insight into how Deltex modulates Notch signaling, and how WWE modules recognize targets for ubiquitination.

  20. Structure of Serum Amyloid A Suggests a Mechanism for Selective Lipoprotein Binding and Functions: SAA as a Hub in Macromolecular Interaction Networks

    PubMed Central

    Frame, Nicholas M.; Gursky, Olga

    2016-01-01

    Serum amyloid A is a major acute-phase plasma protein that modulates innate immunity and cholesterol homeostasis. We combine sequence analysis with x-ray crystal structures to postulate that SAA acts as an intrinsically disordered hub mediating interactions among proteins, lipids and proteoglycans. A structural model of lipoprotein-bound SAA monomer is proposed wherein two α-helices from the N-domain form a concave hydrophobic surface that binds lipoproteins. A C-domain, connected to the N-domain via a flexible linker, binds polar/charged ligands including cell receptors, bridging them with lipoproteins and re-routing cholesterol transport. Our model is supported by the SAA cleavage in the inter-domain linker to generate the 1–76 fragment deposited in reactive amyloidosis. This model sheds new light on functions of this enigmatic protein. PMID:26918388

  1. Sequence analysis of serum albumins reveals the molecular evolution of ligand recognition properties.

    PubMed

    Fanali, Gabriella; Ascenzi, Paolo; Bernardi, Giorgio; Fasano, Mauro

    2012-01-01

    Serum albumin (SA) is a circulating protein providing a depot and carrier for many endogenous and exogenous compounds. At least seven major binding sites have been identified by structural and functional investigations mainly in human SA. SA is conserved in vertebrates, with at least 49 entries in protein sequence databases. The multiple sequence analysis of this set of entries leads to the definition of a cladistic tree for the molecular evolution of SA orthologs in vertebrates, thus showing the clustering of the considered species, with lamprey SAs (Lethenteron japonicum and Petromyzon marinus) in a separate outgroup. Sequence analysis aimed at searching conserved domains revealed that most SA sequences are made up by three repeated domains (about 600 residues), as extensively characterized for human SA. On the contrary, lamprey SAs are giant proteins (about 1400 residues) comprising seven repeated domains. The phylogenetic analysis of the SA family reveals a stringent correlation with the taxonomic classification of the species available in sequence databases. A focused inspection of the sequences of ligand binding sites in SA revealed that in all sites most residues involved in ligand binding are conserved, although the versatility towards different ligands could be peculiar of higher organisms. Moreover, the analysis of molecular links between the different sites suggests that allosteric modulation mechanisms could be restricted to higher vertebrates.

  2. In silico characterization and analysis of RTBP1 and NgTRF1 protein through MD simulation and molecular docking - A comparative study.

    PubMed

    Mukherjee, Koel; Pandey, Dev Mani; Vidyarthi, Ambarish Saran

    2015-02-06

    Gaining access to sequence and structure information of telomere binding proteins helps in understanding the essential biological processes involved in conserved sequence specific interaction between DNA and the proteins. Rice telomere binding protein (RTBP1) and Nicotiana glutinosa telomere repeat binding factor (NgTRF1) are helix turn helix motif type proteins that play a role in telomeric DNA protection and length regulation. Both proteins share the same type of domain, but until now there have been very few reports of in silico studies of these complete proteins. Here we present a comparative study of the two proteins through modeling of the complete proteins, physicochemical characterization, MD simulation and DNA-protein docking. I-TASSER and CLC protein work bench were used to determine the protein 3D structures as well as the different parameters used to characterize the proteins. MD simulation was carried out with the GROMOS force field in GROMACS for 10 ns. The simulated 3D structures were docked with template DNA (3D DNA modeled through 3D-DART) of the TTTAGGG conserved sequence motif using the HADDOCK web server. The analysis revealed that around 120 amino acids in the tail part show good sequence similarity between the proteins. Molecular modeling, sequence characterization and secondary structure prediction also indicate similarity between the proteins' structures and sequences. The MD simulation results, in terms of RMSD, RMSF, Rg, PCA and energy plots, also convey a similar type of motional behavior between them. The best complex formed by each protein in the docking results points to the first interaction site, which is mainly the helix3 region of the DNA binding domain. The overall computational analysis reveals that RTBP1 and NgTRF1 proteins display a good amount of similarity in their physicochemical properties, structure, dynamics and binding mode.

  3. In Silico Characterization and Analysis of RTBP1 and NgTRF1 Protein Through MD Simulation and Molecular Docking: A Comparative Study.

    PubMed

    Mukherjee, Koel; Pandey, Dev Mani; Vidyarthi, Ambarish Saran

    2015-09-01

    Gaining access to sequence and structure information of telomere-binding proteins helps in understanding the essential biological processes involved in conserved sequence-specific interaction between DNA and the proteins. Rice telomere-binding protein (RTBP1) and Nicotiana glutinosa telomere repeat binding factor (NgTRF1) are helix-turn-helix motif type proteins that play a role in telomeric DNA protection and length regulation. Both proteins share the same type of domain, but until now there have been very few reports of in silico studies of these complete proteins. Here we present a comparative study of the two proteins through modeling of the complete proteins, physicochemical characterization, MD simulation and DNA-protein docking. I-TASSER and CLC protein work bench were used to determine the protein 3D structures as well as the different parameters used to characterize the proteins. MD simulation was carried out with the GROMOS force field in GROMACS for 10 ns. The simulated 3D structures were docked with template DNA (3D DNA modeled through 3D-DART) of the TTTAGGG conserved sequence motif using the HADDOCK Web server. The analysis revealed that around 120 amino acids in the tail part show good sequence similarity between the proteins. Molecular modeling, sequence characterization and secondary structure prediction also indicate similarity between the proteins' structures and sequences. The MD simulation results, in terms of RMSD, RMSF, Rg, PCA and energy plots, also convey a similar type of motional behavior between them. The best complex formed by each protein in the docking results points to the first interaction site, which is mainly the helix3 region of the DNA-binding domain. The overall computational analysis reveals that RTBP1 and NgTRF1 proteins display a good amount of similarity in their physicochemical properties, structure, dynamics and binding mode.

  4. Synthesis and Structural Characterization of Reflectin Proteins

    DTIC Science & Technology

    2012-02-29

    constructs of interest included a reflectin 1a domain 3 (D3) monomer, a domain 3 dimer, subdomain peptides, recombinant reflectin 1b, an elastin-reflectin...diblock copolymer, and an elastin-reflectin-GFP fusion protein. After construction of the sequences of interest at the DNA level, protein expression...characterization was performed. The unique spectral properties associated with recombinant reflectin protein materials make elastin-reflectin

  5. Evolutionary analysis of a novel zinc ribbon in the N-terminal region of threonine synthase.

    PubMed

    Kaur, Gurmeet; Subramanian, Srikrishna

    2017-10-18

    Threonine synthase (TS) catalyzes the terminal reaction in the biosynthetic pathway of threonine and requires pyridoxal phosphate as a cofactor. TSs share a common catalytic domain with other fold type II PALP dependent enzymes. TSs are broadly grouped into two classes based on their sequence, quaternary structure, and enzyme regulation. We report the presence of a novel zinc ribbon domain in the N-terminal region preceding the catalytic core in TS. The zinc ribbon domain is present in TSs belonging to both classes. Our sequence analysis reveals that archaeal TSs possess all zinc chelating residues to bind a metal ion that are lacking in the structurally characterized homologs. Phylogenetic analysis suggests that TSs with an N-terminal zinc ribbon likely represent the ancestral state of the enzyme while TSs without a zinc ribbon must have diverged later in specific lineages. The zinc ribbon and its N- and C-terminal extensions are important for enzyme stability, activity and regulation. It is likely that the zinc ribbon domain is involved in higher order oligomerization or mediating interactions with other biomolecules leading to formation of larger metabolic complexes.

  6. Translational control of ribosomal protein S15.

    PubMed

    Portier, C; Philippe, C; Dondon, L; Grunberg-Manago, M; Ebel, J P; Ehresmann, B; Ehresmann, C

    1990-08-27

    The expression of ribosomal protein S15 is shown to be translationally and negatively autocontrolled using a fusion within a reporter gene. Isolation and characterization of several deregulated mutants indicate that the regulatory site (the translational operator site) overlaps the ribosome loading site of the S15 messenger. In this region, three domains, each exhibiting a stem-loop structure, were determined using chemical and enzymatic probes. The most downstream hairpin carries the Shine-Dalgarno sequence and the initiation codon. Genetic and structural data derived from mutants constructed by site-directed mutagenesis show that the operator is a dynamic structure, two domains of which can form a pseudoknot. Binding of S15 to these two domains suggests that the pseudoknot could be stabilized by S15. A model is presented in which two alternative structures would explain the molecular basis of the S15 autocontrol.

  7. Coiled-coil intermediate filament stutter instability and molecular unfolding.

    PubMed

    Arslan, Melis; Qin, Zhao; Buehler, Markus J

    2011-05-01

    Intermediate filaments (IFs) are the key components of cytoskeleton in eukaryotic cells and are critical for cell mechanics. The building block of IFs is a coiled-coil alpha-helical dimer, consisting of several domains that include linkers and other structural discontinuities. One of the discontinuities in the dimer's coiled-coil region is the so-called 'stutter' region. The stutter is a region where a variation of the amino acid sequence pattern from other parts of the alpha-helical domains of the protein is found. It was suggested in earlier works that due to this sequence variation, the perfect coiled-coil arrangement ceases to exist. Here, we show using explicit water molecular dynamics and well-tempered metadynamics that for the coil2 domain of vimentin IFs the stutter is more stable in a non-alpha-helical, unfolded state. This causes a local structural disturbance in the alpha helix, which has a global effect on the nanomechanics of the structure. Our analysis suggests that the stutter features an enhanced tendency to unfolding even under the absence of external forces, implying a much greater structural instability than previously assumed. As a result it features a smaller local bending stiffness than other segments and presents a seed for the initiation of molecular bending and unfolding at large deformation.

  8. System Specificity of the TpsB Transporters of Coexpressed Two-Partner Secretion Systems of Neisseria meningitidis

    PubMed Central

    ur Rahman, Sadeeq

    2013-01-01

    The two-partner secretion (TPS) systems of Gram-negative bacteria consist of a large secreted exoprotein (TpsA) and a transporter protein (TpsB) located in the outer membrane. TpsA targets TpsB for transport across the membrane via its ∼30-kDa TPS domain located at its N terminus, and this domain is also the minimal secretory unit. Neisseria meningitidis genomes encode up to five TpsAs and two TpsBs. Sequence alignments of TPS domains suggested that these are organized into three systems, while there are two TpsBs, which raised questions on their system specificity. We show here that the TpsB2 transporter of Neisseria meningitidis is able to secrete all types of TPS domains encoded in N. meningitidis and the related species Neisseria lactamica but not domains of Haemophilus influenzae and Pseudomonas aeruginosa. In contrast, the TpsB1 transporter seemed to be specific for its cognate N. meningitidis system and did not secrete the TPS domains of other meningococcal systems. However, TpsB1 did secrete the TPS2b domain of N. lactamica, which is related to the meningococcal TPS2 domains. Apparently, the secretion depends on specific sequences within the TPS domain rather than the overall TPS domain structure. PMID:23222722

  9. Unusual features of fibrillarin cDNA and gene structure in Euglena gracilis: evolutionary conservation of core proteins and structural predictions for methylation-guide box C/D snoRNPs throughout the domain Eucarya.

    PubMed

    Russell, Anthony G; Watanabe, Yoh-ichi; Charette, J Michael; Gray, Michael W

    2005-01-01

    Box C/D ribonucleoprotein (RNP) particles mediate O2'-methylation of rRNA and other cellular RNA species. In higher eukaryotic taxa, these RNPs are more complex than their archaeal counterparts, containing four core protein components (Snu13p, Nop56p, Nop58p and fibrillarin) compared with three in Archaea. This increase in complexity raises questions about the evolutionary emergence of the eukaryote-specific proteins and structural conservation in these RNPs throughout the eukaryotic domain. In protists, the primarily unicellular organisms comprising the bulk of eukaryotic diversity, the protein composition of box C/D RNPs has not yet been extensively explored. This study describes the complete gene, cDNA and protein sequences of the fibrillarin homolog from the protozoon Euglena gracilis, the first such information to be obtained for a nucleolus-localized protein in this organism. The E. gracilis fibrillarin gene contains a mixture of intron types exhibiting markedly different sizes. In contrast to most other E. gracilis mRNAs characterized to date, the fibrillarin mRNA lacks a spliced leader (SL) sequence. The predicted fibrillarin protein sequence itself is unusual in that it contains a glycine-lysine (GK)-rich domain at its N-terminus rather than the glycine-arginine-rich (GAR) domain found in most other eukaryotic fibrillarins. In an evolutionarily diverse collection of protists that includes E. gracilis, we have also identified putative homologs of the other core protein components of box C/D RNPs, thereby providing evidence that the protein composition seen in the higher eukaryotic complexes was established very early in eukaryotic cell evolution.

  10. Crystal structure of the second fibronectin type III (FN3) domain from human collagen α1 type XX.

    PubMed

    Zhao, Jingfeng; Ren, Jixia; Wang, Nan; Cheng, Zhong; Yang, Runmei; Lin, Gen; Guo, Yi; Cai, Dayong; Xie, Yong; Zhao, Xiaohong

    2017-12-01

    Collagen α1 type XX, which contains fibronectin type III (FN3) repeats involving six FN3 domains (referred to as the FN#1-FN#6 domains), is an unusual member of the fibril-associated collagens with interrupted triple helices (FACIT) subfamily of collagens. The results of standard protein BLAST suggest that the FN3 repeats might contribute to collagen α1 type XX acting as a cytokine receptor. To date, solution NMR structures of the FN#3, FN#4 and FN#6 domains have been determined. To obtain further structural evidence to understand the relationship between the structure and function of the FN3 repeats from collagen α1 type XX, the crystal structure of the FN#2 domain from human collagen α1 type XX (residues Pro386-Pro466; referred to as FN2-HCXX) was solved at 2.5 Å resolution. The crystal structure of FN2-HCXX shows an immunoglobulin-like fold containing a β-sandwich structure, which is formed by a three-stranded β-sheet (β1, β2 and β5) packed onto a four-stranded β-sheet (β3, β4, β6 and β7). Two consensus domains, tencon and fibcon, are structural analogues of FN2-HCXX. Fn8, an FN3 domain from human oncofoetal fibronectin, is the closest structural analogue of FN2-HCXX derived from a naturally occurring sequence. Based solely on the structural similarity of FN2-HCXX to other FN3 domains, the detailed functions of FN2-HCXX and the FN3 repeats in collagen α1 type XX cannot be identified.

  11. Structural and evolutionary relationships of "AT-less" type I polyketide synthase ketosynthases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lohman, Jeremy; Ma, Ming; Osipiuk, Jerzy

    2015-10-13

    Acyltransferase (AT)-less type I polyketide synthases (PKSs) break the type I PKS paradigm. They lack the integrated AT domains within their modules and instead use a discrete AT that acts in trans, whereas a type I PKS module minimally contains AT, acyl carrier protein (ACP), and ketosynthase (KS) domains. Structures of canonical type I PKS KS-AT didomains reveal structured linkers that connect the two domains. AT-less type I PKS KSs have remnants of these linkers, which have been hypothesized to be AT docking domains. Natural products produced by AT-less type I PKSs are very complex because of an increased representation of unique modifying domains. AT-less type I PKS KSs possess substrate specificity and fall into phylogenetic clades that correlate with their substrates, whereas canonical type I PKS KSs are monophyletic. We have solved crystal structures of seven AT-less type I PKS KS domains that represent various sequence clusters, revealing insight into the large structural and subtle amino acid residue differences that lead to unique active site topologies and substrate specificities. One set of structures represents a larger group of KS domains from both canonical and AT-less type I PKSs that accept amino acid-containing substrates. One structure has a partial AT-domain, revealing the structural consequences of a type I PKS KS evolving into an AT-less type I PKS KS. These structures highlight the structural diversity within the AT-less type I PKS KS family, and most important, provide a unique opportunity to study the molecular evolution of substrate specificity within the type I PKSs.

  12. Insights into Hsp70 Chaperone Activity from a Crystal Structure of the Yeast Hsp110 Sse1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Q.; Hendrickson, W.

    2007-01-01

    Classic Hsp70 chaperones assist in diverse processes of protein folding and translocation, and Hsp110s had seemed by sequence to be distant relatives within an Hsp70 superfamily. The 2.4 Å resolution structure of Sse1 with ATP shows that Hsp110s are indeed Hsp70 relatives, and it provides insight into allosteric coupling between sites for ATP and polypeptide-substrate binding in Hsp70s. Subdomain structures are similar in intact Sse1(ATP) and in the separate Hsp70 domains, but conformational dispositions are radically different. Interfaces between Sse1 domains are extensive, intimate, and conservative in sequence with Hsp70s. We propose that Sse1(ATP) may be an evolutionary vestige of the Hsp70(ATP) state, and an analysis of 64 mutant variants in Sse1 and three Hsp70 homologs supports this hypothesis. An atomic-level understanding of Hsp70 communication between ATP and substrate-binding domains follows. Requirements on Sse1 for yeast viability are in keeping with the distinct function of Hsp110s as nucleotide exchange factors.

  13. U14 small nucleolar RNA makes multiple contacts with the pre-ribosomal RNA.

    PubMed

    Morrissey, J P; Tollervey, D

    1997-06-01

    The small nucleolar RNA (snoRNA) U14 has two regions of extended primary sequence complementarity to the 18S rRNA. The 3' region (domain B) shows the consensus structure for the methylation guide class of snoRNAs, whereas base-pairing between the 5' region (domain A) and the 18S rRNA sequence is required for the formation of functional ribosomes. Between domains A and B lies another essential region (domain Y). Here we report that yeast U14 can be cross-linked in vivo to the pre-rRNA; cross-linking is detected exclusively with the 35S primary transcript. Many nucleotides in U14 that lie outside of domains A and B are cross-linked to the pre-rRNA; in particular the essential domain Y region is cross-linked at several sites. U14 is, therefore, in far more extensive contact with the pre-rRNA than predicted from simple base-pairing models. Moreover, U14 can be cross-linked to other small RNA species. The functional interactions made by U14 during ribosome synthesis are likely to be very complex.

  14. Statistical theory for protein combinatorial libraries. Packing interactions, backbone flexibility, and the sequence variability of a main-chain structure.

    PubMed

    Kono, H; Saven, J G

    2001-02-23

    Combinatorial experiments provide new ways to probe the determinants of protein folding and to identify novel folding amino acid sequences. These types of experiments, however, are complicated both by enormous conformational complexity and by large numbers of possible sequences. Therefore, a quantitative computational theory would be helpful in designing and interpreting these types of experiments. Here, we present and apply a statistically based, computational approach for identifying the properties of sequences compatible with a given main-chain structure. Protein side-chain conformations are included in an atom-based fashion. Calculations are performed for a variety of similar backbone structures to identify sequence properties that are robust with respect to minor changes in main-chain structure. Rather than specific sequences, the method yields the likelihood of each of the amino acids at preselected positions in a given protein structure. The theory may be used to quantify the characteristics of sequence space for a chosen structure without explicitly tabulating sequences. To account for hydrophobic effects, we introduce an environmental energy that is consistent with other simple hydrophobicity scales and show that it is effective for side-chain modeling. We apply the method to calculate the identity probabilities of selected positions of the immunoglobulin light chain-binding domain of protein L, for which many variant folding sequences are available. The calculations compare favorably with the experimentally observed identity probabilities.

  15. The structure of transcription termination factor Nrd1 reveals an original mode for GUAA recognition

    PubMed Central

    Franco-Echevarría, Elsa; González-Polo, Noelia; Zorrilla, Silvia; Martínez-Lumbreras, Santiago; Santiveri, Clara M.; Campos-Olivas, Ramón; Sánchez, Mar; Calvo, Olga

    2017-01-01

    Transcription termination of non-coding RNAs is regulated in yeast by a complex of three RNA binding proteins: Nrd1, Nab3 and Sen1. Nrd1 is central in this process by interacting with Rpb1 of RNA polymerase II, Trf4 of TRAMP and GUAA/G terminator sequences. We lack structural data for the last of these binding events. We determined the structures of Nrd1 RNA binding domain and its complexes with three GUAA-containing RNAs, characterized RNA binding energetics and tested rationally designed mutants in vivo. The Nrd1 structure shows an RRM domain fused with a second α/β domain that we name split domain (SD), because it is formed by two non-consecutive segments at each side of the RRM. The GUAA interacts with both domains and with a pocket of water molecules, trapped between the two stacking adenines and the SD. Comprehensive binding studies demonstrate for the first time that Nrd1 has a slight preference for GUAA over GUAG and genetic and functional studies suggest that Nrd1 RNA binding domain might play further roles in non-coding RNAs transcription termination. PMID:28973465

  16. RUDI, a short interspersed element of the V-SINE superfamily widespread in molluscan genomes.

    PubMed

    Luchetti, Andrea; Šatović, Eva; Mantovani, Barbara; Plohl, Miroslav

    2016-06-01

    Short interspersed elements (SINEs) are non-autonomous retrotransposons that are widespread in eukaryotic genomes. They exhibit a chimeric sequence structure consisting of a small RNA-related head, an anonymous body and an AT-rich tail. Although their turnover and de novo emergence is rapid, some SINE elements found in distantly related species retain similarity in certain core segments (or highly conserved domains, HCD). We have characterized a new SINE element named RUDI in the bivalve molluscs Ruditapes decussatus and R. philippinarum and found this element to be widely distributed in the genomes of a number of mollusc species. An unexpected structural feature of RUDI is the HCD domain type V, which was first found in non-amniote vertebrate SINEs and in the SINE from one cnidarian species. In addition to the V domain, the overall sequence conservation pattern of RUDI elements resembles that found in ancient AmnSINE (~310 Myr old) and Au SINE (~320 Myr old) families, suggesting that RUDI might be among the most ancient SINE families. Sequence conservation suggests a monophyletic origin of RUDI. Nucleotide variability and phylogenetic analyses suggest long-term vertical inheritance combined with at least one horizontal transfer event as the most parsimonious explanation for the observed taxonomic distribution.

  17. Novel protein domains and repeats in Drosophila melanogaster: insights into structure, function, and evolution.

    PubMed

    Ponting, C P; Mott, R; Bork, P; Copley, R R

    2001-12-01

    Sequence database searching methods, such as BLAST, are invaluable for predicting molecular function on the basis of sequence similarities among single regions of proteins. Searches of whole databases, however, are not optimized to detect multiple homologous regions within a single polypeptide. Here we have used the prospero algorithm to perform self-comparisons of all predicted Drosophila melanogaster gene products. Predicted repeats, and their homologs from all species, were analyzed further to detect hitherto unappreciated evolutionary relationships. Results included the identification of novel tandem repeats in the human X-linked retinitis pigmentosa type-2 gene product, repeated segments in cystinosin, associated with a defect in cystine transport, and 'nested' homologous domains in dysferlin, whose gene is mutated in limb girdle muscular dystrophy. Novel signaling domain families were found that may regulate the microtubule-based cytoskeleton and ubiquitin-mediated proteolysis, respectively. Two families of glycosyl hydrolases were shown to contain internal repetitions that hint at their evolution via a piecemeal, modular approach. In addition, three examples of fruit fly genes were detected with tandem exons that appear to have arisen via internal duplication. These findings demonstrate how completely sequenced genomes can be exploited to further understand the relationships between molecular structure, function, and evolution.

  18. Functional display of platelet-binding VWF fragments on filamentous bacteriophage.

    PubMed

    Yee, Andrew; Tan, Fen-Lai; Ginsburg, David

    2013-01-01

    von Willebrand factor (VWF) tethers platelets to sites of vascular injury via interaction with the platelet surface receptor, GPIb. To further define the VWF sequences required for VWF-platelet interaction, a phage library displaying random VWF protein fragments was screened against formalin-fixed platelets. After 3 rounds of affinity selection, DNA sequencing of platelet-bound clones identified VWF peptides mapping exclusively to the A1 domain. Aligning these sequences defined a minimal, overlapping segment spanning P1254-A1461, which encompasses the C1272-C1458 cystine loop. Analysis of phage carrying a mutated A1 segment (C1272/1458A) confirmed the requirement of the cystine loop for optimal binding. Four rounds of affinity maturation of a randomly mutagenized A1 phage library identified 10 and 14 unique mutants associated with enhanced platelet binding in the presence and absence of botrocetin, respectively, with 2 mutants (S1370G and I1372V) common to both conditions. These results demonstrate the utility of filamentous phage for studying VWF protein structure-function and identify a minimal, contiguous peptide that binds to formalin-fixed platelets, confirming the importance of the VWF A1 domain with no evidence for another independently platelet-binding segment within VWF. These findings also point to key structural elements within the A1 domain that regulate VWF-platelet adhesion.

  19. Domain-General Mechanisms for Speech Segmentation: The Role of Duration Information in Language Learning

    PubMed Central

    2016-01-01

    Speech segmentation is supported by multiple sources of information that may either inform language processing specifically, or serve learning more broadly. The Iambic/Trochaic Law (ITL), where increased duration indicates the end of a group and increased emphasis indicates the beginning of a group, has been proposed as a domain-general mechanism that also applies to language. However, language background has been suggested to modulate use of the ITL, meaning that these perceptual grouping preferences may instead be a consequence of language exposure. To distinguish between these accounts, we exposed native-English and native-Japanese listeners to sequences of speech (Experiment 1) and nonspeech stimuli (Experiment 2), and examined segmentation using a 2AFC task. Duration was manipulated over 3 conditions: sequences contained either an initial-item duration increase, or a final-item duration increase, or items of uniform duration. In Experiment 1, language background did not affect the use of duration as a cue for segmenting speech in a structured artificial language. In Experiment 2, the same results were found for grouping structured sequences of visual shapes. The results are consistent with proposals that duration information draws upon a domain-general mechanism that can apply to the special case of language acquisition. PMID:27893268

  20. Co-Conserved MAPK Features Couple D-Domain Docking Groove to Distal Allosteric Sites via the C-Terminal Flanking Tail

    PubMed Central

    Nguyen, Tuan; Ruan, Zheng; Oruganty, Krishnadev; Kannan, Natarajan

    2015-01-01

    Mitogen activated protein kinases (MAPKs) form a closely related family of kinases that control critical pathways associated with cell growth and survival. Although MAPKs have been extensively characterized at the biochemical, cellular, and structural level, an integrated evolutionary understanding of how MAPKs differ from other closely related protein kinases is currently lacking. Here, we perform statistical sequence comparisons of MAPKs and related protein kinases to identify sequence and structural features associated with MAPK functional divergence. We show, for the first time, that virtually all MAPK-distinguishing sequence features, including an unappreciated short insert segment in the β4-β5 loop, physically couple distal functional sites in the kinase domain to the D-domain peptide docking groove via the C-terminal flanking tail (C-tail). The coupling mediated by MAPK-specific residues confers an allosteric regulatory mechanism unique to MAPKs. In particular, the regulatory αC-helix conformation is controlled by a MAPK-conserved salt bridge interaction between an arginine in the αC-helix and an acidic residue in the C-tail. The salt-bridge interaction is modulated in unique ways in individual sub-families to achieve regulatory specificity. Our study is consistent with a model in which the C-tail co-evolved with the D-domain docking site to allosterically control MAPK activity. Our study provides testable mechanistic hypotheses for biochemical characterization of MAPK-conserved residues and new avenues for the design of allosteric MAPK inhibitors. PMID:25799139

  1. Domain structure, GTP-hydrolyzing activity and 7S RNA binding of Acidianus ambivalens ffh-homologous protein suggest an SRP-like complex in archaea.

    PubMed

    Moll, R; Schmidtke, S; Schäfer, G

    1999-01-01

    In this study we provide, for the first time, experimental evidence that a protein homologous to bacterial Ffh is part of an SRP-like ribonucleoprotein complex in hyperthermophilic archaea. The gene encoding the Ffh homologue in the hyperthermophilic archaeote Acidianus ambivalens has been cloned and sequenced. Recombinant Ffh protein was expressed in E. coli and subjected to biochemical and functional studies. A. ambivalens Ffh encodes a 50.4-kDa protein that is structured by three distinct regions: the N-terminal hydrophilic N-region (N), the GTP/GDP-binding domain (G) and a C-terminal located C-domain (C). The A. ambivalens Ffh sequence shares 44-46% sequence similarity with Ffh of methanogenic archaea, 34-36% similarity with eukaryal SRP54 and 30-34% similarity with bacterial Ffh. A polyclonal antiserum raised against the first two domains of A. ambivalens Ffh reacts specifically with a single protein (apparent molecular mass: 46 kDa, termed p46) present in cytosolic and in plasma membrane cell fractions of A. ambivalens. Recombinant Ffh has a melting point of tm = 89 °C. Its intrinsic GTPase activity obviously depends on neutral pH and low ionic strength with a preference for chloride and acetate salts. Highest rates of GTP hydrolysis have been achieved at 81 °C in presence of 0.1-1 mM Mg2+. GTP hydrolysis is significantly inhibited by high glycerol concentrations, and the GTP hydrolysis rate also markedly decreases by addition of detergents. The Km for GTP is 13.7 µM at 70 °C and GTP hydrolysis is strongly inhibited by GDP (Ki = 8 µM). A. ambivalens Ffh, which includes an RNA-binding motif in the C-terminal domain, is shown to bind specifically to 7S RNA of the related crenarchaeote Sulfolobus solfataricus. Comparative sequence analysis reveals the presence of typical signal sequences in plasma membrane as well as extracellular proteins of hyperthermophilic crenarchaea, which strongly suggests recognition events by an Ffh-containing SRP-like particle in these organisms.

  2. Transitive homology-guided structural studies lead to discovery of Cro proteins with 40% sequence identity but different folds

    PubMed Central

    Roessler, Christian G.; Hall, Branwen M.; Anderson, William J.; Ingram, Wendy M.; Roberts, Sue A.; Montfort, William R.; Cordes, Matthew H. J.

    2008-01-01

    Proteins that share common ancestry may differ in structure and function because of divergent evolution of their amino acid sequences. For a typical diverse protein superfamily, the properties of a few scattered members are known from experiment. A satisfying picture of functional and structural evolution in relation to sequence changes, however, may require characterization of a larger, well chosen subset. Here, we employ a “stepping-stone” method, based on transitive homology, to target sequences intermediate between two related proteins with known divergent properties. We apply the approach to the question of how new protein folds can evolve from preexisting folds and, in particular, to an evolutionary change in secondary structure and oligomeric state in the Cro family of bacteriophage transcription factors, initially identified by sequence-structure comparison of distant homologs from phages P22 and λ. We report crystal structures of two Cro proteins, Xfaso 1 and Pfl 6, with sequences intermediate between those of P22 and λ. The domains show 40% sequence identity but differ by switching of α-helix to β-sheet in a C-terminal region spanning ≈25 residues. Sedimentation analysis also suggests a correlation between helix-to-sheet conversion and strengthened dimerization. PMID:18227506
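
    The 40% sequence identity quoted above is a pairwise measure over aligned positions. As a minimal sketch of how such a figure can be computed, the Python below counts matching residues over ungapped columns of a pre-aligned pair. The function name `percent_identity`, the gap character `-`, and the toy fragments are all assumptions made for illustration; they are not the Xfaso 1 or Pfl 6 sequences, and the record does not state which identity convention the authors used.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap (one common convention)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment, for illustration only.
toy_a = "MKTAY-LKQE"
toy_b = "MRTAYGLHQD"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

    Other conventions divide by the full alignment length or by the shorter sequence length, so a reported identity is only comparable when the denominator is known.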

  3. Serine protease-related proteins in the malaria mosquito, Anopheles gambiae.

    PubMed

    Cao, Xiaolong; Gulati, Mansi; Jiang, Haobo

    2017-09-01

    Insect serine proteases (SPs) and serine protease homologs (SPHs) participate in digestion, defense, development, and other physiological processes. In mosquitoes, some clip-domain SPs and SPHs (i.e. CLIPs) have been investigated for possible roles in antiparasitic responses. In a recent test aimed at improving quality of gene models in the Anopheles gambiae genome using RNA-seq data, we observed various discrepancies between gene models in AgamP4.5 and corresponding sequences selected from those modeled by Cufflinks, Trinity and Bridger. Here we report a comparative analysis of the 337 SP-related proteins in A. gambiae by examining their domain structures, sequence diversity, chromosomal locations, and expression patterns. One hundred and ten CLIPs contain 1 to 5 clip domains in addition to their protease domains (PDs) or non-catalytic, protease-like domains (PLDs). They are divided into five subgroups: CLIPAs (22) are clip(1-5)-PLD; CLIPBs (29), CLIPCs (12) and CLIPDs (14) are mainly clip-PD; most CLIPEs (33) have a domain structure of PD/PLD-PLD-clip-PLD(0-1). While expression of the CLIP genes in group-1 is generally low and detected in various tissue- and stage-specific RNA-seq libraries, some putative GPs/GPHs (i.e. single domain gut SPs/SPHs) in group-2 are highly expressed in midgut, whole larva or whole adult libraries. In comparison, 46 SPs, 26 SPHs, and 37 multi-domain SPs/SPHs (i.e. PD/PLD-PLD(≥1)) in group-3 do not seem to be specifically expressed in digestive tract. There are 16 SPs and 2 SPHs containing other types of putative regulatory domains (e.g. LDLa, CUB, Gd). Of the 337 SP and SPH genes, 159 were sorted into 46 groups (2-8 members/group) based on similar phylogenetic tree position, chromosomal location, and expression profile. This information and analysis, including improved gene models and protein sequences, constitute a solid foundation for functional analysis of the SP-related proteins in A. gambiae. Copyright © 2017 Elsevier Ltd. All rights reserved.

  4. Structure of N-acetyl-β-D-glucosaminidase (GcnA) from the Endocarditis Pathogen Streptococcus gordonii and its Complex with the Mechanism-based Inhibitor NAG-thiazoline

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Langley, David B.; Harty, Derek W.S.; Jacques, Nicholas A.

    2008-09-17

    The crystal structure of GcnA, an N-acetyl-β-D-glucosaminidase from Streptococcus gordonii, was solved by multiple wavelength anomalous dispersion phasing using crystals of selenomethionine-substituted protein. GcnA is a homodimer with subunits each comprised of three domains. The structure of the C-terminal α-helical domain has not been observed previously and forms a large dimerization interface. The fold of the N-terminal domain is observed in all structurally related glycosidases although its function is unknown. The central domain has a canonical (β/α)8 TIM-barrel fold which harbours the active site. The primary sequence and structure of this central domain identifies the enzyme as a family 20 glycosidase. Key residues implicated in catalysis have different conformations in two different crystal forms, which probably represent active and inactive conformations of the enzyme. The catalytic mechanism for this class of glycoside hydrolase, where the substrate rather than the enzyme provides the cleavage-inducing nucleophile, has been confirmed by the structure of GcnA complexed with a putative reaction intermediate analogue, N-acetyl-β-D-glucosamine-thiazoline. The catalytic mechanism is discussed in light of these and other family 20 structures.

  5. Fast H-DROP: A thirty times accelerated version of H-DROP for interactive SVM-based prediction of helical domain linkers

    NASA Astrophysics Data System (ADS)

    Richa, Tambi; Ide, Soichiro; Suzuki, Ryosuke; Ebina, Teppei; Kuroda, Yutaka

    2017-02-01

    Efficient and rapid prediction of domain regions from amino acid sequence information alone is often required for swift structural and functional characterization of large multi-domain proteins. Here we introduce Fast H-DROP, a thirty times accelerated version of our previously reported H-DROP (Helical Domain linker pRediction using OPtimal features), which is unique in specifically predicting helical domain linkers (boundaries). Fast H-DROP, analogously to H-DROP, uses optimum features selected from a set of 3000 features by combining a random forest and a stepwise feature selection protocol. We reduced the computational time from 8.5 min per sequence in H-DROP to 14 s per sequence in Fast H-DROP on an 8 Xeon processor Linux server by using SWISS-PROT instead of the Genbank non-redundant (nr) database for generating the PSSMs. The sensitivity and precision of Fast H-DROP assessed by cross-validation were 33.7 and 36.2%, which were merely 2% lower than those of H-DROP. The reduced computational time of Fast H-DROP, without affecting prediction performance, makes it more interactive and user-friendly. Fast H-DROP and H-DROP are freely available from http://domserv.lab.tuat.ac.jp/.
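
    The figures in this record rest on three simple ratios: the speed-up between the two run times, sensitivity, and precision. The Python sketch below spells them out under the usual definitions (sensitivity = TP / (TP + FN), precision = TP / (TP + FP)). The function names and the confusion counts in the example are hypothetical and are not taken from the Fast H-DROP evaluation; only the 8.5 min and 14 s run times come from the abstract.

```python
def speedup(old_seconds: float, new_seconds: float) -> float:
    """Ratio of the old run time to the new run time."""
    if new_seconds <= 0:
        raise ValueError("new_seconds must be positive")
    return old_seconds / new_seconds


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real linkers that were predicted: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of predicted linkers that are real: TP / (TP + FP)."""
    return true_pos / (true_pos + false_pos)


# Run times quoted in the abstract: 8.5 min versus 14 s per sequence.
print(f"speed-up: {speedup(8.5 * 60, 14):.1f}x")

# Hypothetical counts, for illustration only.
print(f"sensitivity: {sensitivity(30, 70):.3f}")
print(f"precision:   {precision(30, 50):.3f}")
```

    With the quoted run times the ratio is 510 s / 14 s, about 36, which matches the order of magnitude of the "thirty times" in the title.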

  6. Structure, organization and expression of common carp (Cyprinus carpio L.) SLP-76 gene.

    PubMed

    Huang, Rong; Sun, Xiao-Feng; Hu, Wei; Wang, Ya-Ping; Guo, Qiong-Lin

    2008-05-01

    SLP-76 is an important member of the SLP-76 family of adapters, and it plays a key role in TCR signaling and T cell function. Partial cDNA sequence of SLP-76 of common carp (Cyprinus carpio L.) was isolated from thymus cDNA library by the method of suppression subtractive hybridization (SSH). Subsequently, the full length cDNA of carp SLP-76 was obtained by means of 3' RACE and 5' RACE, respectively. The full length cDNA of carp SLP-76 was 2007 bp, consisting of a 5'-terminal untranslated region (UTR) of 285 bp, a 3'-terminal UTR of 240 bp, and an open reading frame of 1482 bp. Sequence comparison showed that the deduced amino acid sequence of carp SLP-76 had an overall similarity of 34-73% to that of other species homologues, and it was composed of an NH2-terminal domain, a central proline-rich domain, and a C-terminal SH2 domain. Amino acid sequence analysis indicated the existence of a Gads binding site R-X-X-K, a 10-aa-long sequence which binds to the SH3 domain of LCK in vitro, and three conserved tyrosine-containing sequences in the NH2-terminal domain. Then we used PCR to obtain a genomic DNA which covers the entire coding region of carp SLP-76. In the 9.2-kb-long genomic sequence, twenty-one exons and twenty introns were identified. RT-PCR results showed that carp SLP-76 was expressed predominantly in hematopoietic tissues, and was upregulated in thymus tissue of four-month-old carp compared to one-year-old carp. RT-PCR and virtual northern hybridization results showed that carp SLP-76 was also upregulated in thymus tissue of GH transgenic carp at the age of four months. These results suggest that the expression level of SLP-76 gene may be related to thymocyte development in teleosts.
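
    The lengths reported for the carp SLP-76 cDNA can be cross-checked with a little arithmetic. The Python sketch below sums the three segments, converts the open reading frame to codons, and checks the exon/intron count. The helper names are hypothetical, and the residue count assumes that the 1482 bp open reading frame includes the stop codon, which the abstract does not state.

```python
def cdna_length(utr5_bp: int, orf_bp: int, utr3_bp: int) -> int:
    """Total cDNA length as the sum of 5' UTR, ORF and 3' UTR."""
    return utr5_bp + orf_bp + utr3_bp


def codon_count(orf_bp: int) -> int:
    """Number of codons in an ORF; the length must be a multiple of three."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    return orf_bp // 3


def expected_introns(exon_count: int) -> int:
    """A linear gene with n exons has n - 1 introns between them."""
    return exon_count - 1


total = cdna_length(utr5_bp=285, orf_bp=1482, utr3_bp=240)
codons = codon_count(1482)
residues = codons - 1  # assumption: the ORF length includes the stop codon

print(total)                 # 2007
print(codons, residues)      # 494 493
print(expected_introns(21))  # 20
```

    The three segments add up to the stated 2007 bp, and twenty-one exons imply the twenty introns reported.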

  7. Putative Monofunctional Type I Polyketide Synthase Units: A Dinoflagellate-Specific Feature?

    PubMed Central

    Eichholz, Karsten; Beszteri, Bánk; John, Uwe

    2012-01-01

    Marine dinoflagellates (alveolata) are microalgae of which some cause harmful algal blooms and produce a broad variety of most likely polyketide synthesis derived phycotoxins. Recently, novel polyketide synthase (PKS) transcripts have been described from the Florida red tide dinoflagellate Karenia brevis (gymnodiniales) which are evolutionarily related to Type I PKS but were apparently expressed as monofunctional proteins, a feature typical of Type II PKS. Here, we investigated expression units of PKS I-like sequences in Alexandrium ostenfeldii (gonyaulacales) and Heterocapsa triquetra (peridiniales) at the transcript and protein level. The five full length transcripts we obtained were all characterized by polyadenylation, a 3′ UTR and the dinoflagellate specific spliced leader sequence at the 5′ end. Each of the five transcripts encoded a single ketoacylsynthase (KS) domain showing high similarity to K. brevis KS sequences. The monofunctional structure was also confirmed using dinoflagellate specific KS antibodies in Western Blots. In a maximum likelihood phylogenetic analysis of KS domains from diverse PKSs, dinoflagellate KSs formed a clade placed well within the protist Type I PKS clade between apicomplexa, haptophytes and chlorophytes. These findings indicate that the atypical PKS I structure, i.e., expression as putative monofunctional units, might be a dinoflagellate specific feature. In addition, the sequenced transcripts harbored a previously unknown, apparently dinoflagellate specific conserved N-terminal domain. We discuss the implications of this novel region with regard to the putative monofunctional organization of Type I PKS in dinoflagellates. PMID:23139807

  8. Genetic diversity and natural selection of Plasmodium knowlesi merozoite surface protein 1 paralog gene in Malaysia.

    PubMed

    Ahmed, Md Atique; Fauzi, Muh; Han, Eun-Taek

    2018-03-14

    Human infections due to the monkey malaria parasite Plasmodium knowlesi are on the rise in most Southeast Asian countries, specifically Malaysia. The C-terminal 19 kDa domain of PvMSP1P is a potential vaccine candidate; however, no study has been conducted in the orthologous gene of P. knowlesi. This study investigates the level of polymorphisms, haplotypes and natural selection of full-length pkmsp1p in clinical samples from Malaysia. A total of 36 full-length pkmsp1p sequences along with the reference H-strain and 40 C-terminal pkmsp1p sequences from clinical isolates of Malaysia were downloaded from published genomes. Genetic diversity, polymorphism, haplotype and natural selection were determined using DnaSP 5.10 and MEGA 5.0 software. Genealogical relationships were determined using haplotype network tree in NETWORK software v5.0. Population genetic differentiation index (FST) and population structure of parasite was determined using Arlequin v3.5 and STRUCTURE v2.3.4 software. Comparison of 36 full-length pkmsp1p sequences along with the H-strain identified 339 SNPs (175 non-synonymous and 164 synonymous substitutions). The nucleotide diversity across the full-length gene was low compared to its ortholog pvmsp1p. The nucleotide diversity was higher toward the N-terminal domains (pkmsp1p-83 and 30) compared to the C-terminal domains (pkmsp1p-38, 33 and 19). Phylogenetic analysis of full-length genes identified 2 distinct clusters of P. knowlesi from Malaysian Borneo. The 40 pkmsp1p-19 sequences showed low polymorphisms with 16 polymorphisms leading to 18 haplotypes. In total there were 10 synonymous and 6 non-synonymous substitutions and 12 cysteine residues were intact within the two EGF domains. Evidence of strong purifying selection was observed within the full-length sequences as well as in all the domains. Shared haplotypes of 40 pkmsp1p-19 were identified within Malaysian Borneo haplotypes. This study is the first to report on the genetic diversity and natural selection of pkmsp1p. A low level of genetic diversity and strong evidence of negative selection was detected and observed in all the domains of pkmsp1p of P. knowlesi, indicating functional constraints. Shared haplotypes were identified within pkmsp1p-19, highlighting the need for further evaluation using a larger number of clinical samples from Malaysia.
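
    Nucleotide diversity and haplotype counts, as reported in this record, are summary statistics over an alignment. The study used DnaSP for them; the Python below is only a minimal sketch of the textbook definitions (nucleotide diversity as the mean number of pairwise differences per site, and haplotypes as distinct sequences), written for gap-free aligned sequences of equal length. The function names and the four toy sequences are hypothetical and have nothing to do with the pkmsp1p data.

```python
from itertools import combinations


def pairwise_differences(seq_a: str, seq_b: str) -> int:
    """Number of aligned positions at which two sequences differ."""
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean pairwise differences per site over all sequence pairs."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(seqs, 2))
    total = sum(pairwise_differences(a, b) for a, b in pairs)
    return total / (len(pairs) * length)


def haplotype_count(seqs: list[str]) -> int:
    """Number of distinct sequences in the sample."""
    return len(set(seqs))


# Hypothetical toy alignment, for illustration only.
toy = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]
print(nucleotide_diversity(toy))  # 0.125
print(haplotype_count(toy))       # 3
```

    Dedicated packages handle gaps, missing data and sampling corrections differently, so values from this sketch should not be expected to match their output on real alignments.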

  9. Crystal structure of the DNA-binding domain of the LysR-type transcriptional regulator CbnR in complex with a DNA fragment of the recognition-binding site in the promoter region.

    PubMed

    Koentjoro, Maharani Pertiwi; Adachi, Naruhiko; Senda, Miki; Ogawa, Naoto; Senda, Toshiya

    2018-03-01

    LysR-type transcriptional regulators (LTTRs) are among the most abundant transcriptional regulators in bacteria. CbnR is an LTTR derived from Cupriavidus necator (formerly Alcaligenes eutrophus or Ralstonia eutropha) NH9 and is involved in transcriptional activation of the cbnABCD genes encoding chlorocatechol degradative enzymes. CbnR interacts with a cbnA promoter region of approximately 60 bp in length that contains the recognition-binding site (RBS) and activation-binding site (ABS). Upon inducer binding, CbnR seems to undergo conformational changes, leading to the activation of the transcription. Since the interaction of an LTTR with RBS is considered to be the first step of the transcriptional activation, the CbnR-RBS interaction is responsible for the selectivity of the promoter to be activated. To understand the sequence selectivity of CbnR, we determined the crystal structure of the DNA-binding domain of CbnR in complex with RBS of the cbnA promoter at 2.55 Å resolution. The crystal structure revealed details of the interactions between the DNA-binding domain and the promoter DNA. A comparison with the previously reported crystal structure of the DNA-binding domain of BenM in complex with its cognate RBS showed several differences in the DNA interactions, despite the structural similarity between CbnR and BenM. These differences explain the observed promoter sequence selectivity between CbnR and BenM. Particularly, the difference between Thr33 in CbnR and Ser33 in BenM appears to affect the conformations of neighboring residues, leading to the selective interactions with DNA. Atomic coordinates and structure factors for the DNA-binding domain of Cupriavidus necator NH9 CbnR in complex with RBS are available in the Protein Data Bank under the accession code 5XXP. © 2018 Federation of European Biochemical Societies.

  10. Genetic diversity of the captive Asian tapir population in Thailand, based on mitochondrial control region sequence data and the comparison of its nucleotide structure with Brazilian tapir.

    PubMed

    Muangkram, Yuttamol; Amano, Akira; Wajjwalku, Worawidh; Pinyopummintr, Tanu; Thongtip, Nikorn; Kaolim, Nongnid; Sukmak, Manakorn; Kamolnorranath, Sumate; Siriaroonrat, Boripat; Tipkantha, Wanlaya; Maikaew, Umaporn; Thomas, Warisara; Polsrila, Kanda; Dongsaard, Kwanreaun; Sanannu, Saowaphang; Wattananorrasate, Anuwat

    2017-07-01

    The Asian tapir (Tapirus indicus) has been classified as Endangered on the IUCN Red List of Threatened Species (2008). Genetic diversity data provide important information for the management of captive breeding and conservation of this species. We analyzed mitochondrial control region (CR) sequences from 37 captive Asian tapirs in Thailand. Multiple alignments of the full-length CR sequences sized 1268 bp comprised three domains as described in other mammal species. Analysis of 16 parsimony-informative variable sites revealed 11 haplotypes. Furthermore, the phylogenetic analysis using median-joining network clearly showed three clades correlated with our earlier cytochrome b gene study in this endangered species. The repetitive motif is located between first and second conserved sequence blocks, similar to the Brazilian tapir. The highest polymorphic site was located in the extended termination associated sequences domain. The results could be applied for future genetic management based in captivity and wild that shows stable populations.
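
    The two summary statistics quoted in this abstract (parsimony-informative sites and haplotype count) can be illustrated with a minimal sketch. It uses the standard textbook definition of a parsimony-informative site (at least two character states, each present in at least two sequences) and treats each distinct aligned sequence as one haplotype. The function names and the toy alignment are hypothetical; this is not the authors' pipeline, and gaps or ambiguity codes are simply ignored here.

```python
from collections import Counter

def parsimony_informative_sites(alignment):
    """Return 0-based column indices that are parsimony-informative.

    A column qualifies when at least two different nucleotides each
    occur in at least two sequences. Characters other than A/C/G/T
    (gaps, ambiguity codes) are ignored; that is an assumption of this sketch.
    """
    seqs = [s.upper() for s in alignment]
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    sites = []
    for i, column in enumerate(zip(*seqs)):
        counts = Counter(c for c in column if c in "ACGT")
        # count the states that are shared by two or more sequences
        shared_states = sum(1 for n in counts.values() if n >= 2)
        if shared_states >= 2:
            sites.append(i)
    return sites

def count_haplotypes(alignment):
    """Count distinct aligned sequences (one haplotype per unique string)."""
    return len({s.upper() for s in alignment})

# Hypothetical toy alignment, not tapir data
toy = ["ACGT", "ACGA", "ATGA", "ATGT"]
informative = parsimony_informative_sites(toy)
n_haplotypes = count_haplotypes(toy)
```

    In the toy alignment, columns 1 and 3 are informative and all four sequences are distinct haplotypes.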

  11. Spliced leader RNA of trypanosomes: in vivo mutational analysis reveals extensive and distinct requirements for trans splicing and cap4 formation.

    PubMed Central

    Lücke, S; Xu, G L; Palfi, Z; Cross, M; Bellofatto, V; Bindereif, A

    1996-01-01

    In trypanosomes mRNAs are generated through trans splicing. The spliced leader (SL) RNA, which donates the 5'-terminal mini-exon to each of the protein coding exons, plays a central role in the trans splicing process. We have established in vivo assays to study in detail trans splicing, cap4 modification, and RNP assembly of the SL RNA in the trypanosomatid species Leptomonas seymouri. First, we found that extensive sequences within the mini-exon are required for SL RNA function in vivo, although a conserved length of 39 nt is not essential. In contrast, the intron sequence appears to be surprisingly tolerant to mutation; only the stem-loop II structure is indispensable. The asymmetry of the sequence requirements in the stem I region suggests that this domain may exist in different functional conformations. Second, distinct mini-exon sequences outside the modification site are important for efficient cap4 formation. Third, all SL RNA mutations tested allowed core RNP assembly, suggesting flexible requirements for core protein binding. In sum, the results of our mutational analysis provide evidence for a discrete domain structure of the SL RNA and help to explain the strong phylogenetic conservation of the mini-exon sequence and of the overall SL RNA secondary structure; they also suggest that there may be certain differences between trans splicing in nematodes and trypanosomes. This approach provides a basis for studying RNA-RNA interactions in the trans spliceosome. PMID:8861965

  12. Highly sensitive detection of individual HEAT and ARM repeats with HHpred and COACH.

    PubMed

    Kippert, Fred; Gerloff, Dietlind L

    2009-09-24

    HEAT and ARM repeats occur in a large number of eukaryotic proteins. As these repeats are often highly diverged, the prediction of HEAT or ARM domains can be challenging. Except for the most clear-cut cases, identification at the individual repeat level is indispensable, in particular for determining domain boundaries. However, methods using single sequence queries do not have the sensitivity required to deal with more divergent repeats and, when applied to proteins with known structures, in some cases failed to detect a single repeat. Testing algorithms which use multiple sequence alignments as queries, we found two of them, HHpred and COACH, to detect HEAT and ARM repeats with greatly enhanced sensitivity. Calibration against experimentally determined structures suggests the use of three score classes with increasing confidence in the prediction, and prediction thresholds for each method. When we applied a new protocol using both HHpred and COACH to these structures, it detected 82% of HEAT repeats and 90% of ARM repeats, with the minimum for a given protein of 57% for HEAT repeats and 60% for ARM repeats. Application to bona fide HEAT and ARM proteins or domains indicated that similar numbers can be expected for the full complement of HEAT/ARM proteins. A systematic screen of the Protein Data Bank for false positive hits revealed their number to be low, in particular for ARM repeats. Double false positive hits for a given protein were rare for HEAT and not at all observed for ARM repeats. In combination with fold prediction and consistency checking (multiple sequence alignments, secondary structure prediction, and position analysis), repeat prediction with the new HHpred/COACH protocol dramatically improves prediction in the twilight zone of fold prediction methods, as well as the delineation of HEAT/ARM domain boundaries. A protocol is presented for the identification of individual HEAT or ARM repeats which is straightforward to implement. It provides high sensitivity at a low false positive rate and will therefore greatly enhance the accuracy of predictions of HEAT and ARM domains.

  13. Highly Sensitive Detection of Individual HEAT and ARM Repeats with HHpred and COACH

    PubMed Central

    Kippert, Fred; Gerloff, Dietlind L.

    2009-01-01

    Background HEAT and ARM repeats occur in a large number of eukaryotic proteins. As these repeats are often highly diverged, the prediction of HEAT or ARM domains can be challenging. Except for the most clear-cut cases, identification at the individual repeat level is indispensable, in particular for determining domain boundaries. However, methods using single sequence queries do not have the sensitivity required to deal with more divergent repeats and, when applied to proteins with known structures, in some cases failed to detect a single repeat. Methodology and Principal Findings Testing algorithms which use multiple sequence alignments as queries, we found two of them, HHpred and COACH, to detect HEAT and ARM repeats with greatly enhanced sensitivity. Calibration against experimentally determined structures suggests the use of three score classes with increasing confidence in the prediction, and prediction thresholds for each method. When we applied a new protocol using both HHpred and COACH to these structures, it detected 82% of HEAT repeats and 90% of ARM repeats, with the minimum for a given protein of 57% for HEAT repeats and 60% for ARM repeats. Application to bona fide HEAT and ARM proteins or domains indicated that similar numbers can be expected for the full complement of HEAT/ARM proteins. A systematic screen of the Protein Data Bank for false positive hits revealed their number to be low, in particular for ARM repeats. Double false positive hits for a given protein were rare for HEAT and not at all observed for ARM repeats. In combination with fold prediction and consistency checking (multiple sequence alignments, secondary structure prediction, and position analysis), repeat prediction with the new HHpred/COACH protocol dramatically improves prediction in the twilight zone of fold prediction methods, as well as the delineation of HEAT/ARM domain boundaries. Significance A protocol is presented for the identification of individual HEAT or ARM repeats which is straightforward to implement. It provides high sensitivity at a low false positive rate and will therefore greatly enhance the accuracy of predictions of HEAT and ARM domains. PMID:19777061

  14. Application of Wavelet Transform for PDZ Domain Classification

    PubMed Central

    Daqrouq, Khaled; Alhmouz, Rami; Balamesh, Ahmed; Memic, Adnan

    2015-01-01

    PDZ domains have been identified as part of an array of signaling proteins that are often unrelated, except for the well-conserved structural PDZ domain they contain. These domains have been linked to many disease processes including common Avian influenza, as well as very rare conditions such as Fraser and Usher syndromes. Historically, based on the interactions and the nature of bonds they form, PDZ domains have most often been classified into one of three classes (class I, class II and others - class III), that is directly dependent on their binding partner. In this study, we report on three unique feature extraction approaches based on the bigram and trigram occurrence and existence rearrangements within the domain's primary amino acid sequences in assisting PDZ domain classification. Wavelet packet transform (WPT) and Shannon entropy denoted by wavelet entropy (WE) feature extraction methods were proposed. Using 115 unique human and mouse PDZ domains, the existence rearrangement approach yielded a high recognition rate (78.34%), which outperformed our occurrence rearrangements based method. The recognition rate was (81.41%) with validation technique. The method reported for PDZ domain classification from primary sequences proved to be an encouraging approach for obtaining consistent classification results. We anticipate that by increasing the database size, we can further improve feature extraction and correct classification. PMID:25860375

  15. Recognition of chimeric small-subunit ribosomal DNAs composed of genes from uncultivated microorganisms

    NASA Technical Reports Server (NTRS)

    Kopczynski, E. D.; Bateson, M. M.; Ward, D. M.

    1994-01-01

    When PCR was used to recover small-subunit (SSU) rRNA genes from a hot spring cyanobacterial mat community, chimeric SSU rRNA sequences which exhibited little or no secondary structural abnormality were recovered. They were revealed as chimeras of SSU rRNA genes of uncultivated species through separate phylogenetic analysis of short sequence domains.

  16. The Glucoamylase Inhibitor Acarbose Is a Direct Activator of Phosphorylase Kinase

    PubMed Central

    Nadeau, Owen W.; Liu, Weiya; Boulatnikov, Igor G.; Sage, Jessica M.; Peters, Jennifer L.; Carlson, Gerald M.

    2011-01-01

    Phosphorylase kinase (PhK), an (αβγδ)4 complex, stimulates energy production from glycogen in the cascade activation of glycogenolysis. Its large homologous α and β subunits regulate the activity of the catalytic γ subunit and account for 81% of PhK’s mass. Both subunits are thought to be multi-domain structures, and recent predictions based on their sequences suggest the presence of potentially functional glucoamylase (GH15)-like domains near their amino-termini. We present the first experimental evidence for such a domain in PhK, by demonstrating that the glucoamylase inhibitor acarbose binds PhK, perturbs its structure, and stimulates its kinase activity. PMID:20604537

  17. Comparative analysis of the L, M, and S RNA segments of Crimean-Congo haemorrhagic fever virus isolates from southern Africa.

    PubMed

    Goedhals, Dominique; Bester, Phillip A; Paweska, Janusz T; Swanepoel, Robert; Burt, Felicity J

    2015-05-01

    Crimean-Congo haemorrhagic fever virus (CCHFV) is a member of the Bunyaviridae family with a tripartite, negative sense RNA genome. This study used predictive software to analyse the L (large), M (medium), and S (small) segments of 14 southern African CCHFV isolates. The OTU-like cysteine protease domain and the RdRp domain of the L segment are highly conserved among southern African CCHFV isolates. The M segment encodes the structural glycoproteins, GN and GC, and the non-structural glycoproteins which are post-translationally cleaved at highly conserved furin and subtilase SKI-1 cleavage sites. All of the sites previously identified were shown to be conserved among southern African CCHFV isolates. The heavily O-glycosylated N-terminal variable mucin-like domain of the M segment shows the highest sequence variability of the CCHFV proteins. Five transmembrane domains are predicted in the M segment polyprotein resulting in three regions internal to and three regions external to the membrane across the G(N), NS(M) and G(C) glycoproteins. The corroboration of conserved genome domains and sequence identity among geographically diverse isolates may assist in the identification of protein function and pathogenic mechanisms, as well as the identification of potential targets for antiviral therapy and vaccine design. As detailed functional studies are lacking for many of the CCHFV proteins, identification of functional domains by prediction of protein structure, and identification of amino acid level similarity to functionally characterised proteins of related viruses or viruses with similar pathogenic mechanisms are a necessary step for selection of areas for further study. © 2015 Wiley Periodicals, Inc.

  18. Structural model of FeoB, the iron transporter from Pseudomonas aeruginosa, predicts a cysteine lined, GTP-gated pore

    PubMed Central

    Seyedmohammad, Saeed; Fuentealba, Natalia Alveal; Marriott, Robert A.J.; Goetze, Tom A.; Edwardson, J. Michael; Barrera, Nelson P.; Venter, Henrietta

    2016-01-01

    Iron is essential for the survival and virulence of pathogenic bacteria. The FeoB transporter allows the bacterial cell to acquire ferrous iron from its environment, making it an excellent drug target in intractable pathogens. The protein consists of an N-terminal GTP-binding domain and a C-terminal membrane domain. Despite the availability of X-ray crystal structures of the N-terminal domain, many aspects of the structure and function of FeoB remain unclear, such as the structure of the membrane domain, the oligomeric state of the protein, the molecular mechanism of iron transport, and how this is coupled to GTP hydrolysis at the N-terminal domain. In the present study, we describe the first homology model of FeoB. Due to the lack of sequence homology between FeoB and other transporters, the structures of four different proteins were used as templates to generate the homology model of full-length FeoB, which predicts a trimeric structure. We confirmed this trimeric structure by both blue-native-PAGE (BN-PAGE) and AFM. According to our model, the membrane domain of the trimeric protein forms a central pore lined by highly conserved cysteine residues. This pore aligns with a central pore in the N-terminal GTPase domain (G-domain) lined by aspartate residues. Biochemical analysis of FeoB from Pseudomonas aeruginosa further reveals a putative iron sensor domain that could connect GTP binding/hydrolysis to the opening of the pore. These results indicate that FeoB might not act as a transporter, but rather as a GTP-gated channel. PMID:26934982

  19. Fas Apoptosis Inhibitory Molecule (FAIM) Contains a Novel Beta Sandwich in Contact with a Partially Ordered Domain

    PubMed Central

    Hemond, Michael; Rothstein, Thomas L.; Wagner, Gerhard

    2009-01-01

    Summary Fas apoptosis inhibitory molecule (FAIM) is a soluble cytosolic protein inhibitor of programmed cell death and is found in organisms throughout the animal kingdom. A short isoform (FAIM-S) is expressed in all tissue types, while an alternatively spliced long isoform (FAIM-L) is specifically expressed in the brain. Here FAIM-S is shown to consist of two independently folding domains in contact with one another. The NMR solution structure of the C-terminal domain of murine FAIM is solved in isolation and revealed to be a novel protein fold, a noninterleaved seven-stranded beta sandwich. The structure and sequence reveal several residues that are likely to be involved in functionally significant interactions with the N-terminal domain or other binding partners. Chemical shift perturbation is used to elucidate contacts made between the N- and C-terminal domains. PMID:19168072

  20. Crystal Structures of the Glutamate Receptor Ion Channel GluK3 and GluK5 Amino-Terminal Domains

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kumar, Janesh; Mayer, Mark L.

    2010-11-30

    Ionotropic glutamate receptors (iGluRs) mediate the majority of fast excitatory synaptic neurotransmission in the central nervous system. The selective assembly of iGluRs into AMPA, kainate, and N-methyl-d-aspartic acid (NMDA) receptor subtypes is regulated by their extracellular amino-terminal domains (ATDs). Kainate receptors are further classified into low-affinity receptor families (GluK1-GluK3) and high-affinity receptor families (GluK4-GluK5) based on their affinity for the neurotoxin kainic acid. These two families share a 42% sequence identity for the intact receptor but only a 27% sequence identity at the level of ATD. We have determined for the first time the high-resolution crystal structures of GluK3 and GluK5 ATDs, both of which crystallize as dimers but with a strikingly different dimer assembly at the R1 interface. By contrast, for both GluK3 and GluK5, the R2 domain dimer assembly is similar to those reported previously for other non-NMDA iGluRs. This observation is consistent with the reports that GluK4-GluK5 cannot form functional homomeric ion channels and require obligate coassembly with GluK1-GluK3. Our analysis also reveals that the relative orientation of domains R1 and R2 in individual non-NMDA receptor ATDs varies by up to 10°, in contrast to the 50° difference reported for the NMDA receptor GluN2B subunit. This restricted domain movement in non-NMDA receptor ATDs seems to result both from extensive intramolecular contacts between domain R1 and domain R2 and from their assembly as dimers, which interact at both R1 and R2 domains. Our results provide the first insights into the structure and function of GluK4-GluK5, the least understood family of iGluRs.

  1. Amyloid fibril formation from sequences of a natural beta-structured fibrous protein, the adenovirus fiber.

    PubMed

    Papanikolopoulou, Katerina; Schoehn, Guy; Forge, Vincent; Forsyth, V Trevor; Riekel, Christian; Hernandez, Jean-François; Ruigrok, Rob W H; Mitraki, Anna

    2005-01-28

    Amyloid fibrils are fibrous beta-structures that derive from abnormal folding and assembly of peptides and proteins. Despite a wealth of structural studies on amyloids, the nature of the amyloid structure remains elusive; possible connections to natural, beta-structured fibrous motifs have been suggested. In this work we focus on understanding amyloid structure and formation from sequences of a natural, beta-structured fibrous protein. We show that short peptides (25 to 6 amino acids) corresponding to repetitive sequences from the adenovirus fiber shaft have an intrinsic capacity to form amyloid fibrils as judged by electron microscopy, Congo Red binding, infrared spectroscopy, and x-ray fiber diffraction. In the presence of the globular C-terminal domain of the protein that acts as a trimerization motif, the shaft sequences adopt a triple-stranded, beta-fibrous motif. We discuss the possible structure and arrangement of these sequences within the amyloid fibril, as compared with the one adopted within the native structure. A 6-amino acid peptide, corresponding to the last beta-strand of the shaft, was found to be sufficient to form amyloid fibrils. Structural analysis of these amyloid fibrils suggests that perpendicular stacking of beta-strand repeat units is an underlying common feature of amyloid formation.

  2. β-Propeller Blades as Ancestral Peptides in Protein Evolution

    PubMed Central

    Kopec, Klaus O.; Lupas, Andrei N.

    2013-01-01

    Proteins of the β-propeller fold are ubiquitous in nature and widely used as structural scaffolds for ligand binding and enzymatic activity. This fold comprises between four and twelve four-stranded β-meanders, the so called blades that are arranged circularly around a central funnel-shaped pore. Despite the large size range of β-propellers, their blades frequently show sequence similarity indicative of a common ancestry and it has been proposed that the majority of β-propellers arose divergently by amplification and diversification of an ancestral blade. Given the structural versatility of β-propellers and the hypothesis that the first folded proteins evolved from a simpler set of peptides, we investigated whether this blade may have given rise to other folds as well. Using sequence comparisons, we identified proteins of four other folds as potential homologs of β-propellers: the luminal domain of inositol-requiring enzyme 1 (IRE1-LD), type II β-prisms, β-pinwheels, and WW domains. Because, with increasing evolutionary distance and decreasing sequence length, the statistical significance of sequence comparisons becomes progressively harder to distinguish from the background of convergent similarities, we complemented our analyses with a new method that evaluates possible homology based on the correlation between sequence and structure similarity. Our results indicate a homologous relationship of IRE1-LD and type II β-prisms with β-propellers, and an analogous one for β-pinwheels and WW domains. Whereas IRE1-LD most likely originated by fold-changing mutations from a fully formed PQQ motif β-propeller, type II β-prisms originated by amplification and differentiation of a single blade, possibly also of the PQQ type. We conclude that both β-propellers and type II β-prisms arose by independent amplification of a blade-sized fragment, which represents a remnant of an ancient peptide world. PMID:24143202

  3. Crystal Structure of a Josephin-Ubiquitin Complex: Evolutionary Restraints on Ataxin-3 Deubiquitinating Activity

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    S Weeks; K Grasty; L Hernandez-Cuebas

    2011-12-31

    The Josephin domain is a conserved cysteine protease domain found in four human deubiquitinating enzymes: ataxin-3, the ataxin-3-like protein (ATXN3L), Josephin-1, and Josephin-2. Josephin domains from these four proteins were purified and assayed for their ability to cleave ubiquitin substrates. Reaction rates differed markedly both among the different proteins and for different substrates with a given protein. The ATXN3L Josephin domain is a significantly more efficient enzyme than the ataxin-3 domain despite their sharing 85% sequence identity. To understand the structural basis of this difference, the 2.6 Å x-ray crystal structure of the ATXN3L Josephin domain in complex with ubiquitin was determined. Although ataxin-3 and ATXN3L adopt similar folds, they bind ubiquitin in different, overlapping sites. Mutations were made in ataxin-3 at selected positions, introducing the corresponding ATXN3L residue. Only three such mutations are sufficient to increase the catalytic activity of the ataxin-3 domain to levels comparable with that of ATXN3L, suggesting that ataxin-3 has been subject to evolutionary restraints that keep its deubiquitinating activity in check.

  4. The length but not the sequence of peptide linker modules exerts the primary influence on the conformations of protein domains in cellulosome multi-enzyme complexes.

    PubMed

    Różycki, Bartosz; Cazade, Pierre-André; O'Mahony, Shane; Thompson, Damien; Cieplak, Marek

    2017-08-16

    Cellulosomes are large multi-protein catalysts produced by various anaerobic microorganisms to efficiently degrade plant cell-wall polysaccharides down into simple sugars. X-ray and physicochemical structural characterisations show that cellulosomes are composed of numerous protein domains that are connected by unstructured polypeptide segments, yet the properties and possible roles of these 'linker' peptides are largely unknown. We have performed coarse-grained and all-atom molecular dynamics computer simulations of a number of cellulosomal linkers of different lengths and compositions. Our data demonstrates that the effective stiffness of the linker peptides, as quantified by the equilibrium fluctuations in the end-to-end distances, depends primarily on the length of the linker and less so on the specific amino acid sequence. The presence of excluded volume - provided by the domains that are connected - dampens the motion of the linker residues and reduces the effective stiffness of the linkers. Simultaneously, the presence of the linkers alters the conformations of the protein domains that are connected. We demonstrate that short, stiff linkers induce significant rearrangements in the folded domains of the mini-cellulosome composed of endoglucanase Cel8A in complex with scaffoldin ScafT (Cel8A-ScafT) of Clostridium thermocellum as well as in a two-cohesin system derived from the scaffoldin ScaB of Acetivibrio cellulolyticus. We give experimentally testable predictions on structural changes in protein domains that depend on the length of linkers.

  5. Structure of 5-hydroxymethylcytosine-specific restriction enzyme, AbaSI, in complex with DNA.

    PubMed

    Horton, John R; Borgaro, Janine G; Griggs, Rose M; Quimby, Aine; Guan, Shengxi; Zhang, Xing; Wilson, Geoffrey G; Zheng, Yu; Zhu, Zhenyu; Cheng, Xiaodong

    2014-07-01

    AbaSI, a member of the PvuRts1I-family of modification-dependent restriction endonucleases, cleaves deoxyribonucleic acid (DNA) containing 5-hydroxymethylcytosine (5hmC) and glucosylated 5hmC (g5hmC), but not DNA containing unmodified cytosine. AbaSI has been used as a tool for mapping the genomic locations of 5hmC, an important epigenetic modification in the DNA of higher organisms. Here we report the crystal structures of AbaSI in the presence and absence of DNA. These structures provide considerable, although incomplete, insight into how this enzyme acts. AbaSI appears to be mainly a homodimer in solution, but interacts with DNA in our structures as a homotetramer. Each AbaSI subunit comprises an N-terminal, Vsr-like, cleavage domain containing a single catalytic site, and a C-terminal, SRA-like, 5hmC-binding domain. Two N-terminal helices mediate most of the homodimer interface. Dimerization brings together the two catalytic sites required for double-strand cleavage, and separates the 5hmC binding-domains by ∼70 Å, consistent with the known activity of AbaSI which cleaves DNA optimally between symmetrically modified cytosines ∼22 bp apart. The eukaryotic SET and RING-associated (SRA) domains bind to DNA containing 5-methylcytosine (5mC) in the hemi-methylated CpG sequence. They make contacts in both the major and minor DNA grooves, and flip the modified cytosine out of the helix into a conserved binding pocket. In contrast, the SRA-like domain of AbaSI, which has no sequence specificity, contacts only the minor DNA groove, and in our current structures the 5hmC remains intra-helical. A conserved binding pocket is nevertheless present in this domain, suitable for accommodating 5hmC and g5hmC. We consider it likely, therefore, that base-flipping is part of the recognition and cleavage mechanism of AbaSI, but that our structures represent an earlier, pre-flipped stage, prior to actual recognition. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  6. Structure of 5-hydroxymethylcytosine-specific restriction enzyme, AbaSI, in complex with DNA

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Horton, John R.; Borgaro, Janine G.; Griggs, Rose M.

    2014-07-03

    AbaSI, a member of the PvuRts1I-family of modification-dependent restriction endonucleases, cleaves DNA containing 5-hydroxymethylcytosine (5hmC) and glucosylated 5hmC (g5hmC), but not DNA containing unmodified cytosine. AbaSI has been used as a tool for mapping the genomic locations of 5hmC, an important epigenetic modification in the DNA of higher organisms. Here we report the crystal structures of AbaSI in the presence and absence of DNA. These structures provide considerable, although incomplete, insight into how this enzyme acts. AbaSI appears to be mainly a homodimer in solution, but interacts with DNA in our structures as a homotetramer. Each AbaSI subunit comprises an N-terminal, Vsr-like, cleavage domain containing a single catalytic site, and a C-terminal, SRA-like, 5hmC-binding domain. Two N-terminal helices mediate most of the homodimer interface. Dimerization brings together the two catalytic sites required for double-strand cleavage, and separates the 5hmC binding-domains by ~70 Å, consistent with the known activity of AbaSI which cleaves DNA optimally between symmetrically modified cytosines ~22 bp apart. The eukaryotic SET and RING-associated (SRA) domains bind to DNA containing 5-methylcytosine (5mC) in the hemi-methylated CpG sequence. They make contacts in both the major and minor DNA grooves, and flip the modified cytosine out of the helix into a conserved binding pocket. In contrast, the SRA-like domain of AbaSI, which has no sequence specificity, contacts only the minor DNA groove, and in our current structures the 5hmC remains intra-helical. A conserved binding pocket is nevertheless present in this domain, suitable for accommodating 5hmC and g5hmC. We consider it likely, therefore, that base-flipping is part of the recognition and cleavage mechanism of AbaSI, but that our structures represent an earlier, pre-flipped stage, prior to actual recognition.

  7. Structure of the human protein kinase MPSK1 reveals an atypical activation loop architecture.

    PubMed

    Eswaran, Jeyanthy; Bernad, Antonio; Ligos, Jose M; Guinea, Barbara; Debreczeni, Judit E; Sobott, Frank; Parker, Sirlester A; Najmanovich, Rafael; Turk, Benjamin E; Knapp, Stefan

    2008-01-01

    The activation segment of protein kinases is structurally highly conserved and central to regulation of kinase activation. Here we report an atypical activation segment architecture in human MPSK1 comprising a beta sheet and a large alpha-helical insertion. Sequence comparisons suggested that similar activation segments exist in all members of the MPSK1 family and in MAST kinases. The consequence of this nonclassical activation segment on substrate recognition was studied using peptide library screens that revealed a preferred substrate sequence of X-X-P/V/I-phi-H/Y-T*-N/G-X-X-X (phi is an aliphatic residue). In addition, we identified the GTPase DRG1 as an MPSK1 interaction partner and specific substrate. The interaction domain in DRG1 was mapped to the N terminus, leading to recruitment and phosphorylation at Thr100 within the GTPase domain. The presented data reveal an atypical kinase structural motif and suggest a role of MPSK1 regulating DRG1, a GTPase involved in regulation of cellular growth.
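
    The preferred substrate sequence reported above, X-X-P/V/I-phi-H/Y-T*-N/G-X-X-X, can be written as a regular expression for scanning candidate protein sequences. The sketch below is illustrative only: the abstract says phi is an aliphatic residue but does not list the residues, so the set A/V/L/I is an assumption, and the function name and test sequence are hypothetical. T* is taken to be the phosphoacceptor threonine, and X is any residue written as an uppercase letter.

```python
import re

# Assumption: "phi" (aliphatic) is taken as A, V, L or I; the abstract
# does not enumerate the residues.
ALIPHATIC = "AVLI"

# X-X-[PVI]-phi-[HY]-T*-[NG]-X-X-X, wrapped in a lookahead so that
# overlapping windows are all reported.
MOTIF = re.compile(
    rf"(?=([A-Z]{{2}}[PVI][{ALIPHATIC}][HY]T[NG][A-Z]{{3}}))"
)

def find_motif_sites(sequence):
    """Return (0-based index of the candidate Thr, 10-residue window) pairs."""
    seq = sequence.upper()
    # The threonine sits at offset 5 within each 10-residue window.
    return [(m.start() + 5, m.group(1)) for m in MOTIF.finditer(seq)]

# Hypothetical test sequence, not taken from DRG1
sites = find_motif_sites("MKAAPLHTNGSAQ")
```

    A match only flags a window that fits the consensus; it says nothing about whether the site is phosphorylated in vivo.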

  8. Can We Improve Structured Sequence Processing? Exploring the Direct and Indirect Effects of Computerized Training Using a Mediational Model

    PubMed Central

    Smith, Gretchen N. L.; Conway, Christopher M.; Bauernschmidt, Althea; Pisoni, David B.

    2015-01-01

    Recent research suggests that language acquisition may rely on domain-general learning abilities, such as structured sequence processing, which is the ability to extract, encode, and represent structured patterns in a temporal sequence. If structured sequence processing supports language, then it may be possible to improve language function by enhancing this foundational learning ability. The goal of the present study was to use a novel computerized training task as a means to better understand the relationship between structured sequence processing and language function. Participants first were assessed on pre-training tasks to provide baseline behavioral measures of structured sequence processing and language abilities. Participants were then quasi-randomly assigned to either a treatment group involving adaptive structured visuospatial sequence training, a treatment group involving adaptive non-structured visuospatial sequence training, or a control group. Following four days of sequence training, all participants were assessed with the same pre-training measures. Overall comparison of the post-training means revealed no group differences. However, in order to examine the potential relations between sequence training, structured sequence processing, and language ability, we used a mediation analysis that showed two competing effects. In the indirect effect, adaptive sequence training with structural regularities had a positive impact on structured sequence processing performance, which in turn had a positive impact on language processing. This finding not only identifies a potential novel intervention to treat language impairments but also may be the first demonstration that structured sequence processing can be improved and that this, in turn, has an impact on language processing. However, in the direct effect, adaptive sequence training with structural regularities had a direct negative impact on language processing. This unexpected finding suggests that adaptive training with structural regularities might potentially interfere with language processing. Taken together, these findings underscore the importance of pursuing designs that promote a better understanding of the mechanisms underlying training-related changes, so that regimens can be developed that help reduce these types of negative effects while simultaneously maximizing the benefits to outcome measures of interest. PMID:25946222

  9. Can we improve structured sequence processing? Exploring the direct and indirect effects of computerized training using a mediational model.

    PubMed

    Smith, Gretchen N L; Conway, Christopher M; Bauernschmidt, Althea; Pisoni, David B

    2015-01-01

    Recent research suggests that language acquisition may rely on domain-general learning abilities, such as structured sequence processing, which is the ability to extract, encode, and represent structured patterns in a temporal sequence. If structured sequence processing supports language, then it may be possible to improve language function by enhancing this foundational learning ability. The goal of the present study was to use a novel computerized training task as a means to better understand the relationship between structured sequence processing and language function. Participants first were assessed on pre-training tasks to provide baseline behavioral measures of structured sequence processing and language abilities. Participants were then quasi-randomly assigned to either a treatment group involving adaptive structured visuospatial sequence training, a treatment group involving adaptive non-structured visuospatial sequence training, or a control group. Following four days of sequence training, all participants were assessed with the same pre-training measures. Overall comparison of the post-training means revealed no group differences. However, in order to examine the potential relations between sequence training, structured sequence processing, and language ability, we used a mediation analysis that showed two competing effects. In the indirect effect, adaptive sequence training with structural regularities had a positive impact on structured sequence processing performance, which in turn had a positive impact on language processing. This finding not only identifies a potential novel intervention to treat language impairments but also may be the first demonstration that structured sequence processing can be improved and that this, in turn, has an impact on language processing. However, in the direct effect, adaptive sequence training with structural regularities had a direct negative impact on language processing. 
This unexpected finding suggests that adaptive training with structural regularities might potentially interfere with language processing. Taken together, these findings underscore the importance of pursuing designs that promote a better understanding of the mechanisms underlying training-related changes, so that regimens can be developed that help reduce these types of negative effects while simultaneously maximizing the benefits to outcome measures of interest.
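
    The "two competing effects" reported above can be read through the standard product-of-coefficients view of mediation, in which the total effect is the direct effect plus the indirect effect. The sketch below is only an illustration of that arithmetic: the path coefficients are hypothetical placeholders, not values from the study, and the function name is an assumption made for this example. It shows how a positive indirect path and a negative direct path can cancel, which is consistent with finding no overall group difference.

```python
def mediation_effects(a: float, b: float, c_prime: float) -> dict:
    """Product-of-coefficients decomposition for a single-mediator model.

    a       : effect of training on the mediator (structured sequence processing)
    b       : effect of the mediator on the outcome (language processing)
    c_prime : direct effect of training on the outcome, controlling for the mediator
    """
    indirect = a * b
    total = c_prime + indirect
    return {"indirect": indirect, "direct": c_prime, "total": total}


# Hypothetical coefficients chosen only to illustrate cancellation;
# they are NOT estimates reported in the abstract.
effects = mediation_effects(a=0.5, b=0.4, c_prime=-0.2)
print(effects)
```

    With these placeholder values the positive indirect effect and the negative direct effect offset each other, so the total effect is approximately zero.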

  10. NMR binding and crystal structure reveal that intrinsically-unstructured regulatory domain auto-inhibits PAK4 by a mechanism different from that of PAK1.

    PubMed

    Wang, Wei; Lim, Liangzhong; Baskaran, Yohendran; Manser, Ed; Song, Jianxing

    2013-08-16

    Six human PAK members are classified into groups I (PAKs 1-3) and II (PAKs 4-6). Previously, only group I PAKs were thought to be auto-inhibited but very recently PAK4, the prototype of group II PAKs, has also been shown to be auto-inhibited by its N-terminal regulatory domain. However, the complete auto-inhibitory domain (AID) sequence remains undefined and the mechanism underlying its auto-inhibition is largely elusive. Here, the N-terminal regulatory domain of PAK4 sufficient for auto-inhibiting and binding Cdc42/Rac was characterized to be intrinsically unstructured, but nevertheless we identified the entire AID sequence by NMR. Strikingly, an AID peptide with a Kd of 320 nM for the PAK4 catalytic domain was derived by deleting the residues unnecessary for binding. Consequently, the PAK4 crystal structure complexed with the entire AID has been determined, which reveals that the complete kinase cleft is occupied by 20 AID residues composed of an N-terminal α-helix and a previously-identified pseudosubstrate motif, thus achieving auto-inhibition. Our study reveals that PAK4 is auto-inhibited by a novel mechanism which is completely different from that for PAK1, thus bearing critical implications for design of inhibitors specific for group II PAKs. Copyright © 2013 Elsevier Inc. All rights reserved.

  11. Global Organization of a Positive-strand RNA Virus Genome

    PubMed Central

    Wu, Baodong; Grigull, Jörg; Ore, Moriam O.; Morin, Sylvie; White, K. Andrew

    2013-01-01

    The genomes of plus-strand RNA viruses contain many regulatory sequences and structures that direct different viral processes. The traditional view of these RNA elements is as local structures present in non-coding regions. However, this view is changing due to the discovery of regulatory elements in coding regions and functional long-range intra-genomic base pairing interactions. The ∼4.8 kb long RNA genome of the tombusvirus tomato bushy stunt virus (TBSV) contains these types of structural features, including six different functional long-distance interactions. We hypothesized that to achieve these multiple interactions this viral genome must utilize a large-scale organizational strategy and, accordingly, we sought to assess the global conformation of the entire TBSV genome. Atomic force micrographs of the genome indicated a mostly condensed structure composed of interconnected protrusions extending from a central hub. This configuration was consistent with the genomic secondary structure model generated using high-throughput selective 2′-hydroxyl acylation analysed by primer extension (i.e. SHAPE), which predicted different sized RNA domains originating from a central region. Known RNA elements were identified in both domain and inter-domain regions, and novel structural features were predicted and functionally confirmed. Interestingly, only two of the six long-range interactions known to form were present in the structural model. However, for those interactions that did not form, complementary partner sequences were positioned relatively close to each other in the structure, suggesting that the secondary structure level of viral genome structure could provide a basic scaffold for the formation of different long-range interactions. The higher-order structural model for the TBSV RNA genome provides a snapshot of the complex framework that allows multiple functional components to operate in concert within a confined context. PMID:23717202

  12. Novel Mechanism of Hemin Capture by Hbp2, the Hemoglobin-binding Hemophore from Listeria monocytogenes*

    PubMed Central

    Malmirchegini, G. Reza; Sjodt, Megan; Shnitkind, Sergey; Sawaya, Michael R.; Rosinski, Justin; Newton, Salete M.; Klebba, Phillip E.; Clubb, Robert T.

    2014-01-01

    Iron is an essential nutrient that is required for the growth of the bacterial pathogen Listeria monocytogenes. In cell cultures, this microbe secretes hemin/hemoglobin-binding protein 2 (Hbp2; Lmo2185), which has been proposed to function as a hemophore that scavenges heme from the environment. Based on its primary sequence, Hbp2 contains three NEAr transporter (NEAT) domains of unknown function. Here we show that each of these domains mediates high affinity binding to ferric heme (hemin) and that its N- and C-terminal domains interact with hemoglobin (Hb). The results of hemin transfer experiments are consistent with Hbp2 functioning as an Hb-binding hemophore that delivers hemin to other Hbp2 proteins that are attached to the cell wall. Surprisingly, our work reveals that the central NEAT domain in Hbp2 binds hemin even though its primary sequence lacks a highly conserved YXXXY motif that is used by all other previously characterized NEAT domains to coordinate iron in the hemin molecule. To elucidate the mechanism of hemin binding by Hbp2, we determined crystal structures of its central NEAT domain (Hbp2N2; residues 183–303) in its free and hemin-bound states. The structures reveal an unprecedented mechanism of hemin binding in which Hbp2N2 undergoes a major conformational rearrangement that facilitates metal coordination by a non-canonical tyrosine residue. These studies highlight previously unrecognized plasticity in the hemin binding mechanism of NEAT domains and provide insight into how L. monocytogenes captures heme iron. PMID:25315777

  13. Phylogenetic and Structural Analysis of the Pluripotency Factor Sex-Determining Region Y box2 Gene of Camelus dromedarius (cSox2).

    PubMed

    Alawad, Abdullah; Alharbi, Sultan; Alhazzaa, Othman; Alagrafi, Faisal; Alkhrayef, Mohammed; Alhamdan, Ziyad; Alenazi, Abdullah; Al-Johi, Hasan; Alanazi, Ibrahim O; Hammad, Mohamed

    2016-01-01

    Although Sox2 cDNA sequence information is available for many mammals, the Sox2 cDNA of Camelus dromedarius has not yet been characterized. The objective of this study was to sequence and characterize Sox2 cDNA from the brain of C. dromedarius (also known as the Arabian camel). A full coding sequence of the Sox2 gene from the brain of C. dromedarius was amplified by reverse transcription PCR and then sequenced, for the first time, using the 3730XL series platform Sequencer (Applied Biosystems). The cDNA sequence displayed an open reading frame of 822 nucleotides, encoding a protein of 273 amino acids. The molecular weight and the isoelectric point of the translated protein were calculated as 29.825 kDa and 10.11, respectively, using bioinformatics analysis. The predicted cSox2 protein sequence exhibited high identity: 99% for Homo sapiens, Mus musculus, Bos taurus, and Vicugna pacos; 98% for Sus scrofa and 93% for Camelus ferus. A 3D structure was built based on the available crystal structure of the HMG-box domain of human stem cell transcription factor Sox2 (PDB: 2LE4) with 81 residues and predicting bioinformatics software for 273 amino acid residues. The comparison confirms the presence of the HMG-box domain in the cSox2 protein. The orthologous phylogenetic analysis showed that the Sox2 isoform from C. dromedarius was grouped with humans, alpacas, cattle, and pigs. We believe that this genetic and structural information will be a helpful source for the annotation. Furthermore, Sox2 is one of the transcription factors that contributes to the generation of induced pluripotent stem cells (iPSCs), which in turn will probably help generate camel induced pluripotent stem cells (CiPSCs).
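
    The relation between the 822-nucleotide open reading frame and the 273-residue protein follows from codon arithmetic. The short sketch below assumes, as is conventional but not stated in the abstract, that the reported ORF length includes the terminal stop codon; the function name is hypothetical.

```python
def orf_protein_length(orf_nucleotides: int, includes_stop_codon: bool = True) -> int:
    """Number of amino acids encoded by an open reading frame.

    Assumes the ORF length is a whole number of codons (3 nt each).
    If the ORF length includes the stop codon, that codon encodes no residue.
    """
    if orf_nucleotides <= 0 or orf_nucleotides % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_nucleotides // 3
    return codons - 1 if includes_stop_codon else codons


# 822 nt -> 274 codons -> 273 residues plus one stop codon
print(orf_protein_length(822))  # 273
```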

  14. Indel PDB: a database of structural insertions and deletions derived from sequence alignments of closely related proteins.

    PubMed

    Hsing, Michael; Cherkasov, Artem

    2008-06-25

    Insertions and deletions (indels) represent a common type of sequence variations, which are less studied and pose many important biological questions. Recent research has shown that the presence of sizable indels in protein sequences may be indicative of protein essentiality and their role in protein interaction networks. Examples of utilization of indels for structure-based drug design have also been recently demonstrated. Nonetheless many structural and functional characteristics of indels remain less researched or unknown. We have created a web-based resource, Indel PDB, representing a structural database of insertions/deletions identified from the sequence alignments of highly similar proteins found in the Protein Data Bank (PDB). Indel PDB utilized large amounts of available structural information to characterize 1-, 2- and 3-dimensional features of indel sites. Indel PDB contains 117,266 non-redundant indel sites extracted from 11,294 indel-containing proteins. Unlike loop databases, Indel PDB features more indel sequences with secondary structures including alpha-helices and beta-sheets in addition to loops. The insertion fragments have been characterized by their sequences, lengths, locations, secondary structure composition, solvent accessibility, protein domain association and three dimensional structures. By utilizing the data available in Indel PDB, we have studied and presented here several sequence and structural features of indels. We anticipate that Indel PDB will not only enable future functional studies of indels, but will also assist protein modeling efforts and identification of indel-directed drug binding sites.
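
    As an illustration of how indel sites can be read off a pairwise alignment of closely related proteins, the sketch below scans two gapped, equal-length aligned sequences and reports each contiguous run of gaps. It is a minimal example under stated assumptions, not the pipeline used to build Indel PDB: the function name, the tuple layout, and the toy sequences are all hypothetical.

```python
from typing import List, Tuple


def find_indels(aligned_a: str, aligned_b: str, gap: str = "-") -> List[Tuple[str, int, str]]:
    """Return (kind, start_column, inserted_fragment) for each gap run.

    kind is "insertion_in_b" when sequence A has the gap (B carries extra
    residues) and "insertion_in_a" when sequence B has the gap.
    Columns are 0-based alignment columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    indels = []
    i, n = 0, len(aligned_a)
    while i < n:
        a_gap = aligned_a[i] == gap
        b_gap = aligned_b[i] == gap
        if a_gap == b_gap:
            # Both residues (match/mismatch) or both gaps: not an indel column.
            i += 1
            continue
        start = i
        # Extend the run while the same sequence keeps the gap.
        while i < n and (aligned_a[i] == gap) == a_gap and (aligned_b[i] == gap) == b_gap:
            i += 1
        source = aligned_b if a_gap else aligned_a
        kind = "insertion_in_b" if a_gap else "insertion_in_a"
        indels.append((kind, start, source[start:i]))
    return indels


# Toy alignment (hypothetical sequences)
print(find_indels("MKT--LLV", "MKTAALLV"))
```

    Each reported fragment could then be annotated with length, secondary structure, or solvent accessibility, which are the kinds of features the abstract says are stored for indel sites.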

  15. The X-ray Crystallographic Structure and Activity Analysis of a Pseudomonas-Specific Subfamily of the HAD Enzyme Superfamily Evidences a Novel Biochemical Function

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Peisach, E.; Wang, L.; Burroughs, A.

    2008-01-01

    The haloacid dehalogenase (HAD) superfamily is a large family of proteins dominated by phosphotransferases. Thirty-three sequence families within the HAD superfamily (HADSF) have been identified to assist in function assignment. One such family includes the enzyme phosphonoacetaldehyde hydrolase (phosphonatase). Phosphonatase possesses the conserved Rossmannoid core domain and a C1-type cap domain. Other members of this family do not possess a cap domain and because the cap domain of phosphonatase plays an important role in active site desolvation and catalysis, the function of the capless family members must be unique. A representative of the capless subfamily, PSPTO_2114, from the plant pathogen Pseudomonas syringae, was targeted for catalytic activity and structure analyses. The X-ray structure of PSPTO_2114 reveals a capless homodimer that conserves some but not all of the intersubunit contacts contributed by the core domains of the phosphonatase homodimer. The region of PSPTO_2114 that corresponds to the catalytic scaffold of phosphonatase (and other HAD phosphotransferases) positions amino acid residues that are ill suited for Mg2+ cofactor binding and mediation of phosphoryl group transfer between donor and acceptor substrates. The absence of phosphotransferase activity in PSPTO_2114 was confirmed by kinetic assays. To explore PSPTO_2114 function, the conservation of sequence motifs extending outside of the HADSF catalytic scaffold was examined. The stringently conserved residues among PSPTO_2114 homologs were mapped onto the PSPTO_2114 three-dimensional structure to identify a surface region unique to the family members that do not possess a cap domain. The hypothesis that this region is used in protein-protein recognition is explored to define, for the first time, HADSF proteins which have acquired a function other than that of a catalyst. Proteins 2008.

  16. COOLAIR Antisense RNAs Form Evolutionarily Conserved Elaborate Secondary Structures

    DOE PAGES

    Hawkes, Emily J.; Hennelly, Scott P.; Novikova, Irina V.; ...

    2016-09-20

    There is considerable debate about the functionality of long non-coding RNAs (lncRNAs). Lack of sequence conservation has been used to argue against functional relevance. Here, we investigated antisense lncRNAs, called COOLAIR, at the A. thaliana FLC locus and experimentally determined their secondary structure. The major COOLAIR variants are highly structured, organized by exon. The distally polyadenylated transcript has a complex multi-domain structure, altered by a single non-coding SNP defining a functionally distinct A. thaliana FLC haplotype. The A. thaliana COOLAIR secondary structure was used to predict COOLAIR exons in evolutionarily divergent Brassicaceae species. These predictions were validated through chemical probing and cloning. Despite the relatively low nucleotide sequence identity, the structures, including multi-helix junctions, show remarkable evolutionary conservation. In a number of places, the structure is conserved through covariation of a non-contiguous DNA sequence. This structural conservation supports a functional role for COOLAIR transcripts rather than, or in addition to, antisense transcription.

  17. COOLAIR Antisense RNAs Form Evolutionarily Conserved Elaborate Secondary Structures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hawkes, Emily J.; Hennelly, Scott P.; Novikova, Irina V.

    There is considerable debate about the functionality of long non-coding RNAs (lncRNAs). Lack of sequence conservation has been used to argue against functional relevance. Here, we investigated antisense lncRNAs, called COOLAIR, at the A. thaliana FLC locus and experimentally determined their secondary structure. The major COOLAIR variants are highly structured, organized by exon. The distally polyadenylated transcript has a complex multi-domain structure, altered by a single non-coding SNP defining a functionally distinct A. thaliana FLC haplotype. The A. thaliana COOLAIR secondary structure was used to predict COOLAIR exons in evolutionarily divergent Brassicaceae species. These predictions were validated through chemical probing and cloning. Despite the relatively low nucleotide sequence identity, the structures, including multi-helix junctions, show remarkable evolutionary conservation. In a number of places, the structure is conserved through covariation of a non-contiguous DNA sequence. This structural conservation supports a functional role for COOLAIR transcripts rather than, or in addition to, antisense transcription.

  18. Role of Modular Polyketide Synthases in the Production of Polyether Ladder Compounds in Ciguatoxin-Producing Gambierdiscus polynesiensis and G. excentricus (Dinophyceae).

    PubMed

    Kohli, Gurjeet S; Campbell, Katrina; John, Uwe; Smith, Kirsty F; Fraga, Santiago; Rhodes, Lesley L; Murray, Shauna A

    2017-09-01

    Gambierdiscus, a benthic dinoflagellate, produces ciguatoxins that cause the human illness Ciguatera. Ciguatoxins are polyether ladder compounds that have a polyketide origin, indicating that polyketide synthases (PKS) are involved in their production. We sequenced transcriptomes of Gambierdiscus excentricus and Gambierdiscus polynesiensis and found 264 contigs encoding single domain ketoacyl synthases (KS; G. excentricus: 106, G. polynesiensis: 143) and ketoreductases (KR; G. excentricus: 7, G. polynesiensis: 8) with sequence similarity to type I PKSs, as reported in other dinoflagellates. In addition, 24 contigs (G. excentricus: 3, G. polynesiensis: 21) encoding multiple PKS domains (forming typical type I PKSs modules) were found. The proposed structure produced by one of these megasynthases resembles a partial carbon backbone of a polyether ladder compound. Seventeen contigs encoding single domain KS, KR, s-malonyltransacylase, dehydratase and enoyl reductase with sequence similarity to type II fatty acid synthases (FAS) in plants were found. Type I PKS and type II FAS genes were distinguished based on the arrangement of domains on the contigs and their sequence similarity and phylogenetic clustering with known PKS/FAS genes in other organisms. This differentiation of PKS and FAS pathways in Gambierdiscus is important, as it will facilitate approaches to investigating toxin biosynthesis pathways in dinoflagellates. © 2017 The Author(s) Journal of Eukaryotic Microbiology © 2017 International Society of Protistologists.

  19. Conservation of tubulin-binding sequences in TRPV1 throughout evolution.

    PubMed

    Sardar, Puspendu; Kumar, Abhishek; Bhandari, Anita; Goswami, Chandan

    2012-01-01

    Transient Receptor Potential Vanilloid subtype 1 (TRPV1), commonly known as the capsaicin receptor, can detect multiple stimuli, including noxious compounds, low pH, temperature, and electromagnetic waves at different ranges. In addition, this receptor is involved in multiple physiological and sensory processes. Therefore, the functions of TRPV1 also have direct influences on adaptation and further evolution. The availability of various eukaryotic genomic sequences in the public domain facilitates the study of the molecular evolution of the TRPV1 protein and the respective conservation of certain domains, motifs and interacting regions that are functionally important. Using statistical and bioinformatics tools, our analysis reveals that TRPV1 evolved ∼420 million years ago (MYA). Our analysis reveals that specific regions, domains and motifs of TRPV1 have gone through different selection pressures and thus have different levels of conservation. We found that, among all of these, the TRP box is the most conserved and thus has functional significance. Our results also indicate that the tubulin binding sequences (TBS) have evolutionary significance, as these sequence stretches are more conserved than many other essential regions of TRPV1. The overall distribution of positively charged residues within the TBS motifs is conserved throughout evolution. In silico analysis reveals that TBS-1 and TBS-2 of TRPV1 can form helical structures and may play an important role in TRPV1 function. Our analysis identifies the regions of TRPV1 that are important for the structure-function relationship. This analysis indicates that tubulin binding sequence-1 (TBS-1) near the TRP box forms a potential helix and that the tubulin interactions with TRPV1 via TBS-1 have evolutionary significance. This interaction may be required for proper channel function and regulation and may also have significance in the context of Taxol®-induced neuropathy.

  20. The Identification and Structure of an N-Terminal PR Domain Show that FOG1 Is a Member of the PRDM Family of Proteins

    PubMed Central

    Clifton, Molly K.; Westman, Belinda J.; Thong, Sock Yue; O’Connell, Mitchell R.; Webster, Michael W.; Shepherd, Nicholas E.; Quinlan, Kate G.; Crossley, Merlin; Blobel, Gerd A.; Mackay, Joel P.

    2014-01-01

    FOG1 is a transcriptional regulator that acts in concert with the hematopoietic master regulator GATA1 to coordinate the differentiation of platelets and erythrocytes. Despite considerable effort, however, the mechanisms through which FOG1 regulates gene expression are only partially understood. Here we report the discovery of a previously unrecognized domain in FOG1: a PR (PRD-BF1 and RIZ) domain that is distantly related in sequence to the SET domains that are found in many histone methyltransferases. We have used NMR spectroscopy to determine the solution structure of this domain, revealing that the domain shares close structural similarity with SET domains. Titration with S-adenosyl-L-homocysteine, the cofactor product synonymous with SET domain methyltransferase activity, indicated that the FOG PR domain is not, however, likely to function as a methyltransferase in the same fashion. We also sought to define the function of this domain using both pulldown experiments and gel shift assays. However, neither pulldowns from mammalian nuclear extracts nor yeast two-hybrid assays reproducibly revealed binding partners, and we were unable to detect nucleic-acid-binding activity in this domain using our high-diversity Pentaprobe oligonucleotides. Overall, our data demonstrate that FOG1 is a member of the PRDM (PR domain containing proteins, with zinc fingers) family of transcriptional regulators. The function of many PR domains, however, remains somewhat enigmatic for the time being. PMID:25162672

  1. The crystal structure of the Hsp90 co-chaperone Cpr7 from Saccharomyces cerevisiae.

    PubMed

    Qiu, Yu; Ge, Qiangqiang; Wang, Mingxing; Lv, Hui; Ebrahimi, Mohammad; Niu, Liwen; Teng, Maikun; Li, Xu

    2017-03-01

    The versatility of Hsp90 can be attributed to the variety of co-chaperone proteins that modulate the role of Hsp90 in many cellular processes. As a co-chaperone of Hsp90, Cpr7 is essential for accelerating cell growth in an Hsp90-containing trimeric complex. Here, we report the crystal structure of Cpr7 at a resolution of 1.8 Å. It consists of an N-terminal PPI domain and a C-terminal TPR domain, and exhibits a U-shaped conformation. Our studies revealed the aggregation state of Cpr7 in solution and the interaction properties between Cpr7 and the MEEVD sequence from the C-terminus of Hsp90. In addition, structure and sequence analysis of Cpr7 and its homologues revealed the structural basis both for the functional differences between Cpr6 and Cpr7 and for the functional complementation between Cns1 and Cpr7. Our studies facilitate the understanding of Cpr7 and provide insights into the molecular mechanisms of the Hsp90 co-chaperone pathway. Copyright © 2017 Elsevier Inc. All rights reserved.

  2. Chromosome ends: different sequences may provide conserved functions.

    PubMed

    Louis, Edward J; Vershinin, Alexander V

    2005-07-01

    The structures of specific chromosome regions, centromeres and telomeres, present a number of puzzles. As functions performed by these regions are ubiquitous and essential, their DNA, proteins and chromatin structure are expected to be conserved. Recent studies of centromeric DNA from human, Drosophila and plant species have demonstrated that a hidden universal centromere-specific sequence is highly unlikely. The DNA of telomeres is more conserved, consisting of a tandemly repeated 6-8 bp Arabidopsis-like sequence in a majority of organisms as diverse as protozoa, fungi, mammals and plants. However, there are alternatives to short DNA repeats at the ends of chromosomes and for telomere elongation by telomerase. Here we focus on the similarities and diversity that exist among the structural elements, DNA sequences and proteins, that make up terminal domains (telomeres and subtelomeres), and how organisms use these in different ways to fulfil the functions of end-replication and end-protection. Copyright (c) 2005 Wiley Periodicals, Inc.

  3. How Many Protein Sequences Fold to a Given Structure? A Coevolutionary Analysis.

    PubMed

    Tian, Pengfei; Best, Robert B

    2017-10-17

    Quantifying the relationship between protein sequence and structure is key to understanding the protein universe. A fundamental measure of this relationship is the total number of amino acid sequences that can fold to a target protein structure, known as the "sequence capacity," which has been suggested as a proxy for how designable a given protein fold is. Although sequence capacity has been extensively studied using lattice models and theory, numerical estimates for real protein structures are currently lacking. In this work, we have quantitatively estimated the sequence capacity of 10 proteins with a variety of different structures using a statistical model based on residue-residue co-evolution to capture the variation of sequences from the same protein family. Remarkably, we find that even for the smallest protein folds, such as the WW domain, the number of foldable sequences is extremely large, exceeding the Avogadro constant. In agreement with earlier theoretical work, the calculated sequence capacity is positively correlated with the size of the protein, or better, the density of contacts. This allows the absolute sequence capacity of a given protein to be approximately predicted from its structure. On the other hand, the relative sequence capacity, i.e., normalized by the total number of possible sequences, is an extremely tiny number and is strongly anti-correlated with the protein length. Thus, although there may be more foldable sequences for larger proteins, it will be much harder to find them. Lastly, we have correlated the evolutionary age of proteins in the CATH database with their sequence capacity as predicted by our model. The results suggest a trade-off between the opposing requirements of high designability and the likelihood of a novel fold emerging by chance. Published by Elsevier Inc.
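
    The contrast between absolute and relative sequence capacity can be made concrete with a small calculation. Relative capacity is the number of foldable sequences divided by the 20^L possible sequences of length L, so it is easiest to handle in log10 units. In the sketch below, the chain length of 35 residues and the use of the Avogadro constant as a stand-in lower bound for the capacity are assumptions made for illustration only; the abstract does not report these specific numbers.

```python
import math

AVOGADRO = 6.02214076e23  # exact SI value, used here only as an illustrative bound


def log10_sequence_space(length: int, alphabet_size: int = 20) -> float:
    """log10 of the total number of sequences of a given length."""
    return length * math.log10(alphabet_size)


def log10_relative_capacity(log10_capacity: float, length: int) -> float:
    """log10 of (foldable sequences / all possible sequences)."""
    return log10_capacity - log10_sequence_space(length)


# Hypothetical example: a 35-residue domain whose capacity equals the Avogadro constant.
length = 35
log10_cap = math.log10(AVOGADRO)
print(f"log10(total sequences)   = {log10_sequence_space(length):.2f}")
print(f"log10(relative capacity) = {log10_relative_capacity(log10_cap, length):.2f}")
```

    Because the total sequence space grows as 20^L, even a capacity that rises with protein size can correspond to a rapidly shrinking fraction of sequence space, which is the trade-off the abstract describes.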

  4. Intramolecular control of transcriptional activity by the NK2-specific domain in NK-2 homeodomain proteins

    PubMed Central

    Watada, Hirotaka; Mirmira, Raghavendra G.; Kalamaras, Julie; German, Michael S.

    2000-01-01

    The developmentally important homeodomain transcription factors of the NK-2 class contain a highly conserved region, the NK2-specific domain (NK2-SD). The function of this domain, however, remains unknown. The primary structure of the NK2-SD suggests that it might function as an accessory DNA-binding domain or as a protein–protein interaction interface. To assess the possibility that the NK2-SD may contribute to DNA-binding specificity, we used a PCR-based approach to identify a consensus DNA-binding sequence for Nkx2.2, an NK-2 family member involved in pancreas and central nervous system development. The consensus sequence (TCTAAGTGAGCTT) is similar to the known binding sequences for other NK-2 homeodomain proteins, but we show that the NK2-SD does not contribute significantly to specific DNA binding to this sequence. To determine whether the NK2-SD contributes to transactivation, we used GAL4-Nkx2.2 fusion constructs to map a powerful transcriptional activation domain in the C-terminal region beyond the conserved NK2-SD. Interestingly, this C-terminal region functions as a transcriptional activator only in the absence of an intact NK2-SD. The NK2-SD also can mask transactivation from the paired homeodomain transcription factor Pax6, but it has no effect on transcription by itself. These results demonstrate that the NK2-SD functions as an intramolecular regulator of the C-terminal activation domain in Nkx2.2 and support a model in which interactions through the NK2-SD regulate the ability of NK-2-class proteins to activate specific genes during development. PMID:10944215

  5. Genetic characterization of the non-structural protein-3 gene of bluetongue virus serotype-2 isolate from India.

    PubMed

    Pudupakam, Raghavendra Sumanth; Raghunath, Shobana; Pudupakam, Meghanath; Daggupati, Sreenivasulu

    2017-03-01

    Sequence analysis and phylogenetic studies based on non-structural protein-3 (NS3) gene are important in understanding the evolution and epidemiology of bluetongue virus (BTV). This study was aimed at characterizing the NS3 gene sequence of Indian BTV serotype-2 (BTV2) to elucidate its genetic relationship to global BTV isolates. The NS3 gene of BTV2 was amplified from infected BHK-21 cell cultures, cloned and subjected to sequence analysis. The generated NS3 gene sequence was compared with the corresponding sequences of different BTV serotypes across the world, and a phylogenetic relationship was established. The NS3 gene of BTV2 showed moderate levels of variability in comparison to different BTV serotypes, with nucleotide sequence identities ranging from 81% to 98%. The region showed high sequence homology of 93-99% at amino acid level with various BTV serotypes. The PPXY/PTAP late domain motifs, glycosylation sites, hydrophobic domains, and the amino acid residues critical for virus-host interactions were conserved in NS3 protein. Phylogenetic analysis revealed that BTV isolates segregate into four topotypes and that the Indian BTV2 in subclade IA is closely related to Asian and Australian origin strains. Analysis of the NS3 gene indicated that Indian BTV2 isolate is closely related to strains from Asia and Australia, suggesting a common origin of infection. Although the pattern of evolution of BTV2 isolate is different from other global isolates, the deduced amino acid sequence of NS3 protein demonstrated high molecular stability.
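
    Identity figures such as the 81% to 98% nucleotide range quoted above are typically computed from pairwise alignments. The sketch below shows one common convention, counting matches over columns where neither sequence has a gap; other conventions (for example, dividing by the full alignment length) give different values, and the abstract does not say which was used. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence is gapped."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip gapped columns under this convention
        compared += 1
        if x.upper() == y.upper():
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy nucleotide alignment (hypothetical): 9 of 10 positions match
print(percent_identity("ATGCATGCAT", "ATGCATGCAA"))  # 90.0
```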

  6. Genetic characterization of the non-structural protein-3 gene of bluetongue virus serotype-2 isolate from India

    PubMed Central

    Pudupakam, Raghavendra Sumanth; Raghunath, Shobana; Pudupakam, Meghanath; Daggupati, Sreenivasulu

    2017-01-01

    Aim: Sequence analysis and phylogenetic studies based on non-structural protein-3 (NS3) gene are important in understanding the evolution and epidemiology of bluetongue virus (BTV). This study was aimed at characterizing the NS3 gene sequence of Indian BTV serotype-2 (BTV2) to elucidate its genetic relationship to global BTV isolates. Materials and Methods: The NS3 gene of BTV2 was amplified from infected BHK-21 cell cultures, cloned and subjected to sequence analysis. The generated NS3 gene sequence was compared with the corresponding sequences of different BTV serotypes across the world, and a phylogenetic relationship was established. Results: The NS3 gene of BTV2 showed moderate levels of variability in comparison to different BTV serotypes, with nucleotide sequence identities ranging from 81% to 98%. The region showed high sequence homology of 93-99% at amino acid level with various BTV serotypes. The PPXY/PTAP late domain motifs, glycosylation sites, hydrophobic domains, and the amino acid residues critical for virus-host interactions were conserved in NS3 protein. Phylogenetic analysis revealed that BTV isolates segregate into four topotypes and that the Indian BTV2 in subclade IA is closely related to Asian and Australian origin strains. Conclusion: Analysis of the NS3 gene indicated that Indian BTV2 isolate is closely related to strains from Asia and Australia, suggesting a common origin of infection. Although the pattern of evolution of BTV2 isolate is different from other global isolates, the deduced amino acid sequence of NS3 protein demonstrated high molecular stability. PMID:28435199

  7. Structure and function of echinoderm telomerase RNA

    PubMed Central

    Podlevsky, Joshua D.; Li, Yang; Chen, Julian J.-L.

    2016-01-01

    Telomerase is a ribonucleoprotein (RNP) enzyme that requires an integral telomerase RNA (TR) subunit, in addition to the catalytic telomerase reverse transcriptase (TERT), for enzymatic function. The secondary structures of TRs from the three major groups of species, ciliates, fungi, and vertebrates, have been studied extensively and demonstrate dramatic diversity. Herein, we report the first comprehensive secondary structure of TR from echinoderms—marine invertebrates closely related to vertebrates—determined by phylogenetic comparative analysis of 16 TR sequences from three separate echinoderm classes. Similar to vertebrate TR, echinoderm TR contains the highly conserved template/pseudoknot and H/ACA domains. However, echinoderm TR lacks the ancestral CR4/5 structural domain found throughout vertebrate and fungal TRs. Instead, echinoderm TR contains a distinct simple helical region, termed eCR4/5, that is functionally equivalent to the CR4/5 domain. The urchin and brittle star eCR4/5 domains bind specifically to their respective TERT proteins and stimulate telomerase activity. Distinct from vertebrate telomerase, the echinoderm TR template/pseudoknot domain with the TERT protein is sufficient to reconstitute significant telomerase activity. This gain-of-function of the echinoderm template/pseudoknot domain for conferring telomerase activity presumably facilitated the rapid structural evolution of the eCR4/5 domain throughout the echinoderm lineage. Additionally, echinoderm TR utilizes the template-adjacent P1.1 helix as a physical template boundary element to prevent nontelomeric DNA synthesis, a mechanism used by ciliate and fungal TRs. Thus, the chimeric and eccentric structural features of echinoderm TR provide unparalleled insights into the rapid evolution of telomerase RNP structure and function. PMID:26598712

  8. Sequence, structure and function relationships in flaviviruses as assessed by evolutive aspects of its conserved non-structural protein domains.

    PubMed

    da Fonseca, Néli José; Lima Afonso, Marcelo Querino; Pedersolli, Natan Gonçalves; de Oliveira, Lucas Carrijo; Andrade, Dhiego Souto; Bleicher, Lucas

    2017-10-28

    Flaviviruses are responsible for serious diseases such as dengue, yellow fever, and Zika fever. Their genomes encode a polyprotein which, after cleavage, results in three structural and seven non-structural proteins. Homologous proteins can be studied by conservation and coevolution analysis as detected in multiple sequence alignments, usually reporting positions which are strictly necessary for the structure and/or function of all members in a protein family or which are involved in a specific sub-class feature requiring the coevolution of residue sets. This study provides a complete conservation and coevolution analysis of all flavivirus non-structural proteins, with results mapped on all well-annotated available sequences. A literature review on the residues found in the analysis enabled us to compile available information on their roles and distribution among different flaviviruses. Also, we provide the mapping of conserved and coevolved residues for all sequences currently in SwissProt as supplementary material, so that particularities in different viruses can be easily analyzed. Copyright © 2017 Elsevier Inc. All rights reserved.
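    Conservation in a multiple sequence alignment is often scored per column. The sketch below uses Shannon entropy as one common measure: 0 bits means the column is fully conserved, and higher values mean more variable positions. This is an illustrative assumption, since the abstract does not state which conservation or coevolution statistic the authors used; the function name and the toy alignment are hypothetical, and coevolution scoring is not covered here.

```python
import math
from collections import Counter
from typing import List


def column_entropies(alignment: List[str], gap: str = "-") -> List[float]:
    """Shannon entropy (bits) of each alignment column, ignoring gap characters.

    Columns made up entirely of gaps are reported as NaN.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have equal length")

    entropies = []
    for i in range(length):
        residues = [seq[i] for seq in alignment if seq[i] != gap]
        if not residues:
            entropies.append(float("nan"))
            continue
        total = len(residues)
        counts = Counter(residues)
        # Sum of p * log2(1/p) over the residue types observed in this column.
        entropies.append(
            sum((c / total) * math.log2(total / c) for c in counts.values())
        )
    return entropies


# Toy (hypothetical) alignment: three conserved columns, one variable column.
toy_alignment = ["GDGA", "GDGS", "GDGT", "GDG-"]
scores = column_entropies(toy_alignment)
```

    For the toy alignment, the first three columns score 0 bits and the last column, with three equally frequent residues, scores log2(3) bits.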

  9. Pre-lithification tectonic foliation development in a clastic sedimentary sequence

    NASA Astrophysics Data System (ADS)

    Meere, Patrick; Mulchrone, Kieran; McCarthy, David; Timmermann, Martin; Dewey, John

    2016-04-01

    The current view regarding the timing of regionally developed penetrative tectonic fabrics in sedimentary rocks is that their development postdates lithification of those rocks. In this case fabric development is achieved by a number of deformation mechanisms including grain rigid body rotation, crystal-plastic deformation and pressure solution (wet diffusion). The latter is believed to be the primary mechanism responsible for shortening and the domainal structure of cleavage development commonly observed in low grade metamorphic rocks. In this study we combine field observations with strain analysis and modelling to fully characterise considerable (>50%) mid-Devonian Acadian crustal shortening in a Devonian clastic sedimentary sequence from south west Ireland. Despite these high levels of shortening and associated penetrative tectonic fabric there is a marked absence of the expected domainal cleavage structure and intra-clast deformation, which are expected with this level of deformation. In contrast to the expected deformation processes associated with conventional cleavage development, fabrics in these rocks are a product of translation, rigid body rotation and repacking of extra-formational clasts during deformation of an un-lithified clastic sedimentary sequence.

  10. The rearrangement of motif F in the flavivirus RNA-directed RNA polymerase.

    PubMed

    Potapova, Ulyana; Feranchuk, Sergey; Leonova, Galina; Belikov, Sergei

    2018-03-01

    In the flavivirus genus, the non-structural protein NS5 plays a central role in RNA viral replication and constitutes a major target for drug discovery. One of the prime challenges in the study of NS5 protein is to investigate the interplay between the two protein domains, namely, the RNA-dependent RNA polymerase (RdRp) domain and the methyltransferase (MTase) domain. These investigations could clarify the multiple roles of NS5 protein in the virus life cycle. Here we present the results of sequence analyses and structural bioinformatics studies of NS5 protein, which suggest that the conserved motif F in the NS5 protein could act as a lock which controls the rearrangement of the domains and as a switch in the protein enzymatic activity. Copyright © 2017 Elsevier B.V. All rights reserved.

  11. Computational protein design: validation and possible relevance as a tool for homology searching and fold recognition.

    PubMed

    Schmidt Am Busch, Marcel; Sedano, Audrey; Simonson, Thomas

    2010-05-05

    Protein fold recognition usually relies on a statistical model of each fold; each model is constructed from an ensemble of natural sequences belonging to that fold. A complementary strategy may be to employ sequence ensembles produced by computational protein design. Designed sequences can be more diverse than natural sequences, possibly avoiding some limitations of experimental databases. We explore this strategy for four SCOP families: Small Kunitz-type inhibitors (SKIs), Interleukin-8 chemokines, PDZ domains, and large Caspase catalytic subunits, represented by 43 structures. An automated procedure is used to redesign the 43 proteins. We use the experimental backbones as fixed templates in the folded state and a molecular mechanics model to compute the interaction energies between sidechain and backbone groups. Calculations are done with the Proteins@Home volunteer computing platform. A heuristic algorithm is used to scan the sequence and conformational space, yielding 200,000-300,000 sequences per backbone template. The results confirm and generalize our earlier study of SH2 and SH3 domains. The designed sequences resemble moderately distant natural homologues of the initial templates; e.g., the SUPERFAMILY profile hidden Markov model library recognizes 85% of the low-energy sequences as native-like. Conversely, Position Specific Scoring Matrices derived from the sequences can be used to detect natural homologues within the SwissProt database: 60% of known PDZ domains are detected and around 90% of known SKIs and chemokines. Energy components and inter-residue correlations are analyzed and ways to improve the method are discussed. For some families, designed sequences can be a useful complement to experimental ones for homologue searching. However, improved tools are needed to extract more information from the designed profiles before the method can be of general use.

  12. Gene and domain duplication in the chordate Otx gene family: insights from amphioxus Otx.

    PubMed

    Williams, N A; Holland, P W

    1998-05-01

    We report the genomic organization and deduced protein sequence of a cephalochordate member of the Otx homeobox gene family (AmphiOtx) and show its probable single-copy state in the genome. We also present molecular phylogenetic analysis indicating that there was a single ancestral Otx gene in the first chordates which was duplicated in the vertebrate lineage after it had split from the lineage leading to the cephalochordates. Duplication of a C-terminal protein domain has occurred specifically in the vertebrate lineage, strengthening the case for a single Otx gene in an ancestral chordate whose gene structure has been retained in an extant cephalochordate. Comparative analysis of protein sequences and published gene expression patterns suggests that the ancestral chordate Otx gene had roles in patterning the anterior mesendoderm and central nervous system. These roles were elaborated following Otx gene duplication in vertebrates, accompanied by regulatory and structural divergence, particularly of Otx1 descendant genes.

  13. Structures of Staphylococcus aureus D-tagatose-6-phosphate kinase implicate domain motions in specificity and mechanism.

    PubMed

    Miallau, Linda; Hunter, William N; McSweeney, Sean M; Leonard, Gordon A

    2007-07-06

    High resolution structures of Staphylococcus aureus d-tagatose-6-phosphate kinase (LacC) in two crystal forms are herein reported. The structures define LacC in apoform, in binary complexes with ADP or the co-factor analogue AMP-PNP, and in a ternary complex with AMP-PNP and D-tagatose-6-phosphate. The tertiary structure of the LacC monomer, which is closely related to other members of the pfkB subfamily of carbohydrate kinases, is composed of a large alpha/beta core domain and a smaller, largely beta "lid." Four extended polypeptide segments connect these two domains. Dimerization of LacC occurs via interactions between lid domains, which come together to form a beta-clasp structure. Residues from both subunits contribute to substrate binding. LacC adopts a closed structure required for phosphoryl transfer only when both substrate and co-factor are bound. A reaction mechanism similar to that used by other phosphoryl transferases is proposed, although unusually, when both substrate and co-factor are bound to the enzyme two Mg(2+) ions are observed in the active site. A new motif of amino acid sequence conservation common to the pfkB subfamily of carbohydrate kinases is identified.

  14. Structural Studies of the Nedd4 WW Domains and Their Selectivity for the Connexin43 (Cx43) Carboxyl Terminus

    PubMed Central

    Spagnol, Gaelle; Kieken, Fabien; Kopanic, Jennifer L.; Li, Hanjun; Zach, Sydney; Stauch, Kelly L.; Grosely, Rosslyn; Sorgen, Paul L.

    2016-01-01

    Neuronal precursor cell-expressed developmentally down-regulated 4 (Nedd4) was the first ubiquitin protein ligase identified to interact with connexin43 (Cx43), and its suppressed expression results in accumulation of gap junction plaques at the plasma membrane. Nedd4-mediated ubiquitination of Cx43 is required to recruit Eps15 and target Cx43 to the endocytic pathway. Although the Cx43 residues that undergo ubiquitination are still unknown, in this study we address other unresolved questions pertaining to the molecular mechanisms mediating the direct interaction between Nedd4 (WW1–3 domains) and Cx43 (carboxyl terminus (CT)). All three WW domains display a similar three antiparallel β-strand structure and interact with the same Cx43CT 283PPXY286 sequence. Although Tyr286 is essential for the interaction, MAPK phosphorylation of the preceding serine residues (Ser(P)279 and Ser(P)282) increases the binding affinity by 2-fold for the WW domains (WW2 > WW3 ≫ WW1). The structure of the WW2·Cx43CT276–289(Ser(P)279, Ser(P)282) complex reveals that coordination of Ser(P)282 with the end of β-strand 3 enables Ser(P)279 to interact with the back face of β-strand 3 (Tyr286 is on the front face) and loop 2, forming a horseshoe-shaped arrangement. The close sequence identity of WW2 with WW1 and WW3 residues that interact with the Cx43CT PPXY motif and Ser(P)279/Ser(P)282 strongly suggests that the significantly lower binding affinity of WW1 is the result of a more rigid structure. This study presents the first structure illustrating how phosphorylation of the Cx43CT domain helps mediate the interaction with a molecular partner involved in gap junction regulation. PMID:26841867

  15. Structural Studies of the Nedd4 WW Domains and Their Selectivity for the Connexin43 (Cx43) Carboxyl Terminus.

    PubMed

    Spagnol, Gaelle; Kieken, Fabien; Kopanic, Jennifer L; Li, Hanjun; Zach, Sydney; Stauch, Kelly L; Grosely, Rosslyn; Sorgen, Paul L

    2016-04-01

    Neuronal precursor cell-expressed developmentally down-regulated 4 (Nedd4) was the first ubiquitin protein ligase identified to interact with connexin43 (Cx43), and its suppressed expression results in accumulation of gap junction plaques at the plasma membrane. Nedd4-mediated ubiquitination of Cx43 is required to recruit Eps15 and target Cx43 to the endocytic pathway. Although the Cx43 residues that undergo ubiquitination are still unknown, in this study we address other unresolved questions pertaining to the molecular mechanisms mediating the direct interaction between Nedd4 (WW1-3 domains) and Cx43 (carboxyl terminus (CT)). All three WW domains display a similar three antiparallel β-strand structure and interact with the same Cx43CT (283)PPXY(286) sequence. Although Tyr(286) is essential for the interaction, MAPK phosphorylation of the preceding serine residues (Ser(P)(279) and Ser(P)(282)) increases the binding affinity by 2-fold for the WW domains (WW2 > WW3 ≫ WW1). The structure of the WW2·Cx43CT(276-289)(Ser(P)(279), Ser(P)(282)) complex reveals that coordination of Ser(P)(282) with the end of β-strand 3 enables Ser(P)(279) to interact with the back face of β-strand 3 (Tyr(286) is on the front face) and loop 2, forming a horseshoe-shaped arrangement. The close sequence identity of WW2 with WW1 and WW3 residues that interact with the Cx43CT PPXY motif and Ser(P)(279)/Ser(P)(282) strongly suggests that the significantly lower binding affinity of WW1 is the result of a more rigid structure. This study presents the first structure illustrating how phosphorylation of the Cx43CT domain helps mediate the interaction with a molecular partner involved in gap junction regulation. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.

  16. Structure and function of small heat shock/alpha-crystallin proteins: established concepts and emerging ideas.

    PubMed

    MacRae, T H

    2000-06-01

    Small heat shock/alpha-crystallin proteins are defined by a conserved sequence of approximately 90 amino acid residues, termed the alpha-crystallin domain, which is bounded by variable amino- and carboxy-terminal extensions. These proteins form oligomers, most of uncertain quaternary structure, and oligomerization is a prerequisite to their function as molecular chaperones. Sequence modelling and physical analyses show that the secondary structure of small heat shock/alpha-crystallin proteins is predominantly beta-pleated sheet. Crystallography, site-directed spin-labelling and yeast two-hybrid selection demonstrate regions of secondary structure within the alpha-crystallin domain that interact during oligomer assembly, a process also dependent on the amino terminus. Oligomers are dynamic, exhibiting subunit exchange and organizational plasticity, perhaps leading to functional diversity. Exposure of hydrophobic residues by structural modification facilitates chaperoning where denaturing proteins in the molten globule state associate with oligomers. The flexible carboxy-terminal extension contributes to chaperone activity by enhancing the solubility of small heat shock/alpha-crystallin proteins. Site-directed mutagenesis has yielded proteins where the effect of the change on structure and function depends upon the residue modified, the organism under study and the analytical techniques used. Most revealing, substitution of a conserved arginine residue within the alpha-crystallin domain has a major impact on quaternary structure and chaperone action probably through realignment of beta-sheets. These mutations are linked to inherited diseases. Oligomer size is regulated by a stress-responsive cascade including MAPKAP kinase 2/3 and p38. Phosphorylation of small heat shock/alpha-crystallin proteins has important consequences within stressed cells, especially for microfilaments.

  17. Structurally complex and highly active RNA ligases derived from random RNA sequences

    NASA Technical Reports Server (NTRS)

    Ekland, E. H.; Szostak, J. W.; Bartel, D. P.

    1995-01-01

    Seven families of RNA ligases, previously isolated from random RNA sequences, fall into three classes on the basis of secondary structure and regiospecificity of ligation. Two of the three classes of ribozymes have been engineered to act as true enzymes, catalyzing the multiple-turnover transformation of substrates into products. The most complex of these ribozymes has a minimal catalytic domain of 93 nucleotides. An optimized version of this ribozyme has a kcat exceeding one per second, a value far greater than that of most natural RNA catalysts and approaching that of comparable protein enzymes. The fact that such a large and complex ligase emerged from a very limited sampling of sequence space implies the existence of a large number of distinct RNA structures of equivalent complexity and activity.

  18. Divergence of Structure and Function in the Haloacid Dehalogenase Enzyme Superfamily: Bacteroides thetaiotaomicron BT2127 is an Inorganic Pyrophosphatase

    PubMed Central

    Huang, Hua; Yury, Patskovsky; Toro, Rafael; Farelli, Jeremiah D.; Pandya, Chetanya; Almo, Steven C.; Allen, Karen N.; Dunaway-Mariano, Debra

    2012-01-01

    The explosion of protein sequence information requires that current strategies for function assignment must evolve to complement experimental approaches with computationally-based function prediction. This necessitates the development of strategies based on the identification of sequence markers in the form of specificity determinants and a more informed definition of orthologues. Herein, we have undertaken the function assignment of the unknown Haloalkanoate Dehalogenase superfamily member BT2127 (Uniprot accession # Q8A5V9) from Bacteroides thetaiotaomicron using an integrated bioinformatics/structure/mechanism approach. The substrate specificity profile and steady-state rate constants of BT2127 (with kcat/Km value for pyrophosphate of ∼1 × 10^5 M^−1 s^−1), together with the gene context, support the assigned in vivo function as an inorganic pyrophosphatase. The X-ray structural analysis of the wild-type BT2127 and several variants generated by site-directed mutagenesis shows that substrate discrimination is based, in part, on active site space restrictions imposed by the cap domain (specifically by residues Tyr76 and Glu47). Structure guided site directed mutagenesis coupled with kinetic analysis of the mutant enzymes identified the residues required for catalysis, substrate binding, and domain-domain association. Based on this structure-function analysis, the catalytic residues Asp11, Asp13, Thr113, and Lys147 as well as the metal binding residues Asp171, Asn172 and Glu47 were used as markers to confirm BT2127 orthologues identified via sequence searches. This bioinformatic analysis demonstrated that the biological range of BT2127 orthologue is restricted to the phylum Bacteroidetes/Chlorobi. The key structural determinants in the divergence of BT2127 and its closest homologue β-phosphoglucomutase control the leaving group size (phosphate vs. glucose-phosphate) and the position of the Asp acid/base in the open vs. closed conformations. HADSF pyrophosphatases represent a third mechanistic and fold type for bacterial pyrophosphatases. PMID:21894910

  19. Identification of a Unique Fe-S Cluster Binding Site in a Glycyl-Radical Type Microcompartment Shell Protein

    PubMed Central

    Thompson, Michael C.; Wheatley, Nicole M.; Jorda, Julien; Sawaya, Michael R.; Gidaniyan, Soheil D.; Ahmed, Hoda; Yang, Zhongyu; McCarty, Krystal N.; Whitelegge, Julian P.; Yeates, Todd O.

    2014-01-01

    Recently, progress has been made toward understanding the functional diversity of bacterial microcompartment (MCP) systems, which serve as protein-based metabolic organelles in diverse microbes. New types of MCPs have been identified, including the glycyl-radical propanediol (Grp) MCP. Within these elaborate protein complexes, BMC-domain shell proteins assemble to form a polyhedral barrier that encapsulates the enzymatic contents of the MCP. Interestingly, the Grp MCP contains a number of shell proteins with unusual sequence features. GrpU is one such shell protein, whose amino acid sequence is particularly divergent from other members of the BMC-domain superfamily of proteins that effectively defines all MCPs. Expression, purification, and subsequent characterization of the protein showed, unexpectedly, that it binds an iron-sulfur cluster. We determined X-ray crystal structures of two GrpU orthologs, providing the first structural insight into the homohexameric BMC-domain shell proteins of the Grp system. The X-ray structures of GrpU, both obtained in the apo form, combined with spectroscopic analyses and computational modeling, show that the metal cluster resides in the central pore of the BMC shell protein at a position of broken 6-fold symmetry. The result is a structurally polymorphic iron-sulfur cluster binding site that appears to be unique among metalloproteins studied to date. PMID:25102080

  20. Conserved domains and SINE diversity during animal evolution.

    PubMed

    Luchetti, Andrea; Mantovani, Barbara

    2013-10-01

    Eukaryotic genomes harbour a number of mobile genetic elements (MGEs); moving from one genomic location to another, they are known to impact on the host genome. Short interspersed elements (SINEs) are well-represented, non-autonomous retroelements and they are likely the most diversified MGEs. In some instances, sequence domains conserved across unrelated SINEs have been identified; remarkably, one of these, called Nin, has been conserved since the Radiata-Bilateria splitting. Here we report on two new domains: Inv, derived from Nin, identified in insects and in deuterostomes, and Pln, restricted to polyneopteran insects. The identification of Inv and Pln sequences allowed us to retrieve new SINEs, two in insects and one in a hemichordate. The diverse structural combination of the different domains in different SINE families, during metazoan evolution, offers a clearer view of SINE diversity and their frequent de novo emergence through module exchange, possibly underlying the high evolutionary success of SINEs. © 2013 Elsevier Inc. All rights reserved.

  1. The Thiamin Pyrophosphate-Motif

    NASA Technical Reports Server (NTRS)

    Dominiak, Paulina M.; Ciszak, Ewa M.

    2003-01-01

    Using databases, the authors have identified a common thiamin pyrophosphate (TPP)-motif in the family of functionally diverse TPP-dependent enzymes. This common motif consists of multimeric organization of subunits, two catalytic centers, common amino acid sequence, and specific contacts to provide a flip-flop, or alternate site, mechanism of action. Each catalytic center [PP:PYR] is formed at the interface of the PP-domain binding the magnesium ion, pyrophosphate and aminopyrimidine ring of TPP, and the PYR-domain binding the aminopyrimidine ring of that cofactor. A pair of these catalytic centers constitutes the catalytic core [PP:PYR]* within these enzymes. Analysis of the structural elements of this catalytic core reveals a novel definition of the common amino acid sequences, which are GX@&(G)@XXGQ, and GDGX25-30 within the PP-domain, and the E&(G)@XXG@ within the PYR-domain, where Q, corresponds to a hydrophobic amino acid. This TPP-motif provides a novel tool for annotation of TPP-dependent enzymes useful in advancing functional proteomics.

  2. Solution structure of the catalytic domain of RICH protein from goldfish.

    PubMed

    Kozlov, Guennadi; Denisov, Alexey Y; Pomerantseva, Ekaterina; Gravel, Michel; Braun, Peter E; Gehring, Kalle

    2007-03-01

    Regeneration-induced CNPase homolog (RICH) is an axonal growth-associated protein, which is induced in teleost fish upon optical nerve injury. RICH consists of a highly acidic N-terminal domain, a catalytic domain with 2',3'-cyclic nucleotide 3'-phosphodiesterase (CNPase) activity and a C-terminal isoprenylation site. In vitro RICH and mammalian brain CNPase specifically catalyze the hydrolysis of 2',3'-cyclic nucleotides to produce 2'-nucleotides, but the physiologically relevant in vivo substrate remains unknown. Here, we report the NMR structure of the catalytic domain of goldfish RICH and describe its binding to CNPase inhibitors. The structure consists of a twisted nine-stranded antiparallel beta-sheet surrounded by alpha-helices on both sides. Despite significant local differences mostly arising from a seven-residue insert in the RICH sequence, the active site region is highly similar to that of human CNPase. Likewise, refinement of the catalytic domain of rat CNPase using residual dipolar couplings gave improved agreement with the published crystal structure. NMR titrations of RICH with inhibitors point to a similar catalytic mechanism for RICH and CNPase. The results suggest a functional importance for the evolutionarily conserved phosphodiesterase activity and hint of a link with pre-tRNA splicing.

  3. Solution Structure of Homology Region (HR) Domain of Type II Secretion System

    PubMed Central

    Gu, Shuang; Kelly, Geoff; Wang, Xiaohui; Frenkiel, Tom; Shevchik, Vladimir E.; Pickersgill, Richard W.

    2012-01-01

    The type II secretion system of Gram-negative bacteria is important for bacterial pathogenesis and survival; it is composed of 12 mostly multimeric core proteins, which build a sophisticated secretion machine spanning both bacterial membranes. OutC is the core component of the inner membrane subcomplex thought to be involved in both recognition of substrate and interaction with the outer membrane secretin OutD. Here, we report the solution structure of the HR domain of OutC and explore its interaction with the secretin. The HR domain adopts a β-sandwich-like fold consisting of two β-sheets each composed of three anti-parallel β-strands. This structure is strikingly similar to the periplasmic region of PilP, an inner membrane lipoprotein from the type IV pilus system highlighting the common evolutionary origin of these two systems and showing that all the core components of the type II secretion system have a structural or sequence ortholog within the type IV pili system. The HR domain is shown to interact with the N0 domain of the secretin. The importance of this interaction is explored in the context of the functional secretion system. PMID:22253442

  4. Distribution and cluster analysis of predicted intrinsically disordered protein Pfam domains

    PubMed Central

    Williams, Robert W; Xue, Bin; Uversky, Vladimir N; Dunker, A Keith

    2013-01-01

    The Pfam database groups regions of proteins by how well hidden Markov models (HMMs) can be trained to recognize similarities among them. Conservation pressure is probably in play here. The Pfam seed training set includes sequence and structure information, being drawn largely from the PDB. A long standing hypothesis among intrinsically disordered protein (IDP) investigators has held that conservation pressures are also at play in the evolution of different kinds of intrinsic disorder, but we find that predicted intrinsic disorder (PID) is not always conserved across Pfam domains. Here we analyze distributions and clusters of PID regions in 193024 members of the version 23.0 Pfam seed database. To include the maximum information available for proteins that remain unfolded in solution, we employ the 10 linearly independent Kidera factors [1–3] for the amino acids, combined with PONDR [4] predictions of disorder tendency, to transform the sequences of these Pfam members into an 11-column matrix where the number of rows is the length of each Pfam region. Cluster analyses of the set of all regions, including those that are folded, show 6 groupings of domains. Cluster analyses of domains with mean VSL2b scores greater than 0.5 (half predicted disorder or more) show at least 3 separated groups. It is hypothesized that grouping sets into shorter sequences with more uniform length will reveal more information about intrinsic disorder and lead to more finely structured and perhaps more accurate predictions. HMMs could be trained to include this information. PMID:28516017
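    The encoding described in this abstract, one row per residue with 10 Kidera factors plus one disorder score, can be sketched as follows. This is a hedged illustration, not the authors' code: the function name is hypothetical, the factor table is filled with zeros as a placeholder (real Kidera factor values must be taken from the published tables), and the disorder scores are made-up numbers standing in for the output of a disorder predictor.

```python
from typing import Dict, List, Sequence

N_FACTORS = 10  # number of Kidera factors per amino acid

# PLACEHOLDER table: zeros only, NOT real Kidera factors. Substitute the
# published values before using this for anything beyond a shape check.
PLACEHOLDER_FACTORS: Dict[str, List[float]] = {
    aa: [0.0] * N_FACTORS for aa in "ACDEFGHIKLMNPQRSTVWY"
}


def feature_matrix(
    sequence: str,
    factors: Dict[str, Sequence[float]],
    disorder: Sequence[float],
) -> List[List[float]]:
    """Return a len(sequence) x 11 matrix: 10 factors plus 1 disorder score per residue."""
    if len(sequence) != len(disorder):
        raise ValueError("need exactly one disorder score per residue")
    rows = []
    for residue, score in zip(sequence, disorder):
        vec = factors[residue]  # raises KeyError for non-standard residues
        if len(vec) != N_FACTORS:
            raise ValueError(f"expected {N_FACTORS} factors for {residue!r}")
        rows.append(list(vec) + [float(score)])
    return rows


# Hypothetical three-residue region with made-up disorder scores.
matrix = feature_matrix("MKV", PLACEHOLDER_FACTORS, [0.2, 0.7, 0.9])
mean_disorder = sum(row[-1] for row in matrix) / len(matrix)
```

    Keeping the disorder score as the last column makes a per-region mean easy to compute, which is the quantity the abstract thresholds at 0.5.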

  5. Structure of the dimeric exonuclease TREX1 in complex with DNA displays a proline-rich binding site for WW Domains.

    PubMed

    Brucet, Marina; Querol-Audí, Jordi; Serra, Maria; Ramirez-Espain, Ximena; Bertlik, Kamila; Ruiz, Lidia; Lloberas, Jorge; Macias, Maria J; Fita, Ignacio; Celada, Antonio

    2007-05-11

    TREX1 is the most abundant mammalian 3'→5' DNA exonuclease. It has been described to form part of the SET complex and is responsible for the Aicardi-Goutières syndrome in humans. Here we show that the exonuclease activity is correlated to the binding preferences toward certain DNA sequences. In particular, we have found three motifs that are selected, GAG, ACA, and CTGC. To elucidate how the discrimination occurs, we determined the crystal structures of two murine TREX1 complexes, with a nucleotide product of the exonuclease reaction, and with a single-stranded DNA substrate. Using confocal microscopy, we observed TREX1 both in nuclear and cytoplasmic subcellular compartments. Remarkably, the presence of TREX1 in the nucleus requires the loss of a C-terminal segment, which we named leucine-rich repeat 3. Furthermore, we detected the presence of a conserved proline-rich region on the surface of TREX1. This observation points to interactions with proline-binding domains. The potential interacting motif "PPPVPRPP" does not contain aromatic residues and thus resembles other sequences that select SH3 and/or Group 2 WW domains. By means of nuclear magnetic resonance titration experiments, we show that, indeed, a polyproline peptide derived from the murine TREX1 sequence interacted with the WW2 domain of the elongation transcription factor CA150. Co-immunoprecipitation studies confirmed this interaction with the full-length TREX1 protein, thereby suggesting that TREX1 participates in more functional complexes than previously thought.

  6. Identification of an additional member of the protein-tyrosine-phosphatase family: evidence for alternative splicing in the tyrosine phosphatase domain.

    PubMed Central

    Matthews, R J; Cahir, E D; Thomas, M L

    1990-01-01

    Protein-tyrosine-phosphatases (protein-tyrosine-phosphate phosphohydrolase, EC 3.1.3.48) have been implicated in the regulation of cell growth; however, to date few tyrosine phosphatases have been characterized. To identify additional family members, the cDNA for the human tyrosine phosphatase leukocyte common antigen (LCA; CD45) was used to screen, under low stringency, a mouse pre-B-cell cDNA library. Two cDNA clones were isolated and sequence analysis predicts a protein sequence of 793 amino acids. We have named the molecule LRP (LCA-related phosphatase). RNA transfer analysis indicates that the cDNAs were derived from a 3.2-kilobase mRNA. The LRP mRNA is transcribed in a wide variety of tissues. The predicted protein structure can be divided into the following structural features: a short 19-amino acid leader sequence, an exterior domain of 123 amino acids that is predicted to be highly glycosylated, a 24-amino acid membrane-spanning region, and a 627-amino acid cytoplasmic region. The cytoplasmic region contains two approximately 260-amino acid domains, each with homology to the tyrosine phosphatase family. One of the cDNA clones differed in that it had a 108-base-pair insertion that, while preserving the reading frame, would disrupt the first protein-tyrosine-phosphatase domain. Analysis of genomic DNA indicates that the insertion is due to an alternatively spliced exon. LRP appears to be evolutionarily conserved as a putative homologue has been identified in the invertebrate Styela plicata. PMID:2162042
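    The segment lengths quoted in this abstract can be checked against the stated protein length, and the claim that the insertion preserves the reading frame can be checked by divisibility by three. The short sketch below does both; the variable names are illustrative and only the numbers come from the abstract.

```python
# Segment lengths (amino acids) as quoted in the abstract.
LRP_SEGMENTS = {
    "leader": 19,
    "exterior": 123,
    "membrane_spanning": 24,
    "cytoplasmic": 627,
}

total_length = sum(LRP_SEGMENTS.values())
print(total_length)  # 793, matching the predicted protein length

# An insertion preserves the reading frame only if its length is a multiple of 3.
insertion_bp = 108
extra_codons, remainder = divmod(insertion_bp, 3)
print(extra_codons, remainder)  # 36 0
```

    The four segments sum to the reported 793 residues, and the 108-base-pair insertion corresponds to 36 whole codons, consistent with an in-frame alternatively spliced exon.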

  7. A Coincidence Detection Mechanism Controls PX-BAR Domain-Mediated Endocytic Membrane Remodeling via an Allosteric Structural Switch.

    PubMed

    Lo, Wen-Ting; Vujičić Žagar, Andreja; Gerth, Fabian; Lehmann, Martin; Puchkov, Dymtro; Krylova, Oxana; Freund, Christian; Scapozza, Leonardo; Vadas, Oscar; Haucke, Volker

    2017-11-20

    Clathrin-mediated endocytosis occurs by bending and remodeling of the membrane underneath the coat. Bin-amphiphysin-rvs (BAR) domain proteins are crucial for endocytic membrane remodeling, but how their activity is spatiotemporally controlled is largely unknown. We demonstrate that the membrane remodeling activity of sorting nexin 9 (SNX9), a late-acting endocytic PX-BAR domain protein required for constriction of U-shaped endocytic intermediates, is controlled by an allosteric structural switch involving coincident detection of the clathrin adaptor AP2 and phosphatidylinositol-3,4-bisphosphate (PI(3,4)P2) at endocytic sites. Structural, biochemical, and cell biological data show that SNX9 is autoinhibited in solution. Binding to PI(3,4)P2 via its PX-BAR domain, and concomitant association with AP2 via sequences in the linker region, releases SNX9 autoinhibitory contacts to enable membrane constriction. Our results reveal a mechanism for restricting the latent membrane remodeling activity of BAR domain proteins to allow spatiotemporal coupling of membrane constriction to the progression of the endocytic pathway. Copyright © 2017 Elsevier Inc. All rights reserved.

  8. Crystal structure of a chimaeric bacterial glutamate dehydrogenase

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Oliveira, Tânia; Sharkey, Michael A.; Engel, Paul C.

    2016-05-23

    Glutamate dehydrogenases (EC 1.4.1.2–4) catalyse the oxidative deamination of L-glutamate to α-ketoglutarate using NAD(P)+ as a cofactor. The bacterial enzymes are hexameric, arranged with 32 symmetry, and each polypeptide consists of an N-terminal substrate-binding segment (domain I) followed by a C-terminal cofactor-binding segment (domain II). The catalytic reaction takes place in the cleft formed at the junction of the two domains. Distinct signature sequences in the nucleotide-binding domain have been linked to the binding of NAD+ versus NADP+, but they are not unambiguous predictors of cofactor preference. In the absence of substrate, the two domains move apart as rigid bodies, as shown by the apo structure of glutamate dehydrogenase from Clostridium symbiosum. Here, the crystal structure of a chimaeric clostridial/Escherichia coli enzyme has been determined in the apo state. The enzyme is fully functional and reveals possible determinants of interdomain flexibility at a hinge region following the pivot helix. The enzyme retains the preference for NADP+ cofactor from the parent E. coli domain II, although there are subtle differences in catalytic activity.

  9. Crystal structure of Toll-like receptor adaptor MAL/TIRAP reveals the molecular basis for signal transduction and disease protection

    PubMed Central

    Valkov, Eugene; Stamp, Anna; DiMaio, Frank; Baker, David; Verstak, Brett; Roversi, Pietro; Kellie, Stuart; Sweet, Matthew J.; Mansell, Ashley; Gay, Nicholas J.; Martin, Jennifer L.; Kobe, Bostjan

    2011-01-01

    Initiation of the innate immune response requires agonist recognition by pathogen-recognition receptors such as the Toll-like receptors (TLRs). Toll/interleukin-1 receptor (TIR) domain-containing adaptors are critical in orchestrating the signal transduction pathways after TLR and interleukin-1 receptor activation. Myeloid differentiation primary response gene 88 (MyD88) adaptor-like (MAL)/TIR domain-containing adaptor protein (TIRAP) is involved in bridging MyD88 to TLR2 and TLR4 in response to bacterial infection. Genetic studies have associated a number of unique single-nucleotide polymorphisms in MAL with protection against invasive microbial infection, but a molecular understanding has been hampered by a lack of structural information. The present study describes the crystal structure of MAL TIR domain. Significant structural differences exist in the overall fold of MAL compared with other TIR domain structures: A sequence motif comprising a β-strand in other TIR domains instead corresponds to a long loop, placing the functionally important “BB loop” proline motif in a unique surface position in MAL. The structure suggests possible dimerization and MyD88-interacting interfaces, and we confirm the key interface residues by coimmunoprecipitation using site-directed mutants. Jointly, our results provide a molecular and structural basis for the role of MAL in TLR signaling and disease protection. PMID:21873236

  10. Analysis of Structures, Functions, and Epitopes of Cysteine Protease from Spirometra erinaceieuropaei Spargana

    PubMed Central

    Liu, Li Na; Cui, Jing; Zhang, Xi; Wei, Tong; Jiang, Peng; Wang, Zhong Quan

    2013-01-01

    Spirometra erinaceieuropaei cysteine protease (SeCP) in sparganum ES proteins recognized by early infection sera was identified by MALDI-TOF/TOF-MS. The aim of this study was to predict the structures and functions of SeCP protein by using the full length cDNA sequence of SeCP gene with online sites and software programs. The SeCP gene sequence was 1053 bp in length, with a longest ORF of 1011 bp encoding a 336-amino acid protein with a complete cathepsin propeptide inhibitor domain and a peptidase C1A conserved domain. The predicted molecular weight and isoelectric point of SeCP were 37.87 kDa and 6.47, respectively. The SeCP has a signal peptide site and no transmembrane domain, located outside the membrane. The secondary structure of SeCP contained 8 α-helices, 7 β-strands, and 20 coils. The SeCP had 15 potential antigenic epitopes and 19 HLA-I restricted epitopes. Based on the phylogenetic analysis of SeCP, S. erinaceieuropaei has the closest evolutionary status with S. mansonoides. SeCP was a kind of proteolytic enzyme with a variety of biological functions and its antigenic epitopes could provide important insights on the diagnostic antigens and target molecules of antisparganum drugs. PMID:24392448
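The ORF length and protein length quoted in this abstract are consistent if the final codon of the ORF is a stop codon. The sketch below only works through that arithmetic; the assumption that the 1011 bp figure includes the stop codon is mine, not a statement from the abstract, and the variable names are illustrative.

```python
# Figures quoted in the abstract.
orf_bp = 1011
reported_residues = 336

# A complete ORF must be a whole number of codons.
assert orf_bp % 3 == 0
codons = orf_bp // 3

# Assumption: the last codon is a stop codon and encodes no residue.
coding_codons = codons - 1

print(codons, coding_codons, coding_codons == reported_residues)
```

Under that assumption, 337 codons give the 336 residues reported.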

  11. DNA Recognition by a σ54 Transcriptional Activator from Aquifex aeolicus

    DOE PAGES

    Vidangos, Natasha K.; Heideker, Johanna; Lyubimov, Artem; ...

    2014-08-23

    Transcription initiation by bacterial σ54-polymerase requires the action of a transcriptional activator protein. Activators bind sequence-specifically upstream of the transcription initiation site via a DNA-binding domain. The structurally characterized DNA-binding domains from activators all belong to the Factor for Inversion Stimulation (Fis) family of helix-turn-helix DNA-binding proteins. We report here structures of the free and DNA-bound forms of the DNA-binding domain of NtrC4 (4DBD) from Aquifex aeolicus, a member of the NtrC family of σ54 activators. Two NtrC4 binding sites were identified upstream (-145 and -85 base pairs) from the start of the lpxC gene, which is responsible for the first committed step in Lipid A biosynthesis. This is the first experimental evidence for σ54 regulation in lpxC expression. 4DBD was crystallized both without DNA and in complex with the -145 binding site. The structures, together with biochemical data, indicate that NtrC4 binds to DNA in a manner that is similar to that of its close homologue, Fis. Ultimately, the greater sequence specificity for the binding of 4DBD relative to Fis seems to arise from a larger number of base specific contacts contributing to affinity than for Fis.

  12. Protein Information Resource: a community resource for expert annotation of protein data

    PubMed Central

    Barker, Winona C.; Garavelli, John S.; Hou, Zhenglin; Huang, Hongzhan; Ledley, Robert S.; McGarvey, Peter B.; Mewes, Hans-Werner; Orcutt, Bruce C.; Pfeiffer, Friedhelm; Tsugita, Akira; Vinayaka, C. R.; Xiao, Chunlin; Yeh, Lai-Su L.; Wu, Cathy

    2001-01-01

    The Protein Information Resource, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the most comprehensive and expertly annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database. To provide timely and high quality annotation and promote database interoperability, the PIR-International employs rule-based and classification-driven procedures based on controlled vocabulary and standard nomenclature and includes status tags to distinguish experimentally determined from predicted protein features. The database contains about 200 000 non-redundant protein sequences, which are classified into families and superfamilies and their domains and motifs identified. Entries are extensively cross-referenced to other sequence, classification, genome, structure and activity databases. The PIR web site features search engines that use sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. The PIR-International databases and search tools are accessible on the PIR web site at http://pir.georgetown.edu/ and at the MIPS web site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP. PMID:11125041

  13. Rebelling for a Reason: Protein Structural “Outliers”

    PubMed Central

    Arumugam, Gandhimathi; Nair, Anu G.; Hariharaputran, Sridhar; Ramanathan, Sowdhamini

    2013-01-01

    Analysis of structural variation in domain superfamilies can reveal constraints in protein evolution which aids protein structure prediction and classification. Structure-based sequence alignment of distantly related proteins, organized in PASS2 database, provides clues about structurally conserved regions among different functional families. Some superfamily members show large structural differences which are functionally relevant. This paper analyses the impact of structural divergence on function for multi-member superfamilies, selected from the PASS2 superfamily alignment database. Functional annotations within superfamilies, with structural outliers or ‘rebels’, are discussed in the context of structural variations. Overall, these data reinforce the idea that functional similarities cannot be extrapolated from mere structural conservation. The implication for fold-function prediction is that the functional annotations can only be inherited with very careful consideration, especially at low sequence identities. PMID:24073209

  14. Sequence of ligand binding and structure change in the diphtheria toxin repressor upon activation by divalent transition metals.

    PubMed

    Rangachari, Vijayaraghavan; Marin, Vedrana; Bienkiewicz, Ewa A; Semavina, Maria; Guerrero, Luis; Love, John F; Murphy, John R; Logan, Timothy M

    2005-04-19

    The diphtheria toxin repressor (DtxR) is an Fe(II)-activated transcriptional regulator of iron homeostatic and virulence genes in Corynebacterium diphtheriae. DtxR is a two-domain protein that contains two structurally and functionally distinct metal binding sites. Here, we investigate the molecular steps associated with activation by Ni(II)Cl(2) and Cd(II)Cl(2). Equilibrium binding energetics for Ni(II) were obtained from isothermal titration calorimetry, indicating apparent metal dissociation constants of 0.2 and 1.7 microM for two independent sites. The binding isotherms for Ni(II) and Cd(II) exhibited a characteristic exothermic-endothermic pattern that was used to infer the metal binding sequence by comparing the wild-type isotherm with those of several binding site mutants. These data were complemented by measuring the distance between specific backbone amide nitrogens and the first equivalent of metal through heteronuclear NMR relaxation measurements. Previous studies indicated that metal binding affects a disordered to ordered transition in the metal binding domain. The coupling between metal binding and structure change was investigated using near-UV circular dichroism spectroscopy. Together, the data show that the first equivalent of metal is bound by the primary metal binding site. This binding orients the DNA binding helices and begins to fold the N-terminal domain. Subsequent binding at the ancillary site completes the folding of this domain and formation of the dimer interface. This model is used to explain the behavior of several mutants.

  15. Domain organization, genomic structure, evolution, and regulation of expression of the aggrecan gene family.

    PubMed

    Schwartz, N B; Pirok, E W; Mensch, J R; Domowicz, M S

    1999-01-01

    Proteoglycans are complex macromolecules, consisting of a polypeptide backbone to which are covalently attached one or more glycosaminoglycan chains. Molecular cloning has allowed identification of the genes encoding the core proteins of various proteoglycans, leading to a better understanding of the diversity of proteoglycan structure and function, as well as to the evolution of a classification of proteoglycans on the basis of emerging gene families that encode the different core proteins. One such family includes several proteoglycans that have been grouped with aggrecan, the large aggregating chondroitin sulfate proteoglycan of cartilage, based on a high number of sequence similarities within the N- and C-terminal domains. Thus far these proteoglycans include versican, neurocan, and brevican. It is now apparent that these proteins, as a group, are truly a gene family with shared structural motifs on the protein and nucleotide (mRNA) levels, and with nearly identical genomic organizations. Clearly a common ancestral origin is indicated for the members of the aggrecan family of proteoglycans. However, differing patterns of amplification and divergence have also occurred within certain exons across species and family members, leading to the class-characteristic protein motifs in the central carbohydrate-rich region exclusively. Thus the overall domain organization strongly suggests that sequence conservation in the terminal globular domains underlies common functions, whereas differences in the central portions of the genes account for functional specialization among the members of this gene family.

  16. Web-ware bioinformatical analysis and structure modelling of N-terminus of human multisynthetase complex auxiliary component protein p43.

    PubMed

    Deineko, Viktor

    2006-01-01

    Human multisynthetase complex auxiliary component, protein p43 is an endothelial monocyte-activating polypeptide II precursor. In this study, comprehensive sequence analysis of N-terminus has been performed to identify structural domains, motifs, sites of post-translation modification and other functionally important parameters. The spatial structure model of full-chain protein p43 is obtained.

  17. In silico structural analysis of group 3, 6 and 9 allergens from Dermatophagoides farinae.

    PubMed

    Teng, Feixiang; Yu, Lili; Bian, Yonghua; Sun, Jinxia; Wu, Juansong; Ling, Cunbao; Yang, Li; Wang, Yungang; Cui, Yubao

    2015-05-01

    Dermatophagoides farinae (Hughes; Acari: Pyroglyphidae) are the predominant source of dust mite allergens, which provoke allergic diseases, such as rhinitis, asthma and eczema. Of the 30 allergen groups produced by D. farinae, the Der f 3, Der f 6 and Der f 9 allergens are all trypsin‑associated proteins, however little else is currently known about them. The present study used in silico tools to compare the amino acid sequences, and predict the secondary and tertiary structures of Der f 3, Der f 6 and Der f 9 allergens. Protein sequence alignment detected ~46% identity between Der f 3, Der f 6 and Der f 9. Furthermore, each protein was shown to contain three active sites and two highly conserved trypsin functional domains. Predictions of the secondary and tertiary structure identified α‑helices, β‑sheets and random coils. The active sites of the three proteins appeared to fold onto each other in a three‑dimensional model, constituting the active site of the enzyme. Epitope analysis demonstrated that Der f 3, Der f 6 and Der f 9 have 4‑5 potential epitopes located in random coils, and the epitope sequences of Der f 3, Der f 6 and Der f 9 were shown to overlap in two domains (at amino acids 83‑87 and 179‑180); however the residues in these two domains were not identical. The present study aimed to conduct a biochemical and genetic analysis of these three allergens, and to potentially contribute to the development of vaccines for allergen‑specific immunotherapy.

  18. Cellobiose dehydrogenase of Chaetomium sp. INBI 2-26(-): structural basis of enhanced activity toward glucose at neutral pH.

    PubMed

    Vasilchenko, Liliya G; Karapetyan, Karen N; Yershevich, Olga P; Ludwig, Roland; Zamocky, Marcel; Peterbauer, Clemens K; Haltrich, Dietmar; Rabinovich, Mikhail L

    2011-05-01

    Cellobiose dehydrogenase (CDH) is an extracellular fungal flavocytochrome specifically oxidizing cellooligosaccharides and lactose to corresponding δ-lactones by a variety of electron acceptors. In contrast to basidiomycetous CDHs, CDHs of ascomycetes also display certain activity toward glucose. The objective of this study was to establish the structural reasons of such an activity of CDH from mesophilic ascomycete Chaetomium sp. INBI 2-26 (ChCDH). The complete amino acid sequence of ChCDH displayed high levels of similarity with the amino acid sequences of CDHs from the thermophilic fungi Thielavia heterotallica and Myriococcum thermophilum. Peptide mass fingerprinting of purified ChCDH provided evidence for the oxidation of methionine residues in the FAD-domain. Comparative homology modeling of the structure of the ChCDH FAD-domain in complex with the transition state analog based on the structure of the same complex of basidiomycetous CDH (1NAA) as template indicated possible structural reasons for the enhanced activity of ascomycetous CDHs toward glucose at neutral pH, which is a prerequisite for application of CDH in a variety of biocompatible biosensors and biofuel cells. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  19. Allosteric control of transcription in GntR family of transcription regulators: A structural overview.

    PubMed

    Jain, Deepti

    2015-07-01

    The GntR family of transcription regulators constitutes one of the most abundant families of transcription factors. These modulators are involved in a variety of mechanisms controlling various metabolic processes. GntR family members are typically two domain proteins with a smaller N-terminus domain (NTD) with conserved architecture of winged-helix-turn-helix (wHTH) for DNA binding and a larger C-terminus domain (CTD) or the effector binding domain which is also involved in oligomerization. Interestingly, the CTD shows structural heterogeneity depending upon the type of effector molecule that it binds and displays structural homology to various classes of proteins. Binding of the effector molecule to the CTD brings about a conformational change in the transcription factor such that its affinity for its cognate DNA sequence is altered. This review summarizes the structural information available on the members of GntR family and discusses the common features of the DNA binding and operator recognition within the family. The variation in the allosteric mechanism employed by the members of this family is also discussed. © 2015 International Union of Biochemistry and Molecular Biology.

  20. The Leptospiral Antigen Lp49 is a Two-Domain Protein with Putative Protein Binding Function

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Oliveira Giuseppe,P.; Oliveira Neves, F.; Nascimento, A.

    2008-01-01

    Pathogenic Leptospira is the etiological agent of leptospirosis, a life-threatening disease that affects populations worldwide. Currently available vaccines have limited effectiveness and therapeutic interventions are complicated by the difficulty in making an early diagnosis of leptospirosis. The genome of Leptospira interrogans was recently sequenced and comparative genomic analysis contributed to the identification of surface antigens, potential candidates for development of new vaccines and serodiagnosis. Lp49 is a membrane-associated protein recognized by antibodies present in sera from early and convalescent phases of leptospirosis patients. Its crystal structure was determined by single-wavelength anomalous diffraction using selenomethionine-labelled crystals and refined at 2.0 Å resolution. Lp49 is composed of two domains and belongs to the all-beta-proteins class. The N-terminal domain folds in an immunoglobulin-like beta-sandwich structure, whereas the C-terminal domain presents a seven-bladed beta-propeller fold. Structural analysis of Lp49 indicates putative protein-protein binding sites, suggesting a role in Leptospira-host interaction. This is the first crystal structure of a leptospiral antigen described to date.

  1. Mapping of the minimal inorganic phosphate transporting unit of human PiT2 suggests a structure universal to PiT-related proteins from all kingdoms of life

    PubMed Central

    2011-01-01

    Background The inorganic (Pi) phosphate transporter (PiT) family comprises known and putative Na+- or H+-dependent Pi-transporting proteins with representatives from all kingdoms. The mammalian members are placed in the outer cell membranes and suggested to supply cells with Pi to maintain house-keeping functions. Alignment of protein sequences representing PiT family members from all kingdoms reveals the presence of conserved amino acids and that bacterial phosphate permeases and putative phosphate permeases from archaea lack substantial parts of the protein sequence when compared to the mammalian PiT family members. Besides being Na+-dependent Pi (NaPi) transporters, the mammalian PiT paralogs, PiT1 and PiT2, also are receptors for gamma-retroviruses. We have here exploited the dual-function of PiT1 and PiT2 to study the structure-function relationship of PiT proteins. Results We show that the human PiT2 histidine, H502, and the human PiT1 glutamate, E70, - both conserved in eukaryotic PiT family members - are critical for Pi transport function. Noticeably, human PiT2 H502 is located in the C-terminal PiT family signature sequence, and human PiT1 E70 is located in ProDom domains characteristic for all PiT family members. A human PiT2 truncation mutant, which consists of the predicted 10 transmembrane (TM) domain backbone without a large intracellular domain (human PiT2ΔR254-V483), was found to be a fully functional Pi transporter. Further truncation of the human PiT2 protein by additional removal of two predicted TM domains together with the large intracellular domain created a mutant that resembles a bacterial phosphate permease and an archaeal putative phosphate permease. This human PiT2 truncation mutant (human PiT2ΔL183-V483) did also support Pi transport albeit at very low levels. Conclusions The results suggest that the overall structure of the Pi-transporting unit of the PiT family proteins has remained unchanged during evolution. 
Moreover, in combination, our studies of the gene structure of the human PiT1 and PiT2 genes (SLC20A1 and SLC20A2, respectively) and alignment of protein sequences of PiT family members from all kingdoms, along with the studies of the dual functions of the human PiT paralogs show that these proteins are excellent as models for studying the evolution of a protein's structure-function relationship. PMID:21586110

  2. Mapping of the minimal inorganic phosphate transporting unit of human PiT2 suggests a structure universal to PiT-related proteins from all kingdoms of life.

    PubMed

    Bøttger, Pernille; Pedersen, Lene

    2011-05-17

    The inorganic (Pi) phosphate transporter (PiT) family comprises known and putative Na+- or H+-dependent Pi-transporting proteins with representatives from all kingdoms. The mammalian members are placed in the outer cell membranes and suggested to supply cells with Pi to maintain house-keeping functions. Alignment of protein sequences representing PiT family members from all kingdoms reveals the presence of conserved amino acids and that bacterial phosphate permeases and putative phosphate permeases from archaea lack substantial parts of the protein sequence when compared to the mammalian PiT family members. Besides being Na+-dependent Pi (NaPi) transporters, the mammalian PiT paralogs, PiT1 and PiT2, also are receptors for gamma-retroviruses. We have here exploited the dual-function of PiT1 and PiT2 to study the structure-function relationship of PiT proteins. We show that the human PiT2 histidine, H502, and the human PiT1 glutamate, E70, both conserved in eukaryotic PiT family members, are critical for Pi transport function. Noticeably, human PiT2 H502 is located in the C-terminal PiT family signature sequence, and human PiT1 E70 is located in ProDom domains characteristic for all PiT family members. A human PiT2 truncation mutant, which consists of the predicted 10 transmembrane (TM) domain backbone without a large intracellular domain (human PiT2ΔR254-V483), was found to be a fully functional Pi transporter. Further truncation of the human PiT2 protein by additional removal of two predicted TM domains together with the large intracellular domain created a mutant that resembles a bacterial phosphate permease and an archaeal putative phosphate permease. This human PiT2 truncation mutant (human PiT2ΔL183-V483) did also support Pi transport albeit at very low levels. The results suggest that the overall structure of the Pi-transporting unit of the PiT family proteins has remained unchanged during evolution.
Moreover, in combination, our studies of the gene structure of the human PiT1 and PiT2 genes (SLC20A1 and SLC20A2, respectively) and alignment of protein sequences of PiT family members from all kingdoms, along with the studies of the dual functions of the human PiT paralogs show that these proteins are excellent as models for studying the evolution of a protein's structure-function relationship. © 2011 Bøttger and Pedersen; licensee BioMed Central Ltd.

  3. A definition of the domains Archaea, Bacteria and Eucarya in terms of small subunit ribosomal RNA characteristics

    NASA Technical Reports Server (NTRS)

    Winker, S.; Woese, C. R.

    1991-01-01

    The number of small subunit rRNA sequences is now great enough that the three domains Archaea, Bacteria and Eucarya (Woese et al., 1990) can be reliably defined in terms of their sequence "signatures". Approximately 50 homologous positions (or nucleotide pairs) in the small subunit rRNA characterize and distinguish among the three. In addition, the three can be recognized by a variety of nonhomologous rRNA characters, either individual positions and/or higher-order structural features. The Crenarchaeota and the Euryarchaeota, the two archaeal kingdoms, can also be defined and distinguished by their characteristic compositions at approximately fifteen positions in the small subunit rRNA molecule.

  4. Structural And Functional Studies of ALIX Interactions With YPXnL Late Domains of HIV-1 And EIAV

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhai, Q.; Fisher, R.D.; Chung, H.-Y.

    2009-05-28

    Retrovirus budding requires short peptide motifs (late domains) located within the viral Gag protein that function by recruiting cellular factors. The YPXnL late domains of HIV and other lentiviruses recruit the protein ALIX (also known as AIP1), which also functions in vesicle formation at the multivesicular body and in the abscission stage of cytokinesis. Here, we report the crystal structures of ALIX in complex with the YPXnL late domains from HIV-1 and EIAV. The two distinct late domains bind at the same site on the ALIX V domain but adopt different conformations that allow them to make equivalent contacts. Binding studies and functional assays verified the importance of key interface residues and revealed that binding affinities are tuned by context-dependent effects. These results reveal how YPXnL late domains recruit ALIX to facilitate virus budding and how ALIX can bind YPXnL sequences with both n = 1 and n = 3.

  5. The structure of the cyanobactin domain of unknown function from PatG in the patellamide gene cluster

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mann, Greg; Koehnke, Jesko; Bent, Andrew F.

    The highly conserved domain of unknown function in the cyanobactin superfamily has a novel fold. The protein does not appear to bind the most plausible substrates, leaving questions as to its role. Patellamides are members of the cyanobactin family of ribosomally synthesized and post-translationally modified cyclic peptide natural products, many of which, including some patellamides, are biologically active. A detailed mechanistic understanding of the biosynthetic pathway would enable the construction of a biotechnological ‘toolkit’ to make novel analogues of patellamides that are not found in nature. All but two of the protein domains involved in patellamide biosynthesis have been characterized. The two domains of unknown function (DUFs) are homologous to each other and are found at the C-termini of the multi-domain proteins PatA and PatG. The domain sequence is found in all cyanobactin-biosynthetic pathways characterized to date, implying a functional role in cyanobactin biosynthesis. Here, the crystal structure of the PatG DUF domain is reported and its binding interactions with plausible substrates are investigated.

  6. The structures of non-CG-repeat Z-DNAs co-crystallized with the Z-DNA-binding domain, hZ alpha(ADAR1).

    PubMed

    Ha, Sung Chul; Choi, Jongkeun; Hwang, Hye-Yeon; Rich, Alexander; Kim, Yang-Gyun; Kim, Kyeong Kyu

    2009-02-01

    The Z-DNA conformation preferentially occurs at alternating purine-pyrimidine repeats, and is specifically recognized by Z alpha domains identified in several Z-DNA-binding proteins. The binding of Z alpha to foreign or chromosomal DNA in various sequence contexts is known to influence various biological functions, including the DNA-mediated innate immune response and transcriptional modulation of gene expression. For these reasons, understanding its binding mode and the conformational diversity of Z alpha bound Z-DNAs is of considerable importance. However, structural studies of Z alpha bound Z-DNA have been mostly limited to standard CG-repeat DNAs. Here, we have solved the crystal structures of three representative non-CG repeat DNAs, d(CACGTG)(2), d(CGTACG)(2) and d(CGGCCG)(2) complexed to hZ alpha(ADAR1) and compared those structures with that of hZ alpha(ADAR1)/d(CGCGCG)(2) and the Z alpha-free Z-DNAs. hZ alpha(ADAR1) bound to each of the three Z-DNAs showed a well conserved binding mode with very limited structural deviation irrespective of the DNA sequence, although varying numbers of residues were in contact with Z-DNA. Z-DNAs display less structural alterations in the Z alpha-bound state than in their free form, thereby suggesting that conformational diversities of Z-DNAs are restrained by the binding pocket of Z alpha. These data suggest that Z-DNAs are recognized by Z alpha through common conformational features regardless of the sequence and structural alterations.
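The abstract's notion of "alternating purine-pyrimidine repeats" can be made concrete with a short sequence check. This is a minimal sketch: the function name is hypothetical, the hexamers are the ones named above, and the check describes only the base pattern of one strand; it says nothing about whether a sequence actually adopts the Z conformation.

```python
PURINES = set("AG")
PYRIMIDINES = set("CT")


def is_alternating_pur_pyr(seq: str) -> bool:
    """Return True if every adjacent pair of bases is one purine and one pyrimidine."""
    seq = seq.upper()
    if any(base not in PURINES | PYRIMIDINES for base in seq):
        raise ValueError(f"unexpected base in {seq!r}")
    # Adjacent bases must differ in class (purine vs pyrimidine).
    return all((a in PURINES) != (b in PURINES) for a, b in zip(seq, seq[1:]))


# Hexamers named in the abstract: the standard CG repeat and the three non-CG repeats.
hexamers = ["CGCGCG", "CACGTG", "CGTACG", "CGGCCG"]
alternation = {h: is_alternating_pur_pyr(h) for h in hexamers}

for hexamer, alternates in alternation.items():
    print(hexamer, alternates)
```

By this check, CGCGCG, CACGTG and CGTACG alternate strictly, whereas CGGCCG does not, because of its GG and CC steps.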

  7. Basis of altered RNA-binding specificity by PUF proteins revealed by crystal structures of yeast Puf4p

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Miller, Matthew T.; Higgin, Joshua J.; Hall, Traci M.Tanaka

    2008-06-06

    Pumilio/FBF (PUF) family proteins are found in eukaryotic organisms and regulate gene expression post-transcriptionally by binding to sequences in the 3' untranslated region of target transcripts. PUF proteins contain an RNA binding domain that typically comprises eight α-helical repeats, each of which recognizes one RNA base. Some PUF proteins, including yeast Puf4p, have altered RNA binding specificity and use their eight repeats to bind to RNA sequences with nine or ten bases. Here we report the crystal structures of Puf4p alone and in complex with a 9-nucleotide (nt) target RNA sequence, revealing that Puf4p accommodates an 'extra' nucleotide by modest adaptations allowing one base to be turned away from the RNA binding surface. Using structural information and sequence comparisons, we created a mutant Puf4p protein that preferentially binds to an 8-nt target RNA sequence over a 9-nt sequence and restores binding of each protein repeat to one RNA base.

  8. A single gene for lycopene cyclase, phytoene synthase, and regulation of carotene biosynthesis in Phycomyces

    PubMed Central

    Arrach, Nabil; Fernández-Martín, Rafael; Cerdá-Olmedo, Enrique; Avalos, Javier

    2001-01-01

    Previous complementation and mapping of mutations that change the usual yellow color of the Zygomycete Phycomyces blakesleeanus to white or red led to the definition of two structural genes for carotene biosynthesis. We have cloned one of these genes, carRA, by taking advantage of its close linkage to the other, carB, responsible for phytoene dehydrogenase. The sequences of the wild type and six mutants have been established, compared with sequences in other organisms, and correlated with the mutant phenotypes. The carRA and carB coding sequences are separated by 1,381 untranslated nucleotides and are divergently transcribed. Gene carRA contains separate domains for two enzymes, lycopene cyclase and phytoene synthase, and regulates the overall activity of the pathway and its response to physical and chemical stimuli from the environment. The lycopene cyclase domain of carRA derived from a duplication of a gene from a common ancestor of fungi and Brevibacterium linens; the phytoene synthase domain is similar to the phytoene and squalene synthases of many organisms; but the regulatory functions appear to be specific to Phycomyces. PMID:11172012

  9. The La-related protein 1-specific domain repurposes HEAT-like repeats to directly bind a 5'TOP sequence.

    PubMed

    Lahr, Roni M; Mack, Seshat M; Héroux, Annie; Blagden, Sarah P; Bousquet-Antonelli, Cécile; Deragon, Jean-Marc; Berman, Andrea J

    2015-09-18

    La-related protein 1 (LARP1) regulates the stability of many mRNAs. These include 5'TOPs, mTOR-kinase responsive mRNAs with pyrimidine-rich 5' UTRs, which encode ribosomal proteins and translation factors. We determined that the highly conserved LARP1-specific C-terminal DM15 region of human LARP1 directly binds a 5'TOP sequence. The crystal structure of this DM15 region refined to 1.86 Å resolution has three structurally related and evolutionarily conserved helix-turn-helix modules within each monomer. These motifs resemble HEAT repeats, ubiquitous helical protein-binding structures, but their sequences are inconsistent with consensus sequences of known HEAT modules, suggesting this structure has been repurposed for RNA interactions. A putative mTORC1-recognition sequence sits within a flexible loop C-terminal to these repeats. We also present modelling of pyrimidine-rich single-stranded RNA onto the highly conserved surface of the DM15 region. These studies lay the foundation necessary for proceeding toward a structural mechanism by which LARP1 links mTOR signalling to ribosome biogenesis. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  10. The La-related protein 1-specific domain repurposes HEAT-like repeats to directly bind a 5'TOP sequence

    DOE PAGES

    Lahr, Roni M.; Mack, Seshat M.; Heroux, Annie; ...

    2015-07-22

    La-related protein 1 (LARP1) regulates the stability of many mRNAs. These include 5'TOPs, mTOR-kinase responsive mRNAs with pyrimidine-rich 5' UTRs, which encode ribosomal proteins and translation factors. We determined that the highly conserved LARP1-specific C-terminal DM15 region of human LARP1 directly binds a 5'TOP sequence. The crystal structure of this DM15 region refined to 1.86 Å resolution has three structurally related and evolutionarily conserved helix-turn-helix modules within each monomer. These motifs resemble HEAT repeats, ubiquitous helical protein-binding structures, but their sequences are inconsistent with consensus sequences of known HEAT modules, suggesting this structure has been repurposed for RNA interactions. A putative mTORC1-recognition sequence sits within a flexible loop C-terminal to these repeats. We also present modelling of pyrimidine-rich single-stranded RNA onto the highly conserved surface of the DM15 region. Ultimately, these studies lay the foundation necessary for proceeding toward a structural mechanism by which LARP1 links mTOR signalling to ribosome biogenesis.

  11. Defining functional distance using manifold embeddings of gene ontology annotations

    PubMed Central

    Lerman, Gilad; Shakhnovich, Boris E.

    2007-01-01

    Although rigorous measures of similarity for sequence and structure are now well established, the problem of defining functional relationships has been particularly daunting. Here, we present several manifold embedding techniques to compute distances between Gene Ontology (GO) functional annotations and consequently estimate functional distances between protein domains. To evaluate accuracy, we correlate the functional distance to the well established measures of sequence, structural, and phylogenetic similarities. Finally, we show that manual classification of structures into folds and superfamilies is mirrored by proximity in the newly defined function space. We show how functional distances place structure–function relationships in biological context resulting in insight into divergent and convergent evolution. The methods and results in this paper can be readily generalized and applied to a wide array of biologically relevant investigations, such as accuracy of annotation transference, the relationship between sequence, structure, and function, or coherence of expression modules. PMID:17595300

  12. Non-B-DNA structures on the interferon-beta promoter?

    PubMed

    Robbe, K; Bonnefoy, E

    1998-01-01

    The high mobility group (HMG) I protein intervenes as an essential factor during the virus-induced expression of the interferon-beta (IFN-beta) gene. It is a non-histone chromatin-associated protein that has the dual capacity of binding to a non-B-DNA structure such as cruciform-DNA as well as to AT-rich B-DNA sequences. In this work we compare the binding affinity of HMGI for a synthetic cruciform-DNA to its binding affinity for the HMGI-binding-site present in the positive regulatory domain II (PRDII) of the IFN-beta promoter. Using gel retardation experiments, we show that HMGI protein binds with at least ten times more affinity to the synthetic cruciform-DNA structure than to the PRDII B-DNA sequence. DNA hairpin sequences are present in both the human and the murine PRDII-DNAs. In this work we discuss the presence of putative non-B-DNA structures in the IFN-beta promoter.

  13. The crystal structure of the streptococcal collagen-like protein 2 globular domain from invasive M3-type group A Streptococcus shows significant similarity to immunomodulatory HIV protein gp41.

    PubMed

    Squeglia, Flavia; Bachert, Beth; De Simone, Alfonso; Lukomski, Slawomir; Berisio, Rita

    2014-02-21

    The arsenal of virulence factors deployed by streptococci includes streptococcal collagen-like (Scl) proteins. These proteins, which are characterized by a globular domain and a collagen-like domain, play key roles in host adhesion, host immune defense evasion, and biofilm formation. In this work, we demonstrate that the Scl2.3 protein is expressed on the surface of invasive M3-type strain MGAS315 of Streptococcus pyogenes. We report the crystal structure of Scl2.3 globular domain, the first of any Scl. This structure shows a novel fold among collagen trimerization domains of either bacterial or human origin. Despite there being low sequence identity, we observed that Scl2.3 globular domain structurally resembles the gp41 subunit of the envelope glycoprotein from human immunodeficiency virus type 1, an essential subunit for viral fusion to human T cells. We combined crystallographic data with modeling and molecular dynamics techniques to gather information on the entire lollipop-like Scl2.3 structure. Molecular dynamics data evidence a high flexibility of Scl2.3 with remarkable interdomain motions that are likely instrumental to the protein biological function in mediating adhesive or immune-modulatory functions in host-pathogen interactions. Altogether, our results provide molecular tools for the understanding of Scl-mediated streptococcal pathogenesis and important structural insights for the future design of small molecular inhibitors of streptococcal invasion.

  14. A surprisingly large RNase P RNA in Candida glabrata

    PubMed Central

    KACHOURI, RYM; STRIBINSKIS, VILIUS; ZHU, YANGLONG; RAMOS, KENNETH S.; WESTHOF, ERIC; LI, YONG

    2005-01-01

    We have found an extremely large ribonuclease P (RNase P) RNA (RPR1) in the human pathogen Candida glabrata and verified that this molecule is expressed and present in the active enzyme complex of this hemiascomycete yeast. A structural alignment of the C. glabrata sequence with 36 other hemiascomycete RNase P RNAs (abbreviated as P RNAs) allows us to characterize the types of insertions. In addition, 15 P RNA sequences were newly characterized by searching in the recently sequenced genomes Candida albicans, C. glabrata, Debaryomyces hansenii, Eremothecium gossypii, Kluyveromyces lactis, Kluyveromyces waltii, Naumovia castellii, Saccharomyces kudriavzevii, Saccharomyces mikatae, and Yarrowia lipolytica; and by PCR amplification for other Candida species (Candida guilliermondii, Candida krusei, Candida parapsilosis, Candida stellatoidea, and Candida tropicalis). The phylogenetic comparative analysis identifies a hemiascomycete secondary structure consensus that presents a conserved core in all species with variable insertions or deletions. The most significant variability is found in C. glabrata P RNA in which three insertions exceeding in total 700 nt are present in the Specificity domain. This P RNA is more than twice the length of any other homologous P RNAs known in the three domains of life and is eight times the size of the smallest. RNase P RNA, therefore, represents one of the most diversified noncoding RNAs in terms of size variation and structural diversity. PMID:15987816

  15. Structure of a Trypanosoma Brucei Alpha/Beta--Hydrolase Fold Protein With Unknown Function

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Merritt, E.A.; Holmes, M.; Buckner, F.S.

    2009-05-26

    The structure of a structural genomics target protein, Tbru020260AAA from Trypanosoma brucei, has been determined to a resolution of 2.2 Å using multiple-wavelength anomalous diffraction at the Se K edge. This protein belongs to Pfam sequence family PF08538 and is only distantly related to previously studied members of the α/β-hydrolase fold family. Structural superposition onto representative α/β-hydrolase fold proteins of known function indicates that a possible catalytic nucleophile, Ser116 in the T. brucei protein, lies at the expected location. However, the present structure and by extension the other trypanosomatid members of this sequence family have neither sequence nor structural similarity at the location of other active-site residues typical for proteins with this fold. Together with the presence of an additional domain between strands β6 and β7 that is conserved in trypanosomatid genomes, this suggests that the function of these homologs has diverged from other members of the fold family.

  16. Structural analysis of a 3-deoxy-D-arabino-heptulosonate 7-phosphate synthase with an N-terminal chorismate mutase-like regulatory domain

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Light, Samuel H.; Halavaty, Andrei S.; Minasov, George

    2012-06-27

    3-Deoxy-D-arabino-heptulosonate 7-phosphate synthase (DAHPS) catalyzes the first step in the biosynthesis of a number of aromatic metabolites. Likely because this reaction is situated at a pivotal biosynthetic gateway, several DAHPS classes distinguished by distinct mechanisms of allosteric regulation have independently evolved. One class of DAHPSs contains a regulatory domain with sequence homology to chorismate mutase - an enzyme further downstream of DAHPS that catalyzes the first committed step in tyrosine/phenylalanine biosynthesis - and is inhibited by chorismate mutase substrate (chorismate) and product (prephenate). Described in this work, structures of the Listeria monocytogenes chorismate/prephenate regulated DAHPS in complex with Mn²⁺ and Mn²⁺ + phosphoenolpyruvate reveal an unusual quaternary architecture: DAHPS domains assemble as a tetramer, from either side of which chorismate mutase-like (CML) regulatory domains asymmetrically emerge to form a pair of dimers. This domain organization suggests that chorismate/prephenate binding promotes a stable interaction between the discrete regulatory and catalytic domains and supports a mechanism of allosteric inhibition similar to tyrosine/phenylalanine control of a related DAHPS class. We argue that the structural similarity of chorismate mutase enzyme and CML regulatory domain provides a unique opportunity for the design of a multitarget antibacterial.

  17. Molecular dynamics simulations and docking enable to explore the biophysical factors controlling the yields of engineered nanobodies.

    PubMed

    Soler, Miguel A; de Marco, Ario; Fortuna, Sara

    2016-10-10

    Nanobodies (VHHs) have proved to be valuable substitutes of conventional antibodies for molecular recognition. Their small size represents a precious advantage for rational mutagenesis based on modelling. Here we address the problem of predicting how Camelidae nanobody sequences can tolerate mutations by developing a simulation protocol based on all-atom molecular dynamics and whole-molecule docking. The method was tested on two sets of nanobodies characterized experimentally for their biophysical features. One set contained point mutations introduced to humanize a wild type sequence, in the second the CDRs were swapped between single-domain frameworks with Camelidae and human hallmarks. The method resulted in accurate scoring approaches to predict experimental yields and enabled to identify the structural modifications induced by mutations. This work is a promising tool for the in silico development of single-domain antibodies and opens the opportunity to customize single functional domains of larger macromolecules.

  18. Molecular dynamics simulations and docking enable to explore the biophysical factors controlling the yields of engineered nanobodies

    NASA Astrophysics Data System (ADS)

    Soler, Miguel A.; De Marco, Ario; Fortuna, Sara

    2016-10-01

    Nanobodies (VHHs) have proved to be valuable substitutes of conventional antibodies for molecular recognition. Their small size represents a precious advantage for rational mutagenesis based on modelling. Here we address the problem of predicting how Camelidae nanobody sequences can tolerate mutations by developing a simulation protocol based on all-atom molecular dynamics and whole-molecule docking. The method was tested on two sets of nanobodies characterized experimentally for their biophysical features. One set contained point mutations introduced to humanize a wild type sequence, in the second the CDRs were swapped between single-domain frameworks with Camelidae and human hallmarks. The method resulted in accurate scoring approaches to predict experimental yields and enabled to identify the structural modifications induced by mutations. This work is a promising tool for the in silico development of single-domain antibodies and opens the opportunity to customize single functional domains of larger macromolecules.

  19. Adenovirus fibre shaft sequences fold into the native triple beta-spiral fold when N-terminally fused to the bacteriophage T4 fibritin foldon trimerisation motif.

    PubMed

    Papanikolopoulou, Katerina; Teixeira, Susana; Belrhali, Hassan; Forsyth, V Trevor; Mitraki, Anna; van Raaij, Mark J

    2004-09-03

    Adenovirus fibres are trimeric proteins that consist of a globular C-terminal domain, a central fibrous shaft and an N-terminal part that attaches to the viral capsid. In the presence of the globular C-terminal domain, which is necessary for correct trimerisation, the shaft segment adopts a triple beta-spiral conformation. We have replaced the head of the fibre by the trimerisation domain of the bacteriophage T4 fibritin, the foldon. Two different fusion constructs were made and crystallised, one with an eight amino acid residue linker and one with a linker of only two residues. X-ray crystallographic studies of both fusion proteins show that residues 319-391 of the adenovirus type 2 fibre shaft fold into a triple beta-spiral fold indistinguishable from the native structure, although this is now resolved at a higher resolution of 1.9 Å. The foldon residues 458-483 also adopt their natural structure. The intervening linkers are not well ordered in the crystal structures. This work shows that the shaft sequences retain their capacity to fold into their native beta-spiral fibrous fold when fused to a foreign C-terminal trimerisation motif. It provides a structural basis to artificially trimerise longer adenovirus shaft segments and segments from other trimeric beta-structured fibre proteins. Such artificial fibrous constructs, amenable to crystallisation and solution studies, can offer tractable model systems for the study of beta-fibrous structure. They can also prove useful for gene therapy and fibre engineering applications.

  20. A Novel N-Acetylglutamate Synthase Architecture Revealed by the Crystal Structure of the Bifunctional Enzyme from Maricaulis maris

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shi, Dashuang; Li, Yongdong; Cabrera-Luque, Juan

    2012-05-24

    Novel bifunctional N-acetylglutamate synthase/kinases (NAGS/K) that catalyze the first two steps of arginine biosynthesis and are homologous to vertebrate N-acetylglutamate synthase (NAGS), an essential cofactor-producing enzyme in the urea cycle, were identified in Maricaulis maris and several other bacteria. Arginine is an allosteric inhibitor of NAGS but not NAGK activity. The crystal structure of M. maris NAGS/K (mmNAGS/K) at 2.7 Å resolution indicates that it is a tetramer, in contrast to the hexameric structure of Neisseria gonorrhoeae NAGS. The quaternary structure of crystalline NAGS/K from Xanthomonas campestris (xcNAGS/K) is similar, and cross-linking experiments indicate that both mmNAGS/K and xcNAGS are tetramers in solution. Each subunit has an amino acid kinase (AAK) domain, which is likely responsible for N-acetylglutamate kinase (NAGK) activity and has a putative arginine binding site, and an N-acetyltransferase (NAT) domain that contains the putative NAGS active site. These structures and sequence comparisons suggest that the linker residue 291 may determine whether arginine acts as an allosteric inhibitor or activator in homologous enzymes in microorganisms and vertebrates. In addition, the angle of rotation between AAK and NAT domains varies among crystal forms and subunits within the tetramer. A rotation of 26° is sufficient to close the predicted AcCoA binding site, thus reducing enzymatic activity. Since mmNAGS/K has the highest degree of sequence homology to vertebrate NAGS of NAGS and NAGK enzymes whose structures have been determined, the mmNAGS/K structure was used to develop a structural model of human NAGS that is fully consistent with the functional effects of the 14 missense mutations that were identified in NAGS-deficient patients.

  1. Molecular and Structural Characterization of the Tegumental 20.6-kDa Protein in Clonorchis sinensis as a Potential Druggable Target.

    PubMed

    Kim, Yu-Jung; Yoo, Won Gi; Lee, Myoung-Ro; Kang, Jung-Mi; Na, Byoung-Kuk; Cho, Shin-Hyeong; Park, Mi-Yeoun; Ju, Jung-Won

    2017-03-04

    The tegument, representing the membrane-bound outer surface of platyhelminth parasites, plays an important role for the regulation of the host immune response and parasite survival. A comprehensive understanding of tegumental proteins can provide drug candidates for use against helminth-associated diseases, such as clonorchiasis caused by the liver fluke Clonorchis sinensis. However, little is known regarding the physicochemical properties of C. sinensis teguments. In this study, a novel 20.6-kDa tegumental protein of the C. sinensis adult worm (CsTegu20.6) was identified and characterized by molecular and in silico methods. The complete coding sequence of 525 bp was derived from cDNA clones and encodes a protein of 175 amino acids. Homology search using BLASTX showed CsTegu20.6 identity ranging from 29% to 39% with previously-known tegumental proteins in C. sinensis. Domain analysis indicated the presence of a calcium-binding EF-hand domain containing a basic helix-loop-helix structure and a dynein light chain domain exhibiting a ferredoxin fold. We used a modified method to obtain the accurate tertiary structure of the CsTegu20.6 protein because of the unavailability of appropriate templates. The CsTegu20.6 protein sequence was split into two domains based on the disordered region, and then, the structure of each domain was modeled using I-TASSER. A final full-length structure was obtained by combining two structures and refining the whole structure. A refined CsTegu20.6 structure was used to identify a potential CsTegu20.6 inhibitor based on protein structure-compound interaction analysis. The recombinant proteins were expressed in Escherichia coli and purified by nickel-nitrilotriacetic acid affinity chromatography. In C. sinensis, CsTegu20.6 mRNAs were abundant in adult and metacercariae, but not in the egg. Immunohistochemistry revealed that CsTegu20.6 localized to the surface of the tegument in the adult fluke. Collectively, our results contribute to a better understanding of the structural and functional characteristics of CsTegu20.6 and homologs of flukes. One compound is proposed as a putative inhibitor of CsTegu20.6 to facilitate further studies for anthelmintics.

  2. Characterization of a novel organic solute transporter homologue from Clonorchis sinensis

    PubMed Central

    Dai, Fuhong; Lee, Ji-Yun; Pak, Jhang Ho; Sohn, Woon-Mok

    2018-01-01

    Clonorchis sinensis is a liver fluke that can dwell in the bile ducts of mammals. Bile acid transporters function to maintain the homeostasis of bile acids in C. sinensis, as they induce physiological changes or have harmful effects on C. sinensis survival. The organic solute transporter (OST) transports mainly bile acid and belongs to the SLC51 subfamily of solute carrier transporters. OST plays a critical role in the recirculation of bile acids in higher animals. In this study, we cloned full-length cDNA of the 480-amino acid OST from C. sinensis (CsOST). Genomic analysis revealed 11 exons and nine introns. The CsOST protein had a ‘Solute_trans_a’ domain with 67% homology to Schistosoma japonicum OST. For further analysis, the CsOST protein sequence was split into the ordered domain (CsOST-N) at the N-terminus and disordered domain (CsOST-C) at the C-terminus. The tertiary structure of each domain was built using a threading-based method and determined by manual comparison. In a phylogenetic tree, the CsOST-N domain belonged to the OSTα and CsOST-C to the OSTβ clade. These two domains were more highly conserved with the OST α- and β-subunits at the structure level than at sequence level. These findings suggested that CsOST comprised the OST α- and β-subunits. CsOST was localized in the oral and ventral suckers and in the mesenchymal tissues abundant around the intestine, vitelline glands, uterus, and testes. This study provides fundamental data for the further understanding of homologues in other flukes. PMID:29702646

  3. Myocilin, a Component of a Membrane-Associated Protein Complex Driven by a Homologous Q-SNARE Domain

    PubMed Central

    Dismuke, W. Michael; McKay, Brian S.; Stamer, W. Daniel

    2012-01-01

    Myocilin is a widely expressed protein with no known function, however, mutations in myocilin appear to manifest uniquely as ocular hypertension and the blinding disease glaucoma. Using the protein homology/analogy recognition engine (PHYRE) we find that the olfactomedin domain of myocilin is similar in sequence motif and structure to a six-bladed, kelch repeat motif based on the known crystal structures of such proteins. Additionally, using sequence analysis we identify a coiled-coil segment of myocilin with homology to human Q-SNARE proteins. Using COS-7 cells expressing full length human myocilin and a version lacking the C-terminal olfactomedin domain, we identified a membrane-associated protein complex containing myocilin by hydrodynamic analysis. The myocilin construct that included the coiled-coil but lacked the olfactomedin domain formed complexes similar to the full-length protein, indicating that the coiled-coil domain of myocilin is sufficient for myocilin to bind to the large detergent resistant complex. In human retina and retinal pigment epithelium, which express myocilin, we detected the protein in a large, SDS-resistant, membrane-associated complex. We characterized the hydrodynamic properties of myocilin in human tissues as either a 15s complex with an Mr=405,000–440,000 yielding a slightly elongated globular shape similar to known SNARE complexes or a dimer of 6.4s and Mr=108,000. By identifying the Q-SNARE homology within the second coil of myocilin and documenting its participation in a SNARE-like complex, we provide evidence of a SNARE domain containing protein associated with a human disease. PMID:22463803

  4. Protein Structure and Function Prediction Using I-TASSER

    PubMed Central

    Yang, Jianyi; Zhang, Yang

    2016-01-01

    I-TASSER is a hierarchical protocol for automated protein structure prediction and structure-based function annotation. Starting from the amino acid sequence of target proteins, I-TASSER first generates full-length atomic structural models from multiple threading alignments and iterative structural assembly simulations followed by atomic-level structure refinement. The biological functions of the protein, including ligand-binding sites, enzyme commission number, and gene ontology terms, are then inferred from known protein function databases based on sequence and structure profile comparisons. I-TASSER is freely available as both an on-line server and a stand-alone package. This unit describes how to use the I-TASSER protocol to generate structure and function prediction and how to interpret the prediction results, as well as alternative approaches for further improving the I-TASSER modeling quality for distant-homologous and multi-domain protein targets. PMID:26678386

  5. ATP interacts with the CPVT mutation-associated central domain of the cardiac ryanodine receptor.

    PubMed

    Blayney, Lynda; Beck, Konrad; MacDonald, Ewan; D'Cruz, Leon; Nomikos, Michail; Griffiths, Julia; Thanassoulas, Angelos; Nounesis, George; Lai, F Anthony

    2013-10-01

    This study was designed to determine whether the cardiac ryanodine receptor (RyR2) central domain, a region associated with catecholamine polymorphic ventricular tachycardia (CPVT) mutations, interacts with the RyR2 regulators, ATP and the FK506-binding protein 12.6 (FKBP12.6). Wild-type (WT) RyR2 central domain constructs (G(2236) to G(2491)) and those containing the CPVT mutations P2328S and N2386I were expressed as recombinant proteins. Folding and stability of the proteins were examined by circular dichroism (CD) spectroscopy and guanidine hydrochloride chemical denaturation. The far-UV CD spectra showed a soluble stably-folded protein with WT and mutant proteins exhibiting a similar secondary structure. Chemical denaturation analysis also confirmed a stable protein for both WT and mutant constructs with similar two-state unfolding. ATP and caffeine binding was measured by fluorescence spectroscopy. Both ATP and caffeine bound with an EC50 of ~200-400 μM, and the affinity was the same for WT and mutant constructs. Sequence alignment with other ATP binding proteins indicated the RyR2 central domain contains the signature of an ATP binding pocket. Interaction of the central domain with FKBP12.6 was tested by glutaraldehyde cross-linking and no association was found. The RyR2 central domain, expressed as a 'correctly' folded recombinant protein, bound ATP in accord with bioinformatics evidence of conserved ATP binding sequence motifs. An interaction with FKBP12.6 was not evident. CPVT mutations did not disrupt the secondary structure nor binding to ATP. Part of the RyR2 central domain CPVT mutation cluster can be expressed independently with retention of ATP binding. Copyright © 2013 Elsevier B.V. All rights reserved.

  6. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Calvo, Eric; Mans, Ben J.; Ribeiro, José M.C.

    The mosquito D7 salivary proteins are encoded by a multigene family related to the arthropod odorant-binding protein (OBP) superfamily. Forms having either one or two OBP domains are found in mosquito saliva. Four single-domain and one two-domain D7 proteins from Anopheles gambiae and Aedes aegypti (AeD7), respectively, were shown to bind biogenic amines with high affinity and with a stoichiometry of one ligand per protein molecule. Sequence comparisons indicated that only the C-terminal domain of AeD7 is homologous to the single-domain proteins from A. gambiae, suggesting that the N-terminal domain may bind a different class of ligands. Here, we describe the 3D structure of AeD7 and examine the ligand-binding characteristics of the N- and C-terminal domains. Isothermal titration calorimetry and ligand complex crystal structures show that the N-terminal domain binds cysteinyl leukotrienes (cysLTs) with high affinities (50-60 nM) whereas the C-terminal domain binds biogenic amines. The lipid chain of the cysLT binds in a hydrophobic pocket of the N-terminal domain, whereas binding of norepinephrine leads to an ordering of the C-terminal portion of the C-terminal domain into an alpha-helix that, along with rotations of Arg-176 and Glu-268 side chains, acts to bury the bound ligand.

  7. Atomic structure of the nuclear pore complex targeting domain of a Nup116 homologue from the yeast, Candida glabrata

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sampathkumar, Parthasarathy; Kim, Seung Joong; Manglicmot, Danalyn

    2012-10-23

    The nuclear pore complex (NPC), embedded in the nuclear envelope, is a large, dynamic molecular assembly that facilitates exchange of macromolecules between the nucleus and the cytoplasm. The yeast NPC is an eightfold symmetric annular structure composed of ~456 polypeptide chains contributed by ~30 distinct proteins termed nucleoporins. Nup116, identified only in fungi, plays a central role in both protein import and mRNA export through the NPC. Nup116 is a modular protein with N-terminal 'FG' repeats containing a Gle2p-binding sequence motif and a NPC targeting domain at its C-terminus. We report the crystal structure of the NPC targeting domain of Candida glabrata Nup116, consisting of residues 882-1034 [CgNup116(882-1034)], at 1.94 Å resolution. The X-ray structure of CgNup116(882-1034) is consistent with the molecular envelope determined in solution by small-angle X-ray scattering. Structural similarities of CgNup116(882-1034) with homologous domains from Saccharomyces cerevisiae Nup116, S. cerevisiae Nup145N, and human Nup98 are discussed.

  8. A novel signal transduction protein: Combination of solute binding and tandem PAS-like sensor domains in one polypeptide chain: Periplasmic Ligand Binding Protein Dret_0059

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, R.; Wilton, R.; Cuff, M. E.

    We report the structural and biochemical characterization of a novel periplasmic ligand-binding protein, Dret_0059, from Desulfohalobium retbaense DSM 5692, an organism isolated from the Salt Lake Retba in Senegal. The structure of the protein consists of a unique combination of a periplasmic solute binding protein (SBP) domain at the N-terminal and a tandem PAS-like sensor domain at the C-terminal region. SBP domains are found ubiquitously and their best known function is in solute transport across membranes. PAS-like sensor domains are commonly found in signal transduction proteins. These domains are widely observed as parts of many protein architectures and complexes but have not been observed previously within the same polypeptide chain. In the structure of Dret_0059, a ketoleucine moiety is bound to the SBP, whereas a cytosine molecule is bound in the distal PAS-like domain of the tandem PAS-like domain. Differential scanning fluorimetry supports the binding of ligands observed in the crystal structure. There is significant interaction between the SBP and tandem PAS-like domains, and it is possible that the binding of one ligand could have an effect on the binding of the other. We uncovered three other proteins with this structural architecture in the non-redundant sequence database, and predict that they too bind the same substrates. The genomic context of this protein did not offer any clues for its function. We did not find any biological process in which the two observed ligands are coupled. The protein Dret_0059 could be involved in either signal transduction or solute transport.

  9. A novel signal transduction protein: Combination of solute binding and tandem PAS-like sensor domains in one polypeptide chain.

    PubMed

    Wu, R; Wilton, R; Cuff, M E; Endres, M; Babnigg, G; Edirisinghe, J N; Henry, C S; Joachimiak, A; Schiffer, M; Pokkuluri, P R

    2017-04-01

    We report the structural and biochemical characterization of a novel periplasmic ligand-binding protein, Dret_0059, from Desulfohalobium retbaense DSM 5692, an organism isolated from Lake Retba, in Senegal. The structure of the protein consists of a unique combination of a periplasmic solute binding protein (SBP) domain at the N-terminal and a tandem PAS-like sensor domain at the C-terminal region. SBP domains are found ubiquitously, and their best known function is in solute transport across membranes. PAS-like sensor domains are commonly found in signal transduction proteins. These domains are widely observed as parts of many protein architectures and complexes but have not been observed previously within the same polypeptide chain. In the structure of Dret_0059, a ketoleucine moiety is bound to the SBP, whereas a cytosine molecule is bound in the distal PAS-like domain of the tandem PAS-like domain. Differential scanning fluorimetry supports the binding of ligands observed in the crystal structure. There is significant interaction between the SBP and tandem PAS-like domains, and it is possible that the binding of one ligand could have an effect on the binding of the other. We uncovered three other proteins with this structural architecture in the non-redundant sequence database, and predict that they too bind the same substrates. The genomic context of this protein did not offer any clues for its function. We did not find any biological process in which the two observed ligands are coupled. The protein Dret_0059 could be involved in either signal transduction or solute transport. © 2017 The Protein Society.

  10. Structural insight into the mechanism of synergistic autoinhibition of SAD kinases

    PubMed Central

    Wu, Jing-Xiang; Cheng, Yun-Sheng; Wang, Jue; Chen, Lei; Ding, Mei; Wu, Jia-Wei

    2015-01-01

    The SAD/BRSK kinases participate in various important life processes, including neural development, cell cycle and energy metabolism. Like other members of the AMPK family, SAD contains an N-terminal kinase domain followed by the characteristic UBA and KA1 domains. Here we identify a unique autoinhibitory sequence (AIS) in SAD kinases, which exerts autoregulation in cooperation with UBA. Structural studies of mouse SAD-A revealed that UBA binds to the kinase domain in a distinct mode and, more importantly, AIS nestles specifically into the KD-UBA junction. The cooperative action of AIS and UBA results in an 'αC-out' inactive kinase, which is conserved across species and essential for presynaptic vesicle clustering in C. elegans. In addition, the AIS, along with the KA1 domain, is indispensable for phospholipid binding. Taken together, these data suggest a model for synergistic autoinhibition and membrane activation of SAD kinases. PMID:26626945

  11. Structural insight into the mechanism of synergistic autoinhibition of SAD kinases.

    PubMed

    Wu, Jing-Xiang; Cheng, Yun-Sheng; Wang, Jue; Chen, Lei; Ding, Mei; Wu, Jia-Wei

    2015-12-02

    The SAD/BRSK kinases participate in various important life processes, including neural development, cell cycle and energy metabolism. Like other members of the AMPK family, SAD contains an N-terminal kinase domain followed by the characteristic UBA and KA1 domains. Here we identify a unique autoinhibitory sequence (AIS) in SAD kinases, which exerts autoregulation in cooperation with UBA. Structural studies of mouse SAD-A revealed that UBA binds to the kinase domain in a distinct mode and, more importantly, AIS nestles specifically into the KD-UBA junction. The cooperative action of AIS and UBA results in an 'αC-out' inactive kinase, which is conserved across species and essential for presynaptic vesicle clustering in C. elegans. In addition, the AIS, along with the KA1 domain, is indispensable for phospholipid binding. Taken together, these data suggest a model for synergistic autoinhibition and membrane activation of SAD kinases.

  12. Review the role of terminal domains during storage and assembly of spider silk proteins.

    PubMed

    Eisoldt, Lukas; Thamm, Christopher; Scheibel, Thomas

    2012-06-01

    Fibrous proteins in nature fulfill a wide variety of functions in different structures ranging from cellular scaffolds to very resilient structures like tendons and even extra-corporal fibers such as silks in spider webs or silkworm cocoons. Despite their different origins and sequence varieties many of these fibrous proteins share a common building principle: they consist of a large repetitive core domain flanked by relatively small non-repetitive terminal domains. Amongst protein fibers, spider dragline silk shows prominent mechanical properties that exceed those of man-made fibers like Kevlar. Spider silk fibers assemble in a spinning process allowing the transformation from an aqueous solution into a solid fiber within milliseconds. Here, we highlight the role of the non-repetitive terminal domains of spider dragline silk proteins during storage in the gland and initiation of the fiber assembly process. Copyright © 2011 Wiley Periodicals, Inc.

  13. Identification and characterization of a novel zebrafish (Danio rerio) pentraxin-carbonic anhydrase.

    PubMed

    Patrikainen, Maarit S; Tolvanen, Martti E E; Aspatwar, Ashok; Barker, Harlan R; Ortutay, Csaba; Jänis, Janne; Laitaoja, Mikko; Hytönen, Vesa P; Azizi, Latifeh; Manandhar, Prajwol; Jáger, Edit; Vullo, Daniela; Kukkurainen, Sampo; Hilvo, Mika; Supuran, Claudiu T; Parkkila, Seppo

    2017-01-01

    Carbonic anhydrases (CAs) are ubiquitous, essential enzymes which catalyze the conversion of carbon dioxide and water to bicarbonate and H+ ions. Vertebrate genomes generally contain gene loci for 15-21 different CA isoforms, three of which are enzymatically inactive. CA VI is the only secretory protein of the enzymatically active isoforms. We discovered that non-mammalian CA VI contains a C-terminal pentraxin (PTX) domain, a novel combination for both CAs and PTXs. We isolated and sequenced zebrafish (Danio rerio) CA VI cDNA, complete with the sequence coding for the PTX domain, and produced the recombinant CA VI-PTX protein. Enzymatic activity and kinetic parameters were measured with a stopped-flow instrument. Mass spectrometry, analytical gel filtration and dynamic light scattering were used for biophysical characterization. Sequence analyses and Bayesian phylogenetics were used in generating hypotheses of protein structure and CA VI gene evolution. A CA VI-PTX antiserum was produced, and the expression of CA VI protein was studied by immunohistochemistry. A knock-down zebrafish model was constructed, and larvae were observed up to five days post-fertilization (dpf). The expression of ca6 mRNA was quantitated by qRT-PCR in different developmental times in morphant and wild-type larvae and in different adult fish tissues. Finally, the swimming behavior of the morphant fish was compared to that of wild-type fish. The recombinant enzyme has a very high carbonate dehydratase activity. Sequencing confirms a 530-residue protein identical to one of the predicted proteins in the Ensembl database (ensembl.org). The protein is pentameric in solution, as studied by gel filtration and light scattering, presumably joined by the PTX domains. Mass spectrometry confirms the predicted signal peptide cleavage and disulfides, and N-glycosylation in two of the four observed glycosylation motifs. Molecular modeling of the pentamer is consistent with the modifications observed in mass spectrometry. Phylogenetics and sequence analyses provide a consistent hypothesis of the evolutionary history of domains associated with CA VI in mammals and non-mammals. Briefly, the evidence suggests that ancestral CA VI was a transmembrane protein, the exon coding for the cytoplasmic domain was replaced by one coding for PTX domain, and finally, in the therian lineage, the PTX-coding exon was lost. We knocked down CA VI expression in zebrafish embryos with antisense morpholino oligonucleotides, resulting in phenotype features of decreased buoyancy and swim bladder deflation in 4 dpf larvae. These findings provide novel insights into the evolution, structure, and function of this unique CA form.

  14. Structure of glycosylated NPC1 luminal domain C reveals insights into NPC2 and Ebola virus interactions.

    PubMed

    Zhao, Yuguang; Ren, Jingshan; Harlos, Karl; Stuart, David I

    2016-03-01

    Niemann-Pick type C1 (NPC1) is an endo/lysosomal membrane protein involved in intracellular cholesterol trafficking, and its luminal domain C is an essential endosomal receptor for Ebola and Marburg viruses. We have determined the crystal structure of glycosylated NPC1 luminal domain C and find all seven possible sites are glycosylated. Mapping the disease mutations onto the glycosylated structure reveals a potential binding face for NPC2. Knowledge-based docking of NPC1 onto Ebola viral glycoprotein and sequence analysis of filovirus susceptible and refractory species reveals four critical residues, H418, Q421, F502 and F504, some or all of which are likely responsible for the species-specific susceptibility to the virus infection. © 2016 The Authors. FEBS Letters published by John Wiley & Sons Ltd on behalf of Federation of European Biochemical Societies.

  15. A corkscrew model for dynamin constriction

    PubMed Central

    Mears, Jason A.; Ray, Pampa; Hinshaw, Jenny E.

    2007-01-01

    SUMMARY Numerous vesiculation processes throughout the eukaryotic cell are dependent on the protein dynamin, a large GTPase that constricts lipid bilayers. We have combined x-ray crystallography and cryo-electron microscopy (cryo-EM) data to generate a coherent model of dynamin-mediated membrane constriction. X-ray structures of mammalian GTPase and pleckstrin homology (PH) domains of dynamin were fit to cryo-EM structures of human ΔPRD dynamin helices bound to lipid in non-constricted and constricted states. Proteolysis and immunogold labeling experiments confirm the topology of dynamin domains predicted from the helical arrays. Based on the fitting, an observed twisting motion of the GTPase, middle and GTPase-effector domains coincides with conformational changes determined by cryo-EM. We propose a corkscrew model for dynamin constriction based on these motions and predict regions of sequence important for dynamin function as potential targets for future mutagenic and structural studies. PMID:17937909

  16. High throughput profile-profile based fold recognition for the entire human proteome.

    PubMed

    McGuffin, Liam J; Smith, Richard T; Bryson, Kevin; Sørensen, Søren-Aksel; Jones, David T

    2006-06-07

    In order to maintain the most comprehensive structural annotation databases we must carry out regular updates for each proteome using the latest profile-profile fold recognition methods. The ability to carry out these updates on demand is necessary to keep pace with the regular updates of sequence and structure databases. Providing the highest quality structural models requires the most intensive profile-profile fold recognition methods running with the very latest available sequence databases and fold libraries. However, running these methods on such a regular basis for every sequenced proteome requires large amounts of processing power. In this paper we describe and benchmark the JYDE (Job Yield Distribution Environment) system, which is a meta-scheduler designed to work above cluster schedulers, such as Sun Grid Engine (SGE) or Condor. We demonstrate the ability of JYDE to distribute the load of genomic-scale fold recognition across multiple independent Grid domains. We use the most recent profile-profile version of our mGenTHREADER software in order to annotate the latest version of the Human proteome against the latest sequence and structure databases in as short a time as possible. We show that our JYDE system is able to scale to large numbers of intensive fold recognition jobs running across several independent computer clusters. Using our JYDE system we have been able to annotate 99.9% of the protein sequences within the Human proteome in less than 24 hours, by harnessing over 500 CPUs from 3 independent Grid domains. This study clearly demonstrates the feasibility of carrying out on demand high quality structural annotations for the proteomes of major eukaryotic organisms. Specifically, we have shown that it is now possible to provide complete regular updates of profile-profile based fold recognition models for entire eukaryotic proteomes, through the use of Grid middleware such as JYDE.

  17. Structural features of diverse Pin-II proteinase inhibitor genes from Capsicum annuum.

    PubMed

    Mahajan, Neha S; Dewangan, Veena; Lomate, Purushottam R; Joshi, Rakesh S; Mishra, Manasi; Gupta, Vidya S; Giri, Ashok P

    2015-02-01

    The proteinase inhibitor (PI) genes from Capsicum annuum were characterized with respect to their UTR, introns and promoter elements. The occurrence of PIs with circularly permuted domain organization was evident. Several potato inhibitor II (Pin-II) type proteinase inhibitor (PI) genes have been analyzed from Capsicum annuum (L.) with respect to their differential expression during plant defense response. However, complete gene characterization of any of these C. annuum PIs (CanPIs) has not been carried out so far. Complete gene architectures of a previously identified CanPI-7 (Beads-on-string, Type A) and a member of newly isolated Bracelet type B, CanPI-69 are reported in this study. The 5' UTR (untranslated region), 3' UTR, and intronic sequences of both the CanPI genes were obtained. The genomic sequence of CanPI-7 exhibited exon 1 (49 base pairs, bp) and exon 2 (740 bp) interrupted by a 294-bp long type I intron. We noted the occurrence of three multi-domain PIs (CanPI-69, 70, 71) with circularly permuted domain organization. CanPI-69 was found to possess exon 1 (49 bp), exon 2 (551 bp) and a 584-bp long type I intron. The upstream sequence analysis of CanPI-7 and CanPI-69 predicted various transcription factor-binding sites including TATA and CAAT boxes, hormone-responsive elements (ABRELATERD1, DOFCOREZM, ERELEE4), and a defense-responsive element (WRKY71OS). Binding of transcription factors such as zinc finger motif MADS-box and MYB to the promoter regions was confirmed using electrophoretic mobility shift assay followed by mass spectrometric identification. The 3' UTR analysis for 25 CanPI genes revealed unique/distinct 3' UTR sequence for each gene. Structures of three-domain CanPIs of type A and B were predicted and further analyzed for their attributes. This investigation of CanPI gene architecture will enable the better understanding of the genetic elements present in CanPIs.

  18. S46 Peptidases are the First Exopeptidases to be Members of Clan PA

    PubMed Central

    Sakamoto, Yasumitsu; Suzuki, Yoshiyuki; Iizuka, Ippei; Tateoka, Chika; Roppongi, Saori; Fujimoto, Mayu; Inaka, Koji; Tanaka, Hiroaki; Masaki, Mika; Ohta, Kazunori; Okada, Hirofumi; Nonaka, Takamasa; Morikawa, Yasushi; Nakamura, Kazuo T.; Ogasawara, Wataru; Tanaka, Nobutada

    2014-01-01

    The dipeptidyl aminopeptidase BII (DAP BII) belongs to a serine peptidase family, S46. The amino acid sequence of the catalytic unit of DAP BII exhibits significant similarity to those of clan PA endopeptidases, such as chymotrypsin. However, the molecular mechanism of the exopeptidase activity of family S46 peptidase is unknown. Here, we report crystal structures of DAP BII. DAP BII contains a peptidase domain including a typical double β-barrel fold and previously unreported α-helical domain. The structures of peptide complexes revealed that the α-helical domain covers the active-site cleft and the side chain of Asn330 in the domain forms hydrogen bonds with the N-terminus of the bound peptide. These observations indicate that the α-helical domain regulates the exopeptidase activity of DAP BII. Because S46 peptidases are not found in mammals, we expect that our study will be useful for the design of specific inhibitors of S46 peptidases from pathogens. PMID:24827749

  19. Targeted induction of meiotic double-strand breaks reveals chromosomal domain-dependent regulation of Spo11 and interactions among potential sites of meiotic recombination

    PubMed Central

    Fukuda, Tomoyuki; Kugou, Kazuto; Sasanuma, Hiroyuki; Shibata, Takehiko

    2008-01-01

    Meiotic recombination is initiated by programmed DNA double-strand break (DSB) formation mediated by Spo11. DSBs occur with frequency in chromosomal regions called hot domains but are seldom seen in cold domains. To obtain insights into the determinants of the distribution of meiotic DSBs, we examined the effects of inducing targeted DSBs during yeast meiosis using a UAS-directed form of Spo11 (Gal4BD-Spo11) and a meiosis-specific endonuclease, VDE (PI-SceI). Gal4BD-Spo11 cleaved its target sequence (UAS) integrated in hot domains but rarely in cold domains. However, Gal4BD-Spo11 did bind to UAS and VDE efficiently cleaved its recognition sequence in either context, suggesting that a cold domain is not a region of inaccessible or uncleavable chromosome structure. Importantly, self-association of Spo11 occurred at UAS in a hot domain but not in a cold domain, raising the possibility that Spo11 remains in an inactive intermediate state in cold domains. Integration of UAS adjacent to known DSB hotspots allowed us to detect competitive interactions among hotspots for activation. Moreover, the presence of VDE-introduced DSB repressed proximal hotspot activity, implicating DSBs themselves in interactions among hotspots. Thus, potential sites for Spo11-mediated DSB are subject to domain-specific and local competitive regulations during and after DSB formation. PMID:18096626

  20. Structure of bacteriophage T4 fibritin: a segmented coiled coil and the role of the C-terminal domain.

    PubMed

    Tao, Y; Strelkov, S V; Mesyanzhinov, V V; Rossmann, M G

    1997-06-15

    Oligomeric coiled-coil motifs are found in numerous protein structures; among them is fibritin, a structural protein of bacteriophage T4, which belongs to a class of chaperones that catalyze a specific phage-assembly process. Fibritin promotes the assembly of the long tail fibers and their subsequent attachment to the tail baseplate; it is also a sensing device that controls the retraction of the long tail fibers in adverse environments and, thus, prevents infection. The structure of fibritin had been predicted from sequence and biochemical analyses to be mainly a triple-helical coiled coil. The determination of its structure at atomic resolution was expected to give insights into the assembly process and biological function of fibritin, and the properties of modified coiled-coil structures in general. The three-dimensional structure of fibritin E, a deletion mutant of wild-type fibritin, was determined to 2.2 Å resolution by X-ray crystallography. Three identical subunits of 119 amino acid residues form a trimeric parallel coiled-coil domain and a small globular C-terminal domain about a crystallographic threefold axis. The coiled-coil domain is divided into three segments that are separated by insertion loops. The C-terminal domain, which consists of 30 residues from each subunit, contains a beta-propeller-like structure with a hydrophobic interior. The residues within the C-terminal domain make extensive hydrophobic and some polar intersubunit interactions. This is consistent with the C-terminal domain being important for the correct assembly of fibritin, as shown earlier by mutational studies. Tight interactions between the C-terminal residues of adjacent subunits counteract the latent instability that is suggested by the structural properties of the coiled-coil segments. Trimerization is likely to begin with the formation of the C-terminal domain which subsequently initiates the assembly of the coiled coil. The interplay between the stabilizing effect of the C-terminal domain and the labile coiled-coil domain may be essential for the fibritin function and for the correct functioning of many other alpha-fibrous proteins.
