3D RNA and functional interactions from evolutionary couplings
Weinreb, Caleb; Riesselman, Adam; Ingraham, John B.; Gross, Torsten; Sander, Chris; Marks, Debora S.
2016-01-01
Summary Non-coding RNAs are ubiquitous, but the discovery of new RNA gene sequences far outpaces research on their structure and functional interactions. We mine the evolutionary sequence record to derive precise information about function and structure of RNAs and RNA-protein complexes. As in protein structure prediction, we use maximum entropy global probability models of sequence co-variation to infer evolutionarily constrained nucleotide-nucleotide interactions within RNA molecules, and nucleotide-amino acid interactions in RNA-protein complexes. The predicted contacts allow all-atom blinded 3D structure prediction at good accuracy for several known RNA structures and RNA-protein complexes. For unknown structures, we predict contacts in 160 non-coding RNA families. Beyond 3D structure prediction, evolutionary couplings help identify important functional interactions, e.g., at switch points in riboswitches and at a complex nucleation site in HIV. Aided by accelerating sequence accumulation, evolutionary coupling analysis can accelerate the discovery of functional interactions and 3D structures involving RNA. PMID:27087444
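As a rough illustration of what "sequence co-variation" means here, the sketch below scores every pair of alignment columns by mutual information. This is a deliberately simpler stand-in, not the maximum entropy global model the abstract describes: mutual information is a local, pairwise statistic and does not separate direct from indirect couplings. The function name and the toy alignment are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def column_mutual_information(alignment):
    """Return {(i, j): mutual information in bits} for every column pair."""
    n_seqs = len(alignment)
    length = len(alignment[0])
    columns = [[seq[i] for seq in alignment] for i in range(length)]
    scores = {}
    for i, j in combinations(range(length), 2):
        single_i = Counter(columns[i])
        single_j = Counter(columns[j])
        joint = Counter(zip(columns[i], columns[j]))
        mi = 0.0
        for (a, b), count in joint.items():
            p_ab = count / n_seqs
            p_a = single_i[a] / n_seqs
            p_b = single_j[b] / n_seqs
            mi += p_ab * log2(p_ab / (p_a * p_b))
        scores[(i, j)] = mi
    return scores


# Hypothetical toy alignment: columns 0 and 4 always form a Watson-Crick pair,
# columns 1 and 3 are fully conserved, column 2 varies independently.
toy_alignment = [
    "GAAAC",
    "CAAAG",
    "AAAAU",
    "UAAAA",
    "GAUAC",
    "CAUAG",
]

scores = column_mutual_information(toy_alignment)
top_pair = max(scores, key=scores.get)
print(top_pair, round(scores[top_pair], 3))
```

On real alignments, a score like this is confounded by phylogeny and by transitive correlations, which is the motivation for the global probability models mentioned in the abstract.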
Predicting PDZ domain mediated protein interactions from structure
2013-01-01
Background PDZ domains are structural protein domains that recognize simple linear amino acid motifs, often at protein C-termini, and mediate protein-protein interactions (PPIs) in important biological processes, such as ion channel regulation, cell polarity and neural development. PDZ domain-peptide interaction predictors have been developed based on domain and peptide sequence information. Since domain structure is known to influence binding specificity, we hypothesized that structural information could be used to predict new interactions compared to sequence-based predictors. Results We developed a novel computational predictor of PDZ domain and C-terminal peptide interactions using a support vector machine trained with PDZ domain structure and peptide sequence information. Performance was estimated using extensive cross validation testing. We used the structure-based predictor to scan the human proteome for ligands of 218 PDZ domains and show that the predictions correspond to known PDZ domain-peptide interactions and PPIs in curated databases. The structure-based predictor is complementary to the sequence-based predictor, finding unique known and novel PPIs, and is less dependent on training–testing domain sequence similarity. We used a functional enrichment analysis of our hits to create a predicted map of PDZ domain biology. This map highlights PDZ domain involvement in diverse biological processes, some only found by the structure-based predictor. Based on this analysis, we predict novel PDZ domain involvement in xenobiotic metabolism and suggest new interactions for other processes including wound healing and Wnt signalling. Conclusions We built a structure-based predictor of PDZ domain-peptide interactions, which can be used to scan C-terminal proteomes for PDZ interactions. 
We also show that the structure-based predictor finds many known PDZ mediated PPIs in human that were not found by our previous sequence-based predictor and is less dependent on training–testing domain sequence similarity. Using both predictors, we defined a functional map of human PDZ domain biology and predict novel PDZ domain function. Users may access our structure-based and previous sequence-based predictors at http://webservice.baderlab.org/domains/POW. PMID:23336252
Sequence co-evolution gives 3D contacts and structures of protein complexes
Hopf, Thomas A; Schärfe, Charlotta P I; Rodrigues, João P G L M; Green, Anna G; Kohlbacher, Oliver; Sander, Chris; Bonvin, Alexandre M J J; Marks, Debora S
2014-01-01
Protein–protein interactions are fundamental to many biological processes. Experimental screens have identified tens of thousands of interactions, and structural biology has provided detailed functional insight for select 3D protein complexes. An alternative rich source of information about protein interactions is the evolutionary sequence record. Building on earlier work, we show that analysis of correlated evolutionary sequence changes across proteins identifies residues that are close in space with sufficient accuracy to determine the three-dimensional structure of the protein complexes. We evaluate prediction performance in blinded tests on 76 complexes of known 3D structure, predict protein–protein contacts in 32 complexes of unknown structure, and demonstrate how evolutionary couplings can be used to distinguish between interacting and non-interacting protein pairs in a large complex. With the current growth of sequences, we expect that the method can be generalized to genome-wide elucidation of protein–protein interaction networks and used for interaction predictions at residue resolution. DOI: http://dx.doi.org/10.7554/eLife.03430.001 PMID:25255213
NASA Astrophysics Data System (ADS)
Benito, S.; Ferrer, A.; Benabou, S.; Aviñó, A.; Eritja, R.; Gargallo, R.
2018-05-01
Guanine-rich sequences may fold into highly ordered structures known as G-quadruplexes. Apart from the monomeric G-quadruplex, these sequences may form multimeric structures that are not usually considered when studying interaction with ligands. This work studies the interaction of a ligand, crystal violet, with three guanine-rich DNA sequences with the capacity to form multimeric structures. These sequences correspond to short stretches found near the promoter regions of c-kit and SMARCA4 genes. Instrumental techniques (circular dichroism, molecular fluorescence, size-exclusion chromatography and electrospray ionization mass spectrometry) and multivariate data analysis were used for this purpose. The polymorphism of G-quadruplexes was characterized prior to the interaction studies. The ligand was shown to interact preferentially with the monomeric G-quadruplex; the binding stoichiometry was 1:1 and the binding constant was on the order of 10⁵ M⁻¹ for all three sequences. The results highlight the importance of DNA treatment prior to interaction studies.
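To put the reported binding constant in perspective, the standard 1:1 binding equilibrium can be written out as below. This is a textbook model added for illustration, not a derivation from the paper; the symbols G (G-quadruplex), L (free ligand), K (binding constant) and θ (fraction of G-quadruplex bound) are assumed notation.

```latex
K = \frac{[\mathrm{GL}]}{[\mathrm{G}]\,[\mathrm{L}]},
\qquad
\theta = \frac{[\mathrm{GL}]}{[\mathrm{G}] + [\mathrm{GL}]}
       = \frac{K[\mathrm{L}]}{1 + K[\mathrm{L}]}
```

With K = 10⁵ M⁻¹ and a free ligand concentration of 10⁻⁵ M (10 µM), K[L] = 1 and θ = 1/2, so under this model a binding constant of that magnitude corresponds to half-saturation at roughly 10 µM free ligand.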
Protein Interaction Profile Sequencing (PIP-seq).
Foley, Shawn W; Gregory, Brian D
2016-10-10
Every eukaryotic RNA transcript undergoes extensive post-transcriptional processing from the moment of transcription up through degradation. This regulation is performed by a distinct cohort of RNA-binding proteins which recognize their target transcript by both its primary sequence and secondary structure. Here, we describe protein interaction profile sequencing (PIP-seq), a technique that uses ribonuclease-based footprinting followed by high-throughput sequencing to globally assess both protein-bound RNA sequences and RNA secondary structure. PIP-seq utilizes single- and double-stranded RNA-specific nucleases in the absence of proteins to infer RNA secondary structure. These libraries are also compared to samples that undergo nuclease digestion in the presence of proteins in order to find enriched protein-bound sequences. Combined, these four libraries provide a comprehensive, transcriptome-wide view of RNA secondary structure and RNA protein interaction sites from a single experimental technique. © 2016 by John Wiley & Sons, Inc.
Insights into the fold organization of TIM barrel from interaction energy based structure networks.
Vijayabaskar, M S; Vishveshwara, Saraswathi
2012-01-01
There are many well-known examples of proteins with low sequence similarity adopting the same structural fold. This aspect of the sequence-structure relationship has been extensively studied both experimentally and theoretically, but with limited success. Most of the studies consider remote homology or "sequence conservation" as the basis for their understanding. Recently, an "interaction energy" based network formalism (Protein Energy Networks (PENs)) was developed to understand the determinants of protein structures. In this paper we have used these PENs to investigate the common non-covalent interactions and their collective features which stabilize the TIM barrel fold. We have also developed a method of aligning PENs in order to understand the spatial conservation of interactions in the fold. We have identified key common interactions responsible for the conservation of the TIM fold, despite high sequence dissimilarity. For instance, the central beta barrel of the TIM fold is stabilized by long-range high energy electrostatic interactions and low-energy contiguous vdW interactions in certain families. The other interfaces like the helix-sheet or the helix-helix seem to be devoid of any high energy conserved interactions. Conserved interactions in the loop regions around the catalytic site of the TIM fold have also been identified, pointing out their significance in both structural and functional evolution. Based on these investigations, we have developed a novel network based phylogenetic analysis for remote homologues, which can perform better than sequence based phylogeny. Such an analysis is more meaningful from both structural and functional evolutionary perspectives. We believe that the information obtained through the "interaction conservation" viewpoint and the subsequently developed method of structure network alignment can shed new light on the fields of fold organization and de novo computational protein design.
Stewart, Mikaela; Dunlap, Tori; Dourlain, Elizabeth; Grant, Bryce; McFail-Isom, Lori
2013-01-01
The fine conformational subtleties of DNA structure modulate many fundamental cellular processes including gene activation/repression, cellular division, and DNA repair. Most of these cellular processes rely on the conformational heterogeneity of specific DNA sequences. Factors including those structural characteristics inherent in the particular base sequence as well as those induced through interaction with solvent components combine to produce fine DNA structural variation including helical flexibility and conformation. Cation-pi interactions between solvent cations or their first hydration shell waters and the faces of DNA bases form sequence selectively and contribute to DNA structural heterogeneity. In this paper, we detect and characterize the binding patterns found in cation-pi interactions between solvent cations and DNA bases in a set of high resolution x-ray crystal structures. Specifically, we found that monovalent cations (Tl+) and the polarized first hydration shell waters of divalent cations (Mg2+, Ca2+) form cation-pi interactions with DNA bases stabilizing unstacked conformations. When these cation-pi interactions are combined with electrostatic interactions a pattern of specific binding motifs is formed within the grooves. PMID:23940752
MPID-T2: a database for sequence-structure-function analyses of pMHC and TR/pMHC structures.
Khan, Javed Mohammed; Cheruku, Harish Reddy; Tong, Joo Chuan; Ranganathan, Shoba
2011-04-15
Sequence-structure-function information is critical in understanding the mechanism of pMHC and TR/pMHC binding and recognition. A database for sequence-structure-function information on pMHC and TR/pMHC interactions, MHC-Peptide Interaction Database-TR version 2 (MPID-T2), is now available augmented with the latest PDB and IMGT/3Dstructure-DB data, advanced features and new parameters for the analysis of pMHC and TR/pMHC structures. Availability: http://biolinfo.org/mpid-t2. Contact: shoba.ranganathan@mq.edu.au. Supplementary data are available at Bioinformatics online.
Designing pH induced fold switch in proteins
NASA Astrophysics Data System (ADS)
Baruah, Anupaul; Biswas, Parbati
2015-05-01
This work investigates the computational design of a pH induced protein fold switch based on a self-consistent mean-field approach by identifying the ensemble averaged characteristics of sequences that encode a fold switch. The primary challenge to balance the alternative sets of interactions present in both target structures is overcome by simultaneously optimizing two foldability criteria corresponding to two target structures. The change in pH is modeled by altering the residual charge on the amino acids. The energy landscape of the fold switch protein is found to be double funneled. The fold switch sequences stabilize the interactions of the sites with similar relative surface accessibility in both target structures. Fold switch sequences have low sequence complexity and hence lower sequence entropy. The pH induced fold switch is mediated by attractive electrostatic interactions rather than hydrophobic-hydrophobic contacts. This study may provide valuable insights to the design of fold switch proteins.
RNA Tertiary Interactions in a Riboswitch Stabilize the Structure of a Kink Turn
Schroeder, Kersten T.; Daldrop, Peter; Lilley, David M.J.
2011-01-01
Summary The kink turn is a widespread RNA motif that introduces an acute kink into the axis of duplex RNA, typically comprising a bulge followed by G⋅A and A⋅G pairs. The kinked conformation is stabilized by metal ions, or the binding of proteins including L7Ae. We now demonstrate a third mechanism for the stabilization of k-turn structure, involving tertiary interactions within a larger RNA structure. The SAM-I riboswitch contains an essential standard k-turn sequence that kinks a helix so that its terminal loop can make a long-range interaction. We find that some sequence variations in the k-turn within the riboswitch do not prevent SAM binding, despite preventing the folding of the k-turn in isolation. Furthermore, two crystal structures show that the sequence-variant k-turns are conventionally folded within the riboswitch. This study shows that the folded structure of the k-turn can be stabilized by tertiary interactions within a larger RNA structure. PMID:21893284
Heinke, Florian; Bittrich, Sebastian; Kaiser, Florian; Labudde, Dirk
2016-01-01
To understand the molecular function of biopolymers, studying their structural characteristics is of central importance. Graphics programs are often used to inspect these properties, but with the increasing number of available structures in databases or structure models produced by automated modeling frameworks this process requires assistance from tools that allow automated structure visualization. In this paper a web server and its underlying method for generating graphical sequence representations of molecular structures is presented. The method, called SequenceCEROSENE (color encoding of residues obtained by spatial neighborhood embedding), retrieves the sequence of each amino acid or nucleotide chain in a given structure and produces a color coding for each residue based on three-dimensional structure information. From this, color-highlighted sequences are obtained, where residue coloring represents three-dimensional residue locations in the structure. This color encoding thus provides a one-dimensional representation, from which spatial interactions, proximity and relations between residues or entire chains can be deduced quickly and solely from color similarity. Furthermore, additional heteroatoms and chemical compounds bound to the structure, like ligands or coenzymes, are processed and reported as well. To provide free access to SequenceCEROSENE, a web server has been implemented that allows generating color codings for structures deposited in the Protein Data Bank or structure models uploaded by the user. Besides retrieving visualizations in popular graphic formats, underlying raw data can be downloaded as well. In addition, the server provides user interactivity with generated visualizations and the three-dimensional structure in question. 
Color encoded sequences generated by SequenceCEROSENE can help users quickly perceive the general characteristics of a structure of interest (or entire sets of complexes), thus supporting the researcher in the initial phase of structure-based studies. In this respect, the web server can be a valuable tool, as users are allowed to process multiple structures, quickly switch between results, and interact with generated visualizations in an intuitive manner. The SequenceCEROSENE web server is available at https://biosciences.hs-mittweida.de/seqcerosene.
Smith, Colin A; Kortemme, Tanja
2011-01-01
Predicting the set of sequences that are tolerated by a protein or protein interface, while maintaining a desired function, is useful for characterizing protein interaction specificity and for computationally designing sequence libraries to engineer proteins with new functions. Here we provide a general method, a detailed set of protocols, and several benchmarks and analyses for estimating tolerated sequences using flexible backbone protein design implemented in the Rosetta molecular modeling software suite. The input to the method is at least one experimentally determined three-dimensional protein structure or high-quality model. The starting structure(s) are expanded or refined into a conformational ensemble using Monte Carlo simulations consisting of backrub backbone and side chain moves in Rosetta. The method then uses a combination of simulated annealing and genetic algorithm optimization methods to enrich for low-energy sequences for the individual members of the ensemble. To emphasize certain functional requirements (e.g. forming a binding interface), interactions between and within parts of the structure (e.g. domains) can be reweighted in the scoring function. Results from each backbone structure are merged together to create a single estimate for the tolerated sequence space. We provide an extensive description of the protocol and its parameters, all source code, example analysis scripts and three tests applying this method to finding sequences predicted to stabilize proteins or protein interfaces. The generality of this method makes many other applications possible, for example stabilizing interactions with small molecules, DNA, or RNA. Through the use of within-domain reweighting and/or multistate design, it may also be possible to use this method to find sequences that stabilize particular protein conformations or binding interactions over others.
The evolution processes of DNA sequences, languages and carols
NASA Astrophysics Data System (ADS)
Hauck, Jürgen; Henkel, Dorothea; Mika, Klaus
2001-04-01
The sequences of bases A, T, C and G of about 100 enolase, secA and cytochrome DNA were analyzed for attractive or repulsive interactions by the numbers T1, T2, T3; r of nearest, next-nearest and third neighbor bases of the same kind and the concentration r = other bases/analyzed base. The area of possible T1, T2 values is limited by the linear borders T2 = 2T1 − 2, T2 = 0 or T1 = 0 for clustering, attractive or repulsive interactions and the border T2 = −2T1 + 2(2 − r) for a variation from repulsive to attractive interactions at r ≤ 2. Clustering is preferred by most bases in sequences of enolases and secA's. Major deviations with repulsive interactions of some bases are observed for archaea bacteria in secA and for highly developed animals and the human species in enolase sequences. The borders of the structure map for enthalpy stabilized structures with maximum interactions are approached in few cases. Most letters of the natural languages and some music notes are at the borders of the structure map.
Exploration of the relationship between topology and designability of conformations
NASA Astrophysics Data System (ADS)
Leelananda, Sumudu P.; Towfic, Fadi; Jernigan, Robert L.; Kloczkowski, Andrzej
2011-06-01
Protein structures are evolutionarily more conserved than sequences, and sequences with very low sequence identity frequently share the same fold. This leads to the concept of protein designability. Some folds are more designable, and many sequences can assume such a fold. Elucidating the relationship between protein sequence and the three-dimensional (3D) structure that the sequence folds into is an important problem in computational structural biology. Lattice models have been utilized in numerous studies to model protein folds and predict the designability of certain folds. In this study, all possible compact conformations within a set of two-dimensional and 3D lattice spaces are explored. Complementary interaction graphs are then generated for each conformation and are described using a set of graph features. The full HP sequence space for each lattice model is generated and contact energies are calculated by threading each sequence onto all the possible conformations. The unique conformation giving the minimum energy is identified for each sequence, and the number of sequences folding to each conformation (designability) is obtained. Machine learning algorithms are used to predict the designability of each conformation. We find that the highly designable structures can be distinguished from other non-designable conformations based on certain graphical geometric features of the interactions. This finding confirms that the topology of a conformation is an important determinant of the extent of its designability and suggests that the interactions themselves are important for determining the designability.
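The enumerate-thread-count procedure described in this abstract can be sketched on the smallest non-trivial case, a 3 x 3 square lattice. This is a minimal sketch under stated assumptions, not the authors' implementation: the energy function (−1 per non-bonded H-H lattice contact), the merging of walks that share a contact map (which folds together rotations and reflections), and all function names are choices made here for illustration.

```python
from itertools import product


def compact_walks(width, height):
    """Enumerate every self-avoiding walk that visits all sites of the grid."""
    n_sites = width * height
    walks = []

    def extend(path, visited):
        if len(path) == n_sites:
            walks.append(tuple(path))
            return
        x, y = path[-1]
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nxt[0] < width and 0 <= nxt[1] < height and nxt not in visited:
                path.append(nxt)
                visited.add(nxt)
                extend(path, visited)
                path.pop()
                visited.remove(nxt)

    for start in product(range(width), range(height)):
        extend([start], {start})
    return walks


def contact_map(walk):
    """Chain-index pairs that are lattice neighbours but not bonded along the chain."""
    index = {site: i for i, site in enumerate(walk)}
    contacts = set()
    for (x, y), i in index.items():
        for nbr in ((x + 1, y), (x, y + 1)):
            j = index.get(nbr)
            if j is not None and abs(i - j) > 1:
                contacts.add((min(i, j), max(i, j)))
    return frozenset(contacts)


def designability(width, height):
    """Count, per contact map, the HP sequences that have it as unique ground state."""
    # Assumption: walks with identical contact maps are treated as one structure.
    maps = list({contact_map(w) for w in compact_walks(width, height)})
    counts = {m: 0 for m in maps}
    for seq in product("HP", repeat=width * height):
        # Assumed energy: -1 for every non-bonded H-H contact.
        energies = [-sum(seq[i] == "H" and seq[j] == "H" for i, j in m) for m in maps]
        best = min(energies)
        if energies.count(best) == 1:  # skip sequences with degenerate ground states
            counts[maps[energies.index(best)]] += 1
    return counts


counts = designability(3, 3)
for contacts, n_sequences in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(n_sequences, sorted(contacts))
```

The same loop structure scales to larger lattices only as far as exhaustive enumeration allows, since both the number of compact walks and the 2^N sequence space grow exponentially with chain length.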
High-Throughput, Data-Rich Cellular RNA Device Engineering
Townshend, Brent; Kennedy, Andrew B.; Xiang, Joy S.; Smolke, Christina D.
2015-01-01
Methods for rapidly assessing sequence-structure-function landscapes and developing conditional gene-regulatory devices are critical to our ability to manipulate and interface with biology. We describe a framework for engineering RNA devices from preexisting aptamers that exhibit ligand-responsive ribozyme tertiary interactions. Our methodology utilizes cell sorting, high-throughput sequencing, and statistical data analyses to enable parallel measurements of the activities of hundreds of thousands of sequences from RNA device libraries in the absence and presence of ligands. Our tertiary interaction RNA devices exhibit improved performance in terms of gene silencing, activation ratio, and ligand sensitivity as compared to optimized RNA devices that rely on secondary structure changes. We apply our method to building biosensors for diverse ligands and determine consensus sequences that enable ligand-responsive tertiary interactions. These methods advance our ability to develop broadly applicable genetic tools and to elucidate understanding of the underlying sequence-structure-function relationships that empower rational design of complex biomolecules. PMID:26258292
Freiburg RNA tools: a central online resource for RNA-focused research and teaching.
Raden, Martin; Ali, Syed M; Alkhnbashi, Omer S; Busch, Anke; Costa, Fabrizio; Davis, Jason A; Eggenhofer, Florian; Gelhausen, Rick; Georg, Jens; Heyne, Steffen; Hiller, Michael; Kundu, Kousik; Kleinkauf, Robert; Lott, Steffen C; Mohamed, Mostafa M; Mattheis, Alexander; Miladi, Milad; Richter, Andreas S; Will, Sebastian; Wolff, Joachim; Wright, Patrick R; Backofen, Rolf
2018-05-21
The Freiburg RNA tools webserver is a well established online resource for RNA-focused research. It provides a unified user interface and comprehensive result visualization for efficient command line tools. The webserver includes RNA-RNA interaction prediction (IntaRNA, CopraRNA, metaMIR), sRNA homology search (GLASSgo), sequence-structure alignments (LocARNA, MARNA, CARNA, ExpaRNA), CRISPR repeat classification (CRISPRmap), sequence design (antaRNA, INFO-RNA, SECISDesign), structure aberration evaluation of point mutations (RaSE), and RNA/protein-family models visualization (CMV), and other methods. Open education resources offer interactive visualizations of RNA structure and RNA-RNA interaction prediction as well as basic and advanced sequence alignment algorithms. The services are freely available at http://rna.informatik.uni-freiburg.de.
Making the Bend: DNA Tertiary Structure and Protein-DNA Interactions
Harteis, Sabrina; Schneider, Sabine
2014-01-01
DNA structure functions as an overlapping code to the DNA sequence. Rapid progress in understanding the role of DNA structure in gene regulation, DNA damage recognition and genome stability has been made. The three dimensional structure of both proteins and DNA plays a crucial role for their specific interaction, and proteins can recognise the chemical signature of DNA sequence (“base readout”) as well as the intrinsic DNA structure (“shape recognition”). These recognition mechanisms do not exist in isolation but, depending on the individual interaction partners, are combined to various extents. The driving force for the interaction between protein and DNA remains the unique thermodynamics of each individual DNA-protein pair. In this review we focus on the structures and conformations adopted by DNA, both influenced by and influencing the specific interaction with the corresponding protein binding partner, as well as their underlying thermodynamics. PMID:25026169
Sequence and Structure Dependent DNA-DNA Interactions
NASA Astrophysics Data System (ADS)
Kopchick, Benjamin; Qiu, Xiangyun
Molecular forces between dsDNA strands are largely dominated by electrostatics and have been extensively studied. Quantitative knowledge has been accumulated on how DNA-DNA interactions are modulated by varied biological constituents such as ions, cationic ligands, and proteins. Despite its central role in biology, the sequence of DNA has not received substantial attention and "random" DNA sequences are typically used in biophysical studies. However, ~50% of the human genome is composed of non-random-sequence DNAs, particularly repetitive sequences. Furthermore, covalent modifications of DNA such as methylation play key roles in gene functions. Such DNAs with specific sequences or modifications often take on structures other than the canonical B-form. Here we present a series of quantitative measurements of the DNA-DNA forces with the osmotic stress method on different DNA sequences, from short repeats to the most frequent sequences in the genome, and to modifications such as bromination and methylation. We observe peculiar behaviors that appear to be strongly correlated with the incurred structural changes. We speculate on the causes in terms of the differences in hydration shell and DNA surface structures.
Unified Deep Learning Architecture for Modeling Biology Sequence.
Wu, Hongjie; Cao, Chengyuan; Xia, Xiaoyan; Lu, Qiang
2017-10-09
Prediction of the spatial structure or function of biological macromolecules based on their sequence remains an important challenge in bioinformatics. When modeling biological sequences using traditional sequencing models, characteristics, such as long-range interactions between basic units, the complicated and variable output of labeled structures, and the variable length of biological sequences, usually lead to different solutions on a case-by-case basis. This study proposed the use of bidirectional recurrent neural networks based on long short-term memory or a gated recurrent unit to capture long-range interactions by designing the optional reshape operator to adapt to the diversity of the output labels and implementing a training algorithm to support the training of sequence models capable of processing variable-length sequences. Additionally, the merge and pooling operators enhanced the ability to capture short-range interactions between basic units of biological sequences. The proposed deep-learning model and its training algorithm might be capable of solving currently known biological sequence-modeling problems through the use of a unified framework. We validated our model on one of the most difficult biological sequence-modeling problems currently known, with our results indicating the ability of the model to obtain predictions of protein residue interactions that exceeded the accuracy of current popular approaches by 10% based on multiple benchmarks.
Mahajan, Gaurang; Mande, Shekhar C
2017-04-04
A comprehensive map of the human-M. tuberculosis (MTB) protein interactome would help fill the gaps in our understanding of the disease, and computational prediction can aid and complement experimental studies towards this end. Several sequence-based in silico approaches tap the existing data on experimentally validated protein-protein interactions (PPIs); these PPIs serve as templates from which novel interactions between pathogen and host are inferred. Such comparative approaches typically make use of local sequence alignment, which, in the absence of structural details about the interfaces mediating the template interactions, could lead to incorrect inferences, particularly when multi-domain proteins are involved. We propose leveraging the domain-domain interaction (DDI) information in PDB complexes to score and prioritize candidate PPIs between host and pathogen proteomes based on targeted sequence-level comparisons. Our method picks out a small set of human-MTB protein pairs as candidates for physical interactions, and the use of functional meta-data suggests that some of them could contribute to the in vivo molecular cross-talk between pathogen and host that regulates the course of the infection. Further, we present numerical data for Pfam domain families that highlights interaction specificity on the domain level. Not every instance of a pair of domains, for which interaction evidence has been found in a few instances (i.e. structures), is likely to functionally interact. Our sorting approach scores candidates according to how "distant" they are in sequence space from known examples of DDIs (templates). Thus, it provides a natural way to deal with the heterogeneity in domain-level interactions. Our method represents a more informed application of local alignment to the sequence-based search for potential human-microbial interactions that uses available PPI data as a prior. 
Our approach is somewhat limited in its sensitivity by the restricted size and diversity of the template dataset, but, given the rapid accumulation of solved protein complex structures, its scope and utility are expected to keep steadily improving.
Shaping up the protein folding funnel by local interaction: Lesson from a structure prediction study
Chikenji, George; Fujitsuka, Yoshimi; Takada, Shoji
2006-01-01
Predicting protein tertiary structure by folding-like simulations is one of the most stringent tests of how much we understand the principle of protein folding. Currently, the most successful method for folding-based structure prediction is the fragment assembly (FA) method. Here, we address why the FA method is so successful and its lesson for the folding problem. To do so, using the FA method, we designed a structure prediction test of “chimera proteins.” In the chimera proteins, local structural preference is specific to the target sequences, whereas nonlocal interactions are only sequence-independent compaction forces. We find that these chimera proteins can find the native folds of the intact sequences with high probability indicating dominant roles of the local interactions. We further explore roles of local structural preference by exact calculation of the HP lattice model of proteins. From these results, we suggest principles of protein folding: For small proteins, compact structures that are fully compatible with local structural preference are few, one of which is the native fold. These local biases shape up the funnel-like energy landscape. PMID:16488978
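The exact HP lattice calculation mentioned in this abstract can be illustrated with a small sketch. It assumes the common 2D square-lattice form of the HP model, in which each pair of H residues that are lattice neighbours but not chain neighbours contributes -1 to the energy. The function names (fold, hp_energy, min_energy) are illustrative and are not taken from the paper, and the record does not say which lattice or chain lengths the authors used.

```python
from itertools import product

# Unit steps on the 2D square lattice (an assumed lattice, not stated in the abstract).
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def fold(moves):
    """Return lattice coordinates for a walk, or None if the walk revisits a site."""
    pos = [(0, 0)]
    for m in moves:
        dx, dy = MOVES[m]
        nxt = (pos[-1][0] + dx, pos[-1][1] + dy)
        if nxt in pos:          # self-avoidance violated
            return None
        pos.append(nxt)
    return pos

def hp_energy(seq, pos):
    """Energy: -1 per H-H pair adjacent on the lattice but not along the chain."""
    energy = 0
    for i in range(len(seq)):
        for j in range(i + 2, len(seq)):   # skip chain neighbours
            if seq[i] == "H" and seq[j] == "H":
                dist = abs(pos[i][0] - pos[j][0]) + abs(pos[i][1] - pos[j][1])
                if dist == 1:
                    energy -= 1
    return energy

def min_energy(seq):
    """Exhaustively enumerate all self-avoiding walks and return the lowest energy."""
    best = 0  # the fully extended chain always has energy 0
    for moves in product(MOVES, repeat=len(seq) - 1):
        pos = fold(moves)
        if pos is not None:
            best = min(best, hp_energy(seq, pos))
    return best

print(min_energy("HPPH"))
```

Exhaustive enumeration grows exponentially with chain length, so this sketch is only practical for very short sequences.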
Grate, Jay W.; Mo, Kai -For; Daily, Michael D.
2016-02-10
Sequence control in polymers, well-known in nature, encodes structure and functionality. Here we introduce a new architecture, based on the nucleophilic aromatic substitution chemistry of cyanuric chloride, that creates a new class of sequence-defined polymers dubbed TZPs. Proof of concept is demonstrated with two synthesized hexamers, having neutral and ionizable side chains. Molecular dynamics simulations show backbone–backbone interactions, including H-bonding motifs and pi–pi interactions. This architecture is arguably biomimetic while differing from sequence-defined polymers having peptide bonds. In conclusion, the synthetic methodology supports the structural diversity of side chains known in peptides, as well as backbone–backbone hydrogen-bonding motifs, and will thus enable new macromolecules and materials with useful functions.
Grate, Jay W; Mo, Kai-For; Daily, Michael D
2016-03-14
Sequence control in polymers, well-known in nature, encodes structure and functionality. Here we introduce a new architecture, based on the nucleophilic aromatic substitution chemistry of cyanuric chloride, that creates a new class of sequence-defined polymers dubbed TZPs. Proof of concept is demonstrated with two synthesized hexamers, having neutral and ionizable side chains. Molecular dynamics simulations show backbone-backbone interactions, including H-bonding motifs and pi-pi interactions. This architecture is arguably biomimetic while differing from sequence-defined polymers having peptide bonds. The synthetic methodology supports the structural diversity of side chains known in peptides, as well as backbone-backbone hydrogen-bonding motifs, and will thus enable new macromolecules and materials with useful functions. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Computer constructed imagery of distant plasma interaction boundaries
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grenstadt, E.W.; Schurr, H.D.; Tsugawa, R.K.
1982-01-01
Computer constructed sketches of plasma boundaries arising from the interaction between the solar wind and the magnetosphere can serve as both didactic and research tools. In particular, the structure of the earth's bow shock can be represented as a nonuniform surface according to the instantaneous orientation of the IMF, and temporal changes in structural distribution can be modeled as a sequence of sketches based on observed sequences of spacecraft-based measurements. Viewed rapidly, such a sequence of sketches can be the basis for representation of plasma processes by computer animation.
Rizvi, Tahir A; Kenyon, Julia C; Ali, Jahabar; Aktar, Suriya J; Phillip, Pretty S; Ghazawi, Akela; Mustafa, Farah; Lever, Andrew M L
2010-10-15
The feline immunodeficiency virus (FIV) is a lentivirus that is related to human immunodeficiency virus (HIV), causing a similar pathology in cats. It is a potential small animal model for AIDS and the FIV-based vectors are also being pursued for human gene therapy. Previous studies have mapped the FIV packaging signal (ψ) to two or more discontinuous regions within the 5' 511 nt of the genomic RNA and structural analyses have determined its secondary structure. The 5' and 3' sequences within ψ region interact through extensive long-range interactions (LRIs), including a conserved heptanucleotide interaction between R/U5 and gag. Other secondary structural elements identified include a conserved 150 nt stem-loop (SL2) and a small palindromic stem-loop within gag open reading frame that might act as a viral dimerization initiation site. We have performed extensive mutational analysis of these sequences and structures and ascertained their importance in FIV packaging using a trans-complementation assay. Disrupting the conserved heptanucleotide LRI to prevent base pairing between R/U5 and gag reduced packaging by 2.8-5.5 fold. Restoration of pairing using an alternative, non-wild type (wt) LRI sequence restored RNA packaging and propagation to wt levels, suggesting that it is the structure of the LRI, rather than its sequence, that is important for FIV packaging. Disrupting the palindrome within gag reduced packaging by 1.5-3-fold, but substitution with a different palindromic sequence did not restore packaging completely, suggesting that the sequence of this region as well as its palindromic nature is important. Mutation of individual regions of SL2 did not have a pronounced effect on FIV packaging, suggesting that either it is the structure of SL2 as a whole that is necessary for optimal packaging, or that there is redundancy within this structure. The mutational analysis presented here has further validated the previously predicted RNA secondary structure of FIV ψ. 
Copyright © 2010 Elsevier Ltd. All rights reserved.
Xu, Weijia; Ozer, Stuart; Gutell, Robin R
2009-01-01
With an increasingly large amount of sequences properly aligned, comparative sequence analysis can accurately identify not only common structures formed by standard base pairing but also new types of structural elements and constraints. However, traditional methods are too computationally expensive to perform well on large scale alignment and less effective with the sequences from diversified phylogenetic classifications. We propose a new approach that utilizes coevolutional rates among pairs of nucleotide positions using phylogenetic and evolutionary relationships of the organisms of aligned sequences. With a novel data schema to manage relevant information within a relational database, our method, implemented with a Microsoft SQL Server 2005, showed 90% sensitivity in identifying base pair interactions among 16S ribosomal RNA sequences from Bacteria, at a scale 40 times bigger and 50% better sensitivity than a previous study. The results also indicated covariation signals for a few sets of cross-strand base stacking pairs in secondary structure helices, and other subtle constraints in the RNA structure.
Xu, Weijia; Ozer, Stuart; Gutell, Robin R.
2010-01-01
With an increasingly large amount of sequences properly aligned, comparative sequence analysis can accurately identify not only common structures formed by standard base pairing but also new types of structural elements and constraints. However, traditional methods are too computationally expensive to perform well on large scale alignment and less effective with the sequences from diversified phylogenetic classifications. We propose a new approach that utilizes coevolutional rates among pairs of nucleotide positions using phylogenetic and evolutionary relationships of the organisms of aligned sequences. With a novel data schema to manage relevant information within a relational database, our method, implemented with a Microsoft SQL Server 2005, showed 90% sensitivity in identifying base pair interactions among 16S ribosomal RNA sequences from Bacteria, at a scale 40 times bigger and 50% better sensitivity than a previous study. The results also indicated covariation signals for a few sets of cross-strand base stacking pairs in secondary structure helices, and other subtle constraints in the RNA structure. PMID:20502534
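As a rough illustration of how covariation between two alignment columns can be quantified, the sketch below computes plain mutual information. This is a simpler, swapped-in measure, not the phylogeny-aware coevolutional-rate method or the relational database schema described in the abstract. The function name and the toy alignment are assumptions made for the example.

```python
from collections import Counter
from math import log2

def mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    `alignment` is a list of equal-length sequences. No phylogenetic
    correction or gap handling is applied; this is only a baseline measure.
    """
    n = len(alignment)
    col_i = Counter(seq[i] for seq in alignment)
    col_j = Counter(seq[j] for seq in alignment)
    pairs = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return mi

# Toy alignment (hypothetical): columns 0 and 1 covary like a G-C / A-U base pair,
# while column 2 varies independently of column 0.
toy = ["GCA", "AUA", "GCU", "AUU"]
print(mutual_information(toy, 0, 1))
print(mutual_information(toy, 0, 2))
```

High mutual information between two columns is consistent with a base pair, but closely related sequences can inflate it, which is the kind of bias a phylogeny-aware measure is meant to reduce.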
Structure-Templated Predictions of Novel Protein Interactions from Sequence Information
Betel, Doron; Breitkreuz, Kevin E; Isserlin, Ruth; Dewar-Darch, Danielle; Tyers, Mike; Hogue, Christopher W. V
2007-01-01
The multitude of functions performed in the cell are largely controlled by a set of carefully orchestrated protein interactions often facilitated by specific binding of conserved domains in the interacting proteins. Interacting domains commonly exhibit distinct binding specificity to short and conserved recognition peptides called binding profiles. Although many conserved domains are known in nature, only a few have well-characterized binding profiles. Here, we describe a novel predictive method known as domain–motif interactions from structural topology (D-MIST) for elucidating the binding profiles of interacting domains. A set of domains and their corresponding binding profiles were derived from extant protein structures and protein interaction data and then used to predict novel protein interactions in yeast. A number of the predicted interactions were verified experimentally, including new interactions of the mitotic exit network, RNA polymerases, nucleotide metabolism enzymes, and the chaperone complex. These results demonstrate that new protein interactions can be predicted exclusively from sequence information. PMID:17892321
Power law tails in phylogenetic systems.
Qin, Chongli; Colwell, Lucy J
2018-01-23
Covariance analysis of protein sequence alignments uses coevolving pairs of sequence positions to predict features of protein structure and function. However, current methods ignore the phylogenetic relationships between sequences, potentially corrupting the identification of covarying positions. Here, we use random matrix theory to demonstrate the existence of a power law tail that distinguishes the spectrum of covariance caused by phylogeny from that caused by structural interactions. The power law is essentially independent of the phylogenetic tree topology, depending on just two parameters-the sequence length and the average branch length. We demonstrate that these power law tails are ubiquitous in the large protein sequence alignments used to predict contacts in 3D structure, as predicted by our theory. This suggests that to decouple phylogenetic effects from the interactions between sequence distal sites that control biological function, it is necessary to remove or down-weight the eigenvectors of the covariance matrix with largest eigenvalues. We confirm that truncating these eigenvectors improves contact prediction.
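The final step described here, removing the eigenvectors of the covariance matrix with the largest eigenvalues, can be sketched in a few lines. The version below is a minimal pure-Python illustration using power iteration and deflation. It is an assumption about one way to do this, not the authors' implementation, and a production analysis would normally use a full eigendecomposition from a numerical library. All function names are illustrative.

```python
from math import sqrt

def matvec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]

def top_eigenpair(m, iters=500):
    """Power iteration for the dominant eigenpair of a symmetric PSD matrix.

    Assumes the uniform start vector is not orthogonal to the top eigenvector.
    """
    n = len(m)
    v = [1.0 / sqrt(n)] * n
    for _ in range(iters):
        w = matvec(m, v)
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    lam = sum(a * b for a, b in zip(v, matvec(m, v)))  # Rayleigh quotient
    return lam, v

def remove_top_modes(m, k=1):
    """Subtract the k largest eigen-components: C <- C - lambda * v v^T."""
    out = [row[:] for row in m]
    n = len(out)
    for _ in range(k):
        lam, v = top_eigenpair(out)
        out = [[out[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return out

# Hypothetical 2x2 covariance matrix with one dominant mode.
cov = [[2.0, 0.0], [0.0, 1.0]]
cleaned = remove_top_modes(cov, k=1)
print(cleaned)
```

How many modes to remove, or how strongly to down-weight them, is a choice the abstract leaves open. Here k is simply a parameter.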
Identifying novel sequence variants of RNA 3D motifs
Zirbel, Craig L.; Roll, James; Sweeney, Blake A.; Petrov, Anton I.; Pirrung, Meg; Leontis, Neocles B.
2015-01-01
Predicting RNA 3D structure from sequence is a major challenge in biophysics. An important sub-goal is accurately identifying recurrent 3D motifs from RNA internal and hairpin loop sequences extracted from secondary structure (2D) diagrams. We have developed and validated new probabilistic models for 3D motif sequences based on hybrid Stochastic Context-Free Grammars and Markov Random Fields (SCFG/MRF). The SCFG/MRF models are constructed using atomic-resolution RNA 3D structures. To parameterize each model, we use all instances of each motif found in the RNA 3D Motif Atlas and annotations of pairwise nucleotide interactions generated by the FR3D software. Isostericity relations between non-Watson–Crick basepairs are used in scoring sequence variants. SCFG techniques model nested pairs and insertions, while MRF ideas handle crossing interactions and base triples. We use test sets of randomly-generated sequences to set acceptance and rejection thresholds for each motif group and thus control the false positive rate. Validation was carried out by comparing results for four motif groups to RMDetect. The software developed for sequence scoring (JAR3D) is structured to automatically incorporate new motifs as they accumulate in the RNA 3D Motif Atlas when new structures are solved and is available free for download. PMID:26130723
Structure stability of lytic peptides during their interactions with lipid bilayers.
Chen, H M; Lee, C H
2001-10-01
In this work, molecular dynamics simulations were used to examine the consequences of a variety of analogs of cecropin A on lipid bilayers. Analog sequences were constructed by replacing either the N- or C-terminal helix with the other helix in native or reverse sequence order, by making palindromic peptides based on both the N- and C-terminal helices, and by deleting the hinge region. The structure of the peptides was monitored throughout the simulation. The hinge region appeared not to assist in maintaining helical structure but to aid flexibility of motion. In general, the N-terminal helix of peptides was less stable than the C-terminal one during the interaction with anionic lipid bilayers. Sequences with hydrophobic helices tended to regain helical structure after an initial loss while sequences with amphipathic helices were less able to do this. The results suggest that hydrophobic design peptides have a high structural stability in an anionic membrane and are candidates for experimental investigation.
Wustman, Brandon A; Morse, Daniel E; Evans, John Spencer
2004-08-05
The AP7 and AP24 proteins represent a class of mineral-interaction polypeptides that are found in the aragonite-containing nacre layer of mollusk shell (H. rufescens). These proteins have been shown to preferentially interfere with calcium carbonate mineral growth in vitro. It is believed that both proteins play an important role in aragonite polymorph selection in the mollusk shell. Previously, we demonstrated that the 1-30 amino acid (AA) N-terminal sequences of AP7 and AP24 represent mineral interaction/modification domains in both proteins, as evidenced by their ability to frustrate calcium carbonate crystal growth at step edge regions. In the present report, using free N-terminal, C(alpha)-amide "capped" synthetic polypeptides representing the 1-30 AA regions of AP7 (AP7-1 polypeptide) and AP24 (AP24-1 polypeptide) and NMR spectroscopy, we confirm that both N-terminal sequences possess putative Ca (II) interaction polyanionic sequence regions (2 x -DD- in AP7-1, -DDDED- in AP24-1) that are random coil-like in structure. However, with regard to the remaining sequence regions, each polypeptide features unique structural differences. AP7-1 possesses an extended beta-strand or polyproline type II-like structure within the A11-M10, S12-V13, and S28-I27 sequence regions, with the remaining sequence regions adopting a random-coil-like structure, a trait common to other polyelectrolyte mineral-associated polypeptide sequences. Conversely, AP24-1 possesses random coil-like structure within A1-S9 and Q14-N16 sequence regions, and evidence for turn-like, bend, or loop conformation within the G10-N13, Q17-N24, and M29-F30 sequence regions, similar to the structures identified within the putative elastomeric proteins Lustrin A and sea urchin spicule matrix proteins. The similarities and differences in AP7 and AP24 N-terminal domain structure are discussed with regard to joint AP7-AP24 protein modification of calcium carbonate growth. Copyright 2004 Wiley Periodicals, Inc.
Tome, Jacob M; Ozer, Abdullah; Pagano, John M; Gheba, Dan; Schroth, Gary P; Lis, John T
2014-06-01
RNA-protein interactions play critical roles in gene regulation, but methods to quantitatively analyze these interactions at a large scale are lacking. We have developed a high-throughput sequencing-RNA affinity profiling (HiTS-RAP) assay by adapting a high-throughput DNA sequencer to quantify the binding of fluorescently labeled protein to millions of RNAs anchored to sequenced cDNA templates. Using HiTS-RAP, we measured the affinity of mutagenized libraries of GFP-binding and NELF-E-binding aptamers to their respective targets and identified critical regions of interaction. Mutations additively affected the affinity of the NELF-E-binding aptamer, whose interaction depended mainly on a single-stranded RNA motif, but not that of the GFP aptamer, whose interaction depended primarily on secondary structure.
Protein structure recognition: From eigenvector analysis to structural threading method
NASA Astrophysics Data System (ADS)
Cao, Haibo
In this work, we try to understand the protein folding problem using pair-wise hydrophobic interaction as the dominant interaction for the protein folding process. We found a strong correlation between amino acid sequence and the corresponding native structure of the protein. Applications of this correlation discussed in this dissertation include domain partition and a new structural threading method, as well as the performance of this method in the CASP5 competition. In the first part, we give a brief introduction to the protein folding problem. Some essential knowledge and progress from other research groups are discussed. This part includes discussions of interactions among amino acid residues, the lattice HP model, and the designability principle. In the second part, we try to establish the correlation between amino acid sequence and the corresponding native structure of the protein. This correlation was observed in our eigenvector study of the protein contact matrix. We believe the correlation is universal, thus it can be used in automatic partition of protein structures into folding domains. In the third part, we discuss a threading method based on the correlation between amino acid sequence and the dominant eigenvector of the structure contact matrix. A mathematically straightforward iteration scheme provides a self-consistent optimum global sequence-structure alignment. The computational efficiency of this method makes it possible to search whole protein structure databases for structural homology without relying on sequence similarity. The sensitivity and specificity of this method are discussed, along with a case of blind test prediction. In the appendix, we list the overall performance of this threading method in the CASP5 blind test in comparison with other existing approaches.
RNA 3D Structural Motifs: Definition, Identification, Annotation, and Database Searching
NASA Astrophysics Data System (ADS)
Nasalean, Lorena; Stombaugh, Jesse; Zirbel, Craig L.; Leontis, Neocles B.
Structured RNA molecules resemble proteins in the hierarchical organization of their global structures, folding and broad range of functions. Structured RNAs are composed of recurrent modular motifs that play specific functional roles. Some motifs direct the folding of the RNA or stabilize the folded structure through tertiary interactions. Others bind ligands or proteins or catalyze chemical reactions. Therefore, it is desirable, starting from the RNA sequence, to be able to predict the locations of recurrent motifs in RNA molecules. Conversely, the potential occurrence of one or more known 3D RNA motifs may indicate that a genomic sequence codes for a structured RNA molecule. To identify known RNA structural motifs in new RNA sequences, precise structure-based definitions are needed that specify the core nucleotides of each motif and their conserved interactions. By comparing instances of each recurrent motif and applying base pair isostericity relations, one can identify neutral mutations that preserve its structure and function in the contexts in which it occurs.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Suhkmann; Zhang, Ziming; Upchurch, Sean
2004-04-16
ARID is a homologous family of DNA-binding domains that occur in DNA binding proteins from a wide variety of species, ranging from yeast to nematodes, insects, mammals and plants. SWI1, a member of the SWI/SNF protein complex that is involved in chromatin remodeling during transcription, contains the ARID motif. The ARID domain of human SWI1 (also known as p270) does not select for a specific DNA sequence from a random sequence pool. The lack of sequence specificity shown by the SWI1 ARID domain stands in contrast to the other characterized ARID domains, which recognize specific AT-rich sequences. We have solved the three-dimensional structure of human SWI1 ARID using solution NMR methods. In addition, we have characterized non-specific DNA-binding by the SWI1 ARID domain. Results from this study indicate that a flexible long internal loop in the ARID motif is likely to be important for sequence specific DNA-recognition. The structure of the human SWI1 ARID domain also represents a distinct structural subfamily. Studies of ARID indicate that the boundary of the DNA binding structural and functional domains can extend beyond the sequence homologous region in a homologous family of proteins. Structural studies of homologous domains such as the ARID family of DNA-binding domains should provide information to better predict the boundary of structural and functional domains in structural genomic studies. Key Words: ARID, SWI1, NMR, structural genomics, protein-DNA interaction.
Rodriguez-Rivas, Juan; Marsili, Simone; Juan, David; Valencia, Alfonso
2016-01-01
Protein–protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue coevolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that coevolution points to structurally conserved contacts at protein–protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a domain-centered protocol to study the interplay between residue coevolution and structural conservation of protein–protein interfaces. We show that sequence-based coevolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein–protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high divergence in sequence (the twilight zone), where standard homology modeling of protein complexes is unreliable, our approach provides sequence-based accurate information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic coevolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this approach. PMID:27965389
Rodriguez-Rivas, Juan; Marsili, Simone; Juan, David; Valencia, Alfonso
2016-12-27
Protein-protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue coevolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that coevolution points to structurally conserved contacts at protein-protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a domain-centered protocol to study the interplay between residue coevolution and structural conservation of protein-protein interfaces. We show that sequence-based coevolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein-protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high divergence in sequence (the twilight zone), where standard homology modeling of protein complexes is unreliable, our approach provides sequence-based accurate information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic coevolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this approach.
Structure and stability of the ankyrin domain of the Drosophila Notch receptor.
Zweifel, Mark E; Leahy, Daniel J; Hughson, Frederick M; Barrick, Doug
2003-11-01
The Notch receptor contains a conserved ankyrin repeat domain that is required for Notch-mediated signal transduction. The ankyrin domain of Drosophila Notch contains six ankyrin sequence repeats previously identified as closely matching the ankyrin repeat consensus sequence, and a putative seventh C-terminal sequence repeat that exhibits lower similarity to the consensus sequence. To better understand the role of the Notch ankyrin domain in Notch-mediated signaling and to examine how structure is distributed among the seven ankyrin sequence repeats, we have determined the crystal structure of this domain to 2.0 angstroms resolution. The seventh, C-terminal, ankyrin sequence repeat adopts a regular ankyrin fold, but the first, N-terminal ankyrin repeat, which contains a 15-residue insertion, appears to be largely disordered. The structure reveals a substantial interface between ankyrin polypeptides, showing a high degree of shape and charge complementarity, which may be related to homotypic interactions suggested from indirect studies. However, the Notch ankyrin domain remains largely monomeric in solution, demonstrating that this interface alone is not sufficient to promote tight association. Using the structure, we have classified reported mutations within the Notch ankyrin domain that are known to disrupt signaling into those that affect buried residues and those restricted to surface residues. We show that the buried substitutions greatly decrease protein stability, whereas the surface substitutions have only a marginal effect on stability. The surface substitutions are thus likely to interfere with Notch signaling by disrupting specific Notch-effector interactions and map the sites of these interactions.
Paiardini, Alessandro; Bossa, Francesco; Pascarella, Stefano
2004-01-01
The wealth of biological information provided by structural and genomic projects opens new prospects of understanding life and evolution at the molecular level. In this work, it is shown how computational approaches can be exploited to pinpoint protein structural features that remain invariant over long evolutionary periods in the fold-type I, PLP-dependent enzymes. A nonredundant set of 23 superposed crystallographic structures belonging to this superfamily was built. Members of this family typically display high structural conservation despite low sequence identity. For each structure, a multiple-sequence alignment of orthologous sequences was obtained, and the 23 alignments were merged using the structural information to obtain a comprehensive multiple alignment of 921 sequences of fold-type I enzymes. The structurally conserved regions (SCRs), the evolutionarily conserved residues, and the conserved hydrophobic contacts (CHCs) were extracted from this data set, using both sequence and structural information. The results of this study identified a structural pattern of hydrophobic contacts shared by all of the superfamily members of fold-type I enzymes and involved in native interactions. This profile highlights the presence of a nucleus for this fold, in which residues participating in the most conserved native interactions exhibit preferential evolutionary conservation, that correlates significantly (r = 0.70) with the extent of mean hydrophobic contact value of their apolar fraction. PMID:15498941
Crystal structure of the Msx-1 homeodomain/DNA complex.
Hovde, S; Abate-Shen, C; Geiger, J H
2001-10-09
The Msx-1 homeodomain protein plays a crucial role in craniofacial, limb, and nervous system development. Homeodomain DNA-binding domains are comprised of 60 amino acids that show a high degree of evolutionary conservation. We have determined the structure of the Msx-1 homeodomain complexed to DNA at 2.2 Å resolution. The structure has an unusually well-ordered N-terminal arm with a unique trajectory across the minor groove of the DNA. DNA specificity conferred by bases flanking the core TAAT sequence is explained by well-ordered water-mediated interactions at Q50. Most interactions seen at the TAAT sequence are typical of the interactions seen in other homeodomain structures. Comparison of the Msx-1-HD structure to all other high resolution HD-DNA complex structures indicates a remarkably well-conserved sphere of hydration between the DNA and protein in these complexes.
Nicoludis, John M; Lau, Sze-Yi; Schärfe, Charlotta P I; Marks, Debora S; Weihofen, Wilhelm A; Gaudet, Rachelle
2015-11-03
Clustered protocadherin (Pcdh) proteins mediate dendritic self-avoidance in neurons via specific homophilic interactions in their extracellular cadherin (EC) domains. We determined crystal structures of EC1-EC3, containing the homophilic specificity-determining region, of two mouse clustered Pcdh isoforms (PcdhγA1 and PcdhγC3) to investigate the nature of the homophilic interaction. Within the crystal lattices, we observe antiparallel interfaces consistent with a role in trans cell-cell contact. Antiparallel dimerization is supported by evolutionary correlations. Two interfaces, located primarily on EC2-EC3, involve distinctive clustered Pcdh structure and sequence motifs, lack predicted glycosylation sites, and contain residues highly conserved in orthologs but not paralogs, pointing toward their biological significance as homophilic interaction interfaces. These two interfaces are similar yet distinct, reflecting a possible difference in interaction architecture between clustered Pcdh subfamilies. These structures initiate a molecular understanding of clustered Pcdh assemblies that are required to produce functional neuronal networks. Copyright © 2015 Elsevier Ltd. All rights reserved.
Prediction of Ras-effector interactions using position energy matrices.
Kiel, Christina; Serrano, Luis
2007-09-01
One of the more challenging problems in biology is to determine the cellular protein interaction network. Progress has been made to predict protein-protein interactions based on structural information, assuming that structurally similar proteins interact in a similar way. In a previous publication, we have determined a genome-wide Ras-effector interaction network based on homology models, with a high accuracy of predicting binding and non-binding domains. However, for a prediction on a genome-wide scale, homology modelling is a time-consuming process. Therefore, we here successfully developed a faster method using position energy matrices, where based on different Ras-effector X-ray template structures, all amino acids in the effector binding domain are sequentially mutated to all other amino acid residues and the effect on binding energy is calculated. Those pre-calculated matrices can then be used to score for binding any Ras or effector sequences. Based on position energy matrices, the sequences of putative Ras-binding domains can be scanned quickly to calculate an energy sum value. By calibrating energy sum values using quantitative experimental binding data, thresholds can be defined and thus non-binding domains can be excluded quickly. Sequences which have energy sum values above this threshold are considered to be potential binding domains, and could be further analysed using homology modelling. This prediction method could be applied to other protein families sharing conserved interaction types, in order to rapidly determine large-scale cellular protein interaction networks. Thus, it could have an important impact on future in silico structural genomics approaches, in particular with regard to increasing structural proteomics efforts, aiming to determine all possible domain folds and interaction types. All matrices are deposited in the ADAN database (http://adan-embl.ibmc.umh.es/). Supplementary data are available at Bioinformatics online.
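The scanning step described in this abstract (sum per-position energies for a sequence, then compare the sum with a calibrated threshold) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the matrix values, the threshold, and the function names `energy_sum` and `is_candidate` are invented and are not taken from the ADAN database or from the authors' software. The comparison direction follows the abstract's wording, in which sequences scoring above the threshold are kept as candidates.

```python
from typing import Dict, List

# Hypothetical position energy matrix: one dict per binding-site position,
# mapping an amino acid to its energy contribution (invented values).
PositionEnergyMatrix = List[Dict[str, float]]


def energy_sum(sequence: str, matrix: PositionEnergyMatrix, default: float = 0.0) -> float:
    """Sum the per-position contributions of a sequence aligned to the matrix."""
    if len(sequence) != len(matrix):
        raise ValueError("sequence length must match the number of matrix positions")
    # Residues missing from a column fall back to `default` (an assumption).
    return sum(column.get(residue, default) for residue, column in zip(sequence, matrix))


def is_candidate(sequence: str, matrix: PositionEnergyMatrix, threshold: float) -> bool:
    """Keep a sequence as a potential binder if its energy sum exceeds the threshold."""
    return energy_sum(sequence, matrix) > threshold


# Toy three-position matrix, purely illustrative.
toy_matrix: PositionEnergyMatrix = [
    {"K": 1.0, "R": 0.5, "A": -0.5},
    {"L": 0.75, "A": 0.0},
    {"D": -1.0, "E": -0.5, "A": 0.25},
]

if __name__ == "__main__":
    for seq in ("KLA", "AAD"):
        print(seq, energy_sum(seq, toy_matrix), is_candidate(seq, toy_matrix, threshold=1.0))
```

Because the matrix is pre-computed, scoring costs one lookup per position, which is what would make such a scan much cheaper than building a homology model for every candidate.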
Predicting helix orientation for coiled-coil dimers
Apgar, James R.; Gutwin, Karl N.; Keating, Amy E.
2008-01-01
The alpha-helical coiled coil is a structurally simple protein oligomerization or interaction motif consisting of two or more alpha helices twisted into a supercoiled bundle. Coiled coils can differ in their stoichiometry, helix orientation and axial alignment. Because of the near degeneracy of many of these variants, coiled coils pose a challenge to fold recognition methods for structure prediction. Whereas distinctions between some protein folds can be discriminated on the basis of hydrophobic/polar patterning or secondary structure propensities, the sequence differences that encode important details of coiled-coil structure can be subtle. This is emblematic of a larger problem in the field of protein structure and interaction prediction: that of establishing specificity between closely similar structures. We tested the behavior of different computational models on the problem of recognizing the correct orientation - parallel vs. antiparallel - of pairs of alpha helices that can form a dimeric coiled coil. For each of 131 examples of known structure, we constructed a large number of both parallel and antiparallel structural models and used these to assess the ability of five energy functions to recognize the correct fold. We also developed and tested three sequence-based approaches that make use of varying degrees of implicit structural information. The best structural methods performed similarly to the best sequence methods, correctly categorizing ∼81% of dimers. Steric compatibility with the fold was important for some coiled coils we investigated. For many examples, the correct orientation was determined by smaller energy differences between parallel and antiparallel structures distributed over many residues and energy components. Prediction methods that used structure but incorporated varying approximations and assumptions showed quite different behaviors when used to investigate energetic contributions to orientation preference. Sequence-based methods were sensitive to the choice of residue-pair interactions scored. PMID:18506779
Functions of the 3′ and 5′ genome RNA regions of members of the genus Flavivirus
Brinton, Margo A.; Basu, Mausumi
2015-01-01
The positive sense genomes of members of the genus Flavivirus in the family Flaviviridae are ~11 kb nts in length and have a 5′ type I cap but no 3′ poly A. The 5′ and 3′ terminal regions contain short conserved sequences that are proposed to be repeated remnants of an ancient sequence. However, the functions of most of these conserved sequences have not yet been determined. The terminal regions of the genome also contain multiple conserved RNA structures. Functional data for many of these structures has been obtained. Three sets of complementary 3′ and 5′ terminal region sequences, some of which are located in conserved RNA structures, interact to form a panhandle structure that is required for initiation of minus strand RNA synthesis with the 5′ terminal structure functioning as the promoter. How the switch from the terminal RNA structure base pairing to the long distance RNA-RNA interaction is triggered and regulated is not well understood but evidence suggests involvement of a cell protein binding to three sites on the 3′ terminal RNA structures and a cis-acting metastable 3′ RNA element in the 3′ terminal structure. Cell proteins may also be involved in facilitating exponential replication of nascent genomic RNA within replication vesicles at later times of infection cycle. Other conserved RNA structures and/or sequences in the 5′ and 3′ terminal regions have been proposed to regulate genome translation. Additional functions of the 5′ and 3′ terminal sequences have also been reported. PMID:25683510
Nonparametric Combinatorial Sequence Models
NASA Astrophysics Data System (ADS)
Wauthier, Fabian L.; Jordan, Michael I.; Jojic, Nebojsa
This work considers biological sequences that exhibit combinatorial structures in their composition: groups of positions of the aligned sequences are "linked" and covary as one unit across sequences. If multiple such groups exist, complex interactions can emerge between them. Sequences of this kind arise frequently in biology but methodologies for analyzing them are still being developed. This paper presents a nonparametric prior on sequences which allows combinatorial structures to emerge and which induces a posterior distribution over factorized sequence representations. We carry out experiments on three sequence datasets which indicate that combinatorial structures are indeed present and that combinatorial sequence models can more succinctly describe them than simpler mixture models. We conclude with an application to MHC binding prediction which highlights the utility of the posterior distribution induced by the prior. By integrating out the posterior our method compares favorably to leading binding predictors.
Recognition of Local DNA Structures by p53 Protein
Brázda, Václav; Coufal, Jan
2017-01-01
p53 plays critical roles in regulating cell cycle, apoptosis, senescence and metabolism and is commonly mutated in human cancer. These roles are achieved by interaction with other proteins, but particularly by interaction with DNA. As a transcription factor, p53 is well known to bind consensus target sequences in linear B-DNA. Recent findings indicate that p53 binds with higher affinity to target sequences that form cruciform DNA structure. Moreover, p53 binds very tightly to non-B DNA structures and local DNA structures are increasingly recognized to influence the activity of wild-type and mutant p53. Apart from cruciform structures, p53 binds to quadruplex DNA, triplex DNA, DNA loops, bulged DNA and hemicatenane DNA. In this review, we describe local DNA structures and summarize information about interactions of p53 with these structural DNA motifs. These recent data provide important insights into the complexity of the p53 pathway and the functional consequences of wild-type and mutant p53 activation in normal and tumor cells. PMID:28208646
Asymmetric scoring functions for proteins
NASA Astrophysics Data System (ADS)
Lezon, Timothy; Holter, Neal; Maritan, Amos; Banavar, Jayanth
2003-03-01
The protein folding problem entails the prediction of the native state structure of a protein given the sequence of amino acids. In a coarse-grained description of a protein, an important ingredient for attempting this task is the determination of the effective energies of interaction between amino acids. We will discuss a simple approach for determining such interaction potentials from a training set of protein sequences and their experimentally determined native state structures. The key new ingredient in our study is the incorporation of the lack of symmetry in the effective interactions between amino acids. Our results, obtained using a set of 513 proteins, and their implications will be discussed.
NASA Astrophysics Data System (ADS)
Moreland, Blythe; Oman, Kenji; Curfman, John; Yan, Pearlly; Bundschuh, Ralf
Methyl-binding domain (MBD) protein pulldown experiments have been a valuable tool in measuring the levels of methylated CpG dinucleotides. Due to the frequent use of this technique, high-throughput sequencing data sets are available that allow a detailed quantitative characterization of the underlying interaction between methylated DNA and MBD proteins. Analyzing such data sets, we first found that two such proteins cannot bind closer to each other than 2 bp, consistent with structural models of the DNA-protein interaction. Second, the large amount of sequencing data allowed us to find rather weak but nevertheless clearly statistically significant sequence preferences for several bases around the required CpG. These results demonstrate that pulldown sequencing is a high-precision tool in characterizing DNA-protein interactions. This material is based upon work supported by the National Science Foundation under Grant No. DMR-1410172.
Accetto, Tomaž; Avguštin, Gorazd
2011-01-01
The Shine-Dalgarno (SD) sequence is a key element directing the translation to initiate at the authentic start codons and also enabling translation initiation to proceed in 5′ untranslated mRNA regions (5′-UTRs) containing moderately strong secondary structures. Bioinformatic analysis of almost forty genomes from the major bacterial phylum Bacteroidetes revealed, however, a general absence of the SD sequence, a drop in GC content and consequently a reduced tendency to form secondary structures in 5′-UTRs. The experiments using the Prevotella bryantii TC1-1 expression system were in agreement with these findings: neither addition nor omission of the SD sequence in the unstructured 5′-UTR affected the level of the reporter protein, non-specific nuclease NucB. Further, the NucB level in P. bryantii TC1-1, contrary to the hMGFP level in Escherichia coli, was five times lower when the SD sequence formed part of a secondary structure with a folding energy of -5.2 kcal/mol. Also, the extended SD sequences did not affect protein levels as in E. coli. It seems therefore that a functional SD interaction does not take place during translation initiation in P. bryantii TC1-1 and possibly other members of the phylum Bacteroidetes, although the anti-SD sequence is present in the 16S rRNA genes of their genomes. We thus propose that in the absence of the SD sequence interaction, the selection of genuine start codons in Bacteroidetes is accomplished by binding of ribosomal protein S1 to the unstructured 5′-UTR as opposed to the coding region, which is inaccessible due to mRNA secondary structure. Additionally, we found that sequence logos of the region preceding the start codons may be used as taxonomical markers. Depending on whether the complete sequence logo or only part of it, such as information content and base proportion at specific positions, is used, bacterial genera or families and in some cases even bacterial phyla can be distinguished. PMID:21857964
NASA Astrophysics Data System (ADS)
Tene, Yair; Tene, Noam; Tene, G.
1993-08-01
An interactive data fusion methodology of video, audio, and nonlinear structural dynamic analysis for potential application in forensic engineering is presented. The methodology was developed and successfully demonstrated in the analysis of the collapse of a heavy transportable bridge during preparation for testing. Multiple bridge element failures were identified after the collapse, including fracture, cracks and rupture of high performance structural materials. A videotape recording by a hand-held camcorder was the only source of information about the collapse sequence. The interactive data fusion methodology resulted in extracting relevant information from the videotape and from dynamic nonlinear structural analysis, leading to a full account of the sequence of events during the bridge collapse.
NASA Astrophysics Data System (ADS)
Meyer, Sam; Everaers, Ralf
2015-02-01
The histone-DNA interaction in the nucleosome is a fundamental mechanism of genomic compaction and regulation, which remains largely unknown despite increasing structural knowledge of the complex. In this paper, we propose a framework for the extraction of a nanoscale histone-DNA force-field from a collection of high-resolution structures, which may be adapted to a larger class of protein-DNA complexes. We applied the procedure to a large crystallographic database extended by snapshots from molecular dynamics simulations. The comparison of the structural models first shows that, at histone-DNA contact sites, the DNA base-pairs are shifted outwards locally, consistent with locally repulsive forces exerted by the histones. The second step shows that the various force profiles of the structures under analysis derive locally from a unique, sequence-independent, quadratic repulsive force-field, while the sequence preferences are entirely due to internal DNA mechanics. We have thus obtained the first knowledge-derived nanoscale interaction potential for histone-DNA in the nucleosome. The conformations obtained by relaxation of nucleosomal DNA with high-affinity sequences in this potential accurately reproduce the experimental values of binding preferences. Finally we address the more generic binding mechanisms relevant to the 80% genomic sequences incorporated in nucleosomes, by computing the conformation of nucleosomal DNA with sequence-averaged properties. This conformation differs from those found in crystals, and the analysis suggests that repulsive histone forces are related to local stretch tension in nucleosomal DNA, mostly between adjacent contact points. This tension could play a role in the stability of the complex.
In vitro fluorescence studies of transcription factor IIB-DNA interaction.
Górecki, Andrzej; Figiel, Małgorzata; Dziedzicka-Wasylewska, Marta
2015-01-01
General transcription factor TFIIB is one of the basal constituents of the preinitiation complex of eukaryotic RNA polymerase II, acting as a bridge between the preinitiation complex and the polymerase, and binding promoter DNA in an asymmetric manner, thereby defining the direction of the transcription. Methods of fluorescence spectroscopy together with circular dichroism spectroscopy were used to observe conformational changes in the structure of recombinant human TFIIB after binding to specific DNA sequence. To facilitate the exploration of the structural changes, several site-directed mutations have been introduced altering the fluorescence properties of the protein. Our observations showed that binding of specific DNA sequences changed the protein structure and dynamics, and TFIIB may exist in two conformational states, which can be described by a different microenvironment of W52. Fluorescence studies using both intrinsic and exogenous fluorophores showed that these changes significantly depended on the recognition sequence and concerned various regions of the protein, including those interacting with other transcription factors and RNA polymerase II. DNA binding can cause rearrangements in regions of proteins interacting with the polymerase in a manner dependent on the recognized sequences, and therefore, influence the gene expression.
Complementary molecular information changes our perception of food web structure
Wirta, Helena K.; Hebert, Paul D. N.; Kaartinen, Riikka; Prosser, Sean W.; Várkonyi, Gergely; Roslin, Tomas
2014-01-01
How networks of ecological interactions are structured has a major impact on their functioning. However, accurately resolving both the nodes of the webs and the links between them is fraught with difficulties. We ask whether the new resolution conferred by molecular information changes perceptions of network structure. To probe a network of antagonistic interactions in the High Arctic, we use two complementary sources of molecular data: parasitoid DNA sequenced from the tissues of their hosts and host DNA sequenced from the gut of adult parasitoids. The information added by molecular analysis radically changes the properties of interaction structure. Overall, three times as many interaction types were revealed by combining molecular information from parasitoids and hosts with rearing data, versus rearing data alone. At the species level, our results alter the perceived host specificity of parasitoids, the parasitoid load of host species, and the web-wide role of predators with a cryptic lifestyle. As the northernmost network of host–parasitoid interactions quantified, our data point exerts high leverage on global comparisons of food web structure. However, how we view its structure will depend on what information we use: compared with variation among networks quantified at other sites, the properties of our web vary as much or much more depending on the techniques used to reconstruct it. We thus urge ecologists to combine multiple pieces of evidence in assessing the structure of interaction webs, and suggest that current perceptions of interaction structure may be strongly affected by the methods used to construct them. PMID:24449902
MolIDE: a homology modeling framework you can click with.
Canutescu, Adrian A; Dunbrack, Roland L
2005-06-15
Molecular Integrated Development Environment (MolIDE) is an integrated application designed to provide homology modeling tools and protocols under a uniform, user-friendly graphical interface. Its main purpose is to combine the most frequent modeling steps in a semi-automatic, interactive way, guiding the user from the target protein sequence to the final three-dimensional protein structure. The typical basic homology modeling process is composed of building sequence profiles of the target sequence family, secondary structure prediction, sequence alignment with PDB structures, assisted alignment editing, side-chain prediction and loop building. All of these steps are available through a graphical user interface. MolIDE's user-friendly and streamlined interactive modeling protocol allows the user to focus on the important modeling questions, hiding from the user the raw data generation and conversion steps. MolIDE was designed from the ground up as an open-source, cross-platform, extensible framework. This allows developers to integrate additional third-party programs to MolIDE. http://dunbrack.fccc.edu/molide/molide.php rl_dunbrack@fccc.edu.
Direct Calculation of Protein Fitness Landscapes through Computational Protein Design
Au, Loretta; Green, David F.
2016-01-01
Naturally selected amino-acid sequences or experimentally derived ones are often the basis for understanding how protein three-dimensional conformation and function are determined by primary structure. Such sequences for a protein family comprise only a small fraction of all possible variants, however, representing the fitness landscape with limited scope. Explicitly sampling and characterizing alternative, unexplored protein sequences would directly identify fundamental reasons for sequence robustness (or variability), and we demonstrate that computational methods offer an efficient mechanism toward this end, on a large scale. The dead-end elimination and A∗ search algorithms were used here to find all low-energy single mutant variants, and corresponding structures of a G-protein heterotrimer, to measure changes in structural stability and binding interactions to define a protein fitness landscape. We established consistency between these algorithms with known biophysical and evolutionary trends for amino-acid substitutions, and could thus recapitulate known protein side-chain interactions and predict novel ones. PMID:26745411
Mahoney, J. Matthew; Titiz, Ali S.; Hernan, Amanda E.; Scott, Rod C.
2016-01-01
Hippocampal neural systems consolidate multiple complex behaviors into memory. However, the temporal structure of neural firing supporting complex memory consolidation is unknown. Replay of hippocampal place cells during sleep supports the view that a simple repetitive behavior modifies sleep firing dynamics, but does not explain how multiple episodes could be integrated into associative networks for recollection during future cognition. Here we decode sequential firing structure within spike avalanches of all pyramidal cells recorded in sleeping rats after running in a circular track. We find that short sequences that combine into multiple long sequences capture the majority of the sequential structure during sleep, including replay of hippocampal place cells. The ensemble, however, is not optimized for maximally producing the behavior-enriched episode. Thus behavioral programming of sequential correlations occurs at the level of short-range interactions, not whole behavioral sequences and these short sequences are assembled into a large and complex milieu that could support complex memory consolidation. PMID:26866597
Interactive computer programs for the graphic analysis of nucleotide sequence data.
Luckow, V A; Littlewood, R K; Rownd, R H
1984-01-01
A group of interactive computer programs has been developed which aids in the collection and graphical analysis of nucleotide and protein sequence data. The programs perform the following basic functions: a) enter, edit, list, and rearrange sequence data; b) permit automatic entry of nucleotide sequence data directly from an autoradiograph into the computer; c) search for restriction sites or other specified patterns and plot a linear or circular restriction map, or print their locations; d) plot base composition; e) analyze homology between sequences by plotting a two-dimensional graphic matrix; and f) aid in plotting predicted secondary structures of RNA molecules. PMID:6546437
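Two of the functions listed in this abstract, searching for restriction sites and computing base composition, are simple enough to sketch. The code below is not the original 1984 software; the function names `find_sites` and `base_composition` and the example sequence are hypothetical. The only external fact relied on is that GAATTC is the EcoRI recognition sequence.

```python
from collections import Counter
from typing import Dict, List


def find_sites(sequence: str, pattern: str) -> List[int]:
    """Return the 1-based start positions of every match, including overlapping ones."""
    sequence, pattern = sequence.upper(), pattern.upper()
    positions = []
    start = sequence.find(pattern)
    while start != -1:
        positions.append(start + 1)  # report 1-based coordinates
        start = sequence.find(pattern, start + 1)
    return positions


def base_composition(sequence: str) -> Dict[str, float]:
    """Return the fraction of A, C, G and T among all characters of the sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty sequence")
    return {base: counts[base] / total for base in "ACGT"}


if __name__ == "__main__":
    example = "AAGAATTCGGGAATTCTT"  # invented sequence with two EcoRI sites
    print(find_sites(example, "GAATTC"))
    print(base_composition(example))
```

The sketch treats the sequence as linear; a circular map would additionally need to search across the junction between the last and first bases.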
NASA Astrophysics Data System (ADS)
Weigt, Martin
Over the last years, biological research has been revolutionized by experimental high-throughput techniques, in particular by next-generation sequencing technology. Unprecedented amounts of data are accumulating, and there is a growing demand for computational methods unveiling the information hidden in raw data, thereby increasing our understanding of complex biological systems. Statistical-physics models based on the maximum-entropy principle have, in the last few years, played an important role in this context. To give a specific example, proteins and many non-coding RNA show a remarkable degree of structural and functional conservation in the course of evolution, despite a large variability in amino acid sequences. We have developed a statistical-mechanics inspired inference approach - called Direct-Coupling Analysis - to link this sequence variability (easy to observe in sequence alignments, which are available in public sequence databases) to bio-molecular structure and function. In my presentation I will show how this methodology can be used (i) to infer contacts between residues and thus to guide tertiary and quaternary protein structure prediction and RNA structure prediction, (ii) to discriminate interacting from non-interacting protein families, and thus to infer conserved protein-protein interaction networks, and (iii) to reconstruct mutational landscapes and thus to predict the phenotypic effect of mutations. References [1] M. Figliuzzi, H. Jacquier, A. Schug, O. Tenaillon and M. Weigt, "Coevolutionary landscape inference and the context-dependence of mutations in beta-lactamase TEM-1", Mol. Biol. Evol. (2015), doi: 10.1093/molbev/msv211 [2] E. De Leonardis, B. Lutz, S. Ratz, S. Cocco, R. Monasson, A. Schug, M. Weigt, "Direct-Coupling Analysis of nucleotide coevolution facilitates RNA secondary and tertiary structure prediction", Nucleic Acids Research (2015), doi: 10.1093/nar/gkv932 [3] F. Morcos, A. Pagnani, B. Lunt, A. Bertolino, D. Marks, C. Sander, R. Zecchina, J.N. Onuchic, T. Hwa, M. Weigt, "Direct-coupling analysis of residue co-evolution captures native contacts across many protein families", Proc. Natl. Acad. Sci. 108, E1293-E1301 (2011).
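To make the idea of linking sequence variability to contacts concrete, the sketch below computes mutual information between two alignment columns. This is deliberately not Direct-Coupling Analysis: mutual information is a simpler, purely pairwise covariation score, whereas DCA fits a global maximum-entropy model in order to separate direct from indirect correlations. The function name and the four-sequence toy alignment are invented for illustration.

```python
import math
from collections import Counter
from typing import List


def mutual_information(alignment: List[str], i: int, j: int) -> float:
    """Mutual information (in bits) between columns i and j of an alignment."""
    n = len(alignment)
    freq_i = Counter(seq[i] for seq in alignment)
    freq_j = Counter(seq[j] for seq in alignment)
    freq_ij = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        p_a, p_b = freq_i[a] / n, freq_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Toy RNA alignment (invented): columns 0 and 2 covary as complementary
# pairs (G-C, C-G, A-U, U-A); column 1 is fully conserved.
toy_alignment = ["GAC", "CAG", "AAU", "UAA"]

if __name__ == "__main__":
    print(mutual_information(toy_alignment, 0, 2))  # covarying pair of columns
    print(mutual_information(toy_alignment, 0, 1))  # conserved column carries no signal
```

A real analysis would also need sequence reweighting, pseudocounts and a correction for phylogenetic bias, none of which are attempted here.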
Interactive computer graphics system for structural sizing and analysis of aircraft structures
NASA Technical Reports Server (NTRS)
Bendavid, D.; Pipano, A.; Raibstein, A.; Somekh, E.
1975-01-01
A computerized system for preliminary sizing and analysis of aircraft wing and fuselage structures was described. The system is based upon repeated application of analytical program modules, which are interactively interfaced and sequence-controlled during the iterative design process with the aid of design-oriented graphics software modules. The entire process is initiated and controlled via low-cost interactive graphics terminals driven by a remote computer in a time-sharing mode.
Zhang, Fan; Zhang, Bing; Xiang, Hua; Hu, Songnian
2009-11-01
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) is a widespread system that provides acquired resistance against phages in bacteria and archaea. Here we aim to perform a genome-wide analysis of CRISPR in extremely halophilic archaea whose whole genome sequences are currently available. We used bioinformatics methods including alignment, conservation analysis, GC content and RNA structure prediction to analyze the CRISPR structures of 7 haloarchaeal genomes. We identified the CRISPR structures in 5 halophilic archaea and revealed a conserved palindromic motif in the flanking regions of these CRISPR structures. In addition, we found that the repeat sequences of large CRISPR structures in halophilic archaea were greatly conserved, and two types of predicted RNA secondary structures derived from the repeat sequences were likely determined by the fourth base of the repeat sequence. Our results support the proposal that the leader sequence may function as a recognition site by having palindromic structures in flanking regions, and the stem-loop secondary structure formed by repeat sequences may function in mediating the interaction between foreign genetic elements and CAS-encoded proteins.
RNAfbinv: an interactive Java application for fragment-based design of RNA sequences.
Weinbrand, Lina; Avihoo, Assaf; Barash, Danny
2013-11-15
In RNA design problems, it is plausible to assume that the user would be interested in preserving a particular RNA secondary structure motif, or fragment, for biological reasons. The preservation could be in structure or sequence, or both. Thus, the inverse RNA folding problem could benefit from considering fragment constraints. We have developed a new interactive Java application called RNA fragment-based inverse that allows users to insert an RNA secondary structure in dot-bracket notation. It then performs sequence design that conforms to the shape of the input secondary structure, the specified thermodynamic stability, the specified mutational robustness and the user-selected fragment after shape decomposition. In this shape-based design approach, specific RNA structural motifs with known biological functions are strictly enforced, while others can possess more flexibility in their structure in favor of preserving physical attributes and additional constraints. RNAfbinv is freely available for download on the web at http://www.cs.bgu.ac.il/~RNAexinv/RNAfbinv. The site contains a help file with an explanation regarding the exact use.
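Since the application takes its target structure in dot-bracket notation, a short sketch of how that notation maps to base pairs may help. It assumes only the basic alphabet of `(`, `)` and `.` (no pseudoknot brackets), and `parse_dot_bracket` is a hypothetical helper, not part of RNAfbinv.

```python
from typing import List, Tuple


def parse_dot_bracket(structure: str) -> List[Tuple[int, int]]:
    """Return the sorted 0-based (i, j) base pairs encoded by a dot-bracket string."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)  # opening position waits for its partner
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))  # pairs with the most recent '('
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    print(parse_dot_bracket("((...))."))
```

A stack suffices because nested structures pair each closing bracket with the most recent unpaired opening bracket; unpaired positions (dots) are simply skipped.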
JNSViewer—A JavaScript-based Nucleotide Sequence Viewer for DNA/RNA secondary structures
Dong, Min; Graham, Mitchell; Yadav, Nehul
2017-01-01
Many tools are available for visualizing RNA or DNA secondary structures, but few JavaScript implementations provide seamless integration with the increasingly popular web computational platforms. We have developed JNSViewer, a highly interactive web service, which is bundled with several popular tools for DNA/RNA secondary structure prediction and can provide precise and interactive correspondence among nucleotides, dot-bracket data, secondary structure graphs, and genic annotations. In JNSViewer, users can perform RNA secondary structure predictions with different programs and settings, add customized genic annotations in GFF format to structure graphs, search for specific linear motifs, and extract relevant structure graphs of sub-sequences. JNSViewer also allows users to choose a transcript or specific segment of Arabidopsis thaliana genome sequences and predict the corresponding secondary structure. Popular genome browsers (i.e., JBrowse and BrowserGenome) were integrated into JNSViewer to provide powerful visualizations of chromosomal locations, genic annotations, and secondary structures. In addition, we used StructureFold with default settings to predict some RNA structures for Arabidopsis by incorporating in vivo high-throughput RNA structure profiling data and stored the results in our web server, which might be a useful resource for RNA secondary structure studies in plants. JNSViewer is available at http://bioinfolab.miamioh.edu/jnsviewer/index.html. PMID:28582416
Substrate sequence selectivity of APOBEC3A implicates intra-DNA interactions.
Silvas, Tania V; Hou, Shurong; Myint, Wazo; Nalivaika, Ellen; Somasundaran, Mohan; Kelch, Brian A; Matsuo, Hiroshi; Kurt Yilmaz, Nese; Schiffer, Celia A
2018-05-14
The APOBEC3 (A3) family of human cytidine deaminases is renowned for providing a first line of defense against many exogenous and endogenous retroviruses. However, the ability of these proteins to deaminate deoxycytidines in ssDNA makes A3s a double-edged sword. When overexpressed, A3s can mutate endogenous genomic DNA resulting in a variety of cancers. Although the sequence context for mutating DNA varies among A3s, the mechanism for substrate sequence specificity is not well understood. To characterize substrate specificity of A3A, a systematic approach was used to quantify the affinity for substrate as a function of sequence context, length, secondary structure, and solution pH. We identified the A3A ssDNA binding motif as (T/C)TC(A/G), which correlated with enzymatic activity. We also validated that A3A binds RNA in a sequence-specific manner. A3A bound more tightly to the substrate binding motif within a hairpin loop than to a linear oligonucleotide, suggesting A3A affinity is modulated by substrate structure. Based on these findings and previously published A3A-ssDNA co-crystal structures, we propose a new model with intra-DNA interactions for the molecular mechanism underlying A3A sequence preference. Overall, the sequence and structural preferences identified for A3A lead to a new paradigm for identifying A3A's involvement in mutation of endogenous or exogenous DNA.
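The binding motif reported here, (T/C)TC(A/G), translates directly into a regular expression, which gives a quick way to locate candidate sites in a sequence. The sketch below is an illustration only: `find_motif_sites` and the example sequence are hypothetical, and a sequence match says nothing about the structural context (hairpin loop versus linear) that the abstract reports as also affecting affinity.

```python
import re
from typing import List, Tuple

# (T/C)TC(A/G) written as a character-class pattern. The lookahead wrapper
# lets finditer report overlapping occurrences as well.
A3A_MOTIF = re.compile(r"(?=([TC]TC[AG]))")


def find_motif_sites(sequence: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched tetranucleotide) for each motif occurrence."""
    return [(m.start(), m.group(1)) for m in A3A_MOTIF.finditer(sequence.upper())]


if __name__ == "__main__":
    example = "AATTCAGGCTCGTT"  # invented sequence
    print(find_motif_sites(example))
```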
(Pea)nuts and bolts of visual narrative: Structure and meaning in sequential image comprehension
Cohn, Neil; Paczynski, Martin; Jackendoff, Ray; Holcomb, Phillip J.; Kuperberg, Gina R.
2012-01-01
Just as syntax differentiates coherent sentences from scrambled word strings, the comprehension of sequential images must also use a cognitive system to distinguish coherent narrative sequences from random strings of images. We conducted experiments analogous to two classic studies of language processing to examine the contributions of narrative structure and semantic relatedness to processing sequential images. We compared four types of comic strips: 1) Normal sequences with both structure and meaning, 2) Semantic Only sequences (in which the panels were related to a common semantic theme, but had no narrative structure), 3) Structural Only sequences (narrative structure but no semantic relatedness), and 4) Scrambled sequences of randomly-ordered panels. In Experiment 1, participants monitored for target panels in sequences presented panel-by-panel. Reaction times were slowest to panels in Scrambled sequences, intermediate in both Structural Only and Semantic Only sequences, and fastest in Normal sequences. This suggests that both semantic relatedness and narrative structure offer advantages to processing. Experiment 2 measured ERPs to all panels across the whole sequence. The N300/N400 was largest to panels in both the Scrambled and Structural Only sequences, intermediate in Semantic Only sequences and smallest in the Normal sequences. This implies that a combination of narrative structure and semantic relatedness can facilitate semantic processing of upcoming panels (as reflected by the N300/N400). Also, panels in the Scrambled sequences evoked a larger left-lateralized anterior negativity than panels in the Structural Only sequences. This localized effect was distinct from the N300/N400, and appeared despite the fact that these two sequence types were matched on local semantic relatedness between individual panels. These findings suggest that sequential image comprehension uses a narrative structure that may be independent of semantic relatedness. Altogether, we argue that the comprehension of visual narrative is guided by an interaction between structure and meaning. PMID:22387723
Structure-affinity relationships for the binding of actinomycin D to DNA
NASA Astrophysics Data System (ADS)
Gallego, José; Ortiz, Angel R.; de Pascual-Teresa, Beatriz; Gago, Federico
1997-03-01
Molecular models of the complexes between actinomycin D and 14 different DNA hexamers were built based on the X-ray crystal structure of the actinomycin-d(GAAGCTTC)2 complex. The DNA sequences included the canonical GpC binding step flanked by different base pairs, nonclassical binding sites such as GpG and GpT, and sites containing 2,6-diaminopurine. A good correlation was found between the intermolecular interaction energies calculated for the refined complexes and the relative preferences of actinomycin binding to standard and modified DNA. A detailed energy decomposition into van der Waals and electrostatic components for the interactions between the DNA base pairs and either the chromophore or the peptidic part of the antibiotic was performed for each complex. The resulting energy matrix was then subjected to principal component analysis, which showed that actinomycin D discriminates among different DNA sequences by an interplay of hydrogen bonding and stacking interactions. The structure-affinity relationships for this important antitumor drug are thus rationalized and may be used to advantage in the design of novel sequence-specific DNA-binding agents.
Kwasigroch, Jean Marc; Rooman, Marianne
2006-07-15
Prelude&Fugue are bioinformatics tools aiming at predicting the local 3D structure of a protein from its amino acid sequence in terms of seven backbone torsion angle domains, using database-derived potentials. Prelude(&Fugue) computes all lowest free energy conformations of a protein or protein region, ranked by increasing energy, and possibly satisfying some interresidue distance constraints specified by the user. (Prelude&)Fugue detects sequence regions whose predicted structure is significantly preferred relative to other conformations in the absence of tertiary interactions. These programs can be used for predicting secondary structure, tertiary structure of short peptides, flickering early folding sequences and peptides that adopt a preferred conformation in solution. They can also be used for detecting structural weaknesses, i.e. sequence regions that are not optimal with respect to the tertiary fold. http://babylone.ulb.ac.be/Prelude_and_Fugue.
Wang, Pengfei; Wang, Yingfang; Duan, Guangcai; Xue, Zerun; Wang, Linlin; Guo, Xiangjiao; Yang, Haiyan; Xi, Yuanlin
2015-04-01
This study aimed to explore the features of clustered regularly interspaced short palindromic repeat (CRISPR) structures in Shigella using bioinformatics. We used bioinformatics methods, including BLAST, alignment and RNA structure prediction, to analyze the CRISPR structures of Shigella genomes. The results showed that CRISPRs existed in the four groups of Shigella, and that the upstream flanking sequences of the CRISPRs could be classified into the same groups as the downstream flanking sequences. We also found some relatively conserved palindromic motifs in the leader sequences. Repeat sequences fell into the same groups as their corresponding flanking sequences, and could be classified into two types by their RNA secondary structures, which contain a "stem" and a "ring". Some spacers were homologous to partial sequences of plasmids or phages. The study indicated that there were correlations between repeat sequences and flanking sequences, and that the repeats might act as a kind of recognition mechanism to mediate the interaction between foreign genetic elements and Cas proteins.
A structural basis for antigen presentation by the MHC class Ib molecule, Qa-1b.
Zeng, Li; Sullivan, Lucy C; Vivian, Julian P; Walpole, Nicholas G; Harpur, Christopher M; Rossjohn, Jamie; Clements, Craig S; Brooks, Andrew G
2012-01-01
The primary function of the monomorphic MHC class Ib molecule Qa-1(b) is to present peptides derived from the leader sequences of other MHC class I molecules for recognition by the CD94-NKG2 receptors expressed by NK and T cells. Whereas the mode of peptide presentation by its ortholog HLA-E, and subsequent recognition by CD94-NKG2A, is known, the molecular basis of Qa-1(b) function is unclear. We have assessed the interaction between Qa-1(b) and CD94-NKG2A and shown that they interact with an affinity of 17 μM. Furthermore, we have determined the structure of Qa-1(b) bound to the leader sequence peptide, Qdm (AMAPRTLLL), to a resolution of 1.9 Å and compared it with that of HLA-E. The crystal structure provided a basis for understanding the restricted peptide repertoire of Qa-1(b). Whereas the Qa-1(b)-AMAPRTLLL complex was similar to that of HLA-E, significant sequence and structural differences were observed between the respective Ag-binding clefts. However, the conformation of the Qdm peptide bound by Qa-1(b) was very similar to that of peptide bound to HLA-E. Although a number of conserved innate receptors can recognize heterologous ligands from other species, the structural differences between Qa-1(b) and HLA-E manifested in CD94-NKG2A ligand recognition being species specific despite similarities in peptide sequence and conformation. Collectively, our data illustrate the structural homology between Qa-1(b) and HLA-E and provide a structural basis for understanding peptide repertoire selection and the specificity of the interaction of Qa-1(b) with CD94-NKG2 receptors.
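The reported affinity can be placed on an energy scale. The sketch below converts a dissociation constant into a standard binding free energy using ΔG° = RT ln(Kd / 1 M). The temperature of 298.15 K and the 1 M standard state are assumptions made for illustration and are not stated in the abstract; the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant.

    Assumes a 1 M standard state; the default temperature is an
    illustrative assumption, not a value taken from the study.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)

# Kd = 17 uM is the affinity quoted for Qa-1(b) and CD94-NKG2A.
dg = binding_free_energy(17e-6)
print(f"{dg:.1f} kcal/mol")
```

Under these assumptions, an affinity of 17 μM corresponds to roughly -6.5 kcal/mol.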
Bioinformatic prediction and in vivo validation of residue-residue interactions in human proteins
NASA Astrophysics Data System (ADS)
Jordan, Daniel; Davis, Erica; Katsanis, Nicholas; Sunyaev, Shamil
2014-03-01
Identifying residue-residue interactions in protein molecules is important for understanding both protein structure and function in the context of evolutionary dynamics and medical genetics. Such interactions can be difficult to predict using existing empirical or physical potentials, especially when residues are far from each other in sequence space. Using a multiple sequence alignment of 46 diverse vertebrate species we explore the space of allowed sequences for orthologous protein families. Amino acid changes that are known to damage protein function allow us to identify specific changes that are likely to have interacting partners. We fit the parameters of the continuous-time Markov process used in the alignment to conclude that these interactions are primarily pairwise, rather than higher order. Candidates for sites under pairwise epistasis are predicted, which can then be tested by experiment. We report the results of an initial round of in vivo experiments in a zebrafish model that verify the presence of multiple pairwise interactions predicted by our model. These experimentally validated interactions are novel, distant in sequence, and are not readily explained by known biochemical or biophysical features.
Detecting Coevolution in and among Protein Domains
Yeang, Chen-Hsiang; Haussler, David
2007-01-01
Correlated changes of nucleic or amino acids have provided strong information about the structures and interactions of molecules. Despite the rich literature in coevolutionary sequence analysis, previous methods often have to trade off between generality, simplicity, phylogenetic information, and specific knowledge about interactions. Furthermore, despite the evidence of coevolution in selected protein families, a comprehensive screening of coevolution among all protein domains is still lacking. We propose an augmented continuous-time Markov process model for sequence coevolution. The model can handle different types of interactions, incorporate phylogenetic information and sequence substitution, has only one extra free parameter, and requires no knowledge about interaction rules. We employ this model to large-scale screenings on the entire protein domain database (Pfam). Strikingly, with 0.1 trillion tests executed, the majority of the inferred coevolving protein domains are functionally related, and the coevolving amino acid residues are spatially coupled. Moreover, many of the coevolving positions are located at functionally important sites of proteins/protein complexes, such as the subunit linkers of superoxide dismutase, the tRNA binding sites of ribosomes, the DNA binding region of RNA polymerase, and the active and ligand binding sites of various enzymes. The results suggest sequence coevolution manifests structural and functional constraints of proteins. The intricate relations between sequence coevolution and various selective constraints are worth pursuing at a deeper level. PMID:17983264
Understanding the mechanisms of protein-DNA interactions
NASA Astrophysics Data System (ADS)
Lavery, Richard
2004-03-01
Structural, biochemical and thermodynamic data on protein-DNA interactions show that specific recognition cannot be reduced to a simple set of binary interactions between the partners (such as hydrogen bonds, ion pairs or steric contacts). The mechanical properties of the partners also play a role and, in the case of DNA, variations in both conformation and flexibility as a function of base sequence can be a significant factor in guiding a protein to the correct binding site. All-atom molecular modeling offers a means of analyzing the role of different binding mechanisms within protein-DNA complexes of known structure. This however requires estimating the binding strengths for the full range of sequences with which a given protein can interact. Since this number grows exponentially with the length of the binding site it is necessary to find a method to accelerate the calculations. We have achieved this by using a multi-copy approach (ADAPT) which allows us to build a DNA fragment with a variable base sequence. The results obtained with this method correlate well with experimental consensus binding sequences. They enable us to show that indirect recognition mechanisms involving the sequence dependent properties of DNA play a significant role in many complexes. This approach also offers a means of predicting protein binding sites on the basis of binding energies, which is complementary to conventional lexical techniques.
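The exponential growth mentioned in this abstract is easy to make concrete: with four bases, a binding site of length n admits 4^n distinct sequences. The short sketch below only counts sequences; it does not reproduce the ADAPT multi-copy method, and the function name is hypothetical.

```python
def sequence_count(site_length):
    """Number of distinct DNA sequences of a given length (alphabet A, C, G, T)."""
    if site_length < 0:
        raise ValueError("site length must be non-negative")
    return 4 ** site_length

# Show how quickly exhaustive enumeration becomes expensive.
for n in (4, 8, 12, 16):
    print(n, sequence_count(n))
```

A 16 base-pair site already has more than four billion possible sequences, which illustrates why per-sequence energy calculations need to be accelerated.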
Finding the target sites of RNA-binding proteins
Li, Xiao; Kazan, Hilal; Lipshitz, Howard D; Morris, Quaid D
2014-01-01
RNA–protein interactions differ from DNA–protein interactions because of the central role of RNA secondary structure. Some RNA-binding domains (RBDs) recognize their target sites mainly by their shape and geometry and others are sequence-specific but are sensitive to secondary structure context. A number of small- and large-scale experimental approaches have been developed to measure RNAs associated in vitro and in vivo with RNA-binding proteins (RBPs). Generalizing outside of the experimental conditions tested by these assays requires computational motif finding. Often RBP motif finding is done by adapting DNA motif finding methods; but modeling secondary structure context leads to better recovery of RBP-binding preferences. Genome-wide assessment of mRNA secondary structure has recently become possible, but these data must be combined with computational predictions of secondary structure before they add value in predicting in vivo binding. There are two main approaches to incorporating structural information into motif models: supplementing primary sequence motif models with preferred secondary structure contexts (e.g., MEMERIS and RNAcontext) and directly modeling secondary structure recognized by the RBP using stochastic context-free grammars (e.g., CMfinder and RNApromo). The former better reconstruct known binding preferences for sequence-specific RBPs but are not suitable for modeling RBPs that recognize shape and geometry of RNAs. Future work in RBP motif finding should incorporate interactions between multiple RBDs and multiple RBPs in binding to RNA. WIREs RNA 2014, 5:111–130. doi: 10.1002/wrna.1201 PMID:24217996
Probing the Structures of Viral RNA Regulatory Elements with SHAPE and Related Methodologies
Rausch, Jason W.; Sztuba-Solinska, Joanna; Le Grice, Stuart F. J.
2018-01-01
Viral RNAs were selected by evolution to possess maximum functionality in a minimal sequence. Depending on the classification of the virus and the type of RNA in question, viral RNAs must alternately be replicated, spliced, transcribed, transported from the nucleus into the cytoplasm, translated and/or packaged into nascent virions, and in most cases, provide the sequence and structural determinants to facilitate these processes. One consequence of this compact multifunctionality is that viral RNA structures can be exquisitely complex, often involving intermolecular interactions with RNA or protein, intramolecular interactions between sequence segments separated by several thousands of nucleotides, or specialized motifs such as pseudoknots or kissing loops. The fluidity of viral RNA structure can also present a challenge when attempting to characterize it, as genomic RNAs especially are likely to sample numerous conformations at various stages of the virus life cycle. Here we review advances in chemoenzymatic structure probing that have made it possible to address such challenges with respect to cis-acting elements, full-length viral genomes and long non-coding RNAs that play a major role in regulating viral gene expression. PMID:29375504
Aamir, Mohd; Singh, Vinay K.; Meena, Mukesh; Upadhyay, Ram S.; Gupta, Vijai K.; Singh, Surendra
2017-01-01
WRKY transcription factors (TFs) play a crucial role in plant defense responses against various abiotic and biotic stresses. The role of the WRKY3 and WRKY4 genes in plant defense against necrotrophic pathogens is well reported. However, their functional annotation in tomato is largely unknown. In the present work, we characterized the structural and functional attributes of two identified tomato WRKY transcription factors, WRKY3 (SlWRKY3) and WRKY4 (SlWRKY4), using computational approaches. Arabidopsis WRKY3 (AtWRKY3: NP_178433) and WRKY4 (AtWRKY4: NP_172849) protein sequences were retrieved from the TAIR database, and protein BLAST was used to find their sequence homologs in tomato. Sequence alignment, phylogenetic classification, and motif composition analysis revealed remarkable sequence variation between these two WRKYs. Tomato WRKY3 and WRKY4 cluster with Solanum pennellii, indicating a monophyletic origin and evolution from their wild homolog. The functional domain regions responsible for sequence-specific DNA binding in both proteins were modeled [using AtWRKY4 (PDB ID: 1WJ2) and AtWRKY1 (PDB ID: 2AYD) as template protein structures] by homology modeling with Discovery Studio 3.0. The generated models were further evaluated for accuracy and reliability based on qualitative and quantitative parameters. The modeled proteins satisfied all the crucial energy parameters and showed acceptable Ramachandran statistics when compared with the experimentally resolved NMR solution structures and/or X-ray crystal structures (templates). Superimposition of the functional WRKY domains from SlWRKY3 and SlWRKY4 revealed remarkable structural similarity. The sequence-specific DNA binding of the two WRKYs was explored through DNA-protein interaction analysis using the Hex docking server. 
The interaction studies found that SlWRKY4 binds the W-box DNA through WRKYGQK, with Tyr408, Arg409, and Lys419 and the initial flanking sequences also involved in binding. In contrast, SlWRKY3 interacted through RKYGQK along with residues from the zinc finger motifs. Protein-protein interaction studies were done using STRING version 10.0 to explore all the possible protein partners involved in associative functional interaction networks. Gene ontology enrichment analysis revealed the functional dimension and characterized the identified WRKYs based on their functional annotation. PMID:28611792
Bartho, Joseph D.; Bellini, Dom; Wuerges, Jochen; Demitri, Nicola; Toccafondi, Mirco; Schmitt, Armin O.; Zhao, Youfu; Walsh, Martin A.; Benini, Stefano
2017-01-01
AmyR is a stress and virulence associated protein from the plant pathogenic Enterobacteriaceae species Erwinia amylovora, and is a functionally conserved ortholog of YbjN from Escherichia coli. The crystal structure of E. amylovora AmyR reveals a class I type III secretion chaperone-like fold, despite the lack of sequence similarity between these two classes of protein and lacking any evidence of a secretion-associated role. The results indicate that AmyR, and YbjN proteins in general, function through protein-protein interactions without any enzymatic action. The YbjN proteins of Enterobacteriaceae show remarkably low sequence similarity with other members of the YbjN protein family in Eubacteria, yet a high level of structural conservation is observed. Across the YbjN protein family sequence conservation is limited to residues stabilising the protein core and dimerization interface, while interacting regions are only conserved between closely related species. This study presents the first structure of a YbjN protein from Enterobacteriaceae, the most highly divergent and well-studied subgroup of YbjN proteins, and an in-depth sequence and structural analysis of this important but poorly understood protein family. PMID:28426806
Length and sequence dependence in the association of Huntingtin protein with lipid membranes
NASA Astrophysics Data System (ADS)
Jawahery, Sudi; Nagarajan, Anu; Matysiak, Silvina
2013-03-01
There is a fundamental gap in our understanding of how aggregates of mutant Huntingtin protein (htt) with overextended polyglutamine (polyQ) sequences gain the toxic properties that cause Huntington's disease (HD). Experimental studies have shown that the most important step associated with toxicity is the binding of mutant htt aggregates to lipid membranes. Studies have also shown that flanking amino acid sequences around the polyQ sequence directly affect interactions with the lipid bilayer, and that polyQ sequences of greater than 35 glutamine repeats in htt are a characteristic of HD. The key steps that determine how flanking sequences and polyQ length affect the structure of lipid bilayers remain unknown. In this study, we use atomistic molecular dynamics simulations to study the interactions between lipid membranes of varying compositions and polyQ peptides of varying lengths and flanking sequences. We find that overextended polyQ interactions do cause deformation in model membranes, and that the flanking sequences do play a role in intensifying this deformation by altering the shape of the affected regions.
Basu, Abhijit; Jain, Niyati; Tolbert, Blanton S.; Komar, Anton A.
2017-01-01
RNA–protein interactions with physiological outcomes usually rely on conserved sequences within the RNA element. By contrast, activity of the diverse gamma-interferon-activated inhibitor of translation (GAIT)-elements relies on the conserved RNA folding motifs rather than the conserved sequence motifs. These elements drive the translational silencing of a group of chemokine (CC/CXC) and chemokine receptor (CCR) mRNAs, thereby helping to resolve physiological inflammation. Despite sequence dissimilarity, these RNA elements adopt common secondary structures (as revealed by 2D-1H NMR spectroscopy), providing a basis for their interaction with the RNA-binding GAIT complex. However, many of these elements (e.g. those derived from CCL22, CXCL13, CCR4 and ceruloplasmin (Cp) mRNAs) have substantially different affinities for GAIT complex binding. Toeprinting analysis shows that different positions within the overall conserved GAIT element structure contribute to differential affinities of the GAIT protein complex towards the elements. Thus, heterogeneity of GAIT elements may provide hierarchical fine-tuning of the resolution of inflammation. PMID:29069516
Global Organization of a Positive-strand RNA Virus Genome
Wu, Baodong; Grigull, Jörg; Ore, Moriam O.; Morin, Sylvie; White, K. Andrew
2013-01-01
The genomes of plus-strand RNA viruses contain many regulatory sequences and structures that direct different viral processes. The traditional view of these RNA elements is as local structures present in non-coding regions. However, this view is changing due to the discovery of regulatory elements in coding regions and functional long-range intra-genomic base pairing interactions. The ∼4.8 kb long RNA genome of the tombusvirus tomato bushy stunt virus (TBSV) contains these types of structural features, including six different functional long-distance interactions. We hypothesized that to achieve these multiple interactions this viral genome must utilize a large-scale organizational strategy and, accordingly, we sought to assess the global conformation of the entire TBSV genome. Atomic force micrographs of the genome indicated a mostly condensed structure composed of interconnected protrusions extending from a central hub. This configuration was consistent with the genomic secondary structure model generated using high-throughput selective 2′-hydroxyl acylation analysed by primer extension (i.e. SHAPE), which predicted different sized RNA domains originating from a central region. Known RNA elements were identified in both domain and inter-domain regions, and novel structural features were predicted and functionally confirmed. Interestingly, only two of the six long-range interactions known to form were present in the structural model. However, for those interactions that did not form, complementary partner sequences were positioned relatively close to each other in the structure, suggesting that the secondary structure level of viral genome structure could provide a basic scaffold for the formation of different long-range interactions. The higher-order structural model for the TBSV RNA genome provides a snapshot of the complex framework that allows multiple functional components to operate in concert within a confined context. PMID:23717202
Yang, A S; Hitz, B; Honig, B
1996-06-21
The stability of beta-turns is calculated as a function of sequence and turn type with a Monte Carlo sampling technique. The conformational energy of four internal hydrogen-bonded turn types, I, I', II and II', is obtained by evaluating their gas phase energy with the CHARMM force field and accounting for solvation effects with the Finite Difference Poisson-Boltzmann (FDPB) method. All four turn types are found to be less stable than the coil state, independent of the sequence in the turn. The free-energy penalties associated with turn formation vary between 1.6 kcal/mol and 7.7 kcal/mol, depending on the sequence and turn type. Differences in turn stability arise mainly from intraresidue interactions within the two central residues of the turn. For each combination of the two central residues, except for -Gly-Gly-, the most stable beta-turn type is always found to occur most commonly in native proteins. The fact that a model based on local interactions accounts for the observed preference of specific sequences suggests that long-range tertiary interactions tend to play a secondary role in determining turn conformation. In contrast, for beta-hairpins, long-range interactions appear to dominate. Specifically, due to the right-handed twist of beta-strands, type I' turns for -Gly-Gly- are found to occur with high frequency, even when local energetics would dictate otherwise. The fact that any combination of two residues is found able to adopt a relatively low-energy turn structure explains why the amino acid sequence in turns is highly variable. The calculated free-energy cost of turn formation, when combined with related numbers obtained for alpha-helices and beta-sheets, suggests a model for the initiation of protein folding based on metastable fragments of secondary structure.
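To give a feel for what these free-energy penalties imply, the sketch below converts a penalty ΔG (turn relative to coil) into an equilibrium turn population under a simple two-state Boltzmann model, fraction = K / (1 + K) with K = exp(-ΔG / RT). The two-state model, the temperature of 298.15 K, and the function name are assumptions made for illustration; they are not taken from the abstract, which reports only the range of penalties.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def turn_fraction(delta_g_kcal, temperature_k=298.15):
    """Equilibrium fraction in the turn state for a two-state coil/turn model.

    delta_g_kcal is the free energy of the turn relative to the coil;
    positive values mean the turn is less stable than the coil.
    """
    k_eq = math.exp(-delta_g_kcal / (R_KCAL * temperature_k))
    return k_eq / (1.0 + k_eq)

# The two ends of the penalty range quoted in the abstract.
for dg in (1.6, 7.7):
    print(dg, turn_fraction(dg))
```

Under these assumptions, the smallest penalty (1.6 kcal/mol) corresponds to a turn population of roughly 6%, while the largest (7.7 kcal/mol) corresponds to only a few parts per million.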
Rajendran, Senthilnathan; Jothi, Arunachalam
2018-05-16
The three-dimensional structure of a protein depends on the interactions between its amino acid residues. These interactions are in turn influenced by various biophysical properties of the amino acids. There are several examples of proteins that share the same fold but are very dissimilar at the sequence level. For proteins to share a common fold, some crucial interactions should be maintained despite insignificant sequence similarity. Since the interactions arise from the biophysical properties of the amino acids, we should be able to detect descriptive patterns for folds at the property level. Accordingly, the main focus of our research is to analyze such proteins and to characterize them in terms of their biophysical properties. Protein structures with sequence similarity less than 40% were selected for ten different subfolds from three different mainfolds (according to the CATH classification) and were used for this analysis. We used the normalized values of 49 physicochemical, energetic and conformational properties of amino acids. We characterized the folds based on the average biophysical property values. We also observed fold-specific correlational behavior of biophysical properties despite very low sequence similarity in our data. We further trained three different binary classification models (Naive Bayes, NB; Support Vector Machines, SVM; and Bayesian Generalized Linear Model, BGLM) that could discriminate mainfolds based on the biophysical properties. Among the three generated models, the BGLM classifier was able to discriminate protein sequences in the all-beta category with 81.43% accuracy and all-alpha and alpha-beta proteins with 83.37% accuracy.
Parker, K A; Steitz, J A
1987-01-01
The human U3 ribonucleoprotein (RNP) has been analyzed to determine its protein constituents, sites of protein-RNA interaction, and RNA secondary structure. By using anti-U3 RNP antibodies and extracts prepared from HeLa cells labeled in vivo, the RNP was found to contain four nonphosphorylated proteins of 36, 30, 13, and 12.5 kilodaltons and two phosphorylated proteins of 74 and 59 kilodaltons. U3 nucleotides 72-90, 106-121, 154-166, and 190-217 must contain sites that interact with proteins since these regions are immunoprecipitated after treatment of the RNP with RNase A or T1. The secondary structure was probed with specific nucleases and by chemical modification with single-strand-specific reagents that block subsequent reverse transcription. Regions that are single stranded (and therefore potentially able to interact with a substrate RNA) include an evolutionarily conserved sequence at nucleotides 104-112 and nonconserved sequences at nucleotides 65-74, 80-84, and 88-93. Nucleotides 159-168 do not appear to be highly accessible, thus making it unlikely that this U3 sequence base pairs with sequences near the 5.8S rRNA-internal transcribed spacer II junction, as previously proposed. Alternative functions of the U3 RNP are discussed, including the possibility that U3 may participate in a processing event near the 3' end of 28S rRNA. PMID:2959855
Composite Structural Motifs of Binding Sites for Delineating Biological Functions of Proteins
Kinjo, Akira R.; Nakamura, Haruki
2012-01-01
Most biological processes are described as a series of interactions between proteins and other molecules, and interactions are in turn described in terms of atomic structures. To annotate protein functions as sets of interaction states at atomic resolution, and thereby to better understand the relation between protein interactions and biological functions, we conducted exhaustive all-against-all atomic structure comparisons of all known binding sites for ligands including small molecules, proteins and nucleic acids, and identified recurring elementary motifs. By integrating the elementary motifs associated with each subunit, we defined composite motifs that represent context-dependent combinations of elementary motifs. It is demonstrated that function similarity can be better inferred from composite motif similarity compared to the similarity of protein sequences or of individual binding sites. By integrating the composite motifs associated with each protein function, we define meta-composite motifs each of which is regarded as a time-independent diagrammatic representation of a biological process. It is shown that meta-composite motifs provide richer annotations of biological processes than sequence clusters. The present results serve as a basis for bridging atomic structures to higher-order biological phenomena by classification and integration of binding site structures. PMID:22347478
Accounting for epistatic interactions improves the functional analysis of protein structures
Wilkins, Angela D.; Venner, Eric; Marciano, David C.; Erdin, Serkan; Atri, Benu; Lua, Rhonald C.; Lichtarge, Olivier
2013-11-01
Motivation: The constraints under which sequence, structure and function coevolve are not fully understood. Bringing this mutual relationship to light can reveal the molecular basis of binding, catalysis and allostery, thereby identifying function and rationally guiding protein redesign. Underlying these relationships are the epistatic interactions that occur when the consequences of a mutation to a protein are determined by the genetic background in which it occurs. Based on prior data, we hypothesize that epistatic forces operate most strongly between residues nearby in the structure, resulting in smooth evolutionary importance across the structure. Methods and Results: We find that when residue scores of evolutionary importance are distributed smoothly between nearby residues, functional site prediction accuracy improves. Accordingly, we designed a novel measure of evolutionary importance that focuses on the interaction between pairs of structurally neighboring residues. This measure that we term pair-interaction Evolutionary Trace yields greater functional site overlap and better structure-based proteome-wide functional predictions. Conclusions: Our data show that the structural smoothness of evolutionary importance is a fundamental feature of the coevolution of sequence, structure and function. Mutations operate on individual residues, but selective pressure depends in part on the extent to which a mutation perturbs interactions with neighboring residues. In practice, this principle led us to redefine the importance of a residue in terms of the importance of its epistatic interactions with neighbors, yielding better annotation of functional residues, motivating experimental validation of a novel functional site in LexA and refining protein function prediction. Contact: lichtarge@bcm.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24021383
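The smoothness idea in this abstract can be illustrated with a small sketch. It is not the pair-interaction Evolutionary Trace formula, which the abstract does not give. It only shows, under assumed inputs, how per-residue importance scores might be averaged over structural neighbors. The function name, the coordinates, the scores, and the 4.0 distance cutoff are all hypothetical.

```python
import math


def neighbor_smoothed_scores(coords, scores, cutoff=4.0):
    """Average each residue's score over all residues within `cutoff`.

    coords: list of (x, y, z) tuples, one per residue (hypothetical input).
    scores: list of per-residue importance scores, same order as coords.
    The residue itself is always included (its distance to itself is 0).
    """
    smoothed = []
    for ci in coords:
        total, count = 0.0, 0
        for j, cj in enumerate(coords):
            if math.dist(ci, cj) <= cutoff:
                total += scores[j]
                count += 1
        smoothed.append(total / count)
    return smoothed


# Toy example: three residues in a row and one isolated residue.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
scores = [1.0, 0.0, 1.0, 0.5]
smoothed = neighbor_smoothed_scores(coords, scores, cutoff=4.0)
```

In this toy case the isolated residue keeps its own score, while the clustered residues are pulled toward the mean of their neighborhood, which is the qualitative behavior the abstract calls structural smoothness.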
Crystal structure of AFV3-109, a highly conserved protein from crenarchaeal viruses
Keller, Jenny; Leulliot, Nicolas; Cambillau, Christian; Campanacci, Valérie; Porciero, Stéphanie; Prangishvili, David; Forterre, Patrick; Cortez, Diego; Quevillon-Cheruel, Sophie; van Tilbeurgh, Herman
2007-01-01
The extraordinary morphologies of viruses infecting hyperthermophilic archaea clearly distinguish them from bacterial and eukaryotic viruses. Moreover, their genomes code for proteins that to a large extent have no related sequences in the extant databases. However, a small pool of genes is shared by overlapping subsets of these viruses, and the most conserved gene, exemplified by the ORF109 of the Acidianus Filamentous Virus 3, AFV3, is present on genomes of members of three viral families, the Lipothrixviridae, Rudiviridae, and "Bicaudaviridae", as well as of the unclassified Sulfolobus Turreted Icosahedral Virus, STIV. We present here the crystal structure of the protein (Mr = 13.1 kD, 109 residues) encoded by the AFV3 ORF 109 in two different crystal forms at 1.5 and 1.3 Å resolution. The structure of AFV3-109 is a five-stranded β-sheet with loops on one side and three helices on the other. It forms a dimer adopting the shape of a cradle that encompasses the best conserved regions of the sequence. No protein with a related fold could be identified except for the ortholog from STIV1, whose structure was deposited at the Protein Data Bank. We could clearly identify a well-bound glycerol inside the cradle, contacting exclusively totally conserved residues. This interaction was confirmed in solution by fluorescence titration. Although the function of AFV3-109 cannot be deduced directly from its structure, structural homology with the STIV1 protein, and the size and charge distribution of the cavity suggested it could interact with nucleic acids. Fluorescence quenching titrations also showed that AFV3-109 interacts with dsDNA. Genomic sequence analysis revealed bacterial homologs of AFV3-109 as part of putative, previously unidentified prophage sequences in some Firmicutes. PMID:17241456
Structure and Sequence Search on Aptamer-Protein Docking
NASA Astrophysics Data System (ADS)
Xiao, Jiajie; Bonin, Keith; Guthold, Martin; Salsbury, Freddie
2015-03-01
Interactions between proteins and deoxyribonucleic acid (DNA) play a significant role in living systems, especially through gene regulation. In addition, short nucleic acid sequences (aptamers) with specific binding affinity to specific proteins exhibit clinical potential as therapeutics. Our capillary and gel electrophoresis selection experiments show that specific sequences of aptamers can be selected that bind specific proteins. Computationally, given the experimentally determined structure and sequence of a thrombin-binding aptamer, we can successfully dock the aptamer onto thrombin in agreement with experimental structures of the complex. In order to further study the conformational flexibility of this thrombin-binding aptamer and to potentially develop a predictive computational model of aptamer binding, we use GPU-enabled molecular dynamics simulations both to examine the conformational flexibility of the aptamer in the absence of binding to thrombin and to determine our ability to fold an aptamer. This study should help further de novo predictions of aptamer sequences by enabling the study of structural and sequence-dependent effects on aptamer-protein docking specificity.
Faulon, Jean-Loup; Misra, Milind; Martin, Shawn; ...
2007-11-23
Motivation: Identifying protein enzymatic or pharmacological activities is an important area of research in biology and chemistry. Biological and chemical databases are increasingly being populated with linkages between protein sequences and chemical structures. Additionally, there is now sufficient information to apply machine-learning techniques to predict interactions between chemicals and proteins at a genome scale. Current machine-learning techniques use as input either protein sequences and structures or chemical information. We propose here a method to infer protein–chemical interactions using heterogeneous input consisting of both protein sequence and chemical information. Results: Our method relies on expressing proteins and chemicals with a common cheminformatics representation. We demonstrate our approach by predicting whether proteins can catalyze reactions not present in training sets. We also predict whether a given drug can bind a target, in the absence of prior binding information for that drug and target. Lastly, such predictions cannot be made with current machine-learning techniques requiring binding information for individual reactions or individual targets.
Physics and evolution of thermophilic adaptation.
Berezovsky, Igor N; Shakhnovich, Eugene I
2005-09-06
Analysis of structures and sequences of several hyperthermostable proteins from various sources reveals two major physical mechanisms of their thermostabilization. The first mechanism is "structure-based," whereby some hyperthermostable proteins are significantly more compact than their mesophilic homologues, while no particular interaction type appears to cause stabilization; rather, a sheer number of interactions is responsible for thermostability. Other hyperthermostable proteins employ an alternative, "sequence-based" mechanism of their thermal stabilization. They do not show pronounced structural differences from mesophilic homologues. Rather, a small number of apparently strong interactions is responsible for high thermal stability of these proteins. High-throughput comparative analysis of structures and complete genomes of several hyperthermophilic archaea and bacteria revealed that organisms develop diverse strategies of thermophilic adaptation by using, to a varying degree, two fundamental physical mechanisms of thermostability. The choice of a particular strategy depends on the evolutionary history of an organism. Proteins from organisms that originated in an extreme environment, such as hyperthermophilic archaea (Pyrococcus furiosus), are significantly more compact and more hydrophobic than their mesophilic counterparts. Alternatively, organisms that evolved as mesophiles but later recolonized a hot environment (Thermotoga maritima) relied in their evolutionary strategy of thermophilic adaptation on "sequence-based" mechanism of thermostability. We propose an evolutionary explanation of these differences based on physical concepts of protein designability.
Havrila, Marek; Réblová, Kamila; Zirbel, Craig L.; Leontis, Neocles B.; Šponer, Jiří
2013-01-01
The Sarcin-Ricin RNA motif (SR motif) is one of the most prominent recurrent RNA building blocks that occurs in many different RNA contexts and folds autonomously, i.e., in a context-independent manner. In this study, we combined bioinformatics analysis with explicit-solvent molecular dynamics (MD) simulations to better understand the relation between the RNA sequence and the evolutionary patterns of the SR motif. A SHAPE probing experiment was also performed to confirm the fidelity of the MD simulations. We identified 57 instances of the SR motif in a non-redundant subset of the RNA X-ray structure database and analyzed their basepairing, base-phosphate, and backbone-backbone interactions. We extracted sequences aligned to these instances from large ribosomal RNA alignments to determine the frequency of occurrence for different sequence variants. We then used a simple scoring scheme based on isostericity to suggest 10 sequence variants with highly variable expected degree of compatibility with the SR motif 3D structure. We carried out MD simulations of SR motifs with these base substitutions. Non-isosteric base substitutions led to unstable structures, but so did isosteric substitutions which were unable to make key base-phosphate interactions. The MD technique explains why some potentially isosteric SR motifs are not realized during evolution. We also found that the inability to form a stable cWW geometry is an important factor in the case of the first base pair of the flexible region of the SR motif. Comparison of structural, bioinformatics, SHAPE probing and MD simulation data reveals that explicit-solvent MD simulations neatly reflect the viability of different sequence variants of the SR motif. Thus, MD simulations can efficiently complement bioinformatics tools in studies of conservation patterns of RNA motifs and provide atomistic insight into the role of their different signature interactions. PMID:24144333
ERIC Educational Resources Information Center
Bramley, Neil R.; Lagnado, David A.; Speekenbrink, Maarten
2015-01-01
Interacting with a system is key to uncovering its causal structure. A computational framework for interventional causal learning has been developed over the last decade, but how real causal learners might achieve or approximate the computations entailed by this framework is still poorly understood. Here we describe an interactive computer task in…
Gultekin, Yasemin B; Hage, Steffen R
2018-04-01
Human vocal development is dependent on learning by imitation through social feedback between infants and caregivers. Recent studies have revealed that vocal development is also influenced by parental feedback in marmoset monkeys, suggesting vocal learning mechanisms in nonhuman primates. Marmoset infants that experience more contingent vocal feedback than their littermates develop vocalizations more rapidly, and infant marmosets with limited parental interaction exhibit immature vocal behavior beyond infancy. However, it is yet unclear whether direct parental interaction is an obligate requirement for proper vocal development because all monkeys in the aforementioned studies were able to produce the adult call repertoire after infancy. Using quantitative measures to compare distinct call parameters and vocal sequence structure, we show that social interaction has a direct impact not only on the maturation of the vocal behavior but also on acoustic call structures during vocal development. Monkeys with limited parental interaction during development show systematic differences in call entropy, a measure for maturity, compared with their normally raised siblings. In addition, different call types were occasionally uttered in motif-like sequences similar to those exhibited by vocal learners, such as birds and humans, in early vocal development. These results indicate that a lack of parental interaction leads to long-term disturbances in the acoustic structure of marmoset vocalizations, suggesting an imperative role for social interaction in proper primate vocal development. PMID:29651461
Gaynor, R; Soultanakis, E; Kuwabara, M; Garcia, J; Sigman, D S
1989-01-01
The transactivator protein, tat, encoded by the human immunodeficiency virus is a key regulator of viral transcription. Activation by the tat protein requires sequences downstream of the transcription initiation site called the transactivating region (TAR). RNA derived from the TAR is capable of forming a stable stem-loop structure and the maintenance of both the stem structure and the loop sequences located between +19 and +44 is required for complete in vivo activation by tat. Gel retardation assays with RNA from both wild-type and mutant TAR constructs generated in vitro with SP6 polymerase indicated specific binding of HeLa nuclear proteins to the TAR. To characterize this RNA-protein interaction, a method of chemical "imprinting" has been developed using photoactivated uranyl acetate as the nucleolytic agent. This reagent nicks RNA under physiological conditions at all four nucleotides in a reaction that is independent of sequence and secondary structure. Specific interaction of cellular proteins with TAR RNA could be detected by enhanced cleavages or imprints surrounding the loop region. Mutations that either disrupted stem base-pairing or extensively changed the primary sequence resulted in alterations in the cleavage pattern of the TAR RNA. Structural features of the TAR RNA stem-loop essential for tat activation are also required for specific binding of the HeLa cell nuclear protein. PMID:2544877
Evidence for the Concerted Evolution between Short Linear Protein Motifs and Their Flanking Regions
Chica, Claudia; Diella, Francesca; Gibson, Toby J.
2009-01-01
Background Linear motifs are short modules of protein sequences that play a crucial role in mediating and regulating many protein–protein interactions. The function of linear motifs strongly depends on the context, e.g. functional instances mainly occur inside flexible regions that are accessible for interaction. Sometimes linear motifs appear as isolated islands of conservation in multiple sequence alignments. However, they also occur in larger blocks of sequence conservation, suggesting an active role for the neighbouring amino acids. Results The evolution of regions flanking 116 functional linear motif instances was studied. The conservation of the amino acid sequence and order/disorder tendency of those regions was related to presence/absence of the instance. For the majority of the analysed instances, the pairs of sequences conserving the linear motif were also observed to maintain a similar local structural tendency and/or to have higher local sequence conservation when compared to pairs of sequences where one is missing the linear motif. Furthermore, those instances have a higher chance to co–evolve with the neighbouring residues in comparison to the distant ones. Those findings are supported by examples where the regulation of the linear motif–mediated interaction has been shown to depend on the modifications (e.g. phosphorylation) at neighbouring positions or is thought to benefit from the binding versatility of disordered regions. Conclusion The results suggest that flanking regions are relevant for linear motif–mediated interactions, both at the structural and sequence level. More interestingly, they indicate that the prediction of linear motif instances can be enriched with contextual information by performing a sequence analysis similar to the one presented here. This can facilitate the understanding of the role of these predicted instances in determining the protein function inside the broader context of the cellular network where they arise. 
PMID:19584925
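As a rough illustration of the kind of per-position conservation analysis this abstract describes for motif flanking regions, the sketch below scores each alignment column by one minus its normalized Shannon entropy. The abstract does not say which conservation measure the authors used, so the entropy-based score, the function name, and the toy alignment are assumptions made only for this example.

```python
import math
from collections import Counter


def column_conservation(alignment):
    """Return one conservation score per alignment column.

    Score = 1 - H / log2(20), where H is the Shannon entropy of the
    residues in the column (gaps '-' are ignored). A fully conserved
    column scores 1.0; a column containing only gaps scores 0.0.
    """
    n_cols = len(alignment[0])
    result = []
    for c in range(n_cols):
        column = [seq[c] for seq in alignment if seq[c] != "-"]
        if not column:
            result.append(0.0)
            continue
        n = len(column)
        counts = Counter(column)
        entropy = -sum((k / n) * math.log2(k / n) for k in counts.values())
        result.append(1.0 - entropy / math.log2(20))
    return result


# Toy alignment: a conserved "PSP" core with variable flanking positions.
alignment = ["APSPLK", "GPSPIR", "TPSPVK"]
scores = column_conservation(alignment)
```

Comparing such scores between the motif columns and the columns on either side is one simple way to ask whether flanking positions are more conserved than distant ones.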
Designing heteropolymers to fold into unique structures via water-mediated interactions.
Jamadagni, Sumanth N; Bosoy, Christian; Garde, Shekhar
2010-10-28
Hydrophobic homopolymers collapse into globular structures in water driven by hydrophobic interactions. Here we employ extensive molecular dynamics simulations to study the collapse of heteropolymers containing one or two pairs of oppositely charged monomers. We show that charging a pair of monomers can dramatically alter the most stable conformations from compact globular to more open hairpin-like. We systematically explore a subset of the sequence space of one- and two-charge-pair polymers, focusing on the locations of the charge pairs. Conformational stability is governed by a balance of hydrophobic interactions, hydration and interactions of charge groups, water-mediated charged-hydrophobic monomer repulsions, and other factors. As a result, placing charge pairs in the middle, away from the hairpin ends, leads to stable hairpin-like structures. Turning off the monomer-water attractions enhances hydrophobic interactions significantly leading to a collapse into compact globular structures even for two-charge-pair heteropolymers. In contrast, the addition of salt leads to open and extended structures, suggesting that solvation of charged monomer sites by salt ions dominates the salt-induced enhancement of hydrophobic interactions. We also test the ability of a predictive scheme based on the additivity of free energy of contact formation. The success of the scheme for symmetric two-charge-pair sequences and the failure for their flipped versions highlight the complexity of the heteropolymer conformation space and of the design problem. Collectively, our results underscore the ability of tuning water-mediated interactions to design stable nonglobular structures in water and present model heteropolymers for further studies in the extended thermodynamic space and in inhomogeneous environments.
Bedford, Nicholas M; Hughes, Zak E; Tang, Zhenghua; Li, Yue; Briggs, Beverly D; Ren, Yang; Swihart, Mark T; Petkov, Valeri G; Naik, Rajesh R; Knecht, Marc R; Walsh, Tiffany R
2016-01-20
Peptide-enabled nanoparticle (NP) synthesis routes can create and/or assemble functional nanomaterials under environmentally friendly conditions, with properties dictated by complex interactions at the biotic/abiotic interface. Manipulation of this interface through sequence modification can provide the capability for material properties to be tailored to create enhanced materials for energy, catalysis, and sensing applications. Fully realizing the potential of these materials requires a comprehensive understanding of sequence-dependent structure/function relationships that is presently lacking. In this work, the atomic-scale structures of a series of peptide-capped Au NPs are determined using a combination of atomic pair distribution function analysis of high-energy X-ray diffraction data and advanced molecular dynamics (MD) simulations. The Au NPs produced with different peptide sequences exhibit varying degrees of catalytic activity for the exemplar reaction 4-nitrophenol reduction. The experimentally derived atomic-scale NP configurations reveal sequence-dependent differences in structural order at the NP surface. Replica exchange with solute-tempering MD simulations are then used to predict the morphology of the peptide overlayer on these Au NPs and identify factors determining the structure/catalytic properties relationship. We show that the amount of exposed Au surface, the underlying surface structural disorder, and the interaction strength of the peptide with the Au surface all influence catalytic performance. A simplified computational prediction of catalytic performance is developed that can potentially serve as a screening tool for future studies. Our approach provides a platform for broadening the analysis of catalytic peptide-enabled metallic NP systems, potentially allowing for the development of rational design rules for property enhancement.
Spink, N; Brown, D G; Skelly, J V; Neidle, S
1994-01-01
The bis-benzimidazole drug Hoechst 33258 has been co-crystallized with the dodecanucleotide sequence d(CGCAAATTTGCG)2. The structure has been solved by molecular replacement and refined to an R factor of 18.5% for 2125 reflections collected on a Xentronics area detector. The drug is bound in the minor groove, at the five base-pair site 5'-ATTTG and is in a unique orientation. This is displaced by one base pair in the 5' direction compared to previously-determined structures of this drug with the sequence d(CGCGAATTCGCG)2. Reasons for this difference in behaviour are discussed in terms of several sequence-dependent structural features of the DNA, with particular reference to differences in propeller twist and minor-groove width. PMID:7515488
Structural and sequencing analysis of local target DNA recognition by MLV integrase.
Aiyer, Sriram; Rossi, Paolo; Malani, Nirav; Schneider, William M; Chandar, Ashwin; Bushman, Frederic D; Montelione, Gaetano T; Roth, Monica J
2015-06-23
Target-site selection by retroviral integrase (IN) proteins profoundly affects viral pathogenesis. We describe the solution nuclear magnetic resonance structure of the Moloney murine leukemia virus IN (M-MLV) C-terminal domain (CTD) and a structural homology model of the catalytic core domain (CCD). In solution, the isolated MLV IN CTD adopts an SH3 domain fold flanked by a C-terminal unstructured tail. We generated a concordant MLV IN CCD structural model using SWISS-MODEL, MMM-tree and I-TASSER. Using the X-ray crystal structure of the prototype foamy virus IN target capture complex together with our MLV domain structures, residues within the CCD α2 helical region and the CTD β1-β2 loop were predicted to bind target DNA. The role of these residues was analyzed in vivo through point mutants and motif interchanges. Viable viruses with substitutions at the IN CCD α2 helical region and the CTD β1-β2 loop were tested for effects on integration target site selection. Next-generation sequencing and analysis of integration target sequences indicate that the CCD α2 helical region, in particular P187, interacts with the sequences distal to the scissile bonds whereas the CTD β1-β2 loop binds to residues proximal to it. These findings validate our structural model and disclose IN-DNA interactions relevant to target site selection. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Kono, H; Saven, J G
2001-02-23
Combinatorial experiments provide new ways to probe the determinants of protein folding and to identify novel folding amino acid sequences. These types of experiments, however, are complicated both by enormous conformational complexity and by large numbers of possible sequences. Therefore, a quantitative computational theory would be helpful in designing and interpreting these types of experiments. Here, we present and apply a statistically based, computational approach for identifying the properties of sequences compatible with a given main-chain structure. Protein side-chain conformations are included in an atom-based fashion. Calculations are performed for a variety of similar backbone structures to identify sequence properties that are robust with respect to minor changes in main-chain structure. Rather than specific sequences, the method yields the likelihood of each of the amino acids at preselected positions in a given protein structure. The theory may be used to quantify the characteristics of sequence space for a chosen structure without explicitly tabulating sequences. To account for hydrophobic effects, we introduce an environmental energy that is consistent with other simple hydrophobicity scales and show that it is effective for side-chain modeling. We apply the method to calculate the identity probabilities of selected positions of the immunoglobulin light chain-binding domain of protein L, for which many variant folding sequences are available. The calculations compare favorably with the experimentally observed identity probabilities.
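To make the notion of per-position amino acid likelihoods concrete, here is a minimal sketch. It assumes that each candidate amino acid at a site has some effective energy and that probabilities follow a Boltzmann-like weighting. The abstract does not state the functional form of the theory, so this weighting, the function name, the `beta` parameter, and the energy values are all illustrative assumptions rather than the authors' method.

```python
import math


def site_probabilities(energies, beta=1.0):
    """Convert {amino_acid: energy} into normalized probabilities.

    Uses weights exp(-beta * E); energies are shifted by their minimum
    first so the exponentials stay numerically well behaved.
    """
    e_min = min(energies.values())
    weights = {aa: math.exp(-beta * (e - e_min)) for aa, e in energies.items()}
    z = sum(weights.values())  # normalization constant
    return {aa: w / z for aa, w in weights.items()}


# Made-up effective energies for three amino acids at one position.
energies = {"A": 0.0, "L": -1.0, "K": 2.0}
probs = site_probabilities(energies)

# With beta = 0 every amino acid is equally likely.
uniform = site_probabilities(energies, beta=0.0)
```

Lower-energy residues receive higher probability, and the `beta` parameter controls how sharply the distribution concentrates on them. Probabilities of this kind could then be compared with residue frequencies observed in variant sequences, as the abstract describes for protein L.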
Computational studies of sequence-specific driving forces in peptide self-assembly
NASA Astrophysics Data System (ADS)
Jeon, Joohyun
Peptides are biopolymers made from various sequences of twenty different types of amino acids, connected by peptide bonds. There are practically an infinite number of possible sequences and tremendous possible combinations of peptide-peptide interactions. Recently, an increasing number of studies have shown a stark variety of peptide self-assembled nanomaterials whose detailed structures depend on their sequences and environmental factors; these have end uses in medical and bio-electronic applications, for example. To understand the underlying physics of complex peptide self-assembly processes and to delineate sequence specific effects, in this study, I use various simulation tools spanning all-atom molecular dynamics to simple lattice models and quantify the balance of interactions in the peptide self-assembly processes. In contrast to the existing view that peptides' aggregation propensities are proportional to the net sequence hydrophobicity and inversely proportional to the net charge, I show the more nuanced effects of electrostatic interactions, including the cooperative effects between hydrophobic and electrostatic interactions. Notably, I suggest rather unexpected, yet important roles of entropies in the small scale oligomerization processes. Overall, this study broadens our understanding of the role of thermodynamic driving forces in peptide self-assembly.
Secondary structure prediction and structure-specific sequence analysis of single-stranded DNA.
Dong, F; Allawi, H T; Anderson, T; Neri, B P; Lyamichev, V I
2001-08-01
DNA sequence analysis by oligonucleotide binding is often affected by interference with the secondary structure of the target DNA. Here we describe an approach that improves DNA secondary structure prediction by combining enzymatic probing of DNA by structure-specific 5'-nucleases with an energy minimization algorithm that utilizes the 5'-nuclease cleavage sites as constraints. The method can identify structural differences between two DNA molecules caused by minor sequence variations such as a single nucleotide mutation. It also demonstrates the existence of long-range interactions between DNA regions separated by >300 nt and the formation of multiple alternative structures by a 244 nt DNA molecule. The differences in the secondary structure of DNA molecules revealed by 5'-nuclease probing were used to design structure-specific probes for mutation discrimination that target the regions of structural, rather than sequence, differences. We also demonstrate the performance of structure-specific 'bridge' probes complementary to non-contiguous regions of the target molecule. The structure-specific probes do not require the high stringency binding conditions necessary for methods based on mismatch formation and permit mutation detection at temperatures from 4 to 37 degrees C. Structure-specific sequence analysis is applied for mutation detection in the Mycobacterium tuberculosis katG gene and for genotyping of the hepatitis C virus.
Baltoumas, Fotis A; Theodoropoulou, Margarita C; Hamodrakas, Stavros J
2013-06-01
G-protein coupled receptors (GPCRs) are one of the largest families of membrane receptors in eukaryotes. Heterotrimeric G-proteins, composed of α, β and γ subunits, are important molecular switches in the mediation of GPCR signaling. Receptor stimulation after the binding of a suitable ligand leads to G-protein heterotrimer activation and dissociation into the Gα subunit and Gβγ heterodimer. These subunits then interact with a large number of effectors, leading to several cell responses. We studied the interactions between Gα subunits and their binding partners, using information from structural, mutagenesis and bioinformatics studies, and conducted a series of comparisons of sequence, structure, electrostatic properties and intermolecular energies among different Gα families and subfamilies. We identified a number of Gα surfaces that may, on several occasions, participate in interactions with receptors as well as effectors. The study of Gα interacting surfaces in terms of sequence, structure and electrostatic potential reveals features that may account for the Gα subunit's behavior towards its interacting partners. The electrostatic properties of the Gα subunits, which in some cases differ greatly not only between families but also between subfamilies, as well as the G-protein interacting surfaces of effectors and regulators of G-protein signaling (RGS) suggest that electrostatic complementarity may be an important factor in G-protein interactions. Energy calculations also support this notion. This information may be useful in future studies of G-protein interactions with GPCRs and effectors.
Petrov, Artem; Arzhanik, Vladimir; Makarov, Gennady; Koliasnikov, Oleg
2016-08-01
Antibodies are a family of proteins responsible for antigen recognition. Computational modeling of the interaction between an antigen and an antibody is very important when a crystallographic structure is unavailable. In this research, we have discovered a correlation between the amino acid sequence of an antibody and its specific binding characteristics, using the example of a novel conserved binding motif that consists of four residues: Arg H52, Tyr H33, Thr H59, and Glu H61. These residues are specifically oriented in the binding site and interact with each other in a specific manner. The residues of the binding motif interact strictly with negatively charged groups of antigens and form a binding complex. The mechanism of interaction and the characteristics of the complex were also determined. The results of this research can be used to increase the accuracy of computational antibody-antigen interaction modeling and for post-modeling quality control of the modeled structures.
Tang, Huiwu; Zheng, Xingmei; Li, Chuliang; Xie, Xianrong; Chen, Yuanling; Chen, Letian; Zhao, Xiucai; Zheng, Huiqi; Zhou, Jiajian; Ye, Shan; Guo, Jingxin; Liu, Yao-Guang
2017-01-01
New gene origination is a major source of genomic innovations that confer phenotypic changes and biological diversity. Generation of new mitochondrial genes in plants may cause cytoplasmic male sterility (CMS), which can promote outcrossing and increase fitness. However, how mitochondrial genes originate and evolve in structure and function remains unclear. The rice Wild Abortive type of CMS is conferred by the mitochondrial gene WA352c (previously named WA352) and has been widely exploited in hybrid rice breeding. Here, we reconstruct the evolutionary trajectory of WA352c by the identification and analyses of 11 mitochondrial genomic recombinant structures related to WA352c in wild and cultivated rice. We deduce that these structures arose through multiple rearrangements among conserved mitochondrial sequences in the mitochondrial genome of the wild rice Oryza rufipogon, coupled with substoichiometric shifting and sequence variation. We identify two expressed but nonfunctional protogenes among these structures, and show that they could evolve into functional CMS genes via sequence variations that could relieve the self-inhibitory potential of the proteins. These sequence changes would endow the proteins with the ability to interact with the nucleus-encoded mitochondrial protein COX11, resulting in premature programmed cell death in the anther tapetum and male sterility. Furthermore, we show that the sequences that encode the COX11-interaction domains in these WA352c-related genes have experienced purifying selection during evolution. We propose a model for the formation and evolution of new CMS genes via a “multi-recombination/protogene formation/functionalization” mechanism involving gradual variations in the structure, sequence, copy number, and function. PMID:27725674
Nagesh, Narayana; Krishnaiah, Abburi
2003-07-31
DNA from the telomeres contains a stretch of simple tandemly repeated sequences in which clusters of G residues alternate with clusters of T/A sequences along one DNA strand. Model telomeric G-clusters form four-stranded structures in the presence of Na(I), K(I) and NH(4)(I) ions. Electrophoretic and spectroscopic studies were made with the telomere-related sequences d(T6G16) or d(G4T2G4T2G4T2G4). It was noticed earlier that a G-quadruplex may be inter-molecular, intra-molecular, or a mixture of both. The CD spectral characteristics of various G-quadruplex DNAs suggest that the CD maximum at 293 nm corresponds to that of an intra-molecular G-quadruplex structure or hairpin dimers. Fluorescence titration studies also show that acridine and the bis-acridine interact with G-quadruplex DNA and destabilize the K(I)-quadruplex structure more efficiently than the quadruplex formed by NH(4)(I) ion. Of the two drugs studied, acridine is more capable of breaking the G-quadruplex structure than bis-acridine. This result is further confirmed by the CD experiments.
The molecular mechanism for interaction of ceruloplasmin and myeloperoxidase
NASA Astrophysics Data System (ADS)
Bakhautdin, Bakytzhan; Bakhautdin, Esen Göksöy
2016-04-01
Ceruloplasmin (Cp) is a copper-containing ferroxidase with potent antioxidant activity. Cp is expressed by hepatocytes and activated macrophages and has been known as a physiologic inhibitor of myeloperoxidase (MPO). Enzymatic activity of MPO produces anti-microbial agents and strong prooxidants such as hypochlorous acid and has the potential to damage host tissue at the sites of inflammation and infection. Thus the Cp-MPO interaction and inhibition of MPO have previously been suggested as an important control mechanism of excessive MPO activity. Our aim in this study was to identify the minimal Cp domain or peptide that interacts with MPO. We first confirmed the Cp-MPO interaction by ELISA and surface plasmon resonance (SPR). SPR analysis of the interaction yielded 30 nM affinity between Cp and MPO. We then designed and synthesized 87 overlapping peptides spanning the entire amino acid sequence of Cp. Each of the peptides was tested for binding to MPO by direct binding ELISA. Two of the 87 peptides, P18 and P76, strongly interacted with MPO. Amino acid sequence analysis of the identified peptides revealed high sequence and structural homology between them. Further analysis of Cp's crystal structure with PyMOL software revealed that both peptides represent surface-exposed sites of Cp and face nearly the same direction. To confirm our finding we raised anti-P18 antiserum in rabbit and demonstrated that this antiserum disrupts Cp-MPO binding and rescues MPO activity. Collectively, our results confirm the Cp-MPO interaction and identify two nearly identical sites on Cp that specifically bind MPO. We propose that inhibition of MPO by Cp requires two nearly identical sites on Cp to bind homodimeric MPO simultaneously and at an angle of at least 120 degrees, which, in turn, exerts tension on MPO and results in conformational change.
Estrogen Receptor Folding Modulates cSrc Kinase SH2 Interaction via a Helical Binding Mode.
Nieto, Lidia; Tharun, Inga M; Balk, Mark; Wienk, Hans; Boelens, Rolf; Ottmann, Christian; Milroy, Lech-Gustav; Brunsveld, Luc
2015-11-20
The estrogen receptors (ERs) feature, next to their transcriptional role, important nongenomic signaling actions, with emerging clinical relevance. The Src Homology 2 (SH2) domain mediated interaction between cSrc kinase and ER plays a key role in this; however the molecular determinants of this interaction have not been elucidated. Here, we used phosphorylated ER peptide and semisynthetic protein constructs in a combined biochemical and structural study to, for the first time, provide a quantitative and structural characterization of the cSrc SH2-ER interaction. Fluorescence polarization experiments delineated the SH2 binding motif in the ER sequence. Chemical shift perturbation analysis by nuclear magnetic resonance (NMR) together with molecular dynamics (MD) simulations allowed us to put forward a 3D model of the ER-SH2 interaction. The structural basis of this protein-protein interaction has been compared with that of the high affinity SH2 binding sequence GpYEEI. The ER features a different binding mode from that of the "two-pronged plug two-hole socket" model in the so-called specificity determining region. This alternative binding mode is modulated via the folding of ER helix 12, a structural element directly C-terminal of the key phosphorylated tyrosine. The present findings provide novel molecular entries for understanding nongenomic ER signaling and targeting the corresponding disease states.
Ford, K G; Neidle, S
1995-06-01
The interactions of several porphyrins with a 74 base-pair DNA sequence have been examined by footprinting and chemical protection methods. Tetra-(4-N-methyl-(pyridyl)) porphyrin (TMPy), two of its metal complexes and tetra-(4-trimethylanilinium) porphyrin (TMAP) bind to closely similar AT-rich sequences. The three TMPy ligands produce modest changes in DNA structure and base accessibility on binding, in contrast to the large-scale conformational changes observed with TMAP. Molecular modelling studies have been performed on TMPy and TMAP bound in the AT-rich minor groove of an oligonucleotide. These have shown that significant structural change is needed to accommodate the bulky trimethyl substituent groups of TMAP, in contrast to the facile minor groove fit of TMPy.
[NMR structure and dynamics of the chimeric protein SH3-F2].
Kutyshenko, V P; Gushchina, L V; Khristoforov, V S; Prokhorov, D A; Timchenko, M A; Kudrevatykh, Iu A; Fediukina, D V; Filimonov, V V
2010-01-01
To further elucidate the structural and dynamic principles of protein self-organization and protein-ligand interactions, a new chimeric protein, SH3-F2, was designed and a genetically engineered construct was created. The SH3-F2 amino acid sequence consists of the polyproline ligand mgAPPLPPYSA, a GG linker and the sequence of the spectrin SH3 domain circular permutant S19-P20s. The structural and dynamic properties of the protein were studied by high-resolution NMR. According to the NMR data, the tertiary structure of the chimeric protein SH3-F2 has the topology typical of SH3 domains in complex with a ligand, which forms a polyproline type II helix located in the conserved binding region in orientation II. The polyproline ligand closely adjoins the protein globule and is stabilized by hydrophobic interactions. However, the interaction between the ligand and the part of the globule corresponding to the SH3 domain is not very strong, because analysis of the protein's dynamic characteristics points to low-amplitude, high-frequency tumbling of the ligand relative to the slow intramolecular motions of the main globule. The constructed chimera permits further structural and thermodynamic investigations of the properties of the polyproline helix and its interaction with regulatory domains.
Molecular Architecture of Full-length TRF1 Favors Its Interaction with DNA.
Boskovic, Jasminka; Martinez-Gago, Jaime; Mendez-Pertuz, Marinela; Buscato, Alberto; Martinez-Torrecuadrada, Jorge Luis; Blasco, Maria A
2016-10-07
Telomeres are specific DNA-protein structures found at both ends of eukaryotic chromosomes that protect the genome from degradation and from being recognized as double-stranded breaks. In vertebrates, telomeres are composed of tandem repeats of the TTAGGG sequence that are bound by a six-subunit complex called shelterin. Molecular mechanisms of telomere functions remain unknown in large part due to lack of structural data on shelterins, shelterin complex, and its interaction with the telomeric DNA repeats. TRF1 is one of the best studied shelterin components; however, the molecular architecture of the full-length protein remains unknown. We have used single-particle electron microscopy to elucidate the structure of TRF1 and its interaction with telomeric DNA sequence. Our results demonstrate that full-length TRF1 presents a molecular architecture that assists its interaction with telomeric DNA and at the same time makes TRFH domains accessible to other TRF1 binding partners. Furthermore, our studies suggest hypothetical models on how other proteins such as TIN2 and tankyrase contribute to regulating TRF1 function. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.
Molecular Architecture of Full-length TRF1 Favors Its Interaction with DNA*
Boskovic, Jasminka; Martinez-Gago, Jaime; Mendez-Pertuz, Marinela; Buscato, Alberto; Martinez-Torrecuadrada, Jorge Luis; Blasco, Maria A.
2016-01-01
Telomeres are specific DNA-protein structures found at both ends of eukaryotic chromosomes that protect the genome from degradation and from being recognized as double-stranded breaks. In vertebrates, telomeres are composed of tandem repeats of the TTAGGG sequence that are bound by a six-subunit complex called shelterin. Molecular mechanisms of telomere functions remain unknown in large part due to lack of structural data on shelterins, shelterin complex, and its interaction with the telomeric DNA repeats. TRF1 is one of the best studied shelterin components; however, the molecular architecture of the full-length protein remains unknown. We have used single-particle electron microscopy to elucidate the structure of TRF1 and its interaction with telomeric DNA sequence. Our results demonstrate that full-length TRF1 presents a molecular architecture that assists its interaction with telomeric DNA and at the same time makes TRFH domains accessible to other TRF1 binding partners. Furthermore, our studies suggest hypothetical models on how other proteins such as TIN2 and tankyrase contribute to regulating TRF1 function. PMID:27563064
Structure-related statistical singularities along protein sequences: a correlation study.
Colafranceschi, Mauro; Colosimo, Alfredo; Zbilut, Joseph P; Uversky, Vladimir N; Giuliani, Alessandro
2005-01-01
A data set composed of 1141 proteins representative of all eukaryotic protein sequences in the Swiss-Prot Protein Knowledge base was coded by seven physicochemical properties of amino acid residues. The resulting numerical profiles were submitted to correlation analysis after the application of a linear (simple mean) and a nonlinear (Recurrence Quantification Analysis, RQA) filter. The main RQA variables, Recurrence and Determinism, were subsequently analyzed by Principal Component Analysis. The RQA descriptors showed that (i) within protein sequences is embedded specific information neither present in the codes nor in the amino acid composition and (ii) the most sensitive code for detecting ordered recurrent (deterministic) patterns of residues in protein sequences is the Miyazawa-Jernigan hydrophobicity scale. The most deterministic proteins in terms of autocorrelation properties of primary structures were found (i) to be involved in protein-protein and protein-DNA interactions and (ii) to display a significantly higher proportion of structural disorder with respect to the average data set. A study of the scaling behavior of the average determinism with the setting parameters of RQA (embedding dimension and radius) allows for the identification of patterns of minimal length (six residues) as possible markers of zones specifically prone to inter- and intramolecular interactions.
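As a rough illustration of the two RQA variables named in this abstract, the following Python sketch computes percent recurrence and percent determinism for a numeric profile, such as a sequence coded by one physicochemical property. It assumes a delay of 1, Euclidean distance between embedded windows, and a minimum diagonal line length of 2. The function names and default settings are hypothetical; the abstract does not give the settings used in the study.

```python
import math


def embed(series, dim):
    """Overlapping windows of length `dim` (delay-1 embedding)."""
    return [tuple(series[i:i + dim]) for i in range(len(series) - dim + 1)]


def rqa(series, dim=2, radius=0.5, min_line=2):
    """Return (percent recurrence, percent determinism) for a numeric profile.

    Two embedded windows are 'recurrent' when their Euclidean distance is
    at most `radius`. Determinism is the share of recurrent points that lie
    on diagonal lines of at least `min_line` consecutive points.
    """
    vectors = embed(series, dim)
    n = len(vectors)
    recurrent = 0   # recurrent points above the main diagonal
    in_lines = 0    # recurrent points that belong to long enough diagonals
    for offset in range(1, n):          # walk each upper diagonal
        run = 0
        for i in range(n - offset):
            if math.dist(vectors[i], vectors[i + offset]) <= radius:
                recurrent += 1
                run += 1
            else:
                if run >= min_line:
                    in_lines += run
                run = 0
        if run >= min_line:             # close a run that reaches the edge
            in_lines += run
    pairs = n * (n - 1) // 2
    recurrence = 100.0 * recurrent / pairs if pairs else 0.0
    determinism = 100.0 * in_lines / recurrent if recurrent else 0.0
    return recurrence, determinism


# Toy profile (made-up values): a strictly alternating pattern.
toy_profile = [1, 2, 1, 2, 1, 2, 1, 2]
toy_recurrence, toy_determinism = rqa(toy_profile, dim=2, radius=0.0)
```

In this formulation the embedding dimension and radius are the two setting parameters whose effect on average determinism the abstract describes; varying them in a loop over `rqa` would reproduce that kind of scaling study on a small scale.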
Miller, Thomas F.
2017-01-01
We present a coarse-grained simulation model that is capable of simulating the minute-timescale dynamics of protein translocation and membrane integration via the Sec translocon, while retaining sufficient chemical and structural detail to capture many of the sequence-specific interactions that drive these processes. The model includes accurate geometric representations of the ribosome and Sec translocon, obtained directly from experimental structures, and interactions parameterized from nearly 200 μs of residue-based coarse-grained molecular dynamics simulations. A protocol for mapping amino-acid sequences to coarse-grained beads enables the direct simulation of trajectories for the co-translational insertion of arbitrary polypeptide sequences into the Sec translocon. The model reproduces experimentally observed features of membrane protein integration, including the efficiency with which polypeptide domains integrate into the membrane, the variation in integration efficiency upon single amino-acid mutations, and the orientation of transmembrane domains. The central advantage of the model is that it connects sequence-level protein features to biological observables and timescales, enabling direct simulation for the mechanistic analysis of co-translational integration and for the engineering of membrane proteins with enhanced membrane integration efficiency. PMID:28328943
Mashiyama, Susan T.; Koupparis, Kyriacos; Caffrey, Conor R.; McKerrow, James H.; Babbitt, Patricia C.
2012-01-01
We performed a genome-level computational study of sequence and structure similarity, the latter using crystal structures and models, of the proteases of Homo sapiens and the human parasite Trypanosoma brucei. Using sequence and structure similarity networks to summarize the results, we constructed global views that show visually the relative abundance and variety of proteases in the degradome landscapes of these two species, and provide insights into evolutionary relationships between proteases. The results also indicate how broadly these sequence sets are covered by three-dimensional structures. These views facilitate cross-species comparisons and offer clues for drug design from knowledge about the sequences and structures of potential drug targets and their homologs. Two protease groups (“M32” and “C51”) that are very different in sequence from human proteases are examined in structural detail, illustrating the application of this global approach in mining new pathogen genomes for potential drug targets. Based on our analyses, a human ACE2 inhibitor was selected for experimental testing on one of these parasite proteases, TbM32, and was shown to inhibit it. These sequence and structure data, along with interactive versions of the protein similarity networks generated in this study, are available at http://babbittlab.ucsf.edu/resources.html. PMID:23236535
NASA Astrophysics Data System (ADS)
Champeimont, Raphaël; Laine, Elodie; Hu, Shuang-Wei; Penin, Francois; Carbone, Alessandra
2016-05-01
A novel computational approach of coevolution analysis allowed us to reconstruct the protein-protein interaction network of the Hepatitis C Virus (HCV) at the residue resolution. For the first time, coevolution analysis of an entire viral genome was realized, based on a limited set of protein sequences with high sequence identity within genotypes. The identified coevolving residues constitute highly relevant predictions of protein-protein interactions for further experimental identification of HCV protein complexes. The method can be used to analyse other viral genomes and to predict the associated protein interaction networks.
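The abstract does not describe how coevolving residues were scored. As a loose illustration of the general idea only, the sketch below uses mutual information between two alignment columns, a classic baseline that is not the authors' method. The function name and the toy alignments are hypothetical.

```python
import math
from collections import Counter


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    `alignment` is a list of equal-length sequences. High values mean that
    knowing the residue at column i tells you a lot about column j, which
    is one simple signal of co-variation.
    """
    n = len(alignment)
    col_i = [seq[i] for seq in alignment]
    col_j = [seq[j] for seq in alignment]
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        p_a = freq_i[a] / n
        p_b = freq_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Toy alignments (made-up): columns 0 and 1 co-vary perfectly in the first,
# and vary independently in the second.
covarying = ["AC", "AC", "GT", "GT"]
independent = ["AC", "AT", "GC", "GT"]
mi_covarying = mutual_information(covarying, 0, 1)      # 1 bit
mi_independent = mutual_information(independent, 0, 1)  # 0 bits
```

Plain mutual information is known to be sensitive to phylogenetic bias and small sample sizes, which matters for sets of highly similar sequences like the ones the abstract mentions, so this baseline should be read as a conceptual sketch rather than a usable predictor.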
Type III restriction-modification enzymes: a historical perspective.
Rao, Desirazu N; Dryden, David T F; Bheemanaik, Shivakumara
2014-01-01
Restriction endonucleases interact with DNA at specific sites leading to cleavage of DNA. Bacterial DNA is protected from restriction endonuclease cleavage by modifying the DNA using a DNA methyltransferase. Based on their molecular structure, sequence recognition, cleavage position and cofactor requirements, restriction-modification (R-M) systems are classified into four groups. Type III R-M enzymes need to interact with two separate unmethylated DNA sequences in inversely repeated head-to-head orientations for efficient cleavage to occur at a defined location (25-27 bp downstream of one of the recognition sites). Like the Type I R-M enzymes, Type III R-M enzymes possess a sequence-specific ATPase activity for DNA cleavage. ATP hydrolysis is required for the long-distance communication between the sites before cleavage. Different models, based on 1D diffusion and/or 3D-DNA looping, exist to explain how the long-distance interaction between the two recognition sites takes place. Type III R-M systems are found in most sequenced bacteria. Genome sequencing of many pathogenic bacteria also shows the presence of a number of phase-variable Type III R-M systems, which play a role in virulence. A growing number of these enzymes are being subjected to biochemical and genetic studies, which, when combined with ongoing structural analyses, promise to provide details for mechanisms of DNA recognition and catalysis.
Hsing, Michael; Cherkasov, Artem
2008-06-25
Insertions and deletions (indels) represent a common type of sequence variations, which are less studied and pose many important biological questions. Recent research has shown that the presence of sizable indels in protein sequences may be indicative of protein essentiality and their role in protein interaction networks. Examples of utilization of indels for structure-based drug design have also been recently demonstrated. Nonetheless many structural and functional characteristics of indels remain less researched or unknown. We have created a web-based resource, Indel PDB, representing a structural database of insertions/deletions identified from the sequence alignments of highly similar proteins found in the Protein Data Bank (PDB). Indel PDB utilized large amounts of available structural information to characterize 1-, 2- and 3-dimensional features of indel sites. Indel PDB contains 117,266 non-redundant indel sites extracted from 11,294 indel-containing proteins. Unlike loop databases, Indel PDB features more indel sequences with secondary structures including alpha-helices and beta-sheets in addition to loops. The insertion fragments have been characterized by their sequences, lengths, locations, secondary structure composition, solvent accessibility, protein domain association and three dimensional structures. By utilizing the data available in Indel PDB, we have studied and presented here several sequence and structural features of indels. We anticipate that Indel PDB will not only enable future functional studies of indels, but will also assist protein modeling efforts and identification of indel-directed drug binding sites.
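A minimal sketch of how indel sites might be read off a pairwise alignment of two highly similar proteins is shown below. It assumes the alignment is already computed and that gaps are written as "-"; the function name, the returned tuple layout, and the toy sequences are hypothetical and do not reflect Indel PDB's actual pipeline.

```python
def find_indels(aligned_a, aligned_b, gap="-"):
    """List indel sites in a pairwise alignment.

    Returns tuples (start_column, length, kind, inserted_residues), where
    `kind` says which sequence carries the extra residues. Columns are
    0-based positions in the alignment, not in either ungapped sequence.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    indels = []
    n = len(aligned_a)
    col = 0
    while col < n:
        a_gap = aligned_a[col] == gap
        b_gap = aligned_b[col] == gap
        if a_gap != b_gap:
            # Extend over the whole run of columns with the same gap pattern.
            start = col
            while (col < n
                   and (aligned_a[col] == gap) == a_gap
                   and (aligned_b[col] == gap) == b_gap):
                col += 1
            if a_gap:
                kind, residues = "insertion_in_b", aligned_b[start:col]
            else:
                kind, residues = "insertion_in_a", aligned_a[start:col]
            indels.append((start, col - start, kind, residues))
        else:
            col += 1
    return indels


# Toy alignment (made-up sequences).
toy_indels = find_indels("MKT--LLV", "MKTAGL-V")
# -> [(3, 2, 'insertion_in_b', 'AG'), (6, 1, 'insertion_in_a', 'L')]
```

Features such as secondary structure or solvent accessibility of each site, as described in the abstract, would then be looked up from the corresponding structure using the mapped residue positions.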
Possenti, Andrea; Vendruscolo, Michele; Camilloni, Carlo; Tiana, Guido
2018-05-23
Proteins employ the information stored in the genetic code and translated into their sequences to carry out well-defined functions in the cellular environment. The possibility to encode for such functions is controlled by the balance between the amount of information supplied by the sequence and that left after the protein has folded into its structure. We study the amount of information necessary to specify the protein structure, providing an estimate that takes into account the thermodynamic properties of protein folding. We thus show that the information remaining in the protein sequence after encoding for its structure (the 'information gap') is very close to what is needed to encode for its function and interactions. Then, by predicting the information gap directly from the protein sequence, we show that it may be possible to use these insights from information theory to discriminate between ordered and disordered proteins, to identify unknown functions, and to optimize artificially-designed protein sequences. This article is protected by copyright. All rights reserved. © 2018 Wiley Periodicals, Inc.
Insight into the Structure of Amyloid Fibrils from the Analysis of Globular Proteins
Trovato, Antonio; Chiti, Fabrizio; Maritan, Amos; Seno, Flavio
2006-01-01
The conversion from soluble states into cross-β fibrillar aggregates is a property shared by many different proteins and peptides and was hence conjectured to be a generic feature of polypeptide chains. Increasing evidence is now accumulating that such fibrillar assemblies are generally characterized by a parallel in-register alignment of β-strands contributed by distinct protein molecules. Here we assume a universal mechanism is responsible for β-structure formation and deduce sequence-specific interaction energies between pairs of protein fragments from a statistical analysis of the native folds of globular proteins. The derived fragment–fragment interaction was implemented within a novel algorithm, prediction of amyloid structure aggregation (PASTA), to investigate the role of sequence heterogeneity in driving specific aggregation into ordered self-propagating cross-β structures. The algorithm predicts that the parallel in-register arrangement of sequence portions that participate in the fibril cross-β core is favoured in most cases. However, the antiparallel arrangement is correctly discriminated when present in fibrils formed by short peptides. The predictions of the most aggregation-prone portions of initially unfolded polypeptide chains are also in excellent agreement with available experimental observations. These results corroborate the recent hypothesis that the amyloid structure is stabilised by the same physicochemical determinants as those operating in folded proteins. They also suggest that side chain–side chain interaction across neighbouring β-strands is a key determinant of amyloid fibril formation and of their self-propagating ability. PMID:17173479
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bedford, Nicholas M.; Hughes, Zak E.; Tang, Zhenghua
Peptide-enabled nanoparticle (NP) synthesis routes can create and/or assemble functional nanomaterials under environmentally friendly conditions, with properties dictated by complex interactions at the biotic/abiotic interface. Manipulation of this interface through sequence modification can provide the capability for material properties to be tailored to create enhanced materials for energy, catalysis, and sensing applications. Fully realizing the potential of these materials requires a comprehensive understanding of sequence-dependent structure/function relationships that is presently lacking. In this work, the atomic-scale structures of a series of peptide-capped Au NPs are determined using a combination of atomic pair distribution function analysis of high-energy X-ray diffraction data and advanced molecular dynamics (MD) simulations. The Au NPs produced with different peptide sequences exhibit varying degrees of catalytic activity for the exemplar reaction 4-nitrophenol reduction. The experimentally derived atomic-scale NP configurations reveal sequence-dependent differences in structural order at the NP surface. Replica exchange with solute-tempering MD simulations are then used to predict the morphology of the peptide overlayer on these Au NPs and identify factors determining the structure/catalytic properties relationship. We show that the amount of exposed Au surface, the underlying surface structural disorder, and the interaction strength of the peptide with the Au surface all influence catalytic performance. A simplified computational prediction of catalytic performance is developed that can potentially serve as a screening tool for future studies. Our approach provides a platform for broadening the analysis of catalytic peptide-enabled metallic NP systems, potentially allowing for the development of rational design rules for property enhancement.
PASS2: an automated database of protein alignments organised as structural superfamilies.
Bhaduri, Anirban; Pugalenthi, Ganesan; Sowdhamini, Ramanathan
2004-04-02
The functional selection and three-dimensional structural constraints of proteins in nature often relate to the retention of significant sequence similarity between proteins of similar fold and function despite poor sequence identity. Organization of structure-based sequence alignments for distantly related proteins provides a map of the conserved and critical regions of the protein universe that is useful for the analysis of folding principles, for the evolutionary unification of protein families and for maximizing the information return from experimental structure determination. The Protein Alignment organised as Structural Superfamily (PASS2) database represents continuously updated, structural alignments for evolutionarily related, sequentially distant proteins. An automated and updated version of PASS2 is in direct correspondence with SCOP 1.63 and consists of sequences having identity below 40% among themselves. Protein domains have been grouped into 628 multi-member superfamilies and 566 single member superfamilies. Structure-based sequence alignments for the superfamilies have been obtained using COMPARER, while initial equivalencies have been derived from a preliminary superposition using LSQMAN or STAMP 4.0. The final sequence alignments have been annotated for structural features using JOY4.0. The database is supplemented with sequence relatives belonging to different genomes, conserved spatially interacting and structural motifs, probabilistic hidden Markov models of superfamilies based on the alignments and useful links to other databases. Probabilistic models and sensitive position specific profiles obtained from reliable superfamily alignments aid annotation of remote homologues and are useful tools in structural and functional genomics. PASS2 presents the phylogeny of its members both based on sequence and structural dissimilarities. Clustering of members allows us to understand diversification of the family members.
The search engine has been improved for simpler browsing of the database. The database resolves alignments among the structural domains consisting of an evolutionarily diverged set of sequences. Availability of reliable sequence alignments of distantly related proteins despite poor sequence identity and single-member superfamilies permit better sampling of structures in libraries for fold recognition of new sequences and for the understanding of protein structure-function relationships of individual superfamilies. PASS2 is accessible at http://www.ncbs.res.in/~faculty/mini/campass/pass2.html
Gold, Matthew G.; Fowler, Douglas M.; Means, Christopher K.; Pawson, Catherine T.; Stephany, Jason J.; Langeberg, Lorene K.; Fields, Stanley; Scott, John D.
2013-01-01
PKA is retained within distinct subcellular environments by the association of its regulatory type II (RII) subunits with A-kinase anchoring proteins (AKAPs). Conventional reagents that universally disrupt PKA anchoring are patterned after a conserved AKAP motif. We introduce a phage selection procedure that exploits high-resolution structural information to engineer RII mutants that are selective for a particular AKAP. Selective RII (RSelect) sequences were obtained for eight AKAPs following competitive selection screening. Biochemical and cell-based experiments validated the efficacy of RSelect proteins for AKAP2 and AKAP18. These engineered proteins represent a new class of reagents that can be used to dissect the contributions of different AKAP-targeted pools of PKA. Molecular modeling and high-throughput sequencing analyses revealed the molecular basis of AKAP-selective interactions and shed new light on native RII-AKAP interactions. We propose that this structure-directed evolution strategy might be generally applicable for the investigation of other protein interaction surfaces. PMID:23625929
Hu, Xihao; Wu, Yang; Lu, Zhi John; Yip, Kevin Y
2016-11-01
High-throughput sequencing has been used to study posttranscriptional regulations, where the identification of protein-RNA binding is a major and fast-developing sub-area, which is in turn benefited by the sequencing methods for whole-transcriptome probing of RNA secondary structures. In the study of RNA secondary structures using high-throughput sequencing, bases are modified or cleaved according to their structural features, which alter the resulting composition of sequencing reads. In the study of protein-RNA binding, methods have been proposed to immuno-precipitate (IP) protein-bound RNA transcripts in vitro or in vivo. By sequencing these transcripts, the protein-RNA interactions and the binding locations can be identified. For both types of data, read counts are affected by a combination of confounding factors, including expression levels of transcripts, sequence biases, mapping errors and the probing or IP efficiency of the experimental protocols. Careful processing of the sequencing data and proper extraction of important features are fundamentally important to a successful analysis. Here we review and compare different experimental methods for probing RNA secondary structures and binding sites of RNA-binding proteins (RBPs), and the computational methods proposed for analyzing the corresponding sequencing data. We suggest how these two types of data should be integrated to study the structural properties of RBP binding sites as a systematic way to better understand posttranscriptional regulations. © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
NMR and enzymology of modified DNA/protein interactions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kennedy, M.A.
1994-12-31
We have found distinct DNA structure and base dynamics precisely at the TpA cleavage site in the TTTAAA AHA III endonuclease restriction sequence. Hence, the unusual base stacking and mobility found in this sequence may be important to the mechanism of enzymatic cleavage of the phosphodiester bond.
THGS: a web-based database of Transmembrane Helices in Genome Sequences
Fernando, S. A.; Selvarani, P.; Das, Soma; Kumar, Ch. Kiran; Mondal, Sukanta; Ramakumar, S.; Sekar, K.
2004-01-01
Transmembrane Helices in Genome Sequences (THGS) is an interactive web-based database, developed to search the transmembrane helices in the user-interested gene sequences available in the Genome Database (GDB). The proposed database has provision to search sequence motifs in transmembrane and globular proteins. In addition, the motif can be searched in the other sequence databases (Swiss-Prot and PIR) or in the macromolecular structure database, Protein Data Bank (PDB). Further, the 3D structure of the corresponding queried motif, if it is available in the solved protein structures deposited in the Protein Data Bank, can also be visualized using the widely used graphics package RASMOL. All the sequence databases used in the present work are updated frequently and hence the results produced are up to date. The database THGS is freely available via the world wide web and can be accessed at http://pranag.physics.iisc.ernet.in/thgs/ or http://144.16.71.10/thgs/. PMID:14681375
Torque measurements reveal sequence-specific cooperative transitions in supercoiled DNA
Oberstrass, Florian C.; Fernandes, Louis E.; Bryant, Zev
2012-01-01
B-DNA becomes unstable under superhelical stress and is able to adopt a wide range of alternative conformations including strand-separated DNA and Z-DNA. Localized sequence-dependent structural transitions are important for the regulation of biological processes such as DNA replication and transcription. To directly probe the effect of sequence on structural transitions driven by torque, we have measured the torsional response of a panel of DNA sequences using single molecule assays that employ nanosphere rotational probes to achieve high torque resolution. The responses of Z-forming d(pGpC)n sequences match our predictions based on a theoretical treatment of cooperative transitions in helical polymers. “Bubble” templates containing 50–100 bp mismatch regions show cooperative structural transitions similar to B-DNA, although less torque is required to disrupt strand–strand interactions. Our mechanical measurements, including direct characterization of the torsional rigidity of strand-separated DNA, establish a framework for quantitative predictions of the complex torsional response of arbitrary sequences in their biological context. PMID:22474350
HomPPI: a class of sequence homology based protein-protein interface prediction methods
2011-01-01
Background Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate. Results We studied more than 300,000 pair-wise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence. Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i) NPS-HomPPI (Non partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (Partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein. Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC) of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers including those that exploit the structure of the query proteins. 
The partner-specific classifier, PS-HomPPI can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of both the query and the target can be reliably identified. The HomPPI web server is available at http://homppi.cs.iastate.edu/. Conclusions Sequence homology-based methods offer a class of computationally efficient and reliable approaches for predicting the protein-protein interface residues that participate in either obligate or transient interactions. For query proteins involved in transient interactions, the reliability of interface residue prediction can be improved by exploiting knowledge of putative interaction partners. PMID:21682895
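The residue-level scores quoted above (CC, sensitivity, specificity) can be illustrated with a small sketch. It assumes that CC denotes the Matthews correlation coefficient and that interface residues are encoded as 1 and all others as 0; neither convention is stated in the abstract. Because "specificity" is sometimes used in the interface-prediction literature to mean precision, the sketch reports both precision and the true-negative rate. The function name `interface_metrics` and the toy labels are hypothetical.

```python
import math


def interface_metrics(true_labels, predicted_labels):
    """Score per-residue interface predictions (1 = interface, 0 = not)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have equal length")

    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(numerator, denominator):
        # Undefined ratios (empty class) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),         # recall on interface residues
        "precision": ratio(tp, tp + fp),           # sometimes called specificity
        "true_negative_rate": ratio(tn, tn + fp),  # the other common specificity
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Toy example: eight residues, four of them true interface residues.
true_interface = [1, 1, 1, 0, 0, 0, 0, 1]
predicted_interface = [1, 1, 0, 0, 0, 1, 0, 1]
scores = interface_metrics(true_interface, predicted_interface)
```

Reporting both precision and the true-negative rate avoids guessing which definition a given benchmark used.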
Higgins, Matthew K; Carrington, Mark
2014-01-01
Trypanosoma and Plasmodium species are unicellular, eukaryotic pathogens that have evolved the capacity to survive and proliferate within a human host, causing sleeping sickness and malaria, respectively. They have very different survival strategies. African trypanosomes divide in blood and extracellular spaces, whereas Plasmodium species invade and proliferate within host cells. Interaction with host macromolecules is central to establishment and maintenance of an infection by both parasites. Proteins that mediate these interactions are under selection pressure to bind host ligands without compromising immune avoidance strategies. In both parasites, the expansion of genes encoding a small number of protein folds has established large protein families. This has permitted both diversification to form novel ligand binding sites and variation in sequence that contributes to avoidance of immune recognition. In this review we consider two such parasite surface protein families, one from each species. In each case, known structures demonstrate how extensive sequence variation around a conserved molecular architecture provides an adaptable protein scaffold that the parasites can mobilise to mediate interactions with their hosts. PMID:24442723
Predictive and comparative analysis of Ebolavirus proteins
Cong, Qian; Pei, Jimin; Grishin, Nick V
2015-01-01
Ebolavirus is the pathogen for Ebola Hemorrhagic Fever (EHF). This disease exhibits a high fatality rate and has recently reached a historically epidemic proportion in West Africa. Out of the 5 known Ebolavirus species, only Reston ebolavirus has lost human pathogenicity, while retaining the ability to cause EHF in long-tailed macaque. Significant efforts have been spent to determine the three-dimensional (3D) structures of Ebolavirus proteins, to study their interaction with host proteins, and to identify the functional motifs in these viral proteins. Here, in light of these experimental results, we apply computational analysis to predict the 3D structures and functional sites for Ebolavirus protein domains with unknown structure, including a zinc-finger domain of VP30, the RNA-dependent RNA polymerase catalytic domain and a methyltransferase domain of protein L. In addition, we compare sequences of proteins that interact with Ebolavirus proteins from RESTV-resistant primates with those from RESTV-susceptible monkeys. The host proteins that interact with GP and VP35 show an elevated level of sequence divergence between the RESTV-resistant and RESTV-susceptible species, suggesting that they may be responsible for host specificity. Meanwhile, we detect variable positions in protein sequences that are likely associated with the loss of human pathogenicity in RESTV, map them onto the 3D structures and compare their positions to known functional sites. VP35 and VP30 are significantly enriched in these potential pathogenicity determinants and the clustering of such positions on the surfaces of VP35 and GP suggests possible uncharacterized interaction sites with host proteins that contribute to the virulence of Ebolavirus. PMID:26158395
Interactions of DNA binding proteins with G-Quadruplex structures at the single molecule level
NASA Astrophysics Data System (ADS)
Ray, Sujay
Guanine-rich nucleic acid (DNA/RNA) sequences can form non-canonical secondary structures, known as G-quadruplex (GQ). Numerous in vivo and in vitro studies have demonstrated formation of these structures in telomeric and non-telomeric regions of the genome. Telomeric GQs protect the chromosome ends whereas non-telomeric GQs either act as road blocks or recognition sites for DNA metabolic machinery. These observations suggest the significance of these structures in regulation of different metabolic processes, such as replication and repair. GQs are typically thermodynamically more stable than the corresponding Watson-Crick base pairing formed by G-rich and C-rich strands, making protein activity a crucial factor for their destabilization. Inside the cell, GQs interact with different proteins and their enzymatic activity is the determining factor for their stability. We studied interactions of several proteins with GQs to understand the underlying principles of protein-GQ interactions using single-molecule FRET and other biophysical techniques. Replication Protein-A (RPA), a single stranded DNA (ssDNA) binding protein, is known to possess GQ unfolding activity. First, we compared the thermal stability of three potentially GQ-forming DNA sequences (PQS) to their stability against RPA-mediated unfolding. One of these sequences is the human telomeric repeat and the other two, located in the promoter region of tyrosine hydroxylase gene, are highly heterogeneous sequences that better represent PQS in the genome. The thermal stability of these structures does not necessarily correlate with their stability against protein-mediated unfolding. We conclude that thermal stability is not necessarily an adequate criterion for predicting the physiological viability of GQ structures. To determine the critical structural factors that influence protein-GQ interactions we studied two groups of GQ structures that have systematically varying loop lengths and number of G-tetrad layers. 
We observed a linear increase in the steady-state stability of the GQ against RPA-mediated unfolding with increasing number of layers or decreasing loop length. The stability demonstrated by different GQ structures varied by at least three orders of magnitude. Finally, we studied another protein-GQ system where a protein complex works synergistically with a GQ to suppress DNA damage signals by preventing RPA from binding to telomeric DNA. Human telomeres that terminate with a single-stranded 3' G-overhang can be recognized as a DNA damage site by RPA. The protection of telomere-1 (POT1) and POT1-interacting protein (TPP1) heterodimer binds specifically to telomeric DNA and protects it against RPA binding. Using model telomeric DNA, we studied the competition between POT1/TPP1 and RPA to access telomeric GQs in vitro. Under physiological salt and pH conditions, POT1/TPP1 stably loads onto a minimal DNA sequence adjacent to a folded GQ and unfolds the anti-parallel GQ while the parallel conformation remains folded. We showed that GQ formation of telomeres enhances the ability of POT1/TPP1 to block RPA's access to telomeres by two orders of magnitude and contributes to suppressing DNA damage signals.
Nair, Maya S; D'Mello, Samar; Pant, Rashmi; Poluri, Krishna Mohan
2017-05-01
Interactions of a natural stilbene compound, resveratrol with two DNA sequences containing AATT/TTAA segments have been studied. Resveratrol is found to interact with both the sequences. The mode of interaction has been studied using absorption, steady state fluorescence and circular dichroism spectroscopic techniques. UV-visible absorption and fluorescence studies provided the information regarding the binding constants and the stoichiometry of binding, whereas circular dichroism studies depicted the structural changes in DNA upon resveratrol binding. Our results evidenced that, though resveratrol showed similar affinity to both the sequences, the mode of interactions was different. The binding constants of resveratrol to AATT/TTAA sequences were found to be 7.55×10⁵ M⁻¹ and 5.42×10⁵ M⁻¹ respectively. Spectroscopic data evidenced for a groove binding interaction. Melting studies showed that the binding of resveratrol induces differential stability to the DNA sequences d(CGTTAACG)₂ and d(CGAATTCG)₂. Fluorescence data showed a stoichiometry of 1:1 for d(CGAATTCG)₂-resveratrol complex and 1:4 for d(CGTTAACG)₂-resveratrol complex. Molecular docking studies demonstrated that resveratrol binds to the minor groove region of both the sequences to form stable complexes with varied atomic contacts to the DNA bases or backbone. Both the complexes are stabilized by hydrogen bond formation. Our results evidenced that modulation of DNA sequence within the same bases can greatly alter the binding geometry and stability of the complex upon binding to small molecule inhibitor compounds like resveratrol. Copyright © 2017 Elsevier B.V. All rights reserved.
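To put the two binding constants reported above on an energy scale, an association constant can be converted to a standard binding free energy with ΔG° = −RT ln K. The sketch below is illustrative only: the temperature of 298.15 K is an assumption (the abstract does not state the measurement temperature), and the function name `binding_free_energy_kj` is hypothetical.

```python
import math

GAS_CONSTANT = 8.314462618  # J/(mol*K)


def binding_free_energy_kj(k_assoc_per_molar, temperature_k=298.15):
    """Return the standard binding free energy in kJ/mol from an association constant.

    Uses dG = -R * T * ln(K); the default temperature is an assumption.
    """
    if k_assoc_per_molar <= 0:
        raise ValueError("association constant must be positive")
    return -GAS_CONSTANT * temperature_k * math.log(k_assoc_per_molar) / 1000.0


# Binding constants quoted in the abstract for the AATT and TTAA sequences.
dg_aatt = binding_free_energy_kj(7.55e5)
dg_ttaa = binding_free_energy_kj(5.42e5)
difference = dg_ttaa - dg_aatt  # positive: the AATT complex is slightly more stable
```

Under this assumed temperature the two values differ by less than 1 kJ/mol, which is consistent with the abstract's statement that the affinities are similar.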
Liang, Yunyun; Liu, Sanyang; Zhang, Shengli
2015-01-01
Prediction of protein structural classes for low-similarity sequences is useful for understanding fold patterns, regulation, functions, and interactions of proteins. It is well known that feature extraction is significant to prediction of protein structural class and it mainly uses protein primary sequence, predicted secondary structure sequence, and position-specific scoring matrix (PSSM). Currently, prediction solely based on the PSSM has played a key role in improving the prediction accuracy. In this paper, we propose a novel method called CSP-SegPseP-SegACP by fusing consensus sequence (CS), segmented PsePSSM, and segmented autocovariance transformation (ACT) based on PSSM. Three widely used low-similarity datasets (1189, 25PDB, and 640) are adopted in this paper. Then a 700-dimensional (700D) feature vector is constructed and the dimension is decreased to 224D by using principal component analysis (PCA). To verify the performance of our method, rigorous jackknife cross-validation tests are performed on 1189, 25PDB, and 640 datasets. Comparison of our results with the existing PSSM-based methods demonstrates that our method achieves the favorable and competitive performance. This will offer an important complementary to other PSSM-based methods for prediction of protein structural classes for low-similarity sequences.
Helix-packing motifs in membrane proteins.
Walters, R F S; DeGrado, W F
2006-09-12
The fold of a helical membrane protein is largely determined by interactions between membrane-imbedded helices. To elucidate recurring helix-helix interaction motifs, we dissected the crystallographic structures of membrane proteins into a library of interacting helical pairs. The pairs were clustered according to their three-dimensional similarity (rmsd = 1.5 Å), allowing 90% of the library to be assigned to clusters consisting of at least five members. Surprisingly, three quarters of the helical pairs belong to one of five tightly clustered motifs whose structural features can be understood in terms of simple principles of helix-helix packing. Thus, the universe of common transmembrane helix-pairing motifs is relatively simple. The largest cluster, which comprises 29% of the library members, consists of an antiparallel motif with left-handed packing angles, and it is frequently stabilized by packing of small side chains occurring every seven residues in the sequence. Right-handed parallel and antiparallel structures show a similar tendency to segregate small residues to the helix-helix interface but spaced at four-residue intervals. Position-specific sequence propensities were derived for the most populated motifs. These structural and sequential motifs should be quite useful for the design and structural prediction of membrane proteins.
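The spacing rules described above (small side chains every seven residues for left-handed packing, every four residues for right-handed packing) can be turned into a simple sequence scan. This is a minimal sketch, not the authors' method: the choice of Gly, Ala and Ser as "small" residues, the function name `spaced_small_pairs`, and the example helix fragment are all assumptions made for illustration.

```python
SMALL_RESIDUES = frozenset("GAS")  # assumed definition of "small" side chains


def spaced_small_pairs(sequence, spacing):
    """Return 0-based index pairs (i, i + spacing) where both residues are small."""
    if spacing <= 0:
        raise ValueError("spacing must be a positive integer")
    sequence = sequence.upper()
    return [
        (i, i + spacing)
        for i in range(len(sequence) - spacing)
        if sequence[i] in SMALL_RESIDUES and sequence[i + spacing] in SMALL_RESIDUES
    ]


# Hypothetical transmembrane fragment used only to exercise the scan.
helix = "LIIFGVMAGVIGT"
four_residue_pairs = spaced_small_pairs(helix, 4)   # right-handed style spacing
seven_residue_pairs = spaced_small_pairs(helix, 7)  # left-handed style spacing
```

A scan like this only flags candidate positions; whether they actually sit at a helix-helix interface depends on the three-dimensional structure.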
Jia, Min; Li, Jianchao; Zhu, Jinwei; Wen, Wenyu; Zhang, Mingjie; Wang, Wenning
2012-01-01
GoLoco (GL) motif-containing proteins regulate G protein signaling by binding to Gα subunit and acting as guanine nucleotide dissociation inhibitors. GLs of LGN are also known to bind the GDP form of Gαi/o during asymmetric cell division. Here, we show that the C-terminal GL domain of LGN binds four molecules of Gαi·GDP. The crystal structures of Gαi·GDP in complex with LGN GL3 and GL4, respectively, reveal distinct GL/Gαi interaction features when compared with the only high resolution structure known with GL/Gαi interaction between RGS14 and Gαi1. Only a few residues C-terminal to the conserved GL sequence are required for LGN GLs to bind to Gαi·GDP. A highly conserved “double Arg finger” sequence (RΨ(D/E)(D/E)QR) is responsible for LGN GL to bind to GDP bound to Gαi. Together with the sequence alignment, we suggest that the LGN GL/Gαi interaction represents a general binding mode between GL motifs and Gαi. We also show that LGN GLs are potent guanine nucleotide dissociation inhibitors. PMID:22952234
Trevino, R J; Gliubich, F; Berni, R; Cianci, M; Chirgwin, J M; Zanotti, G; Horowitz, P M
1999-05-14
The NH2-terminal sequence of rhodanese influences many of its properties, ranging from mitochondrial import to folding. Rhodanese truncated by >9 residues is degraded in Escherichia coli. Mutant enzymes with lesser truncations are recoverable and active, but they show altered active site reactivities (Trevino, R. J., Tsalkova, T., Dramer, G., Hardesty, B., Chirgwin, J. M., and Horowitz, P. M. (1998) J. Biol. Chem. 273, 27841-27847), suggesting that the NH2-terminal sequence stabilizes the overall structure. We tested aspects of the conformations of these shortened species. Intrinsic and probe fluorescence showed that truncation decreased stability and increased hydrophobic exposure, while near UV CD suggested altered tertiary structure. Under native conditions, truncated rhodanese bound to GroEL and was released and reactivated by adding ATP and GroES, suggesting equilibrium between native and non-native conformers. Furthermore, GroEL assisted folding of denatured mutants to the same extent as wild type, although at a reduced rate. X-ray crystallography showed that Delta1-7 crystallized isomorphously with wild type in polyethyleneglycol, and the structure was highly conserved. Thus, the missing NH2-terminal residues that contribute to global stability of the native structure in solution do not significantly alter contacts at the atomic level of the crystallized protein. The two-domain structure of rhodanese was not significantly altered by drastically different crystallization conditions or crystal packing suggesting rigidity of the native rhodanese domains and the stabilization of the interdomain interactions by the crystal environment. The results support a model in which loss of interactions near the rhodanese NH2 terminus does not distort the folded native structure but does facilitate the transition in solution to a molten globule state, which among other things, can interact with molecular chaperones.
Biophysical and structural considerations for protein sequence evolution
2011-01-01
Background Protein sequence evolution is constrained by the biophysics of folding and function, causing interdependence between interacting sites in the sequence. However, current site-independent models of sequence evolutions do not take this into account. Recent attempts to integrate the influence of structure and biophysics into phylogenetic models via statistical/informational approaches have not resulted in expected improvements in model performance. This suggests that further innovations are needed for progress in this field. Results Here we develop a coarse-grained physics-based model of protein folding and binding function, and compare it to a popular informational model. We find that both models violate the assumption of the native sequence being close to a thermodynamic optimum, causing directional selection away from the native state. Sampling and simulation show that the physics-based model is more specific for fold-defining interactions that vary less among residue type. The informational model diffuses further in sequence space with fewer barriers and tends to provide less support for an invariant sites model, although amino acid substitutions are generally conservative. Both approaches produce sequences with natural features like dN/dS < 1 and gamma-distributed rates across sites. Conclusions Simple coarse-grained models of protein folding can describe some natural features of evolving proteins but are currently not accurate enough to use in evolutionary inference. This is partly due to improper packing of the hydrophobic core. We suggest possible improvements on the representation of structure, folding energy, and binding function, as regards both native and non-native conformations, and describe a large number of possible applications for such a model. PMID:22171550
Multi-level machine learning prediction of protein-protein interactions in Saccharomyces cerevisiae.
Zubek, Julian; Tatjewski, Marcin; Boniecki, Adam; Mnich, Maciej; Basu, Subhadip; Plewczynski, Dariusz
2015-01-01
Accurate identification of protein-protein interactions (PPI) is the key step in understanding proteins' biological functions, which are typically context-dependent. Many existing PPI predictors rely on aggregated features from protein sequences, however only a few methods exploit local information about specific residue contacts. In this work we present a two-stage machine learning approach for prediction of protein-protein interactions. We start with the carefully filtered data on protein complexes available for Saccharomyces cerevisiae in the Protein Data Bank (PDB) database. First, we build linear descriptions of interacting and non-interacting sequence segment pairs based on their inter-residue distances. Secondly, we train machine learning classifiers to predict binary segment interactions for any two short sequence fragments. The final prediction of the protein-protein interaction is done using the 2D matrix representation of all-against-all possible interacting sequence segments of both analysed proteins. The level-I predictor achieves 0.88 AUC for micro-scale, i.e., residue-level prediction. The level-II predictor improves the results further by a more complex learning paradigm. We perform 30-fold macro-scale, i.e., protein-level cross-validation experiment. The level-II predictor using PSIPRED-predicted secondary structure reaches 0.70 precision, 0.68 recall, and 0.70 AUC, whereas other popular methods provide results below 0.6 threshold (recall, precision, AUC). Our results demonstrate that multi-scale sequence features aggregation procedure is able to improve the machine learning results by more than 10% as compared to other sequence representations. Prepared datasets and source code for our experimental pipeline are freely available for download from: http://zubekj.github.io/mlppi/ (open source Python implementation, OS independent).
CaMELS: In silico prediction of calmodulin binding proteins and their binding sites.
Abbasi, Wajid Arshad; Asif, Amina; Andleeb, Saiqa; Minhas, Fayyaz Ul Amir Afsar
2017-09-01
Due to Ca²⁺-dependent binding and the sequence diversity of Calmodulin (CaM) binding proteins, identifying CaM interactions and binding sites in the wet-lab is tedious and costly. Therefore, computational methods for this purpose are crucial to the design of such wet-lab experiments. We present an algorithm suite called CaMELS (CalModulin intEraction Learning System) for predicting proteins that interact with CaM as well as their binding sites using sequence information alone. CaMELS offers state of the art accuracy for both CaM interaction and binding site prediction and can aid biologists in studying CaM binding proteins. For CaM interaction prediction, CaMELS uses protein sequence features coupled with a large-margin classifier. CaMELS models the binding site prediction problem using multiple instance machine learning with a custom optimization algorithm which allows more effective learning over imprecisely annotated CaM-binding sites during training. CaMELS has been extensively benchmarked using a variety of data sets, mutagenic studies, proteome-wide Gene Ontology enrichment analyses and protein structures. Our experiments indicate that CaMELS outperforms simple motif-based search and other existing methods for interaction and binding site prediction. We have also found that the whole sequence of a protein, rather than just its binding site, is important for predicting its interaction with CaM. Using the machine learning model in CaMELS, we have identified important features of protein sequences for CaM interaction prediction as well as characteristic amino acid sub-sequences and their relative position for identifying CaM binding sites. Python code for training and evaluating CaMELS together with a webserver implementation is available at the URL: http://faculty.pieas.edu.pk/fayyaz/software.html#camels. © 2017 Wiley Periodicals, Inc.
MutaBind estimates and interprets the effects of sequence variants on protein-protein interactions.
Li, Minghui; Simonetti, Franco L; Goncearenco, Alexander; Panchenko, Anna R
2016-07-08
Proteins engage in highly selective interactions with their macromolecular partners. Sequence variants that alter protein binding affinity may cause significant perturbations or complete abolishment of function, potentially leading to diseases. There exists a persistent need to develop a mechanistic understanding of impacts of variants on proteins. To address this need we introduce a new computational method MutaBind to evaluate the effects of sequence variants and disease mutations on protein interactions and calculate the quantitative changes in binding affinity. The MutaBind method uses molecular mechanics force fields, statistical potentials and fast side-chain optimization algorithms. The MutaBind server maps mutations on a structural protein complex, calculates the associated changes in binding affinity, determines the deleterious effect of a mutation, estimates the confidence of this prediction and produces a mutant structural model for download. MutaBind can be applied to a large number of problems, including determination of potential driver mutations in cancer and other diseases, elucidation of the effects of sequence variants on protein fitness in evolution and protein design. MutaBind is available at http://www.ncbi.nlm.nih.gov/projects/mutabind/. Published by Oxford University Press on behalf of Nucleic Acids Research 2016. This work is written by (a) US Government employee(s) and is in the public domain in the US.
Mukherjee, Koel; Pandey, Dev Mani; Vidyarthi, Ambarish Saran
2015-02-06
Gaining access to sequence and structure information of telomere binding proteins helps in understanding the essential biological processes involved in conserved sequence specific interaction between DNA and the proteins. Rice telomere binding protein (RTBP1) and Nicotiana glutinosa telomere repeat binding factor (NgTRF1) are helix turn helix motif type of proteins that play a role in telomeric DNA protection and length regulation. Both the proteins share same type of domain but till now there is very less communication on the in silico studies of these complete proteins. Here we intend to do a comparative study between two proteins through modeling of the complete proteins, physiochemical characterization, MD simulation and DNA-protein docking. I-TASSER and CLC protein work bench was performed to find out the protein 3D structure as well as the different parameters to characterize the proteins. MD simulation was completed by GROMOS forcefield of GROMACS for 10 ns of time stretch. The simulated 3D structures were docked with template DNA (3D DNA modeled through 3D-DART) of TTTAGGG conserved sequence motif using HADDOCK web server. Digging up all the facts about the proteins, it was revealed that around 120 amino acids in the tail part was showing a good sequence similarity between the proteins. Molecular modeling, sequence characterization and secondary structure prediction also indicates the similarity between the protein's structure and sequence. The result of MD simulation highlights on the RMSD, RMSF, Rg, PCA and Energy plots which also conveys the similar type of motional behavior between them. The best complex formation for both the proteins in docking result also indicates for the first interaction site which is mainly the helix3 region of the DNA binding domain. The overall computational analysis reveals that RTBP1 and NgTRF1 proteins display good amount of similarity in their physicochemical properties, structure, dynamics and binding mode.
Mukherjee, Koel; Pandey, Dev Mani; Vidyarthi, Ambarish Saran
2015-09-01
Gaining access to sequence and structure information of telomere-binding proteins helps in understanding the essential biological processes involve in conserved sequence-specific interaction between DNA and the proteins. Rice telomere-binding protein (RTBP1) and Nicotiana glutinosa telomere repeat binding factor (NgTRF1) are helix-turn-helix motif type of proteins that plays role in telomeric DNA protection and length regulation. Both the proteins share same type of domain, but till now there is very less communication on the in silico studies of these complete proteins. Here we intend to do a comparative study between two proteins through modeling of the complete proteins, physiochemical characterization, MD simulation and DNA-protein docking. I-TASSER and CLC protein work bench was performed to find out the protein 3D structure as well as the different parameters to characterize the proteins. MD simulation was completed by GROMOS forcefield of GROMACS for 10 ns of time stretch. The simulated 3D structures were docked with template DNA (3D DNA modeled through 3D-DART) of TTTAGGG conserved sequence motif using HADDOCK Web server. By digging up all the facts about the proteins, it was revealed that around 120 amino acids in the tail part were showing a good sequence similarity between the proteins. Molecular modeling, sequence characterization and secondary structure prediction also indicate the similarity between the protein's structure and sequence. The result of MD simulation highlights on the RMSD, RMSF, Rg, PCA and energy plots which also conveys the similar type of motional behavior between them. The best complex formation for both the proteins in docking result also indicates for the first interaction site which is mainly the helix3 region of the DNA-binding domain. The overall computational analysis reveals that RTBP1 and NgTRF1 proteins display good amount of similarity in their physicochemical properties, structure, dynamics and binding mode.
Wybenga-Groot, Leanne E; McGlade, C Jane
2013-12-01
The Src-like adaptor proteins (SLAP/SLAP2) are key components of Cbl-dependent downregulation of antigen receptor, cytokine receptor, and receptor tyrosine kinase signaling in hematopoietic cells. SLAP and SLAP2 consist of adjacent SH3 and SH2 domains that are most similar in sequence to Src family kinases (SFKs). Notably, the SH3-SH2 connector sequence is significantly shorter in SLAP/SLAP2 than in SFKs. To understand the structural implication of a short SH3-SH2 connector sequence, we solved the crystal structure of a protein encompassing the SH3 domain, SH3-SH2 connector, and SH2 domain of SLAP2 (SLAP2-32). While both domains adopt typical folds, the short SH3-SH2 connector places them in close association. Strand βe of the SH3 domain interacts with strand βA of the SH2 domain, resulting in the formation of a continuous β sheet that spans the length of the protein. Disruption of the SH3/SH2 interface through mutagenesis decreases SLAP2-32 stability in vitro, consistent with inter-domain binding being an important component of SLAP2 structure and function. The canonical peptide binding pockets of the SH3 and SH2 domains are fully accessible, in contrast to other protein structures that display direct interaction between SH3 and SH2 domains, in which either peptide binding surface is obstructed by the interaction. Our results reveal potential sites of novel interaction for SH3 and SH2 domains, and illustrate the adaptability of SH2 and SH3 domains in mediating interactions. As well, our results suggest that the SH3 and SH2 domains of SLAP2 function interdependently, with implications for their mode of substrate binding. © 2013.
Ghosh, Pritha; Mathew, Oommen K; Sowdhamini, Ramanathan
2016-10-07
RNA-binding proteins (RBPs) interact with their cognate RNA(s) to form large biomolecular assemblies. They are versatile in their functionality and are involved in a myriad of processes inside the cell. RBPs with similar structural features and common biological functions are grouped together into families and superfamilies. It is useful to obtain an early understanding of the RNA-binding properties of gene product sequences. Here, we report a web server, RStrucFam, to predict the structure, type of cognate RNA(s) and function(s) of proteins, where possible, from sequence information alone. The web server employs Hidden Markov Model scan (hmmscan) to enable association to a back-end database of structural and sequence families. The database (HMMRBP) comprises 437 HMMs of RBP families of known structure that have been generated using structure-based sequence alignments and 746 sequence-centric RBP family HMMs. The input protein sequence is associated with structural or sequence domain families, if structure or sequence signatures exist. When the protein is associated with a family of known structures, output features such as a multiple structure-based sequence alignment (MSSA) of the query with all other members of that family are provided. Further, cognate RNA partner(s) for that protein, Gene Ontology (GO) annotations, if any, and a homology model of the protein can be obtained. Users can also browse the database for details pertaining to each family, protein or RNA and their related information based on keyword search or RNA motif search. RStrucFam is a web server that exploits structurally conserved features of RBPs, derived from known family members and imprinted in mathematical profiles, to predict putative RBPs from sequence information. Proteins that fail to associate with such structure-centric families are further queried against the sequence-centric RBP family HMMs in the HMMRBP database.
Further, all other essential information pertaining to an RBP, such as overall function annotations, is provided. The web server can be accessed at the following link: http://caps.ncbs.res.in/rstrucfam.
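The two-tier lookup described in this abstract (structure-centric families first, sequence-centric families as a fallback) can be sketched roughly as follows. This is only an illustration and not the RStrucFam implementation: the E-value cutoff, the function names and the hit lists are all hypothetical, and the hits are assumed to have been produced beforehand by a profile search such as hmmscan.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical E-value cutoff; the abstract does not state the server's threshold.
E_VALUE_CUTOFF = 1e-3

Hit = Tuple[str, float]  # (family name, E-value)


def best_hit(hits: List[Hit], cutoff: float = E_VALUE_CUTOFF) -> Optional[Hit]:
    """Return the lowest-E-value hit at or below the cutoff, or None."""
    passing = [h for h in hits if h[1] <= cutoff]
    return min(passing, key=lambda h: h[1]) if passing else None


def associate(structure_hits: List[Hit], sequence_hits: List[Hit]) -> Dict[str, Optional[str]]:
    """Try structure-centric families first, then fall back to sequence-centric ones."""
    hit = best_hit(structure_hits)
    if hit is not None:
        return {"tier": "structure", "family": hit[0]}
    hit = best_hit(sequence_hits)
    if hit is not None:
        return {"tier": "sequence", "family": hit[0]}
    return {"tier": None, "family": None}
```

A query with no significant structure-centric hit falls through to the sequence-centric tier, and a query with no significant hit in either tier is left unassociated.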
Hills, Ronald D.; Kathuria, Sagar V.; Wallace, Louise A.; Day, Iain J.; Brooks, Charles L.; Matthews, C. Robert
2010-01-01
The thermodynamic hypothesis of Anfinsen postulates that structures and stabilities of globular proteins are determined by their amino acid sequences. Chain topology, however, is known to influence the folding reaction, in that motifs with a preponderance of local interactions typically fold more rapidly than those with a larger fraction of non-local interactions. Together, the topology and sequence can modulate the energy landscape and influence the rate at which the protein folds to the native conformation. To explore the relationship of sequence and topology in the folding of βα–repeat proteins, which are dominated by local interactions, a combined experimental and simulation analysis was performed on two members of the flavodoxin-like, α/β/α sandwich fold. Spo0F and the N-terminal receiver domain of NtrC (NT-NtrC) have similar topologies but low sequence identity, enabling a test of the effects of sequence on folding. Experimental results demonstrated that both response-regulator proteins fold via parallel channels through highly structured sub-millisecond intermediates before accessing their cis prolyl peptide bond-containing native conformations. Global analysis of the experimental results preferentially places these intermediates off the productive folding pathway. Sequence-sensitive Gō-model simulations conclude that frustration in the folding in Spo0F, corresponding to the appearance of the off-pathway intermediate, reflects competition for intra-subdomain van der Waals contacts between its N- and C-terminal subdomains. The extent of transient, premature structure appears to correlate with the number of isoleucine, leucine and valine (ILV) side-chains that form a large sequence-local cluster involving the central β-sheet and helices α2, α3 and α4. The failure to detect the off-pathway species in the simulations of NT-NtrC may reflect the reduced number of ILV side-chains in its corresponding hydrophobic cluster. 
The location of the hydrophobic clusters in the structure may also be related to the differing functional properties of these response regulators. Comparison with the results of previous experimental and simulation analyses on the homologous CheY argues that prematurely-folded unproductive intermediates are a common property of the βα-repeat motif. PMID:20226790
Exploring the Sequence-based Prediction of Folding Initiation Sites in Proteins.
Raimondi, Daniele; Orlando, Gabriele; Pancsa, Rita; Khan, Taushif; Vranken, Wim F
2017-08-18
Protein folding is a complex process that can lead to disease when it fails. Especially poorly understood are the very early stages of protein folding, which are likely defined by intrinsic local interactions between amino acids close to each other in the protein sequence. We here present EFoldMine, a method that predicts, from the primary amino acid sequence of a protein, which amino acids are likely involved in early folding events. The method is based on early folding data from hydrogen deuterium exchange (HDX) data from NMR pulsed labelling experiments, and uses backbone and sidechain dynamics as well as secondary structure propensities as features. The EFoldMine predictions give insights into the folding process, as illustrated by a qualitative comparison with independent experimental observations. Furthermore, on a quantitative proteome scale, the predicted early folding residues tend to become the residues that interact the most in the folded structure, and they are often residues that display evolutionary covariation. The connection of the EFoldMine predictions with both folding pathway data and the folded protein structure suggests that the initial statistical behavior of the protein chain with respect to local structure formation has a lasting effect on its subsequent states.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Horton, John R.; Zhang, Xing; Blumenthal, Robert M.; ...
2015-04-06
DNA adenine methyltransferase (Dam) is widespread and conserved among the γ-proteobacteria. Methylation of the Ade in GATC sequences regulates diverse bacterial cell functions, including gene expression, mismatch repair and chromosome replication. Dam also controls virulence in many pathogenic Gram-negative bacteria. An unexplained and perplexing observation about Escherichia coli Dam (EcoDam) is that there is no obvious relationship between the genes that are transcriptionally responsive to Dam and the promoter-proximal presence of GATC sequences. Here, we demonstrate that EcoDam interacts with a 5-base pair non-cognate sequence distinct from GATC. The crystal structure of a non-cognate complex allowed us to identify a DNA binding element, GTYTA/TARAC (where Y = C/T and R = A/G). This element immediately flanks GATC sites in some Dam-regulated promoters, including the Pap operon which specifies pyelonephritis-associated pili. In addition, Dam interacts with near-cognate GATC sequences (i.e. 3/4-site ATC and GAT). All together, these results imply that Dam, in addition to being responsible for GATC methylation, could also function as a methylation-independent transcriptional repressor.
A critical analysis of computational protein design with sparse residue interaction graphs
Georgiev, Ivelin S.
2017-01-01
Protein design algorithms enumerate a combinatorial number of candidate structures to compute the Global Minimum Energy Conformation (GMEC). To efficiently find the GMEC, protein design algorithms must methodically reduce the conformational search space. By applying distance and energy cutoffs, the protein system to be designed can thus be represented using a sparse residue interaction graph, where the number of interacting residue pairs is less than all pairs of mutable residues, and the corresponding GMEC is called the sparse GMEC. However, ignoring some pairwise residue interactions can lead to a change in the energy, conformation, or sequence of the sparse GMEC vs. the original or the full GMEC. Despite the widespread use of sparse residue interaction graphs in protein design, the above mentioned effects of their use have not been previously analyzed. To analyze the costs and benefits of designing with sparse residue interaction graphs, we computed the GMECs for 136 different protein design problems both with and without distance and energy cutoffs, and compared their energies, conformations, and sequences. Our analysis shows that the differences between the GMECs depend critically on whether or not the design includes core, boundary, or surface residues. Moreover, neglecting long-range interactions can alter local interactions and introduce large sequence differences, both of which can result in significant structural and functional changes. Designs on proteins with experimentally measured thermostability show it is beneficial to compute both the full and the sparse GMEC accurately and efficiently. To this end, we show that a provable, ensemble-based algorithm can efficiently compute both GMECs by enumerating a small number of conformations, usually fewer than 1000. 
This provides a novel way to combine sparse residue interaction graphs with provable, ensemble-based algorithms to reap the benefits of sparse residue interaction graphs while avoiding their potential inaccuracies. PMID:28358804
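As a rough illustration of how distance and energy cutoffs turn the full set of residue pairs into a sparse residue interaction graph, consider the sketch below. It is not the authors' algorithm: the function name, the data layout and the rule that a pair must pass both cutoffs to be kept are assumptions made for the example.

```python
import itertools
import math
from typing import Dict, Set, Tuple

Coord = Tuple[float, float, float]
Pair = Tuple[int, int]


def sparse_interaction_graph(
    coords: Dict[int, Coord],
    pair_energy: Dict[Pair, float],
    distance_cutoff: float,
    energy_cutoff: float,
) -> Set[Pair]:
    """Keep a residue pair only if it is within the distance cutoff and its
    pairwise energy magnitude reaches the energy cutoff (assumed rule)."""
    edges = set()
    for i, j in itertools.combinations(sorted(coords), 2):
        dist = math.dist(coords[i], coords[j])
        energy = abs(pair_energy.get((i, j), 0.0))
        if dist <= distance_cutoff and energy >= energy_cutoff:
            edges.add((i, j))
    return edges


# Hypothetical three-residue system: only residues 1 and 2 are close enough.
coords = {1: (0.0, 0.0, 0.0), 2: (3.0, 0.0, 0.0), 3: (20.0, 0.0, 0.0)}
pair_energy = {(1, 2): -1.5, (1, 3): -0.8, (2, 3): -0.01}
graph = sparse_interaction_graph(coords, pair_energy, distance_cutoff=8.0, energy_cutoff=0.1)
```

Here the pair (1, 3) is dropped despite a non-negligible energy, which is the kind of neglected long-range interaction the abstract warns can change the computed GMEC.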
Sequence Dependent Interactions Between DNA and Single-Walled Carbon Nanotubes
NASA Astrophysics Data System (ADS)
Roxbury, Daniel
It is known that single-stranded DNA adopts a helical wrap around a single-walled carbon nanotube (SWCNT), forming a water-dispersible hybrid molecule. The ability to sort mixtures of SWCNTs based on chirality (electronic species) has recently been demonstrated using special short DNA sequences that recognize certain matching SWCNTs of specific chirality. This thesis investigates the intricacies of DNA-SWCNT sequence-specific interactions through both experimental and molecular simulation studies. The DNA-SWCNT binding strengths were experimentally quantified by studying the kinetics of DNA replacement by a surfactant on the surface of particular SWCNTs. Recognition ability was found to correlate strongly with measured binding strength, e.g. DNA sequence (TAT)4 was found to bind 20 times stronger to the (6,5)-SWCNT than sequence (TAT)4T. Next, using replica exchange molecular dynamics (REMD) simulations, equilibrium structures formed by (a) single-strands and (b) multiple-strands of 12-mer oligonucleotides adsorbed on various SWCNTs were explored. A number of structural motifs were discovered in which the DNA strand wraps around the SWCNT and 'stitches' to itself via hydrogen bonding. Great variability among equilibrium structures was observed and shown to be directly influenced by DNA sequence and SWCNT type. For example, the (6,5)-SWCNT DNA recognition sequence, (TAT)4, was found to wrap in a tight single-stranded right-handed helical conformation. In contrast, DNA sequence T12 forms a beta-barrel left-handed structure on the same SWCNT. These are the first theoretical indications that DNA-based SWCNT selectivity can arise on a molecular level. In a biomedical collaboration with the Mayo Clinic, pathways for DNA-SWCNT internalization into healthy human endothelial cells were explored. 
Through absorbance spectroscopy, TEM imaging, and confocal fluorescence microscopy, we showed that intracellular concentrations of SWCNTs far exceeded those of the incubation solution, which suggested an energy-dependent pathway. Additionally, by means of pharmacological inhibition and vector-induced gene knockout studies, the DNA-SWCNTs were shown to enter the cells via Rac1-mediated macropinocytosis.
Online interactive analysis of protein structure ensembles with Bio3D-web.
Skjærven, Lars; Jariwala, Shashank; Yao, Xin-Qiu; Grant, Barry J
2016-11-15
Bio3D-web is an online application for analyzing the sequence, structure and conformational heterogeneity of protein families. Major functionality is provided for identifying protein structure sets for analysis, their alignment and refined structure superposition, sequence and structure conservation analysis, mapping and clustering of conformations and the quantitative comparison of their predicted structural dynamics. Bio3D-web is based on the Bio3D and Shiny R packages. All major browsers are supported and full source code is available under a GPL2 license from http://thegrantlab.org/bio3d-web. Contact: bjgrant@umich.edu or lars.skjarven@uib.no. © The Author 2016. Published by Oxford University Press.
Random sequences generation through optical measurements by phase-shifting interferometry
NASA Astrophysics Data System (ADS)
François, M.; Grosges, T.; Barchiesi, D.; Erra, R.; Cornet, A.
2012-04-01
The development of new techniques for producing random sequences with a high level of security is a challenging topic of research in modern cryptography. The proposed method is based on the measurement by phase-shifting interferometry of the speckle signals of the interaction between light and structures. We show how the combination of amplitude and phase distributions (maps) under a numerical process can produce random sequences. The produced sequences satisfy all the statistical requirements of randomness and can be used in cryptographic schemes.
The molecular dynamics of long noncoding RNA control of transcription in PTEN and its pseudogene
Lister, Nicholas; Shevchenko, Galina; Walshe, James L.; Groen, Jessica; Johnsson, Per; Vidarsdóttir, Linda; Grander, Dan; Ataide, Sandro F.; Morris, Kevin V.
2017-01-01
RNA has been found to interact with chromatin and modulate gene transcription. In human cells, little is known about how long noncoding RNAs (lncRNAs) interact with target loci in the context of chromatin. We find here, using the phosphatase and tensin homolog (PTEN) pseudogene as a model system, that antisense lncRNAs interact first with a 5′ UTR-containing promoter-spanning transcript, which is then followed by the recruitment of DNA methyltransferase 3a (DNMT3a), ultimately resulting in the transcriptional and epigenetic control of gene expression. Moreover, we find that the interaction between the lncRNA and the promoter-spanning transcript is based on a combination of structural and sequence components of the antisense lncRNA. These observations suggest, on the basis of this one example, that evolutionary pressures may be placed on RNA structure more so than sequence conservation. Collectively, the observations presented here suggest a much more complex and vibrant RNA regulatory world may be operative in the regulation of gene expression. PMID:28847966
Molecular recognition of the Tes LIM2-3 domains by the actin-related protein Arp7A.
Boëda, Batiste; Knowles, Phillip P; Briggs, David C; Murray-Rust, Judith; Soriano, Erika; Garvalov, Boyan K; McDonald, Neil Q; Way, Michael
2011-04-01
Actin-related proteins (Arps) are a highly conserved family of proteins that have extensive sequence and structural similarity to actin. All characterized Arps are components of large multimeric complexes associated with chromatin or the cytoskeleton. In addition, the human genome encodes five conserved but largely uncharacterized "orphan" Arps, which appear to be mostly testis-specific. Here we show that Arp7A, which has 43% sequence identity with β-actin, forms a complex with the cytoskeletal proteins Tes and Mena in the subacrosomal layer of round spermatids. The N-terminal 65-residue extension to the actin-like fold of Arp7A interacts directly with Tes. The crystal structure of the 1-65(Arp7A)·LIM2-3(Tes)·EVH1(Mena) complex reveals that residues 28-49 of Arp7A contact the LIM2-3 domains of Tes. Two alanine residues from Arp7A that occupy equivalent apolar pockets in both LIM domains as well as an intervening GPAK linker that binds the LIM2-3 junction are critical for the Arp7A-Tes interaction. Equivalent occupied apolar pockets are also seen in the tandem LIM domain structures of LMO4 and Lhx3 bound to unrelated ligands. Our results indicate that apolar pocket interactions are a common feature of tandem LIM domain interactions, but ligand specificity is principally determined by the linker sequence.
Pley, H W; Flaherty, K M; McKay, D B
1994-11-03
In large structured RNAs, RNA hairpins in which the strands of the duplex stem are connected by a tetraloop of the consensus sequence 5'-GNRA (where N is any nucleotide, and R is either G or A) are unusually frequent. In group I introns there is a covariation in sequence between nucleotides in the third and fourth positions of the loop with specific distant base pairs in putative RNA duplex stems: GNAA loops correlate with successive 5'-C-C.G-C base pairs in stems, whereas GNGA loops correlate with 5'-C-U.G-A. This has led to the suggestion that GNRA tetraloops may be involved in specific long-range tertiary interactions, with each A in position 3 or 4 of the loop interacting with a C-G base pair in the duplex, and G in position 3 interacting with a U-A base pair. This idea is supported experimentally for the GAAA loop of the P5b extension of the group I intron of Tetrahymena thermophila and the L9 GUGA terminal loop of the td intron of bacteriophage T4 (ref. 4). NMR has revealed the overall structure of the tetraloop for 12-nucleotide hairpins with GCAA and GAAA loops and models have been proposed for the interaction of GNRA tetraloops with base pairs in the minor groove of A-form RNA. Here we describe the crystal structure of an intermolecular complex between a GAAA tetraloop and an RNA helix. The interactions we observe correlate with the specificity of GNRA tetraloops inferred from phylogenetic studies, suggesting that this complex is a legitimate model for intramolecular tertiary interactions mediated by GNRA tetraloops in large structured RNAs.
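The GNRA consensus given in this abstract (G, any nucleotide, a purine, then A) maps directly onto a regular expression. The sketch below is only a sequence-level illustration: the function name is hypothetical, and it does not check whether a match actually sits in a hairpin loop closed by a duplex stem.

```python
import re

# G, then any nucleotide (N), then a purine (R = G or A), then A.
# The lookahead lets overlapping candidates be reported as well.
GNRA = re.compile(r"(?=(G[ACGU][AG]A))")


def find_gnra(rna: str):
    """Return (0-based start, tetraloop) for every GNRA match in the sequence."""
    return [(m.start(), m.group(1)) for m in GNRA.finditer(rna.upper())]
```

For example, the hypothetical sequence "GUGAGCAA" contains both a GNGA candidate (GUGA) and a GNAA candidate (GCAA), the two loop classes whose covariation with stem base pairs is discussed above.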
Jaffrey, S R; Haile, D J; Klausner, R D; Harford, J B
1993-09-25
To assess the influence of RNA sequence/structure on the interaction of RNAs with the iron-responsive element binding protein (IRE-BP), twenty-eight altered RNAs were tested as competitors for an RNA corresponding to the ferritin H chain IRE. All changes in the loop of the predicted IRE hairpin and in the unpaired cytosine residue characteristically found in IRE stems significantly decreased the apparent affinity of the RNA for the IRE-BP. Similarly, alteration in the spacing and/or orientation of the loop and the unpaired cytosine of the stem by either increasing or decreasing the number of base pairs separating them significantly reduced efficacy as a competitor. It is inferred that the IRE-BP forms multiple contacts with its cognate RNA, and that these contacts, acting in concert, provide the basis for the high affinity of this interaction.
Covalent attachment of TAT peptides and thiolated alkyl molecules on GaAs surfaces.
Cho, Youngnam; Ivanisevic, Albena
2005-07-07
Four TAT peptide fragments were used to functionalize GaAs surfaces by adsorption from solution. In addition, two well-studied alkylthiols, mercaptohexadecanoic acid (MHA) and 1-octadecanethiol (ODT) were utilized as references to understand the structure of the TAT peptide monolayer on GaAs. The different sequences of TAT peptides were employed in recognition experiments where a synthetic RNA sequence was tested to verify the specific interaction with the TAT peptide. The modified GaAs surfaces were characterized by atomic force microscopy (AFM), X-ray photoelectron spectroscopy (XPS), and Fourier transform infrared reflection absorption spectroscopy (FT-IRRAS). AFM studies were used to compare the surface roughness before and after functionalization. XPS allowed us to characterize the chemical composition of the GaAs surface and conclude that the monolayers composed of different sequences of peptides have similar surface chemistries. Finally, FT-IRRAS experiments enabled us to deduce that the TAT peptide monolayers have a fairly ordered and densely packed alkyl chain structure. The recognition experiments showed preferred interaction of the RNA sequence toward peptides with high arginine content.
Grace, Christy R.; Ferreira, Antonio M.; Waddell, M. Brett; Ridout, Granger; Naeve, Deanna; Leuze, Michael; LoCascio, Philip F.; Panetta, John C.; Wilkinson, Mark R.; Pui, Ching-Hon; Naeve, Clayton W.; Uberbacher, Edward C.; Bonten, Erik J.; Evans, William E.
2016-01-01
MicroRNAs are important regulators of gene expression, acting primarily by binding to sequence-specific locations on already transcribed messenger RNAs (mRNA) and typically down-regulating their stability or translation. Recent studies indicate that microRNAs may also play a role in up-regulating mRNA transcription levels, although a definitive mechanism has not been established. Double-helical DNA is capable of forming triple-helical structures through Hoogsteen and reverse Hoogsteen interactions in the major groove of the duplex, and we show physical evidence (i.e., NMR, FRET, SPR) that purine or pyrimidine-rich microRNAs of appropriate length and sequence form triple-helical structures with purine-rich sequences of duplex DNA, and identify microRNA sequences that favor triplex formation. We developed an algorithm (Trident) to search genome-wide for potential triplex-forming sites and show that several mammalian and non-mammalian genomes are enriched for strong microRNA triplex binding sites. We show that those genes containing sequences favoring microRNA triplex formation are markedly enriched (3.3 fold, p < 2.2 × 10^-16) for genes whose expression is positively correlated with expression of microRNAs targeting triplex binding sequences. This work has thus revealed a new mechanism by which microRNAs could interact with gene promoter regions to modify gene transcription. PMID:26844769
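Since triplex formation is described here as involving purine-rich stretches of duplex DNA, a first-pass scan for such stretches can be sketched as a sliding window. This is not the Trident algorithm, whose scoring is not described in the abstract; the function name, window length and purine-fraction threshold are hypothetical choices for illustration.

```python
def purine_rich_windows(dna: str, window: int = 20, min_fraction: float = 0.8):
    """Return (start, purine fraction) for each window whose A/G fraction meets the threshold."""
    dna = dna.upper()
    hits = []
    for start in range(len(dna) - window + 1):
        segment = dna[start:start + window]
        fraction = sum(base in "AG" for base in segment) / window
        if fraction >= min_fraction:
            hits.append((start, fraction))
    return hits
```

A real search would also need to score pairing between each candidate site and a specific microRNA, which this sketch leaves out.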
StrBioLib: a Java library for development of custom computational structural biology applications.
Chandonia, John-Marc
2007-08-01
StrBioLib is a library of Java classes useful for developing software for computational structural biology research. StrBioLib contains classes to represent and manipulate protein structures, biopolymer sequences, sets of biopolymer sequences, and alignments between biopolymers based on either sequence or structure. Interfaces are provided to interact with commonly used bioinformatics applications, including (psi)-blast, modeller, muscle and Primer3, and tools are provided to read and write many file formats used to represent bioinformatic data. The library includes a general-purpose neural network object with multiple training algorithms, the Hooke and Jeeves non-linear optimization algorithm, and tools for efficient C-style string parsing and formatting. StrBioLib is the basis for the Pred2ary secondary structure prediction program, is used to build the astral compendium for sequence and structure analysis, and has been extensively tested through use in many smaller projects. Examples and documentation are available at the site below. StrBioLib may be obtained under the terms of the GNU LGPL license from http://strbio.sourceforge.net/
Functional Regression Models for Epistasis Analysis of Multiple Quantitative Traits.
Zhang, Futao; Xie, Dan; Liang, Meimei; Xiong, Momiao
2016-04-01
To date, most genetic analyses of phenotypes have focused on analyzing single traits or analyzing each phenotype independently. However, joint epistasis analysis of multiple complementary traits will increase statistical power and improve our understanding of the complicated genetic structure of complex diseases. Despite their importance in uncovering the genetic structure of complex traits, statistical methods for identifying epistasis in multiple phenotypes remain fundamentally unexplored. To fill this gap, we formulate a test for interaction between two genes in multiple quantitative trait analysis as a multiple functional regression (MFRG) in which the genotype functions (genetic variant profiles) are defined as a function of the genomic position of the genetic variants. We use large-scale simulations to calculate Type I error rates for testing interaction between two genes with multiple phenotypes and to compare the power with multivariate pairwise interaction analysis and single trait interaction analysis by a single variate functional regression model. To further evaluate performance, the MFRG for epistasis analysis is applied to five phenotypes of exome sequence data from the NHLBI's Exome Sequencing Project (ESP) to detect pleiotropic epistasis. A total of 267 pairs of genes that formed a genetic interaction network showed significant evidence of epistasis influencing five traits. The results demonstrate that the joint interaction analysis of multiple phenotypes has a much higher power to detect interaction than the interaction analysis of a single trait and may open a new direction to fully uncovering the genetic structure of multiple phenotypes.
Role of indirect readout mechanism in TATA box binding protein-DNA interaction.
Mondal, Manas; Choudhury, Devapriya; Chakrabarti, Jaydeb; Bhattacharyya, Dhananjay
2015-03-01
Gene expression generally initiates with the binding of TATA-box binding protein (TBP) to the minor groove of DNA at the TATA box sequence, where the DNA structure is significantly different from B-DNA. We have carried out molecular dynamics simulation studies of the TBP-DNA system to understand how the DNA structure alters for efficient binding. We observed that the protein is rigid, while DNA with the TATA box sequence has inherent flexibility in terms of bending and minor groove widening. Bending analysis of the free DNA and the TBP-bound DNA systems indicates the presence of some similar structures. Principal coordinate ordination analysis also indicates that some structural features of the protein-bound and free DNA are similar. Thus we suggest that DNA with the TATA box sequence regularly oscillates between several alternate structures, and that the one suitable for TBP binding is induced further by the protein for proper complex formation.
Structural determinants of nuclear export signal orientation in binding to exportin CRM1
Fung, Ho Yee Joyce; Fu, Szu-Chin; Brautigam, Chad A.; ...
2015-09-08
The Chromosome Region of Maintenance 1 (CRM1) protein mediates nuclear export of hundreds of proteins through recognition of their nuclear export signals (NESs), which are highly variable in sequence and structure. The plasticity of the CRM1-NES interaction is not well understood, as there are many NES sequences that seem incompatible with structures of the NES-bound CRM1 groove. Crystal structures of CRM1 bound to two different NESs with unusual sequences showed the NES peptides binding the CRM1 groove in the opposite orientation (minus) to that of previously studied NESs (plus). A comparison of minus and plus NESs identified structural and sequence determinants for NES orientation. The binding of NESs to CRM1 in both orientations results in a large expansion in NES consensus patterns and therefore a corresponding expansion of potential NESs in the proteome.
Searching RNA motifs and their intermolecular contacts with constraint networks.
Thébault, P; de Givry, S; Schiex, T; Gaspin, C
2006-09-01
Searching for RNA gene occurrences in genomic sequences is a task whose importance has been renewed by the recent discovery of numerous functional RNAs, often interacting with other ligands. Although several programs exist for RNA motif search, none can represent and solve the problem of searching for occurrences of RNA motifs in interaction with other molecules. We present a constraint network formulation of this problem. RNAs are represented as structured motifs that can occur on more than one sequence and which are related to one another by possible hybridization. The implemented tool MilPat is used to search for several sRNA families in genomic sequences. Results show that MilPat allows efficient searching for interacting motifs in large genomic sequences and offers a simple and extensible framework to solve such problems. New and known sRNAs are identified as H/ACA candidates in Methanocaldococcus jannaschii. http://carlit.toulouse.inra.fr/MilPaT/MilPat.pl.
Gherghe, Cristina; Lombo, Tania; Leonard, Christopher W.; Datta, Siddhartha A. K.; Bess, Julian W.; Gorelick, Robert J.; Rein, Alan; Weeks, Kevin M.
2010-01-01
All retroviral genomic RNAs contain a cis-acting packaging signal by which dimeric genomes are selectively packaged into nascent virions. However, it is not understood how Gag (the viral structural protein) interacts with these signals to package the genome with high selectivity. We probed the structure of murine leukemia virus RNA inside virus particles using SHAPE, a high-throughput RNA structure analysis technology. These experiments showed that NC (the nucleic acid binding domain derived from Gag) binds within the virus to the sequence UCUG-UR-UCUG. Recombinant Gag and NC proteins bound to this same RNA sequence in dimeric RNA in vitro; in all cases, interactions were strongest with the first U and final G in each UCUG element. The RNA structural context is critical: High-affinity binding requires base-paired regions flanking this motif, and two UCUG-UR-UCUG motifs are specifically exposed in the viral RNA dimer. Mutating the guanosine residues in these two motifs—only four nucleotides per genomic RNA—reduced packaging 100-fold, comparable to the level of nonspecific packaging. These results thus explain the selective packaging of dimeric RNA. This paradigm has implications for RNA recognition in general, illustrating how local context and RNA structure can create information-rich recognition signals from simple single-stranded sequence elements in large RNAs. PMID:20974908
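The UCUG-UR-UCUG element named in this abstract can be located at the sequence level with a short pattern match. The sketch assumes that the hyphens are visual separators and that R denotes a purine (A or G); the function name is hypothetical. It says nothing about the flanking base-paired regions that the abstract identifies as necessary for high-affinity binding.

```python
import re

# UCUG, then U, then a purine (R = A or G), then UCUG.
# Hyphens in the written motif are treated as separators (assumption).
PACKAGING_MOTIF = re.compile(r"(?=(UCUGU[AG]UCUG))")


def find_motifs(rna: str):
    """Return 0-based start positions of UCUG-UR-UCUG; DNA input is converted to RNA."""
    return [m.start() for m in PACKAGING_MOTIF.finditer(rna.upper().replace("T", "U"))]
```

Candidate positions found this way would still need to be checked against a structure model, for example SHAPE-derived pairing data, before being interpreted as NC binding sites.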
Adaptability of Protein Structures to Enable Functional Interactions and Evolutionary Implications
Haliloglu, Turkan; Bahar, Ivet
2015-01-01
Several studies in recent years have drawn attention to the ability of proteins to adapt to intermolecular interactions by conformational changes along structure-encoded collective modes of motions. These so-called soft modes, primarily driven by entropic effects, facilitate, if not enable, functional interactions. They represent excursions on the conformational space along principal low-ascent directions/paths away from the original free energy minimum, and they are accessible to the protein even prior to protein-protein/ligand interactions. An emerging concept from these studies is the evolution of structures or modular domains to favor such modes of motion that will be recruited or integrated for enabling functional interactions. Structural dynamics, including the allosteric switches in conformation that are often stabilized upon formation of complexes and multimeric assemblies, emerge as key properties that are evolutionarily maintained to accomplish biological activities, consistent with the paradigm sequence → structure → dynamics → function where ‘dynamics’ bridges structure and function. PMID:26254902
Helix Unwinding and Base Flipping Enable Human MTERF1 to Terminate Mitochondrial Transcription
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yakubovskaya, E.; Mejia, E; Byrnes, J
2010-01-01
Defects in mitochondrial gene expression are associated with aging and disease. Mterf proteins have been implicated in modulating transcription, replication and protein synthesis. We have solved the structure of a member of this family, the human mitochondrial transcriptional terminator MTERF1, bound to dsDNA containing the termination sequence. The structure indicates that upon sequence recognition MTERF1 unwinds the DNA molecule, promoting eversion of three nucleotides. Base flipping is critical for stable binding and transcriptional termination. Additional structural and biochemical results provide insight into the DNA binding mechanism and explain how MTERF1 recognizes its target sequence. Finally, we have demonstrated that the mitochondrial pathogenic G3249A and G3244A mutations interfere with key interactions for sequence recognition, eliminating termination. Our results provide insight into the role of mterf proteins and suggest a link between mitochondrial disease and the regulation of mitochondrial transcription.
Structure and function of the UV-B photoreceptor UVR8.
Jenkins, Gareth I
2014-12-01
UVR8 is a UV-B photoreceptor that employs specific tryptophans in its primary sequence as chromophores in photoreception. UV-B absorption causes dissociation of the dimeric photoreceptor by neutralizing interactions between monomers. The monomeric form initiates signalling through interaction with the COP1 protein, leading to transcriptional responses. This article discusses the structural basis of UVR8 function, highlighting recent research on the mechanism of photoreception and on interactions with other proteins involved in signalling and regulation. Copyright © 2014 Elsevier Ltd. All rights reserved.
Atomic interaction networks in the core of protein domains and their native folds.
Soundararajan, Venkataramanan; Raman, Rahul; Raguram, S; Sasisekharan, V; Sasisekharan, Ram
2010-02-23
Vastly divergent sequences populate a majority of protein folds. In the quest to identify features that are conserved within protein domains belonging to the same fold, we set out to examine the entire protein universe on a fold-by-fold basis. We report that the atomic interaction network in the solvent-unexposed core of protein domains is fold-conserved, extraordinary sequence divergence notwithstanding. Further, we find that this feature, termed the protein core atomic interaction network (or PCAIN), is significantly distinguishable across different folds, thus appearing to be a "signature" of a domain's native fold. As part of this study, we computed the PCAINs for 8698 representative protein domains from families across the 1018 known protein folds to construct our seed database, and an automated framework was developed for PCAIN-based characterization of the protein fold universe. A test set of randomly selected domains that are not in the seed database was classified with over 97% accuracy, independent of sequence divergence. As an application of this novel fold signature, a PCAIN-based scoring scheme was developed for comparative (homology-based) structure prediction, with 1-2 Å (mean 1.61 Å) Cα RMSD generally observed between computed structures and reference crystal structures. Our results are consistent across the full spectrum of test domains, including those from recent CASP experiments and most notably in the 'twilight' and 'midnight' zones, wherein <30% and <10% target-template sequence identity prevails (mean twilight RMSD of 1.69 Å). We further demonstrate the utility of the PCAIN protocol to derive biological insight into protein structure-function relationships, by modeling the structure of the YopM effector novel E3 ligase (NEL) domain from the plague-causative bacterium Yersinia pestis and discussing its implications for host adaptive and innate immune modulation by the pathogen.
Considering the several high-throughput, sequence-identity-independent applications demonstrated in this work, we suggest that the PCAIN is a fundamental fold feature that could be a valuable addition to the arsenal of protein modeling and analysis tools.
Atomic Interaction Networks in the Core of Protein Domains and Their Native Folds
Soundararajan, Venkataramanan; Raman, Rahul; Raguram, S.; Sasisekharan, V.; Sasisekharan, Ram
2010-01-01
Vastly divergent sequences populate a majority of protein folds. In the quest to identify features that are conserved within protein domains belonging to the same fold, we set out to examine the entire protein universe on a fold-by-fold basis. We report that the atomic interaction network in the solvent-unexposed core of protein domains is fold-conserved, extraordinary sequence divergence notwithstanding. Further, we find that this feature, termed the protein core atomic interaction network (or PCAIN), is significantly distinguishable across different folds, thus appearing to be a “signature” of a domain's native fold. As part of this study, we computed the PCAINs for 8698 representative protein domains from families across the 1018 known protein folds to construct our seed database, and an automated framework was developed for PCAIN-based characterization of the protein fold universe. A test set of randomly selected domains that are not in the seed database was classified with over 97% accuracy, independent of sequence divergence. As an application of this novel fold signature, a PCAIN-based scoring scheme was developed for comparative (homology-based) structure prediction, with 1–2 Å (mean 1.61 Å) Cα RMSD generally observed between computed structures and reference crystal structures. Our results are consistent across the full spectrum of test domains, including those from recent CASP experiments and most notably in the ‘twilight’ and ‘midnight’ zones, wherein <30% and <10% target-template sequence identity prevails (mean twilight RMSD of 1.69 Å). We further demonstrate the utility of the PCAIN protocol to derive biological insight into protein structure-function relationships, by modeling the structure of the YopM effector novel E3 ligase (NEL) domain from the plague-causative bacterium Yersinia pestis and discussing its implications for host adaptive and innate immune modulation by the pathogen.
Considering the several high-throughput, sequence-identity-independent applications demonstrated in this work, we suggest that the PCAIN is a fundamental fold feature that could be a valuable addition to the arsenal of protein modeling and analysis tools. PMID:20186337
Lahr, Roni M; Mack, Seshat M; Héroux, Annie; Blagden, Sarah P; Bousquet-Antonelli, Cécile; Deragon, Jean-Marc; Berman, Andrea J
2015-09-18
La-related protein 1 (LARP1) regulates the stability of many mRNAs. These include 5'TOPs, mTOR-kinase responsive mRNAs with pyrimidine-rich 5' UTRs, which encode ribosomal proteins and translation factors. We determined that the highly conserved LARP1-specific C-terminal DM15 region of human LARP1 directly binds a 5'TOP sequence. The crystal structure of this DM15 region refined to 1.86 Å resolution has three structurally related and evolutionarily conserved helix-turn-helix modules within each monomer. These motifs resemble HEAT repeats, ubiquitous helical protein-binding structures, but their sequences are inconsistent with consensus sequences of known HEAT modules, suggesting this structure has been repurposed for RNA interactions. A putative mTORC1-recognition sequence sits within a flexible loop C-terminal to these repeats. We also present modelling of pyrimidine-rich single-stranded RNA onto the highly conserved surface of the DM15 region. These studies lay the foundation necessary for proceeding toward a structural mechanism by which LARP1 links mTOR signalling to ribosome biogenesis. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Lahr, Roni M.; Mack, Seshat M.; Heroux, Annie; ...
2015-07-22
La-related protein 1 (LARP1) regulates the stability of many mRNAs. These include 5'TOPs, mTOR-kinase responsive mRNAs with pyrimidine-rich 5' UTRs, which encode ribosomal proteins and translation factors. We determined that the highly conserved LARP1-specific C-terminal DM15 region of human LARP1 directly binds a 5'TOP sequence. The crystal structure of this DM15 region refined to 1.86 Å resolution has three structurally related and evolutionarily conserved helix-turn-helix modules within each monomer. These motifs resemble HEAT repeats, ubiquitous helical protein-binding structures, but their sequences are inconsistent with consensus sequences of known HEAT modules, suggesting this structure has been repurposed for RNA interactions. A putative mTORC1-recognition sequence sits within a flexible loop C-terminal to these repeats. We also present modelling of pyrimidine-rich single-stranded RNA onto the highly conserved surface of the DM15 region. Ultimately, these studies lay the foundation necessary for proceeding toward a structural mechanism by which LARP1 links mTOR signalling to ribosome biogenesis.
Speiser, Daniel I; Pankey, M Sabrina; Zaharoff, Alexander K; Battelle, Barbara A; Bracken-Grissom, Heather D; Breinholt, Jesse W; Bybee, Seth M; Cronin, Thomas W; Garm, Anders; Lindgren, Annie R; Patel, Nipam H; Porter, Megan L; Protas, Meredith E; Rivera, Ajna S; Serb, Jeanne M; Zigler, Kirk S; Crandall, Keith A; Oakley, Todd H
2014-11-19
Tools for high throughput sequencing and de novo assembly make the analysis of transcriptomes (i.e. the suite of genes expressed in a tissue) feasible for almost any organism. Yet a challenge for biologists is that it can be difficult to assign identities to gene sequences, especially from non-model organisms. Phylogenetic analyses are one useful method for assigning identities to these sequences, but such methods tend to be time-consuming because of the need to re-calculate trees for every gene of interest and each time a new data set is analyzed. In response, we employed existing tools for phylogenetic analysis to produce a computationally efficient, tree-based approach for annotating transcriptomes or new genomes that we term Phylogenetically-Informed Annotation (PIA), which places uncharacterized genes into pre-calculated phylogenies of gene families. We generated maximum likelihood trees for 109 genes from a Light Interaction Toolkit (LIT), a collection of genes that underlie the function or development of light-interacting structures in metazoans. To do so, we searched protein sequences predicted from 29 fully-sequenced genomes and built trees using tools for phylogenetic analysis in the Osiris package of Galaxy (an open-source workflow management system). Next, to rapidly annotate transcriptomes from organisms that lack sequenced genomes, we repurposed a maximum likelihood-based Evolutionary Placement Algorithm (implemented in RAxML) to place sequences of potential LIT genes on to our pre-calculated gene trees. Finally, we implemented PIA in Galaxy and used it to search for LIT genes in 28 newly-sequenced transcriptomes from the light-interacting tissues of a range of cephalopod mollusks, arthropods, and cubozoan cnidarians. Our new trees for LIT genes are available on the Bitbucket public repository ( http://bitbucket.org/osiris_phylogenetics/pia/ ) and we demonstrate PIA on a publicly-accessible web server ( http://galaxy-dev.cnsi.ucsb.edu/pia/ ). 
Our new trees for LIT genes will be a valuable resource for researchers studying the evolution of eyes or other light-interacting structures. We also introduce PIA, a high throughput method for using phylogenetic relationships to identify LIT genes in transcriptomes from non-model organisms. With simple modifications, our methods may be used to search for different sets of genes or to annotate data sets from taxa outside of Metazoa.
Mansuroglu, Z; Josse, T; Gilleron, J; Billecocq, A; Leger, P; Bouloy, M; Bonnefoy, E
2010-01-01
Rift Valley fever virus (RVFV) is an emerging, highly pathogenic virus; RVFV infection can lead to encephalitis, retinitis, or fatal hepatitis associated with hemorrhagic fever in humans, as well as death, abortions, and fetal deformities in animals. RVFV nonstructural NSs protein, a major factor of the virulence, forms filamentous structures in the nuclei of infected cells. In order to further understand RVFV pathology, we investigated, by chromatin immunoprecipitation, immunofluorescence, fluorescence in situ hybridization, and confocal microscopy, the capacity of NSs to interact with the host genome. Our results demonstrate that even though cellular DNA is predominantly excluded from NSs filaments, NSs interacts with some specific DNA regions of the host genome such as clusters of pericentromeric gamma-satellite sequence. Targeting of these sequences by NSs was correlated with the induction of chromosome cohesion and segregation defects in RVFV-infected murine, as well as sheep cells. Using recombinant nonpathogenic virus rZHDeltaNSs210-230, expressing a NSs protein deleted of its region of interaction with cellular factor SAP30, we showed that the NSs-SAP30 interaction was essential for NSs to target pericentromeric sequences, as well as for induction of chromosome segregation defects. The effect of RVFV upon the inheritance of genetic information is discussed with respect to the pathology associated with fetal deformities and abortions, highlighting the main role played by cellular cofactor SAP30 on the establishment of NSs interactions with host DNA sequences and RVFV pathogenesis.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shevtsov, M. B.; Streeter, S. D.; Thresh, S.-J.
2015-02-01
The structure of the new class of controller proteins (exemplified by C.Csp231I) in complex with its 21 bp DNA-recognition sequence is presented, and the molecular basis of sequence recognition in this class of proteins is discussed. An unusual extended spacer between the dimer binding sites suggests a novel interaction between the two C-protein dimers. In a wide variety of bacterial restriction–modification systems, a regulatory ‘controller’ protein (or C-protein) is required for effective transcription of its own gene and for transcription of the endonuclease gene found on the same operon. We have recently turned our attention to a new class of controller proteins (exemplified by C.Csp231I) that have quite novel features, including a much larger DNA-binding site with an 18 bp (∼60 Å) spacer between the two palindromic DNA-binding sequences and a very different recognition sequence from the canonical GACT/AGTC. Using X-ray crystallography, the structure of the protein in complex with its 21 bp DNA-recognition sequence was solved to 1.8 Å resolution, and the molecular basis of sequence recognition in this class of proteins was elucidated. An unusual aspect of the promoter sequence is the extended spacer between the dimer binding sites, suggesting a novel interaction between the two C-protein dimers when bound to both recognition sites correctly spaced on the DNA. A U-bend model is proposed for this tetrameric complex, based on the results of gel-mobility assays, hydrodynamic analysis and the observation of key contacts at the interface between dimers in the crystal.
Template-Based Modeling of Protein-RNA Interactions.
Zheng, Jinfang; Kundrotas, Petras J; Vakser, Ilya A; Liu, Shiyong
2016-09-01
Protein-RNA complexes formed by specific recognition between RNA and RNA-binding proteins play an important role in biological processes. More than a thousand of such proteins in human are curated and many novel RNA-binding proteins are to be discovered. Due to limitations of experimental approaches, computational techniques are needed for characterization of protein-RNA interactions. Although much progress has been made, adequate methodologies reliably providing atomic resolution structural details are still lacking. Although protein-RNA free docking approaches proved to be useful, in general, the template-based approaches provide higher quality of predictions. Templates are key to building a high quality model. Sequence/structure relationships were studied based on a representative set of binary protein-RNA complexes from PDB. Several approaches were tested for pairwise target/template alignment. The analysis revealed a transition point between random and correct binding modes. The results showed that structural alignment is better than sequence alignment in identifying good templates, suitable for generating protein-RNA complexes close to the native structure, and outperforms free docking, successfully predicting complexes where the free docking fails, including cases of significant conformational change upon binding. A template-based protein-RNA interaction modeling protocol PRIME was developed and benchmarked on a representative set of complexes.
Figure Structure, Figure Action, and Framing in Drawings by American and Egyptian Children.
ERIC Educational Resources Information Center
Wilson, Brent; Wilson, Marjorie
1979-01-01
The purpose of this study is to investigate the interaction of biological unfolding and culturally related factors on sequences of narrative figure drawings by American and Egyptian elementary students. Findings support hypotheses relating to the interaction of natural and nurtural influences on children's drawings. (Author/SJL)
Relational Algebra and SQL: Better Together
ERIC Educational Resources Information Center
McMaster, Kirby; Sambasivam, Samuel; Hadfield, Steven; Wolthuis, Stuart
2013-01-01
In this paper, we describe how database instructors can teach Relational Algebra and Structured Query Language together through programming. Students write query programs consisting of sequences of Relational Algebra operations vs. Structured Query Language SELECT statements. The query programs can then be run interactively, allowing students to…
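To illustrate the contrast the abstract draws, the sketch below answers one question twice: first as a query program built from a sequence of Relational Algebra operations, then as a single SQL SELECT statement. It is an assumption-laden illustration, not the authors' courseware: the toy relations `students` and `enrolled` and the helper functions `select`, `project` and `natural_join` are hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3

# Hypothetical toy relations, stored as lists of dicts (one dict per tuple).
students = [
    {"sid": 1, "name": "Ana", "major": "CS"},
    {"sid": 2, "name": "Ben", "major": "Math"},
    {"sid": 3, "name": "Cy", "major": "CS"},
]
enrolled = [
    {"sid": 1, "course": "DB"},
    {"sid": 2, "course": "DB"},
    {"sid": 3, "course": "OS"},
]


def select(rel, pred):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [row for row in rel if pred(row)]


def project(rel, attrs):
    """Relational projection: keep the named attributes, drop duplicates."""
    seen, out = set(), []
    for row in rel:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def natural_join(r, s):
    """Natural join on all attribute names the two relations share."""
    common = [a for a in r[0] if a in s[0]] if r and s else []
    return [{**x, **y} for x in r for y in s
            if all(x[a] == y[a] for a in common)]


# Query program: names of CS majors enrolled in the DB course.
cs_majors = select(students, lambda t: t["major"] == "CS")
cs_in_db = select(natural_join(cs_majors, enrolled),
                  lambda t: t["course"] == "DB")
ra_result = project(cs_in_db, ["name"])

# The same question as one SQL SELECT statement.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE students (sid INTEGER, name TEXT, major TEXT)")
con.execute("CREATE TABLE enrolled (sid INTEGER, course TEXT)")
con.executemany("INSERT INTO students VALUES (:sid, :name, :major)", students)
con.executemany("INSERT INTO enrolled VALUES (:sid, :course)", enrolled)
sql_result = con.execute(
    "SELECT DISTINCT s.name FROM students s "
    "JOIN enrolled e ON s.sid = e.sid "
    "WHERE s.major = 'CS' AND e.course = 'DB'"
).fetchall()

print(ra_result, sql_result)
```

The step-by-step form exposes each intermediate relation, which is the property that makes it convenient to run and inspect interactively.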
RACER a Coarse-Grained RNA Model for Capturing Folding Free Energy in Molecular Dynamics Simulations
NASA Astrophysics Data System (ADS)
Cheng, Sara; Bell, David; Ren, Pengyu
RACER is a coarse-grained RNA model that can be used in molecular dynamics simulations to predict native structures and sequence-specific variation of free energy of various RNA structures. RACER is capable of accurate prediction of native structures of duplexes and hairpins (average RMSD of 4.15 angstroms), and RACER can capture sequence-specific variation of free energy in excellent agreement with experimentally measured stabilities (r-squared = 0.98). The RACER model implements a new effective non-bonded potential and re-parameterization of hydrogen bond and Debye-Hückel potentials. Insights from the RACER model include the importance of treating pairing and stacking interactions separately in order to distinguish folded and unfolded states and identification of hydrogen-bonding, base stacking, and electrostatic interactions as essential driving forces for RNA folding. Future applications of the RACER model include predicting free energy landscapes of more complex RNA structures and use of RACER for multiscale simulations.
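For readers unfamiliar with the electrostatic term mentioned above, the following sketch evaluates a generic Debye-Hückel (screened Coulomb) pair energy. It is not the RACER parameterization, which the abstract does not give: the dielectric constant, the ionic strength, and the approximation of the Debye length as 3.04 Å divided by the square root of the molar ionic strength (water, about 25 °C, 1:1 salt) are all assumptions made for illustration.

```python
import math

# Coulomb constant in kcal*Angstrom/(mol*e^2), a common MD unit convention.
COULOMB_KCAL = 332.0637


def debye_length(ionic_strength_molar):
    """Approximate Debye length in Angstrom (water, ~25 C, 1:1 electrolyte)."""
    return 3.04 / math.sqrt(ionic_strength_molar)


def debye_huckel_energy(q_i, q_j, r, ionic_strength_molar=0.1, dielectric=78.0):
    """Screened Coulomb energy in kcal/mol between charges q_i, q_j (in e)
    separated by r Angstrom. Default parameters are illustrative assumptions."""
    lam = debye_length(ionic_strength_molar)
    return COULOMB_KCAL * q_i * q_j / (dielectric * r) * math.exp(-r / lam)


# Two phosphate-like charges of -1 e, 7 Angstrom apart, at two salt levels.
for salt in (0.05, 0.5):
    print(salt, round(debye_huckel_energy(-1.0, -1.0, 7.0, salt), 4))
```

Raising the ionic strength shortens the Debye length, so the repulsion between like charges decays faster with distance.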
Study of Binding Interaction between Pif80 Protein Fragment and Aragonite
NASA Astrophysics Data System (ADS)
Du, Yuan-Peng; Chang, Hsun-Hui; Yang, Sheng-Yu; Huang, Shing-Jong; Tsai, Yu-Ju; Huang, Joseph Jen-Tse; Chan, Jerry Chun Chung
2016-08-01
Pif is a crucial protein for the formation of the nacreous layer in Pinctada fucata. Three non-acidic peptide fragments of the aragonite-binding domain (Pif80) are selected, which contain multiple copies of the repeat sequence DDRK, to study the interaction between non-acidic peptides and aragonite. The polypeptides DDRKDDRKGGK (Pif80-11) and DDRKDDRKGGKDDRKDDRKGGK (Pif80-22) have similar binding affinity to aragonite. Solid-state NMR data indicate that the backbones of Pif80-11 and Pif80-22 peptides bound on aragonite adopt a random-coil conformation. Pif80-11 is a lot more effective than Pif80-22 in promoting the nucleation of aragonite on the substrate of β-chitin. Our results suggest that the structural arrangement at a protein-mineral interface depends on the surface structure of the mineral substrate and the protein sequence. The side chains of the basic residues, which function as anchors to the aragonite surface, have uniform structures. The role of basic residues as anchors in protein-mineral interaction may play an important role in biomineralization.
A Linked Series of Laboratory Exercises in Molecular Biology Utilizing Bioinformatics and GFP
ERIC Educational Resources Information Center
Medin, Carey L.; Nolin, Katie L.
2011-01-01
Molecular biologists commonly use bioinformatics to map and analyze DNA and protein sequences and to align different DNA and protein sequences for comparison. Additionally, biologists can create and view 3D models of protein structures to further understand intramolecular interactions. The primary goal of this 10-week laboratory was to introduce…
TIA-1 RRM23 binding and recognition of target oligonucleotides
Waris, Saboora; García-Mauriño, Sofía M.; Sivakumaran, Andrew; Beckham, Simone A.; Loughlin, Fionna E.; Gorospe, Myriam; Díaz-Moreno, Irene; Wilce, Matthew C.J.
2017-01-01
TIA-1 (T-cell restricted intracellular antigen-1) is an RNA-binding protein involved in splicing and translational repression. It mainly interacts with RNA via its second and third RNA recognition motifs (RRMs), with specificity for U-rich sequences directed by RRM2. It has recently been shown that RRM3 also contributes to binding, with preferential binding for C-rich sequences. Here we designed UC-rich and CU-rich 10-nt sequences for engagement of both RRM2 and RRM3 and demonstrated that the TIA-1 RRM23 construct preferentially binds the UC-rich RNA ligand (5΄-UUUUUACUCC-3΄). Interestingly, this binding depends on the presence of Lys274 that is C-terminal to RRM3 and binding to equivalent DNA sequences occurs with similar affinity. Small-angle X-ray scattering was used to demonstrate that, upon complex formation with target RNA or DNA, TIA-1 RRM23 adopts a compact structure, showing that both RRMs engage with the target 10-nt sequences to form the complex. We also report the crystal structure of TIA-1 RRM2 in complex with DNA to 2.3 Å resolution providing the first atomic resolution structure of any TIA protein RRM in complex with oligonucleotide. Together our data support a specific mode of TIA-1 RRM23 interaction with target oligonucleotides consistent with the role of TIA-1 in binding RNA to regulate gene expression. PMID:28184449
TIA-1 RRM23 binding and recognition of target oligonucleotides.
Waris, Saboora; García-Mauriño, Sofía M; Sivakumaran, Andrew; Beckham, Simone A; Loughlin, Fionna E; Gorospe, Myriam; Díaz-Moreno, Irene; Wilce, Matthew C J; Wilce, Jacqueline A
2017-05-05
TIA-1 (T-cell restricted intracellular antigen-1) is an RNA-binding protein involved in splicing and translational repression. It mainly interacts with RNA via its second and third RNA recognition motifs (RRMs), with specificity for U-rich sequences directed by RRM2. It has recently been shown that RRM3 also contributes to binding, with preferential binding for C-rich sequences. Here we designed UC-rich and CU-rich 10-nt sequences for engagement of both RRM2 and RRM3 and demonstrated that the TIA-1 RRM23 construct preferentially binds the UC-rich RNA ligand (5΄-UUUUUACUCC-3΄). Interestingly, this binding depends on the presence of Lys274 that is C-terminal to RRM3 and binding to equivalent DNA sequences occurs with similar affinity. Small-angle X-ray scattering was used to demonstrate that, upon complex formation with target RNA or DNA, TIA-1 RRM23 adopts a compact structure, showing that both RRMs engage with the target 10-nt sequences to form the complex. We also report the crystal structure of TIA-1 RRM2 in complex with DNA to 2.3 Å resolution providing the first atomic resolution structure of any TIA protein RRM in complex with oligonucleotide. Together our data support a specific mode of TIA-1 RRM23 interaction with target oligonucleotides consistent with the role of TIA-1 in binding RNA to regulate gene expression. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Buenrostro, Jason D.; Chircus, Lauren M.; Araya, Carlos L.; Layton, Curtis J.; Chang, Howard Y.; Snyder, Michael P.; Greenleaf, William J.
2015-01-01
RNA-protein interactions drive fundamental biological processes and are targets for molecular engineering, yet quantitative and comprehensive understanding of the sequence determinants of affinity remains limited. Here we repurpose a high-throughput sequencing instrument to quantitatively measure binding and dissociation of MS2 coat protein to >10^7 RNA targets generated on a flow-cell surface by in situ transcription and inter-molecular tethering of RNA to DNA. We decompose the binding energy contributions from primary and secondary RNA structure, finding that differences in affinity are often driven by sequence-specific changes in association rates. By analyzing the biophysical constraints and modeling mutational paths describing the molecular evolution of MS2 from low- to high-affinity hairpins, we quantify widespread molecular epistasis, and a long-hypothesized structure-dependent preference for G:U base pairs over C:A intermediates in evolutionary trajectories. Our results suggest that quantitative analysis of RNA on a massively parallel array (RNAMaP) relationships across molecular variants. PMID:24727714
LncRNA Structural Characteristics in Epigenetic Regulation
Wang, Chenguang; Wang, Lianzong; Ding, Yu; Lu, Xiaoyan; Zhang, Guosi; Yang, Jiaxin; Zheng, Hewei; Wang, Hong; Jiang, Yongshuai; Xu, Liangde
2017-01-01
The rapid development of next-generation sequencing technology has deepened the understanding of genomes and their functional products. RNA-sequencing studies in mammals show that approximately 85% of DNA sequences have RNA products; among these, non-coding transcripts longer than 200 nucleotides (nt) are called long non-coding RNAs (lncRNAs). LncRNAs have now been shown to play important epigenetic regulatory roles in key molecular processes, such as gene expression, genetic imprinting, histone modification, chromatin dynamics, and other activities, by forming specific structures and interacting with many kinds of molecules. This paper mainly discusses the correlation between the structure and function of lncRNAs in light of recent progress in epigenetic regulation, which is important for understanding the mechanisms of lncRNAs in physiological and pathological processes. PMID:29292750
Paugh, Steven W.; Coss, David R.; Bao, Ju; ...
2016-02-04
MicroRNAs are important regulators of gene expression, acting primarily by binding to sequence-specific locations on already transcribed messenger RNAs (mRNA). Recent studies indicate that microRNAs may also play a role in up-regulating mRNA transcription levels, although a definitive mechanism has not been established. Double-helical DNA is capable of forming triple-helical structures through Hoogsteen and reverse Hoogsteen interactions in the major groove of the duplex, and we show physical evidence that microRNAs form triple-helical structures with duplex DNA, and identify microRNA sequences that favor triplex formation. We developed an algorithm (Trident) to search genome-wide for potential triplex-forming sites and show that several mammalian and non-mammalian genomes are enriched for strong microRNA triplex binding sites. We show that those genes containing sequences favoring microRNA triplex formation are markedly enriched (3.3 fold, p < 2.2 × 10^-16) for genes whose expression is positively correlated with expression of microRNAs targeting triplex binding sequences. As a result, this work has revealed a new mechanism by which microRNAs can interact with gene promoter regions to modify gene transcription.
Intramolecular triple helix as a model for regular polyribonucleotide (CAA)(n).
Efimov, Alexander V; Spirin, Alexander S
2009-10-09
The regular (CAA)(n) polyribonucleotide, as well as the omega leader sequence containing (CAA)-rich core, have recently been shown to form cooperatively melted and compact structures. In this report, we propose a structural model for the (CAA)(n) sequence in which the polyribonucleotide chain is folded upon itself, so that it forms an intramolecular triple helix. The triple helix is stabilized by hydrogen bonding between bases thus forming coplanar triads, and by stacking interactions between the base triads. A distinctive feature of the proposed triple helix is that it does not contain the canonical double-helix elements. The difference from the known triple helices is that Watson-Crick hydrogen bond pairings do not take place in the interactions between the bases within the base triads.
Prediction of RNA secondary structures: from theory to models and real molecules
NASA Astrophysics Data System (ADS)
Schuster, Peter
2006-05-01
RNA secondary structures are derived from RNA sequences, which are strings built from the natural four letter nucleotide alphabet, {AUGC}. These coarse-grained structures, in turn, are tantamount to constrained strings over a three letter alphabet. Hence, the secondary structures are discrete objects and the number of sequences always exceeds the number of structures. The sequences built from two letter alphabets form perfect structures when the nucleotides can form a base pair, as is the case with {GC} or {AU}, but the relation between the sequences and structures differs strongly from the four letter alphabet. A comprehensive theory of RNA structure is presented, which is based on the concepts of sequence space and shape space, being a space of structures. It sets the stage for modelling processes in ensembles of RNA molecules like evolutionary optimization or kinetic folding as dynamical phenomena guided by mappings between the two spaces. The number of minimum free energy (mfe) structures is always smaller than the number of sequences, even for two letter alphabets. Folding of RNA molecules into mfe structures constitutes a non-invertible mapping from sequence space onto shape space. The preimage of a structure in sequence space is defined as its neutral network. Similarly the set of suboptimal structures is the preimage of a sequence in shape space. This set represents the conformation space of a given sequence. The evolutionary optimization of structures in populations is a process taking place in sequence space, whereas kinetic folding occurs in molecular ensembles that optimize free energy in conformation space. Efficient folding algorithms based on dynamic programming are available for the prediction of secondary structures for given sequences. The inverse problem, the computation of sequences for predefined structures, is an important tool for the design of RNA molecules with tailored properties.
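The dynamic programming idea mentioned above can be sketched with the classic Nussinov base-pair maximization recursion. This is a deliberate simplification and an assumption on our part: the mfe algorithms the abstract refers to minimize an empirical free energy, whereas this sketch only maximizes the number of nested pairs. The allowed pairs (Watson-Crick plus G-U) and the minimum hairpin loop of three unpaired nucleotides are conventional choices, not values taken from the source. The output is a dot-bracket string, i.e. a constrained string over the three letter alphabet `(`, `)` and `.`.

```python
# Allowed pairs: Watson-Crick plus the G-U wobble pair (conventional choice).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Return (max number of nested base pairs, one optimal dot-bracket string).

    min_loop is the minimum number of unpaired nucleotides enclosed by a pair.
    """
    n = len(seq)
    best = [[0] * n for _ in range(n)]  # best[i][j]: max pairs within seq[i..j]

    # Fill the table for increasingly long subsequences.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with some k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    # Trace back one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # too short to contain a pair
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = best[i][k - 1] if k > i else 0
                if best[i][j] == left + 1 + best[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return (best[0][n - 1] if n else 0), "".join(structure)


print(nussinov("GGGAAACCC"))
```

The table has O(n^2) entries and each takes O(n) work, giving the cubic running time typical of this family of folding algorithms.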
Simultaneous folding or cofolding of two or more RNA molecules can be modelled readily at the secondary structure level and allows prediction of the most stable (mfe) conformations of complexes together with suboptimal states. Cofolding algorithms are important tools for efficient and highly specific primer design in the polymerase chain reaction (PCR) and help to explain the mechanisms of small interfering RNA (siRNA) molecules in gene regulation. The evolutionary optimization of RNA structures is illustrated by the search for a target structure and mimics aptamer selection in evolutionary biotechnology. It occurs typically in steps consisting of short adaptive phases interrupted by long epochs of little or no obvious progress in optimization. During these quasi-stationary epochs the populations are essentially confined to neutral networks where they search for sequences that allow a continuation of the adaptive process. Modelling RNA evolution as a simultaneous process in sequence and shape space provides answers to questions of the optimal population size and mutation rates. Kinetic folding is a stochastic process in conformation space. Exact solutions are derived by direct simulation in the form of trajectory sampling or by solving the master equation. The exact solutions can be approximated straightforwardly by Arrhenius kinetics on barrier trees, which represent simplified versions of conformational energy landscapes. The existence of at least one sequence forming any arbitrarily chosen pair of structures is guaranteed by the intersection theorem. Folding kinetics is the key to understanding and designing multistable RNA molecules or RNA switches. These RNAs form two or more long lived conformations, and conformational changes occur either spontaneously or are induced through binding of small molecules or other biopolymers. RNA switches are found in nature where they act as elements in genetic and metabolic regulation.
The reliability of RNA secondary structure prediction is limited by the accuracy with which the empirical parameters can be determined and by principal deficiencies, for example by the lack of energy contributions resulting from tertiary interactions. In addition, native structures may be determined by folding kinetics rather than by thermodynamics. We address the first problem by considering base pair probabilities or base pairing entropies, which are derived from the partition function of conformations. A high base pair probability corresponding to a low pairing entropy is taken as an indicator of a high reliability of prediction. Pseudoknots are discussed as an example of a tertiary interaction that is highly important for RNA function. Moreover, pseudoknot formation is readily incorporated into structure prediction algorithms. Some examples of experimental data on RNA secondary structures that are readily explained using the landscape concept are presented. They deal with (i) properties of RNA molecules with random sequences, (ii) RNA molecules from restricted alphabets, (iii) existence of neutral networks, (iv) shape space covering, (v) riboswitches and (vi) evolution of non-coding RNAs as an example of evolution restricted to neutral networks.
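As a rough illustration of the dynamic programming idea mentioned in the abstract above, the following sketch counts the maximum number of nested base pairs a sequence can form. It is a simplified Nussinov-style recursion, not the thermodynamic mfe algorithm the abstract refers to: it maximizes pair count rather than minimizing free energy, and the function name `max_base_pairs`, the `min_loop` parameter and the allowed pair set (Watson-Crick plus G-U wobble) are assumptions chosen for the example.

```python
def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested (pseudoknot-free) base pairs in seq.

    Simplified Nussinov-style recursion; min_loop is the minimum number
    of unpaired bases enclosed by any pair (an illustrative assumption).
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] holds the optimum for the subsequence seq[i..j], inclusive.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: position j stays unpaired.
            score = best[i][j - 1]
            # Case 2: position j pairs with some k, splitting the interval.
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in allowed:
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1]
                    score = max(score, left + 1 + inner)
            best[i][j] = score
    return best[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAAUCCC"))
```

The table is filled in order of increasing subsequence length, so every entry depends only on shorter intervals that are already solved; this is the same structure that energy-based folding algorithms exploit, with a free-energy model in place of the pair count.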
Thermodynamics of RNA structures by Wang–Landau sampling
Lou, Feng; Clote, Peter
2010-01-01
Motivation: Thermodynamics-based dynamic programming RNA secondary structure algorithms have been of immense importance in molecular biology, where applications range from the detection of novel selenoproteins using expressed sequence tag (EST) data, to the determination of microRNA genes and their targets. Dynamic programming algorithms have been developed to compute the minimum free energy secondary structure and partition function of a given RNA sequence, the minimum free-energy and partition function for the hybridization of two RNA molecules, etc. However, the applicability of dynamic programming methods depends on disallowing certain types of interactions (pseudoknots, zig-zags, etc.), as their inclusion renders structure prediction a nondeterministic polynomial time (NP)-complete problem. Nevertheless, such interactions have been observed in X-ray structures. Results: A non-Boltzmannian Monte Carlo algorithm was designed by Wang and Landau to estimate the density of states for complex systems, such as the Ising model, that exhibit a phase transition. In this article, we apply the Wang-Landau (WL) method to compute the density of states for secondary structures of a given RNA sequence, and for hybridizations of two RNA sequences. Our method is shown to be much faster than existing software, such as RNAsubopt. From the density of states, we compute the partition function over all secondary structures and over all pseudoknot-free hybridizations. The advantage of the WL method is that by adding a function to evaluate the free energy of arbitrary pseudoknotted structures and of arbitrary hybridizations, we can estimate thermodynamic parameters for situations known to be NP-complete. This extension to pseudoknots will be made in the sequel to this article; in contrast, the current article describes the WL algorithm applied to pseudoknot-free secondary structures and hybridizations.
Availability: The WL RNA hybridization web server is under construction at http://bioinformatics.bc.edu/clotelab/. Contact: clote@bc.edu PMID:20529917
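The step from a density of states to a partition function, mentioned in the abstract above, can be sketched as follows. This is a minimal illustration and not the authors' software: the function names, the dictionary layout (energy in kcal/mol mapped to a count of structures) and the toy values are all assumptions made for the example.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 0.0019872


def partition_function(density_of_states, temperature_k=310.15):
    """Z = sum over energies E of g(E) * exp(-E / (R*T)).

    density_of_states maps an energy in kcal/mol to g(E), the number of
    structures with that energy (hypothetical input format).
    """
    rt = R_KCAL * temperature_k
    return sum(g * math.exp(-e / rt) for e, g in density_of_states.items())


def energy_level_probabilities(density_of_states, temperature_k=310.15):
    """Boltzmann probability of finding the molecule at each energy level."""
    rt = R_KCAL * temperature_k
    z = partition_function(density_of_states, temperature_k)
    return {e: g * math.exp(-e / rt) / z
            for e, g in density_of_states.items()}


if __name__ == "__main__":
    # Toy density of states, invented for illustration only.
    toy = {0.0: 1, -1.0: 4, -2.5: 2}
    print(partition_function(toy))
    print(energy_level_probabilities(toy))
```

Because the sum runs over energy levels rather than individual structures, an estimate of g(E) from sampling is enough to obtain Z without enumerating every structure.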
Protein 3D Structure Computed from Evolutionary Sequence Variation
Sheridan, Robert; Hopf, Thomas A.; Pagnani, Andrea; Zecchina, Riccardo; Sander, Chris
2011-01-01
The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7–4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org).
This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures, new strategies in protein and drug design, and the identification of functional genetic variants in normal and disease genomes. PMID:22163331
T box riboswitches in Actinobacteria: Translational regulation via novel tRNA interactions
Sherwood, Anna V.; Grundy, Frank J.; Henkin, Tina M.
2015-01-01
The T box riboswitch regulates many amino acid-related genes in Gram-positive bacteria. T box riboswitch-mediated gene regulation was shown previously to occur at the level of transcription attenuation via structural rearrangements in the 5′ untranslated (leader) region of the mRNA in response to binding of a specific uncharged tRNA. In this study, a novel group of isoleucyl-tRNA synthetase gene (ileS) T box leader sequences found in organisms of the phylum Actinobacteria was investigated. The Stem I domains of these RNAs lack several highly conserved elements that are essential for interaction with the tRNA ligand in other T box RNAs. Many of these RNAs were predicted to regulate gene expression at the level of translation initiation through tRNA-dependent stabilization of a helix that sequesters a sequence complementary to the Shine–Dalgarno (SD) sequence, thus freeing the SD sequence for ribosome binding and translation initiation. We demonstrated specific binding to the cognate tRNAIle and tRNAIle-dependent structural rearrangements consistent with regulation at the level of translation initiation, providing the first biochemical demonstration, to our knowledge, of translational regulation in a T box riboswitch. PMID:25583497
Growth of equilibrium structures built from a large number of distinct component types.
Hedges, Lester O; Mannige, Ranjan V; Whitelam, Stephen
2014-09-14
We use simple analytic arguments and lattice-based computer simulations to study the growth of structures made from a large number of distinct component types. Components possess 'designed' interactions, chosen to stabilize an equilibrium target structure in which each component type has a defined spatial position, as well as 'undesigned' interactions that allow components to bind in a compositionally-disordered way. We find that high-fidelity growth of the equilibrium target structure can happen in the presence of substantial attractive undesigned interactions, as long as the energy scale of the set of designed interactions is chosen appropriately. This observation may help explain why equilibrium DNA 'brick' structures self-assemble even if undesigned interactions are not suppressed [Ke et al. Science, 338, 1177, (2012)]. We also find that high-fidelity growth of the target structure is most probable when designed interactions are drawn from a distribution that is as narrow as possible. We use this result to suggest how to choose complementary DNA sequences in order to maximize the fidelity of multicomponent self-assembly mediated by DNA. We also comment on the prospect of growing macroscopic structures in this manner.
Salt bridges: geometrically specific, designable interactions.
Donald, Jason E; Kulp, Daniel W; DeGrado, William F
2011-03-01
Salt bridges occur frequently in proteins, providing conformational specificity and contributing to molecular recognition and catalysis. We present a comprehensive analysis of these interactions in protein structures by surveying a large database of protein structures. Salt bridges between Asp or Glu and His, Arg, or Lys display extremely well-defined geometric preferences. Several previously observed preferences are confirmed, and others that were previously unrecognized are discovered. Salt bridges are explored for their preferences for different separations in sequence and in space, geometric preferences within proteins and at protein-protein interfaces, co-operativity in networked salt bridges, inclusion within metal-binding sites, preference for acidic electrons, apparent conformational side chain entropy reduction on formation, and degree of burial. Salt bridges occur far more frequently between residues at close than distant sequence separations, but, at close distances, there remain strong preferences for salt bridges at specific separations. Specific types of complex salt bridges, involving three or more members, are also discovered. As we observe a strong relationship between the propensity to form a salt bridge and the placement of salt-bridging residues in protein sequences, we discuss the role that salt bridges might play in kinetically influencing protein folding and thermodynamically stabilizing the native conformation. We also develop a quantitative method to select appropriate crystal structure resolution and B-factor cutoffs. Detailed knowledge of these geometric and sequence dependences should aid de novo design and prediction algorithms. Copyright © 2010 Wiley-Liss, Inc.
Predictive Bcl-2 Family Binding Models Rooted in Experiment or Structure
DeBartolo, Joe; Dutta, Sanjib; Reich, Lothar; Keating, Amy E.
2013-01-01
Proteins of the Bcl-2 family either enhance or suppress programmed cell death and are centrally involved in cancer development and resistance to chemotherapy. BH3 (Bcl-2 homology 3)-only Bcl-2 proteins promote cell death by docking an α-helix into a hydrophobic groove on the surface of one or more of five pro-survival Bcl-2 receptor proteins. There is high structural homology within the pro-death and pro-survival families, yet a high degree of interaction specificity is nevertheless encoded, posing an interesting and important molecular recognition problem. Understanding protein features that dictate Bcl-2 interaction specificity is critical for designing peptide-based cancer therapeutics and diagnostics. In this study, we present peptide SPOT arrays and deep sequencing data from yeast display screening experiments that significantly expand the BH3 sequence space that has been experimentally tested for interaction with five human anti-apoptotic receptors. These data provide rich information about the determinants of Bcl-2 family specificity. To interpret and use the information, we constructed two simple data-based models that can predict affinity and specificity when evaluated on independent data sets within a limited sequence space. We also constructed a novel structure-based statistical potential, called STATIUM, which is remarkably good at predicting Bcl-2 affinity and specificity, especially considering it is not trained on experimental data. We compare the performance of our three models to each other and to alternative structure-based methods and discuss how such tools can guide prediction and design of new Bcl-2 family complexes. PMID:22617328
NASA Astrophysics Data System (ADS)
Brown, Nathaniel James Swanton
While there is consensus that conceptual change is surprisingly difficult, many competing theories of conceptual change co-exist in the literature. This dissertation argues that this discord is partly the result of an inadequate account of the unwritten rules of human social interaction that underlie the field's preferred methodology: semi-structured interviewing. To better understand the contributions of interaction during explanations, I analyze eight undergraduate general chemistry students as they attempt to explain to various people, for various reasons, why phenomena involving chemical phase equilibrium occur. Using the methods of interaction analysis, I characterize the unwritten, but systematic, rules that these participants follow as they explain. The result is a description of the contributions of interaction to explaining. Each step in each explanation is a jointly performed expression of a subject-predicate relation, an interactive accomplishment I call an information performance (in-form, for short). Unlike clauses, in-forms need not have a coherent grammatical structure. Unlike speaker turns, in-forms have the clear function of expressing information. Unlike both clauses and speaker turns, in-forms are a co-construction, jointly performed by both the primary speaker and the other interlocutor. The other interlocutor strongly affects the form and content of each explanation by giving or withholding feedback at the end of each in-form, moments I call feedback-relevant places. While in-forms are the bricks out of which the explanation is constructed, they are secured by a series of inferential links I call an illative sequence. Illative sequences are forward-searching, starting with a remembered fact or observation and following a chain of inferences in the hope it leads to the target phenomenon. The participants treat an explanation as a success if the illative sequence generates an in-form that describes the phenomenon.
If the illative sequence does not, it is partly or entirely scrubbed, a new in-form is introduced as a starting point, and the illative sequence begins anew. Knowledge of these interactional contributions to the production of explanations could allow researchers to better characterize conceptual understanding, be in a stronger position to support particular theories of conceptual change over others, improve assessments of conceptual understanding, and improve interviewing practices.
Wiedmann, Mareike M; Aibara, Shintaro; Spring, David R; Stewart, Murray; Brenton, James D
2016-09-01
The transcription factor hepatocyte nuclear factor 1β (HNF1β) is ubiquitously overexpressed in ovarian clear cell carcinoma (CCC) and is a potential therapeutic target. To explore potential approaches that block HNF1β transcription we have identified and characterised extensively the nuclear localisation signal (NLS) for HNF1β and its interactions with the nuclear protein import receptor, Importin-α. Pull-down assays demonstrated that the DNA binding domain of HNF1β interacted with a spectrum of Importin-α isoforms and deletion constructs tagged with eGFP confirmed that the HNF1β (229)KKMRRNR(235) sequence was essential for nuclear localisation. We further characterised the interaction between the NLS and Importin-α using complementary biophysical techniques and have determined the 2.4 Å resolution crystal structure of the HNF1β NLS peptide bound to Importin-α. The functional, biochemical, and structural characterisation of the nuclear localisation signal present on HNF1β and its interaction with the nuclear import protein Importin-α provide the basis for the development of compounds targeting transcription factor HNF1β via its nuclear import pathway. Copyright © 2016. Published by Elsevier Inc.
Ternary metal-rich sulfide with a layered structure
Franzen, Hugo F.; Yao, Xiaoqiang
1993-08-17
A ternary Nb-Ta-S compound is provided having the atomic formula Nb1.72Ta3.28S2 and exhibiting a layered structure in the sequence S-M3-M2-M1-M2-M3-S, wherein S represents sulfur layers and M1, M2, and M3 represent Nb/Ta mixed metal layers. This sequence generates seven sheets stacked along the [001] direction of an approximate body centered cubic crystal structure with relatively weak sulfur-to-sulfur van der Waals type interactions between adjacent sulfur sheets and metal-to-metal bonding within and between adjacent mixed metal sheets.
The computational linguistics of biological sequences
DOE Office of Scientific and Technical Information (OSTI.GOV)
Searls, D.
1995-12-31
This tutorial was one of eight tutorials selected to be presented at the Third International Conference on Intelligent Systems for Molecular Biology which was held in the United Kingdom from July 16 to 19, 1995. Protein sequences are analogous in many respects, particularly their folding behavior. Proteins have a much richer variety of interactions, but in theory the same linguistic principles could come to bear in describing dependencies between distant residues that arise by virtue of three-dimensional structure. This tutorial will concentrate on nucleic acid sequences.
The Janus Kinase (JAK) FERM and SH2 Domains: Bringing Specificity to JAK-Receptor Interactions.
Ferrao, Ryan; Lupardus, Patrick J
2017-01-01
The Janus kinases (JAKs) are non-receptor tyrosine kinases essential for signaling in response to cytokines and interferons and thereby control many essential functions in growth, development, and immune regulation. JAKs are unique among tyrosine kinases for their constitutive yet non-covalent association with class I and II cytokine receptors, which upon cytokine binding bring together two JAKs to create an active signaling complex. JAK association with cytokine receptors is facilitated by N-terminal FERM and SH2 domains, both of which are classical mediators of peptide interactions. Together, the JAK FERM and SH2 domains mediate a bipartite interaction with two distinct receptor peptide motifs, the proline-rich "Box1" and hydrophobic "Box2," which are present in the intracellular domain of cytokine receptors. While the general sidechain chemistry of Box1 and Box2 peptides is conserved between receptors, they share very weak primary sequence homology, making it impossible to posit why certain JAKs preferentially interact with and signal through specific subsets of cytokine receptors. Here, we review the structure and function of the JAK FERM and SH2 domains in light of several recent studies that reveal their atomic structure and elucidate interaction mechanisms with both the Box1 and Box2 receptor motifs. These crystal structures demonstrate how evolution has repurposed the JAK FERM and SH2 domains into a receptor-binding module that facilitates interactions with multiple receptors possessing diverse primary sequences.
Morgan, Rhodri M. L.; Hernández-Ramírez, Laura C.; Trivellin, Giampaolo; Zhou, Lihong; Roe, S. Mark; Korbonits, Márta; Prodromou, Chrisostomos
2012-01-01
Mutations of the aryl hydrocarbon receptor interacting protein (AIP) have been associated with familial isolated pituitary adenomas predisposing to young-onset acromegaly and gigantism. The precise tumorigenic mechanism is not well understood as AIP interacts with a large number of independent proteins as well as three chaperone systems, HSP90, HSP70 and TOMM20. We have determined the structure of the TPR domain of AIP at high resolution, which has allowed a detailed analysis of how disease-associated mutations impact on the structural integrity of the TPR domain. A subset of C-terminal α-7 helix (Cα-7h) mutations, R304* (nonsense mutation), R304Q, Q307* and R325Q, a known site for AhR and PDE4A5 client-protein interaction, occur beyond those that interact with the conserved MEEVD and EDDVE sequences of HSP90 and TOMM20. These C-terminal AIP mutations appear to only disrupt client-protein binding to the Cα-7h, while chaperone binding remains unaffected, suggesting that failure of client-protein interaction with the Cα-7h is sufficient to predispose to pituitary adenoma. We have also identified a molecular switch in the AIP TPR-domain that allows recognition of both the conserved HSP90 motif, MEEVD, and the equivalent sequence (EDDVE) of TOMM20. PMID:23300914
de Borba, Luana; Villordo, Sergio M; Iglesias, Nestor G; Filomatori, Claudia V; Gebhard, Leopoldo G; Gamarnik, Andrea V
2015-03-01
The dengue virus genome is a dynamic molecule that adopts different conformations in the infected cell. Here, using RNA folding predictions, chemical probing analysis, RNA binding assays, and functional studies, we identified new cis-acting elements present in the capsid coding sequence that facilitate cyclization of the viral RNA by hybridization with a sequence involved in a local dumbbell structure at the viral 3' untranslated region (UTR). The identified interaction differentially enhances viral replication in mosquito and mammalian cells. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
Nielsen, Morten; Andreatta, Massimo
2017-07-03
Peptides are extensively used to characterize functional or (linear) structural aspects of receptor-ligand interactions in biological systems, e.g. SH2, SH3, PDZ peptide-recognition domains, the MHC membrane receptors and enzymes such as kinases and phosphatases. NNAlign is a method for the identification of such linear motifs in biological sequences. The algorithm aligns the amino acid or nucleotide sequences provided as training set, and generates a model of the sequence motif detected in the data. The webserver allows setting up cross-validation experiments to estimate the performance of the model, as well as evaluations on independent data. Many features of the training sequences can be encoded as input, and the network architecture is highly customizable. The results returned by the server include a graphical representation of the motif identified by the method, performance values and a downloadable model that can be applied to scan protein sequences for occurrence of the motif. While its performance for the characterization of peptide-MHC interactions is widely documented, we extended NNAlign to be applicable to other receptor-ligand systems as well. Version 2.0 supports alignments with insertions and deletions, encoding of receptor pseudo-sequences, and custom alphabets for the training sequences. The server is available at http://www.cbs.dtu.dk/services/NNAlign-2.0. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Restricted N-glycan Conformational Space in the PDB and Its Implication in Glycan Structure Modeling
Jo, Sunhwan; Lee, Hui Sun; Skolnick, Jeffrey; Im, Wonpil
2013-01-01
Understanding glycan structure and dynamics is central to understanding protein-carbohydrate recognition and its role in protein-protein interactions. Given the difficulties in obtaining the glycan's crystal structure in glycoconjugates due to its flexibility and heterogeneity, computational modeling could play an important role in providing glycosylated protein structure models. To address if glycan structures available in the PDB can be used as templates or fragments for glycan modeling, we present a survey of the N-glycan structures of 35 different sequences in the PDB. Our statistical analysis shows that the N-glycan structures found on homologous glycoproteins are significantly conserved compared to the random background, suggesting that N-glycan chains can be confidently modeled with template glycan structures whose parent glycoproteins share sequence similarity. On the other hand, N-glycan structures found on non-homologous glycoproteins do not show significant global structural similarity. Nonetheless, the internal substructures of these N-glycans, particularly, the substructures that are closer to the protein, show significantly similar structures, suggesting that such substructures can be used as fragments in glycan modeling. Increased interactions with protein might be responsible for the restricted conformational space of N-glycan chains. Our results suggest that structure prediction/modeling of N-glycans of glycoconjugates using structure database could be effective and different modeling approaches would be needed depending on the availability of template structures. PMID:23516343
Jowitt, Thomas A; Murdoch, Alan D; Baldock, Clair; Berry, Richard; Day, Joanna M; Hardingham, Timothy E
2010-01-01
Structural investigation of proteins containing large stretches of sequences without predicted secondary structure is the focus of much increased attention. Here, we have produced an unglycosylated 30 kDa peptide from the chondroitin sulphate (CS)-attachment region of human aggrecan (CS-peptide), which was predicted to be intrinsically disordered and compared its structure with the adjacent aggrecan G3 domain. Biophysical analyses, including analytical ultracentrifugation, light scattering, and circular dichroism showed that the CS-peptide had an elongated and stiffened conformation in contrast to the globular G3 domain. The results suggested that it contained significant secondary structure, which was sensitive to urea, and we propose that the CS-peptide forms an elongated wormlike molecule based on a dynamic range of energetically equivalent secondary structures stabilized by hydrogen bonds. The dimensions of the structure predicted from small-angle X-ray scattering analysis were compatible with EM images of fully glycosylated aggrecan and a partly glycosylated aggrecan CS2-G3 construct. The semiordered structure identified in CS-peptide was not predicted by common structural algorithms and identified a potentially distinct class of semiordered structure within sequences currently identified as disordered. Sequence comparisons suggested some evidence for comparable structures in proteins encoded by other genes (PRG4, MUC5B, and CBP). The function of these semiordered sequences may serve to spatially position attached folded modules and/or to present polypeptides for modification, such as glycosylation, and to provide templates for the multiple pleiotropic interactions proposed for disordered proteins. Proteins 2010. © 2010 Wiley-Liss, Inc. PMID:20806220
Conformational Preference of ‘CαNN’ Short Peptide Motif towards Recognition of Anions
Banerjee, Raja
2013-01-01
Among several ‘anion binding motifs’, the recently described ‘CαNN’ motif occurring in the loop regions preceding a helix, is conserved through evolution both in sequence and its conformation. To establish the significance of the conserved sequence and their intrinsic affinity for anions, a series of peptides containing the naturally occurring ‘CαNN’ motif at the N-terminus of a designed helix, have been modeled and studied in a context free system using computational techniques. Appearance of a single interacting site with negative binding free-energy for both the sulfate and phosphate ions, as evidenced in docking experiments, establishes that the ‘CαNN’ segment has an intrinsic affinity for anions. Molecular Dynamics (MD) simulation studies reveal that interaction with anion triggers a conformational switch from non-helical to helical state at the ‘CαNN’ segment, which extends the length of the anchoring-helix by one turn at the N-terminus. Computational experiments substantiate the significance of sequence/structural context and justify the conserved nature of the ‘CαNN’ sequence for anion recognition through “local” interaction. PMID:23516403
Effects of sequence on DNA wrapping around histones
NASA Astrophysics Data System (ADS)
Ortiz, Vanessa
2011-03-01
A central question in biophysics is whether the sequence of a DNA strand affects its mechanical properties. In epigenetics, these are thought to influence nucleosome positioning and gene expression. Theoretical and experimental attempts to answer this question have been hindered by an inability to directly resolve DNA structure and dynamics at the base-pair level. In our previous studies we used a detailed model of DNA to measure the effects of sequence on the stability of naked DNA under bending. Sequence was shown to influence DNA's ability to form kinks, which arise when certain motifs slide past others to form non-native contacts. Here, we have now included histone-DNA interactions to see if the results obtained for naked DNA are transferable to the problem of nucleosome positioning. Different DNA sequences interacting with the histone protein complex are studied, and their equilibrium and mechanical properties are compared among themselves and with the naked case. NLM training grant to the Computation and Informatics in Biology and Medicine Training Program (NLM T15LM007359).
Two distinct DNA sequences recognized by transcription factors represent enthalpy and entropy optima
Yin, Yimeng; Das, Pratyush K; Jolma, Arttu; Zhu, Fangjie; Popov, Alexander; Xu, You; Nilsson, Lennart
2018-01-01
Most transcription factors (TFs) can bind to a population of sequences closely related to a single optimal site. However, some TFs can bind to two distinct sequences that represent two local optima in the Gibbs free energy of binding (ΔG). To determine the molecular mechanism behind this effect, we solved the structures of human HOXB13 and CDX2 bound to their two optimal DNA sequences, CAATAAA and TCGTAAA. Thermodynamic analyses by isothermal titration calorimetry revealed that both sites were bound with similar ΔG. However, the interaction with the CAA sequence was driven by change in enthalpy (ΔH), whereas the TCG site was bound with similar affinity due to smaller loss of entropy (ΔS). This thermodynamic mechanism that leads to at least two local optima likely affects many macromolecular interactions, as ΔG depends on two partially independent variables ΔH and ΔS according to the central equation of thermodynamics, ΔG = ΔH - TΔS. PMID:29638214
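The enthalpy/entropy trade-off described in this abstract can be illustrated with a short numerical sketch of ΔG = ΔH - TΔS. The ΔH and ΔS values below are hypothetical numbers chosen only to show how two different combinations can give nearly the same ΔG; they are not the measured values for HOXB13 or CDX2, and the function name is an assumption made for illustration.

```python
def gibbs_free_energy(delta_h, delta_s, temperature=298.15):
    """Return dG = dH - T*dS.

    delta_h in kJ/mol, delta_s in kJ/(mol*K), temperature in K.
    """
    return delta_h - temperature * delta_s


# Hypothetical site 1: large favorable enthalpy, large entropy loss.
enthalpy_driven = gibbs_free_energy(delta_h=-60.0, delta_s=-0.050)

# Hypothetical site 2: weaker enthalpy, but a smaller entropy loss.
entropy_favored = gibbs_free_energy(delta_h=-48.0, delta_s=-0.010)

print(f"enthalpy-driven site: dG = {enthalpy_driven:.2f} kJ/mol")
print(f"entropy-favored site: dG = {entropy_favored:.2f} kJ/mol")
```

Because ΔH and ΔS enter the equation as partially independent terms, two quite different binding modes can land on similar ΔG values, which is the situation the abstract describes as two local optima.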
NMR studies on the structure and dynamics of lac operator DNA
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, S.C.
Nuclear Magnetic Resonance spectroscopy was used to elucidate the relationships between structure, dynamics and function of the gene regulatory sequence corresponding to the lactose operon operator of Escherichia coli. The length of the DNA fragments examined varied from 13 to 36 base pairs, containing all or part of the operator sequence. These DNA fragments are either derived genetically or synthesized chemically. Resonances of the imino protons were assigned by one dimensional inter-base pair nuclear Overhauser enhancement (NOE) measurements. Imino proton exchange rates were measured by saturation recovery methods. Results from the kinetic measurements show an interesting dynamic heterogeneity with a maximum opening rate centered about a GTG/CAC sequence which correlates with the biological function of the operator DNA. This particular three base pair sequence occurs frequently and often symmetrically in prokaryotic and eukaryotic DNA sites where one anticipates specific protein interaction for gene regulation. The observed sequence dependent imino proton exchange rate may be a reflection of variation of the local structure of regulatory DNA. The results also indicate that the observed imino proton exchange rates are length dependent.
Garcia, J A; Harrich, D; Soultanakis, E; Wu, F; Mitsuyasu, R; Gaynor, R B
1989-01-01
The human immunodeficiency virus (HIV) type 1 LTR is regulated at the transcriptional level by both cellular and viral proteins. Using HeLa cell extracts, multiple regions of the HIV LTR were found to serve as binding sites for cellular proteins. An untranslated region binding protein UBP-1 has been purified and fractions containing this protein bind to both the TAR and TATA regions. To investigate the role of cellular proteins binding to both the TATA and TAR regions and their potential interaction with other HIV DNA binding proteins, oligonucleotide-directed mutagenesis of both these regions was performed followed by DNase I footprinting and transient expression assays. In the TATA region, two direct repeats TC/AAGC/AT/AGCTGC surround the TATA sequence. Mutagenesis of both of these direct repeats or of the TATA sequence interrupted binding over the TATA region on the coding strand, but only a mutation of the TATA sequence affected in vivo assays for tat-activation. In addition to TAR serving as the site of binding of cellular proteins, RNA transcribed from TAR is capable of forming a stable stem-loop structure. To determine the relative importance of DNA binding proteins as compared to secondary structure, oligonucleotide-directed mutations in the TAR region were studied. Local mutations that disrupted either the stem or loop structure were defective in gene expression. However, compensatory mutations which restored base pairing in the stem resulted in complete tat-activation. This indicated a significant role for the stem-loop structure in HIV gene expression. To determine the role of TAR binding proteins, mutations were constructed which extensively changed the primary structure of the TAR region, yet left stem base pairing, stem energy and the loop sequence intact. These mutations resulted in decreased protein binding to TAR DNA and defects in tat-activation, and revealed factor binding specifically to the loop DNA sequence. 
Further mutagenesis which inverted this stem and loop mutation relative to the HIV LTR mRNA start site resulted in even larger decreases in tat-activation. This suggests that multiple determinants, including protein binding, the loop sequence, and RNA or DNA secondary structure, are important in tat-activation and suggests that tat may interact with cellular proteins binding to DNA to increase HIV gene expression. PMID:2721501
Biophysics of protein-DNA interactions and chromosome organization
Marko, John F.
2014-01-01
The function of DNA in cells depends on its interactions with protein molecules, which recognize and act on base sequence patterns along the double helix. These notes aim to introduce basic polymer physics of DNA molecules, biophysics of protein-DNA interactions and their study in single-DNA experiments, and some aspects of large-scale chromosome structure. Mechanisms for control of chromosome topology will also be discussed. PMID:25419039
Computer-Aided Design Of Turbine Blades And Vanes
NASA Technical Reports Server (NTRS)
Hsu, Wayne Q.
1988-01-01
Quasi-three-dimensional method for determining aerothermodynamic configuration of turbine uses computer-interactive analysis and design and computer-interactive graphics. Design procedure executed rapidly so designer easily repeats it to arrive at best performance, size, structural integrity, and engine life. Sequence of events in aerothermodynamic analysis and design starts with engine-balance equations and ends with boundary-layer analysis and viscous-flow calculations. Analysis-and-design procedure interactive and iterative throughout.
Holdsworth, Gill; Slocombe, Patrick; Doyle, Carl; Sweeney, Bernadette; Veverka, Vaclav; Le Riche, Kelly; Franklin, Richard J.; Compson, Joanne; Brookings, Daniel; Turner, James; Kennedy, Jeffery; Garlish, Rachael; Shi, Jiye; Newnham, Laura; McMillan, David; Muzylak, Mariusz; Carr, Mark D.; Henry, Alistair J.; Ceska, Thomas; Robinson, Martyn K.
2012-01-01
LRP5 and LRP6 are proteins predicted to contain four six-bladed β-propeller domains and both bind the bone-specific Wnt signaling antagonist sclerostin. Here, we report the crystal structure of the amino-terminal region of LRP6 and using NMR show that the ability of sclerostin to bind to this molecule is mediated by the central core of sclerostin and does not involve the amino- and carboxyl-terminal flexible arm regions. We show that this structured core region interacts with LRP5 and LRP6 via an NXI motif (found in the sequence PNAIG) within a flexible loop region (loop 2) within the central core region. This sequence is related closely to a previously identified motif in laminin that mediates its interaction with the β-propeller domain of nidogen. However, the NXI motif is not involved in the interaction of sclerostin with LRP4 (another β-propeller containing protein in the LRP family). A peptide derived from the loop 2 region of sclerostin blocked the interaction of sclerostin with LRP5/6 and also inhibited Wnt1 but not Wnt3A or Wnt9B signaling. This suggests that these Wnts interact with LRP6 in different ways. PMID:22696217
Structural basis of recognition of farnesylated and methylated KRAS4b by PDEδ.
Dharmaiah, Srisathiyanarayanan; Bindu, Lakshman; Tran, Timothy H; Gillette, William K; Frank, Peter H; Ghirlando, Rodolfo; Nissley, Dwight V; Esposito, Dominic; McCormick, Frank; Stephen, Andrew G; Simanshu, Dhirendra K
2016-11-01
Farnesylation and carboxymethylation of KRAS4b (Kirsten rat sarcoma isoform 4b) are essential for its interaction with the plasma membrane where KRAS-mediated signaling events occur. Phosphodiesterase-δ (PDEδ) binds to KRAS4b and plays an important role in targeting it to cellular membranes. We solved structures of human farnesylated-methylated KRAS4b in complex with PDEδ in two different crystal forms. In these structures, the interaction is driven by the C-terminal amino acids together with the farnesylated and methylated C185 of KRAS4b that binds tightly in the central hydrophobic pocket present in PDEδ. In crystal form II, we see the full-length structure of farnesylated-methylated KRAS4b, including the hypervariable region. Crystal form I reveals structural details of farnesylated-methylated KRAS4b binding to PDEδ, and crystal form II suggests the potential binding mode of geranylgeranylated-methylated KRAS4b to PDEδ. We identified a 5-aa-long sequence motif (Lys-Ser-Lys-Thr-Lys) in KRAS4b that may enable PDEδ to bind both forms of prenylated KRAS4b. Structure and sequence analysis of various prenylated proteins that have been previously tested for binding to PDEδ provides a rationale for why some prenylated proteins, such as KRAS4a, RalA, RalB, and Rac1, do not bind to PDEδ. Comparison of all four available structures of PDEδ complexed with various prenylated proteins/peptides shows the presence of additional interactions due to a larger protein-protein interaction interface in KRAS4b-PDEδ complex. This interface might be exploited for designing an inhibitor with minimal off-target effects.
Jeong, Jae-Hee; Kim, Yi-Seul; Rojviriya, Catleya; Cha, Hyung Jin; Ha, Sung-Chul; Kim, Yeon-Gil
2013-10-01
The members of the ARM/HEAT repeat-containing protein superfamily in eukaryotes have been known to mediate protein-protein interactions by using their concave surface. However, little is known about the ARM/HEAT repeat proteins in prokaryotes. Here we report the crystal structure of TON1937, a hypothetical protein from the hyperthermophilic archaeon Thermococcus onnurineus NA1. The structure reveals a crescent-shaped molecule composed of a double layer of α-helices with seven anti-parallel α-helical repeats. A structure-based sequence alignment of the α-helical repeats identified a conserved pattern of hydrophobic or aliphatic residues reminiscent of the consensus sequence of eukaryotic HEAT repeats. The individual repeats of TON1937 also share high structural similarity with the canonical eukaryotic HEAT repeats. In addition, the concave surface of TON1937 is proposed to be its potential binding interface based on this structural comparison and its surface properties. These observations lead us to speculate that the archaeal HEAT-like repeats of TON1937 have evolved to engage in protein-protein interactions in the same manner as eukaryotic HEAT repeats. Copyright © 2013 Elsevier B.V. All rights reserved.
Koentjoro, Maharani Pertiwi; Adachi, Naruhiko; Senda, Miki; Ogawa, Naoto; Senda, Toshiya
2018-03-01
LysR-type transcriptional regulators (LTTRs) are among the most abundant transcriptional regulators in bacteria. CbnR is an LTTR derived from Cupriavidus necator (formerly Alcaligenes eutrophus or Ralstonia eutropha) NH9 and is involved in transcriptional activation of the cbnABCD genes encoding chlorocatechol degradative enzymes. CbnR interacts with a cbnA promoter region of approximately 60 bp in length that contains the recognition-binding site (RBS) and activation-binding site (ABS). Upon inducer binding, CbnR seems to undergo conformational changes, leading to the activation of the transcription. Since the interaction of an LTTR with RBS is considered to be the first step of the transcriptional activation, the CbnR-RBS interaction is responsible for the selectivity of the promoter to be activated. To understand the sequence selectivity of CbnR, we determined the crystal structure of the DNA-binding domain of CbnR in complex with RBS of the cbnA promoter at 2.55 Å resolution. The crystal structure revealed details of the interactions between the DNA-binding domain and the promoter DNA. A comparison with the previously reported crystal structure of the DNA-binding domain of BenM in complex with its cognate RBS showed several differences in the DNA interactions, despite the structural similarity between CbnR and BenM. These differences explain the observed promoter sequence selectivity between CbnR and BenM. Particularly, the difference between Thr33 in CbnR and Ser33 in BenM appears to affect the conformations of neighboring residues, leading to the selective interactions with DNA. Atomic coordinates and structure factors for the DNA-binding domain of Cupriavidus necator NH9 CbnR in complex with RBS are available in the Protein Data Bank under the accession code 5XXP. © 2018 Federation of European Biochemical Societies.
Gold, Nicola D; Jackson, Richard M
2006-02-03
The rapid growth in protein structural data and the emergence of structural genomics projects have increased the need for automatic structure analysis and tools for function prediction. Small molecule recognition is critical to the function of many proteins; therefore, determination of ligand binding site similarity is important for understanding ligand interactions and may allow their functional classification. Here, we present a binding sites database (SitesBase) that given a known protein-ligand binding site allows rapid retrieval of other binding sites with similar structure independent of overall sequence or fold similarity. However, each match is also annotated with sequence similarity and fold information to aid interpretation of structure and functional similarity. Similarity in ligand binding sites can indicate common binding modes and recognition of similar molecules, allowing potential inference of function for an uncharacterised protein or providing additional evidence of common function where sequence or fold similarity is already known. Alternatively, the resource can provide valuable information for detailed studies of molecular recognition including structure-based ligand design and in understanding ligand cross-reactivity. Here, we show examples of atomic similarity between superfamily or more distant fold relatives as well as between seemingly unrelated proteins. Assignment of unclassified proteins to structural superfamilies is also undertaken and in most cases substantiates assignments made using sequence similarity. Correct assignment is also possible where sequence similarity fails to find significant matches, illustrating the potential use of binding site comparisons for newly determined proteins.
Principles of protein folding--a perspective from simple exact models.
Dill, K. A.; Bromberg, S.; Yue, K.; Fiebig, K. M.; Yee, D. P.; Thomas, P. D.; Chan, H. S.
1995-01-01
General principles of protein structure, stability, and folding kinetics have recently been explored in computer simulations of simple exact lattice models. These models represent protein chains at a rudimentary level, but they involve few parameters, approximations, or implicit biases, and they allow complete explorations of conformational and sequence spaces. Such simulations have resulted in testable predictions that are sometimes unanticipated: The folding code is mainly binary and delocalized throughout the amino acid sequence. The secondary and tertiary structures of a protein are specified mainly by the sequence of polar and nonpolar monomers. More specific interactions may refine the structure, rather than dominate the folding code. Simple exact models can account for the properties that characterize protein folding: two-state cooperativity, secondary and tertiary structures, and multistage folding kinetics--fast hydrophobic collapse followed by slower annealing. These studies suggest the possibility of creating "foldable" chain molecules other than proteins. The encoding of a unique compact chain conformation may not require amino acids; it may require only the ability to synthesize specific monomer sequences in which at least one monomer type is solvent-averse. PMID:7613459
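One way to make the idea of a "simple exact lattice model" concrete is the two-letter HP (hydrophobic/polar) chain on a 2D square lattice, where each non-bonded H-H contact contributes an energy of -1 and every self-avoiding conformation of a short chain can be enumerated exhaustively. The sketch below is a minimal illustration under those assumptions; the function names and the example sequence are hypothetical and are not taken from the paper.

```python
def hp_energy(sequence, coords):
    """Energy of an HP chain on a 2D square lattice.

    Each pair of H monomers that are lattice neighbours but not
    adjacent along the chain contributes -1.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive monomers must be lattice neighbours")

    index_at = {site: i for i, site in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Looking only right and up counts every neighbouring pair once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbour)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy


def enumerate_conformations(n):
    """Yield every self-avoiding walk of n monomers starting at the origin."""
    def extend(path):
        if len(path) == n:
            yield list(path)
            return
        x, y = path[-1]
        for step in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if step not in path:
                path.append(step)
                yield from extend(path)
                path.pop()

    yield from extend([(0, 0)])


sequence = "HPPH"  # hypothetical four-monomer chain
conformations = list(enumerate_conformations(len(sequence)))
lowest_energy = min(hp_energy(sequence, c) for c in conformations)
print(len(conformations), lowest_energy)
```

Exhaustive enumeration is only feasible for short chains, since the number of self-avoiding walks grows exponentially with chain length, but for such chains it allows the complete exploration of conformational space that the abstract refers to.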
2010-01-01
Background Protein-protein interaction (PPI) plays essential roles in cellular functions. The cost, time and other limitations associated with the current experimental methods have motivated the development of computational methods for predicting PPIs. As protein interactions generally occur via domains instead of the whole molecules, predicting domain-domain interaction (DDI) is an important step toward PPI prediction. Computational methods developed so far have utilized information from various sources at different levels, from primary sequences, to molecular structures, to evolutionary profiles. Results In this paper, we propose a computational method to predict DDI using support vector machines (SVMs), based on domains represented as interaction profile hidden Markov models (ipHMM) where interacting residues in domains are explicitly modeled according to the three dimensional structural information available at the Protein Data Bank (PDB). Features about the domains are extracted first as the Fisher scores derived from the ipHMM and then selected using singular value decomposition (SVD). Domain pairs are represented by concatenating their selected feature vectors, and classified by a support vector machine trained on these feature vectors. The method is tested by leave-one-out cross validation experiments with a set of interacting protein pairs adopted from the 3DID database. The prediction accuracy has shown significant improvement as compared to InterPreTS (Interaction Prediction through Tertiary Structure), an existing method for PPI prediction that also uses the sequences and complexes of known 3D structure. Conclusions We show that domain-domain interaction prediction can be significantly enhanced by exploiting information inherent in the domain profiles via feature selection based on Fisher scores, singular value decomposition and supervised learning based on support vector machines. 
Datasets and source code are freely available on the web at http://liao.cis.udel.edu/pub/svdsvm. Implemented in Matlab and supported on Linux and MS Windows. PMID:21034480
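The pair representation and the leave-one-out evaluation described above can be sketched in a few lines. This is only an illustration of the data flow: the per-domain feature vectors are made-up numbers standing in for SVD-selected Fisher scores, and a nearest-centroid classifier is swapped in for the support vector machine so that the example needs no external libraries. All names below are hypothetical.

```python
def concat_pair(features_a, features_b):
    """Represent a domain pair by concatenating the two feature vectors."""
    return list(features_a) + list(features_b)


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def leave_one_out_accuracy(samples, labels):
    """Hold out each sample once, train on the rest, and score the prediction.

    The classifier is a nearest-centroid stand-in, NOT an SVM.
    """
    correct = 0
    for i, held_out in enumerate(samples):
        train = [(s, l) for j, (s, l) in enumerate(zip(samples, labels)) if j != i]
        centroids = {
            label: centroid([s for s, l in train if l == label])
            for label in {l for _, l in train}
        }
        predicted = min(
            centroids, key=lambda label: squared_distance(held_out, centroids[label])
        )
        correct += predicted == labels[i]
    return correct / len(samples)


# Made-up per-domain features (placeholders for selected Fisher scores).
domain_features = {
    "A": [1.0, 0.9], "B": [0.9, 1.0],
    "C": [0.1, 0.0], "D": [0.0, 0.2],
}
# (domain, domain, label): 1 = interacting, 0 = non-interacting (toy labels).
pairs = [("A", "B", 1), ("B", "A", 1), ("A", "A", 1),
         ("C", "D", 0), ("D", "C", 0), ("C", "C", 0)]

samples = [concat_pair(domain_features[a], domain_features[b]) for a, b, _ in pairs]
labels = [label for _, _, label in pairs]
accuracy = leave_one_out_accuracy(samples, labels)
print(accuracy)
```

In a setting closer to the abstract, the classifier inside the loop would be replaced by an SVM and the toy vectors by features derived from the ipHMMs; the cross-validation loop itself would stay the same.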
Evolution and function of CAG/polyglutamine repeats in protein–protein interaction networks
Schaefer, Martin H.; Wanker, Erich E.; Andrade-Navarro, Miguel A.
2012-01-01
Expanded runs of consecutive trinucleotide CAG repeats encoding polyglutamine (polyQ) stretches are observed in the genes of a large number of patients with different genetic diseases such as Huntington's and several Ataxias. Protein aggregation, which is a key feature of most of these diseases, is thought to be triggered by these expanded polyQ sequences in disease-related proteins. However, polyQ tracts are a normal feature of many human proteins, suggesting that they have an important cellular function. To clarify the potential function of polyQ repeats in biological systems, we systematically analyzed available information stored in sequence and protein interaction databases. By integrating genomic, phylogenetic, protein interaction network and functional information, we obtained evidence that polyQ tracts in proteins stabilize protein interactions. This happens most likely through structural changes whereby the polyQ sequence extends a neighboring coiled-coil region to facilitate its interaction with a coiled-coil region in another protein. Alteration of this important biological function due to polyQ expansion results in gain of abnormal interactions, leading to pathological effects like protein aggregation. Our analyses suggest that research on polyQ proteins should shift focus from expanded polyQ proteins into the characterization of the influence of the wild-type polyQ on protein interactions. PMID:22287626
Spagnol, Gaelle; Kieken, Fabien; Kopanic, Jennifer L.; Li, Hanjun; Zach, Sydney; Stauch, Kelly L.; Grosely, Rosslyn; Sorgen, Paul L.
2016-01-01
Neuronal precursor cell-expressed developmentally down-regulated 4 (Nedd4) was the first ubiquitin protein ligase identified to interact with connexin43 (Cx43), and its suppressed expression results in accumulation of gap junction plaques at the plasma membrane. Nedd4-mediated ubiquitination of Cx43 is required to recruit Eps15 and target Cx43 to the endocytic pathway. Although the Cx43 residues that undergo ubiquitination are still unknown, in this study we address other unresolved questions pertaining to the molecular mechanisms mediating the direct interaction between Nedd4 (WW1–3 domains) and Cx43 (carboxyl terminus (CT)). All three WW domains display a similar three antiparallel β-strand structure and interact with the same Cx43CT 283PPXY286 sequence. Although Tyr286 is essential for the interaction, MAPK phosphorylation of the preceding serine residues (Ser(P)279 and Ser(P)282) increases the binding affinity by 2-fold for the WW domains (WW2 > WW3 ≫ WW1). The structure of the WW2·Cx43CT276–289(Ser(P)279, Ser(P)282) complex reveals that coordination of Ser(P)282 with the end of β-strand 3 enables Ser(P)279 to interact with the back face of β-strand 3 (Tyr286 is on the front face) and loop 2, forming a horseshoe-shaped arrangement. The close sequence identity of WW2 with WW1 and WW3 residues that interact with the Cx43CT PPXY motif and Ser(P)279/Ser(P)282 strongly suggests that the significantly lower binding affinity of WW1 is the result of a more rigid structure. This study presents the first structure illustrating how phosphorylation of the Cx43CT domain helps mediate the interaction with a molecular partner involved in gap junction regulation. PMID:26841867
Spagnol, Gaelle; Kieken, Fabien; Kopanic, Jennifer L; Li, Hanjun; Zach, Sydney; Stauch, Kelly L; Grosely, Rosslyn; Sorgen, Paul L
2016-04-01
Neuronal precursor cell-expressed developmentally down-regulated 4 (Nedd4) was the first ubiquitin protein ligase identified to interact with connexin43 (Cx43), and its suppressed expression results in accumulation of gap junction plaques at the plasma membrane. Nedd4-mediated ubiquitination of Cx43 is required to recruit Eps15 and target Cx43 to the endocytic pathway. Although the Cx43 residues that undergo ubiquitination are still unknown, in this study we address other unresolved questions pertaining to the molecular mechanisms mediating the direct interaction between Nedd4 (WW1-3 domains) and Cx43 (carboxyl terminus (CT)). All three WW domains display a similar three antiparallel β-strand structure and interact with the same Cx43CT (283)PPXY(286) sequence. Although Tyr(286) is essential for the interaction, MAPK phosphorylation of the preceding serine residues (Ser(P)(279) and Ser(P)(282)) increases the binding affinity by 2-fold for the WW domains (WW2 > WW3 ≫ WW1). The structure of the WW2·Cx43CT(276-289)(Ser(P)(279), Ser(P)(282)) complex reveals that coordination of Ser(P)(282) with the end of β-strand 3 enables Ser(P)(279) to interact with the back face of β-strand 3 (Tyr(286) is on the front face) and loop 2, forming a horseshoe-shaped arrangement. The close sequence identity of WW2 with WW1 and WW3 residues that interact with the Cx43CT PPXY motif and Ser(P)(279)/Ser(P)(282) strongly suggests that the significantly lower binding affinity of WW1 is the result of a more rigid structure. This study presents the first structure illustrating how phosphorylation of the Cx43CT domain helps mediate the interaction with a molecular partner involved in gap junction regulation. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.
Nuclease footprint analyses of the interactions between RNase P ribozyme and a model mRNA substrate.
Trang, P; Hsu, A W; Liu, F
1999-01-01
RNase P ribozyme cleaves an RNA helix substrate which resembles the acceptor stem and T-stem structures of its natural tRNA substrate. By linking the ribozyme covalently to a sequence (guide sequence) complementary to a target RNA, the catalytic RNA can be converted into a sequence-specific ribozyme, M1GS RNA. We have previously shown that M1GS RNA can efficiently cleave the mRNA sequence encoding thymidine kinase (TK) of herpes simplex virus 1. In this study, a footprint procedure using different nucleases was carried out to map the regions of an M1GS ribozyme that potentially interact with the TK mRNA substrate. The ribozyme regions that are protected from nuclease degradation in the presence of the TK mRNA substrate include those that interact with the acceptor stem and T-stem, the 3' terminal CCA sequence and the cleavage site of a tRNA substrate. However, some of the protected regions (e.g. P13 and P14) are unique and not among those protected in the presence of a tRNA substrate. Identification of the regions that interact with an mRNA substrate will allow us to study how M1GS RNA recognizes an mRNA substrate and facilitate the development of mRNA-cleaving ribozymes for gene-targeting applications. PMID:10556315
Template-Based Modeling of Protein-RNA Interactions
Zheng, Jinfang; Kundrotas, Petras J.; Vakser, Ilya A.
2016-01-01
Protein-RNA complexes formed by specific recognition between RNA and RNA-binding proteins play an important role in biological processes. More than a thousand of such proteins in human are curated and many novel RNA-binding proteins are to be discovered. Due to limitations of experimental approaches, computational techniques are needed for characterization of protein-RNA interactions. Although much progress has been made, adequate methodologies reliably providing atomic resolution structural details are still lacking. Although protein-RNA free docking approaches proved to be useful, in general, the template-based approaches provide higher quality of predictions. Templates are key to building a high quality model. Sequence/structure relationships were studied based on a representative set of binary protein-RNA complexes from PDB. Several approaches were tested for pairwise target/template alignment. The analysis revealed a transition point between random and correct binding modes. The results showed that structural alignment is better than sequence alignment in identifying good templates, suitable for generating protein-RNA complexes close to the native structure, and outperforms free docking, successfully predicting complexes where the free docking fails, including cases of significant conformational change upon binding. A template-based protein-RNA interaction modeling protocol PRIME was developed and benchmarked on a representative set of complexes. PMID:27662342
LenVarDB: database of length-variant protein domains.
Mutt, Eshita; Mathew, Oommen K; Sowdhamini, Ramanathan
2014-01-01
Protein domains are functionally and structurally independent modules, which add to the functional variety of proteins. This array of functional diversity has been enabled by evolutionary changes, such as amino acid substitutions or insertions or deletions, occurring in these protein domains. Length variations (indels) can introduce changes at structural, functional and interaction levels. LenVarDB (freely available at http://caps.ncbs.res.in/lenvardb/) traces these length variations, starting from structure-based sequence alignments in our Protein Alignments organized as Structural Superfamilies (PASS2) database, across 731 structural classification of proteins (SCOP)-based protein domain superfamilies connected to 2 730 625 sequence homologues. Alignment of sequence homologues corresponding to a structural domain is available, starting from a structure-based sequence alignment of the superfamily. Orientation of the length-variant (indel) regions in protein domains can be visualized by mapping them on the structure and on the alignment. Knowledge about location of length variations within protein domains and their visual representation will be useful in predicting changes within structurally or functionally relevant sites, which may ultimately regulate protein function. Non-technical summary: Evolutionary changes bring about natural changes to proteins that may be found in many organisms. Such changes could be reflected as amino acid substitutions or insertions-deletions (indels) in protein sequences. LenVarDB is a database that provides an early overview of observed length variations that were set among 731 protein families and after examining >2 million sequences. Indels are followed up to observe if they are close to the active site such that they can affect the activity of proteins. Inclusion of such information can aid the design of bioengineering experiments.
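As a rough illustration of how length-variant (indel) regions can be located in an alignment, the sketch below scans a pairwise alignment column by column and reports each run of gaps. It assumes a simple convention in which `-` marks a gap, a gap in the reference means an insertion in the homologue, and a gap in the homologue means a deletion. The function name and the toy sequences are hypothetical and do not reflect how LenVarDB itself is implemented.

```python
def indel_regions(reference, homologue, gap="-"):
    """Return (kind, start_column, length) for each gap run in a pairwise alignment.

    Columns are 0-based. "insertion" means extra residues in the homologue,
    "deletion" means residues missing from the homologue.
    """
    if len(reference) != len(homologue):
        raise ValueError("aligned sequences must have equal length")

    regions = []
    current = None  # [kind, start_column, length] of the run being extended
    for column, (ref_char, hom_char) in enumerate(zip(reference, homologue)):
        if ref_char == gap and hom_char != gap:
            kind = "insertion"
        elif hom_char == gap and ref_char != gap:
            kind = "deletion"
        else:
            kind = None

        if current and current[0] == kind:
            current[2] += 1
        else:
            if current:
                regions.append(tuple(current))
            current = [kind, column, 1] if kind else None
    if current:
        regions.append(tuple(current))
    return regions


# Toy alignment: a two-residue insertion and a one-residue deletion.
example = indel_regions("MKT--LVA", "MKTGGL-A")
print(example)
```

Column positions found this way could then be mapped onto a structure or a superfamily alignment to see whether a length variation falls near a functionally relevant site.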
Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification.
Sinclair, Robert M; Ravantti, Janne J; Bamford, Dennis H
2017-04-15
Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allow them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. 
Importantly, we detected similarity at the nucleotide level between capsid protein-coding regions from viruses infecting cells belonging to all three domains of life, reproducing a previously established structure-based classification of icosahedral viral capsids. Copyright © 2017 Sinclair et al.
Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification
Sinclair, Robert M.; Ravantti, Janne J.
2017-01-01
ABSTRACT Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allow them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. 
Importantly, we detected similarity at the nucleotide level between capsid protein-coding regions from viruses infecting cells belonging to all three domains of life, reproducing a previously established structure-based classification of icosahedral viral capsids. PMID:28122979
Dykeman, Eric C; Stockley, Peter G; Twarock, Reidun
2013-09-09
The current paradigm for assembly of single-stranded RNA viruses is based on a mechanism involving non-sequence-specific packaging of genomic RNA driven by electrostatic interactions. Recent experiments, however, provide compelling evidence for sequence specificity in this process both in vitro and in vivo. The existence of multiple RNA packaging signals (PSs) within viral genomes has been proposed, which facilitates assembly by binding coat proteins in such a way that they promote the protein-protein contacts needed to build the capsid. The binding energy from these interactions enables the confinement or compaction of the genomic RNAs. Identifying the nature of such PSs is crucial for a full understanding of assembly, which is an as yet untapped potential drug target for this important class of pathogens. Here, for two related bacterial viruses, we determine the sequences and locations of their PSs using Hamiltonian paths, a concept from graph theory, in combination with bioinformatics and structural studies. Their PSs have a common secondary structure motif but distinct consensus sequences and positions within the respective genomes. Despite these differences, the distributions of PSs in both viruses imply defined conformations for the packaged RNA genomes in contact with the protein shell in the capsid, consistent with a recent asymmetric structure determination of the MS2 virion. The PS distributions identified moreover imply a preferred, evolutionarily conserved assembly pathway with respect to the RNA sequence with potentially profound implications for other single-stranded RNA viruses known to have RNA PSs, including many animal and human pathogens. Copyright © 2013 Elsevier Ltd. All rights reserved.
Farjami, Elaheh; Clima, Lilia; Gothelf, Kurt V; Ferapontova, Elena E
2010-06-01
A DNA molecular beacon approach was used for the analysis of interactions between DNA and Methylene Blue (MB) as a redox indicator of a hybridization event. DNA hairpin structures of different length and guanine (G) content were immobilized onto gold electrodes in their folded states through the alkanethiol linker at the 5'-end. Binding of MB to the folded hairpin DNA was electrochemically studied and compared with binding to the duplex structure formed by hybridization of the hairpin DNA to a complementary DNA strand. Variation of the electrochemical signal from the DNA-MB complex was shown to depend primarily on the DNA length and sequence used: the G-C base pairs were the preferential sites of MB binding in the duplex. For short 20 nts long DNA sequences, the increased electrochemical response from MB bound to the duplex structure was consistent with the increased amount of bound and electrochemically readable MB molecules (i.e. MB molecules that are available for the electron transfer (ET) reaction with the electrode). With longer DNA sequences, the balance between the amounts of the electrochemically readable MB molecules bound to the hairpin DNA and to the hybrid was opposite: a part of the MB molecules bound to the long-sequence DNA duplex seem to be electrochemically mute due to long ET distance. The increasing electrochemical response from MB bound to the short-length DNA hybrid contrasts with the decreasing signal from MB bound to the long-length DNA hybrid and allows an "off"-"on" genosensor development.
Sim3C: simulation of Hi-C and Meta3C proximity ligation sequencing technologies.
DeMaere, Matthew Z; Darling, Aaron E
2018-02-01
Chromosome conformation capture (3C) and Hi-C DNA sequencing methods have rapidly advanced our understanding of the spatial organization of genomes and metagenomes. Many variants of these protocols have been developed, each with their own strengths. Currently there is no systematic means for simulating sequence data from this family of sequencing protocols, potentially hindering the advancement of algorithms to exploit this new datatype. We describe a computational simulator that, given simple parameters and reference genome sequences, will simulate Hi-C sequencing on those sequences. The simulator models the basic spatial structure in genomes that is commonly observed in Hi-C and 3C datasets, including the distance-decay relationship in proximity ligation, differences in the frequency of interaction within and across chromosomes, and the structure imposed by cells. A means to model the 3D structure of randomly generated topologically associating domains is provided. The simulator considers several sources of error common to 3C and Hi-C library preparation and sequencing methods, including spurious proximity ligation events and sequencing error. We have introduced the first comprehensive simulator for 3C and Hi-C sequencing protocols. We expect the simulator to have use in testing of Hi-C data analysis algorithms, as well as more general value for experimental design, where questions such as the required depth of sequencing, enzyme choice, and other decisions can be made in advance in order to ensure adequate statistical power with respect to experimental hypothesis testing.
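The distance-decay relationship mentioned in the abstract can be illustrated with a minimal sketch. It assumes an exponential decay of ligation frequency with genomic separation on a single linear replicon; the function name `sample_contact`, the parameter `mean_separation` and the exponential form are illustrative assumptions and are not taken from Sim3C itself.

```python
import random


def sample_contact(genome_length, mean_separation, rng):
    """Draw one simulated proximity-ligation pair (pos_a, pos_b).

    Assumption: separation between ligated loci is exponentially
    distributed, so nearby loci pair far more often than distant ones.
    """
    pos_a = rng.randrange(genome_length)
    separation = int(rng.expovariate(1.0 / mean_separation))
    direction = rng.choice((-1, 1))
    # Clamp to the ends of the (linear, hypothetical) replicon.
    pos_b = min(max(pos_a + direction * separation, 0), genome_length - 1)
    return pos_a, pos_b


rng = random.Random(0)
pairs = [sample_contact(10_000, 500, rng) for _ in range(1000)]
near = sum(abs(a - b) < 500 for a, b in pairs)
far = sum(abs(a - b) >= 2000 for a, b in pairs)
print(near, far)  # near-range pairs should greatly outnumber far-range pairs
```

A realistic simulator would also need inter-chromosomal contacts, spurious ligation events and sequencing error, all of which the abstract lists as modelled features.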
Bamford, Vicki A; Armour, Maria; Mitchell, Sue A; Cartron, Michaël; Andrews, Simon C; Watson, Kimberly A
2008-09-01
YqjH is a cytoplasmic FAD-containing protein from Escherichia coli; based on homology to ViuB of Vibrio cholerae, it potentially acts as a ferri-siderophore reductase. This work describes its overexpression, purification, crystallization and structure solution at 3.0 Å resolution. YqjH shares high sequence similarity with a number of known siderophore-interacting proteins and its structure was solved by molecular replacement using the siderophore-interacting protein from Shewanella putrefaciens as the search model. The YqjH structure resembles those of other members of the NAD(P)H:flavin oxidoreductase superfamily.
Yoga, Yano M. K.; Traore, Daouda A. K.; Sidiqi, Mahjooba; Szeto, Chris; Pendini, Nicole R.; Barker, Andrew; Leedman, Peter J.; Wilce, Jacqueline A.; Wilce, Matthew C. J.
2012-01-01
Poly-C-binding proteins are triple KH (hnRNP K homology) domain proteins with specificity for single stranded C-rich RNA and DNA. They play diverse roles in the regulation of protein expression at both transcriptional and translational levels. Here, we analyse the contributions of individual αCP1 KH domains to binding C-rich oligonucleotides using biophysical and structural methods. Using surface plasmon resonance (SPR), we demonstrate that KH1 makes the most stable interactions with both RNA and DNA, KH3 binds with intermediate affinity and KH2 only interacts detectably with DNA. The crystal structure of KH1 bound to a 5′-CCCTCCCT-3′ DNA sequence shows a 2:1 protein:DNA stoichiometry and demonstrates a molecular arrangement of KH domains bound to immediately adjacent oligonucleotide target sites. SPR experiments with a series of poly-C sequences reveal that cytosine is preferred at all four positions in the oligonucleotide binding cleft and that a C-tetrad binds KH1 with 10 times higher affinity than a C-triplet. The basis for this high affinity interaction is finally detailed with the structure determination of a KH1.W.C54S mutant bound to a 5′-ACCCCA-3′ DNA sequence. Together, these data establish the lead role of KH1 in oligonucleotide binding by αCP1 and reveal the molecular basis of its specificity for a C-rich tetrad. PMID:22344691
Niesteruk, Anna; Jonker, Hendrik R A; Richter, Christian; Linhard, Verena; Sreeramulu, Sridhar; Schwalbe, Harald
2018-06-08
The discovery that MptpA (low-molecular-weight protein tyrosine phosphatase A) from Mycobacterium tuberculosis (Mtb) has an essential role for Mtb virulence has motivated research into tyrosine-specific phosphorylation in Mtb and other pathogenic bacteria. The phosphatase activity of MptpA is regulated via phosphorylation on Tyr-128 and Tyr-129. Thus far, only a single tyrosine-specific kinase, protein tyrosine kinase A (PtkA), encoded by the Rv2232 gene has been identified within the Mtb genome. MptpA undergoes phosphorylation by PtkA. PtkA is an atypical bacterial tyrosine kinase, as its sequence differs from the sequence consensus within this family. The lack of structural information on PtkA hampers the detailed characterization of the MptpA-PtkA interaction. Here, using NMR spectroscopy, we provide a detailed structural characterization of the PtkA architecture and describe its intra- and intermolecular interactions with MptpA. We found that PtkA's domain architecture differs from the conventional kinase architecture and is composed of two domains, the N-terminal highly flexible IDD PtkA and the C-terminal rigid KCD PtkA. The interaction studies between the two domains, together with the structural model of the IDD-KCD complex proposed in this study, reveal that the IDD is unstructured and highly dynamic, allowing for a "fly-casting" like mechanism of transient interactions with the rigid KCD. This interaction modulates the accessibility of the KCD active site. In general, the structural and functional knowledge of PtkA gained in this study is crucial for understanding the MptpA-PtkA interactions, catalytic mechanism and the role of the kinase-phosphatase regulatory system in Mtb virulence. Published under license by The American Society for Biochemistry and Molecular Biology, Inc.
Fantini, Jacques; Garmy, Nicolas; Yahi, Nouara
2006-09-12
Protein-glycolipid interactions mediate the attachment of various pathogens to the host cell surface as well as the association of numerous cellular proteins with lipid rafts. Thus, it is of primary importance to identify the protein domains involved in glycolipid recognition. Using structure similarity searches, we could identify a common glycolipid-binding domain in the three-dimensional structure of several proteins known to interact with lipid rafts. Yet the three-dimensional structure of most raft-targeted proteins is still unknown. In the present study, we have identified a glycolipid-binding domain in the amino acid sequence of a bacterial adhesin (Helicobacter pylori adhesin A, HpaA). The prediction was based on the major properties of the glycolipid-binding domains previously characterized by structural searches. A short (15-mer) synthetic peptide corresponding to this putative glycolipid-binding domain was synthesized, and we studied its interaction with glycolipid monolayers at the air-water interface. The synthetic HpaA peptide recognized LacCer but not Gb3. This glycolipid specificity was in line with that of the whole bacterium. Molecular modeling studies gave some insights into this high selectivity of interaction. It also suggested that Phe147 in HpaA played a key role in LacCer recognition, through sugar-aromatic CH-pi stacking interactions with the hydrophobic side of the galactose ring of LacCer. Correspondingly, the replacement of Phe147 with Ala strongly affected LacCer recognition, whereas substitution with Trp did not. Our method could be used to identify glycolipid-binding domains in microbial and cellular proteins interacting with lipid shells, rafts, and other specialized membrane microdomains.
Mass Spectrometric Determination of ILPR G-quadruplex Binding Sites in Insulin and IGF-2
Xiao, JunFeng
2009-01-01
The insulin-linked polymorphic region (ILPR) of the human insulin gene promoter region forms G-quadruplex structures in vitro. Previous studies show that insulin and insulin-like growth factor-2 (IGF-2) exhibit high affinity binding in vitro to 2-repeat sequences of ILPR variants a and h, but negligible binding to variant i. Two-repeat sequences of variants a and h form intramolecular G-quadruplex structures that are not evidenced for variant i. Here we report on the use of protein digestion combined with affinity capture and MALDI-MS detection to pinpoint ILPR binding sites in insulin and IGF-2. Peptides captured by ILPR variants a and h were sequenced by MALDI-MS/MS, LC-MS and in silico digestion. On-bead digestion of insulin-ILPR variant a complexes supported the conclusions. The results indicate that the sequence VCG(N)RGF is generally present in the captured peptides and is likely involved in the affinity binding interactions of the proteins with the ILPR G-quadruplexes. The significance of arginine in the interactions was studied by comparing the affinities of synthesized peptides VCGERGF and VCGEAGF with ILPR variant a. Peptides from other regions of the proteins that are connected through disulfide linkages were also detected in some capture experiments. Identification of binding sites could facilitate design of DNA binding ligands for capture and detection of insulin and IGF-2. The interactions may have biological significance as well. PMID:19747845
Neural Sequence Generation Using Spatiotemporal Patterns of Inhibition.
Cannon, Jonathan; Kopell, Nancy; Gardner, Timothy; Markowitz, Jeffrey
2015-11-01
Stereotyped sequences of neural activity are thought to underlie reproducible behaviors and cognitive processes ranging from memory recall to arm movement. One of the most prominent theoretical models of neural sequence generation is the synfire chain, in which pulses of synchronized spiking activity propagate robustly along a chain of cells connected by highly redundant feedforward excitation. But recent experimental observations in the avian song production pathway during song generation have shown excitatory activity interacting strongly with the firing patterns of inhibitory neurons, suggesting a process of sequence generation more complex than feedforward excitation. Here we propose a model of sequence generation inspired by these observations in which a pulse travels along a spatially recurrent excitatory chain, passing repeatedly through zones of local feedback inhibition. In this model, synchrony and robust timing are maintained not through redundant excitatory connections, but rather through the interaction between the pulse and the spatiotemporal pattern of inhibition that it creates as it circulates the network. These results suggest that spatially and temporally structured inhibition may play a key role in sequence generation.
Miller, Andrew D
2015-02-01
A sense peptide can be defined as a peptide whose sequence is coded by the nucleotide sequence (read 5' → 3') of the sense (positive) strand of DNA. Conversely, an antisense (complementary) peptide is coded by the corresponding nucleotide sequence (read 5' → 3') of the antisense (negative) strand of DNA. Research has been accumulating steadily to suggest that sense peptides are capable of specific interactions with their corresponding antisense peptides. Unfortunately, although more and more examples of specific sense-antisense peptide interactions are emerging, the very idea of such interactions does not conform to standard biology dogma and so there remains a sizeable challenge to lift this concept from being perceived as a peripheral phenomenon if not worse, into becoming part of the scientific mainstream. Specific interactions have now been exploited for the inhibition of a number of widely different protein-protein and protein-receptor interactions in vitro and in vivo. Further, antisense peptides have also been used to induce the production of antibodies targeted to specific receptors or else the production of anti-idiotypic antibodies targeted against auto-antibodies. Such illustrations of utility would seem to suggest that observed sense-antisense peptide interactions are not just the consequence of a sequence of coincidental 'lucky-hits'. Indeed, at the very least, one might conclude that sense-antisense peptide interactions represent a potentially new and different source of leads for drug discovery. But could there be more to come from studies in this area? Studies on the potential mechanism of sense-antisense peptide interactions suggest that interactions may be driven by amino acid residue interactions specified from the genetic code. 
If so, such specified amino acid residue interactions could form the basis for an even wider amino acid residue interaction code (proteomic code) that links gene sequences to actual protein structure and function, even entire genomes to entire proteomes. The possibility that such a proteomic code should exist is discussed. The potential implications for biology and pharmaceutical science, were such a code to exist, are also discussed.
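The definitions of sense and antisense peptides given above can be made concrete with a short sketch. It assumes the standard genetic code and an in-frame DNA sequence whose length is a multiple of three; the function names `translate` and `antisense_peptide` are illustrative and do not come from the article.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def translate(dna):
    """Translate an in-frame DNA sequence read 5' -> 3' ('*' marks a stop)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def antisense_peptide(sense_dna):
    """Peptide coded by the antisense strand, read 5' -> 3'."""
    antisense = "".join(COMPLEMENT[base] for base in reversed(sense_dna))
    return translate(antisense)


print(translate("ATGGCC"))          # MA
print(antisense_peptide("ATGGCC"))  # GH
```

Because the antisense strand is read in the opposite direction, the first residue of the antisense peptide pairs with the last codon of the sense sequence.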
Structure of the human protein kinase MPSK1 reveals an atypical activation loop architecture.
Eswaran, Jeyanthy; Bernad, Antonio; Ligos, Jose M; Guinea, Barbara; Debreczeni, Judit E; Sobott, Frank; Parker, Sirlester A; Najmanovich, Rafael; Turk, Benjamin E; Knapp, Stefan
2008-01-01
The activation segment of protein kinases is structurally highly conserved and central to regulation of kinase activation. Here we report an atypical activation segment architecture in human MPSK1 comprising a beta sheet and a large alpha-helical insertion. Sequence comparisons suggested that similar activation segments exist in all members of the MPSK1 family and in MAST kinases. The consequence of this nonclassical activation segment on substrate recognition was studied using peptide library screens that revealed a preferred substrate sequence of X-X-P/V/I-phi-H/Y-T*-N/G-X-X-X (phi is an aliphatic residue). In addition, we identified the GTPase DRG1 as an MPSK1 interaction partner and specific substrate. The interaction domain in DRG1 was mapped to the N terminus, leading to recruitment and phosphorylation at Thr100 within the GTPase domain. The presented data reveal an atypical kinase structural motif and suggest a role of MPSK1 regulating DRG1, a GTPase involved in regulation of cellular growth.
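The preferred substrate sequence reported above can be turned into a simple sequence scan. The sketch below is illustrative only: it assumes that "aliphatic" (phi) means A, V, L or I, which the abstract does not specify, and the names `MOTIF` and `find_candidate_sites` are hypothetical.

```python
import re

# X-X-P/V/I-phi-H/Y-T*-N/G-X-X-X, with phi assumed to be one of A, V, L, I.
# A lookahead is used so that overlapping candidate sites are all reported.
MOTIF = re.compile(r"(?=..[PVI][AVLI][HY]T[NG]...)")


def find_candidate_sites(protein):
    """Return 0-based indices of threonines that sit in the motif context."""
    return [match.start() + 5 for match in MOTIF.finditer(protein)]


print(find_candidate_sites("GGAAPLHTNAAAGG"))  # [7]
print(find_candidate_sites("GGAAPLHTDAAAGG"))  # []
```

A match only indicates sequence compatibility with the reported preference; it is not evidence that the site is phosphorylated.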
Basic Tilted Helix Bundle - a new protein fold in human FKBP25/FKBP3 and HectD1.
Helander, Sara; Montecchio, Meri; Lemak, Alexander; Farès, Christophe; Almlöf, Jonas; Yi, Yanjun; Yee, Adelinda; Arrowsmith, Cheryl; DhePaganon, Sirano; Sunnerhagen, Maria
2014-04-25
In this paper, we describe the structure of an N-terminal domain motif in nuclear-localized FKBP25(1-73), a member of the FKBP family, together with the structure of a sequence-related subdomain of the E3 ubiquitin ligase HectD1 that we show belongs to the same fold. This motif adopts a compact 5-helix bundle which we name the Basic Tilted Helix Bundle (BTHB) domain. A positively charged surface patch, structurally centered around the tilted helix H4, is present in both FKBP25 and HectD1 and is conserved in both proteins, suggesting a conserved functional role. We provide detailed comparative analysis of the structures of the two proteins and their sequence similarities, and analysis of the interaction of the proposed FKBP25 binding protein YY1. We suggest that the basic motif in BTHB is involved in the observed DNA binding of FKBP25, and that the function of this domain can be affected by regulatory YY1 binding and/or interactions with adjacent domains. Copyright © 2014 Elsevier Inc. All rights reserved.
Structural basis of toxicity and immunity in contact-dependent growth inhibition (CDI) systems.
Morse, Robert P; Nikolakakis, Kiel C; Willett, Julia L E; Gerrick, Elias; Low, David A; Hayes, Christopher S; Goulding, Celia W
2012-12-26
Contact-dependent growth inhibition (CDI) systems encode polymorphic toxin/immunity proteins that mediate competition between neighboring bacterial cells. We present crystal structures of CDI toxin/immunity complexes from Escherichia coli EC869 and Burkholderia pseudomallei 1026b. Despite sharing little sequence identity, the toxin domains are structurally similar and have homology to endonucleases. The EC869 toxin is a Zn(2+)-dependent DNase capable of completely degrading the genomes of target cells, whereas the Bp1026b toxin cleaves the aminoacyl acceptor stems of tRNA molecules. Each immunity protein binds and inactivates its cognate toxin in a unique manner. The EC869 toxin/immunity complex is stabilized through an unusual β-augmentation interaction. In contrast, the Bp1026b immunity protein exploits shape and charge complementarity to occlude the toxin active site. These structures represent the initial glimpse into the CDI toxin/immunity network, illustrating how sequence-diverse toxins adopt convergent folds yet retain distinct binding interactions with cognate immunity proteins. Moreover, we present visual demonstration of CDI toxin delivery into a target cell.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dolan, Kyle T.; Duguid, Erica M.; He, Chuan
2011-11-17
SlyA is a master virulence regulator that controls the transcription of numerous genes in Salmonella enterica. We present here crystal structures of SlyA by itself and bound to a high-affinity DNA operator sequence in the slyA gene. SlyA interacts with DNA through direct recognition of a guanine base by Arg-65, as well as interactions between conserved Arg-86 and the minor groove and a large network of non-base-specific contacts with the sugar phosphate backbone. Our structures, together with an unpublished structure of SlyA bound to the small molecule effector salicylate (Protein Data Bank code 3DEU), reveal that, unlike many other MarR family proteins, SlyA dissociates from DNA without large conformational changes when bound to this effector. We propose that SlyA and other MarR global regulators rely more on indirect readout of DNA sequence to exert control over many genes, in contrast to proteins (such as OhrR) that recognize a single operator.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Parish, D.; Benach, J; Liu, G
2008-01-01
The structure of the 142-residue protein Q8ZP25_SALTY encoded in the genome of Salmonella typhimurium LT2 was determined independently by NMR and X-ray crystallography, and the structure of the 140-residue protein HYAE_ECOLI encoded in the genome of Escherichia coli was determined by NMR. The two proteins belong to Pfam (Finn et al. 34:D247-D251, 2006) PF07449, which currently comprises 50 members, and belongs itself to the 'thioredoxin-like clan'. However, protein HYAE_ECOLI and the other proteins of Pfam PF07449 do not contain the canonical Cys-X-X-Cys active site sequence motif of thioredoxin. Protein HYAE_ECOLI was previously classified as a (NiFe) hydrogenase-1 specific chaperone interacting with the twin-arginine translocation (Tat) signal peptide. The structures presented here exhibit the expected thioredoxin-like fold and support the view that members of Pfam family PF07449 specifically interact with Tat signal peptides.
A strategy for detecting the conservation of folding-nucleus residues in protein superfamilies.
Michnick, S W; Shakhnovich, E
1998-01-01
Nucleation-growth theory predicts that fast-folding peptide sequences fold to their native structure via structures in a transition-state ensemble that share a small number of native contacts (the folding nucleus). Experimental and theoretical studies of proteins suggest that residues participating in folding nuclei are conserved among homologs. We attempted to determine if this is true in proteins with highly diverged sequences but identical folds (superfamilies). We describe a strategy based on comparisons of residue conservation in natural superfamily sequences with simulated sequences (generated with a Monte-Carlo sequence design strategy) for the same proteins. The basic assumptions of the strategy were that natural sequences will conserve residues needed for folding and stability plus function, the simulated sequences contain no functional conservation, and nucleus residues make native contacts with each other. Based on these assumptions, we identified seven potential nucleus residues in ubiquitin superfamily members. Non-nucleus conserved residues were also identified; these are proposed to be involved in stabilizing native interactions. We found that all superfamily members conserved the same potential nucleus residue positions, except those for which the structural topology is significantly different. Our results suggest that the conservation of the nucleus of a specific fold can be predicted by comparing designed simulated sequences with natural highly diverged sequences that fold to the same structure. We suggest that such a strategy could be used to help plan protein folding and design experiments, to identify new superfamily members, and to subdivide superfamilies further into classes having a similar folding mechanism.
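The comparison of residue conservation between natural and simulated sequences can be sketched as follows. This is a minimal illustration under stated assumptions: conservation is measured here by per-column Shannon entropy, and the threshold `max_entropy` and the function names are hypothetical choices, not the scoring scheme used in the study.

```python
import math
from collections import Counter


def column_entropy(alignment, col):
    """Shannon entropy (bits) of one column of an ungapped alignment."""
    counts = Counter(seq[col] for seq in alignment)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conserved_in_both(natural, designed, max_entropy=0.5):
    """Columns conserved in natural AND designed (simulated) alignments.

    Under the abstract's assumptions, designed sequences carry no functional
    conservation, so positions conserved in both sets are candidates for
    folding or stability roles rather than function.
    """
    return [
        col
        for col in range(len(natural[0]))
        if column_entropy(natural, col) <= max_entropy
        and column_entropy(designed, col) <= max_entropy
    ]


natural = ["MKVL", "MRVI", "MKVL"]
designed = ["MAVL", "MGVL", "MSVL"]
print(conserved_in_both(natural, designed))  # [0, 2]
```

Identifying a folding nucleus would further require that the candidate positions make native contacts with each other, which needs a structure and is not covered here.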
On the Origin of Protein Superfamilies and Superfolds
NASA Astrophysics Data System (ADS)
Magner, Abram; Szpankowski, Wojciech; Kihara, Daisuke
2015-02-01
Distributions of protein families and folds in genomes are highly skewed, having a small number of prevalent superfamilies/superfolds and a large number of families/folds of a small size. Why are the distributions of protein families and folds skewed? Why are there only a limited number of protein families? Here, we employ an information theoretic approach to investigate the protein sequence-structure relationship that leads to the skewed distributions. We consider that protein sequences and folds constitute an information theoretic channel and compute the most efficient distribution of sequences that code all protein folds. The identified distributions of sequences and folds are found to follow a power law, consistent with those observed for proteins in nature. Importantly, the skewed distributions of sequences and folds are suggested to have different origins: the skewed distribution of sequences is due to evolutionary pressure to achieve efficient coding of necessary folds, whereas that of folds is based on the thermodynamic stability of folds. The current study provides a new information theoretic framework for proteins that could be widely applied for understanding protein sequences, structures, functions, and interactions.
Methods For Self-Organizing Software
Bouchard, Ann M.; Osbourn, Gordon C.
2005-10-18
A method for dynamically self-assembling and executing software is provided, containing machines that self-assemble execution sequences and data structures. In addition to ordered function calls (found commonly in other software methods), mutual selective bonding between bonding sites of machines actuates one or more of the bonding machines. Two or more machines can be virtually isolated by a construct, called an encapsulant, containing a population of machines and potentially other encapsulants that can only bond with each other. A hierarchical software structure can be created using nested encapsulants. Multi-threading is implemented by populations of machines in different encapsulants that are interacting concurrently. Machines and encapsulants can move in and out of other encapsulants, thereby changing the functionality. Bonding between machines' sites can be deterministic or stochastic with bonding triggering a sequence of actions that can be implemented by each machine. A self-assembled execution sequence occurs as a sequence of stochastic binding between machines followed by their deterministic actuation. It is the sequence of bonding of machines that determines the execution sequence, so that the sequence of instructions need not be contiguous in memory.
Molecular dynamics studies on the DNA-binding process of ERG.
Beuerle, Matthias G; Dufton, Neil P; Randi, Anna M; Gould, Ian R
2016-11-15
The ETS family of transcription factors regulates gene targets by binding to a core GGAA DNA-sequence. The ETS factor ERG is required for homeostasis and lineage-specific functions in endothelial cells, some subset of haemopoietic cells and chondrocytes; its ectopic expression is linked to oncogenesis in multiple tissues. To date, details of the DNA-binding process of ERG, including DNA-sequence recognition outside the core GGAA-sequence, are largely unknown. We combined available structural and experimental data to perform molecular dynamics simulations to study the DNA-binding process of ERG. In particular, we were able to reproduce the ERG DNA-complex with a DNA-binding simulation starting in an unbound configuration with a final root-mean-square-deviation (RMSD) of 2.1 Å to the core ETS domain DNA-complex crystal structure. This allowed us to elucidate the relevance of amino acids involved in the formation of the ERG DNA-complex and to identify Arg385 as a novel key residue in the DNA-binding process. Moreover, we were able to show that water-mediated hydrogen bonds are present between ERG and DNA in our simulations and that those interactions have the potential to achieve sequence recognition outside the GGAA core DNA-sequence. The methodology employed in this study shows the promising capabilities of modern molecular dynamics simulations in the field of protein DNA-interactions.
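The RMSD figure quoted above compares a simulated complex with a crystal structure. A minimal sketch of the metric follows; it assumes the two coordinate sets are already superimposed and list the same atoms in the same order, and the function name `rmsd` is illustrative. The abstract does not state which atoms or alignment procedure were used.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-aligned coordinate sets.

    Each argument is a sequence of (x, y, z) tuples in the same atom order.
    No superposition is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


reference = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
model = [(3.0, 4.0, 0.0), (4.0, 5.0, 1.0)]
print(rmsd(reference, model))  # 5.0
```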
Genome-Wide Prediction and Validation of Peptides That Bind Human Prosurvival Bcl-2 Proteins
DeBartolo, Joe; Taipale, Mikko; Keating, Amy E.
2014-01-01
Programmed cell death is regulated by interactions between pro-apoptotic and prosurvival members of the Bcl-2 family. Pro-apoptotic family members contain a weakly conserved BH3 motif that can adopt an alpha-helical structure and bind to a groove on prosurvival partners Bcl-xL, Bcl-w, Bcl-2, Mcl-1 and Bfl-1. Peptides corresponding to roughly 13 reported BH3 motifs have been verified to bind in this manner. Due to their short lengths and low sequence conservation, BH3 motifs are not detected using standard sequence-based bioinformatics approaches. Thus, it is possible that many additional proteins harbor BH3-like sequences that can mediate interactions with the Bcl-2 family. In this work, we used structure-based and data-based Bcl-2 interaction models to find new BH3-like peptides in the human proteome. We used peptide SPOT arrays to test candidate peptides for interaction with one or more of the prosurvival proteins Bcl-xL, Bcl-w, Bcl-2, Mcl-1 and Bfl-1. For the 36 most promising array candidates, we quantified binding to all five human receptors using direct and competition binding assays in solution. All 36 peptides showed evidence of interaction with at least one prosurvival protein, and 22 peptides bound at least one prosurvival protein with a dissociation constant between 1 and 500 nM; many peptides had specificity profiles not previously observed. We also screened the full-length parent proteins of a subset of array-tested peptides for binding to Bcl-xL and Mcl-1. Finally, we used the peptide binding data, in conjunction with previously reported interactions, to assess the affinity and specificity prediction performance of different models. PMID:24967846
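To give a sense of what a dissociation constant in the 1 to 500 nM range implies, the sketch below computes equilibrium occupancy for simple 1:1 binding. It assumes the free peptide concentration is approximately equal to the total (peptide in excess over receptor); the function name `fraction_bound` is illustrative and the model is not the fitting procedure used in the study.

```python
def fraction_bound(ligand_nM, kd_nM):
    """Equilibrium fraction of receptor occupied, simple 1:1 binding model."""
    if ligand_nM < 0 or kd_nM <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return ligand_nM / (ligand_nM + kd_nM)


print(fraction_bound(100.0, 100.0))  # 0.5, half-occupied when [L] equals Kd
print(fraction_bound(100.0, 1.0) > fraction_bound(100.0, 500.0))  # True
```

At a fixed peptide concentration, a lower dissociation constant gives higher occupancy, which is why the same peptide can show different specificity profiles across the five prosurvival proteins.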
Predicting helix–helix interactions from residue contacts in membrane proteins
Lo, Allan; Chiu, Yi-Yuan; Rødland, Einar Andreas; Lyu, Ping-Chiang; Sung, Ting-Yi; Hsu, Wen-Lian
2009-01-01
Motivation: Helix–helix interactions play a critical role in the structure assembly, stability and function of membrane proteins. On the molecular level, the interactions are mediated by one or more residue contacts. Although previous studies focused on helix-packing patterns and sequence motifs, few of them developed methods specifically for contact prediction. Results: We present a new hierarchical framework for contact prediction, with an application in membrane proteins. The hierarchical scheme consists of two levels: in the first level, contact residues are predicted from the sequence, and in the second level their pairing relationships are predicted. Statistical analyses on contact propensities are combined with other sequence and structural information for training the support vector machine classifiers. Evaluated on 52 protein chains using leave-one-out cross validation (LOOCV) and an independent test set of 14 protein chains, the two-level approach consistently improves on the conventional direct approach in prediction accuracy, with 80% reduction of input for prediction. Furthermore, the predicted contacts are then used to infer interactions between pairs of helices. When at least three predicted contacts are required for an inferred interaction, the accuracy, sensitivity and specificity are 56%, 40% and 89%, respectively. Our results demonstrate that a hierarchical framework can be applied to eliminate false positives (FP) while reducing computational complexity in predicting contacts. Together with the estimated contact propensities, this method can be used to gain insights into helix-packing in membrane proteins. Availability: http://bio-cluster.iis.sinica.edu.tw/TMhit/ Contact: tsung@iis.sinica.edu.tw Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19244388
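The final inference step described in this abstract (calling two helices interacting once enough predicted residue contacts link them) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the residue-to-helix mapping and the toy data are hypothetical, and the metrics use the standard confusion-matrix definitions, which the abstract does not spell out.

```python
from collections import Counter


def infer_helix_interactions(predicted_contacts, helix_of, min_contacts=3):
    """Return helix pairs supported by at least min_contacts predicted contacts.

    predicted_contacts: iterable of (residue_i, residue_j) pairs.
    helix_of: dict mapping residue number -> helix label (hypothetical input).
    """
    counts = Counter()
    for res_i, res_j in predicted_contacts:
        h_i, h_j = helix_of.get(res_i), helix_of.get(res_j)
        # Skip residues outside any helix and contacts within one helix.
        if h_i is None or h_j is None or h_i == h_j:
            continue
        counts[tuple(sorted((h_i, h_j)))] += 1
    return {pair for pair, n in counts.items() if n >= min_contacts}


def binary_metrics(predicted, actual, universe):
    """Accuracy, sensitivity and specificity over all candidate helix pairs."""
    tp = len(predicted & actual)
    fp = len(predicted - actual)
    fn = len(actual - predicted)
    tn = len(universe) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(universe),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Toy data (hypothetical): three helices and five predicted contacts.
helix_of = {1: "H1", 2: "H1", 3: "H1", 10: "H2", 11: "H2", 12: "H2", 20: "H3", 21: "H3"}
contacts = [(1, 10), (2, 11), (3, 12), (1, 20), (2, 21)]
inferred = infer_helix_interactions(contacts, helix_of)

universe = {("H1", "H2"), ("H1", "H3"), ("H2", "H3")}
actual = {("H1", "H2"), ("H1", "H3")}
metrics = binary_metrics(inferred, actual, universe)
print(inferred, metrics)
```

Raising `min_contacts` trades sensitivity for specificity, which is consistent with the high specificity and lower sensitivity reported for the three-contact threshold.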
Prediction of TF target sites based on atomistic models of protein-DNA complexes
Angarica, Vladimir Espinosa; Pérez, Abel González; Vasconcelos, Ana T; Collado-Vides, Julio; Contreras-Moreira, Bruno
2008-01-01
Background The specific recognition of genomic cis-regulatory elements by transcription factors (TFs) plays an essential role in the regulation of coordinated gene expression. Studying the mechanisms determining binding specificity in protein-DNA interactions is thus an important goal. Most current approaches for modeling TF specific recognition rely on the knowledge of large sets of cognate target sites and consider only the information contained in their primary sequence. Results Here we describe a structure-based methodology for predicting sequence motifs starting from the coordinates of a TF-DNA complex. Our algorithm combines information regarding the direct and indirect readout of DNA into an atomistic statistical model, which is used to estimate the interaction potential. We first measure the ability of our method to correctly estimate the binding specificities of eight prokaryotic and eukaryotic TFs that belong to different structural superfamilies. Secondly, the method is applied to two homology models, finding that sampling of interface side-chain rotamers remarkably improves the results. Thirdly, the algorithm is compared with a reference structural method based on contact counts, obtaining comparable predictions for the experimental complexes and more accurate sequence motifs for the homology models. Conclusion Our results demonstrate that atomic-detail structural information can be feasibly used to predict TF binding sites. The computational method presented here is universal and might be applied to other systems involving protein-DNA recognition. PMID:18922190
Modification-dependent restriction endonuclease, MspJI, flips 5-methylcytosine out of the DNA helix
Horton, J. R.; Wang, H.; Mabuchi, M. Y.; ...
2014-09-27
MspJI belongs to a family of restriction enzymes that cleave DNA containing 5-methylcytosine (5mC) or 5-hydroxymethylcytosine (5hmC). MspJI is specific for the sequence 5(h)mC-N-N-G or A and cleaves with some variability 9/13 nucleotides downstream. Earlier, we reported the crystal structure of MspJI without DNA and proposed how it might recognize this sequence and catalyze cleavage. Here we report its co-crystal structure with a 27-base pair oligonucleotide containing 5mC. This structure confirms that MspJI acts as a homotetramer and that the modified cytosine is flipped from the DNA helix into an SRA-like-binding pocket. We expected the structure to reveal two DNA molecules bound specifically to the tetramer and engaged with the enzyme's two DNA-cleavage sites. A coincidence of crystal packing precluded this organization, however. We found that each DNA molecule interacted with two adjacent tetramers, binding one specifically and the other non-specifically. The latter interaction, which prevented cleavage-site engagement, also involved base flipping and might represent the sequence-interrogation phase that precedes specific recognition. MspJI is unusual in that DNA molecules are recognized and cleaved by different subunits. Such interchange of function might explain how other complex multimeric restriction enzymes act.
Base pair probability estimates improve the prediction accuracy of RNA non-canonical base pairs
2017-01-01
Prediction of RNA tertiary structure from sequence is an important problem, but generating accurate structure models for even short sequences remains difficult. Predictions of RNA tertiary structure tend to be least accurate in loop regions, where non-canonical pairs are important for determining the details of structure. Non-canonical pairs can be predicted using a knowledge-based model of structure that scores nucleotide cyclic motifs, or NCMs. In this work, a partition function algorithm is introduced that allows the estimation of base pairing probabilities for both canonical and non-canonical interactions. Pairs that are predicted to be probable are more likely to be found in the true structure than pairs of lower probability. Pair probability estimates can be further improved by predicting the structure conserved across multiple homologous sequences using the TurboFold algorithm. These pairing probabilities, used in concert with prior knowledge of the canonical secondary structure, allow accurate inference of non-canonical pairs, an important step towards accurate prediction of the full tertiary structure. Software to predict non-canonical base pairs and pairing probabilities is now provided as part of the RNAstructure software package. PMID:29107980
NASA Astrophysics Data System (ADS)
Sethaphong, Latsavongsakda
This work examines smart material properties of rational self-assembly and molecular recognition found in nano-biosystems. Exploiting the sequence and structural information encoded within nucleic acids and proteins will permit programmed synthesis of nanomaterials and help create molecular machines that may carry out new roles involving chemical catalysis and bioenergy. Responsive to different ionic environments through self-reorganization, nucleic acids (NAs) are nature's signature smart material; organisms such as viruses and bacteria use features of NAs to react to their environment and orchestrate their lifecycle. Furthermore, nucleic acid systems (both RNA and DNA) are currently exploited as scaffolds; recent applications have been showcased to build bioelectronics and biotemplated nanostructures via directed assembly of multidimensional nanoelectronic devices 1. Since the most stable and rudimentary structure of nucleic acids is the helical duplex, these were modeled in order to examine the influence of the microenvironment, sequence, and cation-dependent perturbations of their canonical forms. Due to their negatively charged phosphate backbone, NAs rely on counterions to overcome the inherent repulsive forces that arise from the assembly of two complementary strands. As a realistic model system, we chose the HIV-TAR helix (PDB ID: 397D) to study the effect of specific sequence motifs on cation sequestration. At physiologically relevant concentrations of sodium and potassium ions, we observed sequence-based effects where purine stretches were adept in retaining high-residency cations. The transitional space between adenine and guanosine nucleotides (ApG step) in a sequence proved the most favorable. This work was the first to directly show these subtle interactions of sequence-based cationic sequestration and may be useful for controlling metallization of nucleic acids in conductive nanowires. 
Extending the study further, we explored the degree to which the structure of NA duplexes alone interacted with cations distinct from a specific sequence. Under physiologically relevant conditions, a duplex of RNA polyguanine-polycytidine was highly responsive and able to sequester cations to the middle of the purine stretches. The least responsive structure was a DNA polyadenine-polythymine duplex. A random sequence DNA duplex contorted into an RNA-like helix resulted in cationic dynamics similar to RNA systems. These studies showed that cation diffusive binding events in nucleic acid duplex structures are sequence specific and heavily influenced by structural aspects of the helical forms, which account for much of the differences observed. Although structural information in nucleic acids is encoded within their sequence, linking amino acid sequence to protein structure is murkier; the structural information within proteins is encoded by the folding process itself: a complex phenomenon driven toward the equilibrium state of the active conformation. Upwards of two thirds of a protein's sequence can be substituted with similar amino acids without significantly perturbing its function; conserved residues of about 10% seem to be vital; since evolutionary selection pressure in proteins operates three-dimensionally, a linear sequence is only partially informative. We explored this problem by folding de novo the cytosolic portion of the membrane protein, cellulose synthase, CESA1 from upland cotton, Gossypium hirsutum (Ghcesa1). The cytoplasmic region was generated by homology modeling and refined with molecular dynamics. These mutations impair local structural flexibility which likely results in cellulose that is produced at a lower rate and is less crystalline. Additional modeling of fragments of cellulose synthases from the model plant, Arabidopsis thaliana, offered novel insights into the function of conserved cytosolic domains within plant cellulose synthases. 
Transport mechanisms related to the transmembrane region revealed significant differences between plants and a bacterial complex. These studies generated possible mutations that may allow for the creation of new synthases and identified other avenues of research in order to develop technologies that may alter the crystallinity and other useful properties of cellulose. 1. Karplus, K., SAM-T08, HMM-based protein structure prediction. Nucleic Acids Research, 2009. 37: p. W492-W497.
Simulating protein folding initiation sites using an alpha-carbon-only knowledge-based force field
Buck, Patrick M.; Bystroff, Christopher
2015-01-01
Protein folding is a hierarchical process where structure forms locally first, then globally. Some short sequence segments initiate folding through strong structural preferences that are independent of their three-dimensional context in proteins. We have constructed a knowledge-based force field in which the energy functions are conditional on local sequence patterns, as expressed in the hidden Markov model for local structure (HMMSTR). Carbon-alpha force field (CALF) builds sequence specific statistical potentials based on database frequencies for α-carbon virtual bond opening and dihedral angles, pairwise contacts and hydrogen bond donor-acceptor pairs, and simulates folding via Brownian dynamics. We introduce hydrogen bond donor and acceptor potentials as α-carbon probability fields that are conditional on the predicted local sequence. Constant temperature simulations were carried out using 27 peptides selected as putative folding initiation sites, each 12 residues in length, representing several different local structure motifs. Each 0.6 μs trajectory was clustered based on structure. Simulation convergence or representativeness was assessed by subdividing trajectories and comparing clusters. For 21 of the 27 sequences, the largest cluster made up more than half of the total trajectory. Of these 21 sequences, 14 had cluster centers that were at most 2.6 Å root mean square deviation (RMSD) from their native structure in the corresponding full-length protein. To assess the adequacy of the energy function on nonlocal interactions, 11 full length native structures were relaxed using Brownian dynamics simulations. Equilibrated structures deviated from their native states but retained their overall topology and compactness. A simple potential that folds proteins locally and stabilizes proteins globally may enable a more realistic understanding of hierarchical folding pathways. PMID:19137613
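The 2.6 Å criterion above is a root mean square deviation over matched atom positions. As a reminder of what that number measures, here is a minimal sketch of the formula RMSD = sqrt((1/N) Σ |a_i − b_i|²). It assumes the two coordinate sets are already optimally superimposed and list the same α-carbons in the same order; the function name and the toy coordinates are illustrative and are not taken from the study.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) points.

    Assumes the structures are already superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy example: every atom displaced by 2 Å along z.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
native = [(0.0, 0.0, 2.0), (3.8, 0.0, 2.0)]
print(rmsd(model, native))
```

In practice a superposition step (for example the Kabsch algorithm) precedes this calculation, so the value reported here is an upper bound on the fitted RMSD.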
Folding a Protein with Equal Probability of Being Helix or Hairpin
Lin, Chun-Yu; Chen, Nan-Yow; Mou, Chung Yu
2012-01-01
We explore the possibility that the native structure of a protein is inherently multiconformational in an ab initio coarse-grained model. Based on the Wang-Landau algorithm, the complete free energy landscape for the designed sequence 2DX4: INYWLAHAKAGYIVHWTA is constructed. It is shown that 2DX4 possesses two nearly degenerate native structures: one is a helix and the other a hairpin-like structure, and their free energy difference is <2% of that of local minima. The two degenerate native structures are stabilized by an energy barrier of ∼10 kcal/mol. Furthermore, the hydrogen-bond and dipole-dipole interactions are found to be two major competing interactions in transforming one conformation into the other. Our results indicate that two degenerate native structures are stabilized by a subtle balance between different interactions in proteins. In particular, for small proteins, balance between the hydrogen-bond and dipole-dipole interactions occurs for proteins of ∼18 amino acids and is shown to be the main driving mechanism for the occurrence of degeneracy. These results provide important clues to the study of native structures of proteins. PMID:22828336
Bhattacharjee, Snehasish; Chakraborty, Sandipan; Sengupta, Pradeep K; Bhowmik, Sudipta
2016-09-01
Guanine-rich sequences have the propensity to fold into a four-stranded DNA structure known as a G-quadruplex (G4). G4-forming sequences are abundant in the promoter regions of several oncogenes and have become key targets for anticancer drug binding. Here we have studied the interactions of two structurally similar dietary plant flavonoids, fisetin and naringenin, with G4 as well as double-stranded (duplex) DNA by using different spectroscopic and modeling techniques. Our study demonstrates the differential binding ability of the two flavonoids with G4 and duplex DNA. Fisetin interacts more strongly with the parallel G4 structure than with duplex DNA, whereas naringenin shows stronger binding affinity to duplex than to G4 DNA. Molecular docking results corroborate our spectroscopic results, and it was found that both ligands are stacked externally in the G4 DNA structure. C-ring planarity of the flavonoid structure appears to be a crucial factor for preferential G4 DNA recognition by flavonoids. The goal of this study is to explore how small structural differences between closely related small molecules (flavonoids) lead to contrasting binding properties with the two different forms of DNA. The resulting insights may be expected to facilitate the design of highly selective G4 DNA binders based on flavonoid scaffolds.
Conservation of tubulin-binding sequences in TRPV1 throughout evolution.
Sardar, Puspendu; Kumar, Abhishek; Bhandari, Anita; Goswami, Chandan
2012-01-01
Transient Receptor Potential Vanilloid subtype 1 (TRPV1), commonly known as the capsaicin receptor, can detect multiple stimuli, including noxious compounds, low pH, temperature, and electromagnetic waves at different ranges. In addition, this receptor is involved in multiple physiological and sensory processes. Therefore, functions of TRPV1 directly influence adaptation and further evolution. The availability of various eukaryotic genomic sequences in the public domain facilitates study of the molecular evolution of the TRPV1 protein and the conservation of functionally important domains, motifs and interacting regions. Using statistical and bioinformatics tools, our analysis reveals that TRPV1 evolved ∼420 million years ago (MYA). Our analysis reveals that specific regions, domains and motifs of TRPV1 have undergone different selection pressures and thus have different levels of conservation. We found that, among all of these, the TRP box is the most conserved and thus has functional significance. Our results also indicate that the tubulin binding sequences (TBS) have evolutionary significance, as these sequence stretches are more conserved than many other essential regions of TRPV1. The overall distribution of positively charged residues within the TBS motifs is conserved throughout evolution. In silico analysis reveals that TBS-1 and TBS-2 of TRPV1 can form helical structures and may play an important role in TRPV1 function. Our analysis identifies the regions of TRPV1 that are important for the structure-function relationship. This analysis indicates that tubulin binding sequence-1 (TBS-1) near the TRP box forms a potential helix and that the tubulin interactions with TRPV1 via TBS-1 have evolutionary significance. This interaction may be required for proper channel function and regulation and may also have significance in the context of Taxol®-induced neuropathy.
Lin, H; Rao, V B; Black, L W
1999-06-04
Bacteriophage DNA packaging results from an ATP-driven translocation of concatemeric DNA into the prohead by the phage terminase complexed with the portal vertex dodecamer of the prohead. Functional domains of the bacteriophage T4 terminase and portal gene 20 product (gp20) were determined by mutant analysis and sequence localization within the structural genes. Interaction regions of the portal vertex and large terminase subunit (gp17) were determined by genetic (terminase-portal intergenic suppressor mutations), biochemical (column retention of gp17 and inhibition of in vitro DNA packaging by gp20 peptides), and immunological (co-immunoprecipitation of polymerized gp20 peptide and gp17) studies. The specificity of the interaction was tested by means of a phage T4 HOC (highly antigenic outer capsid protein) display system in which wild-type, cs20, and scrambled portal peptide sequences were displayed on the HOC protein of phage T4. Binding affinities of these recombinant phages, as determined by their retention on a His-tag immobilized gp17 column and by co-immunoprecipitation with purified terminase, supported the specific nature of the portal protein and terminase interaction sites. In further support of specificity, a gp20 peptide corresponding to a portion of the identified site inhibited packaging whereas the scrambled sequence peptide did not block DNA packaging in vitro. The portal interaction site is localized to 28 residues in the central portion of the linear sequence of gp20 (524 residues). As judged by two pairs of intergenic portal-terminase suppressor mutations, two separate regions of the terminase large subunit gp17 (central and COOH-terminal) interact through hydrophobic contacts at the portal site. 
Although the terminase apparently interacts with this gp20 portal peptide, polyclonal antibody against the portal peptide appears unable to access it in the native structure, suggesting intimate association of gp20 and gp17 possibly internalizes terminase regions within the portal in the packasome complex. Both similarities and differences are seen in comparison to analogous sites which have been identified in phages T3 and lambda. Copyright 1999 Academic Press.
Suplatov, Dmitry; Sharapova, Yana; Timonina, Daria; Kopylov, Kirill; Švedas, Vytas
2018-04-01
The visualCMAT web-server was designed to assist experimental research in the fields of protein/enzyme biochemistry, protein engineering, and drug discovery by providing an intuitive and easy-to-use interface to the analysis of correlated mutations/co-evolving residues. Sequence and structural information describing homologous proteins is used to predict correlated substitutions by the mutual information-based CMAT approach, to classify them into spatially close co-evolving pairs, which either form a direct physical contact or interact with the same ligand (e.g. a substrate or a crystallographic water molecule), and long-range correlations, and to annotate and rank binding sites on the protein surface by the presence of statistically significant co-evolving positions. The results of visualCMAT are organized for convenient visual analysis and can be downloaded to a local computer as a content-rich all-in-one PyMol session file with multiple layers of annotation corresponding to bioinformatic, statistical and structural analyses of the predicted co-evolution, or further studied online using the built-in interactive analysis tools. The online interactivity is implemented in HTML5 and therefore neither plugins nor Java are required. The visualCMAT web-server is integrated with the Mustguseal web-server, which is capable of constructing large structure-guided sequence alignments of protein families and superfamilies using all available information about their structures and sequences in public databases. The visualCMAT web-server can be used to understand the relationship between structure and function in proteins, to select hotspots and compensatory mutations for rational design and directed evolution experiments aimed at producing novel enzymes with improved properties, and to study the mechanism of selective ligand binding and allosteric communication between topologically independent sites in protein structures. 
The web-server is freely available at https://biokinet.belozersky.msu.ru/visualcmat and there are no login requirements.
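The mutual information that underlies correlated-mutation analysis can be computed directly from two columns of a multiple sequence alignment. The sketch below shows only the plain pairwise score in bits; it does not reproduce the CMAT procedure, its significance testing or any structural classification, and the function name and toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_mutual_information(alignment, i, j):
    """Mutual information (bits) between columns i and j of an alignment.

    alignment: list of equal-length aligned sequences (strings).
    """
    n = len(alignment)
    pair_counts = Counter((seq[i], seq[j]) for seq in alignment)
    counts_i = Counter(seq[i] for seq in alignment)
    counts_j = Counter(seq[j] for seq in alignment)
    mi = 0.0
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / n
        # p_ab / (p_a * p_b) rewritten with raw counts: n_ab * n / (n_a * n_b)
        mi += p_ab * math.log2(n_ab * n / (counts_i[a] * counts_j[b]))
    return mi


# Toy alignment: columns 0 and 1 change together, column 2 is invariant.
toy_alignment = ["AKL", "AKL", "DEL", "DEL"]
print(column_mutual_information(toy_alignment, 0, 1))
print(column_mutual_information(toy_alignment, 0, 2))
```

A fully conserved column carries no mutual information with any other column, which is one reason such scores are usually combined with conservation and structural filters before being interpreted.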
A Water‐Soluble Tetraazaperopyrene Dye as Strong G‐Quadruplex DNA Binder
Hahn, Lena
2016-01-01
Abstract The interactions of the water‐soluble tetraazaperopyrene dye 1 with ct‐DNA, duplex‐[(dAdT)12⋅(dAdT)12], duplex‐[(dGdC)12⋅(dGdC)12] as well as with two G‐quadruplex‐forming sequences, namely the human telomeric 22AG and the promoter sequence c‐myc, were investigated by means of UV/visible and fluorescence spectroscopy, isothermal titration calorimetry (ITC) and molecular docking studies. Dye 1 exhibits a high affinity for G‐quadruplex structures over duplex DNA structures. Furthermore, the ligand shows promising G‐quadruplex discrimination, with an affinity towards c‐myc of 2×10⁷ M⁻¹ (i.e., Kd = 50 nM), which is higher than for 22AG (4×10⁶ M⁻¹). The ITC data reveal that compound 1 interacts with c‐myc in a stoichiometric ratio of 1:1 but also indicate the presence of two identical lower affinity secondary binding sites per quadruplex. In 22AG, there are two high affinity binding sites per quadruplex, that is, one on each side, with a further four weaker binding sites. For both quadruplex structures, the high affinity interactions between compound 1 and the quadruplex‐forming nucleic acid structures are weakly endothermic. Molecular docking studies suggest an end‐stacking binding mode for compound 1 interacting with quadruplex structures, and a higher affinity for the parallel conformation of c‐myc than for the mixed‐hybrid conformation of 22AG. In addition, docking studies also suggest that the reduced affinity for duplex DNA structures is due to the non‐viability of an intercalative binding mode. PMID:26997208
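The two binding affinities quoted above are association constants in units of inverse molar, and the dissociation constant is simply their reciprocal. A short worked conversion, with a hypothetical helper name, makes the relationship between the two reported values explicit:

```python
def dissociation_constant_nM(association_constant_per_M):
    """Convert an association constant (M^-1) into a dissociation constant in nM."""
    return 1.0 / association_constant_per_M * 1e9


# Association constants as quoted in the abstract.
kd_cmyc = dissociation_constant_nM(2e7)  # c-myc: about 50 nM
kd_22ag = dissociation_constant_nM(4e6)  # 22AG: about 250 nM

print(f"c-myc Kd ≈ {kd_cmyc:.0f} nM, 22AG Kd ≈ {kd_22ag:.0f} nM")
print(f"selectivity for c-myc ≈ {kd_22ag / kd_cmyc:.1f}-fold")
```

On these numbers the dye binds c-myc about five times more tightly than 22AG.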
Sharma, Alok; Pohlentz, Gottfried; Bobbili, Kishore Babu; Jeyaprakash, A Arockia; Chandran, Thyageshwar; Mormann, Michael; Swamy, Musti J; Vijayan, M
2013-08-01
The sequence and structure of snake gourd seed lectin (SGSL), a nontoxic homologue of type II ribosome-inactivating proteins (RIPs), have been determined by mass spectrometry and X-ray crystallography, respectively. As in type II RIPs, the molecule consists of a lectin chain made up of two β-trefoil domains. The catalytic chain, which is connected through a disulfide bridge to the lectin chain in type II RIPs, is cleaved into two in SGSL. However, the integrity of the three-dimensional structure of the catalytic component of the molecule is preserved. This is the first time that a three-chain RIP or RIP homologue has been observed. A thorough examination of the sequence and structure of the protein and of its interactions with the bound methyl-α-galactose indicate that the nontoxicity of SGSL results from a combination of changes in the catalytic and the carbohydrate-binding sites. Detailed analyses of the sequences of type II RIPs of known structure and their homologues with unknown structure provide valuable insights into the evolution of this class of proteins. They also indicate some variability in carbohydrate-binding sites, which appears to contribute to the different levels of toxicity exhibited by lectins from various sources.
Role of different β-turns in β-hairpin conformation and stability studied by optical spectroscopy.
Wu, Ling; McElheny, Dan; Setnicka, Vladimír; Hilario, Jovencio; Keiderling, Timothy A
2012-01-01
Model β-hairpin peptides based on variations in the turn sequence of Cochran's tryptophan zipper peptide, SWTWENGKWTWK, were studied using electronic circular dichroism (ECD), fluorescence, and infrared (IR) spectroscopies. The trpzip2 Asn-Gly turn sequence was substituted with Thr-Gly, Aib-Gly, (D)Pro-Gly, and Gly-Asn (trpzip1) to study the impact of turn stability on β-hairpin formation. Stability and conformational changes of these hairpins were monitored by thermodynamic analyses of the temperature variation of both FTIR (amide I') and ECD spectral intensities. These changes were fit to a two-state model which yielded different T(m) values, representing the folding/unfolding process, for hairpins with different β-turns. Different β-turns show systematic contributions to hairpin structure formation, and their inclusion in hairpin design can modify the folding pathways. Aib-Gly or (D)Pro-Gly sequences stabilize the turn resulting in residual Trp-Trp interaction at high temperatures, but at the same time the β-structure (cross strand H-bonds) can become less stable due to constraints of the turn, as seen for (D)Pro-Gly. The structure of the Aib-Gly turn containing hairpin was determined by NMR and was shown to be like trpzip2 (Asn-Gly turn) as regards turn and strand geometries, but to differ from trpzip1 (Gly-Asn turn). The Munoz and Eaton statistical mechanically derived multistate model, tested as an alternate point of view, represented contributions from H-bonds and hydrophobic interactions as well as conformational change as interdependent. Use of different spectral methods that vary in dependence on these physical interactions along with the structural variations provided insight to the complex folding pathways of these small, well-folded peptides. Copyright © 2011 Wiley Periodicals, Inc.
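The two-state analysis mentioned above treats each hairpin as an equilibrium between a folded and an unfolded state, with the melting temperature defined as the point where both are equally populated. The sketch below shows one common form of that model; it assumes a temperature-independent unfolding enthalpy (no heat-capacity term), and the melting temperature and enthalpy used are hypothetical values rather than fitted parameters from the study.

```python
import math

GAS_CONSTANT = 8.314462618e-3  # kJ/(mol*K)


def fraction_unfolded(temp_K, tm_K, delta_h_kJ):
    """Unfolded fraction in a two-state model with constant unfolding enthalpy.

    delta_G(T) = delta_H * (1 - T / Tm), K = exp(-delta_G / RT), f_U = K / (1 + K).
    """
    delta_g = delta_h_kJ * (1.0 - temp_K / tm_K)
    k_unfold = math.exp(-delta_g / (GAS_CONSTANT * temp_K))
    return k_unfold / (1.0 + k_unfold)


# Hypothetical parameters for illustration only.
TM_K = 345.0
DELTA_H_KJ = 60.0
for temp in (278.0, 318.0, 345.0, 368.0):
    print(temp, round(fraction_unfolded(temp, TM_K, DELTA_H_KJ), 3))
```

Fitting this curve to the temperature dependence of a spectral intensity yields the melting temperature and enthalpy; a small enthalpy gives the broad transitions typical of short peptides.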
Modular protein domains: an engineering approach toward functional biomaterials.
Lin, Charng-Yu; Liu, Julie C
2016-08-01
Protein domains and peptide sequences are a powerful tool for conferring specific functions to engineered biomaterials. Protein sequences with a wide variety of functionalities, including structure, bioactivity, protein-protein interactions, and stimuli responsiveness, have been identified, and advances in molecular biology continue to pinpoint new sequences. Protein domains can be combined to make recombinant proteins with multiple functionalities. The high fidelity of the protein translation machinery results in exquisite control over the sequence of recombinant proteins and the resulting properties of protein-based materials. In this review, we discuss protein domains and peptide sequences in the context of functional protein-based materials, composite materials, and their biological applications. Copyright © 2016 Elsevier Ltd. All rights reserved.
He, Xiaoyuan; Wang, Liqin; Wang, Shuishu
2016-04-15
The transcriptional regulator PhoP is an essential virulence factor in Mycobacterium tuberculosis, and it presents a target for the development of new anti-tuberculosis drugs and attenuated tuberculosis vaccine strains. PhoP binds to DNA as a highly cooperative dimer by recognizing direct repeats of 7-bp motifs with a 4-bp spacer. To elucidate the PhoP-DNA binding mechanism, we determined the crystal structure of the PhoP-DNA complex. The structure revealed a tandem PhoP dimer that bound to the direct repeat. The surprising tandem arrangement of the receiver domains allowed the four domains of the PhoP dimer to form a compact structure, accounting for the strict requirement of a 4-bp spacer and the highly cooperative binding of the dimer. The PhoP-DNA interactions exclusively involved the effector domain. The sequence-recognition helix made contact with the bases of the 7-bp motif in the major groove, and the wing interacted with the adjacent minor groove. The structure provides a starting point for the elucidation of the mechanism by which PhoP regulates the virulence of M. tuberculosis and guides the design of screening platforms for PhoP inhibitors.
RNA Bricks—a database of RNA 3D motifs and their interactions
Chojnowski, Grzegorz; Waleń, Tomasz; Bujnicki, Janusz M.
2014-01-01
The RNA Bricks database (http://iimcb.genesilico.pl/rnabricks) stores information about recurrent RNA 3D motifs and their interactions, found in experimentally determined RNA structures and in RNA–protein complexes. In contrast to other similar tools (RNA 3D Motif Atlas, RNA Frabase, Rloom), RNA motifs, i.e. ‘RNA bricks’, are presented in the molecular environment in which they were determined, including RNA, protein, metal ions, water molecules and ligands. All nucleotide residues in RNA bricks are annotated with structural quality scores that describe real-space correlation coefficients with the electron density data (if available), backbone geometry and possible steric conflicts, which can be used to identify poorly modeled residues. The database is also equipped with an algorithm for 3D motif search and comparison. The algorithm compares spatial positions of backbone atoms of the user-provided query structure and of stored RNA motifs, without relying on sequence or secondary structure information. This enables the identification of local structural similarities among evolutionarily related and unrelated RNA molecules. In addition, the search utility enables searching ‘RNA bricks’ according to sequence similarity, and makes it possible to identify motifs with modified ribonucleotide residues at specific positions. PMID:24220091
Yadav, Saurabh; Kumari, Pragati; Kushwaha, Hemant Ritturaj
2013-01-01
Glutaredoxins are small, ubiquitous, glutathione-dependent enzymatic antioxidants classified under the thioredoxin-fold superfamily. Glutaredoxins are classified into two types: dithiol and monothiol. Monothiol glutaredoxins, which carry the signature "CGFS" redox-active motif, are known for their role in oxidative stress inside the cell. In the present analysis, the 138 amino acid long monothiol glutaredoxin AgGRX1 from Ashbya gossypii was identified and analyzed. The multiple sequence alignment of the AgGRX1 protein sequence revealed the characteristic motif of a typical monothiol glutaredoxin, as observed in various other organisms. The proposed structure of the AgGRX1 protein was used to analyze signature folds related to the thioredoxin superfamily. Further, the study highlighted the structural features pertaining to the complex mechanism of glutathione docking and interacting residues.
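Locating the "CGFS" signature in a protein sequence is a plain substring search. The sketch below reports 1-based positions, as is conventional for residue numbering; the function name is hypothetical and the example sequence is a made-up fragment, not the AgGRX1 sequence.

```python
def find_motif(sequence, motif="CGFS"):
    """Return the 1-based start positions of every occurrence of motif."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to residue number
        start = sequence.find(motif, start + 1)
    return positions


# Made-up fragment for illustration only.
example_fragment = "MKGTPECGFSRA"
print(find_motif(example_fragment))
```

A match on its own only flags a candidate; the alignment and structural analysis described in the abstract are what place the motif in a thioredoxin-fold context.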
Structure and inhibition analysis of the mouse SAD-B C-terminal fragment.
Ma, Hui; Wu, Jing-Xiang; Wang, Jue; Wang, Zhi-Xin; Wu, Jia-Wei
2016-10-01
The SAD (synapses of amphids defective) kinases, including SAD-A and SAD-B, play important roles in the regulation of neuronal development, cell cycle, and energy metabolism. Our recent study of mouse SAD-A identified a unique autoinhibitory sequence (AIS), which binds at the junction of the kinase domain (KD) and the ubiquitin-associated (UBA) domain and exerts autoregulation in cooperation with UBA. Here, we report the crystal structure of the mouse SAD-B C-terminal fragment including the AIS and the kinase-associated domain 1 (KA1) at 2.8 Å resolution. The KA1 domain is structurally conserved, while the isolated AIS sequence is highly flexible and solvent-accessible. Our biochemical studies indicated that the SAD-B AIS exerts the same autoinhibitory role as that in SAD-A. We believe that the flexible isolated AIS sequence is readily available for interaction with KD-UBA and thus inhibits SAD-B activity.
Community detection in sequence similarity networks based on attribute clustering
Chowdhary, Janamejaya; Loeffler, Frank E.; Smith, Jeremy C.
2017-07-24
Networks are powerful tools for the presentation and analysis of interactions in multi-component systems. A commonly studied mesoscopic feature of networks is their community structure, which arises from grouping together similar nodes into one community and dissimilar nodes into separate communities. In this paper, the community structure of protein sequence similarity networks is determined with a new method: Attribute Clustering Dependent Communities (ACDC). Sequence similarity has hitherto typically been quantified by the alignment score or its expectation value. However, pair alignments with the same score or expectation value cannot thus be differentiated. To overcome this deficiency, the method constructs, for pair alignments, an extended alignment metric, the link attribute vector, which includes the score and other alignment characteristics. Rescaling components of the attribute vectors qualitatively identifies a systematic variation of sequence similarity within protein superfamilies. The problem of community detection is then mapped to clustering the link attribute vectors, selection of an optimal subset of links and community structure refinement based on the partition density of the network. ACDC-predicted communities are found to be in good agreement with gold standard sequence databases for which the "ground truth" community structures (or families) are known. ACDC is therefore a community detection method for sequence similarity networks based entirely on pair similarity information. A serial implementation of ACDC is available from https://cmb.ornl.gov/resources/developments
Inference of Functionally-Relevant N-acetyltransferase Residues Based on Statistical Correlations.
Neuwald, Andrew F; Altschul, Stephen F
2016-12-01
Over evolutionary time, members of a superfamily of homologous proteins sharing a common structural core diverge into subgroups filling various functional niches. At the sequence level, such divergence appears as correlations that arise from residue patterns distinct to each subgroup. Such a superfamily may be viewed as a population of sequences corresponding to a complex, high-dimensional probability distribution. Here we model this distribution as hierarchical interrelated hidden Markov models (hiHMMs), which describe these sequence correlations implicitly. By characterizing such correlations one may hope to obtain information regarding functionally-relevant properties that have thus far evaded detection. To do so, we infer a hiHMM distribution from sequence data using Bayes' theorem and Markov chain Monte Carlo (MCMC) sampling, which is widely recognized as the most effective approach for characterizing a complex, high dimensional distribution. Other routines then map correlated residue patterns to available structures with a view to hypothesis generation. When applied to N-acetyltransferases, this reveals sequence and structural features indicative of functionally important, yet generally unknown biochemical properties. Even for sets of proteins for which nothing is known beyond unannotated sequences and structures, this can lead to helpful insights. We describe, for example, a putative coenzyme-A-induced-fit substrate binding mechanism mediated by arginine residue switching between salt bridge and π-π stacking interactions. A suite of programs implementing this approach is available (psed.igs.umaryland.edu).
Community detection in sequence similarity networks based on attribute clustering
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chowdhary, Janamejaya; Loeffler, Frank E.; Smith, Jeremy C.
Networks are powerful tools for the presentation and analysis of interactions in multi-component systems. A commonly studied mesoscopic feature of networks is their community structure, which arises from grouping together similar nodes into one community and dissimilar nodes into separate communities. In this paper, the community structure of protein sequence similarity networks is determined with a new method: Attribute Clustering Dependent Communities (ACDC). Sequence similarity has hitherto typically been quantified by the alignment score or its expectation value. However, pair alignments with the same score or expectation value cannot thus be differentiated. To overcome this deficiency, the method constructs, for pair alignments, an extended alignment metric, the link attribute vector, which includes the score and other alignment characteristics. Rescaling components of the attribute vectors qualitatively identifies a systematic variation of sequence similarity within protein superfamilies. The problem of community detection is then mapped to clustering the link attribute vectors, selection of an optimal subset of links and community structure refinement based on the partition density of the network. ACDC-predicted communities are found to be in good agreement with gold standard sequence databases for which the "ground truth" community structures (or families) are known. ACDC is therefore a community detection method for sequence similarity networks based entirely on pair similarity information. A serial implementation of ACDC is available from https://cmb.ornl.gov/resources/developments
Rudnizky, Sergei; Khamis, Hadeel; Malik, Omri; Squires, Allison H; Meller, Amit; Melamed, Philippa
2018-01-01
Abstract Most functional transcription factor (TF) binding sites deviate from their ‘consensus’ recognition motif, although their sites and flanking sequences are often conserved across species. Here, we used single-molecule DNA unzipping with optical tweezers to study how Egr-1, a TF harboring three zinc fingers (ZF1, ZF2 and ZF3), is modulated by the sequence and context of its functional sites in the Lhb gene promoter. We find that both the core 9 bp bound to Egr-1 in each of the sites, and the base pairs flanking them, modulate the affinity and structure of the protein–DNA complex. The effect of the flanking sequences is asymmetric, with a stronger effect for the sequence flanking ZF3. Characterization of the dissociation time of Egr-1 revealed that a local, mechanical perturbation of the interactions of ZF3 destabilizes the complex more effectively than a perturbation of the ZF1 interactions. Our results reveal a novel role for ZF3 in the interaction of Egr-1 with other proteins and the DNA, providing insight on the regulation of Lhb and other genes by Egr-1. Moreover, our findings reveal the potential of small changes in DNA sequence to alter transcriptional regulation, and may shed light on the organization of regulatory elements at promoters. PMID:29253225
Statistical physics of nucleosome positioning and chromatin structure
NASA Astrophysics Data System (ADS)
Morozov, Alexandre
2012-02-01
Genomic DNA is packaged into chromatin in eukaryotic cells. The fundamental building block of chromatin is the nucleosome, a 147 bp-long DNA molecule wrapped around the surface of a histone octamer. Arrays of nucleosomes are positioned along DNA according to their sequence preferences and folded into higher-order chromatin fibers whose structure is poorly understood. We have developed a framework for predicting sequence-specific histone-DNA interactions and the effective two-body potential responsible for ordering nucleosomes into regular higher-order structures. Our approach is based on the analogy between nucleosomal arrays and a one-dimensional fluid of finite-size particles with nearest-neighbor interactions. We derive simple rules which allow us to predict nucleosome occupancy solely from the dinucleotide content of the underlying DNA sequences. Dinucleotide content determines the degree of stiffness of the DNA polymer and thus defines its ability to bend into the nucleosomal superhelix. As expected, the nucleosome positioning rules are universal for chromatin assembled in vitro on genomic DNA from baker's yeast and from the nematode worm C. elegans, where nucleosome placement follows intrinsic sequence preferences and steric exclusion. However, the positioning rules inferred from in vivo C. elegans chromatin are affected by global nucleosome depletion from chromosome arms relative to central domains, likely caused by the attachment of the chromosome arms to the nuclear membrane. Furthermore, intrinsic nucleosome positioning rules are overwritten in transcribed regions, indicating that chromatin organization is actively managed by the transcriptional and splicing machinery.
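The one-dimensional fluid analogy can be made concrete with a small dynamic-programming sketch. It is not the author's framework: it assumes steric exclusion only (no additional two-body potential), and the energies, the chemical potential `mu`, the temperature `kT` and the function name are hypothetical. Forward and backward partition functions give the probability that a particle starts at each position, and summing over a footprint gives per-base-pair occupancy.

```python
import math


def occupancy(energies, width, mu=0.0, kT=1.0):
    """Per-site occupancy of hard rods of length `width` on a 1D lattice.

    energies[i] is the (hypothetical) binding energy of a rod whose left
    edge sits at site i, so the lattice has len(energies) + width - 1 sites.
    Rods may not overlap; there is no other rod-rod interaction.
    """
    n_starts = len(energies)
    length = n_starts + width - 1
    weight = [math.exp(-(e - mu) / kT) for e in energies]

    # forward[j]: partition function of the first j sites.
    forward = [1.0] * (length + 1)
    for j in range(1, length + 1):
        forward[j] = forward[j - 1]          # site j-1 left empty
        if j >= width:                       # or a rod ends at site j-1
            forward[j] += weight[j - width] * forward[j - width]

    # backward[j]: partition function of sites j .. length-1.
    backward = [1.0] * (length + 1)
    for j in range(length - 1, -1, -1):
        backward[j] = backward[j + 1]        # site j left empty
        if j + width <= length:              # or a rod starts at site j
            backward[j] += weight[j] * backward[j + width]

    z = forward[length]
    start_prob = [forward[i] * weight[i] * backward[i + width] / z
                  for i in range(n_starts)]

    # A site is occupied if any rod covering it is present.
    occ = [0.0] * length
    for i, p in enumerate(start_prob):
        for x in range(i, i + width):
            occ[x] += p
    return occ


# Toy case: 3 sites, rods of length 2, all statistical weights equal to 1.
toy = occupancy([0.0, 0.0], width=2)
```

In the toy case there are three allowed states (empty, rod at site 0, rod at site 1), so the middle site is covered in two of three states and each end site in one of three. For nucleosomes the footprint would be 147 bp and the energies would come from a sequence-dependent model such as the dinucleotide rules the abstract mentions.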
Przybilski, Rita; Hammann, Christian
2007-01-01
Tertiary interacting elements are important features of functional RNA molecules, for example, in all small nucleolytic ribozymes. The recent crystal structure of a tertiary stabilized type I hammerhead ribozyme revealed a conventional Watson–Crick base pair in the catalytic core, formed between nucleotides C3 and G8. We show that any Watson–Crick base pair between these positions retains cleavage competence in two type III ribozymes. In the Arabidopsis thaliana sequence, only moderate differences in cleavage rates are observed for the different base pairs, while the peach latent mosaic viroid (PLMVd) ribozyme exhibits a preference for a pyrimidine at position 3 and a purine at position 8. To understand these differences, we created a series of chimeric ribozymes in which we swapped sequence elements that surround the catalytic core. The kinetic characterization of the resulting ribozymes revealed that the tertiary interacting loop sequences of the PLMVd ribozyme are sufficient to induce the preference for Y3–R8 base pairs in the A. thaliana hammerhead ribozyme. In contrast to this, only when the entire stem–loops I and II of the A. thaliana sequences are grafted on the PLMVd ribozyme is any Watson–Crick base pair similarly tolerated. The data provide evidence for a complex interplay of secondary and tertiary structure elements that lead, mediated by long-range effects, to an individual modulation of the local structure in the catalytic core of different hammerhead ribozymes. PMID:17666711
Samson, Marie-Laure
2008-01-01
Background The Drosophila gene embryonic lethal abnormal visual system (elav) is the prototype of a gene family present in all metazoans. Its members encode structurally conserved neuronal proteins with three RNA Recognition Motifs (RRM) but they paradoxically act at diverse levels of post-transcriptional regulation. In an attempt to understand the history of this family, we searched for orthologs in eleven completely sequenced genomes, including those of humans, D. melanogaster and C. elegans, for which cDNAs are available. Results We analyzed 23 orthologs/paralogs of elav, and found evidence of gain/loss of gene copy number. For one set of genes, including elav itself, the coding sequences are free of introns and their products most resemble ELAV. The remaining genes show remarkable conservation of their exon organization, and their products most resemble FNE and RBP9, proteins encoded by the two elav paralogs of Drosophila. Remarkably, three of the conserved exon junctions are both close to structural elements, involved respectively in protein-RNA interactions and in the regulation of sub-cellular localization, and in the vicinity of diverse sequence variations. Conclusion The data indicate that the essential elav gene of Drosophila is newly emerged, restricted to dipterans and of retrotransposed origin. We propose that the conserved exon junctions constitute potential sites for sequence/function modifications, and that RRM binding proteins, whose function relies upon plastic RNA-protein interactions, may have played an important role in brain evolution. PMID:18715504
Emelyanenko, A V; Osipov, M A
2003-11-01
A general phenomenological description and a simple molecular model is proposed for the "discrete" flexoelectric effect in tilted smectic liquid crystal phases. This effect defines a polarization in a smectic layer induced by a difference of director orientations in the two smectic layers adjacent to it. It is shown that the "discrete" flexoelectric effect is determined by electrostatic dipole-quadrupole interaction between positionally correlated molecules located in adjacent smectic layers, while the corresponding dipole-dipole interaction is responsible for a coupling between polarization vectors in neighboring layers. It is shown that a simple phenomenological model of a ferrielectric smectic liquid crystal, which has recently been proposed in the literature, can be used to describe the whole sequence of intermediate chiral smectic C* phases with increasing periods, and to determine the nonplanar structure of each phase without additional assumptions. In this sequence the phases with three- and four-layer periodicities have the same structure, as observed in the experiment. The theory predicts also the structure of intermediate phases with longer periods that have not been studied experimentally so far. The structures of intermediate phases with periodicities of up to nine layers are presented together with the phase diagrams, and a relationship between molecular chirality and the three-dimensional structure of intermediate phases is discussed. It is considered also how the coupling between the spontaneous polarization determined by molecular chirality and the induced polarization determined by the discrete flexoelectric effect stabilizes the nonplanar structure of intermediate phases.
Wang, Yongxiang; Li, Jishan; Wang, Hao; Jin, Jianyu; Liu, Jinhua; Wang, Kemin; Tan, Weihong; Yang, Ronghua
2010-08-01
Conformationally constrained nucleic acid probes have usually been designed by forming an intramolecular duplex based on Watson-Crick hydrogen bonds. The disadvantages of these approaches are the inflexibility of the Watson-Crick-based duplex and its instability in complex environments. We report that this hydrogen bonding pattern can be replaced by metal-ligation between specific metal ions and the natural bases. To demonstrate the feasibility of this principle, two linear oligonucleotides and silver ions were examined as models for a DNA hybridization assay and adenosine triphosphate detection. Both nucleic acids contain target binding sequences in the middle and cytosine (C)-rich sequences at the lateral portions. The strong interaction between Ag(+) ions and cytosines forms stable C-Ag(+)-C structures, which allows the oligonucleotides to adopt conformationally constrained forms. In the presence of its target, interaction between the loop sequences and the target unfolds the C-Ag(+)-C structures, and the corresponding probe unfolding can be detected by a change in fluorescence emission. We discuss the thermodynamic and kinetic opportunities that are provided by using Ag(+) ion complexes instead of the traditional Watson-Crick-based duplex. In particular, the intrinsic feature of the metal-ligation motif facilitates the design of functional nucleic acid probes by independently varying the concentration of Ag(+) ions in the medium.
The binding modes of carbazole derivatives with telomere G-quadruplex
NASA Astrophysics Data System (ADS)
Zhang, Xiu-feng; Zhang, Hui-juan; Xiang, Jun-feng; Li, Qian; Yang, Qian-fan; Shang, Qian; Zhang, Yan-xia; Tang, Ya-lin
2010-10-01
It is reported that carbazole derivatives can stabilize the G-quadruplex DNA structure formed by the human telomeric sequence and therefore have the potential to serve as anti-cancer agents. In the present study, to further explore the binding mode between carbazole derivatives and the G-quadruplex formed by the human telomeric sequence, two carbazole iodide molecules (BMVEC, MVEC) were synthesized and used to investigate the interaction with the human telomeric parallel and antiparallel G-quadruplex structures by NMR, CD and molecular modeling. Interestingly, the cationic charged pendant groups of the pyridinium rings of carbazole play an essential role in the stabilization and binding mode of the human telomeric G-quadruplex structure. It was found that BMVEC, with two cationic charged pendant groups of pyridinium rings on 9-ethylcarbazole, can not only stabilize the parallel G-quadruplex of Hum6 by groove binding and G-tetrad stacking modes and the antiparallel G-quadruplex of Hum22 by groove binding, but also induce the formation of a mixed G-quadruplex of Hum22. In contrast, MVEC, with one cationic charged pendant pyridinium group, can only bind to the parallel G-quadruplex of Hum6 by stacking onto the G4 G-tetrad and could not interact with the G-quadruplex of Hum22.
Learning of pitch and time structures in an artificial grammar setting.
Prince, Jon B; Stevens, Catherine J; Jones, Mari Riess; Tillmann, Barbara
2018-04-12
Despite the empirical evidence for the power of the cognitive capacity of implicit learning of structures and regularities in several modalities and materials, it remains controversial whether implicit learning extends to the learning of temporal structures and regularities. We investigated whether (a) an artificial grammar can be learned equally well when expressed in duration sequences as when expressed in pitch sequences, (b) learning of the artificial grammar in either duration or pitch (as the primary dimension) sequences can be influenced by the properties of the secondary dimension (invariant vs. randomized), and (c) learning can be boosted when the artificial grammar is expressed in both pitch and duration. After an exposure phase with grammatical sequences, learning in a subsequent test phase was assessed in a grammaticality judgment task. Participants in both the pitch and duration conditions showed incidental (not fully implicit) learning of the artificial grammar when the secondary dimension was invariant, but randomizing the pitch sequence prevented learning of the artificial grammar in duration sequences. Expressing the artificial grammar in both pitch and duration resulted in disproportionately better performance, suggesting an interaction between the learning of pitch and temporal structure. The findings are relevant to research investigating the learning of temporal structures and the learning of structures presented simultaneously in 2 dimensions (e.g., space and time, space and objects). By investigating learning, the findings provide further insight into the potential specificity of pitch and time processing, and their integrated versus independent processing, as previously debated in music cognition research. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
SANSparallel: interactive homology search against Uniprot
Somervuo, Panu; Holm, Liisa
2015-01-01
Proteins evolve by mutations and natural selection. The network of sequence similarities is a rich source for mining homologous relationships that inform on protein structure and function. There are many servers available to browse the network of homology relationships but one has to wait up to a minute for results. The SANSparallel webserver provides protein sequence database searches with immediate response and professional alignment visualization by third-party software. The output is a list, pairwise alignment or stacked alignment of sequence-similar proteins from Uniprot, UniRef90/50, Swissprot or Protein Data Bank. The stacked alignments are viewed in Jalview or as sequence logos. The database search uses the suffix array neighborhood search (SANS) method, which has been re-implemented as a client-server, improved and parallelized. The method is extremely fast and as sensitive as BLAST above 50% sequence identity. Benchmarks show that the method is highly competitive compared to previously published fast database search programs: UBLAST, DIAMOND, LAST, LAMBDA, RAPSEARCH2 and BLAT. The web server can be accessed interactively or programmatically at http://ekhidna2.biocenter.helsinki.fi/cgi-bin/sans/sans.cgi. It can be used to make protein functional annotation pipelines more efficient, and it is useful in interactive exploration of the detailed evidence supporting the annotation of particular proteins of interest. PMID:25855811
Insights in connecting phenotypes in bacteria to coevolutionary information
NASA Astrophysics Data System (ADS)
Cheng, Ryan; Morcos, Faruck; Hayes, Ryan; Helm, Rodney; Levine, Herbert; Onuchic, Jose
It has long been known that protein sequences are far from random. These sequences have been evolutionarily selected to maintain their ability to fold into stable, three-dimensional folded structures as well as their ability to form macromolecular assemblies, perform catalytic functions, etc. For these reasons, there exist quantifiable mutational patterns in the collection of sequence data for a protein family arising from the need to maintain favorable residue-residue interactions to facilitate folding as well as cellular function. Here, we focus on studying the correlated mutational patterns that give rise to interaction specificity in bacterial two-component signaling (TCS) systems. TCS proteins have evolved to be able to preferentially bind and transfer a phosphate group to their signaling partner while avoiding phosphotransfer with non-partners. We infer a Potts model Hamiltonian governing the correlated mutational patterns that are observed in the sequence data of TCS partners and apply this model to recently published in vivo mutational data. Our findings further support the notion that statistical models built from sequence data can be used to predict bacterial phenotypes as well as engineer interaction specificity between non-partner TCS proteins. This research has been supported by the NSF INSPIRE Award (MCB-1241332) and by the CTBP sponsored by the NSF (Grant PHY- 1427654).
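For readers unfamiliar with the term, a Potts model Hamiltonian assigns each sequence an energy built from per-site fields and pairwise couplings. The sketch below uses the sign convention common in coevolutionary analysis, H(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j); the parameter values and the dictionary layout are hypothetical, and inferring the parameters from an alignment (the hard part) is not shown.

```python
def potts_energy(seq, fields, couplings):
    """Potts energy H(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j).

    fields maps (position, residue) to h; couplings maps
    (i, j, residue_i, residue_j) with i < j to J. Missing entries count as 0.
    """
    energy = 0.0
    for i, a in enumerate(seq):
        energy -= fields.get((i, a), 0.0)
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            energy -= couplings.get((i, j, seq[i], seq[j]), 0.0)
    return energy


# Hypothetical two-site parameters (illustration only, not fitted values).
fields = {(0, "D"): 0.5, (1, "K"): 0.2}
couplings = {(0, 1, "D", "K"): 1.0}

favorable = potts_energy("DK", fields, couplings)
mutant = potts_energy("DA", fields, couplings)
delta = mutant - favorable  # positive means the mutation is disfavored
```

With these toy numbers the coupled pair "DK" has lower energy than the mutant "DA", so `delta` is positive. Energy differences of this kind are one way such a model can be compared with mutational data, although the abstract does not specify the exact scoring used.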
PredictProtein—an open resource for online prediction of protein structural and functional features
Yachdav, Guy; Kloppmann, Edda; Kajan, Laszlo; Hecht, Maximilian; Goldberg, Tatyana; Hamp, Tobias; Hönigschmid, Peter; Schafferhans, Andrea; Roos, Manfred; Bernhofer, Michael; Richter, Lothar; Ashkenazy, Haim; Punta, Marco; Schlessinger, Avner; Bromberg, Yana; Schneider, Reinhard; Vriend, Gerrit; Sander, Chris; Ben-Tal, Nir; Rost, Burkhard
2014-01-01
PredictProtein is a meta-service for sequence analysis that has been predicting structural and functional features of proteins since 1992. Queried with a protein sequence it returns: multiple sequence alignments, predicted aspects of structure (secondary structure, solvent accessibility, transmembrane helices (TMSEG) and strands, coiled-coil regions, disulfide bonds and disordered regions) and function. The service incorporates analysis methods for the identification of functional regions (ConSurf), homology-based inference of Gene Ontology terms (metastudent), comprehensive subcellular localization prediction (LocTree3), protein–protein binding sites (ISIS2), protein–polynucleotide binding sites (SomeNA) and predictions of the effect of point mutations (non-synonymous SNPs) on protein function (SNAP2). Our goal has always been to develop a system optimized to meet the demands of experimentalists not highly experienced in bioinformatics. To this end, the PredictProtein results are presented as both text and a series of intuitive, interactive and visually appealing figures. The web server and sources are available at http://ppopen.rostlab.org. PMID:24799431
A bacterial Argonaute with noncanonical guide RNA specificity
Kaya, Emine; Doxzen, Kevin W.; Knoll, Kilian R.; Wilson, Ross C.; Strutt, Steven C.; Kranzusch, Philip J.; Doudna, Jennifer A.
2016-01-01
Eukaryotic Argonaute proteins induce gene silencing by small RNA-guided recognition and cleavage of mRNA targets. Although structural similarities between human and prokaryotic Argonautes are consistent with shared mechanistic properties, sequence and structure-based alignments suggested that Argonautes encoded within CRISPR-cas [clustered regularly interspaced short palindromic repeats (CRISPR)-associated] bacterial immunity operons have divergent activities. We show here that the CRISPR-associated Marinitoga piezophila Argonaute (MpAgo) protein cleaves single-stranded target sequences using 5′-hydroxylated guide RNAs rather than the 5′-phosphorylated guides used by all known Argonautes. The 2.0-Å resolution crystal structure of an MpAgo–RNA complex reveals a guide strand binding site comprising residues that block 5′ phosphate interactions. Using structure-based sequence alignment, we were able to identify other putative MpAgo-like proteins, all of which are encoded within CRISPR-cas loci. Taken together, our data suggest the evolution of an Argonaute subclass with noncanonical specificity for a 5′-hydroxylated guide. PMID:27035975
PROFESS: a PROtein Function, Evolution, Structure and Sequence database
Triplet, Thomas; Shortridge, Matthew D.; Griep, Mark A.; Stark, Jaime L.; Powers, Robert; Revesz, Peter
2010-01-01
The proliferation of biological databases and the easy access enabled by the Internet is having a beneficial impact on biological sciences and transforming the way research is conducted. There are ∼1100 molecular biology databases dispersed throughout the Internet. To assist in the functional, structural and evolutionary analysis of the abundant number of novel proteins continually identified from whole-genome sequencing, we introduce the PROFESS (PROtein Function, Evolution, Structure and Sequence) database. Our database is designed to be versatile and expandable and will not confine analysis to a pre-existing set of data relationships. A fundamental component of this approach is the development of an intuitive query system that incorporates a variety of similarity functions capable of generating data relationships not conceived during the creation of the database. The utility of PROFESS is demonstrated by the analysis of the structural drift of homologous proteins and the identification of potential pancreatic cancer therapeutic targets based on the observation of protein–protein interaction networks. Database URL: http://cse.unl.edu/∼profess/ PMID:20624718
Ahmed, Saami; Kaushik, Mahima; Chaudhary, Swati; Kukreti, Shrikant
2018-05-01
Sequence recognition and conformational polymorphism make DNA a substantial tool for fabricating nanoscale devices. These DNA-associated nanodevices work on the principle of conformational switches, which can be triggered by many factors, such as the sequence of the DNA/RNA strand, changes in pH or temperature, and enzyme or ligand interactions. Thus, controlling these DNA conformational changes to acquire the desired function is significant for developing DNA hybridization biosensors used in genetic screening and molecular diagnosis. This study was designed to explore the conformational switching ability of cytosine-rich DNA oligonucleotides as a function of pH for their potential use as biosensors. A C-rich stretch of DNA sequence (5'-TCCCCCAATTAATTCCCCCA-3'; SG20c) has been investigated using UV thermal denaturation, polyacrylamide gel electrophoresis and CD spectroscopy. The SG20c sequence is shown to adopt various topologies of i-motif structure at low pH. This pH-dependent transition of SG20c from an unstructured single strand to unimolecular and bimolecular i-motif structures can further be exploited for use in pH-based on/off switching biosensors. Copyright © 2018. Published by Elsevier B.V.
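The C-rich character of SG20c can be checked directly from the sequence quoted above. The sketch below simply locates runs of consecutive cytosines; the minimum run length of three is an assumption made for illustration, not a threshold taken from the study, and the function name is hypothetical.

```python
import re

SG20C = "TCCCCCAATTAATTCCCCCA"  # sequence quoted in the abstract, 5' to 3'


def c_tracts(seq, min_len=3):
    """Return (start, length) for each run of at least min_len cytosines.

    Positions are 0-based from the 5' end.
    """
    pattern = re.compile("C{%d,}" % min_len)
    return [(m.start(), len(m.group())) for m in pattern.finditer(seq.upper())]


tracts = c_tracts(SG20C)
```

For SG20c this finds two tracts of five cytosines each, separated by the AATTAATT spacer.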
Peer-Peer Interaction in a Speaking Test: The Case of the "First Certificate in English" Examination
ERIC Educational Resources Information Center
Galaczi, Evelina D.
2008-01-01
This discourse-based study turns its attention to paired test-taker discourse in the First Certificate in English speaking test. Its primary aim is to focus on fundamental conversation management concepts, such as overall structural organisation, turn-taking, sequencing, and topic organisation found in the dyadic test-taker interaction in 30 pairs…
Fe65-PTB2 Dimerization Mimics Fe65-APP Interaction.
Feilen, Lukas P; Haubrich, Kevin; Strecker, Paul; Probst, Sabine; Eggert, Simone; Stier, Gunter; Sinning, Irmgard; Konietzko, Uwe; Kins, Stefan; Simon, Bernd; Wild, Klemens
2017-01-01
Physiological function and pathology of the Alzheimer's disease causing amyloid precursor protein (APP) are correlated with its cytosolic adaptor Fe65 encompassing a WW and two phosphotyrosine-binding domains (PTBs). The C-terminal Fe65-PTB2 binds a large portion of the APP intracellular domain (AICD) including the GYENPTY internalization sequence fingerprint. AICD binding to Fe65-PTB2 opens an intra-molecular interaction causing a structural change and altering Fe65 activity. Here we show that in the absence of the AICD, Fe65-PTB2 forms a homodimer in solution and determine its crystal structure at 2.6 Å resolution. Dimerization involves the unwinding of a C-terminal α-helix that mimics binding of the AICD internalization sequence, thus shielding the hydrophobic binding pocket. Specific dimer formation is validated by nuclear magnetic resonance (NMR) techniques and cell-based analyses reveal that Fe65-PTB2 together with the WW domain are necessary and sufficient for dimerization. Together, our data demonstrate that Fe65 dimerizes via its APP interaction site, suggesting that besides intra- also intermolecular interactions between Fe65 molecules contribute to homeostatic regulation of APP mediated signaling.
Fe65-PTB2 Dimerization Mimics Fe65-APP Interaction
Feilen, Lukas P.; Haubrich, Kevin; Strecker, Paul; Probst, Sabine; Eggert, Simone; Stier, Gunter; Sinning, Irmgard; Konietzko, Uwe; Kins, Stefan; Simon, Bernd; Wild, Klemens
2017-01-01
Physiological function and pathology of the Alzheimer’s disease causing amyloid precursor protein (APP) are correlated with its cytosolic adaptor Fe65 encompassing a WW and two phosphotyrosine-binding domains (PTBs). The C-terminal Fe65-PTB2 binds a large portion of the APP intracellular domain (AICD) including the GYENPTY internalization sequence fingerprint. AICD binding to Fe65-PTB2 opens an intra-molecular interaction causing a structural change and altering Fe65 activity. Here we show that in the absence of the AICD, Fe65-PTB2 forms a homodimer in solution and determine its crystal structure at 2.6 Å resolution. Dimerization involves the unwinding of a C-terminal α-helix that mimics binding of the AICD internalization sequence, thus shielding the hydrophobic binding pocket. Specific dimer formation is validated by nuclear magnetic resonance (NMR) techniques and cell-based analyses reveal that Fe65-PTB2 together with the WW domain are necessary and sufficient for dimerization. Together, our data demonstrate that Fe65 dimerizes via its APP interaction site, suggesting that besides intra- also intermolecular interactions between Fe65 molecules contribute to homeostatic regulation of APP mediated signaling. PMID:28553201
Neshich, Goran; Togawa, Roberto C.; Mancini, Adauto L.; Kuser, Paula R.; Yamagishi, Michel E. B.; Pappas, Georgios; Torres, Wellington V.; Campos, Tharsis Fonseca e; Ferreira, Leonardo L.; Luna, Fabio M.; Oliveira, Adilton G.; Miura, Ronald T.; Inoue, Marcus K.; Horita, Luiz G.; de Souza, Dimas F.; Dominiquini, Fabiana; Álvaro, Alexandre; Lima, Cleber S.; Ogawa, Fabio O.; Gomes, Gabriel B.; Palandrani, Juliana F.; dos Santos, Gabriela F.; de Freitas, Esther M.; Mattiuz, Amanda R.; Costa, Ivan C.; de Almeida, Celso L.; Souza, Savio; Baudet, Christian; Higa, Roberto H.
2003-01-01
STING Millennium Suite (SMS) is a new web-based suite of programs and databases providing visualization and a complex analysis of molecular sequence and structure for the data deposited at the Protein Data Bank (PDB). SMS operates with a collection of both publicly available data (PDB, HSSP, Prosite) and its own data (contacts, interface contacts, surface accessibility). Biologists find SMS useful because it provides a variety of algorithms and validated data, wrapped-up in a user friendly web interface. Using SMS it is now possible to analyze sequence to structure relationships, the quality of the structure, nature and volume of atomic contacts of intra and inter chain type, relative conservation of amino acids at the specific sequence position based on multiple sequence alignment, indications of folding essential residue (FER) based on the relationship of the residue conservation to the intra-chain contacts and Cα–Cα and Cβ–Cβ distance geometry. Specific emphasis in SMS is given to interface forming residues (IFR)—amino acids that define the interactive portion of the protein surfaces. SMS may simultaneously display and analyze previously superimposed structures. PDB updates trigger SMS updates in a synchronized fashion. SMS is freely accessible for public data at http://www.cbi.cnptia.embrapa.br, http://mirrors.rcsb.org/SMS and http://trantor.bioc.columbia.edu/SMS. PMID:12824333
Levels of Integration in Cognitive Control and Sequence Processing in the Prefrontal Cortex
Bahlmann, Jörg; Korb, Franziska M.; Gratton, Caterina; Friederici, Angela D.
2012-01-01
Cognitive control is necessary to flexibly act in changing environments. Sequence processing is needed in language comprehension to build the syntactic structure in sentences. Functional imaging studies suggest that sequence processing engages the left ventrolateral prefrontal cortex (PFC). In contrast, cognitive control processes additionally recruit bilateral rostral lateral PFC regions. The present study aimed to investigate these two types of processes in one experimental paradigm. Sequence processing was manipulated using two different sequencing rules varying in complexity. Cognitive control was varied with different cue-sets that determined the choice of a sequencing rule. Univariate analyses revealed distinct PFC regions for the two types of processing (i.e. sequence processing: left ventrolateral PFC and cognitive control processing: bilateral dorsolateral and rostral PFC). Moreover, in a common brain network (including left lateral PFC and intraparietal sulcus) no interaction between sequence and cognitive control processing was observed. In contrast, a multivariate pattern analysis revealed an interaction of sequence and cognitive control processing, such that voxels in left lateral PFC and parietal cortex showed different tuning functions for tasks involving different sequencing and cognitive control demands. These results suggest that the difference between the process of rule selection (i.e. cognitive control) and the process of rule-based sequencing (i.e. sequence processing) find their neuronal underpinnings in distinct activation patterns in lateral PFC. Moreover, the combination of rule selection and rule sequencing can shape the response of neurons in lateral PFC and parietal cortex. PMID:22952762
``Sequence space soup'' of proteins and copolymers
NASA Astrophysics Data System (ADS)
Chan, Hue Sun; Dill, Ken A.
1991-09-01
To study the protein folding problem, we use exhaustive computer enumeration to explore ``sequence space soup,'' an imaginary solution containing the ``native'' conformations (i.e., of lowest free energy) under folding conditions, of every possible copolymer sequence. The model is of short self-avoiding chains of hydrophobic (H) and polar (P) monomers configured on the two-dimensional square lattice. By exhaustive enumeration, we identify all native structures for every possible sequence. We find that random sequences of H/P copolymers will bear striking resemblance to known proteins: Most sequences under folding conditions will be approximately as compact as known proteins, will have considerable amounts of secondary structure, and it is most probable that an arbitrary sequence will fold to a number of lowest free energy conformations that is of order one. In these respects, this simple model shows that proteinlike behavior should arise simply in copolymers in which one monomer type is highly solvent averse. It suggests that the structures and uniquenesses of native proteins are not consequences of having 20 different monomer types, or of unique properties of amino acid monomers with regard to special packing or interactions, and thus that simple copolymers might be designable to collapse to proteinlike structures and properties. A good strategy for designing a sequence to have a minimum possible number of native states is to strategically insert many P monomers. Thus known proteins may be marginally stable due to a balance: More H residues stabilize the desired native state, but more P residues prevent simultaneous stabilization of undesired native states.
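The exhaustive enumeration described above can be illustrated with a minimal sketch. It assumes the common HP scoring in which each non-bonded H-H lattice contact is equally favorable, so a "native" conformation is one that maximizes the number of such contacts. The abstract does not spell out the energy function, so that scoring, and the function names below, are assumptions made for illustration. Conformations related by rotation or reflection are counted separately here.

```python
# Unit steps on the 2D square lattice.
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def self_avoiding_walks(n):
    """Yield every self-avoiding walk of n monomers starting at the origin."""
    def extend(path, occupied):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                path.append(nxt)
                occupied.add(nxt)
                yield from extend(path, occupied)
                occupied.remove(nxt)
                path.pop()

    yield from extend([(0, 0)], {(0, 0)})


def hh_contacts(sequence, walk):
    """Count H-H pairs that are lattice neighbours but not chain neighbours."""
    position = {site: i for i, site in enumerate(walk)}
    count = 0
    for i, (x, y) in enumerate(walk):
        if sequence[i] != "H":
            continue
        for dx, dy in MOVES:
            j = position.get((x + dx, y + dy))
            # j > i + 1 skips bonded neighbours and counts each pair once.
            if j is not None and j > i + 1 and sequence[j] == "H":
                count += 1
    return count


def native_states(sequence):
    """Return (maximum H-H contacts, number of walks that reach it).

    Assumed scoring: more H-H contacts means lower free energy.
    Symmetry-related walks are not merged.
    """
    best, degeneracy = -1, 0
    for walk in self_avoiding_walks(len(sequence)):
        contacts = hh_contacts(sequence, walk)
        if contacts > best:
            best, degeneracy = contacts, 1
        elif contacts == best:
            degeneracy += 1
    return best, degeneracy
```

Calling `native_states` on every string over `"HP"` of a given length would reproduce, for very short chains only, the kind of sequence-space survey the abstract describes; the cost grows exponentially with chain length.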
Adam, Benoit; Charloteaux, Benoit; Beaufays, Jerome; Vanhamme, Luc; Godfroid, Edmond; Brasseur, Robert; Lins, Laurence
2008-01-01
Background Lipocalins are widely distributed in nature and are found in bacteria, plants, arthropods and vertebrates. In hematophagous arthropods, they are implicated in the successful accomplishment of the blood meal, interfering with platelet aggregation, blood coagulation and inflammation, and in the transmission of disease parasites such as Trypanosoma cruzi and Borrelia burgdorferi. The pairwise sequence identity is low among this family, often below 30%, despite a well conserved tertiary structure. Under the 30% identity threshold, alignment methods do not correctly assign and align proteins. The only safe way to assign a sequence to that family is by experimental determination. However, these procedures are long and costly and cannot always be applied. A way to circumvent the experimental approach is sequence and structure analysis. To further help in that task, the residues implicated in the stabilisation of the lipocalin fold were determined. This was done by analyzing the conserved interactions for ten lipocalins having a maximum pairwise identity of 28% and various functions. Results Analysis of the ten lipocalin structures and sequences showed that two hydrophobic clusters of residues are conserved. One cluster is internal to the barrel, involving all strands and the 3₁₀ helix. The other is external, involving four strands and the helix lying parallel to the barrel surface. These clusters are also present in RaHBP2, an unusual "outlier" lipocalin from the tick Rhipicephalus appendiculatus. This information was used to assess the assignment of LIR2, a protein from Ixodes ricinus, and to build a 3D model that helps to predict function. FTIR data support the lipocalin fold for this protein. Conclusion By sequence and structural analyses, two conserved clusters of interacting hydrophobic residues have been identified in lipocalins. Since the residues implicated are not conserved for function, they should provide the minimal subset necessary to confer the lipocalin fold. This information has been used to assign LIR2 to lipocalins and to investigate its structure/function relationship. This study could be applied to other protein families with low pairwise similarity, such as the structurally related fatty acid binding proteins or avidins. PMID:18190694
Liquid-gas phase transition in asymmetric nuclear matter at finite temperature
NASA Astrophysics Data System (ADS)
Maruyama, Toshiki; Tatsumi, Toshitaka; Chiba, Satoshi
2010-03-01
The liquid-gas phase transition is discussed in warm asymmetric nuclear matter. Some peculiar features are identified from the viewpoint of the basic thermodynamics of phase equilibrium. We treat the mixed phase of the binary system based on the Gibbs conditions. When the Coulomb interaction is included, the mixed phase is no longer uniform and a sequence of pasta structures appears. Comparing the results with those given by the simple bulk calculation without the Coulomb interaction, we extract specific features of the pasta structures at finite temperature.
Extreme disorder in an ultrahigh-affinity protein complex
NASA Astrophysics Data System (ADS)
Borgia, Alessandro; Borgia, Madeleine B.; Bugge, Katrine; Kissling, Vera M.; Heidarsson, Pétur O.; Fernandes, Catarina B.; Sottini, Andrea; Soranno, Andrea; Buholzer, Karin J.; Nettels, Daniel; Kragelund, Birthe B.; Best, Robert B.; Schuler, Benjamin
2018-03-01
Molecular communication in biology is mediated by protein interactions. According to the current paradigm, the specificity and affinity required for these interactions are encoded in the precise complementarity of binding interfaces. Even proteins that are disordered under physiological conditions or that contain large unstructured regions commonly interact with well-structured binding sites on other biomolecules. Here we demonstrate the existence of an unexpected interaction mechanism: the two intrinsically disordered human proteins histone H1 and its nuclear chaperone prothymosin-α associate in a complex with picomolar affinity, but fully retain their structural disorder, long-range flexibility and highly dynamic character. On the basis of closely integrated experiments and molecular simulations, we show that the interaction can be explained by the large opposite net charge of the two proteins, without requiring defined binding sites or interactions between specific individual residues. Proteome-wide sequence analysis suggests that this interaction mechanism may be abundant in eukaryotes.
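The role of opposite net charge described above can be made concrete with a rough sequence-level estimate. This is only a sketch under simplifying assumptions: Lys and Arg count as +1, Asp and Glu as -1, and histidine, the chain termini and any pKa shifts are ignored. The function names are hypothetical, and nothing here reproduces the experiments or simulations reported in the abstract.

```python
# Assumed integer side-chain charges near neutral pH.
# Histidine and the chain termini are deliberately ignored.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_charge(sequence):
    """Approximate net charge of a one-letter amino acid sequence."""
    return sum(CHARGE.get(residue, 0) for residue in sequence.upper())


def opposite_net_charge(seq_a, seq_b):
    """True if the two sequences carry net charges of opposite sign."""
    return net_charge(seq_a) * net_charge(seq_b) < 0
```

Applied to two real sequences, a strongly positive value for one and a strongly negative value for the other would be consistent with the charge-driven association the abstract proposes, although a sequence-level count says nothing about affinity.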
Wlodawer, A.; Pavlovsky, A.; Gustchina, A.
1993-01-01
Crystal and NMR structures of helical cytokines--interleukin-4 (IL-4), granulocyte-macrophage colony-stimulating factor (GM-CSF), and interleukin-2 (IL-2)--have been compared. Root mean square deviations in the Cα coordinates for the conserved regions of the helices were 1-2 Å between different cytokines, about twice the differences observed for independently determined crystal and solution structures of IL-4. Considerable similarity in amino acid sequence in the areas expected to interact with the receptors was detected, and the available mutagenesis data for these cytokines were correlated with structure conservation. Models of cytokine-receptor interactions were postulated for IL-4 based on its structure as well as on the published structure of human growth hormone interacting with its receptors (de Vos, A.M., Ultsch, M., & Kossiakoff, A.A., 1992, Science 255, 306-312). Patches of positively charged residues on the surfaces of helices C and D of IL-4 may be responsible for the interactions with the negatively charged residues found in the complementary parts of the IL-4 receptors. PMID:8401223
Shi, Jian-Yu; Yiu, Siu-Ming; Li, Yiming; Leung, Henry C M; Chin, Francis Y L
2015-07-15
Predicting drug-target interaction using computational approaches is an important step in drug discovery and repositioning. To predict whether there will be an interaction between a drug and a target, most existing methods identify similar drugs and targets in the database. The prediction is then made based on the known interactions of these drugs and targets. This idea is promising. However, there are two shortcomings that have not yet been addressed appropriately. Firstly, most of the methods only use 2D chemical structures and protein sequences to measure the similarity of drugs and targets respectively. However, this information may not fully capture the characteristics determining whether a drug will interact with a target. Secondly, there are very few known interactions, i.e. many interactions are "missing" in the database. Existing approaches are biased towards known interactions and have no good solutions to handle possibly missing interactions, which affects the accuracy of the prediction. In this paper, we enhance the similarity measures to include non-structural (and non-sequence-based) information and introduce the concept of a "super-target" to handle the problem of possibly missing interactions. Based on evaluations on real data, we show that our similarity measure is better than the existing measures and our approach is able to achieve higher accuracy than the two best existing algorithms, WNN-GIP and KBMF2K. Our approach is available at http://web.hku.hk/~liym1018/projects/drug/drug.html or http://www.bmlnwpu.org/us/tools/PredictingDTI_S2/METHODS.html.
Bamford, Vicki A.; Armour, Maria; Mitchell, Sue A.; Cartron, Michaël; Andrews, Simon C.; Watson, Kimberly A.
2008-01-01
YqjH is a cytoplasmic FAD-containing protein from Escherichia coli; based on homology to ViuB of Vibrio cholerae, it potentially acts as a ferri-siderophore reductase. This work describes its overexpression, purification, crystallization and structure solution at 3.0 Å resolution. YqjH shares high sequence similarity with a number of known siderophore-interacting proteins and its structure was solved by molecular replacement using the siderophore-interacting protein from Shewanella putrefaciens as the search model. The YqjH structure resembles those of other members of the NAD(P)H:flavin oxidoreductase superfamily. PMID:18765906
Binding properties of SUMO-interacting motifs (SIMs) in yeast.
Jardin, Christophe; Horn, Anselm H C; Sticht, Heinrich
2015-03-01
Small ubiquitin-like modifier (SUMO) conjugation and interaction play an essential role in many cellular processes. A large number of yeast proteins are known to interact non-covalently with SUMO via short SUMO-interacting motifs (SIMs), but the structural details of this interaction are still poorly characterized. In the present work, sequence analysis of a large dataset of 148 yeast SIMs revealed the existence of a hydrophobic core binding motif and a preference for acidic residues either within or adjacent to the core motif. Thus the sequence properties of yeast SIMs are highly similar to those described for human. Molecular dynamics simulations were performed to investigate the binding preferences for four representative SIM peptides differing in the number and distribution of acidic residues. Furthermore, the relative stability of two previously observed alternative binding orientations (parallel, antiparallel) was assessed. For all SIMs investigated, the antiparallel binding mode remained stable in the simulations and the SIMs were tightly bound via their hydrophobic core residues supplemented by polar interactions of the acidic residues. In contrast, the stability of the parallel binding mode is more dependent on the sequence features of the SIM, such as the number and position of acidic residues or the presence of additional adjacent interaction motifs. This information should be helpful to enhance the prediction of SIMs and their binding properties in different organisms to facilitate the reconstruction of the SUMO interactome.
The zero age main sequence of WIMP burners
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fairbairn, Malcolm; Scott, Pat; Edsjoe, Joakim
2008-02-15
We modify a stellar structure code to estimate the effect upon the main sequence of the accretion of weakly-interacting dark matter onto stars and its subsequent annihilation. The effect upon the stars depends upon whether the energy generation rate from dark matter annihilation is large enough to shut off the nuclear burning in the star. Main sequence weakly-interacting massive particles (WIMP) burners look much like proto-stars moving on the Hayashi track, although they are in principle completely stable. We make some brief comments about where such stars could be found, how they might be observed and more detailed simulations which are currently in progress. Finally we comment on whether or not it is possible to link the paradoxically hot, young stars found at the galactic center with WIMP burners.
Modelling and enhanced molecular dynamics to steer structure-based drug discovery.
Kalyaanamoorthy, Subha; Chen, Yi-Ping Phoebe
2014-05-01
The ever-increasing gap between the availability of genome sequences and of protein crystal structures remains one of the significant challenges to modern drug discovery efforts. Knowledge of the structure, dynamics and function of proteins is important in order to understand several key aspects of structure-based drug discovery, such as drug-protein interactions, drug binding and unbinding mechanisms and protein-protein interactions. This review presents a brief overview of the different state-of-the-art computational approaches that are applied for protein structure modelling and molecular dynamics simulations of biological systems. We outline how different enhanced sampling molecular dynamics approaches, together with regular molecular dynamics methods, assist in steering structure-based drug discovery processes.
Identifying functionally informative evolutionary sequence profiles.
Gil, Nelson; Fiser, Andras
2018-04-15
Multiple sequence alignments (MSAs) can provide essential input to many bioinformatics applications, including protein structure prediction and functional annotation. However, the optimal selection of sequences to obtain biologically informative MSAs for such purposes is poorly explored, and has traditionally been performed manually. We present Selection of Alignment by Maximal Mutual Information (SAMMI), an automated, sequence-based approach to objectively select an optimal MSA from a large set of alternatives sampled from a general sequence database search. The hypothesis of this approach is that the mutual information among MSA columns will be maximal for those MSAs that contain the most diverse set possible of the most structurally and functionally homogeneous protein sequences. SAMMI was tested to select MSAs for functional site residue prediction by analysis of conservation patterns on a set of 435 proteins obtained from protein-ligand (peptides, nucleic acids and small substrates) and protein-protein interaction databases. Availability and implementation: A freely accessible program, including source code, implementing SAMMI is available at https://github.com/nelsongil92/SAMMI.git. Contact: andras.fiser@einstein.yu.edu. Supplementary data are available at Bioinformatics online.
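To make the quantity behind the hypothesis above concrete, the following sketch computes the standard Shannon mutual information between two alignment columns and its average over all column pairs. It is not the SAMMI implementation: the abstract does not specify how mutual information is aggregated, corrected or used to rank alignments, so the averaging step and the function names are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def column_mutual_information(msa, i, j):
    """Shannon mutual information (in bits) between columns i and j.

    `msa` is a list of equal-length aligned sequences (strings).
    Gap characters are treated as ordinary symbols in this sketch.
    """
    n = len(msa)
    count_i = Counter(seq[i] for seq in msa)
    count_j = Counter(seq[j] for seq in msa)
    count_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts.
        mi += (c / n) * log2(c * n / (count_i[a] * count_j[b]))
    return mi


def mean_pairwise_mi(msa):
    """Average mutual information over all pairs of columns (assumed summary)."""
    length = len(msa[0])
    pairs = [(i, j) for i in range(length) for j in range(i + 1, length)]
    if not pairs:
        return 0.0
    return sum(column_mutual_information(msa, i, j) for i, j in pairs) / len(pairs)
```

Under this sketch, candidate alignments could be compared by their `mean_pairwise_mi` score. Raw mutual information is known to be inflated for small alignments, so a practical method would likely need a correction that this example omits.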
Adelman, K; Salmon, B; Baines, J D
2001-03-13
The product of the herpes simplex virus type 1 U(L)28 gene is essential for cleavage of concatemeric viral DNA into genome-length units and packaging of this DNA into viral procapsids. To address the role of U(L)28 in this process, purified U(L)28 protein was assayed for the ability to recognize conserved herpesvirus DNA packaging sequences. We report that DNA fragments containing the pac1 DNA packaging motif can be induced by heat treatment to adopt novel DNA conformations that migrate faster than the corresponding duplex in nondenaturing gels. Surprisingly, these novel DNA structures are high-affinity substrates for U(L)28 protein binding, whereas double-stranded DNA of identical sequence composition is not recognized by U(L)28 protein. We demonstrate that only one strand of the pac1 motif is responsible for the formation of novel DNA structures that are bound tightly and specifically by U(L)28 protein. To determine the relevance of the observed U(L)28 protein-pac1 interaction to the cleavage and packaging process, we have analyzed the binding affinity of U(L)28 protein for pac1 mutants previously shown to be deficient in cleavage and packaging in vivo. Each of the pac1 mutants exhibited a decrease in DNA binding by U(L)28 protein that correlated directly with the reported reduction in cleavage and packaging efficiency, thereby supporting a role for the U(L)28 protein-pac1 interaction in vivo. These data therefore suggest that the formation of novel DNA structures by the pac1 motif confers added specificity on recognition of DNA packaging sequences by the U(L)28-encoded component of the herpesvirus cleavage and packaging machinery.
Li, Yang; Yang, Jianyi
2017-04-24
The prediction of protein-ligand binding affinity has recently been improved remarkably by machine-learning-based scoring functions. For example, using a set of simple descriptors representing the atomic distance counts, the RF-Score improves the Pearson correlation coefficient to about 0.8 on the core set of the PDBbind 2007 database, which is significantly higher than the performance of any conventional scoring function on the same benchmark. A few studies have discussed the performance of machine-learning-based methods, but the reason for this improvement remains unclear. In this study, by systematically controlling the structural and sequence similarity between the training and test proteins of the PDBbind benchmark, we demonstrate that protein structural and sequence similarity has a significant impact on machine-learning-based methods. After removal of training proteins that are highly similar to the test proteins, as identified by structure alignment and sequence alignment, machine-learning-based methods trained on the new training sets no longer outperform the conventional scoring functions. In contrast, the performance of conventional functions like X-Score is relatively stable no matter what training data are used to fit the weights of its energy terms.
Effect of the SH3-SH2 domain linker sequence on the structure of Hck kinase.
Meiselbach, Heike; Sticht, Heinrich
2011-08-01
The coordination of activity in biological systems requires the existence of different signal transduction pathways that interact with one another and must be precisely regulated. The Src-family tyrosine kinases, which are found in many signaling pathways, differ in their physiological function despite their high overall structural similarity. In this context, the differences in the SH3-SH2 domain linkers might play a role for differential regulation, but the structural consequences of linker sequence remain poorly understood. We have therefore performed comparative molecular dynamics simulations of wildtype Hck and of a mutant Hck in which the SH3-SH2 domain linker is replaced by the corresponding sequence from the homologous kinase Lck. These simulations reveal that linker replacement not only affects the orientation of the SH3 domain itself, but also leads to an alternative conformation of the activation segment in the Hck kinase domain. The sequence of the SH3-SH2 domain linker thus exerts a remote effect on the active site geometry and might therefore play a role in modulating the structure of the inactive kinase or in fine-tuning the activation process itself.
Evolution of coreceptor utilization to escape CCR5 antagonist therapy.
Zhang, Jie; Gao, Xiang; Martin, John; Rosa, Bruce; Chen, Zheng; Mitreva, Makedonka; Henrich, Timothy; Kuritzkes, Daniel; Ratner, Lee
2016-07-01
The HIV-1 envelope interacts with coreceptors CCR5 and CXCR4 in a dynamic, multi-step process whose molecular details are not clearly delineated. Use of CCR5 antagonists results in tropism shift and therapeutic failure. Here we describe a novel approach using full-length patient-derived gp160 quasispecies libraries cloned into HIV-1 molecular clones, their separation based on phenotypic tropism in vitro, and deep sequencing of the resultant variants for structure-function analyses. Analysis of functionally validated envelope sequences from patients who failed CCR5 antagonist therapy revealed determinants strongly associated with coreceptor specificity, especially at the gp120-gp41 and gp41-gp41 interaction surfaces that invite future research on the roles of subunit interaction and envelope trimer stability in coreceptor usage. This study identifies important structure-function relationships in HIV-1 envelope, and demonstrates proof of concept for a new integrated analysis method that facilitates laboratory discovery of resistant mutants to aid in development of other therapeutic agents.
Insights into Structural and Mechanistic Features of Viral IRES Elements
Martinez-Salas, Encarnacion; Francisco-Velilla, Rosario; Fernandez-Chamorro, Javier; Embarek, Azman M.
2018-01-01
Internal ribosome entry site (IRES) elements are cis-acting RNA regions that promote internal initiation of protein synthesis using cap-independent mechanisms. However, distinct types of IRES elements present in the genome of various RNA viruses perform the same function despite lacking conservation of sequence and secondary RNA structure. Likewise, IRES elements differ in host factor requirement to recruit the ribosomal subunits. In spite of this diversity, evolutionarily conserved motifs in each family of RNA viruses preserve sequences impacting on RNA structure and RNA–protein interactions important for IRES activity. Indeed, IRES elements adopting remarkably different structural organizations contain RNA structural motifs that play an essential role in recruiting ribosomes, initiation factors and/or RNA-binding proteins using different mechanisms. Therefore, given that a universal IRES motif remains elusive, it is critical to understand how diverse structural motifs deliver functions relevant for IRES activity. This will be useful for understanding the molecular mechanisms behind cap-independent translation, as well as the evolutionary history of these regulatory elements. Moreover, it could improve the accuracy of predicting IRES-like motifs hidden in genome sequences. This review summarizes recent advances on the diversity and biological relevance of RNA structural motifs for viral IRES elements. PMID:29354113
The identification and functional annotation of RNA structures conserved in vertebrates
Seemann, Stefan E.; Mirza, Aashiq H.; Hansen, Claus; Bang-Berthelsen, Claus H.; Garde, Christian; Christensen-Dalsgaard, Mikkel; Torarinsson, Elfar; Yao, Zizhen; Workman, Christopher T.; Pociot, Flemming; Nielsen, Henrik; Tommerup, Niels; Ruzzo, Walter L.; Gorodkin, Jan
2017-01-01
Structured elements of RNA molecules are essential in, e.g., RNA stabilization, localization, and protein interaction, and their conservation across species suggests a common functional role. We computationally screened vertebrate genomes for conserved RNA structures (CRSs), leveraging structure-based, rather than sequence-based, alignments. After careful correction for sequence identity and GC content, we predict ∼516,000 human genomic regions containing CRSs. We find that a substantial fraction of human–mouse CRS regions (1) colocalize consistently with binding sites of the same RNA binding proteins (RBPs) or (2) are transcribed in corresponding tissues. Additionally, a CaptureSeq experiment revealed expression of many of our CRS regions in human fetal brain, including 662 novel ones. For selected human and mouse candidate pairs, qRT-PCR and in vitro RNA structure probing supported both shared expression and shared structure despite low abundance and low sequence identity. About 30,000 CRS regions are located near coding or long noncoding RNA genes or within enhancers. Structured (CRS overlapping) enhancer RNAs and extended 3′ ends have significantly increased expression levels over their nonstructured counterparts. Our findings of transcribed uncharacterized regulatory regions that contain CRSs support their RNA-mediated functionality. PMID:28487280
Plantinga, Matthew J; Korennykh, Alexei V; Piccirilli, Joseph A; Correll, Carl C
2008-08-26
Restrictocin, a member of the α-sarcin family of site-specific endoribonucleases, uses electrostatic interactions to bind to the ribosome and to RNA oligonucleotides, including the minimal specific substrate, the sarcin/ricin loop (SRL) of 23S-28S rRNA. Restrictocin binds to the SRL by forming a ground-state E:S complex that is stabilized predominantly by Coulomb interactions and depends on neither the sequence nor structure of the RNA, suggesting a nonspecific complex. The 22 cationic residues of restrictocin are dispersed throughout this protein surface, complicating a priori identification of a Coulomb interacting surface. Structural studies have identified an enzyme-substrate interface, which is expected to overlap with the electrostatic E:S interface. Here, we identified restrictocin residues that contribute to binding in the E:S complex by determining the salt dependence [∂log(k2/K1/2)/∂log[KCl
Kumar, Avishek; Butler, Brandon M.; Kumar, Sudhir; Ozkan, S. Banu
2016-01-01
Summary Sequencing technologies are revealing many new non-synonymous single nucleotide variants (nsSNVs) in each personal exome. To assess their functional impacts, comparative genomics is frequently employed to predict if they are benign or not. However, evolutionary analysis alone is insufficient, because it misdiagnoses many disease-associated nsSNVs, such as those at positions involved in protein interfaces, and because evolutionary predictions do not provide mechanistic insights into functional change or loss. Structural analyses can aid in overcoming both of these problems by incorporating conformational dynamics and allostery in nsSNV diagnosis. Finally, protein-protein interaction networks using systems-level methodologies shed light onto disease etiology and pathogenesis. Bridging these network approaches with structurally resolved protein interactions and dynamics will advance genomic medicine. PMID:26684487
Stability of local secondary structure determines selectivity of viral RNA chaperones.
Bravo, Jack P K; Borodavka, Alexander; Barth, Anders; Calabrese, Antonio N; Mojzes, Peter; Cockburn, Joseph J B; Lamb, Don C; Tuma, Roman
2018-05-18
To maintain genome integrity, segmented double-stranded RNA viruses of the Reoviridae family must accurately select and package a complete set of up to a dozen distinct genomic RNAs. It is thought that the high fidelity segmented genome assembly involves multiple sequence-specific RNA-RNA interactions between single-stranded RNA segment precursors. These are mediated by virus-encoded non-structural proteins with RNA chaperone-like activities, such as rotavirus (RV) NSP2 and avian reovirus σNS. Here, we compared the abilities of NSP2 and σNS to mediate sequence-specific interactions between RV genomic segment precursors. Despite their similar activities, NSP2 successfully promotes inter-segment association, while σNS fails to do so. To understand the mechanisms underlying such selectivity in promoting inter-molecular duplex formation, we compared RNA-binding and helix-unwinding activities of both proteins. We demonstrate that octameric NSP2 binds structured RNAs with high affinity, resulting in efficient intramolecular RNA helix disruption. Hexameric σNS oligomerizes into an octamer that binds two RNAs, yet it exhibits only limited RNA-unwinding activity compared to NSP2. Thus, the formation of intersegment RNA-RNA interactions is governed by both helix-unwinding capacity of the chaperones and stability of RNA structure. We propose that this protein-mediated RNA selection mechanism may underpin the high fidelity assembly of multi-segmented RNA genomes in Reoviridae.
Orenstein, Yaron; Wang, Yuhao; Berger, Bonnie
2016-06-15
Protein-RNA interactions, which play vital roles in many processes, are mediated through both RNA sequence and structure. CLIP-based methods, which measure protein-RNA binding in vivo, suffer from experimental noise and systematic biases, whereas in vitro experiments capture a clearer signal of protein RNA-binding. Among them, RNAcompete provides binding affinities of a specific protein to more than 240 000 unstructured RNA probes in one experiment. The computational challenge is to infer RNA structure- and sequence-based binding models from these data. The state-of-the-art in sequence models, Deepbind, does not model structural preferences. RNAcontext models both sequence and structure preferences, but is outperformed by GraphProt. Unfortunately, GraphProt cannot detect structural preferences from RNAcompete data due to the unstructured nature of the data, as noted by its developers, nor can it be tractably run on the full RNAcompete dataset. We develop RCK, an efficient, scalable algorithm that infers both sequence and structure preferences based on a new k-mer based model. Remarkably, even though RNAcompete data is designed to be unstructured, RCK can still learn structural preferences from it. RCK significantly outperforms both RNAcontext and Deepbind in in vitro binding prediction for 244 RNAcompete experiments. Moreover, RCK is also faster and uses less memory, which enables scalability. While currently on par with existing methods in in vivo binding prediction on a small scale test, we demonstrate that RCK will increasingly benefit from experimentally measured RNA structure profiles as compared to computationally predicted ones. By running RCK on the entire RNAcompete dataset, we generate and provide as a resource a set of protein-RNA structure-based models on an unprecedented scale. Software and models are freely available at http://rck.csail.mit.edu/. Contact: bab@mit.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
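To make the idea of a k-mer based representation concrete, the following is a minimal sketch of how overlapping k-mers in an RNA probe could be counted and turned into a fixed-length feature vector. It is not RCK's model, which the abstract does not specify in detail; the function names `kmer_counts` and `kmer_vector` are hypothetical and only the general k-mer notion comes from the text.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Count overlapping k-mers in an RNA sequence (T is read as U)."""
    seq = seq.upper().replace("T", "U")
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_vector(seq, k, alphabet="ACGU"):
    """Return k-mer counts as a vector ordered over all 4**k possible k-mers.

    Illustrative only: a real binding model would combine such features
    with structure information and learned weights.
    """
    counts = kmer_counts(seq, k)
    return [counts.get("".join(p), 0) for p in product(alphabet, repeat=k)]
```

A vector of this kind could serve as input to any sequence-only scoring scheme; adding structure preferences would need per-position structural annotations, which this sketch deliberately leaves out.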
Computational design of d-peptide inhibitors of hepatitis delta antigen dimerization
NASA Astrophysics Data System (ADS)
Elkin, Carl D.; Zuccola, Harmon J.; Hogle, James M.; Joseph-McCarthy, Diane
2000-11-01
Hepatitis delta virus (HDV) encodes a single polypeptide called hepatitis delta antigen (DAg). Dimerization of DAg is required for viral replication. The structure of the dimerization region, residues 12 to 60, consists of an anti-parallel coiled coil [Zuccola et al., Structure, 6 (1998) 821]. Multiple Copy Simultaneous Searches (MCSS) of the hydrophobic core region formed by the bend in the helix of one monomer of this structure were carried out for many diverse functional groups. Six critical interaction sites were identified. The Protein Data Bank was searched for backbone templates to use in the subsequent design process by matching to these sites. A 14 residue helix expected to bind to the d-isomer of the target structure was selected as the template. Over 200 000 mutant sequences of this peptide were generated based on the MCSS results. A secondary structure prediction algorithm was used to screen all sequences, and in general only those that were predicted to be highly helical were retained. Approximately 100 of these 14-mers were model built as d-peptides and docked with the l-isomer of the target monomer. Based on calculated interaction energies, predicted helicity, and intrahelical salt bridge patterns, a small number of peptides were selected as the most promising candidates. The ligand design approach presented here is the computational analogue of mirror image phage display. The results have been used to characterize the interactions responsible for formation of this model anti-parallel coiled coil and to suggest potential ligands to disrupt it.
SARNAclust: Semi-automatic detection of RNA protein binding motifs from immunoprecipitation data
Dotu, Ivan; Adamson, Scott I.; Coleman, Benjamin; Fournier, Cyril; Ricart-Altimiras, Emma; Eyras, Eduardo
2018-01-01
RNA-protein binding is critical to gene regulation, controlling fundamental processes including splicing, translation, localization and stability, and aberrant RNA-protein interactions are known to play a role in a wide variety of diseases. However, molecular understanding of RNA-protein interactions remains limited; in particular, identification of RNA motifs that bind proteins has long been challenging, especially when such motifs depend on both sequence and structure. Moreover, although RNA binding proteins (RBPs) often contain more than one binding domain, algorithms capable of identifying more than one binding motif simultaneously have not been developed. In this paper we present a novel pipeline to determine binding peaks in crosslinking immunoprecipitation (CLIP) data, to discover multiple possible RNA sequence/structure motifs among them, and to experimentally validate such motifs. At the core is a new semi-automatic algorithm SARNAclust, the first unsupervised method to identify and deconvolve multiple sequence/structure motifs simultaneously. SARNAclust computes similarity between sequence/structure objects using a graph kernel, providing the ability to isolate the impact of specific features through the bulge graph formalism. Application of SARNAclust to synthetic data shows its capability of clustering 5 motifs at once with a V-measure value of over 0.95, while GraphClust achieves only a V-measure of 0.083 and RNAcontext cannot detect any of the motifs. When applied to existing eCLIP sets, SARNAclust finds known motifs for SLBP and HNRNPC and novel motifs for several other RBPs such as AGGF1, AKAP8L and ILF3. We demonstrate an experimental validation protocol, a targeted Bind-n-Seq-like high-throughput sequencing approach that relies on RNA inverse folding for oligo pool design, that can validate the components within the SLBP motif. Finally, we use this protocol to experimentally interrogate the SARNAclust motif predictions for protein ILF3. 
Our results support a newly identified partially double-stranded UUUUUGAGA motif similar to that known for the splicing factor HNRNPC. PMID:29596423
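The V-measure quoted above is a standard clustering score: the harmonic mean of homogeneity (each cluster contains a single true class) and completeness (each true class falls in a single cluster). A minimal pure-Python sketch is given below, assuming the equal-weight (beta = 1) form; how the authors computed their values is not stated in the abstract, and the function names are illustrative.

```python
import math
from collections import Counter


def _entropy(labels):
    """Shannon entropy (natural log) of a label list."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def _conditional_entropy(target, given):
    """H(target | given) for two equal-length label lists."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * math.log(c / given_counts[g])
                for (g, _), c in joint.items())


def v_measure(true_labels, cluster_labels):
    """Harmonic mean of homogeneity and completeness (beta = 1)."""
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label lists must have the same length")
    h_class = _entropy(true_labels)
    h_cluster = _entropy(cluster_labels)
    # By convention each score is 1 when its normalising entropy is zero.
    homogeneity = 1.0 if h_class == 0 else (
        1.0 - _conditional_entropy(true_labels, cluster_labels) / h_class)
    completeness = 1.0 if h_cluster == 0 else (
        1.0 - _conditional_entropy(cluster_labels, true_labels) / h_cluster)
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)
```

The score ignores how clusters are named, so a perfect partition with permuted labels still scores 1, while placing every item in one cluster scores 0 against two equally sized classes.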
Development of Novel p16INK4a Mimetics as Anticancer Therapy
2015-10-01
peptide (or substituted peptide) or the crystal structure of the relevant sequence from p16INK4 (PDB 1BI7) was used as the starting structure. Model...small peptides that interact with CDK4/6. The specific aims are as follows. (1) Determine structure-function relationships of overlapping peptides...Determine structure-function relationships of overlapping peptides derived from p16INK4a that inhibit the activity of CDK4/6 and identify stabilized
Matrix metalloproteinases: structures, evolution, and diversification.
Massova, I; Kotra, L P; Fridman, R; Mobashery, S
1998-09-01
A comprehensive sequence alignment of 64 members of the family of matrix metalloproteinases (MMPs) for the entire sequences, and subsequently the catalytic and the hemopexin-like domains, has been performed. The 64 MMPs were selected from plants, invertebrates, and vertebrates. The analyses disclosed that as many as 23 distinct subfamilies of these proteins are known to exist. Information from the sequence alignments was correlated with structures, both crystallographic as well as computational, of the catalytic domains for the 23 representative members of the MMP family. A survey of the metal binding sites and two loops containing variable sequences of amino acids, which are important for substrate interactions, is discussed. The collective data support the proposal that the assembly of the domains into multidomain enzymes was likely to be an early evolutionary event. This was followed by diversification, perhaps in parallel among the MMPs, in a subsequent evolutionary time scale. Analysis indicates that a retrograde structure simplification may have accounted for the evolution of MMPs with simple domain constituents, such as matrilysin, from the larger and more elaborate enzymes.
de Castro Nunes, Renata; Orozco-Arias, Simon; Crouzillat, Dominique; Mueller, Lukas A.; Strickler, Suzy R.; Descombes, Patrick; Fournier, Coralie; Moine, Deborah; de Kochko, Alexandre; Yuyama, Priscila M.; Vanzela, André L. L.; Guyot, Romain
2018-01-01
Centromeric regions of plants are generally composed of large arrays of satellites from a specific lineage of Gypsy LTR-retrotransposons, called Centromeric Retrotransposons. Repeated sequences interact with a specific H3 histone, playing a crucial role in kinetochore formation. To study the structure and composition of centromeric regions in the genus Coffea, we annotated and classified Centromeric Retrotransposon sequences from the allotetraploid C. arabica genome and its two diploid ancestors: Coffea canephora and C. eugenioides. Ten distinct CRC (Centromeric Retrotransposons in Coffea) families were found. The sequence mapping and FISH experiments of CRC Reverse Transcriptase domains in C. canephora, C. eugenioides, and C. arabica clearly indicate a strong and specific targeting mainly onto proximal chromosome regions, which can be associated also with heterochromatin. PacBio genome sequence analyses of putative centromeric regions on C. arabica and C. canephora chromosomes showed an exceptional density of one family of CRC elements, and the complete absence of satellite arrays, contrasting with the usual structure of plant centromeres. Altogether, our data suggest a specific centromere organization in Coffea, contrasting with other plant genomes. PMID:29497436
Butler-Cole, Christine; Wagner, Mary J; Da Silva, Melissa; Brown, Gordon D; Burke, Robert D; Upton, Chris
2007-07-24
Profilins are critical to cytoskeletal dynamics in eukaryotes; however, little is known about their viral counterparts. In this study, a poxviral profilin homolog, ectromelia virus strain Moscow gene 141 (ECTV-PH), was investigated by a variety of experimental and bioinformatics techniques to characterize its interactions with cellular and viral proteins. Profilin-like proteins are encoded by all orthopoxviruses sequenced to date, and share over 90% amino acid (aa) identity. Sequence comparisons show highest similarity to mammalian type 1 profilins; however, a conserved 3 aa deletion in mammalian type 3 and poxviral profilins suggests that these homologs may be more closely related. Structural analysis shows that ECTV-PH can be successfully modelled onto both the profilin 1 crystal structure and profilin 3 homology model, though few of the surface residues thought to be required for binding actin, poly(L-proline), and PIP2 are conserved. Immunoprecipitation and mass spectrometry identified two proteins that interact with ECTV-PH within infected cells: alpha-tropomyosin, a 38 kDa cellular actin-binding protein, and the 84 kDa product of vaccinia virus strain Western Reserve (VACV-WR) 148, which is the truncated VACV counterpart of the orthopoxvirus A-type inclusion (ATI) protein. Western and far-western blots demonstrated that the interaction with alpha-tropomyosin is direct, and immunofluorescence experiments suggest that ECTV-PH and alpha-tropomyosin may colocalize to structures that resemble actin tails and cellular protrusions. Sequence comparisons of the poxviral ATI proteins show that although full-length orthologs are only present in cowpox and ectromelia viruses, an ~ 700 aa truncated ATI protein is conserved in over 90% of sequenced orthopoxviruses. Immunofluorescence studies indicate that ECTV-PH localizes to cytoplasmic inclusion bodies formed by both truncated and full-length versions of the viral ATI protein. 
Furthermore, colocalization of ECTV-PH and truncated ATI protein to protrusions from the cell surface was observed. These results suggest a role for ECTV-PH in intracellular transport of viral proteins or intercellular spread of the virus. Broader implications include better understanding of the virus-host relationship and mechanisms by which cells organize and control the actin cytoskeleton.
Simulation studies of DNA at the nanoscale: Interactions with proteins, polycations, and surfaces
NASA Astrophysics Data System (ADS)
Elder, Robert M.
Understanding the nanoscale interactions of DNA, a multifunctional biopolymer with sequence-dependent properties, with other biological and synthetic substrates and molecules is essential to advancing these technologies. This doctoral thesis research is aimed at understanding the thermodynamics and molecular-level structure when DNA interacts with proteins, polycations, and functionalized surfaces. First, we investigate the ability of a DNA damage recognition protein (HMGB1a) to bind to anti-cancer drug-induced DNA damage, seeking to explain how HMGB1a differentiates between the drugs in vivo. Using atomistic molecular dynamics simulations, we show that the structure of the drug-DNA molecule exhibits drug- and base sequence-dependence that explains some of the experimentally observed differential recognition of the drugs in various sequence contexts. Then, we show how steric hindrance from the drug decreases the deformability of the drug-DNA molecule, which decreases recognition by the protein, a concept that can be applied to rational drug design. Second, we study how polycation architecture and chemistry affect polycation-DNA binding so as to design optimal polycations for high efficiency gene (DNA) delivery. Using a multiscale computational approach involving atomistic and coarse-grained simulations, we examine how rearranging polylysine from a linear to a grafted architecture, and several aspects of the grafted architecture, affect polycation-DNA binding and the structure of polycation-DNA complexes. Next, going beyond lysine we examine how oligopeptide chemistry and sequence in the grafted architecture affects polycation-DNA binding and find that strategic placement of hydrophobic peptides might be used to tailor binding strength. Third, we study the adsorption and conformations of single-stranded DNA (an amphiphilic biopolymer) on model hydrophilic and hydrophobic surfaces. 
Short ssDNA oligomers adsorb to both surfaces with similar strength, with the strength of adsorption to the hydrophobic surface depending on the composition of the DNA strands, i.e. purine or pyrimidine bases. Additionally, DNA-surface and DNA-water interactions near the surfaces govern the adsorption. For longer ssDNA oligomers, the effects of surface chemistry and temperature on ssDNA conformations are rather small, but either the hydrophilic surface or increased temperature favor slightly more compact conformations due to energetic and entropic effects, respectively.
ERIC Educational Resources Information Center
Breit-Smith, Allison; Olszewski, Arnold; Swoboda, Christopher; Guo, Ying; Prendeville, Jo-Anne
2017-01-01
This study explores the outcomes of an interactive book reading intervention featuring expository picture books. This small-group intervention was delivered by four practitioners (two early childhood special education teachers and two speech-language pathologists) three times per week for 8 weeks to 6 preschool-age children (3 years 1 month to 4…
Visualization of protein sequence features using JavaScript and SVG with pViz.js.
Mukhyala, Kiran; Masselot, Alexandre
2014-12-01
pViz.js is a visualization library for displaying protein sequence features in a Web browser. By simply providing a sequence and the locations of its features, this lightweight, yet versatile, JavaScript library renders an interactive view of the protein features. Interactive exploration of protein sequence features over the Web is a common need in Bioinformatics. Although many Web sites have developed viewers to display these features, their implementations are usually focused on data from a specific source or use case. Some of these viewers can be adapted to fit other use cases but are not designed to be reusable. pViz makes it easy to display features as boxes aligned to a protein sequence with zooming functionality but also includes predefined renderings for secondary structure and post-translational modifications. The library is designed to further customize this view. We demonstrate such applications of pViz using two examples: a proteomic data visualization tool with an embedded viewer for displaying features on protein structure, and a tool to visualize the results of the variant_effect_predictor tool from Ensembl. pViz.js is a JavaScript library, available on github at https://github.com/Genentech/pviz. This site includes examples and functional applications, installation instructions and usage documentation. A Readme file, which explains how to use pViz with examples, is available as Supplementary Material A. © The Author 2014. Published by Oxford University Press.
2015-01-01
Guanine-rich oligonucleotides can adopt noncanonical tertiary structures known as G-quadruplexes, which can exist in different forms depending on experimental conditions. High-resolution structural methods, such as X-ray crystallography and NMR spectroscopy, have been of limited usefulness in resolving the inherent structural polymorphism associated with G-quadruplex formation. The lack of, or the ambiguous nature of, currently available high-resolution structural data, in turn, has severely hindered investigations into the nature of these structures and their interactions with small-molecule inhibitors. We have used molecular dynamics in conjunction with hydrodynamic bead modeling to study the structures of the human telomeric G-quadruplex-forming sequences at the atomic level. We demonstrated that molecular dynamics can reproduce experimental hydrodynamic measurements and thus can be a powerful tool in the structural study of existing G-quadruplex sequences or in the prediction of new G-quadruplex structures. PMID:24779348
Michetti, Davide; Brandsdal, Bjørn Olav; Bon, Davide; Isaksen, Geir Villy; Tiberti, Matteo; Papaleo, Elena
2017-01-01
The psychrophilic and mesophilic endonucleases A (EndA) from Aliivibrio salmonicida (VsEndA) and Vibrio cholerae (VcEndA) have been studied experimentally in terms of the biophysical properties related to thermal adaptation. The analysis of their static X-ray structures was not sufficient to rationalize the determinants of their adaptive traits at the molecular level. Thus, we used Molecular Dynamics (MD) simulations to compare the two proteins and unveil their structural and dynamical differences. Our simulations did not show a substantial increase in flexibility in the cold-adapted variant on the nanosecond time scale. The only exception is a more rigid C-terminal region in VcEndA, which is ascribable to a cluster of electrostatic interactions and hydrogen bonds, as also supported by MD simulations of the VsEndA mutant variant where the cluster of interactions was introduced. Moreover, we identified three additional amino acid substitutions through multiple sequence alignment and the analyses of MD-based protein structure networks. In particular, T120V occurs in the proximity of the catalytic residue H80 and alters the interaction with the residue Y43, which belongs to the second coordination sphere of the Mg2+ ion. This makes T120V an amenable candidate for future experimental mutagenesis.
Structural diversity of domain superfamilies in the CATH database.
Reeves, Gabrielle A; Dallman, Timothy J; Redfern, Oliver C; Akpor, Adrian; Orengo, Christine A
2006-07-14
The CATH database of domain structures has been used to explore the structural variation of homologous domains in 294 well populated domain structure superfamilies, each containing at least three sequence diverse relatives. Our analyses confirm some previously detected trends relating sequence divergence to structural variation but for a much larger dataset and in some superfamilies the new data reveal exceptional structural variation. Use of a new algorithm (2DSEC) to analyse variability in secondary structure compositions across a superfamily sheds new light on how structures evolve. 2DSEC detects inserted secondary structures that embellish the core of conserved secondary structures found throughout the superfamily. Analysis showed that for 56% of highly populated superfamilies (>9 sequence diverse relatives), there are twofold or more increases in the numbers of secondary structures in some relatives. In some families fivefold increases occur, sometimes modifying the fold of the domain. Manual inspection of secondary structure insertions or embellishments in 48 particularly variable superfamilies revealed that although these insertions were usually discontiguous in the sequence they were often co-located in 3D resulting in a larger structural motif that often modified the geometry of the active site or the surface conformation promoting diverse domain partnerships and protein interactions. These observations, supported by automatic analysis of all well populated CATH families, suggest that accretion of small secondary structure insertions may provide a simple mechanism for evolving new functions in diverse relatives. Some layered domain architectures (e.g. mainly-beta and alpha-beta sandwiches) that recur highly in the genomes more frequently exploit these types of embellishments to modify function. In these architectures, aggregation occurs most often at the edges, top or bottom of the beta-sheets. 
Information on structural variability across domain superfamilies has been made available through the CATH Dictionary of Homologous Structures (DHS).
Characterizing protein domain associations by Small-molecule ligand binding
Li, Qingliang; Cheng, Tiejun; Wang, Yanli; Bryant, Stephen H.
2012-01-01
Background Protein domains are evolutionarily conserved building blocks for protein structure and function, which are conventionally identified based on protein sequence or structure similarity. Small molecule binding domains are of great importance for the recognition of small molecules in biological systems and drug development. Many small molecules, including drugs, have been increasingly identified to bind to multiple targets, leading to promiscuous interactions with protein domains. Thus, a large scale characterization of the protein domains and their associations with respect to small-molecule binding is of particular interest to systems biology research, drug target identification, as well as drug repurposing. Methods We compiled a collection of 13,822 physical interactions of small molecules and protein domains derived from the Protein Data Bank (PDB) structures. Based on the chemical similarity of these small molecules, we characterized pairwise associations of the protein domains and further investigated their global associations from a network point of view. Results We found that protein domains, despite lack of similarity in sequence and structure, were comprehensively associated through binding the same or similar small-molecule ligands. Moreover, we identified modules in the domain network that consisted of closely related protein domains by sharing similar biochemical mechanisms, being involved in relevant biological pathways, or being regulated by the same cognate cofactors. Conclusions A novel protein domain relationship was identified in the context of small-molecule binding, which is complementary to those identified by traditional sequence-based or structure-based approaches. The protein domain network constructed in the present study provides a novel perspective for chemogenomic study and network pharmacology, as well as target identification for drug repurposing. PMID:23745168
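One way to picture linking domains through ligand similarity is sketched below. The abstract does not say which similarity measure or cutoff was used, so the Tanimoto coefficient on fingerprint bit sets, the 0.8 threshold, and the names `tanimoto` and `associated_domain_pairs` are all assumptions made for illustration.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def associated_domain_pairs(domain_ligands, threshold=0.8):
    """Link two domains when any pair of their ligands is similar enough.

    domain_ligands maps a domain name to a list of ligand fingerprints
    (each a set of on-bit indices). The threshold is a hypothetical cutoff.
    """
    domains = sorted(domain_ligands)
    pairs = []
    for i, d1 in enumerate(domains):
        for d2 in domains[i + 1:]:
            best = max(
                (tanimoto(x, y)
                 for x in domain_ligands[d1]
                 for y in domain_ligands[d2]),
                default=0.0,
            )
            if best >= threshold:
                pairs.append((d1, d2, best))
    return pairs
```

The resulting pair list can be read as the edge list of a domain network, on which module detection of the kind described in the Results could then be run.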
Kalloush, Rawan M.; Vivet-Boudou, Valérie; Ali, Lizna M.; Mustafa, Farah; Marquet, Roland; Rizvi, Tahir A.
2016-01-01
MPMV has great potential for development as a vector for gene therapy. In this respect, precisely defining the sequences and structural motifs that are important for dimerization and packaging of its genomic RNA (gRNA) is of utmost importance. A distinguishing feature of the MPMV gRNA packaging signal is two phylogenetically conserved long-range interactions (LRIs) between U5 and gag complementary sequences, LRI-I and LRI-II. To test their biological significance in the MPMV life cycle, we introduced mutations into these structural motifs and tested their effects on MPMV gRNA packaging and propagation. Furthermore, we probed the structure of key mutants using SHAPE (selective 2′-hydroxyl acylation analyzed by primer extension). Disrupting base-pairing of the LRIs affected gRNA packaging and propagation, demonstrating their significance to the MPMV life cycle. A double mutant restoring a heterologous LRI-I was fully functional, whereas a similar LRI-II mutant failed to restore gRNA packaging and propagation. These results demonstrate that while LRI-I acts at the structural level, maintaining base-pairing is not sufficient for LRI-II function. In addition, in vitro RNA dimerization assays indicated that the loss of RNA packaging in LRI mutants could not be attributed to the defects in dimerization. Our findings suggest that U5-gag LRIs play an important architectural role in maintaining the structure of the 5′ region of the MPMV gRNA, expanding the crucial role of LRIs to the nonlentiviral group of retroviruses. PMID:27095024
Du, Yushen; Wu, Nicholas C.; Jiang, Lin; Zhang, Tianhao; Gong, Danyang; Shu, Sara; Wu, Ting-Ting
2016-01-01
ABSTRACT Identification and annotation of functional residues are fundamental questions in protein sequence analysis. Sequence and structure conservation provides valuable information to tackle these questions. It is, however, limited by the incomplete sampling of sequence space in natural evolution. Moreover, proteins often have multiple functions, with overlapping sequences that present challenges to accurate annotation of the exact functions of individual residues by conservation-based methods. Using the influenza A virus PB1 protein as an example, we developed a method to systematically identify and annotate functional residues. We used saturation mutagenesis and high-throughput sequencing to measure the replication capacity of single nucleotide mutations across the entire PB1 protein. After predicting protein stability upon mutations, we identified functional PB1 residues that are essential for viral replication. To further annotate the functional residues important to the canonical or noncanonical functions of viral RNA-dependent RNA polymerase (vRdRp), we performed a homologous-structure analysis with 16 different vRdRp structures. We achieved high sensitivity in annotating the known canonical polymerase functional residues. Moreover, we identified a cluster of noncanonical functional residues located in the loop region of the PB1 β-ribbon. We further demonstrated that these residues were important for PB1 protein nuclear import through the interaction with Ran-binding protein 5. In summary, we developed a systematic and sensitive method to identify and annotate functional residues that are not restrained by sequence conservation. Importantly, this method is generally applicable to other proteins about which homologous-structure information is available. PMID:27803181
Gopal, J; Yebra, M J; Bhagwat, A S
1994-01-01
The methyltransferase (MTase) in the DsaV restriction-modification system methylates within 5'-CCNGG sequences. We have cloned the gene for this MTase and determined its sequence. The predicted sequence of the MTase protein contains sequence motifs conserved among all cytosine-5 MTases and is most similar to other MTases that methylate CCNGG sequences, namely M.ScrFI and M.SsoII. All three MTases methylate the internal cytosine within their recognition sequence. The 'variable' region within the three enzymes that methylate CCNGG can be aligned with the sequences of two enzymes that methylate CCWGG sequences. Remarkably, two segments within this region contain significant similarity with the region of M.HhaI that is known to contact DNA bases. These alignments suggest that many cytosine-5 MTases are likely to interact with DNA using a similar structural framework. PMID:7971279
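Recognition sequences such as CCNGG and CCWGG use IUPAC degenerate codes (N = any base, W = A or T). As an illustration only, and not something taken from the paper, the sketch below expands a degenerate motif into a regular expression and reports every start position in a DNA string; `find_sites` is a hypothetical helper name.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def find_sites(sequence, motif):
    """Return 0-based start positions of a degenerate motif in a DNA sequence.

    A lookahead is used so that overlapping sites are all reported.
    Only the given strand is scanned.
    """
    pattern = "".join(IUPAC[base] for base in motif.upper())
    return [m.start() for m in re.finditer(f"(?={pattern})", sequence.upper())]
```

For example, `find_sites("AACCAGGTTCCCGGA", "CCNGG")` returns `[2, 9]`, whereas the narrower CCWGG motif matches only position 2, because the second site has C at the central position.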
Golovenko, Dmitrij; Manakova, Elena; Zakrys, Linas; Zaremba, Mindaugas; Sasnauskas, Giedrius; Gražulis, Saulius; Siksnys, Virginijus
2014-01-01
The B3 DNA-binding domains (DBDs) of plant transcription factors (TF) and DBDs of EcoRII and BfiI restriction endonucleases (EcoRII-N and BfiI-C) share a common structural fold, classified as the DNA-binding pseudobarrel. The B3 DBDs in the plant TFs recognize a diverse set of target sequences. The only available co-crystal structure of the B3-like DBD is that of EcoRII-N (recognition sequence 5′-CCTGG-3′). In order to understand the structural and molecular mechanisms of specificity of B3 DBDs, we have solved the crystal structure of BfiI-C (recognition sequence 5′-ACTGGG-3′) complexed with a 12-bp cognate oligoduplex. Structural comparison of BfiI-C–DNA and EcoRII-N–DNA complexes reveals a conserved DNA-binding mode and a conserved pattern of interactions with the phosphodiester backbone. The determinants of the target specificity are located in the loops that emanate from the conserved structural core. The BfiI-C–DNA structure presented here expands the range of templates for modeling of the DNA-bound complexes of the B3 family of plant TFs. PMID:24423868
Undergraduates improve upon published crystal structure in class assignment.
Horowitz, Scott; Koldewey, Philipp; Bardwell, James C
2014-01-01
Recently, 57 undergraduate students at the University of Michigan were assigned the task of solving a crystal structure, given only the electron density map of a 1.3 Å crystal structure from the electron density server, and the position of the N-terminal amino acid. To test their knowledge of amino acid chemistry, the students were not given the protein sequence. With minimal direction from the instructor on how the students should complete the assignment, the students fared remarkably well in this task, with over half the class able to reconstruct the original sequence with over 77% sequence identity, and with structures whose median ranked in the 91st percentile of all structures of comparable resolution in terms of structure quality. Fourteen percent of the students' structures produced Molprobity steric clash validation scores even better than that of the original structure, suggesting that multiple students achieved an improvement in the overall structure quality compared to the published structure. Students were able to delineate limiting case chemical environments, such as charged interactions or complete solvent exposure, but were less able to distinguish finer details of hydrogen bonding or hydrophobicity. Our results prompt several questions: why were students able to perform so well in their structural validation scores? How were some students able to outperform the 88% sequence identity mark that would constitute a perfect score, given the level of degenerate density or surface residues with poor density? And how can the methodology used by the best students inform the practices of professional X-ray crystallographers? Copyright © 2014 Wiley Periodicals, Inc.
Yang, Xiaoxia; Wang, Jia; Sun, Jun; Liu, Rong
2015-01-01
Protein-nucleic acid interactions are central to various fundamental biological processes. Automated methods capable of reliably identifying DNA- and RNA-binding residues in protein sequence are assuming ever-increasing importance. The majority of current algorithms rely on feature-based prediction, but their accuracy remains to be further improved. Here we propose a sequence-based hybrid algorithm SNBRFinder (Sequence-based Nucleic acid-Binding Residue Finder) by merging a feature predictor SNBRFinderF and a template predictor SNBRFinderT. SNBRFinderF was established using the support vector machine whose inputs include sequence profile and other complementary sequence descriptors, while SNBRFinderT was implemented with the sequence alignment algorithm based on profile hidden Markov models to capture the weakly homologous template of query sequence. Experimental results show that SNBRFinderF was clearly superior to the commonly used sequence profile-based predictor and SNBRFinderT can achieve comparable performance to the structure-based template methods. Leveraging the complementary relationship between these two predictors, SNBRFinder reasonably improved the performance of both DNA- and RNA-binding residue predictions. More importantly, the sequence-based hybrid prediction reached competitive performance relative to our previous structure-based counterpart. Our extensive and stringent comparisons show that SNBRFinder has obvious advantages over the existing sequence-based prediction algorithms. The value of our algorithm is highlighted by establishing an easy-to-use web server that is freely accessible at http://ibi.hzau.edu.cn/SNBRFinder.
Preservation of protein clefts in comparative models.
Piedra, David; Lois, Sergi; de la Cruz, Xavier
2008-01-16
Comparative, or homology, modelling of protein structures is the most widely used prediction method when the target protein has homologues of known structure. Given that the quality of a model may vary greatly, several studies have been devoted to identifying the factors that influence modelling results. These studies usually consider the protein as a whole, and only a few provide a separate discussion of the behaviour of biologically relevant features of the protein. Given the value of the latter for many applications, here we extended previous work by analysing the preservation of native protein clefts in homology models. We chose to examine clefts because of their role in protein function/structure, as they are usually the locus of protein-protein interactions, host the enzymes' active sites, or, in the case of protein domains, can also be the locus of domain-domain interactions that lead to the structure of the whole protein. We studied how the largest cleft of a protein varies in comparative models. To this end, we analysed a set of 53507 homology models that cover the whole sequence identity range, with a special emphasis on medium and low similarities. More precisely, we examined how cleft quality, measured using six complementary parameters related to both global shape and local atomic environment, depends on the sequence identity between target and template proteins. In addition to this general analysis, we also explored the impact of a number of factors on cleft quality, and found that the relationship between quality and sequence identity varies depending on cleft rank amongst the set of protein clefts (when ordered according to size), and number of aligned residues. We have examined cleft quality in homology models at a range of sequence identity levels. Our results provide a detailed view of how quality is affected by distinct parameters and thus may help users of comparative modelling to determine the final quality and applicability of their cleft models. In addition, the large variability in model quality that we observed within each sequence bin, with good models present even at low sequence identities (between 20% and 30%), indicates that properly developed identification methods could be used to recover good cleft models in this sequence range.
Jahandideh, Samad; Srinivasasainagendra, Vinodh; Zhi, Degui
2012-11-07
RNA-protein interaction plays an important role in various cellular processes, such as protein synthesis, gene regulation, post-transcriptional gene regulation, alternative splicing, and infections by RNA viruses. In this study, using the Gene Ontology Annotated (GOA) and Structural Classification of Proteins (SCOP) databases, an automatic procedure was designed to capture structurally solved RNA-binding protein domains in different subclasses. Subsequently, we applied tuned multi-class SVM (TMCSVM), Random Forest (RF), and multi-class ℓ1/ℓq-regularized logistic regression (MCRLR) to analyze and classify RNA-binding protein domains based on a comprehensive set of sequence and structural features. We compared the prediction accuracy of these three state-of-the-art predictor methods. In our results, TMCSVM outperforms the other methods, which suggests its potential as a useful tool for facilitating the multi-class prediction of RNA-binding protein domains. MCRLR, on the other hand, by elucidating the contribution of individual features to the predictive accuracy for RNA-binding protein domain subclasses, helps to provide some biological insights into the roles of sequences and structures in protein-RNA interactions.
Musi, Valeria; Birdsall, Berry; Fernandez-Ballester, Gregorio; Guerrini, Remo; Salvatori, Severo; Serrano, Luis; Pastore, Annalisa
2006-04-01
SH3 domains are small protein modules that are involved in protein-protein interactions in several essential metabolic pathways. The availability of the complete genome and the limited number of clearly identifiable SH3 domains make the yeast Saccharomyces cerevisiae an ideal proteomic-based model system to investigate the structural rules dictating the SH3-mediated protein interactions and to develop new tools to assist these studies. In the present work, we have determined the solution structure of the SH3 domain from Myo3 and modeled by homology that of the highly homologous Myo5, two myosins implicated in actin polymerization. We have then implemented an integrated approach that makes use of experimental and computational methods to characterize their binding properties. While accommodating their targets in the classical groove, the two domains have selectivity in both orientation and sequence specificity of the target peptides. From our study, we propose a consensus sequence that may provide a useful guideline to identify new natural partners and suggest a strategy of more general applicability that may be of use in other structural proteomic studies.
Structure of the N-terminal domain of human thioredoxin-interacting protein.
Polekhina, Galina; Ascher, David Benjamin; Kok, Shie Foong; Beckham, Simone; Wilce, Matthew; Waltham, Mark
2013-03-01
Thioredoxin-interacting protein (TXNIP) is one of the six known α-arrestins and has recently received considerable attention owing to its involvement in redox signalling and metabolism. Various stress stimuli such as high glucose, heat shock, UV, H2O2 and mechanical stress among others robustly induce the expression of TXNIP, resulting in the sequestration and inactivation of thioredoxin, which in turn leads to cellular oxidative stress. While TXNIP is the only α-arrestin known to bind thioredoxin, TXNIP and two other α-arrestins, Arrdc4 and Arrdc3, have been implicated in metabolism. Furthermore, owing to its roles in the pathologies of diabetes and cardiovascular disease, TXNIP is considered to be a promising drug target. Based on their amino-acid sequences, TXNIP and the other α-arrestins are remotely related to β-arrestins. Here, the crystal structure of the N-terminal domain of TXNIP is reported. It provides the first structural information on any of the α-arrestins and reveals that although TXNIP adopts a β-arrestin fold as predicted, it is structurally more similar to Vps26 proteins than to β-arrestins, while sharing below 15% pairwise sequence identity with either.
Guédin, Aurore; Lin, Linda Yingqi; Armane, Samir; Lacroix, Laurent; Mergny, Jean-Louis; Thore, Stéphane; Yatsunyk, Liliya A
2018-06-01
Guanine-rich DNA has the potential to fold into non-canonical G-quadruplex (G4) structures. Analysis of the genome of the social amoeba Dictyostelium discoideum indicates a low number of sequences with G4-forming potential (249-1055). Therefore, D. discoideum is a perfect model organism to investigate the relationship between the presence of G4s and their biological functions. As a first step in this investigation, we crystallized the dGGGGGAGGGGTACAGGGGTACAGGGG sequence from the putative promoter region of two divergent genes in D. discoideum. According to the crystal structure, this sequence folds into a four-quartet intramolecular antiparallel G4 with two lateral loops and one diagonal loop. The G-quadruplex core is further stabilized by a G-C Watson-Crick base pair and an A-T-A triad and displays high thermal stability (Tm > 90°C at 100 mM KCl). Biophysical characterization of the native sequence and loop mutants suggests that the DNA adopts the same structure in solution and in crystalline form, and that loop interactions are important for the G4 stability but not for its folding. Four-tetrad G4 structures are sparse. Thus, our work advances understanding of the structural diversity of G-quadruplexes and yields coordinates for in silico drug screening programs and G4 predictive tools.
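Counting sequences with G4-forming potential, as mentioned in this abstract, is often done with a simple motif scan. The sketch below assumes the widely used pattern of four runs of at least three guanines separated by loops of 1 to 7 nucleotides; the abstract does not say which criterion produced the 249-1055 range, and the names `G4_MOTIF` and `find_g4_candidates` are hypothetical.

```python
import re
from typing import List, Tuple

# Assumed motif: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+ on the given strand only.
G4_MOTIF = re.compile(r"(?:G{3,}[ACGT]{1,7}){3}G{3,}")


def find_g4_candidates(sequence: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, matched_text) for non-overlapping motif hits."""
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_MOTIF.finditer(seq)]


# The crystallized sequence quoted in the abstract (without the leading 'd').
crystallized = "GGGGGAGGGGTACAGGGGTACAGGGG"
hits = find_g4_candidates(crystallized)
```

A genome-wide count would also need to scan the reverse complement (C-rich runs on the given strand), and the result depends strongly on the loop-length limits chosen, which may be one reason a range rather than a single number is reported.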
Structural basis of DNA target recognition by the B3 domain of Arabidopsis epigenome reader VAL1
Sasnauskas, Giedrius; Kauneckaitė, Kotryna; Siksnys, Virginijus
2018-01-01
Abstract Arabidopsis thaliana requires a prolonged period of cold exposure during winter to initiate flowering in a process termed vernalization. Exposure to cold induces epigenetic silencing of the FLOWERING LOCUS C (FLC) gene by Polycomb group (PcG) proteins. A key role in this epigenetic switch is played by transcriptional repressors VAL1 and VAL2, which specifically recognize Sph/RY DNA sequences within FLC via B3 DNA binding domains, and mediate recruitment of PcG silencing machinery. To understand the structural mechanism of site-specific DNA recognition by VAL1, we have solved the crystal structure of VAL1 B3 domain (VAL1-B3) bound to a 12 bp oligoduplex containing the canonical Sph/RY DNA sequence 5′-CATGCA-3′/5′-TGCATG-3′. We find that VAL1-B3 makes H-bonds and van der Waals contacts to DNA bases of all six positions of the canonical Sph/RY element. In agreement with the structure, in vitro DNA binding studies show that VAL1-B3 does not tolerate substitutions at any position of the 5′-TGCATG-3′ sequence. The VAL1-B3–DNA structure presented here provides a structural model for understanding the specificity of plant B3 domains interacting with the Sph/RY and other DNA sequences. PMID:29660015
R3D-2-MSA: the RNA 3D structure-to-multiple sequence alignment server
Cannone, Jamie J.; Sweeney, Blake A.; Petrov, Anton I.; Gutell, Robin R.; Zirbel, Craig L.; Leontis, Neocles
2015-01-01
The RNA 3D Structure-to-Multiple Sequence Alignment Server (R3D-2-MSA) is a new web service that seamlessly links RNA three-dimensional (3D) structures to high-quality RNA multiple sequence alignments (MSAs) from diverse biological sources. In this first release, R3D-2-MSA provides manual and programmatic access to curated, representative ribosomal RNA sequence alignments from bacterial, archaeal, eukaryal and organellar ribosomes, using nucleotide numbers from representative atomic-resolution 3D structures. A web-based front end is available for manual entry and an Application Program Interface for programmatic access. Users can specify up to five ranges of nucleotides and 50 nucleotide positions per range. The R3D-2-MSA server maps these ranges to the appropriate columns of the corresponding MSA and returns the contents of the columns, either for display in a web browser or in JSON format for subsequent programmatic use. The browser output page provides a 3D interactive display of the query, a full list of sequence variants with taxonomic information and a statistical summary of distinct sequence variants found. The output can be filtered and sorted in the browser. Previous user queries can be viewed at any time by resubmitting the output URL, which encodes the search and re-generates the results. The service is freely available with no login requirement at http://rna.bgsu.edu/r3d-2-msa. PMID:26048960
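The query limits stated in this abstract (up to five ranges, up to 50 nucleotide positions per range) can be checked on the client side before a request is built. The sketch below is only a local validator and does not call the server; it assumes nucleotide numbers are plain integers and that each range is an inclusive (start, end) pair. The function name `validate_ranges` and the constants are hypothetical and are not part of the R3D-2-MSA API.

```python
from typing import Sequence, Tuple

MAX_RANGES = 5                 # limit stated in the abstract
MAX_POSITIONS_PER_RANGE = 50   # limit stated in the abstract


def validate_ranges(ranges: Sequence[Tuple[int, int]]) -> bool:
    """Raise ValueError if the ranges exceed the stated limits, else return True.

    Assumption: each range is an inclusive (start, end) pair of integers.
    """
    if not ranges:
        raise ValueError("at least one range is required")
    if len(ranges) > MAX_RANGES:
        raise ValueError("at most %d ranges are allowed" % MAX_RANGES)
    for start, end in ranges:
        if start > end:
            raise ValueError("range start %d exceeds end %d" % (start, end))
        if end - start + 1 > MAX_POSITIONS_PER_RANGE:
            raise ValueError("range %d-%d spans more than %d positions"
                             % (start, end, MAX_POSITIONS_PER_RANGE))
    return True


ok = validate_ranges([(10, 20), (100, 149)])
```

Nucleotide numbering taken from 3D structure files can include insertion codes, so a real client would likely need a richer position type than a bare integer.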
Lu, Changrui; Smith, Angela M; Fuchs, Ryan T; Ding, Fang; Rajashankar, Kanagalaghatta; Henkin, Tina M; Ke, Ailong
2011-01-01
Three distinct classes of S-adenosyl-l-methionine (SAM)-responsive riboswitches have been identified that regulate bacterial gene expression at the levels of transcription attenuation or translation inhibition. The SMK box (SAM-III) translational riboswitch has been identified in the SAM synthetase gene in members of the Lactobacillales. Here we report the 2.2-Å crystal structure of the Enterococcus faecalis SMK box riboswitch. The Y-shaped riboswitch organizes its conserved nucleotides around a three-way junction for SAM recognition. The Shine-Dalgarno sequence, which is sequestered by base-pairing with the anti–Shine-Dalgarno sequence in response to SAM binding, also directly participates in SAM recognition. The riboswitch makes extensive interactions with the adenosine and sulfonium moieties of SAM but does not appear to recognize the tail of the methionine moiety. We captured a structural snapshot of the SMK box riboswitch sampling the near-cognate ligand S-adenosyl-l-homocysteine (SAH) in which SAH was found to adopt an alternative conformation and fails to make several key interactions. PMID:18806797
Ambrosi, Emmanuele; Capaldi, Stefano; Bovi, Michele; Saccomani, Gianmaria; Perduca, Massimiliano; Monaco, Hugo L.
2011-01-01
The SOUL protein is known to induce apoptosis by provoking the mitochondrial permeability transition, and a sequence homologous with the BH3 (Bcl-2 homology 3) domains has recently been identified in the protein, thus making it a potential new member of the BH3-only protein family. In the present study, we provide NMR, SPR (surface plasmon resonance) and crystallographic evidence that a peptide spanning residues 147–172 in SOUL interacts with the anti-apoptotic protein Bcl-xL. We have crystallized SOUL alone and the complex of its BH3 domain peptide with Bcl-xL, and solved their three-dimensional structures. The SOUL monomer is a single domain organized as a distorted β-barrel with eight anti-parallel strands and two α-helices. The BH3 domain extends across 15 residues at the end of the second helix and eight amino acids in the chain following it. There are important structural differences in the BH3 domain in the intact SOUL molecule and the same sequence bound to Bcl-xL. PMID:21639858
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lu, C.; Smith, A.M.; Fuchs, R.T.
2010-01-07
Three distinct classes of S-adenosyl-L-methionine (SAM)-responsive riboswitches have been identified that regulate bacterial gene expression at the levels of transcription attenuation or translation inhibition. The SMK box (SAM-III) translational riboswitch has been identified in the SAM synthetase gene in members of the Lactobacillales. Here we report the 2.2-Å crystal structure of the Enterococcus faecalis SMK box riboswitch. The Y-shaped riboswitch organizes its conserved nucleotides around a three-way junction for SAM recognition. The Shine-Dalgarno sequence, which is sequestered by base-pairing with the anti-Shine-Dalgarno sequence in response to SAM binding, also directly participates in SAM recognition. The riboswitch makes extensive interactions with the adenosine and sulfonium moieties of SAM but does not appear to recognize the tail of the methionine moiety. We captured a structural snapshot of the SMK box riboswitch sampling the near-cognate ligand S-adenosyl-L-homocysteine (SAH) in which SAH was found to adopt an alternative conformation and fails to make several key interactions.
Liu, Chang
2017-01-01
The spatial organization of the genome in the nucleus is critical for many cellular processes. It has been broadly accepted that the packing of chromatin inside the nucleus is not random, but structured at several hierarchical levels. The Hi-C method combines Chromatin Conformation Capture and high-throughput sequencing, which allows interrogating genome-wide chromatin interactions. Depending on the sequencing depth, chromatin packing patterns derived from Hi-C experiments can be viewed on a chromosomal scale or at a local genic level. Here, I describe a protocol of plant in situ Hi-C library preparation, which covers procedures starting from tissue fixation to library amplification.
Fedoreyeva, L I; Kireev, I I; Khavinson, V Kh; Vanyushin, B F
2011-11-01
Marked fluorescence in cytoplasm, nucleus, and nucleolus was observed in HeLa cells after incubation with each of several fluorescein isothiocyanate-labeled peptides (epithalon, Ala-Glu-Asp-Gly; pinealon, Glu-Asp-Arg; testagen, Lys-Glu-Asp-Gly). This means that short biologically active peptides are able to penetrate into an animal cell and its nucleus and, in principle, may interact with various components of cytoplasm and nucleus including DNA and RNA. It was established that the various initial (intact) peptides affect the fluorescence of the 5,6-carboxyfluorescein-labeled deoxyribooligonucleotides and DNA-ethidium bromide complexes differently. The Stern-Volmer constants characterizing the degree of fluorescence quenching of various single- and double-stranded fluorescence-labeled deoxyribooligonucleotides by the short peptides used differed depending on the peptide primary structures. This indicates specific interaction between short biologically active peptides and nucleic acid structures. On binding to them, the peptides discriminate between different nucleotide sequences and recognize even their cytosine methylation status. Judging from the corresponding fluorescence quenching constants, epithalon, pinealon, and bronchogen (Ala-Glu-Asp-Leu) bind preferentially to deoxyribooligonucleotides containing the CNG sequence (CNG sites are targets for cytosine DNA methylation in eukaryotes). Epithalon, testagen, and pinealon seem to bind preferentially to CAG-containing sequences, but bronchogen to CTG-containing sequences. The site-specific interactions of peptides with DNA can epigenetically control the genetic functions of the cell, and they seem to play an important role in regulation of gene activity even at the earliest stages of the origin of life and in evolution.
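The Stern-Volmer constants mentioned in this abstract are conventionally obtained from the relation F0/F = 1 + K_SV[Q], where F0 and F are the fluorescence intensities without and with quencher at concentration [Q]. The sketch below estimates K_SV as the least-squares slope of (F0/F - 1) against [Q] with the line forced through the origin. The abstract does not describe the fitting procedure actually used, the function name `stern_volmer_constant` is hypothetical, and the numbers are synthetic.

```python
from typing import Sequence


def stern_volmer_constant(quencher_conc: Sequence[float],
                          f0: float,
                          f: Sequence[float]) -> float:
    """Least-squares slope of (F0/F - 1) versus [Q], forced through the origin.

    Assumption: simple linear Stern-Volmer quenching, F0/F = 1 + K_SV * [Q].
    """
    if len(quencher_conc) != len(f):
        raise ValueError("need one intensity per quencher concentration")
    y = [f0 / fi - 1.0 for fi in f]
    numerator = sum(q * yi for q, yi in zip(quencher_conc, y))
    denominator = sum(q * q for q in quencher_conc)
    if denominator == 0.0:
        raise ValueError("at least one non-zero concentration is required")
    return numerator / denominator


# Synthetic, noise-free data generated with K_SV = 200 per molar.
concentrations = [0.0, 0.001, 0.002, 0.004]   # mol/L
intensities = [100.0 / (1.0 + 200.0 * q) for q in concentrations]
k_sv = stern_volmer_constant(concentrations, 100.0, intensities)
```

With real titration data, upward or downward curvature of the plot would signal that the simple linear model does not hold, and the single slope would no longer be a meaningful constant.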
Brucet, Marina; Querol-Audí, Jordi; Serra, Maria; Ramirez-Espain, Ximena; Bertlik, Kamila; Ruiz, Lidia; Lloberas, Jorge; Macias, Maria J; Fita, Ignacio; Celada, Antonio
2007-05-11
TREX1 is the most abundant mammalian 3′→5′ DNA exonuclease. It has been described to form part of the SET complex and is responsible for the Aicardi-Goutières syndrome in humans. Here we show that the exonuclease activity is correlated to the binding preferences toward certain DNA sequences. In particular, we have found three motifs that are selected, GAG, ACA, and CTGC. To elucidate how the discrimination occurs, we determined the crystal structures of two murine TREX1 complexes, with a nucleotide product of the exonuclease reaction, and with a single-stranded DNA substrate. Using confocal microscopy, we observed TREX1 both in nuclear and cytoplasmic subcellular compartments. Remarkably, the presence of TREX1 in the nucleus requires the loss of a C-terminal segment, which we named leucine-rich repeat 3. Furthermore, we detected the presence of a conserved proline-rich region on the surface of TREX1. This observation points to interactions with proline-binding domains. The potential interacting motif "PPPVPRPP" does not contain aromatic residues and thus resembles other sequences that select SH3 and/or Group 2 WW domains. By means of nuclear magnetic resonance titration experiments, we show that, indeed, a polyproline peptide derived from the murine TREX1 sequence interacted with the WW2 domain of the elongation transcription factor CA150. Co-immunoprecipitation studies confirmed this interaction with the full-length TREX1 protein, thereby suggesting that TREX1 participates in more functional complexes than previously thought.
Molecular dynamics studies of the 3D structure and planar ligand binding of a quadruplex dimer.
Li, Ming-Hui; Luo, Quan; Xue, Xiang-Gui; Li, Ze-Sheng
2011-03-01
G-rich sequences can fold into a four-stranded structure called a G-quadruplex, and sequences with short loops are able to aggregate to form stable quadruplex multimers. Few studies have characterized the properties of this variety of quadruplex multimers. Using molecular modeling and molecular dynamics simulations, the present study investigated a dimeric G-quadruplex structure formed from a simple sequence of d(GGGTGGGTGGGTGGGT) (G1), and its interactions with a planar ligand of a perylene derivative (Tel03). A series of analytical methods, including free energy calculations and principal components analysis (PCA), was used. The results show that a dimer structure with stacked parallel monomer structures is maintained well during the entire simulation. Tel03 can bind to the dimer efficiently through end stacking, and the binding mode of the ligand stacked with the 3'-terminal thymine base is most favorable. PCA showed that the dominant motions in the free dimer occur on the loop regions, and the presence of the ligand reduces the flexibility of the loops. Our investigation will assist in understanding the geometric structure of stacked G-quadruplex multimers and may be helpful as a platform for rational drug design.
CD94-NKG2A recognition of human leukocyte antigen (HLA)-E bound to an HLA class I leader sequence.
Petrie, Emma J; Clements, Craig S; Lin, Jie; Sullivan, Lucy C; Johnson, Darryl; Huyton, Trevor; Heroux, Annie; Hoare, Hilary L; Beddoe, Travis; Reid, Hugh H; Wilce, Matthew C J; Brooks, Andrew G; Rossjohn, Jamie
2008-03-17
The recognition of human leukocyte antigen (HLA)-E by the heterodimeric CD94-NKG2 natural killer (NK) receptor family is a central innate mechanism by which NK cells monitor the expression of other HLA molecules, yet the structural basis of this highly specific interaction is unclear. Here, we describe the crystal structure of CD94-NKG2A in complex with HLA-E bound to a peptide derived from the leader sequence of HLA-G. The CD94 subunit dominated the interaction with HLA-E, whereas the NKG2A subunit was more peripheral to the interface. Moreover, the invariant CD94 subunit dominated the peptide-mediated contacts, albeit with poor surface and chemical complementarity. This unusual binding mode was consistent with mutagenesis data at the CD94-NKG2A-HLA-E interface. There were few conformational changes in either CD94-NKG2A or HLA-E upon ligation, and such a "lock and key" interaction is typical of innate receptor-ligand interactions. Nevertheless, the structure also provided insight into how this interaction can be modulated by subtle changes in the peptide ligand or by the pairing of CD94 with other members of the NKG2 family. Differences in the docking strategies used by the NKG2D and CD94-NKG2A receptors provided a basis for understanding the promiscuous nature of ligand recognition by NKG2D compared with the fidelity of the CD94-NKG2 receptors.
CD94-NKG2A recognition of human leukocyte antigen (HLA)-E bound to an HLA class I leader sequence
Petrie, Emma J.; Clements, Craig S.; Lin, Jie; Sullivan, Lucy C.; Johnson, Darryl; Huyton, Trevor; Heroux, Annie; Hoare, Hilary L.; Beddoe, Travis; Reid, Hugh H.; Wilce, Matthew C.J.; Brooks, Andrew G.; Rossjohn, Jamie
2008-01-01
The recognition of human leukocyte antigen (HLA)-E by the heterodimeric CD94-NKG2 natural killer (NK) receptor family is a central innate mechanism by which NK cells monitor the expression of other HLA molecules, yet the structural basis of this highly specific interaction is unclear. Here, we describe the crystal structure of CD94-NKG2A in complex with HLA-E bound to a peptide derived from the leader sequence of HLA-G. The CD94 subunit dominated the interaction with HLA-E, whereas the NKG2A subunit was more peripheral to the interface. Moreover, the invariant CD94 subunit dominated the peptide-mediated contacts, albeit with poor surface and chemical complementarity. This unusual binding mode was consistent with mutagenesis data at the CD94-NKG2A–HLA-E interface. There were few conformational changes in either CD94-NKG2A or HLA-E upon ligation, and such a “lock and key” interaction is typical of innate receptor–ligand interactions. Nevertheless, the structure also provided insight into how this interaction can be modulated by subtle changes in the peptide ligand or by the pairing of CD94 with other members of the NKG2 family. Differences in the docking strategies used by the NKG2D and CD94-NKG2A receptors provided a basis for understanding the promiscuous nature of ligand recognition by NKG2D compared with the fidelity of the CD94-NKG2 receptors. PMID:18332182
Dawn of the in vivo RNA structurome and interactome.
Kwok, Chun Kit
2016-10-15
RNA is one of the most fascinating biomolecules in living systems given its structural versatility to fold into elaborate architectures for important biological functions such as gene regulation, catalysis, and information storage. Knowledge of RNA structures and interactions can provide deep insights into their functional roles in vivo. For decades, RNA structural studies have been conducted on a transcript-by-transcript basis. The advent of next-generation sequencing (NGS) has enabled the development of transcriptome-wide structural probing methods to profile the global landscape of RNA structures and interactions, also known as the RNA structurome and interactome, which transformed our understanding of the RNA structure-function relationship on a transcriptomic scale. In this review, molecular tools and NGS methods used for RNA structure probing are presented, novel insights uncovered by RNA structurome and interactome studies are highlighted, and perspectives on current challenges and potential future directions are discussed. A more complete understanding of the RNA structures and interactions in vivo will help illuminate the novel roles of RNA in gene regulation, development, and diseases. © 2016 The Author(s); published by Portland Press Limited on behalf of the Biochemical Society.
Sequence-encoded colloidal origami and microbot assemblies from patchy magnetic cubes
Han, Koohee; Shields, C. Wyatt; Diwakar, Nidhi M.; Bharti, Bhuvnesh; López, Gabriel P.; Velev, Orlin D.
2017-01-01
Colloidal-scale assemblies that reconfigure on demand may serve as the next generation of soft “microbots,” artificial muscles, and other biomimetic devices. This requires the precise arrangement of particles into structures that are preprogrammed to reversibly change shape when actuated by external fields. The design and making of colloidal-scale assemblies with encoded directional particle-particle interactions remain a major challenge. We show how assemblies of metallodielectric patchy microcubes can be engineered to store energy through magnetic polarization and release it on demand by microscale reconfiguration. The dynamic pattern of folding and reconfiguration of the chain-like assemblies can be encoded in the sequence of the cube orientation. The residual polarization of the metallic facets on the microcubes leads to local interactions between the neighboring particles, which is directed by the conformational restrictions of their shape after harvesting energy from external magnetic fields. These structures can also be directionally moved, steered, and maneuvered by global forces from external magnetic fields. We illustrate these capabilities by examples of assemblies of specific sequences that can be actuated, reoriented, and spatially maneuvered to perform microscale operations such as capturing and transporting live cells, acting as prototypes of microbots, micromixers, and other active microstructures. PMID:28798960
Bejerman, Nicolás; Giolitti, Fabián; Trucco, Verónica; de Breuil, Soledad; Dietzgen, Ralf G; Lenardon, Sergio
2016-07-01
Alfalfa dwarf disease, probably caused by synergistic interactions of mixed virus infections, is a major and emergent disease that threatens alfalfa production in Argentina. Deep sequencing of diseased alfalfa plant samples from the central region of Argentina resulted in the identification of a new virus genome resembling enamoviruses in sequence and genome structure. Phylogenetic analysis suggests that it is a new member of the genus Enamovirus, family Luteoviridae. The virus is tentatively named "alfalfa enamovirus 1" (AEV-1). The availability of the AEV-1 genome sequence will make it possible to assess the genetic variability of this virus and to construct an infectious clone to investigate its role in alfalfa dwarfism disease.
Structural Basis for Sialoglycan Binding by the Streptococcus sanguinis SrpA Adhesin
Bensing, Barbara A.; Loukachevitch, Lioudmila V.; McCulloch, Kathryn M.; Yu, Hai; Vann, Kendra R.; Wawrzak, Zdzislaw; Anderson, Spencer; Chen, Xi; Sullam, Paul M.; Iverson, T. M.
2016-01-01
Streptococcus sanguinis is a leading cause of infective endocarditis, a life-threatening infection of the cardiovascular system. An important interaction in the pathogenesis of infective endocarditis is attachment of the organisms to host platelets. S. sanguinis expresses a serine-rich repeat adhesin, SrpA, similar in sequence to platelet-binding adhesins associated with increased virulence in this disease. In this study, we determined the first crystal structure of the putative binding region of SrpA (SrpABR) both unliganded and in complex with a synthetic disaccharide ligand at 1.8 and 2.0 Å resolution, respectively. We identified a conserved Thr-Arg motif that orients the sialic acid moiety and is required for binding to platelet monolayers. Furthermore, we propose that sequence insertions in closely related family members contribute to the modulation of structural and functional properties, including the quaternary structure, the tertiary structure, and the ligand-binding site. PMID:26833566
SANSparallel: interactive homology search against Uniprot.
Somervuo, Panu; Holm, Liisa
2015-07-01
Proteins evolve by mutations and natural selection. The network of sequence similarities is a rich source for mining homologous relationships that inform on protein structure and function. There are many servers available to browse the network of homology relationships but one has to wait up to a minute for results. The SANSparallel webserver provides protein sequence database searches with immediate response and professional alignment visualization by third-party software. The output is a list, pairwise alignment or stacked alignment of sequence-similar proteins from Uniprot, UniRef90/50, Swissprot or Protein Data Bank. The stacked alignments are viewed in Jalview or as sequence logos. The database search uses the suffix array neighborhood search (SANS) method, which has been re-implemented as a client-server, improved and parallelized. The method is extremely fast and as sensitive as BLAST above 50% sequence identity. Benchmarks show that the method is highly competitive compared to previously published fast database search programs: UBLAST, DIAMOND, LAST, LAMBDA, RAPSEARCH2 and BLAT. The web server can be accessed interactively or programmatically at http://ekhidna2.biocenter.helsinki.fi/cgi-bin/sans/sans.cgi. It can be used to make protein functional annotation pipelines more efficient, and it is useful in interactive exploration of the detailed evidence supporting the annotation of particular proteins of interest. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
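The general idea behind a suffix array neighborhood search can be illustrated with a toy example: all database suffixes are sorted, each query suffix is located in that sorted order, and the database sequences that own the neighboring suffixes receive votes. The sketch below is a strong simplification and not the SANSparallel implementation; weighting each vote by the length of the shared prefix is an assumption made here for illustration, storing whole suffix strings is practical only for tiny inputs, and all names are hypothetical.

```python
from bisect import bisect_left
from collections import Counter
from typing import Dict, List, Tuple


def build_suffix_index(database: Dict[str, str]) -> List[Tuple[str, str]]:
    """Sorted list of (suffix, owner_name) for every suffix in the database."""
    suffixes = [(seq[i:], name)
                for name, seq in database.items()
                for i in range(len(seq))]
    suffixes.sort()
    return suffixes


def shared_prefix_length(a: str, b: str) -> int:
    """Number of leading characters that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def neighborhood_votes(query: str,
                       suffixes: List[Tuple[str, str]],
                       window: int = 2) -> Counter:
    """Score database sequences by suffixes near each query suffix."""
    keys = [s for s, _ in suffixes]
    votes: Counter = Counter()
    for i in range(len(query)):
        q_suffix = query[i:]
        pos = bisect_left(keys, q_suffix)
        lo, hi = max(0, pos - window), min(len(suffixes), pos + window)
        for suffix, name in suffixes[lo:hi]:
            # Assumed weighting: length of the prefix shared with the query suffix.
            votes[name] += shared_prefix_length(q_suffix, suffix)
    return votes


# Made-up toy database and query.
toy_db = {"P1": "MKTAYIAKQR", "P2": "GGSWLLPHDE"}
index = build_suffix_index(toy_db)
votes = neighborhood_votes("MKTAYIAKQK", index)
best = votes.most_common(1)[0][0]
```

Because similar sequences place their suffixes next to each other in sorted order, only a small neighborhood has to be inspected per query suffix, which suggests why a method of this kind can respond quickly.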
Specificity determinants for the abscisic acid response element.
Sarkar, Aditya Kumar; Lahiri, Ansuman
2013-01-01
Abscisic acid (ABA) response elements (ABREs) are a group of cis-acting DNA elements that have been identified from promoter analysis of many ABA-regulated genes in plants. We are interested in understanding the mechanism of binding specificity between ABREs and a class of bZIP transcription factors known as ABRE binding factors (ABFs). In this work, we have modeled the homodimeric structure of the bZIP domain of ABRE binding factor 1 from Arabidopsis thaliana (AtABF1) and studied its interaction with ACGT core motif-containing ABRE sequences. We have also examined the variation in the stability of the protein-DNA complex upon mutating ABRE sequences using the protein design algorithm FoldX. The high throughput free energy calculations successfully predicted the ability of ABF1 to bind to alternative core motifs like GCGT or AAGT and also rationalized the role of the flanking sequences in determining the specificity of the protein-DNA interaction.
Sequence periodicity in nucleosomal DNA and intrinsic curvature.
Nair, T Murlidharan
2010-05-17
Most eukaryotic DNA contained in the nucleus is packaged by wrapping DNA around histone octamers. Histones are ubiquitous and bind most regions of chromosomal DNA. In order to achieve smooth wrapping of the DNA around the histone octamer, the DNA duplex should be able to deform and should possess intrinsic curvature. The deformability of DNA is a result of the non-parallelness of base pair stacks. The stacking interaction between base pairs is sequence dependent. The higher the stacking energy the more rigid the DNA helix, thus it is natural to expect that sequences that are involved in wrapping around the histone octamer should be unstacked and possess intrinsic curvature. Intrinsic curvature has been shown to be dictated by the periodic recurrence of certain dinucleotides. Several genome-wide studies directed towards mapping of nucleosome positions have revealed periodicity associated with certain stretches of sequences. In the current study, these sequences have been analyzed with a view to understand their sequence-dependent structures. Higher order DNA structures and the distribution of molecular bend loci associated with 146 base nucleosome core DNA sequence from C. elegans and chicken have been analyzed using the theoretical model for DNA curvature. The curvature dispersion calculated by cyclically permuting the sequences revealed that the molecular bend loci were delocalized throughout the nucleosome core region and had varying degrees of intrinsic curvature. The higher order structures associated with nucleosomes of C. elegans and chicken calculated from the sequences revealed heterogeneity with respect to the deviation of the DNA axis. The results point to the possibility of context dependent curvature of varying degrees to be associated with nucleosomal DNA.
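The periodic recurrence of dinucleotides mentioned above can be examined with a simple spacing histogram. The sketch below is not the curvature model used in the study. The choice of dinucleotides, the `max_lag` cutoff, and the synthetic sequence are assumptions for illustration only.

```python
from collections import Counter

def dinucleotide_positions(seq, dinucs=("AA", "TT", "TA")):
    """Return the 0-based start positions of the chosen dinucleotides."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] in dinucs]

def spacing_histogram(positions, max_lag=20):
    """Count how often two occurrences are separated by each lag up to max_lag."""
    counts = Counter()
    for idx, first in enumerate(positions):
        for second in positions[idx + 1:]:
            lag = second - first
            if lag > max_lag:
                break  # positions are sorted, so later ones are even farther away
            counts[lag] += 1
    return counts

# Synthetic sequence with an AA dinucleotide every 10 bases (illustrative only).
seq = ("AA" + "GCGCGCGC") * 6
pos = dinucleotide_positions(seq, dinucs=("AA",))
hist = spacing_histogram(pos, max_lag=20)
```

A peak in the histogram at a lag near the helical repeat would indicate the kind of periodicity described; here the synthetic sequence produces peaks only at multiples of 10.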
Intramolecular interactions regulate SAP97 binding to GKAP
Wu, Hongju; Reissner, Carsten; Kuhlendahl, Sven; Coblentz, Blake; Reuver, Susanne; Kindler, Stefan; Gundelfinger, Eckart D.; Garner, Craig C.
2000-01-01
Membrane-associated guanylate kinase homologs (MAGUKs) are multidomain proteins found to be central organizers of cellular junctions. In this study, we examined the molecular mechanisms that regulate the interaction of the MAGUK SAP97 with its GUK domain binding partner GKAP (GUK-associated protein). The GKAP–GUK interaction is regulated by a series of intramolecular interactions. Specifically, the association of the Src homology 3 (SH3) domain and sequences situated between the SH3 and GUK domains with the GUK domain was found to interfere with GKAP binding. In contrast, N-terminal sequences that precede the first PDZ domain in SAP97 facilitated GKAP binding via their association with the SH3 domain. Utilizing crystal structure data available for PDZ, SH3 and GUK domains, molecular models of SAP97 were generated. These models revealed that SAP97 can exist in a compact U-shaped conformation in which the N-terminal domain folds back and interacts with the SH3 and GUK domains. These models support the biochemical data and provide new insights into how intramolecular interactions may regulate the association of SAP97 with its binding partners. PMID:11060025
DOE Office of Scientific and Technical Information (OSTI.GOV)
Helander, Sara; Montecchio, Meri; Lemak, Alexander
Highlights: • We describe the structure of a novel fold in FKBP25 and HectD. • The new fold is named the Basic Tilted Helix Bundle (BTHB) domain. • A conserved basic surface patch is presented, suggesting a functional role. - Abstract: In this paper, we describe the structure of an N-terminal domain motif in nuclear-localized FKBP25(1–73), a member of the FKBP family, together with the structure of a sequence-related subdomain of the E3 ubiquitin ligase HectD1 that we show belongs to the same fold. This motif adopts a compact 5-helix bundle which we name the Basic Tilted Helix Bundle (BTHB) domain. A positively charged surface patch, structurally centered around the tilted helix H4, is present in both FKBP25 and HectD1 and is conserved in both proteins, suggesting a conserved functional role. We provide detailed comparative analysis of the structures of the two proteins and their sequence similarities, and analysis of the interaction of the proposed FKBP25 binding protein YY1. We suggest that the basic motif in BTHB is involved in the observed DNA binding of FKBP25, and that the function of this domain can be affected by regulatory YY1 binding and/or interactions with adjacent domains.
Ran, Kun; Yang, Hongqiang; Sun, Xiaoli; Li, Qiang; Jiang, Qianqian; Zhang, Weiwei; Shen, Wei
2014-05-01
Vacuolar processing enzymes (VPEs) have received considerable attention recently, as they exhibit caspase-1-like cleavage activity and regulate the process of programmed cell death (PCD). However, knowledge about their detailed characteristics and structures is relatively limited. In this study, a gamma vacuolar processing enzyme gene, MhVPEγ, has been isolated from the leaves of Malus hupehensis (Ramp) Rehd. var pinyiensis Jiang. The protein sequence encoded by MhVPEγ comprises 494 amino acids, with a signal peptide and a transmembrane helix at the N-terminus, a peptidase_C13 domain, and a vacuolar sorting signal at the C-terminus. A genomic walking approach was then used to isolate its upstream sequence. Computational analysis identified several promoter motifs with putative MeJA-, ABA-, and light-induced characteristics, as well as elements typically found in promoters, such as the TATA-box and CAAT-box. The MhVPEγ transcript level was enhanced during wounding treatment, and the WUN-motif, one of the cis-acting regulatory elements in the upstream sequence, may regulate its expression. In silico-constructed 3D models revealed that MhCPYL successively interacts with MhVPEγ in the manner of the "Induced Fit-Lock and Key" model, providing molecular conformation evidence that CPY is a direct substrate of VPEγ. This study is a first step toward understanding the molecular mechanism of VPEγ and CPYL interactions.
GeneBee-net: Internet-based server for analyzing biopolymers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brodsky, L.I.; Ivanov, V.V.; Nikolaev, V.K.
This work describes a network server for searching databanks of biopolymer structures and performing other biocomputing procedures; it is available via direct Internet connection. Basic server procedures are dedicated to homology (similarity) search of sequence and 3D structure of proteins. The homologies found could be used to build multiple alignments, predict protein and RNA secondary structure, and construct phylogenetic trees. In addition to traditional methods of sequence similarity search, the authors propose "non-matrix" (correlational) search. An analogous approach is used to identify regions of similar tertiary structure of proteins. Algorithm concepts and usage examples are presented for new methods. Service logic is based upon interaction of a client program and server procedures. The client program allows the compilation of queries and the processing of results of an analysis.
NASA Astrophysics Data System (ADS)
Gallet, F.; Bolmont, E.; Mathis, S.; Charbonnel, C.; Amard, L.
2017-08-01
Context. Star-planet interactions must be taken into account in stellar models to understand the dynamical evolution of close-in planets. The dependence of the tidal interactions on the structural and rotational evolution of the star is of particular importance and should be correctly treated. Aims: We quantify how tidal dissipation in the convective envelope of rotating low-mass stars evolves from the pre-main sequence up to the red-giant branch depending on the initial stellar mass. We investigate the consequences of this evolution on planetary orbital evolution. Methods: We couple the tidal dissipation formalism previously described to the stellar evolution code STAREVOL and apply this coupling to rotating stars with masses between 0.3 and 1.4 M⊙. As a first step, this formalism assumes a simplified bi-layer stellar structure with corresponding averaged densities for the radiative core and the convective envelope. We use a frequency-averaged treatment of the dissipation of tidal inertial waves in the convection zone (but neglect the dissipation of tidal gravity waves in the radiation zone). In addition, we generalize a recent work by following the orbital evolution of close-in planets using the new tidal dissipation predictions for advanced phases of stellar evolution. Results: On the pre-main sequence the evolution of tidal dissipation is controlled by the evolution of the internal structure of the contracting star. On the main sequence it is strongly driven by the variation of surface rotation that is impacted by magnetized stellar winds braking. The main effect of taking into account the rotational evolution of the stars is to lower the tidal dissipation strength by about four orders of magnitude on the main sequence, compared to a normalized dissipation rate that only takes into account structural changes. Conclusions: The evolution of the dissipation strongly depends on the evolution of the internal structure and rotation of the star. 
From the pre-main sequence up to the tip of the red-giant branch, it varies by several orders of magnitude, with strong consequences for the orbital evolution of close-in massive planets. These effects are the strongest during the pre-main sequence, implying that the planets are mainly sensitive to the star's early history.
Jagadish, Nirmala; Rana, Ritu; Selvi, Ramasamy; Mishra, Deepshikha; Garg, Manoj; Yadav, Shikha; Herr, John C.; Okumura, Katsuzumi; Hasegawa, Akiko; Koyama, Koji; Suri, Anil
2005-01-01
We report a novel SPAG9 (sperm-associated antigen 9) protein having structural homology with JNK (c-Jun N-terminal kinase)-interacting protein 3. SPAG9, a single copy gene mapped to the human chromosome 17q21.33 syntenic with location of mouse chromosome 11, was earlier shown to be expressed exclusively in testis [Shankar, Mohapatra and Suri (1998) Biochem. Biophys. Res. Commun. 243, 561–565]. The SPAG9 amino acid sequence analysis revealed identity with the JNK-binding domain and predicted coiled-coil, leucine zipper and transmembrane domains. The secondary structure analysis predicted an α-helical structure for SPAG9 that was confirmed by CD spectra. Microsequencing of higher-order aggregates of recombinant SPAG9 by tandem MS confirmed the amino acid sequence and mono atomic mass of 83.9 kDa. Transient expression of SPAG9 and its deletion mutants revealed that both leucine zipper with extended coiled-coil domains and transmembrane domain of SPAG9 were essential for dimerization and proper localization. Studies of MAPK (mitogen-activated protein kinase) interactions demonstrated that SPAG9 interacted with higher binding affinity to JNK3 and JNK2 compared with JNK1. No interaction was observed with p38α or extracellular-signal-regulated kinase pathways. Polyclonal antibodies raised against recombinant SPAG9 recognized native protein in human sperm extracts and localized specifically on the acrosomal compartment of intact human spermatozoa. Acrosome-reacted spermatozoa demonstrated SPAG9 immunofluorescence, indicating its retention on the equatorial segment after the acrosome reaction. Further, anti-SPAG9 antibodies inhibited the binding of human spermatozoa to intact human oocytes as well as to matched hemizona. This is the first report of sperm-associated JNK-binding protein that may have a role in spermatozoa–egg interaction. PMID:15693750
Generation of animation sequences of three dimensional models
NASA Technical Reports Server (NTRS)
Poi, Sharon (Inventor); Bell, Brad N. (Inventor)
1990-01-01
The invention is directed toward a method and apparatus for generating an animated sequence through the movement of three-dimensional graphical models. A plurality of pre-defined graphical models are stored and manipulated in response to interactive commands or by means of a pre-defined command file. The models may be combined as part of a hierarchical structure to represent physical systems without need to create a separate model which represents the combined system. System motion is simulated through the introduction of translation, rotation and scaling parameters upon a model within the system. The motion is then transmitted down through the system hierarchy of models in accordance with hierarchical definitions and joint movement limitations. The present invention also calls for a method of editing hierarchical structure in response to interactive commands or a command file such that a model may be included, deleted, copied or moved within multiple system model hierarchies. The present invention also calls for the definition of multiple viewpoints or cameras which may exist as part of a system hierarchy or as an independent camera. The simulated movement of the models and systems is graphically displayed on a monitor and a frame is recorded by means of a video controller. Multiple movement and hierarchy manipulations are then recorded as a sequence of frames which may be played back as an animation sequence on a video cassette recorder.
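The propagation of motion down a model hierarchy can be sketched as composing each model's local transform with that of its parent. The sketch below is a simplified 2D illustration using homogeneous 3x3 matrices, not the patented method. The class and function names are hypothetical, and joint movement limitations are omitted.

```python
import math

def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transform(tx=0.0, ty=0.0, angle_deg=0.0, scale=1.0):
    """Build a matrix that scales and rotates about the origin, then translates."""
    c = math.cos(math.radians(angle_deg)) * scale
    s = math.sin(math.radians(angle_deg)) * scale
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]

class Model:
    """A node in a model hierarchy with a transform relative to its parent."""

    def __init__(self, name, local=None):
        self.name = name
        self.local = local or transform()
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child

    def world_transforms(self, parent=None):
        """Return {name: world matrix} for this model and all descendants."""
        world = self.local if parent is None else matmul(parent, self.local)
        result = {self.name: world}
        for child in self.children:
            result.update(child.world_transforms(world))
        return result

def origin(matrix):
    """Return the world position of a model's local origin."""
    return (matrix[0][2], matrix[1][2])

arm = Model("arm", transform(tx=1.0))
hand = arm.add(Model("hand", transform(tx=2.0)))
# Rotating the parent moves the child without editing the child's own transform.
arm.local = transform(tx=1.0, angle_deg=90.0)
hand_x, hand_y = origin(arm.world_transforms()["hand"])
```

Because the child stores only its transform relative to the parent, combined systems can be represented without a separate model for the assembled whole.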
Genetic variability and evolutionary dynamics of viruses of the family Closteroviridae
Rubio, Luis; Guerri, José; Moreno, Pedro
2013-01-01
RNA viruses have a great potential for genetic variation, rapid evolution and adaptation. Characterization of the genetic variation of viral populations provides relevant information on the processes involved in virus evolution and epidemiology and it is crucial for designing reliable diagnostic tools and developing efficient and durable disease control strategies. Here we performed an updated analysis of sequences available in GenBank and reviewed present knowledge on the genetic variability and evolutionary processes of viruses of the family Closteroviridae. Several factors have shaped the genetic structure and diversity of closteroviruses. (1) A strong negative selection seems to be responsible for the high genetic stability in space and time for some viruses. (2) Long distance migration, probably by human transport of infected propagative plant material, has resulted in genetically similar virus isolates being found in distant geographical regions. (3) Recombination between divergent sequence variants has generated new genotypes and plays an important role in the evolution of some viruses of the family Closteroviridae. (4) Interaction between virus strains or between different viruses in mixed infections may alter accumulation of certain strains. (5) Host change or virus transmission by insect vectors induced changes in the viral population structure due to positive selection of sequence variants with higher fitness for host-virus or vector-virus interaction (adaptation) or by genetic drift due to random selection of sequence variants during the population bottleneck associated with the transmission process. PMID:23805130
Kumar, Avishek; Butler, Brandon M; Kumar, Sudhir; Ozkan, S Banu
2015-12-01
Sequencing technologies are revealing many new non-synonymous single nucleotide variants (nsSNVs) in each personal exome. To assess their functional impacts, comparative genomics is frequently employed to predict if they are benign or not. However, evolutionary analysis alone is insufficient, because it misdiagnoses many disease-associated nsSNVs, such as those at positions involved in protein interfaces, and because evolutionary predictions do not provide mechanistic insights into functional change or loss. Structural analyses can aid in overcoming both of these problems by incorporating conformational dynamics and allostery in nsSNV diagnosis. Finally, protein-protein interaction networks using systems-level methodologies shed light onto disease etiology and pathogenesis. Bridging these network approaches with structurally resolved protein interactions and dynamics will advance genomic medicine. Copyright © 2015 Elsevier Ltd. All rights reserved.
Muth, Thilo; García-Martín, Juan A; Rausell, Antonio; Juan, David; Valencia, Alfonso; Pazos, Florencio
2012-02-15
We have implemented in a single package all the features required for extracting, visualizing and manipulating fully conserved positions as well as those with a family-dependent conservation pattern in multiple sequence alignments. The program allows, among other things, to run different methods for extracting these positions, combine the results and visualize them in protein 3D structures and sequence spaces. JDet is a multiplatform application written in Java. It is freely available, including the source code, at http://csbg.cnb.csic.es/JDet. The package includes two of our recently developed programs for detecting functional positions in protein alignments (Xdet and S3Det), and support for other methods can be added as plug-ins. A help file and a guided tutorial for JDet are also available.
Bovine and human insulin adsorption at lipid monolayers: a comparison
NASA Astrophysics Data System (ADS)
Mauri, Sergio; Pandey, Ravindra; Rzeznicka, Izabela; Lu, Hao; Bonn, Mischa; Weidner, Tobias
2015-07-01
Insulin is a widely used peptide in protein research and it is utilised as a model peptide to understand the mechanics of fibril formation, which is believed to be the cause of diseases such as Alzheimer's disease and Creutzfeldt-Jakob disease. Insulin has been used as a model system due to its biomedical relevance, small size and relatively simple tertiary structure. The adsorption of insulin on a variety of surfaces has become the focus of numerous studies lately. These works have helped in elucidating the consequence of surface/protein hydrophilic/hydrophobic interaction in terms of protein refolding and aggregation. Unfortunately, such model surfaces differ significantly from physiological surfaces. Here we spectroscopically investigate the adsorption of insulin at lipid monolayers, to further our understanding of the interaction of insulin with biological surfaces. In particular we study the effect of minor mutations of insulin's primary amino acid sequence on its interaction with 1,2-Dipalmitoyl-sn-glycero-3-phosphoglycerol (DPPG) model lipid layers. We probe the structure of bovine and human insulin at the lipid/water interface using sum frequency generation spectroscopy (SFG). The SFG experiments are complemented with XPS analysis of Langmuir-Schaefer deposited lipid/insulin films. We find that bovine and human insulin, even though very similar in sequence, show a substantially different behavior when interacting with lipid films.
Replication Cycle and Molecular Biology of the West Nile Virus
Brinton, Margo A.
2013-01-01
West Nile virus (WNV) is a member of the genus Flavivirus in the family Flaviviridae. Flaviviruses replicate in the cytoplasm of infected cells and modify the host cell environment. Although much has been learned about virion structure and virion-endosomal membrane fusion, the cell receptor(s) used have not been definitively identified and little is known about the early stages of the virus replication cycle. Members of the genus Flavivirus differ from members of the two other genera of the family by the lack of a genomic internal ribosomal entry sequence and the creation of invaginations in the ER membrane rather than double-membrane vesicles that are used as the sites of exponential genome synthesis. The WNV genome 3' and 5' sequences that form the long distance RNA-RNA interaction required for minus strand initiation have been identified and contact sites on the 5' RNA stem loop for NS5 have been mapped. Structures obtained for many of the viral proteins have provided information relevant to their functions. Viral nonstructural protein interactions are complex and some may occur only in infected cells. Although interactions between many cellular proteins and virus components have been identified, the functions of most of these interactions have not been delineated. PMID:24378320
DOE Office of Scientific and Technical Information (OSTI.GOV)
O'Donnell, T.J.; Olson, A.J.
1981-08-01
GRAMPS, a graphics language interpreter, has been developed in FORTRAN 77 to be used in conjunction with an interactive vector display list processor (Evans and Sutherland Multi-Picture-System). Several of the features of the language make it very useful and convenient for real-time scene construction, manipulation and animation. The GRAMPS language syntax allows natural interaction with scene elements as well as easy, interactive assignment of graphics input devices. GRAMPS facilitates the creation, manipulation and copying of complex nested picture structures. The language has a powerful macro feature that enables new graphics commands to be developed and incorporated interactively. Animation may be achieved in GRAMPS by two different, yet mutually compatible means. Picture structures may contain framed data, which consist of a sequence of fixed objects. These structures may be displayed sequentially to give a traditional frame animation effect. In addition, transformation information on picture structures may be saved at any time in the form of new macro commands that will transform these structures from one saved state to another in a specified number of steps, yielding an interpolated transformation animation effect. An overview of the GRAMPS command structure is given and several examples of application of the language to molecular modeling and animation are presented.
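The interpolated transformation animation described above, moving a structure from one saved state to another in a specified number of steps, can be sketched as linear interpolation between two parameter sets. This is written in Python rather than GRAMPS syntax. The parameter names and the choice of linear interpolation are assumptions for illustration.

```python
def interpolate_states(start, end, steps):
    """Yield `steps` states moving linearly from `start` to `end` (end included)."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    for i in range(1, steps + 1):
        t = i / steps
        yield {key: start[key] + t * (end[key] - start[key]) for key in start}

# Hypothetical saved transformation states for one picture structure.
saved_a = {"rot_x": 0.0, "rot_y": 0.0, "scale": 1.0}
saved_b = {"rot_x": 90.0, "rot_y": 30.0, "scale": 2.0}
frames = list(interpolate_states(saved_a, saved_b, steps=3))
```

Each yielded dictionary would correspond to one displayed frame, with the final frame equal to the second saved state.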
Pillai, Harikrishna; Yadav, Brijesh Singh; Chaturvedi, Navaneet; Jan, Arif Tasleem; Gupta, Girish Kumar; Baig, Mohammad Hassan; Bhure, Sanjeev Kumar
2017-01-01
Regucalcin (RGN), a calcium regulating protein having anti-prolific, antiapoptotic functions, plays an important part in the biosynthesis of ascorbic acid. It is a highly conserved protein that has been reported from many tissue types of various vertebrate species. Because RGN regulates enzyme activities through reaction with sulfhydryl groups (-SH) and calcium, a structural-level study, believed to offer a better understanding of the binding properties and regulatory mechanisms of RGN, was performed. Using a sample from the testis of Bubalus bubalis, the amplified regucalcin (RGN) gene was characterized by digestion with different restriction endonucleases (RE). Alongside, cDNA was cloned into pPICZαC vector and transformed in DH5α host for custom sequencing. To get a better insight into its structural characteristics, the three dimensional (3D) structure of the protein sequence was generated using an in silico molecular modelling approach. The full trajectory analysis of the structure was achieved by Molecular Dynamics (MD), which explains the stability, flexibility and robustness of the protein during a 50 ns simulation. Molecular docking against 1,5-anhydrosorbitol was performed for functional characterization of RGN. Preliminary screening of amplified products on agarose gel showed the expected size of ~893 bp of PCR product corresponding to RGN. Following sequencing, BLASTp search of the target sequence revealed that it shares 91% similarity score with human senescence marker protein-30 (pdb id: 3G4E). Molecular docking of 1,5-anhydrosorbitol reveals information regarding important binding site residues of RGN. 1,5-anhydrosorbitol was found to interact with a binding free energy of -6.01 kcal/mol. RMSD calculation of subunits A, B and D-F might be responsible for functional and conserved regions of modeled protein. 
The three-dimensional structure of RGN was generated, and its interactions with 1,5-anhydrosorbitol demonstrate the role of key binding residues. Until now, no structural details were available for buffalo RGN proteins, hence this study will broaden the horizon towards understanding the structural and functional aspects of different proteins in cattle. Copyright© Bentham Science Publishers.
Zhao, Junhua; Wang, Guliang; Del Mundo, Imee M; McKinney, Jennifer A; Lu, Xiuli; Bacolla, Albino; Boulware, Stephen B; Zhang, Changsheng; Zhang, Haihua; Ren, Pengyu; Freudenreich, Catherine H; Vasquez, Karen M
2018-01-30
Sequences with the capacity to adopt alternative DNA structures have been implicated in cancer etiology; however, the mechanisms are unclear. For example, H-DNA-forming sequences within oncogenes have been shown to stimulate genetic instability in mammals. Here, we report that H-DNA-forming sequences are enriched at translocation breakpoints in human cancer genomes, further implicating them in cancer etiology. H-DNA-induced mutations were suppressed in human cells deficient in the nucleotide excision repair nucleases, ERCC1-XPF and XPG, but were stimulated in cells deficient in FEN1, a replication-related endonuclease. Further, we found that these nucleases cleaved H-DNA conformations, and the interactions of modeled H-DNA with ERCC1-XPF, XPG, and FEN1 proteins were explored at the sub-molecular level. The results suggest mechanisms of genetic instability triggered by H-DNA through distinct structure-specific, cleavage-based replication-independent and replication-dependent pathways, providing critical evidence for a role of the DNA structure itself in the etiology of cancer and other human diseases. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
Chen, Zhen; Zhao, Pei; Li, Fuyi; Leier, André; Marquez-Lago, Tatiana T; Wang, Yanan; Webb, Geoffrey I; Smith, A Ian; Daly, Roger J; Chou, Kuo-Chen; Song, Jiangning
2018-03-08
Structural and physiochemical descriptors extracted from sequence data have been widely used to represent sequences and predict structural, functional, expression and interaction profiles of proteins and peptides as well as DNAs/RNAs. Here, we present iFeature, a versatile Python-based toolkit for generating various numerical feature representation schemes for both protein and peptide sequences. iFeature is capable of calculating and extracting a comprehensive spectrum of 18 major sequence encoding schemes that encompass 53 different types of feature descriptors. It also allows users to extract specific amino acid properties from the AAindex database. Furthermore, iFeature integrates 12 different types of commonly used feature clustering, selection, and dimensionality reduction algorithms, greatly facilitating training, analysis, and benchmarking of machine-learning models. The functionality of iFeature is made freely available via an online web server and a stand-alone toolkit. Availability: http://iFeature.erc.monash.edu/; https://github.com/Superzchen/iFeature/. Contact: jiangning.song@monash.edu; kcchou@gordonlifescience.org; roger.daly@monash.edu. Supplementary data are available at Bioinformatics online.
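As an illustration of what a numerical feature representation of a protein sequence looks like, the sketch below computes amino acid composition, one of the simplest sequence encodings. It is a standalone function written for this example and does not use or reproduce the iFeature API. The function name and the handling of non-standard residues are assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, alphabetical by letter

def amino_acid_composition(sequence):
    """Return a 20-element list with the fraction of each standard amino acid."""
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return [residues.count(aa) / len(residues) for aa in AMINO_ACIDS]

# Toy peptide, not a real protein.
vec = amino_acid_composition("ACDAAC")
```

Fixed-length vectors like this one can be passed to the clustering, selection, or dimensionality reduction steps mentioned in the abstract, whatever the original sequence length.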
A multimodal parallel architecture: A cognitive framework for multimodal interactions.
Cohn, Neil
2016-01-01
Human communication is naturally multimodal, and substantial focus has examined the semantic correspondences in speech-gesture and text-image relationships. However, visual narratives, like those in comics, provide an interesting challenge to multimodal communication because the words and/or images can guide the overall meaning, and both modalities can appear in complicated "grammatical" sequences: sentences use a syntactic structure and sequential images use a narrative structure. These dual structures create complexity beyond those typically addressed by theories of multimodality where only a single form uses combinatorial structure, and also pose challenges for models of the linguistic system that focus on single modalities. This paper outlines a broad theoretical framework for multimodal interactions by expanding on Jackendoff's (2002) parallel architecture for language. Multimodal interactions are characterized in terms of their component cognitive structures: whether a particular modality (verbal, bodily, visual) is present, whether it uses a grammatical structure (syntax, narrative), and whether it "dominates" the semantics of the overall expression. Altogether, this approach integrates multimodal interactions into an existing framework of language and cognition, and characterizes interactions between varying complexity in the verbal, bodily, and graphic domains. The resulting theoretical model presents an expanded consideration of the boundaries of the "linguistic" system and its involvement in multimodal interactions, with a framework that can benefit research on corpus analyses, experimentation, and the educational benefits of multimodality. Copyright © 2015.
Mukherjee, Sanchita; Kailasam, Senthilkumar; Bansal, Manju; Bhattacharyya, Dhananjay
2014-01-01
Double helical structures of DNA and RNA are mostly determined by base pair stacking interactions, which give them the base sequence-directed features, such as small roll values for the purine-pyrimidine steps. Earlier attempts to characterize stacking interactions were mostly restricted to calculations on fiber diffraction geometries or on structures optimized by ab initio calculations, which lack the variation in geometry needed to comment on the unusually large roll values observed in the AU/AU base pair step in crystal structures of RNA double helices. We have generated a stacking energy hyperspace by modeling geometries with variations along the important degrees of freedom, roll and slide, which were chosen via statistical analysis as maximally sequence dependent. Corresponding energy contours were constructed by several quantum chemical methods including dispersion corrections. This analysis established the most suitable methods for stacked base pair systems despite the limitation that the number of atoms in a base pair step imposes on the use of a very high level of theory. All the methods predict a negative roll value and near-zero slide to be most favorable for the purine-pyrimidine steps, in agreement with Calladine's steric clash based rule. Successive base pairs in RNA are always linked by a sugar-phosphate backbone with C3'-endo sugars, and this demands a C1'-C1' distance of about 5.4 Å along the chains. Adding an energy penalty term for deviation of the C1'-C1' distance from the mean value to the recent DFT-D functionals, specifically ωB97X-D, appears to predict a reliable energy contour for the AU/AU step. Such a distance-based penalty also improves the energy contours for the other purine-pyrimidine sequences. Biopolymers 101: 107-120, 2014. Copyright © 2013 Wiley Periodicals, Inc.
Doolan, Kyle M; Colby, David W
2015-01-30
Prion diseases are caused by a structural rearrangement of the cellular prion protein, PrP(C), into a disease-associated conformation, PrP(Sc), which may be distinguished from one another using conformation-specific antibodies. We used mutational scanning by cell-surface display to screen 1341 PrP single point mutants for attenuated interaction with four anti-PrP antibodies, including several with conformational specificity. Single-molecule real-time gene sequencing was used to quantify enrichment of mutants, returning 26,000 high-quality full-length reads for each screened population on average. Relative enrichment of mutants correlated to the magnitude of the change in binding affinity. Mutations that diminished binding of the antibody ICSM18 represented the core of contact residues in the published crystal structure of its complex. A similarly located binding site was identified for D18, comprising discontinuous residues in helix 1 of PrP, brought into close proximity to one another only when the alpha helix is intact. The specificity of these antibodies for the normal form of PrP likely arises from loss of this conformational feature after conversion to the disease-associated form. Intriguingly, 6H4 binding was found to depend on interaction with the same residues, among others, suggesting that its ability to recognize both forms of PrP depends on a structural rearrangement of the antigen. The application of mutational scanning and deep sequencing provides residue-level resolution of positions in the protein-protein interaction interface that are critical for binding, as well as a quantitative measure of the impact of mutations on binding affinity. Copyright © 2014 Elsevier Ltd. All rights reserved.
Ferrocene-oligonucleotide conjugates for electrochemical probing of DNA.
Ihara, T; Maruo, Y; Takenaka, S; Takagi, M
1996-01-01
Toward the development of a universal, sensitive and convenient method of DNA (or RNA) detection, electrochemically active oligonucleotides were prepared by covalent linkage of a ferrocenyl group to the 5'-aminohexyl-terminated synthetic oligonucleotides. Using these electrochemically active probes, we have been able to demonstrate the detection of DNA and RNA at femtomole levels by HPLC equipped with an ordinary electrochemical detector (ECD) [Takenaka,S., Uto,Y., Kondo,H., Ihara,T. and Takagi,M. (1994) Anal. Biochem., 218, 436-443]. Thermodynamic and electrochemical studies of the interaction between the probes and the targets are presented here. The thermodynamic data revealed that the conjugation stabilizes the triple-helix complexes by 2-3 kcal mol⁻¹ (an increase of 1-2 orders of magnitude in binding constant) at 298 K, which corresponds to the effect of elongation by several additional base triplets. The main cause of this thermodynamic stabilization by the conjugation is likely to be the overall conformational change of the whole structure of the conjugate rather than an additional local interaction. The redox potential of the probe was independent of the target structure, whether single- or double-stranded. However, the potential is slightly dependent (with a 10-30 mV negative shift on complexation) on the extra sequence in the target, probably because each sequence can contact or interact with the ferrocenyl group in a slightly different way. This small potential shift itself, however, does not cause any inconvenience in practical applications in detecting the probes by using ECD. These results lead to the conclusion that the redox-active probes are very useful for the microanalysis of nucleic acids due to the stability of the complexes, high detection sensitivity and wide applicability to the target structures (DNA and RNA; single- and double strands) and the sequences. PMID:8932383
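The stated correspondence between a stabilization of 2-3 kcal/mol and a 1-2 order increase in binding constant at 298 K follows from the standard relation ΔG = -RT ln K. The short sketch below reproduces that arithmetic; the function name is hypothetical, and only the textbook relation and the gas constant are assumed.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def fold_change_in_binding_constant(stabilization_kcal_per_mol, temperature_k=298.0):
    """Factor by which K grows when the complex is stabilized by the given free energy.

    From dG = -RT ln K, a stabilization of x kcal/mol multiplies K by exp(x / RT).
    """
    return math.exp(stabilization_kcal_per_mol / (R_KCAL * temperature_k))


for stabilization in (2.0, 3.0):
    fold = fold_change_in_binding_constant(stabilization)
    print(f"{stabilization:.1f} kcal/mol -> {math.log10(fold):.2f} orders of magnitude in K")
```

At 298 K one order of magnitude in K costs about 1.36 kcal/mol, so 2 and 3 kcal/mol come out at roughly 1.5 and 2.2 orders, consistent with the range quoted in the abstract.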
LookSeq: a browser-based viewer for deep sequencing data.
Manske, Heinrich Magnus; Kwiatkowski, Dominic P
2009-11-01
Sequencing a genome to great depth can be highly informative about heterogeneity within an individual or a population. Here we address the problem of how to visualize the multiple layers of information contained in deep sequencing data. We propose an interactive AJAX-based web viewer for browsing large data sets of aligned sequence reads. By enabling seamless browsing and fast zooming, the LookSeq program assists the user to assimilate information at different levels of resolution, from an overview of a genomic region to fine details such as heterogeneity within the sample. A specific problem, particularly if the sample is heterogeneous, is how to depict information about structural variation. LookSeq provides a simple graphical representation of paired sequence reads that is more revealing about potential insertions and deletions than are conventional methods.
CRF: detection of CRISPR arrays using random forest.
Wang, Kai; Liang, Chun
2017-01-01
CRISPRs (clustered regularly interspaced short palindromic repeats) are particular repeat sequences found in a wide range of bacterial and archaeal genomes. Several tools are available for detecting CRISPR arrays in the genomes of both domains. Here we developed a new web-based CRISPR detection tool named CRF (CRISPR Finder by Random Forest). Unlike other CRISPR detection tools, CRF uses a random forest classifier to filter out invalid CRISPR arrays from all putative candidates, thereby enhancing detection accuracy. In particular, triplet elements that combine both sequence content and structure information were extracted from CRISPR repeats for classifier training. The classifier achieved high accuracy and sensitivity. Moreover, CRF offers a highly interactive web interface for robust data visualization that is not available among other CRISPR detection tools. After detection, the query sequence, CRISPR array architecture, and the sequences and secondary structures of CRISPR repeats and spacers can be visualized for visual examination and validation. CRF is freely available at http://bioinfolab.miamioh.edu/crf/home.php.
Yang, Qin; Gilmartin, Gregory M.; Doublié, Sylvie
2010-01-01
Human Cleavage Factor Im (CFIm) is an essential component of the pre-mRNA 3′ processing complex that functions in the regulation of poly(A) site selection through the recognition of UGUA sequences upstream of the poly(A) site. Although the highly conserved 25 kDa subunit (CFIm25) of the CFIm complex possesses a characteristic α/β/α Nudix fold, CFIm25 has no detectable hydrolase activity. Here we report the crystal structures of the human CFIm25 homodimer in complex with UGUAAA and UUGUAU RNA sequences. CFIm25 is the first Nudix protein to be reported to bind RNA in a sequence-specific manner. The UGUA sequence contributes to binding specificity through an intramolecular G:A Watson–Crick/sugar-edge base interaction, an unusual pairing previously found to be involved in the binding specificity of the SAM-III riboswitch. The structures, together with mutational data, suggest a novel mechanism for the simultaneous sequence-specific recognition of two UGUA elements within the pre-mRNA. Furthermore, the mutually exclusive binding of RNA and the signaling molecule Ap4A (diadenosine tetraphosphate) by CFIm25 suggests a potential role for small molecules in the regulation of mRNA 3′ processing. PMID:20479262
Yang, Qin; Gilmartin, Gregory M; Doublié, Sylvie
2010-06-01
Human Cleavage Factor Im (CFI(m)) is an essential component of the pre-mRNA 3' processing complex that functions in the regulation of poly(A) site selection through the recognition of UGUA sequences upstream of the poly(A) site. Although the highly conserved 25 kDa subunit (CFI(m)25) of the CFI(m) complex possesses a characteristic alpha/beta/alpha Nudix fold, CFI(m)25 has no detectable hydrolase activity. Here we report the crystal structures of the human CFI(m)25 homodimer in complex with UGUAAA and UUGUAU RNA sequences. CFI(m)25 is the first Nudix protein to be reported to bind RNA in a sequence-specific manner. The UGUA sequence contributes to binding specificity through an intramolecular G:A Watson-Crick/sugar-edge base interaction, an unusual pairing previously found to be involved in the binding specificity of the SAM-III riboswitch. The structures, together with mutational data, suggest a novel mechanism for the simultaneous sequence-specific recognition of two UGUA elements within the pre-mRNA. Furthermore, the mutually exclusive binding of RNA and the signaling molecule Ap(4)A (diadenosine tetraphosphate) by CFI(m)25 suggests a potential role for small molecules in the regulation of mRNA 3' processing.
SA-Mot: a web server for the identification of motifs of interest extracted from protein loops
Regad, Leslie; Saladin, Adrien; Maupetit, Julien; Geneix, Colette; Camproux, Anne-Claude
2011-01-01
The detection of functional motifs is an important step for the determination of protein functions. We present here a new web server SA-Mot (Structural Alphabet Motif) for the extraction and location of structural motifs of interest from protein loops. Contrary to other methods, SA-Mot does not focus only on functional motifs, but it extracts recurrent and conserved structural motifs involved in structural redundancy of loops. SA-Mot uses the structural word notion to extract all structural motifs from uni-dimensional sequences corresponding to loop structures. Then, SA-Mot provides a description of these structural motifs using statistics computed in the loop data set and in SCOP superfamily, sequence and structural parameters. SA-Mot results correspond to an interactive table listing all structural motifs extracted from a target structure and their associated descriptors. Using this information, the users can easily locate loop regions that are important for the protein folding and function. The SA-Mot web server is available at http://sa-mot.mti.univ-paris-diderot.fr. PMID:21665924
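The idea of extracting recurrent "structural words" from one-dimensional structural-alphabet strings can be sketched as a plain k-mer count over loop sequences. This is only an illustration of the general technique, not SA-Mot's actual procedure or statistics: the word length of 4, the recurrence threshold, the letter strings, and the function name `recurrent_words` are all assumptions.

```python
from collections import Counter


def recurrent_words(loop_strings, word_length=4, min_loops=2):
    """Return words that occur in at least `min_loops` different loops.

    loop_strings: structural-alphabet encodings of loops, one string per loop
                  (hypothetical input format).
    The returned dict maps each recurrent word to the number of loops containing it.
    """
    loops_containing = Counter()
    for loop in loop_strings:
        # Use a set so a word repeated inside one loop is counted once for that loop.
        words = {loop[i:i + word_length] for i in range(len(loop) - word_length + 1)}
        loops_containing.update(words)
    return {w: n for w, n in loops_containing.items() if n >= min_loops}


# Two made-up loops encoded in a made-up structural alphabet
recurrent = recurrent_words(["ABCDAB", "BCDA"])
```

A real analysis would additionally test whether a word is over-represented relative to a background, which this sketch does not attempt.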
SA-Mot: a web server for the identification of motifs of interest extracted from protein loops.
Regad, Leslie; Saladin, Adrien; Maupetit, Julien; Geneix, Colette; Camproux, Anne-Claude
2011-07-01
The detection of functional motifs is an important step for the determination of protein functions. We present here a new web server SA-Mot (Structural Alphabet Motif) for the extraction and location of structural motifs of interest from protein loops. Contrary to other methods, SA-Mot does not focus only on functional motifs, but it extracts recurrent and conserved structural motifs involved in structural redundancy of loops. SA-Mot uses the structural word notion to extract all structural motifs from uni-dimensional sequences corresponding to loop structures. Then, SA-Mot provides a description of these structural motifs using statistics computed in the loop data set and in SCOP superfamily, sequence and structural parameters. SA-Mot results correspond to an interactive table listing all structural motifs extracted from a target structure and their associated descriptors. Using this information, the users can easily locate loop regions that are important for the protein folding and function. The SA-Mot web server is available at http://sa-mot.mti.univ-paris-diderot.fr.
Investigation of interaction between Pax-5 isoforms and thioredoxin using de novo modelling methods.
Cuperlovic-Culf, Miroslava; Robichaud, Gilles A; Nardini, Michel; Ouellette, Rodney J
2003-01-01
Pax-5 transcription factor plays a crucial role in B-cell development, activation and differentiation. In murine B-cells four different isoforms of Pax-5 have been identified, and their role in the regulation of the activity of the wild-type protein was revealed although still not fully understood. Using theoretical methods, we investigated the properties of one region of the Pax-5e and Pax-5d isoforms (named UDE domain) and we present a possible theoretical model for the interaction of this domain with thioredoxin that has been previously postulated based on the experimental results. Domain UDE (MW 4.8 kDa) is characterised by an extremely high ratio of positively charged residues (8) in comparison to negatively charged amino acids (3), as well as unusually large concentrations of prolines (11.6%) and cysteines (4.7%). This is indicative of its role in protein-protein interaction. The experimental 3D structure for either UDE domain or for any analogous sequence is not yet available, and therefore we resorted to various bioinformatics methods in order to predict the secondary and 3D structure from the primary sequence of UDE. Physicochemical properties of the predicted UDE structure gave more indication about possibilities for UDE-thioredoxin binding. In addition, UDE domain was shown to have both sequence and structure analogous to a segment of NAD-reducing hydrogenase HOXS a subunit which is believed to interact with thioredoxin. These studies showed that the UDE domain in Pax-5d and Pax-5e represents an ideal binding site for thioredoxin and we developed a model of UDE-TRX complex with two disulphide bridges. The active site of thioredoxin remained exposed after binding to UDE in this model and therefore binding of thioredoxin to Pax-5d could explain the unexpectedly high resistance of this isoform to oxidation. The complex between thioredoxin and Pax-5e can be a method for transportation of thioredoxin into the nucleus and also into the vicinity of Pax-5a, explaining the observed activator role of Pax-5e.
Mirza, Zeenat; Pillai, Vikram Gopalakrishna; Zhong, Wei-Zhu
2014-03-10
Alzheimer's disease (AD) is one of the most significant social and health burdens of the present century. Plaques formed by extracellular deposits of amyloid β (Aβ) are the prime player of AD's neuropathology. Studies have implicated the varied role of phospholipase A2 (PLA2) in brain where it contributes to neuronal growth and inflammatory response. Overall contour and chemical nature of the substrate-binding channel in the low molecular weight PLA2s are similar. This study involves the reductionist fragment-based approach to understand the structure adopted by N-terminal fragment of Alzheimer's Aβ peptide in its complex with PLA2. In the current communication, we report the structure determined by X-ray crystallography of N-terminal sequence Asp-Ala-Glu-Phe-Arg-His-Asp-Ser (DAEFRHDS) of Aβ-peptide with a Group I PLA2 purified from venom of Andaman Cobra sub-species Naja naja sagittifera at 2.0 Å resolution (Protein Data Bank (PDB) Code: 3JQ5). This is probably the first attempt to structurally establish interaction between amyloid-β peptide fragment and hydrophobic substrate binding site of PLA2 involving H bond and van der Waals interactions. We speculate that higher affinity between Aβ and PLA2 has the therapeutic potential of decreasing the Aβ-Aβ interaction, thereby reducing the amyloid aggregation and plaque formation in AD.
Purification and sequence characterization of chondroitin sulfate and dermatan sulfate from fishes.
Lin, Na; Mo, Xiaoli; Yang, Yang; Zhang, Hong
2017-04-01
Chondroitin sulfate (CS) and dermatan sulfate (DS) were extracted and purified from skins or bones of salmon (Salmo salar), snakehead (Channa argus), monkfish (Lophius litulon) and skipjack tuna (Katsuwonus pelamis). Size, structural sequences and sulfate groups of oligosaccharides in the purified CS and DS could be characterized and identified using high performance liquid chromatography (HPLC) combined with Orbitrap mass spectrometry. CS and DS chain structure varies depending on origin, but motif structure appears consistent. Structures of CS and DS oligosaccharides with different sizes and sulfate groups were compared between fishes and other animals, and results showed that some minor differences in special structures could be identified by hydrophilic interaction chromatography-liquid chromatography-Fourier transform-mass/mass spectrometry (HILIC-LC-FT-MS/MS). For example, data showed that salmon and skipjack CS had a higher percentage content of highly sulfated oligosaccharides than porcine CS. In addition, structural information on CS and DS of different origins was analyzed by principal component analysis (PCA), and results showed that CS and DS samples could be differentiated according to their molecular conformation and oligosaccharide fragment information. Understanding the structure of CS and DS derived from different origins may lead to the production of CS or DS with unique disaccharide or oligosaccharide sequence composition and biological functions.
The identification and functional annotation of RNA structures conserved in vertebrates.
Seemann, Stefan E; Mirza, Aashiq H; Hansen, Claus; Bang-Berthelsen, Claus H; Garde, Christian; Christensen-Dalsgaard, Mikkel; Torarinsson, Elfar; Yao, Zizhen; Workman, Christopher T; Pociot, Flemming; Nielsen, Henrik; Tommerup, Niels; Ruzzo, Walter L; Gorodkin, Jan
2017-08-01
Structured elements of RNA molecules are essential in, e.g., RNA stabilization, localization, and protein interaction, and their conservation across species suggests a common functional role. We computationally screened vertebrate genomes for conserved RNA structures (CRSs), leveraging structure-based, rather than sequence-based, alignments. After careful correction for sequence identity and GC content, we predict ∼516,000 human genomic regions containing CRSs. We find that a substantial fraction of human-mouse CRS regions (1) colocalize consistently with binding sites of the same RNA binding proteins (RBPs) or (2) are transcribed in corresponding tissues. Additionally, a CaptureSeq experiment revealed expression of many of our CRS regions in human fetal brain, including 662 novel ones. For selected human and mouse candidate pairs, qRT-PCR and in vitro RNA structure probing supported both shared expression and shared structure despite low abundance and low sequence identity. About 30,000 CRS regions are located near coding or long noncoding RNA genes or within enhancers. Structured (CRS overlapping) enhancer RNAs and extended 3' ends have significantly increased expression levels over their nonstructured counterparts. Our findings of transcribed uncharacterized regulatory regions that contain CRSs support their RNA-mediated functionality. © 2017 Seemann et al.; Published by Cold Spring Harbor Laboratory Press.
Introduction to bioinformatics.
Can, Tolga
2014-01-01
Bioinformatics is an interdisciplinary field mainly involving molecular biology and genetics, computer science, mathematics, and statistics. Data intensive, large-scale biological problems are addressed from a computational point of view. The most common problems are modeling biological processes at the molecular level and making inferences from collected data. A bioinformatics solution usually involves the following steps: collect statistics from biological data; build a computational model; solve a computational modeling problem; test and evaluate a computational algorithm. This chapter gives a brief introduction to bioinformatics by first providing an introduction to biological terminology and then discussing some classical bioinformatics problems organized by the types of data sources. Sequence analysis is the analysis of DNA and protein sequences for clues regarding function and includes subproblems such as identification of homologs, multiple sequence alignment, searching sequence patterns, and evolutionary analyses. Protein structures are three-dimensional data and the associated problems are structure prediction (secondary and tertiary), analysis of protein structures for clues regarding function, and structural alignment. Gene expression data is usually represented as matrices and analysis of microarray data mostly involves statistical analysis, classification, and clustering approaches. Biological networks such as gene regulatory networks, metabolic pathways, and protein-protein interaction networks are usually modeled as graphs and graph theoretic approaches are used to solve associated problems such as construction and analysis of large-scale networks.
Deciphering the Hidden Informational Content of Protein Sequences
Liu, Ming; Hua, Qing-xin; Hu, Shi-Quan; Jia, Wenhua; Yang, Yanwu; Saith, Sunil Evan; Whittaker, Jonathan; Arvan, Peter; Weiss, Michael A.
2010-01-01
Protein sequences encode both structure and foldability. Whereas the interrelationship of sequence and structure has been extensively investigated, the origins of folding efficiency are enigmatic. We demonstrate that the folding of proinsulin requires a flexible N-terminal hydrophobic residue that is dispensable for the structure, activity, and stability of the mature hormone. This residue (PheB1 in placental mammals) is variably positioned within crystal structures and exhibits 1H NMR motional narrowing in solution. Despite such flexibility, its deletion impaired insulin chain combination and led in cell culture to formation of non-native disulfide isomers with impaired secretion of the variant proinsulin. Cellular folding and secretion were maintained by hydrophobic substitutions at B1 but markedly perturbed by polar or charged side chains. We propose that, during folding, a hydrophobic side chain at B1 anchors transient long-range interactions by a flexible N-terminal arm (residues B1–B8) to mediate kinetic or thermodynamic partitioning among disulfide intermediates. Evidence for the overall contribution of the arm to folding was obtained by alanine scanning mutagenesis. Together, our findings demonstrate that efficient folding of proinsulin requires N-terminal sequences that are dispensable in the native state. Such arm-dependent folding can be abrogated by mutations associated with β-cell dysfunction and neonatal diabetes mellitus. PMID:20663888
Ashkenazy, Haim; Abadi, Shiran; Martz, Eric; Chay, Ofer; Mayrose, Itay; Pupko, Tal; Ben-Tal, Nir
2016-01-01
The degree of evolutionary conservation of an amino acid in a protein or a nucleic acid in DNA/RNA reflects a balance between its natural tendency to mutate and the overall need to retain the structural integrity and function of the macromolecule. The ConSurf web server (http://consurf.tau.ac.il), established over 15 years ago, analyses the evolutionary pattern of the amino/nucleic acids of the macromolecule to reveal regions that are important for structure and/or function. Starting from a query sequence or structure, the server automatically collects homologues, infers their multiple sequence alignment and reconstructs a phylogenetic tree that reflects their evolutionary relations. These data are then used, within a probabilistic framework, to estimate the evolutionary rates of each sequence position. Here we introduce several new features into ConSurf, including automatic selection of the best evolutionary model used to infer the rates, the ability to homology-model query proteins, prediction of the secondary structure of query RNA molecules from sequence, the ability to view the biological assembly of a query (in addition to the single chain), mapping of the conservation grades onto 2D RNA models and an advanced view of the phylogenetic tree that enables interactively rerunning ConSurf with the taxa of a sub-tree. PMID:27166375
Solution structure of the chick TGFbeta type II receptor ligand-binding domain.
Marlow, Michael S; Brown, Christopher B; Barnett, Joey V; Krezel, Andrzej M
2003-02-28
The transforming growth factor beta (TGFbeta) signaling pathway influences cell proliferation, immune responses, and extracellular matrix reorganization throughout the vertebrate life cycle. The signaling cascade is initiated by ligand-binding to its cognate type II receptor. Here, we present the structure of the chick type II TGFbeta receptor determined by solution NMR methods. Distance and angular constraints were derived from 15N and 13C edited NMR experiments. Torsion angle dynamics was used throughout the structure calculations and refinement. The 20 final structures were energy minimized using the generalized Born solvent model. For these 20 structures, the average backbone root-mean-square distance from the average structure is below 0.6 Å. The overall fold of this 109-residue domain is conserved within the superfamily of these receptors. Chick receptors fully recognize and respond to human TGFbeta ligands despite only 60% identity at the sequence level. Comparison with the human TGFbeta receptor determined by X-ray crystallography reveals different conformations in several regions. Sequence divergence and crystal packing interactions under low pH conditions are likely causes. This solution structure identifies regions where structural changes, however subtle, may occur upon ligand-binding. We also identified two very well conserved molecular surfaces. One was found to bind ligand in the crystallized human TGFbeta3:TGFbeta type II receptor complex. The other, newly identified area can be the interaction site with type I and/or type III receptors of the TGFbeta signaling complex.
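The quantity reported for the 20-model ensemble, the average backbone root-mean-square distance from the average structure, can be sketched as follows. This is a minimal illustration that assumes the models are already superimposed and that each model is a list of (x, y, z) backbone coordinates in the same atom order; the superposition protocol used in the study is not given in the abstract, and the function names and toy coordinates are hypothetical.

```python
import math


def mean_structure(models):
    """Coordinate-wise average of pre-superimposed models with identical atom order."""
    n_models = len(models)
    n_atoms = len(models[0])
    return [
        tuple(sum(model[i][k] for model in models) / n_models for k in range(3))
        for i in range(n_atoms)
    ]


def rmsd(coords_a, coords_b):
    """Root-mean-square distance between two equally long coordinate lists."""
    squared = sum(
        (p[k] - q[k]) ** 2 for p, q in zip(coords_a, coords_b) for k in range(3)
    )
    return math.sqrt(squared / len(coords_a))


def average_rmsd_to_mean(models):
    """Average, over all models, of each model's RMSD from the mean structure."""
    mean = mean_structure(models)
    return sum(rmsd(model, mean) for model in models) / len(models)


# Toy ensemble: two two-atom models displaced by 2 units along z
toy_models = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)],
]
toy_value = average_rmsd_to_mean(toy_models)
```

In the toy ensemble every atom lies exactly 1 unit from its mean position, so the result is 1.0; for a real NMR ensemble the models would first be fitted onto a common frame.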
Entering the Next Dimension: Plant Genomes in 3D.
Sotelo-Silveira, Mariana; Chávez Montes, Ricardo A; Sotelo-Silveira, Jose R; Marsch-Martínez, Nayelli; de Folter, Stefan
2018-04-24
After linear sequences of genomes and epigenomic landscape data, the 3D organization of chromatin in the nucleus is the next level to be explored. Different organisms present a general hierarchical organization, with chromosome territories at the top. Chromatin interaction maps, obtained by chromosome conformation capture (3C)-based methodologies, for eight plant species reveal commonalities, but also differences, among them and with animals. The smallest structures, found in high-resolution maps of the Arabidopsis genome, are single genes. Epigenetic marks (histone modification and DNA methylation), transcriptional activity, and chromatin interaction appear to be correlated, and whether structure is the cause or consequence of the function of interacting regions is being actively investigated. Copyright © 2018 Elsevier Ltd. All rights reserved.
Selection of homeotic proteins for binding to a human DNA replication origin.
de Stanchina, E; Gabellini, D; Norio, P; Giacca, M; Peverali, F A; Riva, S; Falaschi, A; Biamonti, G
2000-06-09
We have previously shown that a cell cycle-dependent nucleoprotein complex assembles in vivo on a 74 bp sequence within the human DNA replication origin associated to the Lamin B2 gene. Here, we report the identification, using a one-hybrid screen in yeast, of three proteins interacting with the 74 bp sequence. All of them, namely HOXA13, HOXC10 and HOXC13, are orthologues of the Abdominal-B gene of Drosophila melanogaster and are members of the homeogene family of developmental regulators. We describe the complete open reading frame sequence of HOXC10 and HOXC13 along with the structure of the HoxC13 gene. The specificity of binding of these two proteins to the Lamin B2 origin is confirmed by both band-shift and in vitro footprinting assays. In addition, the ability of HOXC10 and HOXC13 to increase the activity of a promoter containing the 74 bp sequence, as assayed by CAT-assay experiments, demonstrates a direct interaction of these homeoproteins with the origin sequence in mammalian cells. We also show that HOXC10 expression is cell-type-dependent and positively correlates with cell proliferation. Copyright 2000 Academic Press.
Molecular trophic markers in marine food webs and their potential use for coral ecology.
Leal, Miguel Costa; Ferrier-Pagès, Christine
2016-10-01
Notable advances in ecological genomics have been driven by high-throughput sequencing technology and taxonomically broad sequence repositories that allow us to accurately assess species interactions with great taxonomic resolution. The use of DNA as a marker for ingested food is particularly relevant to address predator-prey interactions and disentangle complex marine food webs. DNA-based methods benefit from reductionist molecular approaches to address ecosystem scale processes, such as community structure and energy flow across trophic levels, among others. Here we review how molecular trophic markers have been used to better understand trophic interactions in the marine environment and their advantages and limitations. We focus on the animal groups where research has been concentrated, such as marine mammals, seabirds, fishes, pelagic invertebrates and benthic invertebrates, and use case studies to illustrate how DNA-based methods unraveled food-web interactions. The potential of molecular trophic markers for disentangling the complex trophic ecology of corals is also discussed. Copyright © 2016 Elsevier B.V. All rights reserved.
Laine, Elodie; Carbone, Alessandra
2015-01-01
Protein-protein interactions (PPIs) are essential to all biological processes and they represent increasingly important therapeutic targets. Here, we present a new method for accurately predicting protein-protein interfaces, understanding their properties, origins and binding to multiple partners. Contrary to machine learning approaches, our method combines in a rational and very straightforward way three sequence- and structure-based descriptors of protein residues: evolutionary conservation, physico-chemical properties and local geometry. The implemented strategy yields very precise predictions for a wide range of protein-protein interfaces and discriminates them from small-molecule binding sites. Beyond its predictive power, the approach makes it possible to dissect interaction surfaces and unravel their complexity. We show how the analysis of the predicted patches can foster new strategies for PPI modulation and interaction surface redesign. The approach is implemented in JET2, an automated tool based on the Joint Evolutionary Trees (JET) method for sequence-based protein interface prediction. JET2 is freely available at www.lcqb.upmc.fr/JET2. PMID:26690684
Alignment of RNA molecules: Binding energy and statistical properties of random sequences
DOE Office of Scientific and Technical Information (OSTI.GOV)
Valba, O. V., E-mail: valbaolga@gmail.com; Nechaev, S. K., E-mail: sergei.nechaev@gmail.com; Tamm, M. V., E-mail: thumm.m@gmail.com
2012-02-15
A new statistical approach to the problem of pairwise alignment of RNA sequences is proposed. The problem is analyzed for a pair of interacting polymers forming RNA-like hierarchical cloverleaf structures. An alignment is characterized by the numbers of matches, mismatches, and gaps. A weight function is assigned to each alignment; this function is interpreted as a free energy taking into account both direct monomer-monomer interactions and a combinatorial contribution due to formation of various cloverleaf secondary structures. The binding free energy is determined for a pair of RNA molecules. Statistical properties are discussed, including fluctuations of the binding energy between a pair of RNA molecules and loop length distribution in a complex. Based on an analysis of the free energy per nucleotide pair complexes of random RNAs as a function of the number of nucleotide types c, a hypothesis is put forward about the exclusivity of the alphabet c = 4 used by nature.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Buffalo, Cosmo Z.; Bahn-Suh, Adrian J.; Hirakis, Sophia P.
No vaccine exists against group A Streptococcus (GAS), a leading cause of worldwide morbidity and mortality. A severe hurdle is the hypervariability of its major antigen, the M protein, with >200 different M types known. Neutralizing antibodies typically recognize M protein hypervariable regions (HVRs) and confer narrow protection. In stark contrast, human C4b-binding protein (C4BP), which is recruited to the GAS surface to block phagocytic killing, interacts with a remarkably large number of M protein HVRs (apparently ~90%). Such broad recognition is rare, and we discovered a unique mechanism for this through the structure determination of four sequence-diverse M proteins in complexes with C4BP. The structures revealed a uniform and tolerant ‘reading head’ in C4BP, which detected conserved sequence patterns hidden within hypervariability. Our results open up possibilities for rational therapies that target the M–C4BP interaction, and also inform a path towards vaccine design.
Kalloush, Rawan M; Vivet-Boudou, Valérie; Ali, Lizna M; Mustafa, Farah; Marquet, Roland; Rizvi, Tahir A
2016-06-01
MPMV has great potential for development as a vector for gene therapy. In this respect, precisely defining the sequences and structural motifs that are important for dimerization and packaging of its genomic RNA (gRNA) are of utmost importance. A distinguishing feature of the MPMV gRNA packaging signal is two phylogenetically conserved long-range interactions (LRIs) between U5 and gag complementary sequences, LRI-I and LRI-II. To test their biological significance in the MPMV life cycle, we introduced mutations into these structural motifs and tested their effects on MPMV gRNA packaging and propagation. Furthermore, we probed the structure of key mutants using SHAPE (selective 2'hydroxyl acylation analyzed by primer extension). Disrupting base-pairing of the LRIs affected gRNA packaging and propagation, demonstrating their significance to the MPMV life cycle. A double mutant restoring a heterologous LRI-I was fully functional, whereas a similar LRI-II mutant failed to restore gRNA packaging and propagation. These results demonstrate that while LRI-I acts at the structural level, maintaining base-pairing is not sufficient for LRI-II function. In addition, in vitro RNA dimerization assays indicated that the loss of RNA packaging in LRI mutants could not be attributed to the defects in dimerization. Our findings suggest that U5-gag LRIs play an important architectural role in maintaining the structure of the 5' region of the MPMV gRNA, expanding the crucial role of LRIs to the nonlentiviral group of retroviruses. © 2016 Kalloush et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Capturing RNA Folding Free Energy with Coarse-Grained Molecular Dynamics Simulations
Bell, David R.; Cheng, Sara Y.; Salazar, Heber; Ren, Pengyu
2017-01-01
We introduce a coarse-grained RNA model for molecular dynamics simulations, RACER (RnA CoarsE-gRained). RACER achieves accurate native structure prediction for a number of RNAs (average RMSD of 2.93 Å) and the sequence-specific variation of free energy is in excellent agreement with experimentally measured stabilities (R² = 0.93). Using RACER, we identified hydrogen-bonding (or base pairing), base stacking, and electrostatic interactions as essential driving forces for RNA folding. Also, we found that separating pairing vs. stacking interactions allowed RACER to distinguish folded vs. unfolded states. In RACER, base pairing and stacking interactions each provide an approximate stability of 3–4 kcal/mol for an A-form helix. RACER was developed based on PDB structural statistics and experimental thermodynamic data. In contrast with previous work, RACER implements a novel effective vdW potential energy function, which led us to re-parameterize hydrogen bond and electrostatic potential energy functions. Further, RACER is validated and optimized using a simulated annealing protocol to generate potential energy vs. RMSD landscapes. Finally, RACER is tested using extensive equilibrium pulling simulations (0.86 ms total) on eleven RNA sequences (hairpins and duplexes). PMID:28393861
Single helically folded aromatic oligoamides that mimic the charge surface of double-stranded B-DNA
NASA Astrophysics Data System (ADS)
Ziach, Krzysztof; Chollet, Céline; Parissi, Vincent; Prabhakaran, Panchami; Marchivie, Mathieu; Corvaglia, Valentina; Bose, Partha Pratim; Laxmi-Reddy, Katta; Godde, Frédéric; Schmitter, Jean-Marie; Chaignepain, Stéphane; Pourquier, Philippe; Huc, Ivan
2018-05-01
Numerous essential biomolecular processes require the recognition of DNA surface features by proteins. Molecules mimicking these features could potentially act as decoys and interfere with pharmacologically or therapeutically relevant protein-DNA interactions. Although naturally occurring DNA-mimicking proteins have been described, synthetic tunable molecules that mimic the charge surface of double-stranded DNA are not known. Here, we report the design, synthesis and structural characterization of aromatic oligoamides that fold into single helical conformations and display a double helical array of negatively charged residues in positions that match the phosphate moieties in B-DNA. These molecules were able to inhibit several enzymes possessing non-sequence-selective DNA-binding properties, including topoisomerase 1 and HIV-1 integrase, presumably through specific foldamer-protein interactions, whereas sequence-selective enzymes were not inhibited. Such modular and synthetically accessible DNA mimics provide a versatile platform to design novel inhibitors of protein-DNA interactions.
Effects of Replication and Transcription on DNA Structure-Related Genetic Instability.
Wang, Guliang; Vasquez, Karen M
2017-01-05
Many repetitive sequences in the human genome can adopt conformations that differ from the canonical B-DNA double helix (i.e., non-B DNA), and can impact important biological processes such as DNA replication, transcription, recombination, telomere maintenance, viral integration, transposome activation, DNA damage and repair. Thus, non-B DNA-forming sequences have been implicated in genetic instability and disease development. In this article, we discuss the interactions of non-B DNA with the replication and/or transcription machinery, particularly in disease states (e.g., tumors) that can lead to an abnormal cellular environment, and how such interactions may alter DNA replication and transcription, leading to potential conflicts at non-B DNA regions, and eventually result in genetic instability and human disease.
Effects of Replication and Transcription on DNA Structure-Related Genetic Instability
Wang, Guliang; Vasquez, Karen M.
2017-01-01
Many repetitive sequences in the human genome can adopt conformations that differ from the canonical B-DNA double helix (i.e., non-B DNA), and can impact important biological processes such as DNA replication, transcription, recombination, telomere maintenance, viral integration, transposome activation, DNA damage and repair. Thus, non-B DNA-forming sequences have been implicated in genetic instability and disease development. In this article, we discuss the interactions of non-B DNA with the replication and/or transcription machinery, particularly in disease states (e.g., tumors) that can lead to an abnormal cellular environment, and how such interactions may alter DNA replication and transcription, leading to potential conflicts at non-B DNA regions, and eventually result in genetic instability and human disease. PMID:28067787
Travers, Timothy; Wang, Katherine J.; Lopez, Cesar A.; ...
2018-02-09
Gram-negative multidrug resistance currently presents a serious threat to public health with infections effectively rendered untreatable. Multiple molecular mechanisms exist that cause antibiotic resistance and in addition, the last three decades have seen slowing rates of new drug development. In this paper, we summarize the use of various computational techniques for investigating the mechanisms of multidrug resistance mediated by Gram-negative tripartite efflux pumps and membranes. Recent work in our lab combines data-driven sequence and structure analyses to study the interactions and dynamics of these bacterial components. Computational studies can complement experimental methodologies for gaining crucial insights into combatting multidrug resistance.
R3D-2-MSA: the RNA 3D structure-to-multiple sequence alignment server.
Cannone, Jamie J; Sweeney, Blake A; Petrov, Anton I; Gutell, Robin R; Zirbel, Craig L; Leontis, Neocles
2015-07-01
The RNA 3D Structure-to-Multiple Sequence Alignment Server (R3D-2-MSA) is a new web service that seamlessly links RNA three-dimensional (3D) structures to high-quality RNA multiple sequence alignments (MSAs) from diverse biological sources. In this first release, R3D-2-MSA provides manual and programmatic access to curated, representative ribosomal RNA sequence alignments from bacterial, archaeal, eukaryal and organellar ribosomes, using nucleotide numbers from representative atomic-resolution 3D structures. A web-based front end is available for manual entry and an Application Program Interface for programmatic access. Users can specify up to five ranges of nucleotides and 50 nucleotide positions per range. The R3D-2-MSA server maps these ranges to the appropriate columns of the corresponding MSA and returns the contents of the columns, either for display in a web browser or in JSON format for subsequent programmatic use. The browser output page provides a 3D interactive display of the query, a full list of sequence variants with taxonomic information and a statistical summary of distinct sequence variants found. The output can be filtered and sorted in the browser. Previous user queries can be viewed at any time by resubmitting the output URL, which encodes the search and re-generates the results. The service is freely available with no login requirement at http://rna.bgsu.edu/r3d-2-msa. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Shelar, Ashish; Bansal, Manju
2014-12-01
α-Helices are amongst the most common secondary structural elements seen in membrane proteins and are packed in the form of helix bundles. These α-helices encounter varying external environments (hydrophobic, hydrophilic) that may influence the sequence preferences at their N and C-termini. The role of the external environment in stabilization of the helix termini in membrane proteins is still unknown. Here we analyze α-helices in a high-resolution dataset of integral α-helical membrane proteins and establish that their sequence and conformational preferences differ from those in globular proteins. We specifically examine these preferences at the N and C-termini in helices initiating/terminating inside the membrane core as well as in linkers connecting these transmembrane helices. We find that the sequence preferences and structural motifs at capping (Ncap and Ccap) and near-helical (N' and C') positions are influenced by a combination of features including the membrane environment and the innate helix initiation and termination property of residues forming structural motifs. We also find that a large number of helix termini which do not form any particular capping motif are stabilized by formation of hydrogen bonds and hydrophobic interactions contributed from the neighboring helices in the membrane protein. We further validate the sequence preferences obtained from our analysis with data from an ultradeep sequencing study that identifies evolutionarily conserved amino acids in the rat neurotensin receptor. The results from our analysis provide insights for the secondary structure prediction, modeling and design of membrane proteins. © 2014 Wiley Periodicals, Inc.
Babaei, Sepideh; Geranmayeh, Amir; Seyyedsalehi, Seyyed Ali
2010-12-01
The supervised learning of recurrent neural networks well-suited for prediction of protein secondary structures from the underlying amino acids sequence is studied. Modular reciprocal recurrent neural networks (MRR-NN) are proposed to model the strong correlations between adjacent secondary structure elements. Besides, a multilayer bidirectional recurrent neural network (MBR-NN) is introduced to capture the long-range intramolecular interactions between amino acids in formation of the secondary structure. The final modular prediction system is devised based on the interactive integration of the MRR-NN and the MBR-NN structures to arbitrarily engage the neighboring effects of the secondary structure types concurrent with memorizing the sequential dependencies of amino acids along the protein chain. The advanced combined network augments the percentage accuracy (Q₃) to 79.36% and boosts the segment overlap (SOV) up to 70.09% when tested on the PSIPRED dataset in three-fold cross-validation. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
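The Q₃ figure quoted above is the standard per-residue three-state accuracy: the percentage of residues whose predicted class (helix, strand or coil) matches the observed one. A minimal sketch in Python follows, assuming one-letter state strings `H`, `E` and `C`; the function name `q3_accuracy` is illustrative and is not taken from the paper.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Return the percentage of positions where the three-state labels agree.

    Both arguments are strings over the states H (helix), E (strand), C (coil),
    one character per residue. The label alphabet is an assumption of this sketch.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    # Count residues whose predicted state equals the observed state.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    # Seven of eight residues agree in this toy example.
    print(q3_accuracy("HHHEECCC", "HHHEECCH"))
```

The segment overlap (SOV) score reported alongside Q₃ compares whole secondary structure segments rather than single residues and is not covered by this sketch.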
Kaplan, Oktay I; Berber, Burak; Hekim, Nezih; Doluca, Osman
2016-11-02
Many studies show that short non-coding sequences are widely conserved among regulatory elements. More and more conserved sequences are being discovered since the development of next generation sequencing technology. A common approach to identify conserved sequences with regulatory roles relies on topological changes such as hairpin formation at the DNA or RNA level. G-quadruplexes, non-canonical nucleic acid topologies with little established biological roles, are increasingly considered for conserved regulatory element discovery. Since the tertiary structure of G-quadruplexes is strongly dependent on the loop sequence which is disregarded by the generally accepted algorithm, we hypothesized that G-quadruplexes with similar topology and, indirectly, similar interaction patterns, can be determined using phylogenetic clustering based on differences in the loop sequences. Phylogenetic analysis of 52 G-quadruplex forming sequences in the Escherichia coli genome revealed two conserved G-quadruplex motifs with a potential regulatory role. Further analysis revealed that both motifs tend to form hairpins and G quadruplexes, as supported by circular dichroism studies. The phylogenetic analysis as described in this work can greatly improve the discovery of functional G-quadruplex structures and may explain unknown regulatory patterns. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
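To make the role of the loop sequences concrete, the sketch below scans a DNA string with the widely used folding rule of four runs of at least three guanines separated by loops of one to seven nucleotides, and returns the three loop sequences of each hit. It is an assumption here that this rule is what the abstract calls the generally accepted algorithm; the names `G4_PATTERN` and `find_g4_loops` are hypothetical, and the code is not the authors' pipeline.

```python
import re

# Four G-runs (>= 3 G each) separated by three loops of 1-7 nucleotides.
# Each loop is captured so that it can be compared between motifs later.
G4_PATTERN = re.compile(
    r"G{3,}([ACGT]{1,7})G{3,}([ACGT]{1,7})G{3,}([ACGT]{1,7})G{3,}"
)


def find_g4_loops(sequence: str):
    """Return (start, end, (loop1, loop2, loop3)) for each non-overlapping hit.

    Coordinates are 0-based and end-exclusive. Only the given strand is
    scanned; the complementary strand would need a separate pass.
    """
    hits = []
    for match in G4_PATTERN.finditer(sequence.upper()):
        hits.append((match.start(), match.end(), match.groups()))
    return hits


if __name__ == "__main__":
    # Toy input: a telomere-like repeat embedded in flanking bases.
    for hit in find_g4_loops("ACGGGTTAGGGTTAGGGTTAGGGT"):
        print(hit)
```

The extracted loop tuples are the kind of input a loop-based clustering could start from; how loops are actually compared and clustered in the study is not specified in the abstract.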
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hudson, William H.; Pickard, Mark R.; de Vera, Ian Mitchelle S.
2014-12-23
The majority of the eukaryotic genome is transcribed, generating a significant number of long intergenic noncoding RNAs (lincRNAs). Although lincRNAs represent the most poorly understood product of transcription, recent work has shown lincRNAs fulfill important cellular functions. In addition to low sequence conservation, poor understanding of structural mechanisms driving lincRNA biology hinders systematic prediction of their function. Here we report the molecular requirements for the recognition of steroid receptors (SRs) by the lincRNA growth arrest-specific 5 (Gas5), which regulates steroid-mediated transcriptional regulation, growth arrest and apoptosis. We identify the functional Gas5-SR interface and generate point mutations that ablate the SR-Gas5 lincRNA interaction, altering Gas5-driven apoptosis in cancer cell lines. Further, we find that the Gas5 SR-recognition sequence is conserved among haplorhines, with its evolutionary origin as a splice acceptor site. This study demonstrates that lincRNAs can recognize protein targets in a conserved, sequence-specific manner in order to affect critical cell functions.
High-resolution biophysical analysis of the dynamics of nucleosome formation
Hatakeyama, Akiko; Hartmann, Brigitte; Travers, Andrew; Nogues, Claude; Buckle, Malcolm
2016-01-01
We describe a biophysical approach that enables changes in the structure of DNA to be followed during nucleosome formation in in vitro reconstitution with either the canonical “Widom” sequence or a judiciously mutated sequence. The rapid non-perturbing photochemical analysis presented here provides ‘snapshots’ of the DNA configuration at any given moment in time during nucleosome formation under a very broad range of reaction conditions. Changes in DNA photochemical reactivity upon protein binding are interpreted as being mainly induced by alterations in individual base pair roll angles. The results strengthen the importance of the role of an initial (H3/H4)2 histone tetramer-DNA interaction and highlight the modulation of this early event by the DNA sequence. (H3/H4)2 binding precedes and dictates subsequent H2A/H2B-DNA interactions, which are less affected by the DNA sequence, leading to the final octameric nucleosome. Overall, our results provide a novel, exciting way to investigate those biophysical properties of DNA that constitute a crucial component in nucleosome formation and stabilization. PMID:27263658
Genotype to Phenotype Mapping of the E. coli lac Promoter
NASA Astrophysics Data System (ADS)
Otwinowski, Jakub; Nemenman, Ilya
2014-03-01
Genotype-to-phenotype maps and the related fitness landscapes that include epistatic interactions are difficult to measure because of their high dimensional structure. Here we construct such a map using the recently collected corpora of high-throughput sequence data from the 75 base pairs long mutagenized E. coli lac promoter region, where each sequence is associated with induced transcriptional activity measured by a fluorescent reporter. We find that the additive (non-epistatic) contributions of individual mutations account for about two-thirds of the explainable phenotype variance, while pairwise epistasis explains about 7% of the variance for the full mutagenized sequence and about 15% for the subsequence associated with protein binding sites. Surprisingly, there is no evidence for third order epistatic contributions, and our inferred fitness landscape is essentially single peaked, with a small amount of antagonistic epistasis. We identify transcription factor (CRP) and RNA polymerase binding sites in the promoter region and their interactions. We conclude with a cautionary note that inferred properties of fitness landscapes may be severely influenced by biases in the sequence data. Funded in part by HFSP and James S. McDonnell Foundation.
Matveev, Vladimir V
2010-06-09
According to the hypothesis explored in this paper, native aggregation is genetically controlled (programmed) reversible aggregation that occurs when interacting proteins form new temporary structures through highly specific interactions. It is assumed that Anfinsen's dogma may be extended to protein aggregation: composition and amino acid sequence determine not only the secondary and tertiary structure of a single protein, but also the structure of protein aggregates (associates). Cell function is considered as a transition between two states (two states model), the resting state and state of activity (this applies to the cell as a whole and to its individual structures). In the resting state, the key proteins are found in the following inactive forms: natively unfolded and globular. When the cell is activated, secondary structures appear in natively unfolded proteins (including unfolded regions in other proteins), and globular proteins begin to melt and their secondary structures become available for interaction with the secondary structures of other proteins. These temporary secondary structures provide a means for highly specific interactions between proteins. As a result, native aggregation creates temporary structures necessary for cell activity. "One of the principal objects of theoretical research in any department of knowledge is to find the point of view from which the subject appears in its greatest simplicity." Josiah Willard Gibbs (1839-1903).
Agarwal, Sorabh
2018-01-01
Overexpression of the proinflammatory cytokine macrophage migration inhibitory factor (MIF) is linked to a number of autoimmune diseases and cancer. MIF production has been correlated to the number of CATT repeats in a microsatellite region upstream of the MIF gene. We have characterized the interaction of pituitary-specific positive transcription factor 1 (Pit-1) with a portion of the MIF promoter region flanking a microsatellite polymorphism (−794 CATT5–8). Using fluorescence anisotropy, we quantified tight complex formation between Pit-1 and an oligonucleotide consisting of eight consecutive CATT repeats (8xCATT) with an apparent Kd of 35 nM. Using competition experiments we found a 23 base pair oligonucleotide with 4xCATT repeats to be the minimum DNA sequence necessary for high affinity interaction with Pit-1. The stoichiometry of the Pit-1 DNA interaction was determined to be 2:1 and binding is cooperative in nature. We subsequently structurally characterized the complex and discovered a completely novel binding mode for Pit-1 in contrast to previously described Pit-1 complex structures. The affinity of Pit-1 for the CATT target sequence was found to be highly dependent on cooperativity. This work lays the groundwork for understanding transcriptional regulation of MIF and pursuing Pit-1 as a therapeutic target to treat MIF-mediated inflammatory disorders. PMID:29186613
Bioinformatics and molecular modeling in glycobiology
Schloissnig, Siegfried
2010-01-01
The field of glycobiology is concerned with the study of the structure, properties, and biological functions of the family of biomolecules called carbohydrates. Bioinformatics for glycobiology is a particularly challenging field, because carbohydrates exhibit a high structural diversity and their chains are often branched. Significant improvements in experimental analytical methods over recent years have led to a tremendous increase in the amount of carbohydrate structure data generated. Consequently, the availability of databases and tools to store, retrieve and analyze these data in an efficient way is of fundamental importance to progress in glycobiology. In this review, the various graphical representations and sequence formats of carbohydrates are introduced, and an overview of newly developed databases, the latest developments in sequence alignment and data mining, and tools to support experimental glycan analysis are presented. Finally, the field of structural glycoinformatics and molecular modeling of carbohydrates, glycoproteins, and protein–carbohydrate interaction are reviewed. PMID:20364395
Zaman, Aubhishek; Fancy, Nurun Nahar
2012-12-01
Vps-mediated vesicular transport is important for transferring macromolecules trapped inside a vesicle. Although highly abundant, Vps shows tremendous sequence variation among a diverse array of species. However, this difference in sequence, which also seems to translate into substantial functional variation, is hardly characterized in Corchorus spp. Here, our computational study investigates structural and functional features of one of the Vps subunits, namely Vps51/Vps67, in C. olitorius. Broad scale structural characterization revealed novel information about the overall Vps structure and binding sites. Moreover, functional analyses indicate interaction partners which were unexplored to date. Since membrane trafficking is essentially associated with nutrient uptake and chemical detoxification, characterization of the Vps subunit can provide us with better insight into important agronomic traits such as stress response, immune response and phytoremediation capacity.
NASA Astrophysics Data System (ADS)
Zhao, Xiaoqing; Li, Hong; Bao, Tonglaga; Ying, Zhiqiang
2012-09-01
Much experimental evidence has shown that the sequence structures of introns and intron loss/gain can influence gene expression, but current mechanisms do not refer directly to the functions of post-spliced introns. We propose that post-spliced introns play their functions in gene expression by interacting with their mRNA sequences, and that the interaction is characterized by the matched segments between introns and their CDS. In this study, we investigated the interaction characters with length series by improved Smith-Waterman local alignment software for the ribosomal protein genes in C. elegans and D. melanogaster. Our results showed that RF values of five intron groups are significantly high in the central non-conserved region and very low in the 5'-end and 3'-end splicing regions. It is interesting that the number of the optimal matched regions gradually increases with intron length. Distributions of the optimal matched regions are different for the five intron groups. Our study revealed that there are more interaction regions between longer introns and their CDS than between shorter introns and their CDS, and it provides a positive pattern for regulating gene expression.
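For readers unfamiliar with the local alignment step mentioned above, the following is a minimal sketch of the textbook Smith-Waterman scoring recurrence with a linear gap penalty. It is not the improved software used in the study; the function name `smith_waterman_score` and the scoring values (match 2, mismatch -1, gap -2) are assumptions chosen only for illustration.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b.

    Uses the standard recurrence H[i][j] = max(0, diagonal, up, left)
    and keeps only two rows of the matrix in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero, which makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # The shared segment ACG contributes three matches.
    print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

Recovering the matched segment itself, rather than only its score, would require storing the full matrix and tracing back from the best-scoring cell.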
Deciphering the glycosaminoglycan code with the help of microarrays.
de Paz, Jose L; Seeberger, Peter H
2008-07-01
Carbohydrate microarrays have become a powerful tool to elucidate the biological role of complex sugars. Microarrays are particularly useful for the study of glycosaminoglycans (GAGs), a key class of carbohydrates. The high-throughput chip format enables rapid screening of large numbers of potential GAG sequences produced via a complex biosynthesis while consuming very little sample. Here, we briefly highlight the most recent advances involving GAG microarrays built with synthetic or naturally derived oligosaccharides. These chips are powerful tools for characterizing GAG-protein interactions and determining structure-activity relationships for specific sequences. Thereby, they contribute to decoding the information contained in specific GAG sequences.
The disorderly conduct of Hsc70 and its interaction with the Alzheimer's related Tau protein.
Taylor, Isabelle R; Ahmad, Atta; Wu, Taia; Nordhues, Bryce A; Bhullar, Anup; Gestwicki, Jason E; Zuiderweg, Erik R P
2018-05-15
Hsp70 chaperones bind to various protein substrates for folding, trafficking, and degradation. Considerable structural information is available about how prokaryotic Hsp70 (DnaK) binds substrates, but less is known about mammalian Hsp70s, of which there are 13 isoforms encoded in the human genome. Here, we report the interaction between the human Hsp70 isoform heat shock cognate 71 KDa protein (Hsc70 or HSPA8) and peptides derived from the microtubule-associated protein tau, which is linked to Alzheimer's disease. For structural studies, we used an Hsc70 construct (called BETA) comprising the substrate-binding domain, but lacking the lid. Importantly, we found that truncating the lid does not significantly impair Hsc70's chaperone activity or allostery in vitro. Using NMR, we show that BETA is partially dynamically disordered in the absence of substrate and that binding of the tau sequence GKVQIINKKG (with a KD = 500 nM) causes dramatic rigidification of BETA. Nuclear Overhauser effect distance measurements revealed that tau binds to the canonical substrate-binding cleft, similar to the binding observed with DnaK. To further develop BETA as a tool for studying Hsc70 interactions, we also measured BETA binding in NMR and fluorescent competition assays to peptides derived from huntingtin, insulin, a second tau-recognition sequence, and a KFERQ-like sequence linked to chaperone-mediated autophagy. We found that the insulin C-peptide binds BETA with high affinity (KD < 100 nM), whereas the others do not (KD > 100 μM). Together, our findings reveal several similarities and differences in how prokaryotic and mammalian Hsp70 isoforms interact with different substrate peptides. Published under license by The American Society for Biochemistry and Molecular Biology, Inc.
El-Assaad, Atlal; Dawy, Zaher; Nemer, Georges
2015-01-01
Protein-DNA interaction is of fundamental importance in molecular biology, playing roles in functions as diverse as DNA transcription, DNA structure formation, and DNA repair. Protein-DNA association is also important in medicine; understanding Protein-DNA binding kinetics can assist in identifying disease root causes which can contribute to drug development. In this perspective, this work focuses on the transcription process by the GATA Transcription Factor (TF). GATA TF binds to DNA promoter region represented by `G,A,T,A' nucleotides sequence, and initiates transcription of target genes. When proper regulation fails due to some mutations on the GATA TF protein sequence or on the DNA promoter sequence (weak promoter), deregulation of the target genes might lead to various disorders. In this study, we aim to understand the electrostatic mechanism behind GATA TF and DNA promoter interactions, in order to predict Protein-DNA binding in the presence of mutations, while elaborating on non-covalent binding kinetics. To generate a family of mutants for the GATA:DNA complex, we replaced every charged amino acid, one at a time, with a neutral amino acid like Alanine (Ala). We then applied Poisson-Boltzmann electrostatic calculations feeding into free energy calculations, for each mutation. These calculations delineate the contribution to binding from each Ala-replaced amino acid in the GATA:DNA interaction. After analyzing the obtained data in view of a two-step model, we are able to identify potential key amino acids in binding. Finally, we applied the model to GATA-3:DNA (crystal structure with PDB-ID: 3DFV) binding complex and validated it against experimental results from the literature.
Sequence periodicity in nucleosomal DNA and intrinsic curvature
2010-01-01
Background Most eukaryotic DNA contained in the nucleus is packaged by wrapping DNA around histone octamers. Histones are ubiquitous and bind most regions of chromosomal DNA. In order to achieve smooth wrapping of the DNA around the histone octamer, the DNA duplex should be able to deform and should possess intrinsic curvature. The deformability of DNA is a result of the non-parallelness of base pair stacks. The stacking interaction between base pairs is sequence dependent. The higher the stacking energy the more rigid the DNA helix, thus it is natural to expect that sequences that are involved in wrapping around the histone octamer should be unstacked and possess intrinsic curvature. Intrinsic curvature has been shown to be dictated by the periodic recurrence of certain dinucleotides. Several genome-wide studies directed towards mapping of nucleosome positions have revealed periodicity associated with certain stretches of sequences. In the current study, these sequences have been analyzed with a view to understand their sequence-dependent structures. Results Higher order DNA structures and the distribution of molecular bend loci associated with 146 base nucleosome core DNA sequence from C. elegans and chicken have been analyzed using the theoretical model for DNA curvature. The curvature dispersion calculated by cyclically permuting the sequences revealed that the molecular bend loci were delocalized throughout the nucleosome core region and had varying degrees of intrinsic curvature. Conclusions The higher order structures associated with nucleosomes of C. elegans and chicken calculated from the sequences revealed heterogeneity with respect to the deviation of the DNA axis. The results point to the possibility of context dependent curvature of varying degrees to be associated with nucleosomal DNA. PMID:20487515
Swellix: a computational tool to explore RNA conformational space.
Sloat, Nathan; Liu, Jui-Wen; Schroeder, Susan J
2017-11-21
The sequence of nucleotides in an RNA determines the possible base pairs for an RNA fold and thus also determines the overall shape and function of an RNA. The Swellix program presented here combines a helix abstraction with a combinatorial approach to the RNA folding problem in order to compute all possible non-pseudoknotted RNA structures for RNA sequences. The Swellix program builds on the Crumple program and can include experimental constraints on global RNA structures such as the minimum number and lengths of helices from crystallography, cryoelectron microscopy, or in vivo crosslinking and chemical probing methods. The conceptual advance in Swellix is to count helices and generate all possible combinations of helices rather than counting and combining base pairs. Swellix bundles similar helices and includes improvements in memory use and efficient parallelization. Biological applications of Swellix are demonstrated by computing the reduction in conformational space and entropy due to naturally modified nucleotides in tRNA sequences and by motif searches in Human Endogenous Retroviral (HERV) RNA sequences. The Swellix motif search reveals occurrences of protein and drug binding motifs in the HERV RNA ensemble that do not occur in minimum free energy or centroid predicted structures. Swellix presents significant improvements over Crumple in terms of efficiency and memory use. The efficient parallelization of Swellix enables the computation of sequences as long as 418 nucleotides with sufficient experimental constraints. Thus, Swellix provides a practical alternative to free energy minimization tools when multiple structures, kinetically determined structures, or complex RNA-RNA and RNA-protein interactions are present in an RNA folding problem.
Darré, Leonardo; Machado, Matías Rodrigo; Brandner, Astrid Febe; González, Humberto Carlos; Ferreira, Sebastián; Pantano, Sergio
2015-02-10
Modeling of macromolecular structures and interactions represents an important challenge for computational biology, involving different time and length scales. However, this task can be facilitated through the use of coarse-grained (CG) models, which reduce the number of degrees of freedom and allow efficient exploration of complex conformational spaces. This article presents a new CG protein model named SIRAH, developed to work with explicit solvent and to capture sequence, temperature, and ionic strength effects in a topologically unbiased manner. SIRAH is implemented in GROMACS, and interactions are calculated using a standard pairwise Hamiltonian for classical molecular dynamics simulations. We present a set of simulations that test the capability of SIRAH to produce a qualitatively correct solvation on different amino acids, hydrophilic/hydrophobic interactions, and long-range electrostatic recognition leading to spontaneous association of unstructured peptides and stable structures of single polypeptides and protein-protein complexes.
Functional Classification of Immune Regulatory Proteins
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rubinstein, Rotem; Ramagopal, Udupi A.; Nathenson, Stanley G.
2013-05-01
Members of the immunoglobulin superfamily (IgSF) control innate and adaptive immunity and are prime targets for the treatment of autoimmune diseases, infectious diseases, and malignancies. We describe a computational method, termed the Brotherhood algorithm, which utilizes intermediate sequence information to classify proteins into functionally related families. This approach identifies functional relationships within the IgSF and predicts additional receptor-ligand interactions. As a specific example, we examine the nectin/nectin-like family of cell adhesion and signaling proteins and propose receptor-ligand interactions within this family. We were guided by the Brotherhood approach and present the high-resolution structural characterization of a homophilic interaction involving the class-I MHC-restricted T-cell-associated molecule, which we now classify as a nectin-like family member. The Brotherhood algorithm is likely to have a significant impact on structural immunology by identifying those proteins and complexes for which structural characterization will be particularly informative.
FragFit: a web-application for interactive modeling of protein segments into cryo-EM density maps.
Tiemann, Johanna K S; Rose, Alexander S; Ismer, Jochen; Darvish, Mitra D; Hilal, Tarek; Spahn, Christian M T; Hildebrand, Peter W
2018-05-21
Cryo-electron microscopy (cryo-EM) is a standard method to determine the three-dimensional structures of molecular complexes. However, easy to use tools for modeling of protein segments into cryo-EM maps are sparse. Here, we present the FragFit web-application, a web server for interactive modeling of segments of up to 35 amino acids length into cryo-EM density maps. The fragments are provided by a regularly updated database containing at the moment about 1 billion entries extracted from PDB structures and can be readily integrated into a protein structure. Fragments are selected based on geometric criteria, sequence similarity and fit into a given cryo-EM density map. Web-based molecular visualization with the NGL Viewer allows interactive selection of fragments. The FragFit web-application, accessible at http://proteinformatics.de/FragFit, is free and open to all users, without any login requirements.
Modeling DNA bubble formation at the atomic scale
DOE Office of Scientific and Technical Information (OSTI.GOV)
Beleva, V; Rasmussen, K. O.; Garcia, A. E.
We describe the fluctuations of double stranded DNA molecules using a minimalist Go model over a wide range of temperatures. Minimalist models allow us to describe, at the atomic level, the opening and formation of bubbles in DNA double helices. This model includes all the geometrical constraints in helix melting imposed by the 3D structure of the molecule. The DNA forms melted bubbles within double helices. These bubbles form and break as a function of time. The equilibrium average number of broken base pairs shows a sharp change as a function of T. We observe a temperature profile of sequence dependent bubble formation similar to those measured by Zeng et al. Long nucleic acid molecules melt partially through the formation of bubbles. It is known that CG rich sequences melt at higher temperatures than AT rich sequences. The melting temperature, however, is not solely determined by the CG content, but by the sequence through base stacking and solvent interactions. Recently, models that incorporate the sequence and nonlinear dynamics of DNA double strands have shown that DNA exhibits very rich dynamics. Recent extensions of the Bishop-Peyrard model show that fluctuations in the DNA structure lead to opening in localized regions, and that these regions in the DNA are associated with transcription initiation sites. 1D and 2D models of DNA may contain enough information about stacking and base pairing interactions, but lack the coupling between twisting, bending and base pair opening imposed by the double helical structure of DNA that all atom models easily describe. However, the complexity of the energy function used in all atom simulations (including solvent, ions, etc) does not allow for the description of DNA folding/unfolding events that occur in the microsecond time scale.
Kinjo, Akira R; Nakamura, Haruki
2013-01-01
Protein functions are mediated by interactions between proteins and other molecules. One useful approach to analyze protein functions is to compare and classify the structures of interaction interfaces of proteins. Here, we describe the procedures for compiling a database of interface structures and efficiently comparing the interface structures. To do so requires a good understanding of the data structures of the Protein Data Bank (PDB). Therefore, we also provide a detailed account of the PDB exchange dictionary necessary for extracting data that are relevant for analyzing interaction interfaces and secondary structures. We identify recurring structural motifs by classifying similar interface structures, and we define a coarse-grained representation of supersecondary structures (SSS) which represents a sequence of two or three secondary structure elements including their relative orientations as a string of four to seven letters. By examining the correspondence between structural motifs and SSS strings, we show that no SSS string has a particularly high propensity to be found at interaction interfaces in general, indicating that any SSS can be used as a binding interface. When individual structural motifs are examined, there are some SSS strings that have high propensity for particular groups of structural motifs. In addition, it is shown that while the SSS strings found in particular structural motifs for nonpolymer and protein interfaces are as abundant as in other structural motifs that belong to the same subunit, structural motifs for nucleic acid interfaces exhibit somewhat stronger preference for SSS strings. In regard to protein folds, many motif-specific SSS strings were found across many folds, suggesting that SSS may be a useful description to investigate the universality of ligand binding modes.
Cytomegalovirus Basic Phosphoprotein (pUL32) Binds to Capsids In Vitro through Its Amino One-Third
Baxter, Michael K.; Gibson, Wade
2001-01-01
The cytomegalovirus (CMV) basic phosphoprotein (BPP) is a component of the tegument. It remains with the nucleocapsid fraction under conditions that remove most other tegument proteins from the virion, suggesting a direct and perhaps tight interaction with the capsid. As a step toward localizing this protein within the molecular structure of the virion and understanding its function during infection, we have investigated the BPP-capsid interaction. In this report we present evidence that the BPP interacts selectively, through its amino one-third, with CMV capsids. Radiolabeled simian CMV (SCMV) BPP, synthesized in vitro, bound to SCMV B-capsids, and C-capsids to a lesser extent, following incubation with either isolated capsids or lysates of infected cells. Human CMV (HCMV) BPP (pUL32) also bound to SCMV capsids, and SCMV BPP likewise bound to HCMV capsids, indicating that the sequence(s) involved is conserved between the two proteins. Analysis of SCMV BPP truncation mutants localized the capsid-binding region to the amino one-third of the molecule—the portion of BPP showing the greatest sequence conservation between the SCMV and HCMV homologs. This general approach may have utility in studying the interactions of other proteins with conformation-dependent binding sites. PMID:11435566
Two short segments of Smad3 are important for specific interaction of Smad3 with c-Ski and SnoN.
Mizuide, Masafumi; Hara, Takane; Furuya, Toshio; Takeda, Masafumi; Kusanagi, Kiyoshi; Inada, Yuri; Mori, Masatomo; Imamura, Takeshi; Miyazawa, Keiji; Miyazono, Kohei
2003-01-03
c-Ski and SnoN are transcriptional co-repressors that inhibit transforming growth factor-beta signaling through interaction with Smad proteins. Among receptor-regulated Smads, c-Ski and SnoN bind more strongly to Smad2 and Smad3 than to Smad1. Here, we show that c-Ski and SnoN bind to the "SE" sequence in the C-terminal MH2 domain of Smad3, which is exposed on the N-terminal upper side of the toroidal structure of the MH2 oligomer. The "QPSMT" sequence, located in the vicinity of SE, supports the interaction with c-Ski and SnoN. Sequences similar to SE and QPSMT are found in Smad2, but not in Smad1. The N-terminal MH1 domain and linker region of Smad3 protrude from the N-terminal upper side of the MH2 oligomer toroid. Smurf2 induces ubiquitin-dependent degradation of SnoN, since it appears to be located close to SnoN through binding to the linker region of Smad2. In contrast, transcription factors Mixer and FoxH3 (FAST1) bind to the bottom side of the Smad3 MH2 toroid; therefore, c-Ski does not affect the interaction of Smads with these transcription factors. Our findings thus demonstrate the stoichiometry of how multiple molecules can associate with the Smad oligomers and how the Smad-interacting proteins functionally interact with each other.
CTCF-Mediated Human 3D Genome Architecture Reveals Chromatin Topology for Transcription.
Tang, Zhonghui; Luo, Oscar Junhong; Li, Xingwang; Zheng, Meizhen; Zhu, Jacqueline Jufen; Szalaj, Przemyslaw; Trzaskoma, Pawel; Magalska, Adriana; Wlodarczyk, Jakub; Ruszczycki, Blazej; Michalski, Paul; Piecuch, Emaly; Wang, Ping; Wang, Danjuan; Tian, Simon Zhongyuan; Penrad-Mobayed, May; Sachs, Laurent M; Ruan, Xiaoan; Wei, Chia-Lin; Liu, Edison T; Wilczynski, Grzegorz M; Plewczynski, Dariusz; Li, Guoliang; Ruan, Yijun
2015-12-17
Spatial genome organization and its effect on transcription remains a fundamental question. We applied an advanced chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) strategy to comprehensively map higher-order chromosome folding and specific chromatin interactions mediated by CCCTC-binding factor (CTCF) and RNA polymerase II (RNAPII) with haplotype specificity and nucleotide resolution in different human cell lineages. We find that CTCF/cohesin-mediated interaction anchors serve as structural foci for spatial organization of constitutive genes concordant with CTCF-motif orientation, whereas RNAPII interacts within these structures by selectively drawing cell-type-specific genes toward CTCF foci for coordinated transcription. Furthermore, we show that haplotype variants and allelic interactions have differential effects on chromosome configuration, influencing gene expression, and may provide mechanistic insights into functions associated with disease susceptibility. 3D genome simulation suggests a model of chromatin folding around chromosomal axes, where CTCF is involved in defining the interface between condensed and open compartments for structural regulation. Our 3D genome strategy thus provides unique insights in the topological mechanism of human variations and diseases. Copyright © 2015 Elsevier Inc. All rights reserved.
In-cell RNA structure probing with SHAPE-MaP.
Smola, Matthew J; Weeks, Kevin M
2018-06-01
This protocol is an extension to: Nat. Protoc. 10, 1643-1669 (2015); doi:10.1038/nprot.2015.103; published online 01 October 2015. RNAs play key roles in many cellular processes. The underlying structure of RNA is an important determinant of how transcripts function, are processed, and interact with RNA-binding proteins and ligands. RNA structure analysis by selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) takes advantage of the reactivity of small electrophilic chemical probes that react with the 2'-hydroxyl group to assess RNA structure at nucleotide resolution. When coupled with mutational profiling (MaP), in which modified nucleotides are detected as internal miscodings during reverse transcription and then read out by massively parallel sequencing, SHAPE yields quantitative per-nucleotide measurements of RNA structure. Here, we provide an extension to our previous in vitro SHAPE-MaP protocol with detailed guidance for undertaking and analyzing SHAPE-MaP probing experiments in live cells. The MaP strategy works for both abundant-transcriptome experiments and for cellular RNAs of low to moderate abundance, which are not well examined by whole-transcriptome methods. In-cell SHAPE-MaP, performed in roughly 3 d, can be applied in cell types ranging from bacteria to cultured mammalian cells and is compatible with a variety of structure-probing reagents. We detail several strategies by which in-cell SHAPE-MaP can inform new biological hypotheses and emphasize downstream analyses that reveal sequence or structure motifs important for RNA interactions in cells.
Tsigelny, Igor; Mahata, Sushil K.; Taupenot, Laurent; Preece, Nicholas E.; Mahata, Manjula; Khan, Imran; Parmer, Robert J.; O’Connor, Daniel T.
2009-01-01
A novel fragment of chromogranin A, known as ‘catestatin’ (bovine chromogranin A344–364), inhibits catecholamine release from chromaffin cells and noradrenergic neurons by acting as a non-competitive nicotinic cholinergic antagonist, and may therefore constitute an endogenous autocrine feedback regulator of sympathoadrenal activity. To characterize how this activity depends on the peptide’s structure, we searched for common 3-dimensional motifs for this primary structure or its homologs. Catestatin’s primary structure bore significant (29–35.5% identity, general alignment score 44–57) sequence homology to fragment sequences within three homologs of known 3-dimensional structures, based on solved X-ray crystals: 8FAB, 1PKM, and 2IG2. Each of these sequences exists in nature as a β-strand/loop/β-strand structure, stabilized by hydrophobic interactions between the β-strands. The catestatin structure was stable during molecular dynamics simulations. The catestatin loop contains three Arg residues, whose electropositive side chains form the terminus of the structure, and give rise to substantial uncompensated charge asymmetry in the molecule. A hydrophobic moment plot revealed that catestatin is the only segment of chromogranin A predicted to contain amphiphilic β-strand. Circular dichroism in the far ultraviolet showed substantial (63%) β-sheet structure, especially in a hydrophobic environment. Alanine-substitution mutants of catestatin established a crucial role for the three central arginine residues in the loop (Arg351, Arg353, and Arg358), though not for two arginine residues in the strand region toward the amino-terminus. [125I]Catestatin bound to Torpedo membranes at a site other than the nicotinic agonist binding site. 
When the catestatin structure was ‘docked’ with the extracellular domain of the Torpedo nicotinic cholinergic receptor, it interacted principally with the β and δ subunits, in a relatively hydrophobic region of the cation pore extracellular orifice, and the complex of ligand and receptor largely occluded the cation pore, providing a structural basis for the non-competitive nicotinic cholinergic antagonist properties of the peptide. We conclude that a homology model of catestatin correctly predicts actual features of the peptide, both physical and biological. The model suggests particular spatial and charge features of the peptide which may serve as starting points in the development of non-peptide mimetics of this endogenous nicotinic cholinergic antagonist. PMID:9809795
Computer-Aided Design of RNA Origami Structures.
Sparvath, Steffen L; Geary, Cody W; Andersen, Ebbe S
2017-01-01
RNA nanostructures can be used as scaffolds to organize, combine, and control molecular functionalities, with great potential for applications in nanomedicine and synthetic biology. The single-stranded RNA origami method allows RNA nanostructures to be folded as they are transcribed by the RNA polymerase. RNA origami structures provide a stable framework that can be decorated with functional RNA elements such as riboswitches, ribozymes, interaction sites, and aptamers for binding small molecules or protein targets. The rich library of RNA structural and functional elements combined with the possibility to attach proteins through aptamer-based binding creates virtually limitless possibilities for constructing advanced RNA-based nanodevices. In this chapter we provide a detailed protocol for the single-stranded RNA origami design method using a simple 2-helix tall structure as an example. The first step involves 3D modeling of a double-crossover between two RNA double helices, followed by decoration with tertiary motifs. The second step deals with the construction of a 2D blueprint describing the secondary structure and sequence constraints that serves as the input for computer programs. In the third step, computer programs are used to design RNA sequences that are compatible with the structure, and the resulting outputs are evaluated and converted into DNA sequences to order.
Fisher, Charles K.; Mehta, Pankaj
2014-01-01
Human associated microbial communities exert tremendous influence over human health and disease. With modern metagenomic sequencing methods it is now possible to follow the relative abundance of microbes in a community over time. These microbial communities exhibit rich ecological dynamics and an important goal of microbial ecology is to infer the ecological interactions between species directly from sequence data. Any algorithm for inferring ecological interactions must overcome three major obstacles: 1) a correlation between the abundances of two species does not imply that those species are interacting, 2) the sum constraint on the relative abundances obtained from metagenomic studies makes it difficult to infer the parameters in timeseries models, and 3) errors due to experimental uncertainty, or mis-assignment of sequencing reads into operational taxonomic units, bias inferences of species interactions due to a statistical problem called “errors-in-variables”. Here we introduce an approach, Learning Interactions from MIcrobial Time Series (LIMITS), that overcomes these obstacles. LIMITS uses sparse linear regression with bootstrap aggregation to infer a discrete-time Lotka-Volterra model for microbial dynamics. We tested LIMITS on synthetic data and showed that it could reliably infer the topology of the inter-species ecological interactions. We then used LIMITS to characterize the species interactions in the gut microbiomes of two individuals and found that the interaction networks varied significantly between individuals. Furthermore, we found that the interaction networks of the two individuals are dominated by distinct “keystone species”, Bacteroides fragilis and Bacteroides stercoris, that have a disproportionate influence on the structure of the gut microbiome even though they are only found in moderate abundance. 
Based on our results, we hypothesize that the abundances of certain keystone species may be responsible for individuality in the human gut microbiome. PMID:25054627
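The regression idea behind a discrete-time Lotka-Volterra fit can be sketched as follows. This is a simplified illustration, not the LIMITS implementation: it assumes the model form x_i(t+1) = x_i(t) exp(sum_j c_ij (x_j(t) - m_j)), uses ordinary least squares in place of the sparse variable-selection step, and takes the median over bootstrap resamples as the aggregation rule. All function and parameter names are hypothetical.

```python
import numpy as np


def simulate_dlv(C, x0, mean, steps, rng, noise=0.0):
    """Simulate x_i(t+1) = x_i(t) * exp(sum_j C_ij (x_j(t) - mean_j) + noise term)."""
    X = np.empty((steps, len(x0)))
    X[0] = x0
    for t in range(steps - 1):
        eps = noise * rng.standard_normal(len(x0))
        X[t + 1] = X[t] * np.exp(C @ (X[t] - mean) + eps)
    return X


def bagged_interactions(X, n_boot=50, rng=None):
    """Estimate the interaction matrix by least squares on bootstrap resamples.

    The log-ratio ln x_i(t+1) - ln x_i(t) is linear in the abundances x_j(t),
    so each resample yields one estimate; the element-wise median is returned.
    (Assumption: plain least squares stands in for sparse regression here.)
    """
    rng = np.random.default_rng(0) if rng is None else rng
    Y = np.log(X[1:]) - np.log(X[:-1])                   # responses, shape (T-1, n)
    A = np.column_stack([X[:-1], np.ones(len(X) - 1)])   # abundances plus intercept
    T = len(Y)
    fits = []
    for _ in range(n_boot):
        idx = rng.integers(0, T, size=T)                 # resample time points with replacement
        coef, *_ = np.linalg.lstsq(A[idx], Y[idx], rcond=None)
        fits.append(coef[:-1].T)                         # row i, column j: effect of j on i
    return np.median(fits, axis=0)


# Noise-free toy example with two species (synthetic values, not data from the study).
rng = np.random.default_rng(1)
C_true = np.array([[-0.5, -0.3], [0.3, -0.5]])
X = simulate_dlv(C_true, np.array([0.5, 1.5]), np.array([1.0, 1.0]), 20, rng)
C_est = bagged_interactions(X, n_boot=50, rng=rng)
```

With relative abundances or noisy reads, the obstacles listed in the abstract (sum constraint, errors-in-variables) would need handling that this sketch deliberately leaves out.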
Functional RNA structures throughout the Hepatitis C Virus genome.
Adams, Rebecca L; Pirakitikulr, Nathan; Pyle, Anna Marie
2017-06-01
The single-stranded Hepatitis C Virus (HCV) genome adopts a set of elaborate RNA structures that are involved in every stage of the viral lifecycle. Recent advances in chemical probing, sequencing, and structural biology have facilitated analysis of RNA folding on a genome-wide scale, revealing novel structures and networks of interactions. These studies have underscored the active role played by RNA in every function of HCV and they open the door to new types of RNA-targeted therapeutics. Copyright © 2017 Elsevier B.V. All rights reserved.
Structural principles within the human-virus protein-protein interaction network
Franzosa, Eric A.; Xia, Yu
2011-01-01
General properties of the antagonistic biomolecular interactions between viruses and their hosts (exogenous interactions) remain poorly understood, and may differ significantly from known principles governing the cooperative interactions within the host (endogenous interactions). Systems biology approaches have been applied to study the combined interaction networks of virus and human proteins, but such efforts have so far revealed only low-resolution patterns of host-virus interaction. Here, we layer curated and predicted 3D structural models of human-virus and human-human protein complexes on top of traditional interaction networks to reconstruct the human-virus structural interaction network. This approach reveals atomic resolution, mechanistic patterns of host-virus interaction, and facilitates systematic comparison with the host’s endogenous interactions. We find that exogenous interfaces tend to overlap with and mimic endogenous interfaces, thereby competing with endogenous binding partners. The endogenous interfaces mimicked by viral proteins tend to participate in multiple endogenous interactions which are transient and regulatory in nature. While interface overlap in the endogenous network results largely from gene duplication followed by divergent evolution, viral proteins frequently achieve interface mimicry without any sequence or structural similarity to an endogenous binding partner. Finally, while endogenous interfaces tend to evolve more slowly than the rest of the protein surface, exogenous interfaces—including many sites of endogenous-exogenous overlap—tend to evolve faster, consistent with an evolutionary “arms race” between host and pathogen. These significant biophysical, functional, and evolutionary differences between host-pathogen and within-host protein-protein interactions highlight the distinct consequences of antagonism versus cooperation in biological networks. PMID:21680884
Miras, Manuel; Rodríguez-Hernández, Ana M; Romero-López, Cristina; Berzal-Herranz, Alfredo; Colchero, Jaime; Aranda, Miguel A; Truniger, Verónica
2018-01-01
In eukaryotes, the formation of a 5'-cap and 3'-poly(A) dependent protein-protein bridge is required for translation of its mRNAs. In contrast, several plant virus RNA genomes lack both of these mRNA features, but instead have a 3'-CITE (for cap-independent translation enhancer), a RNA element present in their 3'-untranslated region that recruits translation initiation factors and is able to control its cap-independent translation. For several 3'-CITEs, direct RNA-RNA long-distance interactions based on sequence complementarity between the 5'- and 3'-ends are required for efficient translation, as they bring the translation initiation factors bound to the 3'-CITE to the 5'-end. For the carmovirus melon necrotic spot virus (MNSV), a 3'-CITE has been identified, and the presence of its 5'-end in cis has been shown to be required for its activity. Here, we analyze the secondary structure of the 5'-end of the MNSV RNA genome and identify two highly conserved nucleotide sequence stretches that are complementary to the apical loop of its 3'-CITE. In in vivo cap-independent translation assays with mutant constructs, by disrupting and restoring sequence complementarity, we show that the interaction between the 3'-CITE and at least one complementary sequence in the 5'-end is essential for virus RNA translation, although efficient virus translation and multiplication requires both connections. The complementary sequence stretches are invariant in all MNSV isolates, suggesting that the dual 5'-3' RNA:RNA interactions are required for optimal MNSV cap-independent translation and multiplication.
A coevolution analysis for identifying protein-protein interactions by Fourier transform
Yin, Changchuan; Yau, Stephen S. -T.
2017-01-01
Protein-protein interactions (PPIs) play key roles in life processes such as signal transduction, transcription regulation, and immune response. Identification of PPIs enables better understanding of the functional networks within a cell. Common experimental methods for identifying PPIs are time consuming and expensive. However, recent developments in computational approaches for inferring PPIs from protein sequences based on coevolution theory avoid these problems. In the coevolution theory model, interacting proteins may show coevolutionary mutations and have similar phylogenetic trees. The existing coevolution methods depend on multiple sequence alignments (MSA); however, the MSA-based coevolution methods often produce many false positive interactions. In this paper, we present a computational method using an alignment-free approach to accurately detect PPIs and reduce false positives. In the method, protein sequences are numerically represented by biochemical properties of amino acids, which reflect the structural and functional differences of proteins. Fourier transform is applied to the numerical representation of protein sequences to capture the dissimilarities of protein sequences in biophysical context. The method is assessed for predicting PPIs in Ebola virus. The results indicate strong coevolution between the protein pairs (NP-VP24, NP-VP30, NP-VP40, VP24-VP30, VP24-VP40, and VP30-VP40). The method is also validated for PPIs in influenza and E. coli genomes. Since our method can reduce false positives and increase the specificity of PPI prediction, it offers an effective tool to understand mechanisms of disease pathogens and find potential targets for drug design. The Python programs in this study are publicly available at https://github.com/cyinbox/PPI. PMID:28430779
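The alignment-free idea described above (numerical encoding followed by a Fourier transform) can be sketched as below. This is not the code from the authors' repository. It assumes the Kyte-Doolittle hydropathy scale as the biochemical property, zero-padding to a common length `n` so that sequences of different lengths are comparable, and a Euclidean distance between normalized power spectra; each of these is an illustrative choice.

```python
import numpy as np

# Kyte-Doolittle hydropathy index (assumed property scale for this sketch).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def power_spectrum(seq, n=64):
    """Encode a protein sequence numerically and return its normalized power spectrum.

    The signal is mean-centered, then zero-padded to length n by the FFT
    (sequences longer than n are truncated).
    """
    x = np.array([KD[aa] for aa in seq.upper()], dtype=float)
    x -= x.mean()
    spec = np.abs(np.fft.rfft(x, n=n)) ** 2
    total = spec.sum()
    return spec / total if total > 0 else spec


def spectral_distance(seq_a, seq_b, n=64):
    """Euclidean distance between the normalized power spectra of two sequences."""
    return float(np.linalg.norm(power_spectrum(seq_a, n) - power_spectrum(seq_b, n)))


# Toy peptides, chosen only to exercise the functions.
d_same = spectral_distance("MKTAYIAKQR", "MKTAYIAKQR")
d_diff = spectral_distance("MKTAYIAKQR", "GGGGSGGGGS")
```

A coevolution analysis would compare such distances across orthologs from many species rather than between two single peptides.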
High-resolution structures of a heterochiral coiled coil
Mortenson, David E.; Steinkruger, Jay D.; Kreitler, Dale F.; ...
2015-10-12
Interactions between polypeptide chains containing amino acid residues with opposite absolute configurations have long been a source of interest and speculation, but there is very little structural information for such heterochiral associations. The need to address this lacuna has grown in recent years because of increasing interest in the use of peptides generated from D amino acids (D peptides) as specific ligands for natural proteins, e.g., to inhibit deleterious protein–protein interactions. Coiled–coil interactions, between or among α-helices, represent the most common tertiary and quaternary packing motif in proteins. Heterochiral coiled–coil interactions were predicted over 50 years ago by Crick, and limited experimental data obtained in solution suggest that such interactions can indeed occur. To address the dearth of atomic-level structural characterization of heterochiral helix pairings, we report in this paper two independent crystal structures that elucidate coiled-coil packing between L- and D-peptide helices. Both structures resulted from racemic crystallization of a peptide corresponding to the transmembrane segment of the influenza M2 protein. Networks of canonical knobs-into-holes side-chain packing interactions are observed at each helical interface. Finally, however, the underlying patterns for these heterochiral coiled coils seem to deviate from the heptad sequence repeat that is characteristic of most homochiral analogs, with an apparent preference for a hendecad repeat pattern.
2014-01-01
Background Osteopontin (Eta, secreted sialoprotein 1, opn) is secreted from different cell types including cancer cells. Three splice variant forms, namely osteopontin-a, osteopontin-b and osteopontin-c, have been identified. The most striking feature is that osteopontin-c is found to be elevated in almost all types of cancer cells. This was the main reason to consider it for sequence analysis and structure prediction, which provide ample opportunities for prognostic, therapeutic and preventive cancer research. Methods The osteopontin-c gene sequence was determined from a breast cancer sample and was translated to protein sequence. It was then analyzed using various software and web tools for binding pockets, docking and druggability analysis. Due to the lack of homologous templates, the tertiary structure was predicted using an ab-initio method server, I-TASSER, and was evaluated after refinement using web tools. The refined structure was compared with the known bone sialoprotein electron microscopic structure and docked with CD44 for binding analysis, and binding pockets were identified for drug design. Results A signal sequence of about sixteen amino acid residues was identified using signal sequence prediction servers. Due to the absence of known structures of similar proteins, the three-dimensional structure of osteopontin-c was predicted using the I-TASSER server. The predicted structure was refined with the help of the SUMMA server and was validated using the SAVES server. Molecular dynamics analysis was carried out using GROMACS software. The final model was built and was used for docking with CD44. Druggable pockets were identified using pocket energies. Conclusions The tertiary structure of osteopontin-c was predicted successfully using the ab-initio method and the predictions showed that osteopontin-c is of fibrous nature comparable to fibronectin. 
Docking studies showed the significant similarities of QSAET motif in the interaction of CD44 and osteopontins between the normal and splice variant forms of osteopontins and binding pockets analyses revealed several pockets which paved the way to the identification of a druggable pocket. PMID:24401206
Propensities of peptides containing the Asn-Gly segment to form β-turn and β-hairpin structures.
Kang, Young Kee; Yoo, In Kee
2016-09-01
The propensities of peptides that contain the Asn-Gly segment to form β-turn and β-hairpin structures were explored using the density functional methods and the implicit solvation model in CH2Cl2 and water. The populations of preferred β-turn structures varied depending on the sequence and solvent polarity. In solution, β-hairpin structures with βI' turn motifs were most preferred for the heptapeptides containing the Asn-Gly segment regardless of the sequence of the strands. These preferences in solution are consistent with the corresponding X-ray structures. The sequence, H-bond strengths, solvent polarity, and conformational flexibility appeared to interact to determine the preferred β-hairpin structure of each heptapeptide, although the β-turn segments played a role in promoting the formation of β-hairpin structures and the β-hairpin propensity varied. In the heptapeptides containing the Asn-Gly segment, the β-hairpin formation was enthalpically favored and entropically disfavored at 25°C in water. The calculated results for β-turns and β-hairpins containing the Asn-Gly segment imply that these structural preferences may be useful for the design of bioactive macrocyclic peptides containing β-hairpin mimics and the design of binding epitopes for protein-protein and protein-nucleic acid recognitions. © 2016 Wiley Periodicals, Inc. Biopolymers 105: 653-664, 2016.
Coupling detrended fluctuation analysis for multiple warehouse-out behavioral sequences
NASA Astrophysics Data System (ADS)
Yao, Can-Zhong; Lin, Ji-Nan; Zheng, Xu-Zhou
2017-01-01
Interaction patterns among different warehouses could make the warehouse-out behavioral sequences less predictable. We first apply coupling detrended fluctuation analysis to the warehouse-out quantity, and find that the multivariate sequences exhibit significant coupling multifractal characteristics regardless of the type of steel product. Secondly, we track the sources of multifractality in the warehouse-out sequences by shuffling and surrogating the original ones, and we find that the fat-tailed distribution contributes more to the multifractal features than the long-term memory, again regardless of the type of steel product. From the perspective of warehouse contribution, some warehouses steadily contribute more to multifractality than others. Finally, based on multiscale multifractal analysis, we propose a Hurst surface structure to investigate coupling multifractality, and show that multiple behavioral sequences exhibit significant coupling multifractal features that emerge in, and are usually restricted to, a relatively large time-scale interval.
Sequence specificity of single-stranded DNA-binding proteins: a novel DNA microarray approach
Morgan, Hugh P.; Estibeiro, Peter; Wear, Martin A.; Max, Klaas E.A.; Heinemann, Udo; Cubeddu, Liza; Gallagher, Maurice P.; Sadler, Peter J.; Walkinshaw, Malcolm D.
2007-01-01
We have developed a novel DNA microarray-based approach for identification of the sequence-specificity of single-stranded nucleic-acid-binding proteins (SNABPs). For verification, we have shown that the major cold shock protein (CspB) from Bacillus subtilis binds with high affinity to pyrimidine-rich sequences, with a binding preference for the consensus sequence, 5′-GTCTTTG/T-3′. The sequence was modelled onto the known structure of CspB and a cytosine-binding pocket was identified, which explains the strong preference for a cytosine base at position 3. This microarray method offers a rapid high-throughput approach for determining the specificity and strength of ss DNA–protein interactions. Further screening of this newly emerging family of transcription factors will help provide an insight into their cellular function. PMID:17488853
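The consensus reported above, 5′-GTCTTTG/T-3′, fixes six positions and allows G or T at the seventh, which makes it easy to scan candidate single-stranded sequences for matching sites. The following sketch does that with a regular expression; the function name and the probe sequence are hypothetical and not taken from the study.

```python
import re

# Consensus from the abstract: GTCTTT followed by G or T.
# The lookahead lets overlapping occurrences be reported as well.
CONSENSUS = re.compile(r"(?=(GTCTTT[GT]))")


def find_consensus_sites(ssdna):
    """Return (1-based start, matched heptamer) for each consensus occurrence."""
    return [(m.start() + 1, m.group(1))
            for m in CONSENSUS.finditer(ssdna.upper())]


# Hypothetical probe sequence used only to exercise the function.
probe = "aaGTCTTTGccGTCTTTTaa"
print(find_consensus_sites(probe))  # [(3, 'GTCTTTG'), (12, 'GTCTTTT')]
```

A position-weight matrix derived from the microarray intensities would capture graded preferences better than an exact-match pattern, but the abstract does not give those values.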
Nandi, Sandip Kumar; Chakraborty, Ayon; Panda, Alok Kumar; Ray, Sougata Sinha; Kar, Rajiv Kumar; Bhunia, Anirban; Biswas, Ashis
2015-03-01
Adenosine-5'-triphosphate (ATP) is an important phosphate metabolite abundantly found in Mycobacterium leprae bacilli. This pathogen does not derive ATP from its host but has its own mechanism for the generation of ATP. Interestingly, this molecule as well as several antigenic proteins act as bio-markers for the detection of leprosy. One such bio-marker is the 18 kDa antigen. This 18 kDa antigen is a small heat shock protein (HSP18) whose molecular chaperone function is believed to help in the growth and survival of the pathogen. However, no evidence of interaction of ATP with HSP18, or of its effect on the structure and chaperone function of HSP18, is available in the literature. Here, we report for the first time evidence of "HSP18-ATP" interaction and its consequences on the structure and chaperone function of HSP18. A TNP-ATP binding experiment and surface plasmon resonance measurement showed that HSP18 interacts with ATP with a sub-micromolar binding affinity. Comparative sequence alignment between M. leprae HSP18 and αB-crystallin identified the sequence 49KADSLDIDIE58 of HSP18 as the Walker-B ATP binding motif. Molecular docking studies revealed the β4-β8 groove/strands as an ATP-interactive region in M. leprae HSP18. ATP perturbs the tertiary structure of HSP18 mildly and makes it less susceptible to tryptic cleavage. ATP triggers exposure of additional hydrophobic patches at the surface of HSP18 and induces more stability against chemical and thermal denaturation. In vitro aggregation and thermal inactivation assays clearly revealed that ATP enhances the chaperone function of HSP18. Our studies also revealed that the alteration in the chaperone function of HSP18 is reversible and is independent of ATP hydrolysis. As the availability and binding of ATP to HSP18 regulates its chaperone function, this functional inflection may play an important role in the survival of M. leprae in hosts.
RNA2DMut: a web tool for the design and analysis of RNA structure mutations.
Moss, Walter N
2018-03-01
With the widespread application of high-throughput sequencing, novel RNA sequences are being discovered at an astonishing rate. The analysis of function, however, lags behind. In both the cis - and trans -regulatory functions of RNA, secondary structure (2D base-pairing) plays essential regulatory roles. In order to test RNA function, it is essential to be able to design and analyze mutations that can affect structure. This was the motivation for the creation of the RNA2DMut web tool. With RNA2DMut, users can enter in RNA sequences to analyze, constrain mutations to specific residues, or limit changes to purines/pyrimidines. The sequence is analyzed at each base to determine the effect of every possible point mutation on 2D structure. The metrics used in RNA2DMut rely on the calculation of the Boltzmann structure ensemble and do not require a robust 2D model of RNA structure for designing mutations. This tool can facilitate a wide array of uses involving RNA: for example, in designing and evaluating mutants for biological assays, interrogating RNA-protein interactions, identifying key regions to alter in SELEX experiments, and improving RNA folding and crystallization properties for structural biology. Additional tools are available to help users introduce other mutations (e.g., indels and substitutions) and evaluate their effects on RNA structure. Example calculations are shown for five RNAs that require 2D structure for their function: the MALAT1 mascRNA, an influenza virus splicing regulatory motif, the EBER2 viral noncoding RNA, the Xist lncRNA repA region, and human Y RNA 5. RNA2DMut can be accessed at https://rna2dmut.bb.iastate.edu/. © 2018 Moss; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
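The first step described above, enumerating every possible point mutation while optionally restricting the positions or keeping each change within the purine or pyrimidine class, can be sketched as below. This is an illustration only, not RNA2DMut's code: the function name, the 1-based position convention and the `keep_base_class` flag are assumptions, and the Boltzmann-ensemble scoring of each mutant (which needs a folding library) is deliberately left out.

```python
PURINES = {"A", "G"}


def point_mutants(seq, positions=None, keep_base_class=False):
    """Yield (label, mutant_sequence) for single-nucleotide substitutions.

    positions: optional iterable of 1-based positions to mutate
               (default: every position).
    keep_base_class: if True, allow only purine->purine and
               pyrimidine->pyrimidine changes.
    """
    seq = seq.upper()
    allowed = range(1, len(seq) + 1) if positions is None else positions
    for pos in allowed:
        ref = seq[pos - 1]
        for alt in "ACGU":
            if alt == ref:
                continue
            if keep_base_class and ((ref in PURINES) != (alt in PURINES)):
                continue
            yield f"{ref}{pos}{alt}", seq[:pos - 1] + alt + seq[pos:]


for label, mutant in point_mutants("GAUC", positions=[2]):
    print(label, mutant)
```

Each mutant sequence produced this way would then be passed to a partition-function calculation and compared with the wild type; which metric to compare is a design choice the abstract does not specify in detail.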
Predicting Turns in Proteins with a Unified Model
Song, Qi; Li, Tonghua; Cong, Peisheng; Sun, Jiangming; Li, Dapeng; Tang, Shengnan
2012-01-01
Motivation Turns are a critical element of the structure of a protein; turns play a crucial role in loops, folds, and interactions. Current prediction methods are well developed for the prediction of individual turn types, including α-turn, β-turn, and γ-turn, etc. However, for further protein structure and function prediction it is necessary to develop a uniform model that can accurately predict all types of turns simultaneously. Results In this study, we present a novel approach, TurnP, which offers the ability to investigate all the turns in a protein based on a unified model. The main characteristics of TurnP are: (i) using newly exploited features of structural evolution information (secondary structure and shape string of protein) based on structure homologies, (ii) considering all types of turns in a unified model, and (iii) practical capability of accurate prediction of all turns simultaneously for a query. TurnP utilizes predicted secondary structures and predicted shape strings, both of which have greater accuracy, based on innovative technologies which were both developed by our group. Then, sequence and structural evolution features, which are profile of sequence, profile of secondary structures and profile of shape strings are generated by sequence and structure alignment. When TurnP was validated on a non-redundant dataset (4,107 entries) by five-fold cross-validation, we achieved an accuracy of 88.8% and a sensitivity of 71.8%, which exceeded the most state-of-the-art predictors of certain type of turn. Newly determined sequences, the EVA and CASP9 datasets were used as independent tests and the results we achieved were outstanding for turn predictions and confirmed the good performance of TurnP for practical applications. PMID:23144872
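The accuracy and sensitivity quoted above are standard confusion-matrix quantities for a binary turn/non-turn labelling. As a reminder of how they relate, here is a small sketch using the usual definitions; the counts in the usage lines are hypothetical and are not the TurnP results.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all residues whose label (turn / non-turn) is correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp, fn):
    """Fraction of true turn residues that were predicted as turns."""
    return tp / (tp + fn)


# Hypothetical counts for illustration only.
print(accuracy(tp=40, tn=45, fp=5, fn=10))
print(sensitivity(tp=40, fn=10))
```

Because non-turn residues usually outnumber turn residues, accuracy can stay high while sensitivity is modest, which is consistent with the gap between the two figures reported in the abstract.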
The Influence of Task-Irrelevant Music on Language Processing: Syntactic and Semantic Structures
Hoch, Lisianne; Poulin-Charronnat, Benedicte; Tillmann, Barbara
2011-01-01
Recent research has suggested that music and language processing share neural resources, leading to new hypotheses about interference in the simultaneous processing of these two structures. The present study investigated the effect of a musical chord's tonal function on syntactic processing (Experiment 1) and semantic processing (Experiment 2) using a cross-modal paradigm and controlling for acoustic differences. Participants read sentences and performed a lexical decision task on the last word, which was, syntactically or semantically, expected or unexpected. The simultaneously presented (task-irrelevant) musical sequences ended on either an expected tonic or a less-expected subdominant chord. Experiment 1 revealed interactive effects between music-syntactic and linguistic-syntactic processing. Experiment 2 showed only main effects of both music-syntactic and linguistic-semantic expectations. An additional analysis over the two experiments revealed that linguistic violations interacted with musical violations, though not differently as a function of the type of linguistic violations. The present findings were discussed in light of currently available data on the processing of music as well as of syntax and semantics in language, leading to the hypothesis that resources might be shared for structural integration processes and sequencing. PMID:21713122
High taxonomic variability despite stable functional structure across microbial communities.
Louca, Stilianos; Jacques, Saulo M S; Pires, Aliny P F; Leal, Juliana S; Srivastava, Diane S; Parfrey, Laura Wegener; Farjalla, Vinicius F; Doebeli, Michael
2016-12-05
Understanding the processes that are driving variation of natural microbial communities across space or time is a major challenge for ecologists. Environmental conditions strongly shape the metabolic function of microbial communities; however, other processes such as biotic interactions, random demographic drift or dispersal limitation may also influence community dynamics. The relative importance of these processes and their effects on community function remain largely unknown. To address this uncertainty, here we examined bacterial and archaeal communities in replicate 'miniature' aquatic ecosystems contained within the foliage of wild bromeliads. We used marker gene sequencing to infer the taxonomic composition within nine metabolic functional groups, and shotgun environmental DNA sequencing to estimate the relative abundances of these groups. We found that all of the bromeliads exhibited remarkably similar functional community structures, but that the taxonomic composition within individual functional groups was highly variable. Furthermore, using statistical analyses, we found that non-neutral processes, including environmental filtering and potentially biotic interactions, at least partly shaped the composition within functional groups and were more important than spatial dispersal limitation and demographic drift. Hence both the functional structure and taxonomic composition within functional groups of natural microbial communities may be shaped by non-neutral and roughly separate processes.
Saturation scanning of ubiquitin variants reveals a common hot spot for binding to USP2 and USP21.
Leung, Isabel; Dekel, Ayelet; Shifman, Julia M; Sidhu, Sachdev S
2016-08-02
A detailed understanding of the molecular mechanisms whereby ubiquitin (Ub) recognizes enzymes in the Ub proteasome system is crucial for understanding the biological function of Ub. Many structures of Ub complexes have been solved and, in most cases, reveal a large structural epitope on a common face of the Ub molecule. However, owing to the generally weak nature of these interactions, it has been difficult to map in detail the functional contributions of individual Ub side chains to affinity and specificity. Here we took advantage of Ub variants (Ubvs) that bind tightly to particular Ub-specific proteases (USPs) and used phage display and saturation scanning mutagenesis to comprehensively map functional epitopes within the structural epitopes. We found that Ubvs that bind to USP2 or USP21 contain a remarkably similar core functional epitope, or "hot spot," consisting mainly of positions that are conserved as the wild type sequence, but also some positions that prefer mutant sequences. The Ubv core functional epitope contacts residues that are conserved in the human USP family, and thus it is likely important for the interactions of Ub across many family members.
Feinauer, Christoph; Procaccini, Andrea; Zecchina, Riccardo; Weigt, Martin; Pagnani, Andrea
2014-01-01
In the course of evolution, proteins show a remarkable conservation of their three-dimensional structure and their biological function, leading to strong evolutionary constraints on the sequence variability between homologous proteins. Our method aims at extracting such constraints from rapidly accumulating sequence data, and thereby at inferring protein structure and function from sequence information alone. Recently, global statistical inference methods (e.g. direct-coupling analysis, sparse inverse covariance estimation) have achieved a breakthrough towards this aim, and their predictions have been successfully implemented into tertiary and quaternary protein structure prediction methods. However, due to the discrete nature of the underlying variable (amino-acids), exact inference requires exponential time in the protein length, and efficient approximations are needed for practical applicability. Here we propose a very efficient multivariate Gaussian modeling approach as a variant of direct-coupling analysis: the discrete amino-acid variables are replaced by continuous Gaussian random variables. The resulting statistical inference problem is efficiently and exactly solvable. We show that the quality of inference is comparable or superior to the one achieved by mean-field approximations to inference with discrete variables, as done by direct-coupling analysis. This is true for (i) the prediction of residue-residue contacts in proteins, and (ii) the identification of protein-protein interaction partner in bacterial signal transduction. An implementation of our multivariate Gaussian approach is available at the website http://areeweb.polito.it/ricerca/cmp/code. PMID:24663061
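The reason the Gaussian relaxation is exactly solvable can be written down in a few lines. The notation below (one-hot encoded alignment columns collected in a vector x, empirical mean μ and covariance C, precision matrix J) is introduced here for illustration and may differ from the paper's; the Frobenius-norm score in the last line is one common choice and is an assumption of this sketch.

```latex
% Multivariate Gaussian model over the encoded alignment vector x
P(x \mid \mu, C) = \frac{1}{\sqrt{(2\pi)^{n} \det C}}
  \exp\!\Big( -\tfrac{1}{2} (x - \mu)^{\mathsf T} C^{-1} (x - \mu) \Big)

% Maximum-likelihood parameters from M aligned sequences x^{(1)}, \dots, x^{(M)}
\mu = \frac{1}{M} \sum_{m=1}^{M} x^{(m)}, \qquad
C = \frac{1}{M} \sum_{m=1}^{M} (x^{(m)} - \mu)(x^{(m)} - \mu)^{\mathsf T}

% Direct couplings between sites i and j: off-diagonal blocks of the precision matrix
J = C^{-1}, \qquad e_{ij}(a, b) = -J_{ij}(a, b)

% One possible scalar contact score per site pair
S_{ij} = \Big( \sum_{a, b} e_{ij}(a, b)^{2} \Big)^{1/2}
```

Inference therefore reduces to a single matrix inversion. In practice the empirical covariance of an alignment is rank deficient, so some regularization (pseudocounts or a prior) is needed before inverting.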
Functional display of platelet-binding VWF fragments on filamentous bacteriophage.
Yee, Andrew; Tan, Fen-Lai; Ginsburg, David
2013-01-01
von Willebrand factor (VWF) tethers platelets to sites of vascular injury via interaction with the platelet surface receptor, GPIb. To further define the VWF sequences required for VWF-platelet interaction, a phage library displaying random VWF protein fragments was screened against formalin-fixed platelets. After 3 rounds of affinity selection, DNA sequencing of platelet-bound clones identified VWF peptides mapping exclusively to the A1 domain. Aligning these sequences defined a minimal, overlapping segment spanning P1254-A1461, which encompasses the C1272-C1458 cystine loop. Analysis of phage carrying a mutated A1 segment (C1272/1458A) confirmed the requirement of the cystine loop for optimal binding. Four rounds of affinity maturation of a randomly mutagenized A1 phage library identified 10 and 14 unique mutants associated with enhanced platelet binding in the presence and absence of botrocetin, respectively, with 2 mutants (S1370G and I1372V) common to both conditions. These results demonstrate the utility of filamentous phage for studying VWF protein structure-function and identify a minimal, contiguous peptide that binds to formalin-fixed platelets, confirming the importance of the VWF A1 domain with no evidence for another independently platelet-binding segment within VWF. These findings also point to key structural elements within the A1 domain that regulate VWF-platelet adhesion.
Benyo, B; Biro, J C; Benyo, Z
2004-01-01
The theory of "codon-amino acid coevolution" was first proposed by Woese in 1967. It suggests that there is a stereochemical matching - that is, affinity - between amino acids and certain of the base triplet sequences that code for those amino acids. We have constructed a common periodic table of codons and amino acids, where the nucleic acid table showed perfect axial symmetry for codons and the corresponding amino acid table also displayed periodicity regarding the biochemical properties (charge and hydrophobicity) of the 20 amino acids and the position of the stop signals. The table indicates that the middle (2nd) amino acid in the codon has a prominent role in determining some of the structural features of the amino acids. The possibility that physical contact between codons and amino acids might exist was tested on restriction enzymes. Many recognition site-like sequences were found in the coding sequences of these enzymes and as many as 73 examples of codon-amino acid co-location were observed in the 7 known 3D structures (December 2003) of endonuclease-nucleic acid complexes. These results indicate that the smallest possible units of specific nucleic acid-protein interaction are indeed the stereochemically compatible codons and amino acids.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jiang, Yuanyuan; Wang, Shuo; Holcomb, Joshua
2014-04-04
Highlights: • CXCR2–NHERF1–PLCβ3 complex regulates CXCR2 signaling in pancreatic cancer. • The crystal structure of the NHERF1 PDZ1 domain in complex with PLCβ3. • The structure reveals specificity determinants of PDZ1–PLCβ3 interaction. • Endogenous PLCβ3 in pancreatic cancer cells interacts with both PDZ1 and PDZ2. • Structural basis of the PDZ1–PLCβ3 interaction is valuable in selective drug design. - Abstract: The formation of CXCR2–NHERF1–PLCβ3 macromolecular complex in pancreatic cancer cells regulates CXCR2 signaling activity and plays an important role in tumor proliferation and invasion. We previously have shown that disruption of the NHERF1-mediated CXCR2–PLCβ3 interaction abolishes the CXCR2 signaling cascade and inhibits pancreatic tumor growth in vitro and in vivo. Here we report the crystal structure of the NHERF1 PDZ1 domain in complex with the C-terminal PLCβ3 sequence. The structure reveals that the PDZ1–PLCβ3 binding specificity is achieved by numerous hydrogen bonds and hydrophobic contacts with the last four PLCβ3 residues contributing to specific interactions. We also show that PLCβ3 can bind both NHERF1 PDZ1 and PDZ2 in pancreatic cancer cells, consistent with the observation that the peptide binding pockets of these PDZ domains are highly structurally conserved. This study provides an understanding of the structural basis for the PDZ-mediated NHERF1–PLCβ3 interaction that could prove valuable in selective drug design against CXCR2-related cancers.
Cheng, Yuan; Koh, Leng-Duei; Wang, Fan; Li, Dechang; Ji, Baohua; Yeo, Jingjie; Guan, Guijian; Han, Ming-Yong; Zhang, Yong-Wei
2017-07-06
Hybrid structures of nanomaterials (e.g. tubes, scrolls, threads, cages) and biomaterials (e.g. proteins) hold tremendous potential for applications as drug carriers, biosensors, tissue scaffolds, cancer therapeutic agents, etc. However, in many cases, the interacting forces at the nano-bio interfaces and their roles in controlling the structures and dynamics of nano-bio-hybrid systems are very complicated but poorly understood. In this study, we investigate the structure and mechanical behavior of a protein-based hybrid structure, i.e., a carbon nanoscroll (CNS)-silk crystallite with a hydration level controllable by an interlayer interaction in CNS. Our findings demonstrate that CNS with a reduced core size not only shields the crystallite from a weakening effect of water, but also markedly strengthens the crystallite. Besides water shielding, the enhanced strength arises from an enhanced interaction between the crystallite and CNS due to the enhanced interlayer interaction in CNS. In addition, the interfacial strength for pulling the crystallite out of the CNS-silk structure is found to be dependent on both the interlayer interaction energy in CNS as well as the sequence of protein at the CNS-silk interface. The present study is of significant value in designing drugs or protein delivery vehicles for biomedical applications, and serves as a general guide in designing novel devices based on rolled-up configurations of two-dimensional (2D) materials.
Tree-Structured Digital Organisms Model
NASA Astrophysics Data System (ADS)
Suzuki, Teruhiko; Nobesawa, Shiho; Tahara, Ikuo
Tierra and Avida are well-known models of digital organisms. They describe a life process as a sequence of computation codes. A linear sequence model may not be the only way to describe a digital organism, though it is very simple for a computer-based model. Thus we propose a new digital organism model based on a tree structure, which is rather similar to genetic programming. With our model, a life process is a combination of various functions, as life in the real world is. This implies that our model can easily describe the hierarchical structure of life, and it can simulate evolutionary computation through mutual interaction of functions. We verified by simulation that our model can be regarded as a digital organism model according to its definitions. Our model even succeeded in creating species such as viruses and parasites.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hiraiwa, Akikazu; Yamanaka, Katsuo; Kwok, W.W.
Although HLA genes have been shown to be associated with certain diseases, the basis for this association is unknown. Recent studies, however, have documented patterns of nucleotide sequence variation among some HLA genes associated with a particular disease. For rheumatoid arthritis, HLA genes in most patients have a shared nucleotide sequence encoding a key structural element of an HLA class II polypeptide; this sequence element is critical for the interaction of the HLA molecule with antigenic peptides and with responding T cells, suggestive of a direct role for this sequence element in disease susceptibility. The authors describe the serological and cellular immunologic characteristics encoded by this rheumatoid arthritis-associated sequence element. Site-directed mutagenesis of the DRB1 gene was used to define amino acids critical for antibody and T-cell recognition of this structural element, focusing on residues that distinguish the rheumatoid arthritis-associated alleles Dw4 and Dw14 from a closely related allele, Dw10, not associated with disease. Both the gain and loss of rheumatoid arthritis-associated epitopes were highly dependent on three residues within a discrete domain of the HLA-DR molecule. Recognition was most strongly influenced by the following amino acids (in order): 70 > 71 > 67. Some alloreactive T-cell clones were also influenced by amino acid variation in portions of the DR molecule lying outside the shared sequence element.
Wang, Lei; You, Zhu-Hong; Chen, Xing; Yan, Xin; Liu, Gang; Zhang, Wei
2018-01-01
Identification of interactions between drugs and target proteins plays an important role in discovering new drug candidates. However, identifying drug-target interactions experimentally remains extremely time-consuming, expensive and challenging. Therefore, it is urgent to develop new computational methods to predict potential drug-target interactions (DTI). In this article, a novel computational model is developed for predicting potential drug-target interactions under the theory that each drug-target interaction pair can be represented by the structural properties of drugs and evolutionary information derived from proteins. Specifically, the protein sequences are encoded as a Position-Specific Scoring Matrix (PSSM) descriptor, which contains biological evolutionary information, and the drug molecules are encoded as a fingerprint feature vector, which represents the existence of certain functional groups or fragments. Four benchmark datasets involving enzymes, ion channels, GPCRs and nuclear receptors are independently used for establishing predictive models with the Rotation Forest (RF) model. The proposed method achieved prediction accuracies of 91.3%, 89.1%, 84.1% and 71.1% for the four datasets respectively. In order to make our method more persuasive, we compared our classifier with the state-of-the-art Support Vector Machine (SVM) classifier. We also compared the proposed method with other excellent methods. Experimental results demonstrate that the proposed method is effective in the prediction of DTI, and can provide assistance for new drug research and development.
Universality Classes of Interaction Structures for NK Fitness Landscapes
NASA Astrophysics Data System (ADS)
Hwang, Sungmin; Schmiegelt, Benjamin; Ferretti, Luca; Krug, Joachim
2018-07-01
Kauffman's NK-model is a paradigmatic example of a class of stochastic models of genotypic fitness landscapes that aim to capture generic features of epistatic interactions in multilocus systems. Genotypes are represented as sequences of L binary loci. The fitness assigned to a genotype is a sum of contributions, each of which is a random function defined on a subset of k ≤ L loci. These subsets or neighborhoods determine the genetic interactions of the model. Whereas earlier work on the NK model suggested that most of its properties are robust with regard to the choice of neighborhoods, recent work has revealed an important and sometimes counter-intuitive influence of the interaction structure on the properties of NK fitness landscapes. Here we review these developments and present new results concerning the number of local fitness maxima and the statistics of selectively accessible (that is, fitness-monotonic) mutational pathways. In particular, we develop a unified framework for computing the exponential growth rate of the expected number of local fitness maxima as a function of L, and identify two different universality classes of interaction structures that display different asymptotics of this quantity for large k. Moreover, we show that the probability that the fitness landscape can be traversed along an accessible path decreases exponentially in L for a large class of interaction structures that we characterize as locally bounded. Finally, we discuss the impact of the NK interaction structures on the dynamics of evolution using adaptive walk models.
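The definitions in this abstract (binary genotypes of length L, fitness as a sum of random contributions each depending on k loci, and local maxima as genotypes fitter than all single-mutation neighbors) translate directly into a brute-force sketch. The adjacent cyclic neighborhoods, Gaussian contributions and function names below are choices made here for illustration; the paper analyses many other interaction structures, and exhaustive enumeration is only feasible for small L.

```python
import itertools
import random


def make_nk_landscape(L, k, seed=0):
    """Return a fitness function on binary tuples of length L.

    Assumption of this sketch: contribution i depends on the k cyclically
    adjacent loci i, i+1, ..., i+k-1, with values drawn from N(0, 1).
    """
    rng = random.Random(seed)
    neighborhoods = [tuple((i + j) % L for j in range(k)) for i in range(L)]
    tables = [
        {bits: rng.gauss(0.0, 1.0)
         for bits in itertools.product((0, 1), repeat=k)}
        for _ in range(L)
    ]

    def fitness(genotype):
        return sum(
            tables[i][tuple(genotype[j] for j in nb)]
            for i, nb in enumerate(neighborhoods)
        )

    return fitness


def count_local_maxima(fitness, L):
    """Count genotypes strictly fitter than all L single-locus mutants."""
    count = 0
    for g in itertools.product((0, 1), repeat=L):
        f = fitness(g)
        if all(fitness(g[:i] + (1 - g[i],) + g[i + 1:]) < f for i in range(L)):
            count += 1
    return count


for k in (1, 2, 4):
    print(k, count_local_maxima(make_nk_landscape(8, k, seed=1), 8))
```

With k = 1 the landscape is additive and has a single maximum, which gives a quick sanity check; larger k makes the landscape more rugged on average.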
modPDZpep: a web resource for structure based analysis of human PDZ-mediated interaction networks.
Sain, Neetu; Mohanty, Debasisa
2016-09-21
PDZ domains recognize short sequence stretches usually present at the C-terminus of their interaction partners. Because of the involvement of PDZ domains in many important biological processes, several attempts have been made to develop bioinformatics tools for genome-wide identification of PDZ interaction networks. Currently available tools for prediction of interaction partners of PDZ domains utilize a machine learning approach. Since they have been trained using experimental substrate specificity data for specific PDZ families, their applicability is limited to PDZ families closely related to the training set. These tools also do not allow analysis of PDZ-peptide interaction interfaces. We have used a structure based approach to develop modPDZpep, a program to predict the interaction partners of human PDZ domains and analyze structural details of PDZ interaction interfaces. modPDZpep predicts interaction partners by using structural models of PDZ-peptide complexes and evaluating binding energy scores using residue based statistical pair potentials. Since it does not require training using experimental data on peptide binding affinity, it can predict substrates for diverse PDZ families. Because it uses a simple scoring function for binding energy, it is also fast enough for genome scale structure based analysis of PDZ interaction networks. Benchmarking using artificial as well as real negative datasets indicates good predictive power with ROC-AUC values in the range of 0.7 to 0.9 for a large number of human PDZ domains. Another novel feature of modPDZpep is its ability to map novel PDZ mediated interactions in human protein-protein interaction networks, either by utilizing available experimental phage display data or by structure based predictions. In summary, we have developed modPDZpep, a web-server for structure based analysis of human PDZ domains. It is freely available at http://www.nii.ac.in/modPDZpep.html or http://202.54.226.235/modPDZpep.html.
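The general idea of scoring a modeled complex with a residue-based pair potential can be sketched as follows. This is a minimal illustration and not modPDZpep's actual scoring function: the contact list, the sequences and the potential values are all made-up assumptions, and the function name `binding_score` is hypothetical.

```python
def binding_score(contacts, domain_seq, peptide_seq, potential):
    """Sum pair-potential terms over residue pairs in contact.

    contacts    -- (domain_index, peptide_index) pairs taken from a
                   structural model of the complex (assumed given)
    potential   -- dict mapping an alphabetically sorted residue pair to an
                   energy-like score; pairs missing from the table count as 0
    """
    total = 0.0
    for d_idx, p_idx in contacts:
        pair = tuple(sorted((domain_seq[d_idx], peptide_seq[p_idx])))
        total += potential.get(pair, 0.0)
    return total


# Toy values for illustration only; real statistical potentials are derived
# from contact frequencies observed in known structures.
toy_potential = {("L", "V"): -1.0, ("D", "K"): -0.8, ("D", "E"): 0.6}
toy_contacts = [(1, 3), (2, 2)]
domain = "GLKD"

# Lower (more negative) scores are treated as more favourable binding.
candidates = ["ETDV", "ETEV"]
ranked = sorted(
    candidates,
    key=lambda pep: binding_score(toy_contacts, domain, pep, toy_potential),
)
```

Because the score is a plain sum over a fixed contact list, evaluating many candidate peptides against one structural template is cheap, which is consistent with the abstract's point about genome-scale speed.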
This article was reviewed by Michael Gromiha and Zoltán Gáspári.
Kim, Doyoun; San, Boi Hoa; Moh, Sang Hyun; Park, Hyejin; Kim, Dong Young; Lee, Sangho; Kim, Kyeong Kyu
2010-01-01
Regulated cytosolic proteolysis is one of the key cellular processes ensuring proper functioning of a cell. M42 family proteases show a broad spectrum of substrate specificities, but the structural basis for such diversity of the substrate specificities is lagging behind biochemical data. Here we report the crystal structure of PepA from Streptococcus pneumoniae, a glutamyl aminopeptidase belonging to M42 family (SpPepA). We found that Arg-257 in the substrate binding pocket is strategically positioned so that Arg-257 can make electrostatic interactions with the acidic residue of a substrate at its N-terminus. Structural comparison of the substrate binding pocket of the M42 family proteases, along with the structure-based multiple sequence alignment, argues that the appropriate electrostatic interactions contribute to the selective substrate specificity of SpPepA. Copyright 2009 Elsevier Inc. All rights reserved.
Grison, Claire M.; Miles, Jennifer A.; Robin, Sylvie
2016-01-01
A major current challenge in bioorganic chemistry is the identification of effective mimics of protein secondary structures that act as inhibitors of protein–protein interactions (PPIs). In this work, trans‐2‐aminocyclobutanecarboxylic acid (tACBC) was used as the key β‐amino acid component in the design of α/β/γ‐peptides to structurally mimic a native α‐helix. Suitably functionalized α/β/γ‐peptides assume an α‐helix‐mimicking 12,13‐helix conformation in solution, exhibit enhanced proteolytic stability in comparison to the wild‐type α‐peptide parent sequence from which they are derived, and act as selective inhibitors of the p53/hDM2 interaction. PMID:27467859
Grison, Claire M; Miles, Jennifer A; Robin, Sylvie; Wilson, Andrew J; Aitken, David J
2016-09-05
A major current challenge in bioorganic chemistry is the identification of effective mimics of protein secondary structures that act as inhibitors of protein-protein interactions (PPIs). In this work, trans-2-aminocyclobutanecarboxylic acid (tACBC) was used as the key β-amino acid component in the design of α/β/γ-peptides to structurally mimic a native α-helix. Suitably functionalized α/β/γ-peptides assume an α-helix-mimicking 12,13-helix conformation in solution, exhibit enhanced proteolytic stability in comparison to the wild-type α-peptide parent sequence from which they are derived, and act as selective inhibitors of the p53/hDM2 interaction. © 2016 The Authors. Published by Wiley-VCH Verlag GmbH & Co. KGaA.
Scoring functions for protein-protein interactions.
Moal, Iain H; Moretti, Rocco; Baker, David; Fernández-Recio, Juan
2013-12-01
The computational evaluation of protein-protein interactions will play an important role in organising the wealth of data being generated by high-throughput initiatives. Here we discuss future applications, report recent developments and identify areas requiring further investigation. Many functions have been developed to quantify the structural and energetic properties of interacting proteins, finding use in interrelated challenges revolving around the relationship between sequence, structure and binding free energy. These include loop modelling, side-chain refinement, docking, multimer assembly, affinity prediction, affinity change upon mutation, hotspots location and interface design. Information derived from models optimised for one of these challenges can be used to benefit the others, and can be unified within the theoretical frameworks of multi-task learning and Pareto-optimal multi-objective learning. Copyright © 2013 Elsevier Ltd. All rights reserved.
Ionic protein-lipid interaction at the plasma membrane: what can the charge do?
Li, Lunyi; Shi, Xiaoshan; Guo, Xingdong; Li, Hua; Xu, Chenqi
2014-03-01
Phospholipids are the major components of cell membranes, but they have functional roles beyond forming lipid bilayers. In particular, acidic phospholipids form microdomains in the plasma membrane and can ionically interact with proteins via polybasic sequences, which can have functional consequences for the protein. The list of proteins regulated by ionic protein-lipid interaction has been quickly expanding, and now includes membrane proteins, cytoplasmic soluble proteins, and viral proteins. Here we review how acidic phospholipids in the plasma membrane regulate protein structure and function via ionic interactions, and how Ca(2+) regulates ionic protein-lipid interactions via direct and indirect mechanisms. Copyright © 2014 Elsevier Ltd. All rights reserved.
Fields, Peter D.; Bourgeois, Yann; Du Pasquier, Louis; Ebert, Dieter
2017-01-01
Negative frequency-dependent selection (NFDS) is an evolutionary mechanism suggested to govern host-parasite coevolution and the maintenance of genetic diversity at host resistance loci, such as the vertebrate MHC and R-genes in plants. Matching-allele interactions of hosts and parasites that prevent the emergence of host and parasite genotypes that are universally resistant and infective are a genetic mechanism predicted to underpin NFDS. The underlying genetics of matching-allele interactions are unknown even in host-parasite systems with empirical support for coevolution by NFDS, as is the case for the planktonic crustacean Daphnia magna and the bacterial pathogen Pasteuria ramosa. We fine-map one locus associated with D. magna resistance to P. ramosa and genetically characterize two haplotypes of the Pasteuria resistance (PR-) locus using de novo genome and transcriptome sequencing. Sequence comparison of PR-locus haplotypes finds dramatic structural polymorphisms between PR-locus haplotypes, including a large portion of each haplotype being composed of non-homologous sequences, resulting in haplotypes differing in size by 66 kb. The high divergence of PR-locus haplotypes suggests a history of multiple, diverse and repeated instances of structural mutation events and restricted recombination. Annotation of the haplotypes reveals striking differences in gene content. In particular, a group of glycosyltransferase genes is present in the susceptible but absent in the resistant haplotype. Moreover, in natural populations, we find that the PR-locus polymorphism is associated with variation in resistance to different P. ramosa genotypes, pointing to the PR-locus polymorphism as being responsible for the matching-allele interactions that have been previously described for this system. 
Our results conclusively identify a genetic basis for the matching-allele interaction observed in a coevolving host-parasite system and provide a first insight into its molecular basis. PMID:28222092
Bento, Gilberto; Routtu, Jarkko; Fields, Peter D; Bourgeois, Yann; Du Pasquier, Louis; Ebert, Dieter
2017-02-01
Negative frequency-dependent selection (NFDS) is an evolutionary mechanism suggested to govern host-parasite coevolution and the maintenance of genetic diversity at host resistance loci, such as the vertebrate MHC and R-genes in plants. Matching-allele interactions of hosts and parasites that prevent the emergence of host and parasite genotypes that are universally resistant and infective are a genetic mechanism predicted to underpin NFDS. The underlying genetics of matching-allele interactions are unknown even in host-parasite systems with empirical support for coevolution by NFDS, as is the case for the planktonic crustacean Daphnia magna and the bacterial pathogen Pasteuria ramosa. We fine-map one locus associated with D. magna resistance to P. ramosa and genetically characterize two haplotypes of the Pasteuria resistance (PR-) locus using de novo genome and transcriptome sequencing. Sequence comparison of PR-locus haplotypes finds dramatic structural polymorphisms between PR-locus haplotypes, including a large portion of each haplotype being composed of non-homologous sequences, resulting in haplotypes differing in size by 66 kb. The high divergence of PR-locus haplotypes suggests a history of multiple, diverse and repeated instances of structural mutation events and restricted recombination. Annotation of the haplotypes reveals striking differences in gene content. In particular, a group of glycosyltransferase genes is present in the susceptible but absent in the resistant haplotype. Moreover, in natural populations, we find that the PR-locus polymorphism is associated with variation in resistance to different P. ramosa genotypes, pointing to the PR-locus polymorphism as being responsible for the matching-allele interactions that have been previously described for this system. Our results conclusively identify a genetic basis for the matching-allele interaction observed in a coevolving host-parasite system and provide a first insight into its molecular basis.
Quantitative theory of hydrophobic effect as a driving force of protein structure
Perunov, Nikolay; England, Jeremy L
2014-01-01
Various studies suggest that the hydrophobic effect plays a major role in driving the folding of proteins. In the past, however, it has been challenging to translate this understanding into a predictive, quantitative theory of how the full pattern of sequence hydrophobicity in a protein shapes functionally important features of its tertiary structure. Here, we extend and apply such a phenomenological theory of the sequence-structure relationship in globular protein domains, which had previously been applied to the study of allosteric motion. In an effort to optimize parameters for the model, we first analyze the patterns of backbone burial found in single-domain crystal structures, and discover that classic hydrophobicity scales derived from bulk physicochemical properties of amino acids are already nearly optimal for prediction of burial using the model. Subsequently, we apply the model to studying structural fluctuations in proteins and establish a means of identifying ligand-binding and protein–protein interaction sites using this approach. PMID:24408023
Structural Basis for Sialoglycan Binding by the Streptococcus sanguinis SrpA Adhesin.
Bensing, Barbara A; Loukachevitch, Lioudmila V; McCulloch, Kathryn M; Yu, Hai; Vann, Kendra R; Wawrzak, Zdzislaw; Anderson, Spencer; Chen, Xi; Sullam, Paul M; Iverson, T M
2016-04-01
Streptococcus sanguinis is a leading cause of infective endocarditis, a life-threatening infection of the cardiovascular system. An important interaction in the pathogenesis of infective endocarditis is attachment of the organisms to host platelets. S. sanguinis expresses a serine-rich repeat adhesin, SrpA, similar in sequence to platelet-binding adhesins associated with increased virulence in this disease. In this study, we determined the first crystal structure of the putative binding region of SrpA (SrpABR) both unliganded and in complex with a synthetic disaccharide ligand at 1.8 and 2.0 Å resolution, respectively. We identified a conserved Thr-Arg motif that orients the sialic acid moiety and is required for binding to platelet monolayers. Furthermore, we propose that sequence insertions in closely related family members contribute to the modulation of structural and functional properties, including the quaternary structure, the tertiary structure, and the ligand-binding site. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.
Maranhão, Paulo A C; Teixeira, Claudener S; Sousa, Bruno L; Barroso-Neto, Ito L; Monteiro-Júnior, José E; Fernandes, Andreia V; Ramos, Marcio V; Vasconcelos, Ilka M; Gonçalves, José F C; Rocha, Bruno A M; Freire, Valder N; Grangeiro, Thalles B
2017-07-01
The genus Swartzia is a member of the tribe Swartzieae, whose genera constitute the living descendants of one of the early branches of the papilionoid legumes. Legume lectins comprise one of the main families of structurally and evolutionarily related carbohydrate-binding proteins of plant origin. However, these proteins have been poorly investigated in Swartzia and, to date, only the lectin from S. laevicarpa seeds (SLL) has been purified. Moreover, no sequence information is known from lectins of any member of the tribe Swartzieae. In the present study, partial cDNA sequences encoding L-type lectins were obtained from developing seeds of S. simplex var. grandiflora. The amino acid sequences of the S. simplex grandiflora lectins (SSGLs) were only moderately related to the known primary structures of legume lectins, with sequence identities not greater than 50-52%. The SSGL sequences were more closely related to amino acid sequences of papilionoid lectins from members of the tribes Sophoreae and Dalbergieae and from the Cladrastis and Vataireoid clades, which constitute, with other taxa, the first branching lineages of the subfamily Papilionoideae. The three-dimensional structures of 2 representative SSGLs (SSGL-A and SSGL-E) were predicted by homology modeling using templates that exhibit the characteristic β-sandwich fold of the L-type lectins. Molecular docking calculations predicted that SSGL-A is able to interact with D-galactose, N-acetyl-D-galactosamine and α-lactose, whereas SSGL-E is probably a non-functional lectin due to 2 mutations in the carbohydrate-binding site. Using molecular dynamics simulations followed by density functional theory calculations, the binding free energies of the interaction of SSGL-A with GalNAc and α-lactose were estimated as -31.7 and -47.5 kcal/mol, respectively. These findings gave insights into the carbohydrate-binding specificity of SLL, which binds to immobilized lactose but is not retained in a matrix containing D-GalNAc as ligand. 
Copyright © 2017 Elsevier Ltd. All rights reserved.
Towards Long-Range RNA Structure Prediction in Eukaryotic Genes.
Pervouchine, Dmitri D
2018-06-15
The ability to form an intramolecular structure plays a fundamental role in eukaryotic RNA biogenesis. Proximate regions in the primary transcripts fold into a local secondary structure, which is then hierarchically assembled into a tertiary structure that is stabilized by RNA-binding proteins and long-range intramolecular base pairings. While the local RNA structure can be predicted reasonably well for short sequences, long-range structure at the scale of eukaryotic genes remains problematic from the computational standpoint. The aim of this review is to list functional examples of long-range RNA structures, to summarize current comparative methods of structure prediction, and to highlight their advances and limitations in the context of long-range RNA structures. Most comparative methods implement the “first-align-then-fold” principle, i.e., they operate on multiple sequence alignments, while functional RNA structures often reside in non-conserved parts of the primary transcripts. The opposite “first-fold-then-align” approach is currently explored to a much lesser extent. Developing novel methods in both directions will improve the performance of comparative RNA structure analysis and help discover novel long-range structures, their higher-order organization, and RNA–RNA interactions across the transcriptome.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Chun-Long; Zuckermann, Ronald N.; DeYoreo, James J.
The exquisite self-assembly of proteins and peptides in nature into highly ordered functional materials has inspired innovative approaches to biomimetic materials design and synthesis. Here we report the assembly of peptoids (a class of highly stable sequence-defined synthetic polymers) into biomimetic materials on mica surfaces. The assembling 12-mer peptoid contains alternating acidic and aromatic residues, and the presence of Ca2+ cations creates peptoid-peptoid and peptoid-mica interactions that drive assembly. In situ atomic force microscopy (AFM) shows that peptoids first assemble into discrete nanoparticles; these particles then transform into hexagonally-patterned nanoribbons on mica surfaces. AFM-based dynamic force spectroscopy (DFS) studies show that peptoid-mica interactions are much stronger than peptoid-peptoid interactions in the presence of Ca2+, illuminating the physical parameters that drive peptoid assembly. We further demonstrate the display of functional groups at the N-terminus of the assembling peptoid sequence to produce biomimetic materials with similar hierarchical structures. This research demonstrates that surface-directed peptoid assembly can be used as a robust platform to develop biomimetic coating materials for applications.
Building toy models of proteins using coevolutionary information
NASA Astrophysics Data System (ADS)
Cheng, Ryan; Raghunathan, Mohit; Onuchic, Jose
2015-03-01
Recent developments in global statistical methodologies have advanced the analysis of large collections of protein sequences for coevolutionary information. Coevolution between amino acids in a protein arises from compensatory mutations that are needed to maintain the stability or function of a protein over the course of evolution. This gives rise to quantifiable correlations between amino acid positions within the multiple sequence alignment of a protein family. Here, we use Direct Coupling Analysis (DCA) to infer a Potts model Hamiltonian governing the correlated mutations in a protein family to obtain the sequence-dependent interaction energies of a toy protein model. We demonstrate that this methodology predicts residue-residue interaction energies that are consistent with experimental mutational changes in protein stabilities as well as other computational methodologies. Furthermore, we demonstrate with several examples that DCA could be used to construct a structure-based model that quantitatively agrees with experimental data on folding mechanisms. This work serves as a potential framework for generating models of proteins that are enriched by evolutionary data that can potentially be used to engineer key functional motions and interactions in protein systems. This research has been supported by the NSF INSPIRE award MCB-1241332 and by the CTBP sponsored by the NSF (Grant PHY-1427654).
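A Potts model of the kind inferred by DCA assigns each sequence an energy built from per-site fields and pairwise couplings. The sketch below assumes the common convention E(s) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j); the abstract does not spell out its sign convention, and the parameter values and function names here are toy assumptions, not fitted DCA output.

```python
def potts_energy(seq, fields, couplings):
    """Energy of an integer-encoded sequence under a Potts model.

    fields[i][a]            -- local field for state a at position i
    couplings[(i, j)][a][b] -- coupling between state a at i and b at j, i < j
    Pairs absent from `couplings` are treated as uncoupled.
    """
    energy = -sum(fields[i][a] for i, a in enumerate(seq))
    for (i, j), j_ij in couplings.items():
        energy -= j_ij[seq[i]][seq[j]]
    return energy


def mutation_delta(seq, position, new_state, fields, couplings):
    """Energy change E(mutant) - E(wild type) for a single substitution."""
    mutant = list(seq)
    mutant[position] = new_state
    return potts_energy(mutant, fields, couplings) - potts_energy(
        seq, fields, couplings
    )


# Toy model: three positions, two states per position (illustrative only).
toy_fields = [[0.0, 0.5], [0.2, 0.0], [0.0, 0.0]]
toy_couplings = {(0, 1): [[1.0, 0.0], [0.0, 1.0]]}
wild_type = [0, 0, 1]

# Mutating position 0 breaks the favourable (0, 0) coupling with position 1
# but gains a larger local field, so the net change reflects both terms.
delta = mutation_delta(wild_type, 0, 1, toy_fields, toy_couplings)
```

Under this convention a positive `delta` means the mutant is less favourable than the wild type according to the model, which is the kind of quantity that can be compared against measured stability changes.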
Recognition and Binding of Human Telomeric G-Quadruplex DNA by Unfolding Protein 1
2015-01-01
The specific recognition by proteins of G-quadruplex structures provides evidence of a functional role for in vivo G-quadruplex structures. As previously reported, the ribonucleoprotein, hnRNP A1, and its proteolytic derivative, unwinding protein 1 (UP1), bind to and destabilize G-quadruplex structures formed by the human telomeric repeat d(TTAGGG)n. UP1 has been proposed to be involved in the recruitment of telomerase to telomeres for chain extension. In this study, a detailed thermodynamic characterization of the binding of UP1 to a human telomeric repeat sequence, the d[AGGG(TTAGGG)3] G-quadruplex, is presented and reveals key insights into the UP1-induced unfolding of the G-quadruplex structure. The UP1–G-quadruplex interactions are shown to be enthalpically driven, exhibiting large negative enthalpy changes for the formation of both the Na+ and K+ G-quadruplex–UP1 complexes (ΔH values of −43 and −19 kcal/mol, respectively). These data reveal three distinct enthalpic contributions from the interactions of UP1 with the Na+ form of G-quadruplex DNA. The initial interaction is characterized by a binding affinity of 8.5 × 10⁸ M⁻¹ (strand), 200 times stronger than the binding of UP1 to a single-stranded DNA with a comparable but non-quadruplex-forming sequence [4.1 × 10⁶ M⁻¹ (strand)]. Circular dichroism spectroscopy reveals the Na+ form of the G-quadruplex to be completely unfolded by UP1 at a binding ratio of 2:1 (UP1:G-quadruplex DNA). The data presented here demonstrate that the favorable energetics of the initial binding event are closely coupled with and drive the unfolding of the G-quadruplex structure. PMID:24831962
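The two association constants quoted above can be put on a free-energy scale with the standard relation ΔG° = −RT ln K. The abstract reports neither ΔG values nor the measurement temperature, so the sketch below assumes T = 298.15 K and a 1 M standard state purely for illustration; the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(k_assoc, temperature=298.15):
    """Standard binding free energy (kcal/mol) from an association constant in M^-1.

    The temperature default is an assumption; the abstract does not state it.
    """
    return -R_KCAL * temperature * math.log(k_assoc)


k_quadruplex = 8.5e8      # M^-1, initial UP1 binding to the Na+ G-quadruplex
k_single_strand = 4.1e6   # M^-1, UP1 binding to the non-quadruplex control

fold_difference = k_quadruplex / k_single_strand
dg_quadruplex = binding_free_energy(k_quadruplex)
dg_single_strand = binding_free_energy(k_single_strand)
ddg = dg_quadruplex - dg_single_strand
```

Under these assumptions the affinity ratio is a little over 200, in line with the "200 times stronger" statement, and corresponds to a difference of roughly 3 kcal/mol in binding free energy.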
Kanamori, Hiroshi; Yuhashi, Kazuhito; Ohnishi, Shin; Koike, Kazuhiko; Kodama, Tatsuhiko
2010-05-01
The hepatitis C virus NS5B RNA-dependent RNA polymerase (RdRp) is a key enzyme involved in viral replication. Interaction between NS5B RdRp and the viral RNA sequence is likely to be an important step in viral RNA replication. The C-terminal half of the NS5B-coding sequence, which contains the important cis-acting replication element, has been identified as an NS5B-binding sequence. In the present study, we confirm the specific binding of NS5B to one of the RNA stem-loop structures in the region, 5BSL3.2. In addition, we show that NS5B binds to the complementary strand of 5BSL3.2 (5BSL3.2N). The bulge structure of 5BSL3.2N was shown to be indispensable for tight binding to NS5B. In vitro RdRp activity was inhibited by 5BSL3.2N, indicating the importance of the RNA element in the polymerization by RdRp. These results suggest the involvement of the RNA stem-loop structure of the negative strand in the replication process.
Volpi, Nicola; Linhardt, Robert J
2012-01-01
Glycosaminoglycans (GAGs) have proven to be very difficult to analyze and characterize because of their high negative charge density, polydispersity and sequence heterogeneity. As the specificity of the interactions between GAGs and proteins results from the structure of these polysaccharides, an understanding of GAG structure is essential for developing a structure–activity relationship. Electrospray ionization (ESI) mass spectrometry (MS) is particularly promising for the analysis of oligosaccharides chemically or enzymatically generated by GAGs because of its relatively soft ionization capacity. Furthermore, on-line high-performance liquid chromatography (HPLC)-MS greatly enhances the characterization of complex mixtures of GAG-derived oligosaccharides, providing important structural information and affording their disaccharide composition. A detailed protocol for producing oligosaccharides from various GAGs, using controlled, specific enzymatic or chemical depolymerization, is presented, together with their HPLC separation, using volatile reversed-phase ion-pairing reagents and on-line ESI-MS structural identification. This analysis provides an oligosaccharide map together with sequence information from a reading frame beginning at the nonreducing end of the GAG chains. The preparation of oligosaccharides can be carried out in 10 h, with subsequent HPLC analysis in 1–2 h and HPLC-MS analysis taking another 2 h. PMID:20448545
Ramirez, Kelly S; Knight, Christopher G; de Hollander, Mattias; Brearley, Francis Q; Constantinides, Bede; Cotton, Anne; Creer, Si; Crowther, Thomas W; Davison, John; Delgado-Baquerizo, Manuel; Dorrepaal, Ellen; Elliott, David R; Fox, Graeme; Griffiths, Robert I; Hale, Chris; Hartman, Kyle; Houlden, Ashley; Jones, David L; Krab, Eveline J; Maestre, Fernando T; McGuire, Krista L; Monteux, Sylvain; Orr, Caroline H; van der Putten, Wim H; Roberts, Ian S; Robinson, David A; Rocca, Jennifer D; Rowntree, Jennifer; Schlaeppi, Klaus; Shepherd, Matthew; Singh, Brajesh K; Straathof, Angela L; Bhatnagar, Jennifer M; Thion, Cécile; van der Heijden, Marcel G A; de Vries, Franciska T
2018-02-01
The emergence of high-throughput DNA sequencing methods provides unprecedented opportunities to further unravel bacterial biodiversity and its worldwide role from human health to ecosystem functioning. However, despite the abundance of sequencing studies, combining data from multiple individual studies to address macroecological questions of bacterial diversity remains methodically challenging and plagued with biases. Here, using a machine-learning approach that accounts for differences among studies and complex interactions among taxa, we merge 30 independent bacterial data sets comprising 1,998 soil samples from 21 countries. Whereas previous meta-analysis efforts have focused on bacterial diversity measures or abundances of major taxa, we show that disparate amplicon sequence data can be combined at the taxonomy-based level to assess bacterial community structure. We find that rarer taxa are more important for structuring soil communities than abundant taxa, and that these rarer taxa are better predictors of community structure than environmental factors, which are often confounded across studies. We conclude that combining data from independent studies can be used to explore bacterial community dynamics, identify potential 'indicator' taxa with an important role in structuring communities, and propose hypotheses on the factors that shape bacterial biogeography that have been overlooked in the past.
Lee, Soon Goo; Krishnan, Hari B; Jez, Joseph M
2014-04-29
The symbiosis between rhizobial microbes and host plants involves the coordinated expression of multiple genes, which leads to nodule formation and nitrogen fixation. As part of the transcriptional machinery for nodulation and symbiosis across a range of Rhizobium, NolR serves as a global regulatory protein. Here, we present the X-ray crystal structures of NolR in the unliganded form and complexed with two different 22-base pair (bp) double-stranded operator sequences (oligos AT and AA). Structural and biochemical analysis of NolR reveals protein-DNA interactions with an asymmetric operator site and defines a mechanism for conformational switching of a key residue (Gln56) to accommodate variation in target DNA sequences from diverse rhizobial genes for nodulation and symbiosis. This conformational switching alters the energetic contributions to DNA binding without changes in affinity for the target sequence. Two possible models for the role of NolR in the regulation of different nodulation and symbiosis genes are proposed. To our knowledge, these studies provide the first structural insight on the regulation of genes involved in the agriculturally and ecologically important symbiosis of microbes and plants that leads to nodule formation and nitrogen fixation.
Cleveland, Sean B.; Davies, John; McClure, Marcella A.
2011-01-01
The goal of this bioinformatic study is to investigate sequence conservation in relation to evolutionary function/structure of the nucleoprotein of the order Mononegavirales. In the combined analysis of 63 representative nucleoprotein (N) sequences from four viral families (Bornaviridae, Filoviridae, Rhabdoviridae, and Paramyxoviridae) we predict the regions of protein disorder, intra-residue contact and co-evolving residues. Correlations between location and conservation of predicted regions illustrate a strong division between families while highlighting conservation within individual families. These results suggest the conserved regions among the nucleoproteins, specifically within Rhabdoviridae and Paramyxoviridae, but also generally among all members of the order, reflect an evolutionary advantage in maintaining these sites for the viral nucleoprotein as part of the transcription/replication machinery. Results indicate conservation for disorder in the C-terminus region of the representative proteins that is important for interacting with the phosphoprotein and the large subunit polymerase during transcription and replication. Additionally, the C-terminus region of the protein preceding the disordered region is predicted to be important for interacting with the encapsidated genome. Portions of the N-terminus are responsible for N∶N stability and interactions identified by the presence or lack of co-evolving intra-protein contact predictions. The validation of these prediction results by current structural information illustrates the benefits of the Disorder, Intra-residue contact and Compensatory mutation Correlator (DisICC) pipeline as a method for quickly characterizing proteins and providing the most likely residues and regions necessary to target for disruption in viruses that have little structural information available. PMID:21559282
Network Analysis of Protein Adaptation: Modeling the Functional Impact of Multiple Mutations
Beleva Guthrie, Violeta; Masica, David L; Fraser, Andrew; Federico, Joseph; Fan, Yunfan; Camps, Manel; Karchin, Rachel
2018-01-01
The evolution of new biochemical activities frequently involves complex dependencies between mutations and rapid evolutionary radiation. Mutation co-occurrence and covariation have previously been used to identify compensating mutations that are the result of physical contacts and preserve protein function and fold. Here, we model pairwise functional dependencies and higher order interactions that enable evolution of new protein functions. We use a network model to find complex dependencies between mutations resulting from evolutionary trade-offs and pleiotropic effects. We present a method to construct these networks and to identify functionally interacting mutations in both extant and reconstructed ancestral sequences (Network Analysis of Protein Adaptation, NAPA). The time ordering of mutations can be incorporated into the networks through phylogenetic reconstruction. We apply NAPA to three distantly homologous β-lactamase protein clusters (TEM, CTX-M-3, and OXA-51), each of which has experienced recent evolutionary radiation under substantially different selective pressures. By analyzing the network properties of each protein cluster, we identify key adaptive mutations, positive pairwise interactions, different adaptive solutions to the same selective pressure, and complex evolutionary trajectories likely to increase protein fitness. We also present evidence that incorporating information from phylogenetic reconstruction and ancestral sequence inference can reduce the number of spurious links in the network, while preserving overall network community structure. The analysis does not require structural or biochemical data. In contrast to function-preserving mutation dependencies, which are frequently from structural contacts, gain-of-function mutation dependencies are most commonly between residues distal in protein structure. PMID:29522102
Crystal Structure of the HEAT Domain from the Pre-mRNA Processing Factor Symplekin
Kennedy, Sarah A.; Frazier, Monica L.; Steiniger, Mindy; Mast, Ann M.; Marzluff, William F.; Redinbo, Matthew R.
2009-01-01
The majority of eukaryotic pre-mRNAs are processed by 3′-end cleavage and polyadenylation, although in metazoa the replication-dependent histone mRNAs are processed by 3′-end cleavage but not polyadenylation. The macromolecular complex responsible for processing both canonical and histone pre-mRNAs contains the ~1,160-residue protein Symplekin. Secondary structural prediction algorithms identified putative HEAT domains in the 300 N-terminal residues of all Symplekins of known sequence. The structure and dynamics of this domain were investigated to begin elucidating the role Symplekin plays in mRNA maturation. The crystal structure of the Drosophila melanogaster Symplekin HEAT domain was determined to 2.4 Å resolution using SAD phasing methods. The structure exhibits 5 canonical HEAT repeats along with an extended 31 amino acid loop (loop 8) between the fourth and fifth repeat that is conserved within closely related Symplekin sequences. Molecular dynamics simulations of this domain show that the presence of loop 8 dampens correlated and anticorrelated motion in the HEAT domain, therefore providing a neutral surface for potential protein-protein interactions. HEAT domains are often employed for such macromolecular contacts. The Symplekin HEAT region not only structurally aligns with several established scaffolding proteins, but also has been reported to contact proteins essential for regulating 3′-end processing. Taken together, these data support the conclusion that the Symplekin HEAT domain serves as a scaffold for protein-protein interactions essential to the mRNA maturation process. PMID:19576221
Abuqarn, Mehtap; Allmeling, Christina; Amshoff, Inga; Menger, Bjoern; Nasser, Inas; Vogt, Peter M; Reimers, Kerstin
2011-07-01
Urodele amphibians are exceptional in their ability to regenerate complex body structures such as limbs. Limb regeneration depends on a process called dedifferentiation. Under an inductive wound epidermis, terminally differentiated cells transform into pluripotent progenitor cells that coordinately proliferate and eventually redifferentiate to form the new appendage. Recent studies have developed molecular models integrating a set of genes that might have important functions in the control of regenerative cellular plasticity. Among them is Msx1, which induced dedifferentiation in mammalian myotubes in vitro. Herein, we screened for interaction partners of axolotl Msx1 using a yeast two hybrid system. A two hybrid cDNA library of 5-day-old wound epidermis and underlying tissue containing more than 2×10⁶ cDNAs was constructed and used in the screen. 34 resulting cDNA clones were isolated and sequenced. We then compared sequences of the isolated clones to annotated EST contigs of the Salamander EST database (BLASTn) to identify presumptive orthologs. We subsequently searched all no-hit clone sequences against non-redundant NCBI sequence databases using BLASTx. This is the first time that the yeast two hybrid system has been adapted to the axolotl animal model and successfully used in a screen for proteins interacting with Msx1 in the context of amphibian limb regeneration. 2011 Elsevier B.V. All rights reserved.
Computational biology of RNA interactions.
Dieterich, Christoph; Stadler, Peter F
2013-01-01
The biodiversity of the RNA world has been underestimated for decades. RNA molecules are key building blocks, sensors, and regulators of modern cells. The biological function of RNA molecules cannot be separated from their ability to bind to and interact with a wide space of chemical species, including small molecules, nucleic acids, and proteins. Computational chemists, physicists, and biologists have developed a rich tool set for modeling and predicting RNA interactions. These interactions are to some extent determined by the binding conformation of the RNA molecule. RNA binding conformations are approximated with often acceptable accuracy by sequence and secondary structure motifs. Secondary structure ensembles of a given RNA molecule can be efficiently computed in many relevant situations by employing a standard energy model for base pair interactions and dynamic programming techniques. The case of bi-molecular RNA-RNA interactions can be seen as an extension of this approach. However, unbiased transcriptome-wide scans for local RNA-RNA interactions are computationally challenging yet become efficient if the binding motif/mode is known and other external information can be used to confine the search space. Computational methods are less developed for proteins and small molecules, which bind to RNA with very high specificity. Binding descriptors of proteins are usually determined by in vitro high-throughput assays (e.g., microarrays or sequencing). Intriguingly, recent experimental advances, which are mostly based on light-induced cross-linking of binding partners, render in vivo binding patterns accessible yet require new computational methods for careful data interpretation. The grand challenge is to model the in vivo situation where a complex interplay of RNA binders competes for the same target RNA molecule. Evidently, bioinformaticians are just catching up with the impressive pace of these developments. Copyright © 2012 John Wiley & Sons, Ltd.
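The dynamic programming idea mentioned in the abstract above can be illustrated with a minimal sketch. Note the assumptions: this is the classic Nussinov-style base-pair maximization, a deliberate simplification that counts nested pairs instead of using the "standard energy model" the abstract refers to; the function name `max_base_pairs` and the `min_loop` parameter are illustrative choices, not taken from the reviewed work.

```python
def max_base_pairs(seq, min_loop=3):
    """Nussinov-style DP: maximum number of nested base pairs in seq.

    Simplification (assumption): every allowed pair scores 1; real
    folding tools minimize free energy with loop-dependent terms.
    min_loop is the minimum number of unpaired bases in a hairpin.
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            # case 2: j pairs with some k, leaving >= min_loop bases between
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in allowed:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAAUCC"))
```

The table is filled by increasing subsequence length, so every sub-result needed for `dp[i][j]` already exists; the run time is cubic in the sequence length, which is why the abstract can call such computations efficient for single molecules while transcriptome-wide scans remain costly.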
3DIANA: 3D Domain Interaction Analysis: A Toolbox for Quaternary Structure Modeling
Segura, Joan; Sanchez-Garcia, Ruben; Tabas-Madrid, Daniel; Cuenca-Alba, Jesus; Sorzano, Carlos Oscar S.; Carazo, Jose Maria
2016-01-01
Electron microscopy (EM) is experiencing a revolution with the advent of a new generation of Direct Electron Detectors, enabling a broad range of large and flexible structures to be resolved well below 1 nm resolution. Although EM techniques are evolving to the point of directly obtaining structural data at near-atomic resolution, for many molecules the attainable resolution might not be enough to propose high-resolution structural models. However, accessing information on atomic coordinates is a necessary step toward a deeper understanding of the molecular mechanisms that allow proteins to perform specific tasks. For that reason, methods for the integration of EM three-dimensional maps with x-ray and NMR structural data are being developed, a modeling task that is normally referred to as fitting, resulting in the so called hybrid models. In this work, we present a novel application—3DIANA—specially targeted to those cases in which the EM map resolution is medium or low and additional experimental structural information is scarce or even lacking. In this way, 3DIANA statistically evaluates proposed/potential contacts between protein domains, presents a complete catalog of both structurally resolved and predicted interacting regions involving these domains and, finally, suggests structural templates to model the interaction between them. The evaluation of the proposed interactions is computed with DIMERO, a new method that scores physical binding sites based on the topology of protein interaction networks, which has recently shown the capability to increase by 200% the number of domain-domain interactions predicted in interactomes as compared to previous approaches. The new application displays the information at a sequence and structural level and is accessible through a web browser or as a Chimera plugin at http://3diana.cnb.csic.es. PMID:26772592
Zhang, Chengxin; Zheng, Wei; Freddolino, Peter L; Zhang, Yang
2018-03-10
Homology-based transferal remains the major approach to computational protein function annotations, but it becomes increasingly unreliable when the sequence identity between query and template decreases below 30%. We propose a novel pipeline, MetaGO, to deduce Gene Ontology attributes of proteins by combining sequence homology-based annotation with low-resolution structure prediction and comparison, and partner's homology-based protein-protein network mapping. The pipeline was tested on a large-scale set of 1000 non-redundant proteins from the CAFA3 experiment. Under the stringent benchmark conditions where templates with >30% sequence identity to the query are excluded, MetaGO achieves average F-measures of 0.487, 0.408, and 0.598, for Molecular Function, Biological Process, and Cellular Component, respectively, which are significantly higher than those achieved by other state-of-the-art function annotations methods. Detailed data analysis shows that the major advantage of the MetaGO lies in the new functional homolog detections from partner's homology-based network mapping and structure-based local and global structure alignments, the confidence scores of which can be optimally combined through logistic regression. These data demonstrate the power of using a hybrid model incorporating protein structure and interaction networks to deduce new functional insights beyond traditional sequence homology-based referrals, especially for proteins that lack homologous function templates. The MetaGO pipeline is available at http://zhanglab.ccmb.med.umich.edu/MetaGO/. Copyright © 2018. Published by Elsevier Ltd.
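As a rough illustration of the F-measure reported above, the following sketch scores one protein's predicted terms against its true annotations at a single confidence cutoff. It is a simplified assumption of how such a score is formed: CAFA-style evaluation additionally averages precision and recall over many proteins and maximizes over cutoffs, and the term labels and the name `f_measure` are made up for the example.

```python
def f_measure(predicted, truth, threshold):
    """Harmonic mean of precision and recall for one protein.

    predicted: dict mapping a term label to a confidence score in [0, 1]
    truth:     set of term labels that are actually annotated
    threshold: terms scoring >= threshold count as predicted
    """
    called = {term for term, score in predicted.items() if score >= threshold}
    if not called or not truth:
        return 0.0
    true_positives = len(called & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(called)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical scores and annotations, for illustration only.
    scores = {"term_a": 0.9, "term_b": 0.6, "term_c": 0.2}
    annotated = {"term_a", "term_d"}
    print(f_measure(scores, annotated, threshold=0.5))
```

Lowering the cutoff admits more terms, which tends to raise recall and lower precision; sweeping the cutoff and keeping the best value is one common way to summarize a predictor with a single number.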
GeneBuilder: interactive in silico prediction of gene structure.
Milanesi, L; D'Angelo, D; Rogozin, I B
1999-01-01
Prediction of gene structure in newly sequenced DNA becomes very important in large genome sequencing projects. This problem is complicated due to the exon-intron structure of eukaryotic genes and because gene expression is regulated by many different short nucleotide domains. In order to be able to analyse the full gene structure in different organisms, it is necessary to combine information about potential functional signals (promoter region, splice sites, start and stop codons, 3' untranslated region) together with the statistical properties of coding sequences (coding potential), information about homologous proteins, ESTs and repeated elements. We have developed the GeneBuilder system which is based on prediction of functional signals and coding regions by different approaches in combination with similarity searches in proteins and EST databases. The potential gene structure models are obtained by using a dynamic programming method. The program permits the use of several parameters for gene structure prediction and refinement. During gene model construction, selecting different exon homology levels with a protein sequence selected from a list of homologous proteins can improve the accuracy of the gene structure prediction. In the case of low homology, GeneBuilder is still able to predict the gene structure. The GeneBuilder system has been tested by using the standard set (Burset and Guigo, Genomics, 34, 353-367, 1996) and the performance is: 0.89 sensitivity and 0.91 specificity at the nucleotide level. The total correlation coefficient is 0.88. The GeneBuilder system is implemented as a part of the WebGene at the URL: http://www.itba.mi.cnr.it/webgene and TRADAT (TRAnscription Database and Analysis Tools) launcher URL: http://www.itba.mi.cnr.it/tradat.
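The nucleotide-level accuracy figures quoted above can be made concrete with a short sketch. It assumes the convention common in the gene-finding literature, where sensitivity is TP/(TP+FN), "specificity" is TP/(TP+FP), and the correlation coefficient is the Matthews form; whether GeneBuilder's evaluation used exactly these definitions is an assumption, and the function name and the 'C'/'N' labelling (coding/non-coding) are illustrative.

```python
import math


def nucleotide_accuracy(predicted, actual):
    """Return (sensitivity, specificity, correlation) per nucleotide.

    predicted, actual: equal-length strings, 'C' = coding, 'N' = non-coding.
    Specificity here is TP / (TP + FP), as is conventional in gene-finder
    benchmarks (assumed, not confirmed for this particular paper).
    """
    if len(predicted) != len(actual):
        raise ValueError("label strings must have equal length")
    tp = fp = tn = fn = 0
    for p, a in zip(predicted, actual):
        if p == "C" and a == "C":
            tp += 1
        elif p == "C":
            fp += 1
        elif a == "C":
            fn += 1
        else:
            tn += 1
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tp / (tp + fp) if tp + fp else 0.0
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    cc = (tp * tn - fn * fp) / denom if denom else 0.0
    return sn, sp, cc


if __name__ == "__main__":
    # Toy labels, invented for illustration.
    print(nucleotide_accuracy("CCCNCNNN", "CCCCNNNN"))
```

Because the correlation coefficient uses all four counts, it penalizes both over- and under-prediction of coding nucleotides, which is why benchmarks usually report it alongside the two rates.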
You’re a good structure, Charlie Brown: The distribution of narrative categories in comic strips
Cohn, Neil
2014-01-01
Cohn’s (2013) theory of “Visual Narrative Grammar” argues that sequential images take on categorical roles in a narrative structure, which organizes them into hierarchic constituents analogous to the organization of syntactic categories in sentences. This theory proposes that narrative categories, like syntactic categories, can be identified through diagnostic tests that reveal tendencies for their distribution throughout a sequence. This paper describes four experiments testing these diagnostics to provide support for the validity of these narrative categories. In Experiment 1, participants reconstructed unordered panels of a comic strip into an order that makes sense. Experiment 2 measured viewing times to panels in sequences where the order of panels was reversed. In Experiment 3 participants again reconstructed strips, but also deleted a panel from the sequence. Finally, in Experiment 4 participants identified where a panel had been deleted from a comic strip, and rated that strip’s coherence. Overall, categories had consistent distributional tendencies within experiments and complementary tendencies across experiments. These results point toward an interaction between categorical roles and a global narrative structure. PMID:24646175
StrBioLib: a Java library for development of custom computational structural biology applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chandonia, John-Marc
2007-05-14
Summary: StrBioLib is a library of Java classes useful for developing software for computational structural biology research. StrBioLib contains classes to represent and manipulate protein structures, biopolymer sequences, sets of biopolymer sequences, and alignments between biopolymers based on either sequence or structure. Interfaces are provided to interact with commonly used bioinformatics applications, including (PSI)-BLAST, MODELLER, MUSCLE, and Primer3, and tools are provided to read and write many file formats used to represent bioinformatic data. The library includes a general-purpose neural network object with multiple training algorithms, the Hooke and Jeeves nonlinear optimization algorithm, and tools for efficient C-style string parsing and formatting. StrBioLib is the basis for the Pred2ary secondary structure prediction program, is used to build the ASTRAL compendium for sequence and structure analysis, and has been extensively tested through use in many smaller projects. Examples and documentation are available at the site below. Availability: StrBioLib may be obtained under the terms of the GNU LGPL license from http://strbio.sourceforge.net/ Contact: JMChandonia@lbl.gov
Loucks, Jeff; Mutschler, Christina; Meltzoff, Andrew N
2017-09-01
Children's imitation of adults plays a prominent role in human cognitive development. However, few studies have investigated how children represent the complex structure of observed actions which underlies their imitation. We integrate theories of action segmentation, memory, and imitation to investigate whether children's event representation is organized according to veridical serial order or a higher level goal structure. Children were randomly assigned to learn novel event sequences either through interactive hands-on experience (Study 1) or via storybook (Study 2). Results demonstrate that children's representation of observed actions is organized according to higher level goals, even at the cost of representing the veridical temporal ordering of the sequence. We argue that prioritizing goal structure enhances event memory, and that this mental organization is a key mechanism of social-cognitive development in real-world, dynamic environments. It supports cultural learning and imitation in ecologically valid settings when social agents are multitasking and not demonstrating one isolated goal at a time. Copyright © 2016 Cognitive Science Society, Inc.
You're a good structure, Charlie Brown: the distribution of narrative categories in comic strips.
Cohn, Neil
2014-01-01
Cohn's (2013) theory of "Visual Narrative Grammar" argues that sequential images take on categorical roles in a narrative structure, which organizes them into hierarchic constituents analogous to the organization of syntactic categories in sentences. This theory proposes that narrative categories, like syntactic categories, can be identified through diagnostic tests that reveal tendencies for their distribution throughout a sequence. This paper describes four experiments testing these diagnostics to provide support for the validity of these narrative categories. In Experiment 1, participants reconstructed unordered panels of a comic strip into an order that makes sense. Experiment 2 measured viewing times to panels in sequences where the order of panels was reversed. In Experiment 3, participants again reconstructed strips but also deleted a panel from the sequence. Finally, in Experiment 4 participants identified where a panel had been deleted from a comic strip and rated that strip's coherence. Overall, categories had consistent distributional tendencies within experiments and complementary tendencies across experiments. These results point toward an interaction between categorical roles and a global narrative structure. © 2014 Cognitive Science Society, Inc.
Panayotou, G; Bax, B; Gout, I; Federwisch, M; Wroblowski, B; Dhand, R; Fry, M J; Blundell, T L; Wollmer, A; Waterfield, M D
1992-01-01
Circular dichroism and fluorescence spectroscopy were used to investigate the structure of the p85 alpha subunit of the PI 3-kinase, a closely related p85 beta protein, and a recombinant SH2 domain-containing fragment of p85 alpha. Significant spectral changes, indicative of a conformational change, were observed on formation of a complex with a 17 residue peptide containing a phosphorylated tyrosine residue. The sequence of this peptide is identical to the sequence surrounding Tyr751 in the kinase-insert region of the platelet-derived growth factor beta-receptor (beta PDGFR). The rotational correlation times measured by fluorescence anisotropy decay indicated that phosphopeptide binding changed the shape of the SH2 domain-containing fragment. The CD and fluorescence spectroscopy data support the secondary structure prediction based on sequence analysis and provide evidence for flexible linker regions between the various domains of the p85 proteins. The significance of these results for SH2 domain-containing proteins is discussed. Images PMID:1330535
The MIGenAS integrated bioinformatics toolkit for web-based sequence analysis
Rampp, Markus; Soddemann, Thomas; Lederer, Hermann
2006-01-01
We describe a versatile and extensible integrated bioinformatics toolkit for the analysis of biological sequences over the Internet. The web portal offers convenient interactive access to a growing pool of chainable bioinformatics software tools and databases that are centrally installed and maintained by the RZG. Currently, supported tasks comprise sequence similarity searches in public or user-supplied databases, computation and validation of multiple sequence alignments, phylogenetic analysis and protein–structure prediction. Individual tools can be seamlessly chained into pipelines allowing the user to conveniently process complex workflows without the necessity to take care of any format conversions or tedious parsing of intermediate results. The toolkit is part of the Max-Planck Integrated Gene Analysis System (MIGenAS) of the Max Planck Society available at (click ‘Start Toolkit’). PMID:16844980
Structure, replication efficiency and fragility of yeast ARS elements.
Dhar, Manoj K; Sehgal, Shelly; Kaul, Sanjana
2012-05-01
DNA replication in eukaryotes initiates at specific sites known as origins of replication, or replicators. These replication origins occur throughout the genome, though the propensity of their occurrence depends on the type of organism. In eukaryotes, zones of initiation of replication spanning from about 100 to 50,000 base pairs have been reported. The characteristics of eukaryotic replication origins are best understood in the budding yeast Saccharomyces cerevisiae, where some autonomously replicating sequences, or ARS elements, confer origin activity. ARS elements are short DNA sequences of a few hundred base pairs, identified by their efficiency at initiating a replication event when cloned in a plasmid. ARS elements, although structurally diverse, maintain a basic structure composed of three domains, A, B and C. Domain A is comprised of a consensus sequence designated ACS (ARS consensus sequence), while the B domain has the DNA unwinding element and the C domain is important for DNA-protein interactions. Although there are ∼400 ARS elements in the yeast genome, not all of them are active origins of replication. Different groups within the genus Saccharomyces have ARS elements as components of replication origin. The present paper provides a comprehensive review of various aspects of ARSs, starting from their structural conservation to sequence thermodynamics. All significant and conserved functional sequence motifs within different types of ARS elements have been extensively described. Issues like silencing at ARSs, their inherent fragility and factors governing their replication efficiency have also been addressed. Progress in understanding crucial components associated with the replication machinery and timing at these ARS elements is discussed in the section entitled "The replicon revisited". Copyright © 2012 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.
Jagtap, Soham; Shivaprasad, Padubidri V
2014-12-02
Micro (mi)RNAs are important regulators of plant development. Across plant lineages, Dicer-like 1 (DCL1) proteins process long ds-like structures to produce micro (mi) RNA duplexes in a stepwise manner. These miRNAs are incorporated into Argonaute (AGO) proteins and influence expression of RNAs that have sequence complementarity with miRNAs. Expression levels of AGOs are greatly regulated by plants in order to minimize unwarranted perturbations using miRNAs to target mRNAs coding for AGOs. AGOs may also have high promoter specificity-sometimes expression of AGO can be limited to just a few cells in a plant. Viral pathogens utilize various means to counter antiviral roles of AGOs including hijacking the host encoded miRNAs to target AGOs. Two host encoded miRNAs namely miR168 and miR403 that target AGOs have been described in the model plant Arabidopsis and such a mechanism is thought to be well conserved across plants because AGO sequences are well conserved. We show that the interaction between AGO mRNAs and miRNAs is species-specific due to the diversity in sequences of two miRNAs that target AGOs, sequence diversity among corresponding target regions in AGO mRNAs and variable expression levels of these miRNAs among vascular plants. We used miRNA sequences from 68 plant species representing 31 plant families for this analysis. Sequences of miR168 and miR403 are not conserved among plant lineages, but surprisingly they differ drastically in their sequence diversity and expression levels even among closely related plants. Variation in miR168 expression among plants correlates well with secondary structures/length of loop sequences of their precursors. Our data indicates a complex AGO targeting interaction among plant lineages due to miRNA sequence diversity and sequences of miRNA targeting regions among AGO mRNAs, thus leading to the assumption that the perturbations by viruses that use host miRNAs to target antiviral AGOs can only be species-specific. 
We also show that rapid evolution and likely loss of expression of miR168 isoforms in tobacco is related to the insertion of MITE-like transposons between miRNA and miRNA* sequences, a possible mechanism showing how miRNAs are lost in few plant lineages even though other close relatives have abundantly expressing miRNAs.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pokkuluri, P. R.; Londer, Y. Y.; Yang, X.
2010-02-01
Periplasmic cytochromes c7 are important in electron transfer pathway(s) in Fe(III) respiration by Geobacter sulfurreducens. The genome of G. sulfurreducens encodes a family of five 10-kDa, three-heme cytochromes c7. The sequence identity between the five proteins (designated PpcA, PpcB, PpcC, PpcD, and PpcE) varies between 45% and 77%. Here, we report the high-resolution structures of PpcC, PpcD, and PpcE determined by X-ray diffraction. This new information made it possible to compare the sequences and structures of the entire family. The triheme cores are largely conserved but are not identical. We observed changes, due to different crystal packing, in the relative positions of the hemes between two molecules in the crystal. The overall protein fold of the cytochromes is similar. The structure of PpcD differs most from that of the other homologs, which is not obvious from the sequence comparisons of the family. Interestingly, PpcD is the only cytochrome c7 within the family that has higher abundance when G. sulfurreducens is grown on insoluble Fe(III) oxide compared to ferric citrate. The structures have the highest degree of conservation around 'heme IV'; the protein surface around this heme is positively charged in all of the proteins, and therefore all cytochromes c7 could interact with similar molecules involving this region. The structures and surface characteristics of the proteins near the other two hemes, 'heme I' and 'heme III', differ within the family. The above observations suggest that each of the five cytochromes c7 could interact with its own redox partner via an interface involving the regions of heme I and/or heme III; this provides a possible rationalization for the existence of five similar proteins in G. sulfurreducens.
Churchill, Mair E.A.; Klass, Janet; Zoetewey, David L.
2010-01-01
The ubiquitous eukaryotic High-Mobility-Group-Box (HMGB) chromosomal proteins promote many chromatin-mediated cellular activities through their non-sequence-specific binding and bending of DNA. Minor groove DNA binding by the HMG box results in substantial DNA bending toward the major groove owing to electrostatic interactions, shape complementarity and DNA intercalation that occurs at two sites. Here, the structures of the complexes formed with DNA by a partially DNA intercalation-deficient mutant of Drosophila melanogaster HMGD have been determined by X-ray crystallography at a resolution of 2.85 Å. The six proteins and fifty base pairs of DNA in the crystal structure revealed a variety of bound conformations. All of the proteins bound in the minor groove, bridging DNA molecules, presumably because these DNA regions are easily deformed. The loss of the primary site of DNA intercalation decreased overall DNA bending and shape complementarity. However, DNA bending at the secondary site of intercalation was retained and most protein-DNA contacts were preserved. The mode of binding resembles the HMGB1-boxA-cisplatin-DNA complex, which also lacks a primary intercalating residue. This study provides new insights into the binding mechanisms used by HMG boxes to recognize varied DNA structures and sequences as well as modulate DNA structure and DNA bending. PMID:20800069