Protein Crystal Eco R1 Endonulease-DNA Complex
NASA Technical Reports Server (NTRS)
1998-01-01
Type II restriction enzymes, such as Eco R1 endonulease, present a unique advantage for the study of sequence-specific recognition because they leave a record of where they have been in the form of the cleaved ends of the DNA sites where they were bound. The differential behavior of a sequence -specific protein at sites of differing base sequence is the essence of the sequence-specificity; the core question is how do these proteins discriminate between different DNA sequences especially when the two sequences are very similar. Principal Investigator: Dan Carter/New Century Pharmaceuticals
Structure and Sequence Search on Aptamer-Protein Docking
NASA Astrophysics Data System (ADS)
Xiao, Jiajie; Bonin, Keith; Guthold, Martin; Salsbury, Freddie
2015-03-01
Interactions between proteins and deoxyribonucleic acid (DNA) play a significant role in the living systems, especially through gene regulation. However, short nucleic acids sequences (aptamers) with specific binding affinity to specific proteins exhibit clinical potential as therapeutics. Our capillary and gel electrophoresis selection experiments show that specific sequences of aptamers can be selected that bind specific proteins. Computationally, given the experimentally-determined structure and sequence of a thrombin-binding aptamer, we can successfully dock the aptamer onto thrombin in agreement with experimental structures of the complex. In order to further study the conformational flexibility of this thrombin-binding aptamer and to potentially develop a predictive computational model of aptamer-binding, we use GPU-enabled molecular dynamics simulations to both examine the conformational flexibility of the aptamer in the absence of binding to thrombin, and to determine our ability to fold an aptamer. This study should help further de-novo predictions of aptamer sequences by enabling the study of structural and sequence-dependent effects on aptamer-protein docking specificity.
HomPPI: a class of sequence homology based protein-protein interface prediction methods
2011-01-01
Background Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate. Results We studied more than 300,000 pair-wise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence. Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i) NPS-HomPPI (Non partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (Partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein. Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC) of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers including those that exploit the structure of the query proteins. The partner-specific classifier, PS-HomPPI can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of both the query and the target can be reliably identified. The HomPPI web server is available at http://homppi.cs.iastate.edu/. Conclusions Sequence homology-based methods offer a class of computationally efficient and reliable approaches for predicting the protein-protein interface residues that participate in either obligate or transient interactions. For query proteins involved in transient interactions, the reliability of interface residue prediction can be improved by exploiting knowledge of putative interaction partners. PMID:21682895
Sevy, Alexander M.; Jacobs, Tim M.; Crowe, James E.; Meiler, Jens
2015-01-01
Computational protein design has found great success in engineering proteins for thermodynamic stability, binding specificity, or enzymatic activity in a ‘single state’ design (SSD) paradigm. Multi-specificity design (MSD), on the other hand, involves considering the stability of multiple protein states simultaneously. We have developed a novel MSD algorithm, which we refer to as REstrained CONvergence in multi-specificity design (RECON). The algorithm allows each state to adopt its own sequence throughout the design process rather than enforcing a single sequence on all states. Convergence to a single sequence is encouraged through an incrementally increasing convergence restraint for corresponding positions. Compared to MSD algorithms that enforce (constrain) an identical sequence on all states the energy landscape is simplified, which accelerates the search drastically. As a result, RECON can readily be used in simulations with a flexible protein backbone. We have benchmarked RECON on two design tasks. First, we designed antibodies derived from a common germline gene against their diverse targets to assess recovery of the germline, polyspecific sequence. Second, we design “promiscuous”, polyspecific proteins against all binding partners and measure recovery of the native sequence. We show that RECON is able to efficiently recover native-like, biologically relevant sequences in this diverse set of protein complexes. PMID:26147100
Isolation and characterization of target sequences of the chicken CdxA homeobox gene.
Margalit, Y; Yarus, S; Shapira, E; Gruenbaum, Y; Fainsod, A
1993-01-01
The DNA binding specificity of the chicken homeodomain protein CDXA was studied. Using a CDXA-glutathione-S-transferase fusion protein, DNA fragments containing the binding site for this protein were isolated. The sources of DNA were oligonucleotides with random sequence and chicken genomic DNA. The DNA fragments isolated were sequenced and tested in DNA binding assays. Sequencing revealed that most DNA fragments are AT rich which is a common feature of homeodomain binding sites. By electrophoretic mobility shift assays it was shown that the different target sequences isolated bind to the CDXA protein with different affinities. The specific sequences bound by the CDXA protein in the genomic fragments isolated, were determined by DNase I footprinting. From the footprinted sequences, the CDXA consensus binding site was determined. The CDXA protein binds the consensus sequence A, A/T, T, A/T, A, T, A/G. The CAUDAL binding site in the ftz promoter is also included in this consensus sequence. When tested, some of the genomic target sequences were capable of enhancing the transcriptional activity of reporter plasmids when introduced into CDXA expressing cells. This study determined the DNA sequence specificity of the CDXA protein and it also shows that this protein can further activate transcription in cells in culture. Images PMID:7909943
Specific minor groove solvation is a crucial determinant of DNA binding site recognition
Harris, Lydia-Ann; Williams, Loren Dean; Koudelka, Gerald B.
2014-01-01
The DNA sequence preferences of nearly all sequence specific DNA binding proteins are influenced by the identities of bases that are not directly contacted by protein. Discrimination between non-contacted base sequences is commonly based on the differential abilities of DNA sequences to allow narrowing of the DNA minor groove. However, the factors that govern the propensity of minor groove narrowing are not completely understood. Here we show that the differential abilities of various DNA sequences to support formation of a highly ordered and stable minor groove solvation network are a key determinant of non-contacted base recognition by a sequence-specific binding protein. In addition, disrupting the solvent network in the non-contacted region of the binding site alters the protein's ability to recognize contacted base sequences at positions 5–6 bases away. This observation suggests that DNA solvent interactions link contacted and non-contacted base recognition by the protein. PMID:25429976
Lou, Tzu-Fang; Weidmann, Chase A; Killingsworth, Jordan; Tanaka Hall, Traci M; Goldstrohm, Aaron C; Campbell, Zachary T
2017-04-15
RNA-binding proteins (RBPs) collaborate to control virtually every aspect of RNA function. Tremendous progress has been made in the area of global assessment of RBP specificity using next-generation sequencing approaches both in vivo and in vitro. Understanding how protein-protein interactions enable precise combinatorial regulation of RNA remains a significant problem. Addressing this challenge requires tools that can quantitatively determine the specificities of both individual proteins and multimeric complexes in an unbiased and comprehensive way. One approach utilizes in vitro selection, high-throughput sequencing, and sequence-specificity landscapes (SEQRS). We outline a SEQRS experiment focused on obtaining the specificity of a multi-protein complex between Drosophila RBPs Pumilio (Pum) and Nanos (Nos). We discuss the necessary controls in this type of experiment and examine how the resulting data can be complemented with structural and cell-based reporter assays. Additionally, SEQRS data can be integrated with functional genomics data to uncover biological function. Finally, we propose extensions of the technique that will enhance our understanding of multi-protein regulatory complexes assembled onto RNA. Copyright © 2016 Elsevier Inc. All rights reserved.
Churchill, M E; Jones, D N; Glaser, T; Hefner, H; Searles, M A; Travers, A A
1995-01-01
The high mobility group (HMG) protein HMG-D from Drosophila melanogaster is a highly abundant chromosomal protein that is closely related to the vertebrate HMG domain proteins HMG1 and HMG2. In general, chromosomal HMG domain proteins lack sequence specificity. However, using both NMR spectroscopy and standard biochemical techniques we show that binding of HMG-D to a single DNA site is sequence selective. The preferred duplex DNA binding site comprises at least 5 bp and contains the deformable dinucleotide TG embedded in A/T-rich sequences. The TG motif constitutes a common core element in the binding sites of the well-characterized sequence-specific HMG domain proteins. We show that a conserved aromatic residue in helix 1 of the HMG domain may be involved in recognition of this core sequence. In common with other HMG domain proteins HMG-D binds preferentially to DNA sites that are stably bent and underwound, therefore HMG-D can be considered an architecture-specific protein. Finally, we show that HMG-D bends DNA and may confer a superhelical DNA conformation at a natural DNA binding site in the Drosophila fushi tarazu scaffold-associated region. Images PMID:7720717
2009-01-01
Background Tardigrades represent an animal phylum with extraordinary resistance to environmental stress. Results To gain insights into their stress-specific adaptation potential, major clusters of related and similar proteins are identified, as well as specific functional clusters delineated comparing all tardigrades and individual species (Milnesium tardigradum, Hypsibius dujardini, Echiniscus testudo, Tulinus stephaniae, Richtersius coronifer) and functional elements in tardigrade mRNAs are analysed. We find that 39.3% of the total sequences clustered in 58 clusters of more than 20 proteins. Among these are ten tardigrade specific as well as a number of stress-specific protein clusters. Tardigrade-specific functional adaptations include strong protein, DNA- and redox protection, maintenance and protein recycling. Specific regulatory elements regulate tardigrade mRNA stability such as lox P DICE elements whereas 14 other RNA elements of higher eukaryotes are not found. Further features of tardigrade specific adaption are rapidly identified by sequence and/or pattern search on the web-tool tardigrade analyzer http://waterbear.bioapps.biozentrum.uni-wuerzburg.de. The work-bench offers nucleotide pattern analysis for promotor and regulatory element detection (tardigrade specific; nrdb) as well as rapid COG search for function assignments including species-specific repositories of all analysed data. Conclusion Different protein clusters and regulatory elements implicated in tardigrade stress adaptations are analysed including unpublished tardigrade sequences. PMID:19821996
Förster, Frank; Liang, Chunguang; Shkumatov, Alexander; Beisser, Daniela; Engelmann, Julia C; Schnölzer, Martina; Frohme, Marcus; Müller, Tobias; Schill, Ralph O; Dandekar, Thomas
2009-10-12
Tardigrades represent an animal phylum with extraordinary resistance to environmental stress. To gain insights into their stress-specific adaptation potential, major clusters of related and similar proteins are identified, as well as specific functional clusters delineated comparing all tardigrades and individual species (Milnesium tardigradum, Hypsibius dujardini, Echiniscus testudo, Tulinus stephaniae, Richtersius coronifer) and functional elements in tardigrade mRNAs are analysed. We find that 39.3% of the total sequences clustered in 58 clusters of more than 20 proteins. Among these are ten tardigrade specific as well as a number of stress-specific protein clusters. Tardigrade-specific functional adaptations include strong protein, DNA- and redox protection, maintenance and protein recycling. Specific regulatory elements regulate tardigrade mRNA stability such as lox P DICE elements whereas 14 other RNA elements of higher eukaryotes are not found. Further features of tardigrade specific adaption are rapidly identified by sequence and/or pattern search on the web-tool tardigrade analyzer http://waterbear.bioapps.biozentrum.uni-wuerzburg.de. The work-bench offers nucleotide pattern analysis for promotor and regulatory element detection (tardigrade specific; nrdb) as well as rapid COG search for function assignments including species-specific repositories of all analysed data. Different protein clusters and regulatory elements implicated in tardigrade stress adaptations are analysed including unpublished tardigrade sequences.
Optimizing high performance computing workflow for protein functional annotation.
Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene
2014-09-10
Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optmized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. On the basis of the Position-Specific Iterative Basic Local Alignment Search Tool the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data.
Optimizing high performance computing workflow for protein functional annotation
Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene
2014-01-01
Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optmized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. On the basis of the Position-Specific Iterative Basic Local Alignment Search Tool the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data. PMID:25313296
1998-12-01
Type II restriction enzymes, such as Eco R1 endonulease, present a unique advantage for the study of sequence-specific recognition because they leave a record of where they have been in the form of the cleaved ends of the DNA sites where they were bound. The differential behavior of a sequence -specific protein at sites of differing base sequence is the essence of the sequence-specificity; the core question is how do these proteins discriminate between different DNA sequences especially when the two sequences are very similar. Principal Investigator: Dan Carter/New Century Pharmaceuticals
2017-01-01
Abstract Target search as performed by DNA-binding proteins is a complex process, in which multiple factors contribute to both thermodynamic discrimination of the target sequence from overwhelmingly abundant off-target sites and kinetic acceleration of dynamic sequence interrogation. TRF1, the protein that binds to telomeric tandem repeats, faces an intriguing variant of the search problem where target sites are clustered within short fragments of chromosomal DNA. In this study, we use extensive (>0.5 ms in total) MD simulations to study the dynamical aspects of sequence-specific binding of TRF1 at both telomeric and non-cognate DNA. For the first time, we describe the spontaneous formation of a sequence-specific native protein–DNA complex in atomistic detail, and study the mechanism by which proteins avoid off-target binding while retaining high affinity for target sites. Our calculated free energy landscapes reproduce the thermodynamics of sequence-specific binding, while statistical approaches allow for a comprehensive description of intermediate stages of complex formation. PMID:28633355
Zhang, Lu; Xu, Jinhao; Ma, Jinbiao
2016-07-25
RNA-binding protein exerts important biological function by specifically recognizing RNA motif. SELEX (Systematic evolution of ligands by exponential enrichment), an in vitro selection method, can obtain consensus motif with high-affinity and specificity for many target molecules from DNA or RNA libraries. Here, we combined SELEX with next-generation sequencing to study the protein-RNA interaction in vitro. A pool of RNAs with 20 bp random sequences were transcribed by T7 promoter, and target protein was inserted into plasmid containing SBP-tag, which can be captured by streptavidin beads. Through only one cycle, the specific RNA motif can be obtained, which dramatically improved the selection efficiency. Using this method, we found that human hnRNP A1 RRMs domain (UP1 domain) bound RNA motifs containing AGG and AG sequences. The EMSA experiment indicated that hnRNP A1 RRMs could bind the obtained RNA motif. Taken together, this method provides a rapid and effective method to study the RNA binding specificity of proteins.
Mining for class-specific motifs in protein sequence classification
2013-01-01
Background In protein sequence classification, identification of the sequence motifs or n-grams that can precisely discriminate between classes is a more interesting scientific question than the classification itself. A number of classification methods aim at accurate classification but fail to explain which sequence features indeed contribute to the accuracy. We hypothesize that sequences in lower denominations (n-grams) can be used to explore the sequence landscape and to identify class-specific motifs that discriminate between classes during classification. Discriminative n-grams are short peptide sequences that are highly frequent in one class but are either minimally present or absent in other classes. In this study, we present a new substitution-based scoring function for identifying discriminative n-grams that are highly specific to a class. Results We present a scoring function based on discriminative n-grams that can effectively discriminate between classes. The scoring function, initially, harvests the entire set of 4- to 8-grams from the protein sequences of different classes in the dataset. Similar n-grams of the same size are combined to form new n-grams, where the similarity is defined by positive amino acid substitution scores in the BLOSUM62 matrix. Substitution has resulted in a large increase in the number of discriminatory n-grams harvested. Due to the unbalanced nature of the dataset, the frequencies of the n-grams are normalized using a dampening factor, which gives more weightage to the n-grams that appear in fewer classes and vice-versa. After the n-grams are normalized, the scoring function identifies discriminative 4- to 8-grams for each class that are frequent enough to be above a selection threshold. By mapping these discriminative n-grams back to the protein sequences, we obtained contiguous n-grams that represent short class-specific motifs in protein sequences. Our method fared well compared to an existing motif finding method known as Wordspy. We have validated our enriched set of class-specific motifs against the functionally important motifs obtained from the NLSdb, Prosite and ELM databases. We demonstrate that this method is very generic; thus can be widely applied to detect class-specific motifs in many protein sequence classification tasks. Conclusion The proposed scoring function and methodology is able to identify class-specific motifs using discriminative n-grams derived from the protein sequences. The implementation of amino acid substitution scores for similarity detection, and the dampening factor to normalize the unbalanced datasets have significant effect on the performance of the scoring function. Our multipronged validation tests demonstrate that this method can detect class-specific motifs from a wide variety of protein sequence classes with a potential application to detecting proteome-specific motifs of different organisms. PMID:23496846
Nagano, Yukio; Furuhashi, Hirofumi; Inaba, Takehito; Sasaki, Yukiko
2001-01-01
Complementary DNA encoding a DNA-binding protein, designated PLATZ1 (plant AT-rich sequence- and zinc-binding protein 1), was isolated from peas. The amino acid sequence of the protein is similar to those of other uncharacterized proteins predicted from the genome sequences of higher plants. However, no paralogous sequences have been found outside the plant kingdom. Multiple alignments among these paralogous proteins show that several cysteine and histidine residues are invariant, suggesting that these proteins are a novel class of zinc-dependent DNA-binding proteins with two distantly located regions, C-x2-H-x11-C-x2-C-x(4–5)-C-x2-C-x(3–7)-H-x2-H and C-x2-C-x(10–11)-C-x3-C. In an electrophoretic mobility shift assay, the zinc chelator 1,10-o-phenanthroline inhibited DNA binding, and two distant zinc-binding regions were required for DNA binding. A protein blot with 65ZnCl2 showed that both regions are required for zinc-binding activity. The PLATZ1 protein non-specifically binds to A/T-rich sequences, including the upstream region of the pea GTPase pra2 and plastocyanin petE genes. Expression of the PLATZ1 repressed those of the reporter constructs containing the coding sequence of luciferase gene driven by the cauliflower mosaic virus (CaMV) 35S90 promoter fused to the tandem repeat of the A/T-rich sequences. These results indicate that PLATZ1 is a novel class of plant-specific zinc-dependent DNA-binding protein responsible for A/T-rich sequence-mediated transcriptional repression. PMID:11600698
Sequence specificity of single-stranded DNA-binding proteins: a novel DNA microarray approach
Morgan, Hugh P.; Estibeiro, Peter; Wear, Martin A.; Max, Klaas E.A.; Heinemann, Udo; Cubeddu, Liza; Gallagher, Maurice P.; Sadler, Peter J.; Walkinshaw, Malcolm D.
2007-01-01
We have developed a novel DNA microarray-based approach for identification of the sequence-specificity of single-stranded nucleic-acid-binding proteins (SNABPs). For verification, we have shown that the major cold shock protein (CspB) from Bacillus subtilis binds with high affinity to pyrimidine-rich sequences, with a binding preference for the consensus sequence, 5′-GTCTTTG/T-3′. The sequence was modelled onto the known structure of CspB and a cytosine-binding pocket was identified, which explains the strong preference for a cytosine base at position 3. This microarray method offers a rapid high-throughput approach for determining the specificity and strength of ss DNA–protein interactions. Further screening of this newly emerging family of transcription factors will help provide an insight into their cellular function. PMID:17488853
Réfega, Susana; Girard-Misguich, Fabienne; Bourdieu, Christiane; Péry, Pierre; Labbé, Marie
2003-04-02
Specific antibodies were produced ex vivo from intestinal culture of Eimeria tenella infected chickens. The specificity of these intestinal antibodies was tested against different parasite stages. These antibodies were used to immunoscreen first generation schizont and sporozoite cDNA libraries permitting the identification of new E. tenella antigens. We obtained a total of 119 cDNA clones which were subjected to sequence analysis. The sequences coding for the proteins inducing local immune responses were compared with nucleotide or protein databases and with expressed sequence tags (ESTs) databases. We identified new Eimeria genes coding for heat shock proteins, a ribosomal protein, a pyruvate kinase and a pyridoxine kinase. Specific features of other sequences are discussed.
Kizaki, Seiichiro; Chandran, Anandhakumar; Sugiyama, Hiroshi
2016-03-02
Tet (ten-eleven translocation) family proteins have the ability to oxidize 5-methylcytosine (mC) to 5-hydroxymethylcytosine (hmC), 5-formylcytosine (fC), and 5-carboxycytosine (caC). However, the oxidation reaction of Tet is not understood completely. Evaluation of genomic-level epigenetic changes by Tet protein requires unbiased identification of the highly selective oxidation sites. In this study, we used high-throughput sequencing to investigate the sequence specificity of mC oxidation by Tet1. A 6.6×10(4) -member mC-containing random DNA-sequence library was constructed. The library was subjected to Tet-reactive pulldown followed by high-throughput sequencing. Analysis of the obtained sequence data identified the Tet1-reactive sequences. We identified mCpG as a highly reactive sequence of Tet1 protein. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
UniDrug-target: a computational tool to identify unique drug targets in pathogenic bacteria.
Chanumolu, Sree Krishna; Rout, Chittaranjan; Chauhan, Rajinder S
2012-01-01
Targeting conserved proteins of bacteria through antibacterial medications has resulted in both the development of resistant strains and changes to human health by destroying beneficial microbes which eventually become breeding grounds for the evolution of resistances. Despite the availability of more than 800 genomes sequences, 430 pathways, 4743 enzymes, 9257 metabolic reactions and protein (three-dimensional) 3D structures in bacteria, no pathogen-specific computational drug target identification tool has been developed. A web server, UniDrug-Target, which combines bacterial biological information and computational methods to stringently identify pathogen-specific proteins as drug targets, has been designed. Besides predicting pathogen-specific proteins essentiality, chokepoint property, etc., three new algorithms were developed and implemented by using protein sequences, domains, structures, and metabolic reactions for construction of partial metabolic networks (PMNs), determination of conservation in critical residues, and variation analysis of residues forming similar cavities in proteins sequences. First, PMNs are constructed to determine the extent of disturbances in metabolite production by targeting a protein as drug target. Conservation of pathogen-specific protein's critical residues involved in cavity formation and biological function determined at domain-level with low-matching sequences. Last, variation analysis of residues forming similar cavities in proteins sequences from pathogenic versus non-pathogenic bacteria and humans is performed. The server is capable of predicting drug targets for any sequenced pathogenic bacteria having fasta sequences and annotated information. The utility of UniDrug-Target server was demonstrated for Mycobacterium tuberculosis (H37Rv). The UniDrug-Target identified 265 mycobacteria pathogen-specific proteins, including 17 essential proteins which can be potential drug targets. UniDrug-Target is expected to accelerate pathogen-specific drug targets identification which will increase their success and durability as drugs developed against them have less chance to develop resistances and adverse impact on environment. The server is freely available at http://117.211.115.67/UDT/main.html. The standalone application (source codes) is available at http://www.bioinformatics.org/ftp/pub/bioinfojuit/UDT.rar.
Modular protein domains: an engineering approach toward functional biomaterials.
Lin, Charng-Yu; Liu, Julie C
2016-08-01
Protein domains and peptide sequences are a powerful tool for conferring specific functions to engineered biomaterials. Protein sequences with a wide variety of functionalities, including structure, bioactivity, protein-protein interactions, and stimuli responsiveness, have been identified, and advances in molecular biology continue to pinpoint new sequences. Protein domains can be combined to make recombinant proteins with multiple functionalities. The high fidelity of the protein translation machinery results in exquisite control over the sequence of recombinant proteins and the resulting properties of protein-based materials. In this review, we discuss protein domains and peptide sequences in the context of functional protein-based materials, composite materials, and their biological applications. Copyright © 2016 Elsevier Ltd. All rights reserved.
Drobni, Mirva; Hallberg, Kristina; Öhman, Ulla; Birve, Anna; Persson, Karina; Johansson, Ingegerd; Strömberg, Nicklas
2006-01-01
Background Actinomyces naeslundii genospecies 1 and 2 express type-2 fimbriae (FimA subunit polymers) with variant Galβ binding specificities and Actinomyces odontolyticus a sialic acid specificity to colonize different oral surfaces. However, the fimbrial nature of the sialic acid binding property and sequence information about FimA proteins from multiple strains are lacking. Results Here we have sequenced fimA genes from strains of A.naeslundii genospecies 1 (n = 4) and genospecies 2 (n = 4), both of which harboured variant Galβ-dependent hemagglutination (HA) types, and from A.odontolyticus PK984 with a sialic acid-dependent HA pattern. Three unique subtypes of FimA proteins with 63.8–66.4% sequence identity were present in strains of A. naeslundii genospecies 1 and 2 and A. odontolyticus. The generally high FimA sequence identity (>97.2%) within a genospecies revealed species specific sequences or segments that coincided with binding specificity. All three FimA protein variants contained a signal peptide, pilin motif, E box, proline-rich segment and an LPXTG sorting motif among other conserved segments for secretion, assembly and sorting of fimbrial proteins. The highly conserved pilin, E box and LPXTG motifs are present in fimbriae proteins from other Gram-positive bacteria. Moreover, only strains of genospecies 1 were agglutinated with type-2 fimbriae antisera derived from A. naeslundii genospecies 1 strain 12104, emphasizing that the overall folding of FimA may generate different functionalities. Western blot analyses with FimA antisera revealed monomers and oligomers of FimA in whole cell protein extracts and a purified recombinant FimA preparation, indicating a sortase-independent oligomerization of FimA. Conclusion The genus Actinomyces involves a diversity of unique FimA proteins with conserved pilin, E box and LPXTG motifs, depending on subspecies and associated binding specificity. In addition, a sortase independent oligomerization of FimA subunit proteins in solution was indicated. PMID:16686953
Gaynor, R; Soultanakis, E; Kuwabara, M; Garcia, J; Sigman, D S
1989-01-01
The transactivator protein, tat, encoded by the human immunodeficiency virus is a key regulator of viral transcription. Activation by the tat protein requires sequences downstream of the transcription initiation site called the transactivating region (TAR). RNA derived from the TAR is capable of forming a stable stem-loop structure and the maintenance of both the stem structure and the loop sequences located between +19 and +44 is required for complete in vivo activation by tat. Gel retardation assays with RNA from both wild-type and mutant TAR constructs generated in vitro with SP6 polymerase indicated specific binding of HeLa nuclear proteins to the TAR. To characterize this RNA-protein interaction, a method of chemical "imprinting" has been developed using photoactivated uranyl acetate as the nucleolytic agent. This reagent nicks RNA under physiological conditions at all four nucleotides in a reaction that is independent of sequence and secondary structure. Specific interaction of cellular proteins with TAR RNA could be detected by enhanced cleavages or imprints surrounding the loop region. Mutations that either disrupted stem base-pairing or extensively changed the primary sequence resulted in alterations in the cleavage pattern of the TAR RNA. Structural features of the TAR RNA stem-loop essential for tat activation are also required for specific binding of the HeLa cell nuclear protein. Images PMID:2544877
AlignMe—a membrane protein sequence alignment web server
Stamm, Marcus; Staritzbichler, René; Khafizov, Kamil; Forrest, Lucy R.
2014-01-01
We present a web server for pair-wise alignment of membrane protein sequences, using the program AlignMe. The server makes available two operational modes of AlignMe: (i) sequence to sequence alignment, taking two sequences in fasta format as input, combining information about each sequence from multiple sources and producing a pair-wise alignment (PW mode); and (ii) alignment of two multiple sequence alignments to create family-averaged hydropathy profile alignments (HP mode). For the PW sequence alignment mode, four different optimized parameter sets are provided, each suited to pairs of sequences with a specific similarity level. These settings utilize different types of inputs: (position-specific) substitution matrices, secondary structure predictions and transmembrane propensities from transmembrane predictions or hydrophobicity scales. In the second (HP) mode, each input multiple sequence alignment is converted into a hydrophobicity profile averaged over the provided set of sequence homologs; the two profiles are then aligned. The HP mode enables qualitative comparison of transmembrane topologies (and therefore potentially of 3D folds) of two membrane proteins, which can be useful if the proteins have low sequence similarity. In summary, the AlignMe web server provides user-friendly access to a set of tools for analysis and comparison of membrane protein sequences. Access is available at http://www.bioinfo.mpg.de/AlignMe PMID:24753425
Engineering peptide ligase specificity by proteomic identification of ligation sites.
Weeks, Amy M; Wells, James A
2018-01-01
Enzyme-catalyzed peptide ligation is a powerful tool for site-specific protein bioconjugation, but stringent enzyme-substrate specificity limits its utility. We developed an approach for comprehensively characterizing peptide ligase specificity for N termini using proteome-derived peptide libraries. We used this strategy to characterize the ligation efficiency for >25,000 enzyme-substrate pairs in the context of the engineered peptide ligase subtiligase and identified a family of 72 mutant subtiligases with activity toward N-terminal sequences that were previously recalcitrant to modification. We applied these mutants individually for site-specific bioconjugation of purified proteins, including antibodies, and in algorithmically selected combinations for sequencing of the cellular N terminome with reduced sequence bias. We also developed a web application to enable algorithmic selection of the most efficient subtiligase variant(s) for bioconjugation to user-defined sequences. Our methods provide a new toolbox of enzymes for site-specific protein modification and a general approach for rapidly defining and engineering peptide ligase specificity.
Algorithm, applications and evaluation for protein comparison by Ramanujan Fourier transform.
Zhao, Jian; Wang, Jiasong; Hua, Wei; Ouyang, Pingkai
2015-12-01
The amino acid sequence of a protein determines its chemical properties, chain conformation and biological functions. Protein sequence comparison is of great importance to identify similarities of protein structures and infer their functions. Many properties of a protein correspond to the low-frequency signals within the sequence. Low frequency modes in protein sequences are linked to the secondary structures, membrane protein types, and sub-cellular localizations of the proteins. In this paper, we present Ramanujan Fourier transform (RFT) with a fast algorithm to analyze the low-frequency signals of protein sequences. The RFT method is applied to similarity analysis of protein sequences with the Resonant Recognition Model (RRM). The results show that the proposed fast RFT method on protein comparison is more efficient than commonly used discrete Fourier transform (DFT). RFT can detect common frequencies as significant feature for specific protein families, and the RFT spectrum heat-map of protein sequences demonstrates the information conservation in the sequence comparison. The proposed method offers a new tool for pattern recognition, feature extraction and structural analysis on protein sequences. Copyright © 2015 Elsevier Ltd. All rights reserved.
Functional annotation from the genome sequence of the giant panda.
Huo, Tong; Zhang, Yinjie; Lin, Jianping
2012-08-01
The giant panda is one of the most critically endangered species due to the fragmentation and loss of its habitat. Studying the functions of proteins in this animal, especially specific trait-related proteins, is therefore necessary to protect the species. In this work, the functions of these proteins were investigated using the genome sequence of the giant panda. Data on 21,001 proteins and their functions were stored in the Giant Panda Protein Database, in which the proteins were divided into two groups: 20,179 proteins whose functions can be predicted by GeneScan formed the known-function group, whereas 822 proteins whose functions cannot be predicted by GeneScan comprised the unknown-function group. For the known-function group, we further classified the proteins by molecular function, biological process, cellular component, and tissue specificity. For the unknown-function group, we developed a strategy in which the proteins were filtered by cross-Blast to identify panda-specific proteins under the assumption that proteins related to the panda-specific traits in the unknown-function group exist. After this filtering procedure, we identified 32 proteins (2 of which are membrane proteins) specific to the giant panda genome as compared against the dog and horse genomes. Based on their amino acid sequences, these 32 proteins were further analyzed by functional classification using SVM-Prot, motif prediction using MyHits, and interacting protein prediction using the Database of Interacting Proteins. Nineteen proteins were predicted to be zinc-binding proteins, thus affecting the activities of nucleic acids. The 32 panda-specific proteins will be further investigated by structural and functional analysis.
Predicting protein-binding RNA nucleotides with consideration of binding partners.
Tuvshinjargal, Narankhuu; Lee, Wook; Park, Byungkyu; Han, Kyungsook
2015-06-01
In recent years several computational methods have been developed to predict RNA-binding sites in protein. Most of these methods do not consider interacting partners of a protein, so they predict the same RNA-binding sites for a given protein sequence even if the protein binds to different RNAs. Unlike the problem of predicting RNA-binding sites in protein, the problem of predicting protein-binding sites in RNA has received little attention mainly because it is much more difficult and shows a lower accuracy on average. In our previous study, we developed a method that predicts protein-binding nucleotides from an RNA sequence. In an effort to improve the prediction accuracy and usefulness of the previous method, we developed a new method that uses both RNA and protein sequence data. In this study, we identified effective features of RNA and protein molecules and developed a new support vector machine (SVM) model to predict protein-binding nucleotides from RNA and protein sequence data. The new model that used both protein and RNA sequence data achieved a sensitivity of 86.5%, a specificity of 86.2%, a positive predictive value (PPV) of 72.6%, a negative predictive value (NPV) of 93.8% and Matthews correlation coefficient (MCC) of 0.69 in a 10-fold cross validation; it achieved a sensitivity of 58.8%, a specificity of 87.4%, a PPV of 65.1%, a NPV of 84.2% and MCC of 0.48 in independent testing. For comparative purpose, we built another prediction model that used RNA sequence data alone and ran it on the same dataset. In a 10 fold-cross validation it achieved a sensitivity of 85.7%, a specificity of 80.5%, a PPV of 67.7%, a NPV of 92.2% and MCC of 0.63; in independent testing it achieved a sensitivity of 67.7%, a specificity of 78.8%, a PPV of 57.6%, a NPV of 85.2% and MCC of 0.45. In both cross-validations and independent testing, the new model that used both RNA and protein sequences showed a better performance than the model that used RNA sequence data alone in most performance measures. To the best of our knowledge, this is the first sequence-based prediction of protein-binding nucleotides in RNA which considers the binding partner of RNA. The new model will provide valuable information for designing biochemical experiments to find putative protein-binding sites in RNA with unknown structure. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Evaluating, Comparing, and Interpreting Protein Domain Hierarchies
2014-01-01
Abstract Arranging protein domain sequences hierarchically into evolutionarily divergent subgroups is important for investigating evolutionary history, for speeding up web-based similarity searches, for identifying sequence determinants of protein function, and for genome annotation. However, whether or not a particular hierarchy is optimal is often unclear, and independently constructed hierarchies for the same domain can often differ significantly. This article describes methods for statistically evaluating specific aspects of a hierarchy, for probing the criteria underlying its construction and for direct comparisons between hierarchies. Information theoretical notions are used to quantify the contributions of specific hierarchical features to the underlying statistical model. Such features include subhierarchies, sequence subgroups, individual sequences, and subgroup-associated signature patterns. Underlying properties are graphically displayed in plots of each specific feature's contributions, in heat maps of pattern residue conservation, in “contrast alignments,” and through cross-mapping of subgroups between hierarchies. Together, these approaches provide a deeper understanding of protein domain functional divergence, reveal uncertainties caused by inconsistent patterns of sequence conservation, and help resolve conflicts between competing hierarchies. PMID:24559108
Automated sequence-specific protein NMR assignment using the memetic algorithm MATCH.
Volk, Jochen; Herrmann, Torsten; Wüthrich, Kurt
2008-07-01
MATCH (Memetic Algorithm and Combinatorial Optimization Heuristics) is a new memetic algorithm for automated sequence-specific polypeptide backbone NMR assignment of proteins. MATCH employs local optimization for tracing partial sequence-specific assignments within a global, population-based search environment, where the simultaneous application of local and global optimization heuristics guarantees high efficiency and robustness. MATCH thus makes combined use of the two predominant concepts in use for automated NMR assignment of proteins. Dynamic transition and inherent mutation are new techniques that enable automatic adaptation to variable quality of the experimental input data. The concept of dynamic transition is incorporated in all major building blocks of the algorithm, where it enables switching between local and global optimization heuristics at any time during the assignment process. Inherent mutation restricts the intrinsically required randomness of the evolutionary algorithm to those regions of the conformation space that are compatible with the experimental input data. Using intact and artificially deteriorated APSY-NMR input data of proteins, MATCH performed sequence-specific resonance assignment with high efficiency and robustness.
Engineered proteins with PUF scaffold to manipulate RNA metabolism
Wang, Yang; Wang, Zefeng; Tanaka Hall, Traci M.
2013-01-01
Pumilio/fem-3 mRNA binding factor (FBF) proteins are characterized by a sequence-specific RNA-binding domain. This unique single-stranded RNA recognition module, whose sequence specificity can be reprogrammed, has been fused with functional modules to engineer protein factors with various functions. Here we summarize the advancement in developing RNA regulatory tools and opportunities for the future. PMID:23731364
Cross-Specificities between cII-like Proteins and pRE-like Promoters of Lambdoid Bacteriophages
Wulff, Daniel L.; Mahoney, Michael E.
1987-01-01
We have investigated the activation of transcription from the pRE promoters of phages λ, 21 and P22 by the λ and 21 cII proteins and the P22 c1 (cII-like) protein, using an in vivo system in which cII protein from a derepressed prophage activates transcription from a pRE DNA fragment on a multicopy plasmid. We find that each protein is highly specific for its own cognate pRE promoter, although measureable cross-reactions are observed. The primary recognition sequence for cII protein on λ pRE is a pair of TTGC repeat sequences in the sequence 5'-TTGCN 6TTGC-3' at the -35 region of the promoter. This same sequence is found in 21 pRE, while P22 pRE has the sequence 5'-TTGCN6TTGT-3', which is the same as that of λctr1, a pRE+ variant of λ. λctr1 pRE is half as active as λ + pRE when assayed with either the λ cII or the P22 c1 proteins. Therefore, the single base change in the P22 repeat sequence cannot explain why the P22 c1 protein is much more active with P22 pRE than λ p RE. The dya5 mutation, a G→A change at position -43 of pRE, makes pRE a stronger promoter when assayed with either the λ or 21 cII proteins or the P22 c1 protein. We conclude that efficient activation of a cII-dependent promoter by a cII protein requires sequence information in addition to the TTGC repeat sequences. We do not know the characteristics of the proteins which are responsible for the specificity of each protein for its own cognate promoter. However, λdya8, which has a Glu27→Lys alteration in the λ cII protein and a cII+ phenotype, results in a mutant cII protein that is much more highly specific than wild-type cII protein for its own cognate λ p RE promoter. This is especially remarkable because the dya8 amino acid alteration makes the helix-2 region (the region of the protein predicted to make contact with the phosphodiester backbone of the DNA) of λ cII protein conform exactly with the helix-2 region of the P22 c1 protein in both charge and charge distribution. PMID:2953649
McCarthy, Jason R.; Weissleder, Ralph
2007-01-01
Background Probes that allow site-specific protein labeling have become critical tools for visualizing biological processes. Methods Here we used phage display to identify a novel peptide sequence with nanomolar affinity for near infrared (NIR) (benz)indolium fluorochromes. The developed peptide sequence (“IQ-tag”) allows detection of NIR dyes in a wide range of assays including ELISA, flow cytometry, high throughput screens, microscopy, and optical in vivo imaging. Significance The described method is expected to have broad utility in numerous applications, namely site-specific protein imaging, target identification, cell tracking, and drug development. PMID:17653285
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, Matthew T.; Higgin, Joshua J.; Hall, Traci M.Tanaka
2008-06-06
Pumilio/FBF (PUF) family proteins are found in eukaryotic organisms and regulate gene expression post-transcriptionally by binding to sequences in the 3' untranslated region of target transcripts. PUF proteins contain an RNA binding domain that typically comprises eight {alpha}-helical repeats, each of which recognizes one RNA base. Some PUF proteins, including yeast Puf4p, have altered RNA binding specificity and use their eight repeats to bind to RNA sequences with nine or ten bases. Here we report the crystal structures of Puf4p alone and in complex with a 9-nucleotide (nt) target RNA sequence, revealing that Puf4p accommodates an 'extra' nucleotide by modestmore » adaptations allowing one base to be turned away from the RNA binding surface. Using structural information and sequence comparisons, we created a mutant Puf4p protein that preferentially binds to an 8-nt target RNA sequence over a 9-nt sequence and restores binding of each protein repeat to one RNA base.« less
Pagnuco, Inti Anabela; Revuelta, María Victoria; Bondino, Hernán Gabriel; Brun, Marcel; Ten Have, Arjen
2018-01-01
Protein superfamilies can be divided into subfamilies of proteins with different functional characteristics. Their sequences can be classified hierarchically, which is part of sequence function assignation. Typically, there are no clear subfamily hallmarks that would allow pattern-based function assignation by which this task is mostly achieved based on the similarity principle. This is hampered by the lack of a score cut-off that is both sensitive and specific. HMMER Cut-off Threshold Tool (HMMERCTTER) adds a reliable cut-off threshold to the popular HMMER. Using a high quality superfamily phylogeny, it clusters a set of training sequences such that the cluster-specific HMMER profiles show cluster or subfamily member detection with 100% precision and recall (P&R), thereby generating a specific threshold as inclusion cut-off. Profiles and thresholds are then used as classifiers to screen a target dataset. Iterative inclusion of novel sequences to groups and the corresponding HMMER profiles results in high sensitivity while specificity is maintained by imposing 100% P&R self detection. In three presented case studies of protein superfamilies, classification of large datasets with 100% precision was achieved with over 95% recall. Limits and caveats are presented and explained. HMMERCTTER is a promising protein superfamily sequence classifier provided high quality training datasets are used. It provides a decision support system that aids in the difficult task of sequence function assignation in the twilight zone of sequence similarity. All relevant data and source codes are available from the Github repository at the following URL: https://github.com/BBCMdP/HMMERCTTER.
Pagnuco, Inti Anabela; Revuelta, María Victoria; Bondino, Hernán Gabriel; Brun, Marcel
2018-01-01
Background Protein superfamilies can be divided into subfamilies of proteins with different functional characteristics. Their sequences can be classified hierarchically, which is part of sequence function assignation. Typically, there are no clear subfamily hallmarks that would allow pattern-based function assignation by which this task is mostly achieved based on the similarity principle. This is hampered by the lack of a score cut-off that is both sensitive and specific. Results HMMER Cut-off Threshold Tool (HMMERCTTER) adds a reliable cut-off threshold to the popular HMMER. Using a high quality superfamily phylogeny, it clusters a set of training sequences such that the cluster-specific HMMER profiles show cluster or subfamily member detection with 100% precision and recall (P&R), thereby generating a specific threshold as inclusion cut-off. Profiles and thresholds are then used as classifiers to screen a target dataset. Iterative inclusion of novel sequences to groups and the corresponding HMMER profiles results in high sensitivity while specificity is maintained by imposing 100% P&R self detection. In three presented case studies of protein superfamilies, classification of large datasets with 100% precision was achieved with over 95% recall. Limits and caveats are presented and explained. Conclusions HMMERCTTER is a promising protein superfamily sequence classifier provided high quality training datasets are used. It provides a decision support system that aids in the difficult task of sequence function assignation in the twilight zone of sequence similarity. All relevant data and source codes are available from the Github repository at the following URL: https://github.com/BBCMdP/HMMERCTTER. PMID:29579071
Characterization and prediction of residues determining protein functional specificity.
Capra, John A; Singh, Mona
2008-07-01
Within a homologous protein family, proteins may be grouped into subtypes that share specific functions that are not common to the entire family. Often, the amino acids present in a small number of sequence positions determine each protein's particular functional specificity. Knowledge of these specificity determining positions (SDPs) aids in protein function prediction, drug design and experimental analysis. A number of sequence-based computational methods have been introduced for identifying SDPs; however, their further development and evaluation have been hindered by the limited number of known experimentally determined SDPs. We combine several bioinformatics resources to automate a process, typically undertaken manually, to build a dataset of SDPs. The resulting large dataset, which consists of SDPs in enzymes, enables us to characterize SDPs in terms of their physicochemical and evolutionary properties. It also facilitates the large-scale evaluation of sequence-based SDP prediction methods. We present a simple sequence-based SDP prediction method, GroupSim, and show that, surprisingly, it is competitive with a representative set of current methods. We also describe ConsWin, a heuristic that considers sequence conservation of neighboring amino acids, and demonstrate that it improves the performance of all methods tested on our large dataset of enzyme SDPs. Datasets and GroupSim code are available online at http://compbio.cs.princeton.edu/specificity/. Supplementary data are available at Bioinformatics online.
Padliya, Neerav D; Garrett, Wesley M; Campbell, Kimberly B; Tabb, David L; Cooper, Bret
2007-11-01
LC-MS/MS has demonstrated potential for detecting plant pathogens. Unlike PCR or ELISA, LC-MS/MS does not require pathogen-specific reagents for the detection of pathogen-specific proteins and peptides. However, the MS/MS approach we and others have explored does require a protein sequence reference database and database-search software to interpret tandem mass spectra. To evaluate the limitations of database composition on pathogen identification, we analyzed proteins from cultured Ustilago maydis, Phytophthora sojae, Fusarium graminearum, and Rhizoctonia solani by LC-MS/MS. When the search database did not contain sequences for a target pathogen, or contained sequences to related pathogens, target pathogen spectra were reliably matched to protein sequences from nontarget organisms, giving an illusion that proteins from nontarget organisms were identified. Our analysis demonstrates that when database-search software is used as part of the identification process, a paradox exists whereby additional sequences needed to detect a wide variety of possible organisms may lead to more cross-species protein matches and misidentification of pathogens.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Suhkmann; Zhang, Ziming; Upchurch, Sean
2004-04-16
2 ARID is a homologous family of DNA-binding domains that occur in DNA binding proteins from a wide variety of species, ranging from yeast to nematodes, insects, mammals and plants. SWI1, a member of the SWI/SNF protein complex that is involved in chromatin remodeling during transcription, contains the ARID motif. The ARID domain of human SWI1 (also known as p270) does not select for a specific DNA sequence from a random sequence pool. The lack of sequence specificity shown by the SWI1 ARID domain stands in contrast to the other characterized ARID domains, which recognize specific AT-rich sequences. We havemore » solved the three-dimensional structure of human SWI1 ARID using solution NMR methods. In addition, we have characterized non-specific DNA-binding by the SWI1 ARID domain. Results from this study indicate that a flexible long internal loop in ARID motif is likely to be important for sequence specific DNA-recognition. The structure of human SWI1 ARID domain also represents a distinct structural subfamily. Studies of ARID indicate that boundary of the DNA binding structural and functional domains can extend beyond the sequence homologous region in a homologous family of proteins. Structural studies of homologous domains such as ARID family of DNA-binding domains should provide information to better predict the boundary of structural and functional domains in structural genomic studies. Key Words: ARID, SWI1, NMR, structural genomics, protein-DNA interaction.« less
Lingner, Thomas; Kataya, Amr R. A.; Reumann, Sigrun
2012-01-01
We recently developed the first algorithms specifically for plants to predict proteins carrying peroxisome targeting signals type 1 (PTS1) from genome sequences.1 As validated experimentally, the prediction methods are able to correctly predict unknown peroxisomal Arabidopsis proteins and to infer novel PTS1 tripeptides. The high prediction performance is primarily determined by the large number and sequence diversity of the underlying positive example sequences, which mainly derived from EST databases. However, a few constructs remained cytosolic in experimental validation studies, indicating sequencing errors in some ESTs. To identify erroneous sequences, we validated subcellular targeting of additional positive example sequences in the present study. Moreover, we analyzed the distribution of prediction scores separately for each orthologous group of PTS1 proteins, which generally resembled normal distributions with group-specific mean values. The cytosolic sequences commonly represented outliers of low prediction scores and were located at the very tail of a fitted normal distribution. Three statistical methods for identifying outliers were compared in terms of sensitivity and specificity.” Their combined application allows elimination of erroneous ESTs from positive example data sets. This new post-validation method will further improve the prediction accuracy of both PTS1 and PTS2 protein prediction models for plants, fungi, and mammals. PMID:22415050
Lingner, Thomas; Kataya, Amr R A; Reumann, Sigrun
2012-02-01
We recently developed the first algorithms specifically for plants to predict proteins carrying peroxisome targeting signals type 1 (PTS1) from genome sequences. As validated experimentally, the prediction methods are able to correctly predict unknown peroxisomal Arabidopsis proteins and to infer novel PTS1 tripeptides. The high prediction performance is primarily determined by the large number and sequence diversity of the underlying positive example sequences, which mainly derived from EST databases. However, a few constructs remained cytosolic in experimental validation studies, indicating sequencing errors in some ESTs. To identify erroneous sequences, we validated subcellular targeting of additional positive example sequences in the present study. Moreover, we analyzed the distribution of prediction scores separately for each orthologous group of PTS1 proteins, which generally resembled normal distributions with group-specific mean values. The cytosolic sequences commonly represented outliers of low prediction scores and were located at the very tail of a fitted normal distribution. Three statistical methods for identifying outliers were compared in terms of sensitivity and specificity." Their combined application allows elimination of erroneous ESTs from positive example data sets. This new post-validation method will further improve the prediction accuracy of both PTS1 and PTS2 protein prediction models for plants, fungi, and mammals.
Salt bridges: geometrically specific, designable interactions.
Donald, Jason E; Kulp, Daniel W; DeGrado, William F
2011-03-01
Salt bridges occur frequently in proteins, providing conformational specificity and contributing to molecular recognition and catalysis. We present a comprehensive analysis of these interactions in protein structures by surveying a large database of protein structures. Salt bridges between Asp or Glu and His, Arg, or Lys display extremely well-defined geometric preferences. Several previously observed preferences are confirmed, and others that were previously unrecognized are discovered. Salt bridges are explored for their preferences for different separations in sequence and in space, geometric preferences within proteins and at protein-protein interfaces, co-operativity in networked salt bridges, inclusion within metal-binding sites, preference for acidic electrons, apparent conformational side chain entropy reduction on formation, and degree of burial. Salt bridges occur far more frequently between residues at close than distant sequence separations, but, at close distances, there remain strong preferences for salt bridges at specific separations. Specific types of complex salt bridges, involving three or more members, are also discovered. As we observe a strong relationship between the propensity to form a salt bridge and the placement of salt-bridging residues in protein sequences, we discuss the role that salt bridges might play in kinetically influencing protein folding and thermodynamically stabilizing the native conformation. We also develop a quantitative method to select appropriate crystal structure resolution and B-factor cutoffs. Detailed knowledge of these geometric and sequence dependences should aid de novo design and prediction algorithms. Copyright © 2010 Wiley-Liss, Inc.
Motomura, Kenta; Nakamura, Morikazu; Otaki, Joji M.
2013-01-01
Protein structure and function information is coded in amino acid sequences. However, the relationship between primary sequences and three-dimensional structures and functions remains enigmatic. Our approach to this fundamental biochemistry problem is based on the frequencies of short constituent sequences (SCSs) or words. A protein amino acid sequence is considered analogous to an English sentence, where SCSs are equivalent to words. Availability scores, which are defined as real SCS frequencies in the non-redundant amino acid database relative to their probabilistically expected frequencies, demonstrate the biological usage bias of SCSs. As a result, this frequency-based linguistic approach is expected to have diverse applications, such as secondary structure specifications by structure-specific SCSs and immunological adjuvants with rare or non-existent SCSs. Linguistic similarities (e.g., wide ranges of scale-free distributions) and dissimilarities (e.g., behaviors of low-rank samples) between proteins and the natural English language have been revealed in the rank-frequency relationships of SCSs or words. We have developed a web server, the SCS Package, which contains five applications for analyzing protein sequences based on the linguistic concept. These tools have the potential to assist researchers in deciphering structurally and functionally important protein sites, species-specific sequences, and functional relationships between SCSs. The SCS Package also provides researchers with a tool to construct amino acid sequences de novo based on the idiomatic usage of SCSs. PMID:24688703
Motomura, Kenta; Nakamura, Morikazu; Otaki, Joji M
2013-01-01
Protein structure and function information is coded in amino acid sequences. However, the relationship between primary sequences and three-dimensional structures and functions remains enigmatic. Our approach to this fundamental biochemistry problem is based on the frequencies of short constituent sequences (SCSs) or words. A protein amino acid sequence is considered analogous to an English sentence, where SCSs are equivalent to words. Availability scores, which are defined as real SCS frequencies in the non-redundant amino acid database relative to their probabilistically expected frequencies, demonstrate the biological usage bias of SCSs. As a result, this frequency-based linguistic approach is expected to have diverse applications, such as secondary structure specifications by structure-specific SCSs and immunological adjuvants with rare or non-existent SCSs. Linguistic similarities (e.g., wide ranges of scale-free distributions) and dissimilarities (e.g., behaviors of low-rank samples) between proteins and the natural English language have been revealed in the rank-frequency relationships of SCSs or words. We have developed a web server, the SCS Package, which contains five applications for analyzing protein sequences based on the linguistic concept. These tools have the potential to assist researchers in deciphering structurally and functionally important protein sites, species-specific sequences, and functional relationships between SCSs. The SCS Package also provides researchers with a tool to construct amino acid sequences de novo based on the idiomatic usage of SCSs.
Molecular evolution of miraculin-like proteins in soybean Kunitz super-family.
Selvakumar, Purushotham; Gahloth, Deepankar; Tomar, Prabhat Pratap Singh; Sharma, Nidhi; Sharma, Ashwani Kumar
2011-12-01
Miraculin-like proteins (MLPs) belong to soybean Kunitz super-family and have been characterized from many plant families like Rutaceae, Solanaceae, Rubiaceae, etc. Many of them possess trypsin inhibitory activity and are involved in plant defense. MLPs exhibit significant sequence identity (~30-95%) to native miraculin protein, also belonging to Kunitz super-family compared with a typical Kunitz family member (~30%). The sequence and structure-function comparison of MLPs with that of a classical Kunitz inhibitor have demonstrated that MLPs have evolved to form a distinct group within Kunitz super-family. Sequence analysis of new genes along with available MLP sequences in the literature revealed three major groups for these proteins. A significant feature of Rutaceae MLP type 2 sequences is the presence of phosphorylation motif. Subtle changes are seen in putative reactive loop residues among different MLPs suggesting altered specificities to specific proteases. In phylogenetic analysis, Rutaceae MLP type 1 and type 2 proteins clustered together on separate branches, whereas native miraculin along with other MLPs formed distinct clusters. Site-specific positive Darwinian selection was observed at many sites in both the groups of Rutaceae MLP sequences with most of the residues undergoing positive selection located in loop regions. The results demonstrate the sequence and thereby the structure-function divergence of MLPs as a distinct group within soybean Kunitz super-family due to biotic and abiotic stresses of local environment.
Site-Specific Pyrolysis Induced Cleavage at Aspartic Acid Residue in Peptides and Proteins
Zhang, Shaofeng; Basile, Franco
2011-01-01
A simple and site-specific non-enzymatic method based on pyrolysis has been developed to cleave peptides and proteins. Pyrolytic cleavage was found to be specific and rapid as it induced a cleavage at the C-terminal side of aspartic acid in the temperature range of 220–250 °C in 10 seconds. Electrospray Ionization (ESI) mass spectrometry (MS) and tandem-MS (MS/MS) were used to characterize and identify pyrolysis cleavage products, confirming that sequence information is conserved after the pyrolysis process in both peptides and protein tested. This suggests that pyrolysis-induced cleavage at aspartyl residues can be used as a rapid protein digestion procedure for the generation of sequence specific protein biomarkers. PMID:17388620
Stable isotope, site-specific mass tagging for protein identification
Chen, Xian
2006-10-24
Proteolytic peptide mass mapping as measured by mass spectrometry provides an important method for the identification of proteins, which are usually identified by matching the measured and calculated m/z values of the proteolytic peptides. A unique identification is, however, heavily dependent upon the mass accuracy and sequence coverage of the fragment ions generated by peptide ionization. The present invention describes a method for increasing the specificity, accuracy and efficiency of the assignments of particular proteolytic peptides and consequent protein identification, by the incorporation of selected amino acid residue(s) enriched with stable isotope(s) into the protein sequence without the need for ultrahigh instrumental accuracy. Selected amino acid(s) are labeled with .sup.13C/.sup.15N/.sup.2H and incorporated into proteins in a sequence-specific manner during cell culturing. Each of these labeled amino acids carries a defined mass change encoded in its monoisotopic distribution pattern. Through their characteristic patterns, the peptides with mass tag(s) can then be readily distinguished from other peptides in mass spectra. The present method of identifying unique proteins can also be extended to protein complexes and will significantly increase data search specificity, efficiency and accuracy for protein identifications.
Cell density signal protein suitable for treatment of connective tissue injuries and defects
Schwarz, Richard I.
2002-08-13
Identification, isolation and partial sequencing of a cell density protein produced by fibroblastic cells. The cell density signal protein comprising a 14 amino acid peptide or a fragment, variant, mutant or analog thereof, the deduced cDNA sequence from the 14 amino acid peptide, a recombinant protein, protein and peptide-specific antibodies, and the use of the peptide and peptide-specific antibodies as therapeutic agents for regulation of cell differentiation and proliferation. A method for treatment and repair of connective tissue and tendon injuries, collagen deficiency, and connective tissue defects.
Bonham, Andrew J.; Wenta, Nikola; Osslund, Leah M.; Prussin, Aaron J.; Vinkemeier, Uwe; Reich, Norbert O.
2013-01-01
The DNA-binding specificity and affinity of the dimeric human transcription factor (TF) STAT1, were assessed by total internal reflectance fluorescence protein-binding microarrays (TIRF-PBM) to evaluate the effects of protein phosphorylation, higher-order polymerization and small-molecule inhibition. Active, phosphorylated STAT1 showed binding preferences consistent with prior characterization, whereas unphosphorylated STAT1 showed a weak-binding preference for one-half of the GAS consensus site, consistent with recent models of STAT1 structure and function in response to phosphorylation. This altered-binding preference was further tested by use of the inhibitor LLL3, which we show to disrupt STAT1 binding in a sequence-dependent fashion. To determine if this sequence-dependence is specific to STAT1 and not a general feature of human TF biology, the TF Myc/Max was analysed and tested with the inhibitor Mycro3. Myc/Max inhibition by Mycro3 is sequence independent, suggesting that the sequence-dependent inhibition of STAT1 may be specific to this system and a useful target for future inhibitor design. PMID:23180800
Luzio, J P; Brake, B; Banting, G; Howell, K E; Braghetta, P; Stanley, K K
1990-01-01
Organelle-specific integral membrane proteins were identified by a novel strategy which gives rise to monospecific antibodies to these proteins as well as to the cDNA clones encoding them. A cDNA expression library was screened with a polyclonal antiserum raised against Triton X-114-extracted organelle proteins and clones were then grouped using antibodies affinity-purified on individual fusion proteins. The identification, molecular cloning and sequencing are described of a type 1 membrane protein (TGN38) which is located specifically in the trans-Golgi network. Images Fig. 1. Fig. 3. PMID:2204342
Wolf, Maxim Y; Wolf, Yuri I; Koonin, Eugene V
2008-01-01
Background Proteins show a broad range of evolutionary rates. Understanding the factors that are responsible for the characteristic rate of evolution of a given protein arguably is one of the major goals of evolutionary biology. A long-standing general assumption used to be that the evolution rate is, primarily, determined by the specific functional constraints that affect the given protein. These constrains were traditionally thought to depend both on the specific features of the protein's structure and its biological role. The advent of systems biology brought about new types of data, such as expression level and protein-protein interactions, and unexpectedly, a variety of correlations between protein evolution rate and these variables have been observed. The strongest connections by far were repeatedly seen between protein sequence evolution rate and the expression level of the respective gene. It has been hypothesized that this link is due to the selection for the robustness of the protein structure to mistranslation-induced misfolding that is particularly important for highly expressed proteins and is the dominant determinant of the sequence evolution rate. Results This work is an attempt to assess the relative contributions of protein domain structure and function, on the one hand, and expression level on the other hand, to the rate of sequence evolution. To this end, we performed a genome-wide analysis of the effect of the fusion of a pair of domains in multidomain proteins on the difference in the domain-specific evolutionary rates. The mistranslation-induced misfolding hypothesis would predict that, within multidomain proteins, fused domains, on average, should evolve at substantially closer rates than the same domains in different proteins because, within a mutlidomain protein, all domains are translated at the same rate. We performed a comprehensive comparison of the evolutionary rates of mammalian and plant protein domains that are either joined in multidomain proteins or contained in distinct proteins. Substantial homogenization of evolutionary rates in multidomain proteins was, indeed, observed in both animals and plants, although highly significant differences between domain-specific rates remained. The contributions of the translation rate, as determined by the effect of the fusion of a pair of domains within a multidomain protein, and intrinsic, domain-specific structural-functional constraints appear to be comparable in magnitude. Conclusion Fusion of domains in a multidomain protein results in substantial homogenization of the domain-specific evolutionary rates but significant differences between domain-specific evolution rates remain. Thus, the rate of translation and intrinsic structural-functional constraints both exert sizable and comparable effects on sequence evolution. Reviewers This article was reviewed by Sergei Maslov, Dennis Vitkup, Claus Wilke (nominated by Orly Alter), and Allan Drummond (nominated by Joel Bader). For the full reviews, please go to the Reviewers' Reports section. PMID:18840284
Protein Interaction Profile Sequencing (PIP-seq).
Foley, Shawn W; Gregory, Brian D
2016-10-10
Every eukaryotic RNA transcript undergoes extensive post-transcriptional processing from the moment of transcription up through degradation. This regulation is performed by a distinct cohort of RNA-binding proteins which recognize their target transcript by both its primary sequence and secondary structure. Here, we describe protein interaction profile sequencing (PIP-seq), a technique that uses ribonuclease-based footprinting followed by high-throughput sequencing to globally assess both protein-bound RNA sequences and RNA secondary structure. PIP-seq utilizes single- and double-stranded RNA-specific nucleases in the absence of proteins to infer RNA secondary structure. These libraries are also compared to samples that undergo nuclease digestion in the presence of proteins in order to find enriched protein-bound sequences. Combined, these four libraries provide a comprehensive, transcriptome-wide view of RNA secondary structure and RNA protein interaction sites from a single experimental technique. © 2016 by John Wiley & Sons, Inc. Copyright © 2016 John Wiley & Sons, Inc.
Blanden, Melanie J; Suazo, Kiall F; Hildebrandt, Emily R; Hardgrove, Daniel S; Patel, Meet; Saunders, William P; Distefano, Mark D; Schmidt, Walter K; Hougland, James L
2018-02-23
Protein prenylation is a post-translational modification that has been most commonly associated with enabling protein trafficking to and interaction with cellular membranes. In this process, an isoprenoid group is attached to a cysteine near the C terminus of a substrate protein by protein farnesyltransferase (FTase) or protein geranylgeranyltransferase type I or II (GGTase-I and GGTase-II). FTase and GGTase-I have long been proposed to specifically recognize a four-amino acid C AAX C-terminal sequence within their substrates. Surprisingly, genetic screening reveals that yeast FTase can modify sequences longer than the canonical C AAX sequence, specifically C( x ) 3 X sequences with four amino acids downstream of the cysteine. Biochemical and cell-based studies using both peptide and protein substrates reveal that mammalian FTase orthologs can also prenylate C( x ) 3 X sequences. As the search to identify physiologically relevant C( x ) 3 X proteins begins, this new prenylation motif nearly doubles the number of proteins within the yeast and human proteomes that can be explored as potential FTase substrates. This work expands our understanding of prenylation's impact within the proteome, establishes the biologically relevant reactivity possible with this new motif, and opens new frontiers in determining the impact of non-canonically prenylated proteins on cell function. © 2018 by The American Society for Biochemistry and Molecular Biology, Inc.
Bandeira, Nuno; Clauser, Karl R; Pevzner, Pavel A
2007-07-01
Despite significant advances in the identification of known proteins, the analysis of unknown proteins by MS/MS still remains a challenging open problem. Although Klaus Biemann recognized the potential of MS/MS for sequencing of unknown proteins in the 1980s, low throughput Edman degradation followed by cloning still remains the main method to sequence unknown proteins. The automated interpretation of MS/MS spectra has been limited by a focus on individual spectra and has not capitalized on the information contained in spectra of overlapping peptides. Indeed the powerful shotgun DNA sequencing strategies have not been extended to automated protein sequencing. We demonstrate, for the first time, the feasibility of automated shotgun protein sequencing of protein mixtures by utilizing MS/MS spectra of overlapping and possibly modified peptides generated via multiple proteases of different specificities. We validate this approach by generating highly accurate de novo reconstructions of multiple regions of various proteins in western diamondback rattlesnake venom. We further argue that shotgun protein sequencing has the potential to overcome the limitations of current protein sequencing approaches and thus catalyze the otherwise impractical applications of proteomics methodologies in studies of unknown proteins.
DNA-binding proteins from marine bacteria expand the known sequence diversity of TALE-like repeats
de Lange, Orlando; Wolf, Christina; Thiel, Philipp; Krüger, Jens; Kleusch, Christian; Kohlbacher, Oliver; Lahaye, Thomas
2015-01-01
Transcription Activator-Like Effectors (TALEs) of Xanthomonas bacteria are programmable DNA binding proteins with unprecedented target specificity. Comparative studies into TALE repeat structure and function are hindered by the limited sequence variation among TALE repeats. More sequence-diverse TALE-like proteins are known from Ralstonia solanacearum (RipTALs) and Burkholderia rhizoxinica (Bats), but RipTAL and Bat repeats are conserved with those of TALEs around the DNA-binding residue. We study two novel marine-organism TALE-like proteins (MOrTL1 and MOrTL2), the first to date of non-terrestrial origin. We have assessed their DNA-binding properties and modelled repeat structures. We found that repeats from these proteins mediate sequence specific DNA binding conforming to the TALE code, despite low sequence similarity to TALE repeats, and with novel residues around the BSR. However, MOrTL1 repeats show greater sequence discriminating power than MOrTL2 repeats. Sequence alignments show that there are only three residues conserved between repeats of all TALE-like proteins including the two new additions. This conserved motif could prove useful as an identifier for future TALE-likes. Additionally, comparing MOrTL repeats with those of other TALE-likes suggests a common evolutionary origin for the TALEs, RipTALs and Bats. PMID:26481363
The C-Terminal Sequence of RhoB Directs Protein Degradation through an Endo-Lysosomal Pathway
Ramos, Irene; Herrera, Mónica; Stamatakis, Konstantinos
2009-01-01
Background Protein degradation is essential for cell homeostasis. Targeting of proteins for degradation is often achieved by specific protein sequences or posttranslational modifications such as ubiquitination. Methodology/Principal Findings By using biochemical and genetic tools we have monitored the localization and degradation of endogenous and chimeric proteins in live primary cells by confocal microscopy and ultra-structural analysis. Here we identify an eight amino acid sequence from the C-terminus of the short-lived GTPase RhoB that directs the rapid degradation of both RhoB and chimeric proteins bearing this sequence through a lysosomal pathway. Elucidation of the RhoB degradation pathway unveils a mechanism dependent on protein isoprenylation and palmitoylation that involves sorting of the protein into multivesicular bodies, mediated by the ESCRT machinery. Moreover, RhoB sorting is regulated by late endosome specific lipid dynamics and is altered in human genetic lipid traffic disease. Conclusions/Significance Our findings characterize a short-lived cytosolic protein that is degraded through a lysosomal pathway. In addition, we define a novel motif for protein sorting and rapid degradation, which allows controlling protein levels by means of clinically used drugs. PMID:19956591
MIPS: a database for genomes and protein sequences
Mewes, H. W.; Frishman, D.; Güldener, U.; Mannhaupt, G.; Mayer, K.; Mokrejs, M.; Morgenstern, B.; Münsterkötter, M.; Rudd, S.; Weil, B.
2002-01-01
The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz–Netzwerk Bioinformatik) networks. The Arabidospsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91–93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155–158; Barker et al. (2001) Nucleic Acids Res., 29, 29–32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de). PMID:11752246
MIPS: a database for genomes and protein sequences.
Mewes, H W; Frishman, D; Güldener, U; Mannhaupt, G; Mayer, K; Mokrejs, M; Morgenstern, B; Münsterkötter, M; Rudd, S; Weil, B
2002-01-01
The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz-Netzwerk Bioinformatik) networks. The Arabidospsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91-93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155-158; Barker et al. (2001) Nucleic Acids Res., 29, 29-32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de).
McDermott, Jason E.; Bruillard, Paul; Overall, Christopher C.; ...
2015-03-09
There are many examples of groups of proteins that have similar function, but the determinants of functional specificity may be hidden by lack of sequencesimilarity, or by large groups of similar sequences with different functions. Transporters are one such protein group in that the general function, transport, can be easily inferred from the sequence, but the substrate specificity can be impossible to predict from sequence with current methods. In this paper we describe a linguistic-based approach to identify functional patterns from groups of unaligned protein sequences and its application to predict multi-drug resistance transporters (MDRs) from bacteria. We first showmore » that our method can recreate known patterns from PROSITE for several motifs from unaligned sequences. We then show that the method, MDRpred, can predict MDRs with greater accuracy and positive predictive value than a collection of currently available family-based models from the Pfam database. Finally, we apply MDRpred to a large collection of protein sequences from an environmental microbiome study to make novel predictions about drug resistance in a potential environmental reservoir.« less
GABARAPL1 antibodies: target one protein, get one free!
Le Grand, Jaclyn Nicole; Chakrama, Fatima Zahra; Seguin-Py, Stéphanie; Fraichard, Annick; Delage-Mourroux, Régis; Jouvenot, Michèle; Risold, Pierre-Yves; Boyer-Guittaut, Michaël
2011-11-01
Atg8 is a yeast protein involved in the autophagic process and in particular in the elongation of autophagosomes. In mammals, several orthologs have been identified and are classed into two subfamilies: the LC3 subfamily and the GABARAP subfamily, referred to simply as the LC3 or GABARAP families. GABARAPL1 (GABARAP-like protein 1), one of the proteins belonging to the GABARAP (GABA(A) receptor-associated protein) family, is highly expressed in the central nervous system and implicated in processes such as receptor and vesicle transport as well as autophagy. The proteins that make up the GABARAP family demonstrate conservation of their amino acid sequences and protein structures. In humans, GABARAPL1 shares 86% identity with GABARAP and 61% with GABARAPL2 (GATE-16). The identification of the individual proteins is thus very limited when working in vivo due to a lack of unique peptide sequences from which specific antibodies can be developed. Actually, and to our knowledge, there are no available antibodies on the market that are entirely specific to GABARAPL1 and the same may be true of the anti-GABARAP antibodies. In this study, we sought to examine the specificity of three antibodies targeted against different peptide sequences within GABARAPL1: CHEM-CENT (an antibody raised against a short peptide sequence within the center of the protein), PTG-NTER (an antibody raised against the N-terminus of the protein) and PTG-FL (an antibody raised against the full-length protein). The results described in this article demonstrate the importance of testing antibody specificity under the conditions for which it will be used experimentally, a caution that should be taken when studying the expression of the GABARAP family proteins.
Zhao, A; Guo, A; Liu, Z; Pape, L
1997-01-01
The coding sequences for a Schizosaccharomyces pombe sequence-specific DNA binding protein, Reb1p, have been cloned. The predicted S. pombe Reb1p is 24-29% identical to mouse TTF-1 (transcription termination factor-1) and Saccharomyces cerevisiae REB1 protein, both of which direct termination of RNA polymerase I catalyzed transcripts. The S.pombe Reb1 cDNA encodes a predicted polypeptide of 504 amino acids with a predicted molecular weight of 58.4 kDa. The S. pombe Reb1p is unusual in that the bipartite DNA binding motif identified originally in S.cerevisiae and Klyveromyces lactis REB1 proteins is uninterrupted and thus S.pombe Reb1p may contain the smallest natural REB1 homologous DNA binding domain. Its genomic coding sequences were shown to be interrupted by two introns. A recombinant histidine-tagged Reb1 protein bearing the rDNA binding domain has two homologous, sequence-specific binding sites in the S. pomber DNA intergenic spacer, located between 289 and 480 nt downstream of the end of the approximately 25S rRNA coding sequences. Each binding site is 13-14 bp downstream of two of the three proposed in vivo termination sites. The core of this 17 bp site, AGGTAAGGGTAATGCAC, is specifically protected by Reb1p in footprinting analysis. PMID:9016645
NASA Technical Reports Server (NTRS)
Gatlin, L. L.
1974-01-01
Concepts of information theory are applied to examine various proteins in terms of their redundancy in natural originators such as animals and plants. The Monte Carlo method is used to derive information parameters for random protein sequences. Real protein sequence parameters are compared with the standard parameters of protein sequences having a specific length. The tendency of a chain to contain some amino acids more frequently than others and the tendency of a chain to contain certain amino acid pairs more frequently than other pairs are used as randomness measures of individual protein sequences. Non-periodic proteins are generally found to have random Shannon redundancies except in cases of constraints due to short chain length and genetic codes. Redundant characteristics of highly periodic proteins are discussed. A degree of periodicity parameter is derived.
Ollikainen, Noah; de Jong, René M; Kortemme, Tanja
2015-01-01
Interactions between small molecules and proteins play critical roles in regulating and facilitating diverse biological functions, yet our ability to accurately re-engineer the specificity of these interactions using computational approaches has been limited. One main difficulty, in addition to inaccuracies in energy functions, is the exquisite sensitivity of protein-ligand interactions to subtle conformational changes, coupled with the computational problem of sampling the large conformational search space of degrees of freedom of ligands, amino acid side chains, and the protein backbone. Here, we describe two benchmarks for evaluating the accuracy of computational approaches for re-engineering protein-ligand interactions: (i) prediction of enzyme specificity altering mutations and (ii) prediction of sequence tolerance in ligand binding sites. After finding that current state-of-the-art "fixed backbone" design methods perform poorly on these tests, we develop a new "coupled moves" design method in the program Rosetta that couples changes to protein sequence with alterations in both protein side-chain and protein backbone conformations, and allows for changes in ligand rigid-body and torsion degrees of freedom. We show significantly increased accuracy in both predicting ligand specificity altering mutations and binding site sequences. These methodological improvements should be useful for many applications of protein-ligand design. The approach also provides insights into the role of subtle conformational adjustments that enable functional changes not only in engineering applications but also in natural protein evolution.
Structure-related statistical singularities along protein sequences: a correlation study.
Colafranceschi, Mauro; Colosimo, Alfredo; Zbilut, Joseph P; Uversky, Vladimir N; Giuliani, Alessandro
2005-01-01
A data set composed of 1141 proteins representative of all eukaryotic protein sequences in the Swiss-Prot Protein Knowledge base was coded by seven physicochemical properties of amino acid residues. The resulting numerical profiles were submitted to correlation analysis after the application of a linear (simple mean) and a nonlinear (Recurrence Quantification Analysis, RQA) filter. The main RQA variables, Recurrence and Determinism, were subsequently analyzed by Principal Component Analysis. The RQA descriptors showed that (i) within protein sequences is embedded specific information neither present in the codes nor in the amino acid composition and (ii) the most sensitive code for detecting ordered recurrent (deterministic) patterns of residues in protein sequences is the Miyazawa-Jernigan hydrophobicity scale. The most deterministic proteins in terms of autocorrelation properties of primary structures were found (i) to be involved in protein-protein and protein-DNA interactions and (ii) to display a significantly higher proportion of structural disorder with respect to the average data set. A study of the scaling behavior of the average determinism with the setting parameters of RQA (embedding dimension and radius) allows for the identification of patterns of minimal length (six residues) as possible markers of zones specifically prone to inter- and intramolecular interactions.
A Versatile Platform for Nanotechnology Based on Circular Permutation of a Chaperonin Protein
NASA Technical Reports Server (NTRS)
Paavola, Chad; McMillan, Andrew; Trent, Jonathan; Chan, Suzanne; Mazzarella, Kellen; Li, Yi-Fen
2004-01-01
A number of protein complexes have been developed as nanoscale templates. These templates can be functionalized using the peptide sequences that bind inorganic materials. However, it is difficult to integrate peptides into a specific position within a protein template. Integrating intact proteins with desirable binding or catalytic activities is an even greater challenge. We present a general method for modifying protein templates using circular permutation so that additional peptide sequence can be added in a wide variety of specific locations. Circular permutation is a reordering of the polypeptide chain such that the original termini are joined and new termini are created elsewhere in the protein. New sequence can be joined to the protein termini without perturbing the protein structure and with minimal limitation on the size and conformation of the added sequence. We have used this approach to modify a chaperonin protein template, placing termini at five different locations distributed across the surface of the protein complex. These permutants are competent to form the double-ring structures typical of chaperonin proteins. The permuted double-rings also form the same assemblies as the unmodified protein. We fused a fluorescent protein to two representative permutants and demonstrated that it assumes its active structure and does not interfere with assembly of chaperonin double-rings.
Gans, Jonathan; Osborne, Jonathan; Cheng, Juliet; Djapgne, Louise; Oglesby-Sherrouse, Amanda G
2018-01-01
Bacterial small RNA molecules (sRNAs) are increasingly recognized as central regulators of bacterial stress responses and pathogenesis. In many cases, RNA-binding proteins are critical for the stability and function of sRNAs. Previous studies have adopted strategies to genetically tag an sRNA of interest, allowing isolation of RNA-protein complexes from cells. Here we present a sequence-specific affinity purification protocol that requires no prior genetic manipulation of bacterial cells, allowing isolation of RNA-binding proteins bound to native RNA molecules.
McKinney, Nancy
2002-01-01
PCR (polymerase chain reaction) primers for the detection of certain Bacillus species, such as Bacillus anthracis. The primers specifically amplify only DNA found in the target species and can distinguish closely related species. Species-specific PCR primers for Bacillus anthracis, Bacillus globigii and Clostridium perfringens are disclosed. The primers are directed to unique sequences within sasp (small acid soluble protein) genes.
Strategies to Improve Efficiency and Specificity of Degenerate Primers in PCR.
Campos, Maria Jorge; Quesada, Alberto
2017-01-01
PCR with degenerate primers can be used to identify the coding sequence of an unknown protein or to detect a genetic variant within a gene family. These primers, which are complex mixtures of slightly different oligonucleotide sequences, can be optimized to increase the efficiency and/or specificity of PCR in the amplification of a sequence of interest by the introduction of mismatches with the target sequence and balancing their position toward the primers 5'- or 3'-ends. In this work, we explain in detail examples of rational design of primers in two different applications, including the use of specific determinants at the 3'-end, to: (1) improve PCR efficiency with coding sequences for members of a protein family by fully degeneration at a core box of conserved genetic information, with the reduction of degeneration at the 5'-end, and (2) optimize specificity of allelic discrimination of closely related orthologous by 5'-end degenerate primers.
The 87-kD A gamma-globin enhancer-binding protein is a product of the HOXB2(HOX2H) locus.
Sengupta, P K; Lavelle, D E; DeSimone, J
1994-03-01
Developmental regulation of globin gene expression may be controlled by developmental stage-specific nuclear proteins that influence interactions between the locus control region and local regulatory sequences near individual globin genes. We previously isolated an 87-kD nuclear protein from K562 cells that bound to DNA sequences in the beta-globin locus control region, gamma-globin promoter, and A gamma-globin enhancer. The presence of this protein in fetal globin-expressing cells and its absence in adult globin-expressing cells suggested that it may be a developmental stage-specific factor. A lambda gt11 K562 cDNA clone encoding a portion of the HOXB2 (formerly HOX2H) homeobox gene was isolated on the basis of the ability of its beta-galactosidase fusion protein to bind to the same DNA sequences as the 87-kD K562 protein. Because no other relationship had been established between the 87-kD K562 protein and the HOXB2 protein other than their ability to bind ot the same DNA sequences, we have investigated whether the two proteins are related antigenically. Our data show that antisera produced against the HOXB2-beta-gal fusion protein and a synthetic HOXB2 decapeptide react specifically with an 87-kD protein from K562 nuclear extract, showing that the 87-kD K562 nuclear protein is a product of the HOXB2 locus, and is the first demonstration of cellular HOXB2 protein.
Complete cDNA sequence and amino acid analysis of a bovine ribonuclease K6 gene.
Pietrowski, D; Förster, M
2000-01-01
The complete cDNA sequence of a ribonuclease k6 gene of Bos Taurus has been determined. It codes for a protein with 154 amino acids and contains the invariant cysteine, histidine and lysine residues as well as the characteristic motifs specific to ribonuclease active sites. The deduced protein sequence is 27 residues longer than other known ribonucleases k6 and shows amino acids exchanges which could reflect a strain specificity or polymorphism within the bovine genome. Based on sequence similarity we have termed the identified gene bovine ribonuclease k6 b (brk6b).
SIBIS: a Bayesian model for inconsistent protein sequence estimation.
Khenoussi, Walyd; Vanhoutrève, Renaud; Poch, Olivier; Thompson, Julie D
2014-09-01
The prediction of protein coding genes is a major challenge that depends on the quality of genome sequencing, the accuracy of the model used to elucidate the exonic structure of the genes and the complexity of the gene splicing process leading to different protein variants. As a consequence, today's protein databases contain a huge amount of inconsistency, due to both natural variants and sequence prediction errors. We have developed a new method, called SIBIS, to detect such inconsistencies based on the evolutionary information in multiple sequence alignments. A Bayesian framework, combined with Dirichlet mixture models, is used to estimate the probability of observing specific amino acids and to detect inconsistent or erroneous sequence segments. We evaluated the performance of SIBIS on a reference set of protein sequences with experimentally validated errors and showed that the sensitivity is significantly higher than previous methods, with only a small loss of specificity. We also assessed a large set of human sequences from the UniProt database and found evidence of inconsistency in 48% of the previously uncharacterized sequences. We conclude that the integration of quality control methods like SIBIS in automatic analysis pipelines will be critical for the robust inference of structural, functional and phylogenetic information from these sequences. Source code, implemented in C on a linux system, and the datasets of protein sequences are freely available for download at http://www.lbgi.fr/∼julie/SIBIS. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
DNA-binding proteins from marine bacteria expand the known sequence diversity of TALE-like repeats.
de Lange, Orlando; Wolf, Christina; Thiel, Philipp; Krüger, Jens; Kleusch, Christian; Kohlbacher, Oliver; Lahaye, Thomas
2015-11-16
Transcription Activator-Like Effectors (TALEs) of Xanthomonas bacteria are programmable DNA binding proteins with unprecedented target specificity. Comparative studies into TALE repeat structure and function are hindered by the limited sequence variation among TALE repeats. More sequence-diverse TALE-like proteins are known from Ralstonia solanacearum (RipTALs) and Burkholderia rhizoxinica (Bats), but RipTAL and Bat repeats are conserved with those of TALEs around the DNA-binding residue. We study two novel marine-organism TALE-like proteins (MOrTL1 and MOrTL2), the first to date of non-terrestrial origin. We have assessed their DNA-binding properties and modelled repeat structures. We found that repeats from these proteins mediate sequence specific DNA binding conforming to the TALE code, despite low sequence similarity to TALE repeats, and with novel residues around the BSR. However, MOrTL1 repeats show greater sequence discriminating power than MOrTL2 repeats. Sequence alignments show that there are only three residues conserved between repeats of all TALE-like proteins including the two new additions. This conserved motif could prove useful as an identifier for future TALE-likes. Additionally, comparing MOrTL repeats with those of other TALE-likes suggests a common evolutionary origin for the TALEs, RipTALs and Bats. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Specification of anteroposterior cell fates in Caenorhabditis elegans by Drosophila Hox proteins.
Hunter, C P; Kenyon, C
1995-09-21
Antennapedia class homeobox (Hox) genes specify cell fates in successive anteroposterior body domains in vertebrates, insects and nematodes. The DNA-binding homeodomain sequences are very similar between vertebrate and Drosophila Hox proteins, and this similarity allows vertebrate Hox proteins to function in Drosophila. In contrast, the Caenorhabditis elegans homeodomains are substantially divergent. Further, C. elegans differs from both insects and vertebrates in having a non-segmented body as well as a distinctive mode of development that involves asymmetric early cleavages and invariant cell lineages. Here we report that, despite these differences, Drosophila Hox proteins expressed in C. elegans can substitute for C. elegans Hox proteins in the control of three different cell-fate decisions: the regulation of cell migration, the specification of serotonergic neurons, and the specification of a sensory structure. We also show that the specificity of one C. elegans Hox protein is partly determined by two amino acids that have been implicated in sequence-specific DNA binding. Together these findings suggest that factors important for target recognition by specific Hox proteins have been conserved throughout much of the animal kingdom.
Arnold, Roland; Goldenberg, Florian; Mewes, Hans-Werner; Rattei, Thomas
2014-01-01
The Similarity Matrix of Proteins (SIMAP, http://mips.gsf.de/simap/) database has been designed to massively accelerate computationally expensive protein sequence analysis tasks in bioinformatics. It provides pre-calculated sequence similarities interconnecting the entire known protein sequence universe, complemented by pre-calculated protein features and domains, similarity clusters and functional annotations. SIMAP covers all major public protein databases as well as many consistently re-annotated metagenomes from different repositories. As of September 2013, SIMAP contains >163 million proteins corresponding to ∼70 million non-redundant sequences. SIMAP uses the sensitive FASTA search heuristics, the Smith–Waterman alignment algorithm, the InterPro database of protein domain models and the BLAST2GO functional annotation algorithm. SIMAP assists biologists by facilitating the interactive exploration of the protein sequence universe. Web-Service and DAS interfaces allow connecting SIMAP with any other bioinformatic tool and resource. All-against-all protein sequence similarity matrices of project-specific protein collections are generated on request. Recent improvements allow SIMAP to cover the rapidly growing sequenced protein sequence universe. New Web-Service interfaces enhance the connectivity of SIMAP. Novel tools for interactive extraction of protein similarity networks have been added. Open access to SIMAP is provided through the web portal; the portal also contains instructions and links for software access and flat file downloads. PMID:24165881
A Fast Alignment-Free Approach for De Novo Detection of Protein Conserved Regions
Abnousi, Armen; Broschat, Shira L.; Kalyanaraman, Ananth
2016-01-01
Background Identifying conserved regions in protein sequences is a fundamental operation, occurring in numerous sequence-driven analysis pipelines. It is used as a way to decode domain-rich regions within proteins, to compute protein clusters, to annotate sequence function, and to compute evolutionary relationships among protein sequences. A number of approaches exist for identifying and characterizing protein families based on their domains, and because domains represent conserved portions of a protein sequence, the primary computation involved in protein family characterization is identification of such conserved regions. However, identifying conserved regions from large collections (millions) of protein sequences presents significant challenges. Methods In this paper we present a new, alignment-free method for detecting conserved regions in protein sequences called NADDA (No-Alignment Domain Detection Algorithm). Our method exploits the abundance of exact matching short subsequences (k-mers) to quickly detect conserved regions, and the power of machine learning is used to improve the prediction accuracy of detection. We present a parallel implementation of NADDA using the MapReduce framework and show that our method is highly scalable. Results We have compared NADDA with Pfam and InterPro databases. For known domains annotated by Pfam, accuracy is 83%, sensitivity 96%, and specificity 44%. For sequences with new domains not present in the training set an average accuracy of 63% is achieved when compared to Pfam. A boost in results in comparison with InterPro demonstrates the ability of NADDA to capture conserved regions beyond those present in Pfam. We have also compared NADDA with ADDA and MKDOM2, assuming Pfam as ground-truth. On average NADDA shows comparable accuracy, more balanced sensitivity and specificity, and being alignment-free, is significantly faster. Excluding the one-time cost of training, runtimes on a single processor were 49s, 10,566s, and 456s for NADDA, ADDA, and MKDOM2, respectively, for a data set comprised of approximately 2500 sequences. PMID:27552220
Luo, Si-Wei; Liang, Zhi; Wu, Jia-Rui
2017-01-01
Quantitatively detecting correlations of multiple protein-protein interactions (PPIs) in vivo is a big challenge. Here we introduce a novel method, termed Protein-interactome Footprinting (PiF), to simultaneously measure multiple PPIs in one cell. The principle of PiF is that each target physical PPI in the interactome is simultaneously transcoded into a specific DNA sequence based on dimerization of the target proteins fused with DNA-binding domains. The interaction intensity of each target protein is quantified as the copy number of the specific DNA sequences bound by each fusion protein dimers. Using PiF, we quantitatively reveal dynamic patterns of PPIs and their correlation network in E. coli two-component systems. PMID:28338015
Bhat, Abhay Prasad; Shin, Minsang; Choy, Hyon E
2014-07-01
Histone-like nucleoid structuring protein (H-NS) is a small but abundant protein present in enteric bacteria and is involved in compaction of the DNA and regulation of the transcription. Recent reports have suggested that H-NS binds to a specific AT rich DNA sequence than to intrinsically curved DNA in sequence independent manner. We detected two high-specificity H-NS binding sites in LEE5 promoter of EPEC centered at -110 and -138, which were close to the proposed consensus H-NS binding motif. To identify H-NS binding sequence in LEE5 promoter, we took a random mutagenesis approach and found the mutations at around -138 were specifically defective in the regulation by H-NS. It was concluded that H-NS exerts maximum repression via the specific sequence at around -138 and subsequently contacts a subunit of RNAP through oligomerization.
de Lange, Orlando; Wolf, Christina; Dietze, Jörn; Elsaesser, Janett; Morbitzer, Robert; Lahaye, Thomas
2014-01-01
The tandem repeats of transcription activator like effectors (TALEs) mediate sequence-specific DNA binding using a simple code. Naturally, TALEs are injected by Xanthomonas bacteria into plant cells to manipulate the host transcriptome. In the laboratory TALE DNA binding domains are reprogrammed and used to target a fused functional domain to a genomic locus of choice. Research into the natural diversity of TALE-like proteins may provide resources for the further improvement of current TALE technology. Here we describe TALE-like proteins from the endosymbiotic bacterium Burkholderia rhizoxinica, termed Bat proteins. Bat repeat domains mediate sequence-specific DNA binding with the same code as TALEs, despite less than 40% sequence identity. We show that Bat proteins can be adapted for use as transcription factors and nucleases and that sequence preferences can be reprogrammed. Unlike TALEs, the core repeats of each Bat protein are highly polymorphic. This feature allowed us to explore alternative strategies for the design of custom Bat repeat arrays, providing novel insights into the functional relevance of non-RVD residues. The Bat proteins offer fertile grounds for research into the creation of improved programmable DNA-binding proteins and comparative insights into TALE-like evolution. PMID:24792163
Clark, A M; Jacobsen, K R; Bostwick, D E; Dannenhoffer, J M; Skaggs, M I; Thompson, G A
1997-07-01
Sieve elements in the phloem of most angiosperms contain proteinaceous filaments and aggregates called P-protein. In the genus Cucurbita, these filaments are composed of two major proteins: PP1, the phloem filament protein, and PP2, the phloem lactin. The gene encoding the phloem filament protein in pumpkin (Cucurbita maxima Duch.) has been isolated and characterized. Nucleotide sequence analysis of the reconstructed gene gPP1 revealed a continuous 2430 bp protein coding sequence, with no introns, encoding an 809 amino acid polypeptide. The deduced polypeptide had characteristics of PP1 and contained a 15 amino acid sequence determined by N-terminal peptide sequence analysis of PP1. The sequence of PP1 was highly repetitive with four 200 amino acid sequence domains containing structural motifs in common with cysteine proteinase inhibitors. Expression of the PP1 gene was detected in roots, hypocotyls, cotyledons, stems, and leaves of pumpkin plants. PP1 and its mRNA accumulated in pumpkin hypocotyls during the period of rapid hypocotyl elongation after which mRNA levels declined, while protein levels remained elevated. PP1 was immunolocalized in slime plugs and P-protein bodies in sieve elements of the phloem. Occasionally, PP1 was detected in companion cells. PP1 mRNA was localized by in situ hybridization in companion cells at early stages of vascular differentiation. The developmental accumulation and localization of PP1 and its mRNA paralleled the phloem lactin, further suggesting an interaction between these phloem-specific proteins.
Lu, Stephen M.; Lu, Wuyuan; Qasim, M. A.; Anderson, Stephen; Apostol, Izydor; Ardelt, Wojciech; Bigler, Theresa; Chiang, Yi Wen; Cook, James; James, Michael N. G.; Kato, Ikunoshin; Kelly, Clyde; Kohr, William; Komiyama, Tomoko; Lin, Tiao-Yin; Ogawa, Michio; Otlewski, Jacek; Park, Soon-Jae; Qasim, Sabiha; Ranjbar, Michael; Tashiro, Misao; Warne, Nicholas; Whatley, Harry; Wieczorek, Anna; Wieczorek, Maciej; Wilusz, Tadeusz; Wynn, Richard; Zhang, Wenlei; Laskowski, Michael
2001-01-01
An additivity-based sequence to reactivity algorithm for the interaction of members of the Kazal family of protein inhibitors with six selected serine proteinases is described. Ten consensus variable contact positions in the inhibitor were identified, and the 19 possible variants at each of these positions were expressed. The free energies of interaction of these variants and the wild type were measured. For an additive system, this data set allows for the calculation of all possible sequences, subject to some restrictions. The algorithm was extensively tested. It is exceptionally fast so that all possible sequences can be predicted. The strongest, the most specific possible, and the least specific inhibitors were designed, and an evolutionary problem was solved. PMID:11171964
Principles of protein folding--a perspective from simple exact models.
Dill, K. A.; Bromberg, S.; Yue, K.; Fiebig, K. M.; Yee, D. P.; Thomas, P. D.; Chan, H. S.
1995-01-01
General principles of protein structure, stability, and folding kinetics have recently been explored in computer simulations of simple exact lattice models. These models represent protein chains at a rudimentary level, but they involve few parameters, approximations, or implicit biases, and they allow complete explorations of conformational and sequence spaces. Such simulations have resulted in testable predictions that are sometimes unanticipated: The folding code is mainly binary and delocalized throughout the amino acid sequence. The secondary and tertiary structures of a protein are specified mainly by the sequence of polar and nonpolar monomers. More specific interactions may refine the structure, rather than dominate the folding code. Simple exact models can account for the properties that characterize protein folding: two-state cooperativity, secondary and tertiary structures, and multistage folding kinetics--fast hydrophobic collapse followed by slower annealing. These studies suggest the possibility of creating "foldable" chain molecules other than proteins. The encoding of a unique compact chain conformation may not require amino acids; it may require only the ability to synthesize specific monomer sequences in which at least one monomer type is solvent-averse. PMID:7613459
Majumder, P; Choudhury, A; Banerjee, M; Lahiri, A; Bhattacharyya, N P
2007-08-01
To investigate the mechanism of increased expression of caspase-1 caused by exogenous Hippi, observed earlier in HeLa and Neuro2A cells, in this work we identified a specific motif AAAGACATG (- 101 to - 93) at the caspase-1 gene upstream sequence where HIPPI could bind. Various mutations in this specific sequence compromised the interaction, showing the specificity of the interactions. In the luciferase reporter assay, when the reporter gene was driven by caspase-1 gene upstream sequences (- 151 to - 92) with the mutation G to T at position - 98, luciferase activity was decreased significantly in green fluorescent protein-Hippi-expressing HeLa cells in comparison to that obtained with the wild-type caspase-1 gene 60 bp upstream sequence, indicating the biological significance of such binding. It was observed that the C-terminal 'pseudo' death effector domain of HIPPI interacted with the 60 bp (- 151 to - 92) upstream sequence of the caspase-1 gene containing the motif. We further observed that expression of caspase-8 and caspase-10 was increased in green fluorescent protein-Hippi-expressing HeLa cells. In addition, HIPPI interacted in vitro with putative promoter sequences of these genes, containing a similar motif. In summary, we identified a novel function of HIPPI; it binds to specific upstream sequences of the caspase-1, caspase-8 and caspase-10 genes and alters the expression of the genes. This result showed the motif-specific interaction of HIPPI with DNA, and indicates that it could act as transcription regulator.
Takai, T; Nishita, Y; Iguchi-Ariga, S M; Ariga, H
1994-01-01
We have previously reported the human cDNA encoding MSSP-1, a sequence-specific double- and single-stranded DNA binding protein [Negishi, Nishita, Saëgusa, Kakizaki, Galli, Kihara, Tamai, Miyajima, Iguchi-Ariga and Ariga (1994) Oncogene, 9, 1133-1143]. MSSP-1 binds to a DNA replication origin/transcriptional enhancer of the human c-myc gene and has turned out to be identical with Scr2, a human protein which complements the defect of cdc2 kinase in S.pombe [Kataoka and Nojima (1994) Nucleic Acid Res., 22, 2687-2693]. We have cloned the cDNA for MSSP-2, another member of the MSSP family of proteins. The MSSP-2 cDNA shares highly homologous sequences with MSSP-1 cDNA, except for the insertion of 48 bp coding 16 amino acids near the C-terminus. Like MSSP-1, MSSP-2 has RNP-1 consensus sequences. The results of the experiments using bacterially expressed MSSP-2, and its deletion mutants, as histidine fusion proteins suggested that the binding specificity of MSSP-2 to double- and single-stranded DNA is the same as that of MSSP-1, and that the RNP consensus sequences are required for the DNA binding of the protein. MSSP-2 stimulated the DNA replication of an SV40-derived plasmid containing the binding sequence for MSSP-1 or -2. MSSP-2 is hence suggested to play an important role in regulation of DNA replication. Images PMID:7838710
Toward rules relating zinc finger protein sequences and DNA binding site preferences.
Desjarlais, J R; Berg, J M
1992-08-15
Zinc finger proteins of the Cys2-His2 type consist of tandem arrays of domains, where each domain appears to contact three adjacent base pairs of DNA through three key residues. We have designed and prepared a series of variants of the central zinc finger within the DNA binding domain of Sp1 by using information from an analysis of a large data base of zinc finger protein sequences. Through systematic variations at two of the three contact positions (underlined), relatively specific recognition of sequences of the form 5'-GGGGN(G or T)GGG-3' has been achieved. These results provide the basis for rules that may develop into a code that will allow the design of zinc finger proteins with preselected DNA site specificity.
Takeshita, S; Kikuno, R; Tezuka, K; Amann, E
1993-01-01
A cDNA library prepared from the mouse osteoblastic cell line MC3T3-E1 was screened for the presence of specifically expressed genes by employing a combined subtraction hybridization/differential screening approach. A cDNA was identified and sequenced which encodes a protein designated osteoblast-specific factor 2 (OSF-2) comprising 811 amino acids. OSF-2 has a typical signal sequence, followed by a cysteine-rich domain, a fourfold repeated domain and a C-terminal domain. The protein lacks a typical transmembrane region. The fourfold repeated domain of OSF-2 shows homology with the insect protein fasciclin I. RNA analyses revealed that OSF-2 is expressed in bone and to a lesser extent in lung, but not in other tissues. Mouse OSF-2 cDNA was subsequently used as a probe to clone the human counterpart. Mouse and human OSF-2 show a high amino acid sequence conservation except for the signal sequence and two regions in the C-terminal domain in which 'in-frame' insertions or deletions are observed, implying alternative splicing events. On the basis of the amino acid sequence homology with fasciclin I, we suggest that OSF-2 functions as a homophilic adhesion molecule in bone formation. Images Figure 3 Figure 4 Figure 5 Figure 6 PMID:8363580
Kemme, Catherine A; Esadze, Alexandre; Iwahara, Junji
2015-11-10
Functions of transcription factors require formation of specific complexes at particular sites in cis-regulatory elements of genes. However, chromosomal DNA contains numerous sites that are similar to the target sequences recognized by transcription factors. The influence of such "quasi-specific" sites on functions of the transcription factors is not well understood at present by experimental means. In this work, using fluorescence methods, we have investigated the influence of quasi-specific DNA sites on the efficiency of target location by the zinc finger DNA-binding domain of the inducible transcription factor Egr-1, which recognizes a 9 bp sequence. By stopped-flow assays, we measured the kinetics of Egr-1's association with a target site on 143 bp DNA in the presence of various competitor DNAs, including nonspecific and quasi-specific sites. The presence of quasi-specific sites on competitor DNA significantly decelerated the target association by the Egr-1 protein. The impact of the quasi-specific sites depended strongly on their affinity, their concentration, and the degree of their binding to the protein. To quantitatively describe the kinetic impact of the quasi-specific sites, we derived an analytical form of the apparent kinetic rate constant for the target association and used it for fitting to the experimental data. Our kinetic data with calf thymus DNA as a competitor suggested that there are millions of high-affinity quasi-specific sites for Egr-1 among the 3 billion bp of genomic DNA. This study quantitatively demonstrates that naturally abundant quasi-specific sites on DNA can considerably impede the target search processes of sequence-specific DNA-binding proteins.
Ishida, Yojiro; Park, Jung-Ho; Mao, Lili; Yamaguchi, Yoshihiro; Inouye, Masayori
2013-03-15
Replacement of a specific amino acid residue in a protein with nonnatural analogues is highly challenging because of their cellular toxicity. We demonstrate for the first time the replacement of all arginine (Arg) residues in a protein with canavanine (Can), a toxic Arg analogue. All Arg residues in the 5-base specific (UACAU) mRNA interferase from Bacillus subtilis (MazF-bs(arg)) were replaced with Can by using the single-protein production system in Escherichia coli. The resulting MazF-bs(can) gained a 6-base recognition sequence, UACAUA, for RNA cleavage instead of the 5-base sequence, UACAU, for MazF-bs(arg). Mass spectrometry analysis confirmed that all Arg residues were replaced with Can. The present system offers a novel approach to create new functional proteins by replacing a specific amino acid in a protein with its analogues.
Evolution of the alternative AQP2 gene: Acquisition of a novel protein-coding sequence in dolphins.
Kishida, Takushi; Suzuki, Miwa; Takayama, Asuka
2018-01-01
Taxon-specific de novo protein-coding sequences are thought to be important for taxon-specific environmental adaptation. A recent study revealed that bottlenose dolphins acquired a novel isoform of aquaporin 2 generated by alternative splicing (alternative AQP2), which helps dolphins to live in hyperosmotic seawater. The AQP2 gene consists of four exons, but the alternative AQP2 gene lacks the fourth exon and instead has a longer third exon that includes the original third exon and a part of the original third intron. Here, we show that the latter half of the third exon of the alternative AQP2 arose from a non-protein-coding sequence. Intact ORF of this de novo sequence is shared not by all cetaceans, but only by delphinoids. However, this sequence is conservative in all modern cetaceans, implying that this de novo sequence potentially plays important roles for marine adaptation in cetaceans. Copyright © 2017 Elsevier Inc. All rights reserved.
Characteristic motifs for families of allergenic proteins
Ivanciuc, Ovidiu; Garcia, Tzintzuni; Torres, Miguel; Schein, Catherine H.; Braun, Werner
2008-01-01
The identification of potential allergenic proteins is usually done by scanning a database of allergenic proteins and locating known allergens with a high sequence similarity. However, there is no universally accepted cut-off value for sequence similarity to indicate potential IgE cross-reactivity. Further, overall sequence similarity may be less important than discrete areas of similarity in proteins with homologous structure. To identify such areas, we first classified all allergens and their subdomains in the Structural Database of Allergenic Proteins (SDAP, http://fermi.utmb.edu/SDAP/) to their closest protein families as defined in Pfam, and identified conserved physicochemical property motifs characteristic of each group of sequences. Allergens populate only a small subset of all known Pfam families, as all allergenic proteins in SDAP could be grouped to only 130 (of 9318 total) Pfams, and 31 families contain more than four allergens. Conserved physicochemical property motifs for the aligned sequences of the most populated Pfam families were identified with the PCPMer program suite and catalogued in the webserver Motif-Mate (http://born.utmb.edu/motifmate/summary.php). We also determined specific motifs for allergenic members of a family that could distinguish them from non-allergenic ones. These allergen specific motifs should be most useful in database searches for potential allergens. We found that sequence motifs unique to the allergens in three families (seed storage proteins, Bet v 1, and tropomyosin) overlap with known IgE epitopes, thus providing evidence that our motif based approach can be used to assess the potential allergenicity of novel proteins. PMID:18951633
Anisimov, Andrey P; Panfertsev, Evgeniy A; Svetoch, Tat'yana E; Dentovskaya, Svetlana V
2007-01-01
Sequencing of lcrV genes and comparison of the deduced amino acid sequences from ten Y. pestis strains belonging mostly to the group of atypical rhamnose-positive isolates (non-pestis subspecies or pestoides group) showed that the LcrV proteins analyzed could be classified into five sequence types. This classification was based on major amino acid polymorphisms among LcrV proteins in the four "hot points" of the protein sequences. Some additional minor polymorphisms were found throughout these sequence types. The "hot points" corresponded to amino acids 18 (Lys --> Asn), 72 (Lys --> Arg), 273 (Cys --> Ser), and 324-326 (Ser-Gly-Lys --> Arg) in the LcrV sequence of the reference Y. pestis strain CO92. One possible explanation for polymorphism in amino acid sequences of LcrV among different strains is that strain-specific variation resulted from adaptation of the plague pathogen to different rodent and lagomorph hosts.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Kai; Roberts, Gareth A.; Stephanou, Augoustinos S.
2010-07-23
Research highlights: {yields} Successful fusion of GFP to M.EcoKI DNA methyltransferase. {yields} GFP located at C-terminal of sequence specificity subunit does not later enzyme activity. {yields} FRET confirms structural model of M.EcoKI bound to DNA. -- Abstract: We describe the fusion of enhanced green fluorescent protein to the C-terminus of the HsdS DNA sequence-specificity subunit of the Type I DNA modification methyltransferase M.EcoKI. The fusion expresses well in vivo and assembles with the two HsdM modification subunits. The fusion protein functions as a sequence-specific DNA methyltransferase protecting DNA against digestion by the EcoKI restriction endonuclease. The purified enzyme shows Foerstermore » resonance energy transfer to fluorescently-labelled DNA duplexes containing the target sequence and to fluorescently-labelled ocr protein, a DNA mimic that binds to the M.EcoKI enzyme. Distances determined from the energy transfer experiments corroborate the structural model of M.EcoKI.« less
2015-01-01
Functions of transcription factors require formation of specific complexes at particular sites in cis-regulatory elements of genes. However, chromosomal DNA contains numerous sites that are similar to the target sequences recognized by transcription factors. The influence of such “quasi-specific” sites on functions of the transcription factors is not well understood at present by experimental means. In this work, using fluorescence methods, we have investigated the influence of quasi-specific DNA sites on the efficiency of target location by the zinc finger DNA-binding domain of the inducible transcription factor Egr-1, which recognizes a 9 bp sequence. By stopped-flow assays, we measured the kinetics of Egr-1’s association with a target site on 143 bp DNA in the presence of various competitor DNAs, including nonspecific and quasi-specific sites. The presence of quasi-specific sites on competitor DNA significantly decelerated the target association by the Egr-1 protein. The impact of the quasi-specific sites depended strongly on their affinity, their concentration, and the degree of their binding to the protein. To quantitatively describe the kinetic impact of the quasi-specific sites, we derived an analytical form of the apparent kinetic rate constant for the target association and used it for fitting to the experimental data. Our kinetic data with calf thymus DNA as a competitor suggested that there are millions of high-affinity quasi-specific sites for Egr-1 among the 3 billion bp of genomic DNA. This study quantitatively demonstrates that naturally abundant quasi-specific sites on DNA can considerably impede the target search processes of sequence-specific DNA-binding proteins. PMID:26502071
Eliciting an antibody response against a recombinant TSH containing fusion protein.
Mard-Soltani, Maysam; Rasaee, Mohamad Javad; Sheikhi, AbdolKarim; Hedayati, Mehdi
2017-01-01
Designing novel antigens to rise specific antibodies for Thyroid Stimulating Hormone (TSH) detection is of great significance. A novel fusion protein consisting of the C termini sequence of TSH beta subunit and a fusion sequence was designed and produced for rabbit immunization. Thereafter, the produced antibodies were purified and characterized for TSH detection. Our results indicate that the produced antibody is capable of sensitive and specific detection of TSH with low cross reactivity. This study underscores the applicability of designed fusion protein for specific and sensitive polyclonal antibody production and the importance of selecting an amenable region of the TSH for immunization.
Multi-Harmony: detecting functional specificity from sequence alignment
Brandt, Bernd W.; Feenstra, K. Anton; Heringa, Jaap
2010-01-01
Many protein families contain sub-families with functional specialization, such as binding different ligands or being involved in different protein–protein interactions. A small number of amino acids generally determine functional specificity. The identification of these residues can aid the understanding of protein function and help finding targets for experimental analysis. Here, we present multi-Harmony, an interactive web sever for detecting sub-type-specific sites in proteins starting from a multiple sequence alignment. Combining our Sequence Harmony (SH) and multi-Relief (mR) methods in one web server allows simultaneous analysis and comparison of specificity residues; furthermore, both methods have been significantly improved and extended. SH has been extended to cope with more than two sub-groups. mR has been changed from a sampling implementation to a deterministic one, making it more consistent and user friendly. For both methods Z-scores are reported. The multi-Harmony web server produces a dynamic output page, which includes interactive connections to the Jalview and Jmol applets, thereby allowing interactive analysis of the results. Multi-Harmony is available at http://www.ibi.vu.nl/ programs/shmrwww. PMID:20525785
Lorenzo, J Ramiro; Alonso, Leonardo G; Sánchez, Ignacio E
2015-01-01
Asparagine residues in proteins undergo spontaneous deamidation, a post-translational modification that may act as a molecular clock for the regulation of protein function and turnover. Asparagine deamidation is modulated by protein local sequence, secondary structure and hydrogen bonding. We present NGOME, an algorithm able to predict non-enzymatic deamidation of internal asparagine residues in proteins in the absence of structural data, using sequence-based predictions of secondary structure and intrinsic disorder. Compared to previous algorithms, NGOME does not require three-dimensional structures yet yields better predictions than available sequence-only methods. Four case studies of specific proteins show how NGOME may help the user identify deamidation-prone asparagine residues, often related to protein gain of function, protein degradation or protein misfolding in pathological processes. A fifth case study applies NGOME at a proteomic scale and unveils a correlation between asparagine deamidation and protein degradation in yeast. NGOME is freely available as a webserver at the National EMBnet node Argentina, URL: http://www.embnet.qb.fcen.uba.ar/ in the subpage "Protein and nucleic acid structure and sequence analysis".
Sequence harmony: detecting functional specificity from alignments
Feenstra, K. Anton; Pirovano, Walter; Krab, Klaas; Heringa, Jaap
2007-01-01
Multiple sequence alignments are often used for the identification of key specificity-determining residues within protein families. We present a web server implementation of the Sequence Harmony (SH) method previously introduced. SH accurately detects subfamily specific positions from a multiple alignment by scoring compositional differences between subfamilies, without imposing conservation. The SH web server allows a quick selection of subtype specific sites from a multiple alignment given a subfamily grouping. In addition, it allows the predicted sites to be directly mapped onto a protein structure and displayed. We demonstrate the use of the SH server using the family of plant mitochondrial alternative oxidases (AOX). In addition, we illustrate the usefulness of combining sequence and structural information by showing that the predicted sites are clustered into a few distinct regions in an AOX homology model. The SH web server can be accessed at www.ibi.vu.nl/programs/seqharmwww. PMID:17584793
Functional specificity of a Hox protein mediated by the recognition of minor groove structure.
Joshi, Rohit; Passner, Jonathan M; Rohs, Remo; Jain, Rinku; Sosinsky, Alona; Crickmore, Michael A; Jacob, Vinitha; Aggarwal, Aneel K; Honig, Barry; Mann, Richard S
2007-11-02
The recognition of specific DNA-binding sites by transcription factors is a critical yet poorly understood step in the control of gene expression. Members of the Hox family of transcription factors bind DNA by making nearly identical major groove contacts via the recognition helices of their homeodomains. In vivo specificity, however, often depends on extended and unstructured regions that link Hox homeodomains to a DNA-bound cofactor, Extradenticle (Exd). Using a combination of structure determination, computational analysis, and in vitro and in vivo assays, we show that Hox proteins recognize specific Hox-Exd binding sites via residues located in these extended regions that insert into the minor groove but only when presented with the correct DNA sequence. Our results suggest that these residues, which are conserved in a paralog-specific manner, confer specificity by recognizing a sequence-dependent DNA structure instead of directly reading a specific DNA sequence.
Molecular Simulations of Sequence-Specific Association of Transmembrane Proteins in Lipid Bilayers
NASA Astrophysics Data System (ADS)
Doxastakis, Manolis; Prakash, Anupam; Janosi, Lorant
2011-03-01
Association of membrane proteins is central in material and information flow across the cellular membranes. Amino-acid sequence and the membrane environment are two critical factors controlling association, however, quantitative knowledge on such contributions is limited. In this work, we study the dimerization of helices in lipid bilayers using extensive parallel Monte Carlo simulations with recently developed algorithms. The dimerization of Glycophorin A is examined employing a coarse-grain model that retains a level of amino-acid specificity, in three different phospholipid bilayers. Association is driven by a balance of protein-protein and lipid-induced interactions with the latter playing a major role at short separations. Following a different approach, the effect of amino-acid sequence is studied using the four transmembrane domains of the epidermal growth factor receptor family in identical lipid environments. Detailed characterization of dimer formation and estimates of the free energy of association reveal that these helices present significant affinity to self-associate with certain dimers forming non-specific interfaces.
Kachhap, Sangita; Singh, Balvinder
2015-01-01
In most of homeodomain-DNA complexes, glutamine or lysine is present at 50th position and interacts with 5th and 6th nucleotide of core recognition region. Molecular dynamics simulations of Msx-1-DNA complex (Q50-TG) and its variant complexes, that is specific (Q50K-CC), nonspecific (Q50-CC) having mutation in DNA and (Q50K-TG) in protein, have been carried out. Analysis of protein-DNA interactions and structure of DNA in specific and nonspecific complexes show that amino acid residues use sequence-dependent shape of DNA to interact. The binding free energies of all four complexes were analysed to define role of amino acid residue at 50th position in terms of binding strength considering the variation in DNA on stability of protein-DNA complexes. The order of stability of protein-DNA complexes shows that specific complexes are more stable than nonspecific ones. Decomposition analysis shows that N-terminal amino acid residues have been found to contribute maximally in binding free energy of protein-DNA complexes. Among specific protein-DNA complexes, K50 contributes more as compared to Q50 towards binding free energy in respective complexes. The sequence dependence of local conformation of DNA enables Q50/Q50K to make hydrogen bond with nucleotide(s) of DNA. The changes in amino acid sequence of protein are accommodated and stabilized around TAAT core region of DNA having variation in nucleotides.
APOBEC3G Interacts with ssDNA by Two Modes: AFM Studies
NASA Astrophysics Data System (ADS)
Shlyakhtenko, Luda S.; Dutta, Samrat; Banga, Jaspreet; Li, Ming; Harris, Reuben S.; Lyubchenko, Yuri L.
2015-10-01
APOBEC3G (A3G) protein has antiviral activity against HIV and other pathogenic retroviruses. A3G has two domains: a catalytic C-terminal domain (CTD) that deaminates cytidine, and a N-terminal domain (NTD) that binds to ssDNA. Although abundant information exists about the biological activities of A3G protein, the interplay between sequence specific deaminase activity and A3G binding to ssDNA remains controversial. We used the topographic imaging and force spectroscopy modalities of Atomic Force Spectroscopy (AFM) to characterize the interaction of A3G protein with deaminase specific and nonspecific ssDNA substrates. AFM imaging demonstrated that A3G has elevated affinity for deaminase specific ssDNA than for nonspecific ssDNA. AFM force spectroscopy revealed two distinct binding modes by which A3G interacts with ssDNA. One mode requires sequence specificity, as demonstrated by stronger and more stable complexes with deaminase specific ssDNA than with nonspecific ssDNA. Overall these observations enforce prior studies suggesting that both domains of A3G contribute to the sequence specific binding of ssDNA.
APOBEC3G Interacts with ssDNA by Two Modes: AFM Studies.
Shlyakhtenko, Luda S; Dutta, Samrat; Banga, Jaspreet; Li, Ming; Harris, Reuben S; Lyubchenko, Yuri L
2015-10-27
APOBEC3G (A3G) protein has antiviral activity against HIV and other pathogenic retroviruses. A3G has two domains: a catalytic C-terminal domain (CTD) that deaminates cytidine, and a N-terminal domain (NTD) that binds to ssDNA. Although abundant information exists about the biological activities of A3G protein, the interplay between sequence specific deaminase activity and A3G binding to ssDNA remains controversial. We used the topographic imaging and force spectroscopy modalities of Atomic Force Spectroscopy (AFM) to characterize the interaction of A3G protein with deaminase specific and nonspecific ssDNA substrates. AFM imaging demonstrated that A3G has elevated affinity for deaminase specific ssDNA than for nonspecific ssDNA. AFM force spectroscopy revealed two distinct binding modes by which A3G interacts with ssDNA. One mode requires sequence specificity, as demonstrated by stronger and more stable complexes with deaminase specific ssDNA than with nonspecific ssDNA. Overall these observations enforce prior studies suggesting that both domains of A3G contribute to the sequence specific binding of ssDNA.
Jaiswal, Mamta; Dvorsky, Radovan; Ahmadian, Mohammad Reza
2013-02-08
The diffuse B-cell lymphoma (Dbl) family of the guanine nucleotide exchange factors is a direct activator of the Rho family proteins. The Rho family proteins are involved in almost every cellular process that ranges from fundamental (e.g. the establishment of cell polarity) to highly specialized processes (e.g. the contraction of vascular smooth muscle cells). Abnormal activation of the Rho proteins is known to play a crucial role in cancer, infectious and cognitive disorders, and cardiovascular diseases. However, the existence of 74 Dbl proteins and 25 Rho-related proteins in humans, which are largely uncharacterized, has led to increasing complexity in identifying specific upstream pathways. Thus, we comprehensively investigated sequence-structure-function-property relationships of 21 representatives of the Dbl protein family regarding their specificities and activities toward 12 Rho family proteins. The meta-analysis approach provides an unprecedented opportunity to broadly profile functional properties of Dbl family proteins, including catalytic efficiency, substrate selectivity, and signaling specificity. Our analysis has provided novel insights into the following: (i) understanding of the relative differences of various Rho protein members in nucleotide exchange; (ii) comparing and defining individual and overall guanine nucleotide exchange factor activities of a large representative set of the Dbl proteins toward 12 Rho proteins; (iii) grouping the Dbl family into functionally distinct categories based on both their catalytic efficiencies and their sequence-structural relationships; (iv) identifying conserved amino acids as fingerprints of the Dbl and Rho protein interaction; and (v) defining amino acid sequences conserved within, but not between, Dbl subfamilies. Therefore, the characteristics of such specificity-determining residues identified the regions or clusters conserved within the Dbl subfamilies.
Molecular Cloning of Drebrin: Progress and Perspectives.
Kojima, Nobuhiko
2017-01-01
Chicken drebrin isoforms were first identified in the optic tectum of developing brain. Although the time course of protein expression was different in each drebrin isoform, the similarity between their protein structures was suggested by biochemical analysis of purified protein. To determine their protein structures, the cloning of drebrin cDNAs was conducted. Comparison between the cDNA sequences shows that all drebrin cDNAs are identical except that the internal insertion sequences are present or absent in their sequences. Chicken drebrin are now classified into three isoforms, namely, drebrins E1, E2, and A. Genomic cloning demonstrated that the three isoforms are generated by an alternative splicing of individual exons encoding the insertion sequences from single drebrin gene. The mechanism should be precisely regulated in cell-type-specific and developmental stage-specific fashion. Drebrin protein, which is well conserved in various vertebrate species, although mammalian drebrin has only two isoforms, namely, drebrin E and drebrin A, is different from chicken drebrin that has three isoforms. Drebrin belongs to an actin-depolymerizing factor homology (ADF-H) domain protein family. Besides the ADF-H domain, drebrin has other domains, including the actin-binding domain and Homer-binding motifs. Diversity of protein isoform and multiple domains of drebrin could interact differentially with the actin cytoskeleton and other intracellular proteins and regulate diverse cellular processes.
de Lange, Orlando; Wolf, Christina; Dietze, Jörn; Elsaesser, Janett; Morbitzer, Robert; Lahaye, Thomas
2014-06-01
The tandem repeats of transcription activator like effectors (TALEs) mediate sequence-specific DNA binding using a simple code. Naturally, TALEs are injected by Xanthomonas bacteria into plant cells to manipulate the host transcriptome. In the laboratory TALE DNA binding domains are reprogrammed and used to target a fused functional domain to a genomic locus of choice. Research into the natural diversity of TALE-like proteins may provide resources for the further improvement of current TALE technology. Here we describe TALE-like proteins from the endosymbiotic bacterium Burkholderia rhizoxinica, termed Bat proteins. Bat repeat domains mediate sequence-specific DNA binding with the same code as TALEs, despite less than 40% sequence identity. We show that Bat proteins can be adapted for use as transcription factors and nucleases and that sequence preferences can be reprogrammed. Unlike TALEs, the core repeats of each Bat protein are highly polymorphic. This feature allowed us to explore alternative strategies for the design of custom Bat repeat arrays, providing novel insights into the functional relevance of non-RVD residues. The Bat proteins offer fertile grounds for research into the creation of improved programmable DNA-binding proteins and comparative insights into TALE-like evolution. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Isalan, M; Klug, A; Choo, Y
2001-07-01
DNA-binding domains with predetermined sequence specificity are engineered by selection of zinc finger modules using phage display, allowing the construction of customized transcription factors. Despite remarkable progress in this field, the available protein-engineering methods are deficient in many respects, thus hampering the applicability of the technique. Here we present a rapid and convenient method that can be used to design zinc finger proteins against a variety of DNA-binding sites. This is based on a pair of pre-made zinc finger phage-display libraries, which are used in parallel to select two DNA-binding domains each of which recognizes given 5 base pair sequences, and whose products are recombined to produce a single protein that recognizes a composite (9 base pair) site of predefined sequence. Engineering using this system can be completed in less than two weeks and yields proteins that bind sequence-specifically to DNA with Kd values in the nanomolar range. To illustrate the technique, we have selected seven different proteins to bind various regions of the human immunodeficiency virus 1 (HIV-1) promoter.
Rodriguez-Rivas, Juan; Marsili, Simone; Juan, David; Valencia, Alfonso
2016-12-27
Protein-protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue coevolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that coevolution points to structurally conserved contacts at protein-protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a domain-centered protocol to study the interplay between residue coevolution and structural conservation of protein-protein interfaces. We show that sequence-based coevolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein-protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high divergence in sequence (the twilight zone), where standard homology modeling of protein complexes is unreliable, our approach provides sequence-based accurate information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic coevolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this approach.
Bown, David P; Gatehouse, John A
2004-05-01
Carboxypeptidases were purified from guts of larvae of corn earworm (Helicoverpa armigera), a lepidopteran crop pest, by affinity chromatography on immobilized potato carboxypeptidase inhibitor, and characterized by N-terminal sequencing. A larval gut cDNA library was screened using probes based on these protein sequences. cDNA HaCA42 encoded a carboxypeptidase with sequence similarity to enzymes of clan MC [Barrett, A. J., Rawlings, N. D. & Woessner, J. F. (1998) Handbook of Proteolytic Enzymes. Academic Press, London.], but with a novel predicted specificity towards C-terminal acidic residues. This carboxypeptidase was expressed as a recombinant proprotein in the yeast Pichia pastoris. The expressed protein could be activated by treatment with bovine trypsin; degradation of bound pro-region, rather than cleavage of pro-region from mature protein, was the rate-limiting step in activation. Activated HaCA42 carboxypeptidase hydrolysed a synthetic substrate for glutamate carboxypeptidases (FAEE, C-terminal Glu), but did not hydrolyse substrates for carboxypeptidase A or B (FAPP or FAAK, C-terminal Phe or Lys) or methotrexate, cleaved by clan MH glutamate carboxypeptidases. The enzyme was highly specific for C-terminal glutamate in peptide substrates, with slow hydrolysis of C-terminal aspartate also observed. Glutamate carboxypeptidase activity was present in larval gut extract from H. armigera. The HaCA42 protein is the first glutamate-specific metallocarboxypeptidase from clan MC to be identified and characterized. The genome of Drosophila melanogaster contains genes encoding enzymes with similar sequences and predicted specificity, and a cDNA encoding a similar enzyme has been isolated from gut tissue in tsetse fly. We suggest that digestive carboxypeptidases with sequence similarity to the classical mammalian enzymes, but with specificity towards C-terminal glutamate, are widely distributed in insects.
Coarse-grained sequences for protein folding and design.
Brown, Scott; Fawzi, Nicolas J; Head-Gordon, Teresa
2003-09-16
We present the results of sequence design on our off-lattice minimalist model in which no specification of native-state tertiary contacts is needed. We start with a sequence that adopts a target topology and build on it through sequence mutation to produce new sequences that comprise distinct members within a target fold class. In this work, we use the alpha/beta ubiquitin fold class and design two new sequences that, when characterized through folding simulations, reproduce the differences in folding mechanism seen experimentally for proteins L and G. The primary implication of this work is that patterning of hydrophobic and hydrophilic residues is the physical origin for the success of relative contact-order descriptions of folding, and that these physics-based potentials provide a predictive connection between free energy landscapes and amino acid sequence (the original protein folding problem). We present results of the sequence mapping from a 20- to the three-letter code for determining a sequence that folds into the WW domain topology to illustrate future extensions to protein design.
Coarse-grained sequences for protein folding and design
Brown, Scott; Fawzi, Nicolas J.; Head-Gordon, Teresa
2003-01-01
We present the results of sequence design on our off-lattice minimalist model in which no specification of native-state tertiary contacts is needed. We start with a sequence that adopts a target topology and build on it through sequence mutation to produce new sequences that comprise distinct members within a target fold class. In this work, we use the α/β ubiquitin fold class and design two new sequences that, when characterized through folding simulations, reproduce the differences in folding mechanism seen experimentally for proteins L and G. The primary implication of this work is that patterning of hydrophobic and hydrophilic residues is the physical origin for the success of relative contact-order descriptions of folding, and that these physics-based potentials provide a predictive connection between free energy landscapes and amino acid sequence (the original protein folding problem). We present results of the sequence mapping from a 20- to the three-letter code for determining a sequence that folds into the WW domain topology to illustrate future extensions to protein design. PMID:12963815
NASA Astrophysics Data System (ADS)
Enea, Vincenzo; Ellis, Joan; Zavala, Fidel; Arnot, David E.; Asavanich, Achara; Masuda, Aoi; Quakyi, Isabella; Nussenzweig, Ruth S.
1984-08-01
A clone of complementary DNA encoding the circumsporozoite (CS) protein of the human malaria parasite Plasmodium falciparum has been isolated by screening an Escherichia coli complementary DNA library with a monoclonal antibody to the CS protein. The DNA sequence of the complementary DNA insert encodes a four-amino acid sequence: proline-asparagine-alanine-asparagine, tandemly repeated 23 times. The CS β -lactamase fusion protein specifically binds monoclonal antibodies to the CS protein and inhibits the binding of these antibodies to native Plasmodium falciparum CS protein. These findings provide a basis for the development of a vaccine against Plasmodium falciparum malaria.
Cysteine-containing peptide tag for site-specific conjugation of proteins
Backer, Marina V.; Backer, Joseph M.
2008-04-08
The present invention is directed to a biological conjugate, comprising: (a) a targeting moiety comprising a polypeptide having an amino acid sequence comprising the polypeptide sequence of SEQ ID NO:2 and the polypeptide sequence of a selected targeting protein; and (b) a binding moiety bound to the targeting moiety; the biological conjugate having a covalent bond between the thiol group of SEQ ID NO:2 and a functional group in the binding moiety. The present invention is directed to a biological conjugate, comprising: (a) a targeting moiety comprising a polypeptide having an amino acid sequence comprising the polypeptide sequence of SEQ ID NO:2 and the polypeptide sequence of a selected targeting protein; and (b) a binding moiety that comprises an adapter protein, the adapter protein having a thiol group; the biological conjugate having a disulfide bond between the thiol group of SEQ ID NO:2 and the thiol group of the adapter protein. The present invention is also directed to biological sequences employed in the above biological conjugates, as well as pharmaceutical preparations and methods using the above biological conjugates.
Cysteine-containing peptide tag for site-specific conjugation of proteins
Backer, Marina V.; Backer, Joseph M.
2010-10-05
The present invention is directed to a biological conjugate, comprising: (a) a targeting moiety comprising a polypeptide having an amino acid sequence comprising the polypeptide sequence of SEQ ID NO:2 and the polypeptide sequence of a selected targeting protein; and (b) a binding moiety bound to the targeting moiety; the biological conjugate having a covalent bond between the thiol group of SEQ ID NO:2 and a functional group in the binding moiety. The present invention is directed to a biological conjugate, comprising: (a) a targeting moiety comprising a polypeptide having an amino acid sequence comprising the polypeptide sequence of SEQ ID NO:2 and the polypeptide sequence of a selected targeting protein; and (b) a binding moiety that comprises an adapter protein, the adapter protein having a thiol group; the biological conjugate having a disulfide bond between the thiol group of SEQ ID NO:2 and the thiol group of the adapter protein. The present invention is also directed to biological sequences employed in the above biological conjugates, as well as pharmaceutical preparations and methods using the above biological conjugates.
Wang, Lei; You, Zhu-Hong; Chen, Xing; Li, Jian-Qiang; Yan, Xin; Zhang, Wei; Huang, Yu-An
2017-01-01
Protein–Protein Interactions (PPI) is not only the critical component of various biological processes in cells, but also the key to understand the mechanisms leading to healthy and diseased states in organisms. However, it is time-consuming and cost-intensive to identify the interactions among proteins using biological experiments. Hence, how to develop a more efficient computational method rapidly became an attractive topic in the post-genomic era. In this paper, we propose a novel method for inference of protein-protein interactions from protein amino acids sequences only. Specifically, protein amino acids sequence is firstly transformed into Position-Specific Scoring Matrix (PSSM) generated by multiple sequences alignments; then the Pseudo PSSM is used to extract feature descriptors. Finally, ensemble Rotation Forest (RF) learning system is trained to predict and recognize PPIs based solely on protein sequence feature. When performed the proposed method on the three benchmark data sets (Yeast, H. pylori, and independent dataset) for predicting PPIs, our method can achieve good average accuracies of 98.38%, 89.75%, and 96.25%, respectively. In order to further evaluate the prediction performance, we also compare the proposed method with other methods using same benchmark data sets. The experiment results demonstrate that the proposed method consistently outperforms other state-of-the-art method. Therefore, our method is effective and robust and can be taken as a useful tool in exploring and discovering new relationships between proteins. A web server is made publicly available at the URL http://202.119.201.126:8888/PsePSSM/ for academic use. PMID:28029645
Specificity determinants for the abscisic acid response element.
Sarkar, Aditya Kumar; Lahiri, Ansuman
2013-01-01
Abscisic acid (ABA) response elements (ABREs) are a group of cis-acting DNA elements that have been identified from promoter analysis of many ABA-regulated genes in plants. We are interested in understanding the mechanism of binding specificity between ABREs and a class of bZIP transcription factors known as ABRE binding factors (ABFs). In this work, we have modeled the homodimeric structure of the bZIP domain of ABRE binding factor 1 from Arabidopsis thaliana (AtABF1) and studied its interaction with ACGT core motif-containing ABRE sequences. We have also examined the variation in the stability of the protein-DNA complex upon mutating ABRE sequences using the protein design algorithm FoldX. The high throughput free energy calculations successfully predicted the ability of ABF1 to bind to alternative core motifs like GCGT or AAGT and also rationalized the role of the flanking sequences in determining the specificity of the protein-DNA interaction.
Lee, K K; Paranchych, W; Hodges, R S
1990-01-01
Antipeptide antibodies were raised against synthetic peptides corresponding to the amino acid sequences of eight surface predicted regions of the pilin proteins from Pseudomonas aeruginosa PAK and PAO. Four of the anti-PAK peptide antisera cross-reacted with strain PAO pili, while five anti-PAO peptide antisera cross-reacted with strain PAK pili. Only one region of the two pilin proteins (region 88-97) provided strain-specific antibodies when either strain PAK or strain PAO region 88-97 peptides were used to generate antipeptide antibodies. Our results clearly showed that cross-reactive and strain-specific antibodies cannot be based solely on the degree of homology in the aligned protein sequences. The majority of synthetic peptides bound to their homologous antipilus antiserum, suggesting that linear sequences play a significant role in the immunogenic response of native pili. PMID:1974884
Szczyglowski, K; Hamburger, D; Kapranov, P; de Bruijn, F J
1997-01-01
A range of novel expressed sequence tags (ESTs) associated with late developmental events during nodule organogenesis in the legume Lotus japonicus were identified using mRNA differential display; 110 differentially displayed polymerase chain reaction products were cloned and analyzed. Of 88 unique cDNAs obtained, 22 shared significant homology to DNA/protein sequences in the respective databases. This group comprises, among others, a nodule-specific homolog of protein phosphatase 2C, a peptide transporter protein, and a nodule-specific form of cytochrome P450. RNA gel-blot analysis of 16 differentially displayed ESTs confirmed their nodule-specific expression pattern. The kinetics of mRNA accumulation of the majority of the ESTs analyzed were found to resemble the expression pattern observed for the L. japonicus leghemoglobin gene. These results indicate that the newly isolated molecular markers correspond to genes induced during late developmental stages of L. japonicus nodule organogenesis and provide important, novel tools for the study of nodulation. PMID:9276951
Specificity and non-specificity in RNA–protein interactions
Jankowsky, Eckhard; Harris, Michael E.
2016-01-01
Gene expression is regulated by complex networks of interactions between RNAs and proteins. Proteins that interact with RNA have been traditionally viewed as either specific or non-specific; specific proteins interact preferentially with defined RNA sequence or structure motifs, whereas non-specific proteins interact with RNA sites devoid of such characteristics. Recent studies indicate that the binary “specific vs. non-specific” classification is insufficient to describe the full spectrum of RNA–protein interactions. Here, we review new methods that enable quantitative measurements of protein binding to large numbers of RNA variants, and the concepts aimed as describing resulting binding spectra: affinity distributions, comprehensive binding models and free energy landscapes. We discuss how these new methodologies and associated concepts enable work towards inclusive, quantitative models for specific and non-specific RNA–protein interactions. PMID:26285679
Chen, Yen-Kuang; Li, Kuo-Bin
2013-02-07
The type information of un-annotated membrane proteins provides an important hint for their biological functions. The experimental determination of membrane protein types, despite being more accurate and reliable, is not always feasible due to the costly laboratory procedures, thereby creating a need for the development of bioinformatics methods. This article describes a novel computational classifier for the prediction of membrane protein types using proteins' sequences. The classifier, comprising a collection of one-versus-one support vector machines, makes use of the following sequence attributes: (1) the cationic patch sizes, the orientation, and the topology of transmembrane segments; (2) the amino acid physicochemical properties; (3) the presence of signal peptides or anchors; and (4) the specific protein motifs. A new voting scheme was implemented to cope with the multi-class prediction. Both the training and the testing sequences were collected from SwissProt. Homologous proteins were removed such that there is no pair of sequences left in the datasets with a sequence identity higher than 40%. The performance of the classifier was evaluated by a Jackknife cross-validation and an independent testing experiments. Results show that the proposed classifier outperforms earlier predictors in prediction accuracy in seven of the eight membrane protein types. The overall accuracy was increased from 78.3% to 88.2%. Unlike earlier approaches which largely depend on position-specific substitution matrices and amino acid compositions, most of the sequence attributes implemented in the proposed classifier have supported literature evidences. The classifier has been deployed as a web server and can be accessed at http://bsaltools.ym.edu.tw/predmpt. Copyright © 2012 Elsevier Ltd. All rights reserved.
Morea, Edna G O; Viviescas, Maria Alejandra; Fernandes, Carlos A H; Matioli, Fabio F; Lira, Cristina B B; Fernandez, Maribel F; Moraes, Barbara S; da Silva, Marcelo S; Storti, Camila B; Fontes, Marcos R M; Cano, Maria Isabel N
2017-11-01
Leishmania spp. telomeres are composed of 5'-TTAGGG-3' repeats associated with proteins. We have previously identified LaRbp38 and LaRPA-1 as proteins that bind the G-rich telomeric strand. At that time, we had also partially characterized a protein: DNA complex, named LaGT1, but we could not identify its protein component. Using protein-DNA interaction and competition assays, we confirmed that LaGT1 is highly specific to the G-rich telomeric single-stranded DNA. Three protein bands, with LaGT1 activity, were isolated from affinity-purified protein extracts in-gel digested, and sequenced de novo using mass spectrometry analysis. In silico analysis of the digested peptide identified them as a putative calmodulin with sequences identical to the T. cruzi calmodulin. In the Leishmania genome, the calmodulin ortholog is present in three identical copies. We cloned and sequenced one of the gene copies, named it LCalA, and obtained the recombinant protein. Multiple sequence alignment and molecular modeling showed that LCalA shares homology to most eukaryotes calmodulin. In addition, we demonstrated that LCalA is nuclear, partially co-localizes with telomeres and binds in vivo the G-rich telomeric strand. Recombinant LCalA can bind specifically and with relative affinity to the G-rich telomeric single-strand and to a 3'G-overhang, and DNA binding is calcium dependent. We have described a novel candidate component of Leishmania telomeres, LCalA, a nuclear calmodulin that binds the G-rich telomeric strand with high specificity and relative affinity, in a calcium-dependent manner. LCalA is the first reported calmodulin that binds in vivo telomeric DNA. Copyright © 2017 Elsevier B.V. All rights reserved.
Pamp, Sünje J.; Harrington, Eoghan D.; Quake, Stephen R.; Relman, David A.; Blainey, Paul C.
2012-01-01
Segmented filamentous bacteria (SFB) are host-specific intestinal symbionts that comprise a distinct clade within the Clostridiaceae, designated Candidatus Arthromitus. SFB display a unique life cycle within the host, involving differentiation into multiple cell types. The latter include filaments that attach intimately to intestinal epithelial cells, and from which “holdfasts” and spores develop. SFB induce a multifaceted immune response, leading to host protection from intestinal pathogens. Cultivation resistance has hindered characterization of these enigmatic bacteria. In the present study, we isolated five SFB filaments from a mouse using a microfluidic device equipped with laser tweezers, generated genome sequences from each, and compared these sequences with each other, as well as to recently published SFB genome sequences. Based on the resulting analyses, SFB appear to be dependent on the host for a variety of essential nutrients. SFB have a relatively high abundance of predicted proteins devoted to cell cycle control and to envelope biogenesis, and have a group of SFB-specific autolysins and a dynamin-like protein. Among the five filament genomes, an average of 8.6% of predicted proteins were novel, including a family of secreted SFB-specific proteins. Four ADP-ribosyltransferase (ADPRT) sequence types, and a myosin-cross-reactive antigen (MCRA) protein were discovered; we hypothesize that they are involved in modulation of host responses. The presence of polymorphisms among mouse SFB genomes suggests the evolution of distinct SFB lineages. Overall, our results reveal several aspects of SFB adaptation to the mammalian intestinal tract. PMID:22434425
Ma, Jun; Wu, Kaiming; Zhao, Zhenxian; Miao, Rong; Xu, Zhe
2017-03-01
Esophageal squamous cell carcinoma is one of the most aggressive malignancies worldwide. Special AT-rich sequence binding protein 1 is a nuclear matrix attachment region binding protein which participates in higher order chromatin organization and tissue-specific gene expression. However, the role of special AT-rich sequence binding protein 1 in esophageal squamous cell carcinoma remains unknown. In this study, western blot and quantitative real-time polymerase chain reaction analysis were performed to identify differentially expressed special AT-rich sequence binding protein 1 in a series of esophageal squamous cell carcinoma tissue samples. The effects of special AT-rich sequence binding protein 1 silencing by two short-hairpin RNAs on cell proliferation, migration, and invasion were assessed by the CCK-8 assay and transwell assays in esophageal squamous cell carcinoma in vitro. Special AT-rich sequence binding protein 1 was significantly upregulated in esophageal squamous cell carcinoma tissue samples and cell lines. Silencing of special AT-rich sequence binding protein 1 inhibited the proliferation of KYSE450 and EC9706 cells which have a relatively high level of special AT-rich sequence binding protein 1, and the ability of migration and invasion of KYSE450 and EC9706 cells was distinctly suppressed. Special AT-rich sequence binding protein 1 could be a potential target for the treatment of esophageal squamous cell carcinoma and inhibition of special AT-rich sequence binding protein 1 may provide a new strategy for the prevention of esophageal squamous cell carcinoma invasion and metastasis.
Sequence patterns mediating functions of disordered proteins.
Exarchos, Konstantinos P; Kourou, Konstantina; Exarchos, Themis P; Papaloukas, Costas; Karamouzis, Michalis V; Fotiadis, Dimitrios I
2015-01-01
Disordered proteins lack specific 3D structure in their native state and have been implicated with numerous cellular functions as well as with the induction of severe diseases, e.g., cardiovascular and neurodegenerative diseases as well as diabetes. Due to their conformational flexibility they are often found to interact with a multitude of protein molecules; this one-to-many interaction which is vital for their versatile functioning involves short consensus protein sequences, which are normally detected using slow and cumbersome experimental procedures. In this work we exploit information from disorder-oriented protein interaction networks focused specifically on humans, in order to assemble, by means of overrepresentation, a set of sequence patterns that mediate the functioning of disordered proteins; hence, we are able to identify how a single protein achieves such functional promiscuity. Next, we study the sequential characteristics of the extracted patterns, which exhibit a striking preference towards a very limited subset of amino acids; specifically, residues leucine, glutamic acid, and serine are particularly frequent among the extracted patterns, and we also observe a nontrivial propensity towards alanine and glycine. Furthermore, based on the extracted patterns we set off to infer potential functional implications in order to verify our findings and potentially further extrapolate our knowledge regarding the functioning of disordered proteins. We observe that the extracted patterns are primarily involved with regulation, binding and posttranslational modifications, which constitute the most prominent functions of disordered proteins.
Guo, Deyin; Spetz, Carl; Saarma, Mart; Valkonen, Jari P T
2003-05-01
Potyviral helper-component proteinase (HCpro) is a multifunctional protein exerting its cellular functions in interaction with putative host proteins. In this study, cellular protein partners of the HCpro encoded by Potato virus A (PVA) (genus Potyvirus) were screened in a potato leaf cDNA library using a yeast two-hybrid system. Two cellular proteins were obtained that interact specifically with PVA HCpro in yeast and in the two in vitro binding assays used. Both proteins are encoded by single-copy genes in the potato genome. Analysis of the deduced amino acid sequences revealed that one (HIP1) of the two HCpro interactors is a novel RING finger protein. The sequence of the other protein (HIP2) showed no resemblance to the protein sequences available from databanks and has known biological functions.
Rapid Identification of Sequences for Orphan Enzymes to Power Accurate Protein Annotation
Ojha, Sunil; Watson, Douglas S.; Bomar, Martha G.; Galande, Amit K.; Shearer, Alexander G.
2013-01-01
The power of genome sequencing depends on the ability to understand what those genes and their proteins products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the “back catalog” of enzymology – “orphan enzymes,” those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme “back catalog” is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology’s “back catalog” another powerful tool to drive accurate genome annotation. PMID:24386392
Rapid identification of sequences for orphan enzymes to power accurate protein annotation.
Ramkissoon, Kevin R; Miller, Jennifer K; Ojha, Sunil; Watson, Douglas S; Bomar, Martha G; Galande, Amit K; Shearer, Alexander G
2013-01-01
The power of genome sequencing depends on the ability to understand what those genes and their proteins products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the "back catalog" of enzymology--"orphan enzymes," those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme "back catalog" is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology's "back catalog" another powerful tool to drive accurate genome annotation.
A Method for WD40 Repeat Detection and Secondary Structure Prediction
Wang, Yang; Jiang, Fan; Zhuo, Zhu; Wu, Xian-Hui; Wu, Yun-Dong
2013-01-01
WD40-repeat proteins (WD40s), as one of the largest protein families in eukaryotes, play vital roles in assembling protein-protein/DNA/RNA complexes. WD40s fold into similar β-propeller structures despite diversified sequences. A program WDSP (WD40 repeat protein Structure Predictor) has been developed to accurately identify WD40 repeats and predict their secondary structures. The method is designed specifically for WD40 proteins by incorporating both local residue information and non-local family-specific structural features. It overcomes the problem of highly diversified protein sequences and variable loops. In addition, WDSP achieves a better prediction in identifying multiple WD40-domain proteins by taking the global combination of repeats into consideration. In secondary structure prediction, the average Q3 accuracy of WDSP in jack-knife test reaches 93.7%. A disease related protein LRRK2 was used as a representive example to demonstrate the structure prediction. PMID:23776530
Specific and Modular Binding Code for Cytosine Recognition in Pumilio/FBF (PUF) RNA-binding Domains
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dong, Shuyun; Wang, Yang; Cassidy-Amstutz, Caleb
2011-10-28
Pumilio/fem-3 mRNA-binding factor (PUF) proteins possess a recognition code for bases A, U, and G, allowing designed RNA sequence specificity of their modular Pumilio (PUM) repeats. However, recognition side chains in a PUM repeat for cytosine are unknown. Here we report identification of a cytosine-recognition code by screening random amino acid combinations at conserved RNA recognition positions using a yeast three-hybrid system. This C-recognition code is specific and modular as specificity can be transferred to different positions in the RNA recognition sequence. A crystal structure of a modified PUF domain reveals specific contacts between an arginine side chain and themore » cytosine base. We applied the C-recognition code to design PUF domains that recognize targets with multiple cytosines and to generate engineered splicing factors that modulate alternative splicing. Finally, we identified a divergent yeast PUF protein, Nop9p, that may recognize natural target RNAs with cytosine. This work deepens our understanding of natural PUF protein target recognition and expands the ability to engineer PUF domains to recognize any RNA sequence.« less
El-Assaad, Atlal; Dawy, Zaher; Nemer, Georges; Hajj, Hazem; Kobeissy, Firas H
2017-01-01
Degradomics is a novel discipline that involves determination of the proteases/substrate fragmentation profile, called the substrate degradome, and has been recently applied in different disciplines. A major application of degradomics is its utility in the field of biomarkers where the breakdown products (BDPs) of different protease have been investigated. Among the major proteases assessed, calpain and caspase proteases have been associated with the execution phases of the pro-apoptotic and pro-necrotic cell death, generating caspase/calpain-specific cleaved fragments. The distinction between calpain and caspase protein fragments has been applied to distinguish injury mechanisms. Advanced proteomics technology has been used to identify these BDPs experimentally. However, it has been a challenge to identify these BDPs with high precision and efficiency, especially if we are targeting a number of proteins at one time. In this chapter, we present a novel bioinfromatic detection method that identifies BDPs accurately and efficiently with validation against experimental data. This method aims at predicting the consensus sequence occurrences and their variants in a large set of experimentally detected protein sequences based on state-of-the-art sequence matching and alignment algorithms. After detection, the method generates all the potential cleaved fragments by a specific protease. This space and time-efficient algorithm is flexible to handle the different orientations that the consensus sequence and the protein sequence can take before cleaving. It is O(mn) in space complexity and O(Nmn) in time complexity, with N number of protein sequences, m length of the consensus sequence, and n length of each protein sequence. Ultimately, this knowledge will subsequently feed into the development of a novel tool for researchers to detect diverse types of selected BDPs as putative disease markers, contributing to the diagnosis and treatment of related disorders.
Food-derived immunomodulatory peptides.
Santiago-López, Lourdes; Hernández-Mendoza, Adrián; Vallejo-Cordoba, Belinda; Mata-Haro, Verónica; González-Córdova, Aarón F
2016-08-01
Food proteins contain specific amino acid sequences within their structures that may positively impact bodily functions and have multiple immunomodulatory effects. The functional properties of these specific sequences, also referred to as bioactive peptides, are revealed only after the degradation of native proteins during digestion processes. Currently, milk proteins have been the most explored source of bioactive peptides, which presents an interesting opportunity for the dairy industry. However, plant- and animal-derived proteins have also been shown to be important sources of bioactive peptides. This review summarizes the in vitro and in vivo evidence of the role of various food proteins as sources of immunomodulatory peptides and discusses the possible pathways involving these properties. © 2016 Society of Chemical Industry. © 2016 Society of Chemical Industry.
Specific DNA binding of the two chicken Deformed family homeodomain proteins, Chox-1.4 and Chox-a.
Sasaki, H; Yokoyama, E; Kuroiwa, A
1990-01-01
The cDNA clones encoding two chicken Deformed (Dfd) family homeobox containing genes Chox-1.4 and Chox-a were isolated. Comparison of their amino acid sequences with another chicken Dfd family homeodomain protein and with those of mouse homologues revealed that strong homologies are located in the amino terminal regions and around the homeodomains. Although homologies in other regions were relatively low, some short conserved sequences were also identified. E. coli-made full length proteins were purified and used for the production of specific antibodies and for DNA binding studies. The binding profiles of these proteins to the 5'-leader and 5'-upstream sequences of Chox-1.4 and Chox-a coding regions were analyzed by immunoprecipitation and DNase I footprint assays. These two Chox proteins bound to the same sites in the 5'-flanking sequences of their coding regions with various affinities and their binding affinities to each site were nearly the same. The consensus sequences of the high and low affinity binding sites were TAATGA(C/G) and CTAATTTT, respectively. A clustered binding site was identified in the 5'-upstream of the Chox-a gene, suggesting that this clustered binding site works as a cis-regulatory element for auto- and/or cross-regulation of Chox-a gene expression. Images PMID:1970866
Predicting the binding preference of transcription factors to individual DNA k-mers.
Alleyne, Trevis M; Peña-Castillo, Lourdes; Badis, Gwenael; Talukder, Shaheynoor; Berger, Michael F; Gehrke, Andrew R; Philippakis, Anthony A; Bulyk, Martha L; Morris, Quaid D; Hughes, Timothy R
2009-04-15
Recognition of specific DNA sequences is a central mechanism by which transcription factors (TFs) control gene expression. Many TF-binding preferences, however, are unknown or poorly characterized, in part due to the difficulty associated with determining their specificity experimentally, and an incomplete understanding of the mechanisms governing sequence specificity. New techniques that estimate the affinity of TFs to all possible k-mers provide a new opportunity to study DNA-protein interaction mechanisms, and may facilitate inference of binding preferences for members of a given TF family when such information is available for other family members. We employed a new dataset consisting of the relative preferences of mouse homeodomains for all eight-base DNA sequences in order to ask how well we can predict the binding profiles of homeodomains when only their protein sequences are given. We evaluated a panel of standard statistical inference techniques, as well as variations of the protein features considered. Nearest neighbour among functionally important residues emerged among the most effective methods. Our results underscore the complexity of TF-DNA recognition, and suggest a rational approach for future analyses of TF families.
Zavalova, L; Lukyanov, S; Baskova, I; Snezhkov, E; Akopov, S; Berezhnoy, S; Bogdanova, E; Barsova, E; Sverdlov, E D
1996-11-27
We previously detected in salivary gland secretions of the medicinal leech (Hirudo medicinalis) a novel enzymatic activity, endo-epsilon(gamma-Glu)-Lys isopeptidase, which cleaves isopeptide bonds formed by transglutaminase (Factor XIIIa) between glutamine gamma-carboxamide and the epsilon-amino group of lysine. Such isopeptide bonds, either within or between protein polypeptide chains are formed in many biological processes. However, before we started our work no enzymes were known to be capable of specifically splitting isopeptide bonds in proteins. The isopeptidase activity we detected was specific for isopeptide bonds. The enzyme was termed destabilase. Here we report the first purification of destabilase, part of its amino acid sequence isolation and sequencing of two related cDNAs derived from the gene family that encodes destabilase proteins, and the detection of isopeptidase activity encoded by one of these cDNAs cloned in a baculovirus expression vector. The deduced mature protein products of these cDNAs contain 115 and 116 amino acid residues, including 14 highly conserved Cys residues, and are formed from precursors containing specific leader peptides. No homologous sequences were found in public databases.
Zhu, Xun; Xie, Shangbo; Armengaud, Jean; Xie, Wen; Guo, Zhaojiang; Kang, Shi; Wu, Qingjun; Wang, Shaoli; Xia, Jixing; He, Rongjun; Zhang, Youjun
2016-01-01
The diamondback moth, Plutella xylostella (L.), is the major cosmopolitan pest of brassica and other cruciferous crops. Its larval midgut is a dynamic tissue that interfaces with a wide variety of toxicological and physiological processes. The draft sequence of the P. xylostella genome was recently released, but its annotation remains challenging because of the low sequence coverage of this branch of life and the poor description of exon/intron splicing rules for these insects. Peptide sequencing by computational assignment of tandem mass spectra to genome sequence information provides an experimental independent approach for confirming or refuting protein predictions, a concept that has been termed proteogenomics. In this study, we carried out an in-depth proteogenomic analysis to complement genome annotation of P. xylostella larval midgut based on shotgun HPLC-ESI-MS/MS data by means of a multialgorithm pipeline. A total of 876,341 tandem mass spectra were searched against the predicted P. xylostella protein sequences and a whole-genome six-frame translation database. Based on a data set comprising 2694 novel genome search specific peptides, we discovered 439 novel protein-coding genes and corrected 128 existing gene models. To get the most accurate data to seed further insect genome annotation, more than half of the novel protein-coding genes, i.e. 235 over 439, were further validated after RT-PCR amplification and sequencing of the corresponding transcripts. Furthermore, we validated 53 novel alternative splicings. Finally, a total of 6764 proteins were identified, resulting in one of the most comprehensive proteogenomic study of a nonmodel animal. As the first tissue-specific proteogenomics analysis of P. xylostella, this study provides the fundamental basis for high-throughput proteomics and functional genomics approaches aimed at deciphering the molecular mechanisms of resistance and controlling this pest. PMID:26902207
Oteng-Pabi, Samuel K; Clouthier, Christopher M; Keillor, Jeffrey W
2018-01-01
Transglutaminases (TGases) are enzymes that catalyse protein cross-linking through a transamidation reaction between the side chain of a glutamine residue on one protein and the side chain of a lysine residue on another. Generally, TGases show low substrate specificity with respect to their amine substrate, such that a wide variety of primary amines can participate in the modification of specific glutamine residue. Although a number of different TGases have been used to mediate these bioconjugation reactions, the TGase from Bacillus subtilis (bTG) may be particularly suited to this application. It is smaller than most TGases, can be expressed in a soluble active form, and lacks the calcium dependence of its mammalian counterparts. However, little is known regarding this enzyme and its glutamine substrate specificity, limiting the scope of its application. In this work, we designed a FRET-based ligation assay to monitor the bTG-mediated conjugation of the fluorescent proteins Clover and mRuby2. This assay allowed us to screen a library of random heptapeptide glutamine sequences for their reactivity with recombinant bTG in bacterial cells, using fluorescence assisted cell sorting. From this library, several reactive sequences were identified and kinetically characterized, with the most reactive sequence (YAHQAHY) having a kcat/KM value of 19 ± 3 μM-1 min-1. This sequence was then genetically appended onto a test protein as a reactive 'Q-tag' and fluorescently labelled with dansyl-cadaverine, in the first demonstration of protein labelling mediated by bTG.
Crooks, Richard O; Baxter, Daniel; Panek, Anna S; Lubben, Anneke T; Mason, Jody M
2016-01-29
Interactions between naturally occurring proteins are highly specific, with protein-network imbalances associated with numerous diseases. For designed protein-protein interactions (PPIs), required specificity can be notoriously difficult to engineer. To accelerate this process, we have derived peptides that form heterospecific PPIs when combined. This is achieved using software that generates large virtual libraries of peptide sequences and searches within the resulting interactome for preferentially interacting peptides. To demonstrate feasibility, we have (i) generated 1536 peptide sequences based on the parallel dimeric coiled-coil motif and varied residues known to be important for stability and specificity, (ii) screened the 1,180,416 member interactome for predicted Tm values and (iii) used predicted Tm cutoff points to isolate eight peptides that form four heterospecific PPIs when combined. This required that all 32 hypothetical off-target interactions within the eight-peptide interactome be disfavoured and that the four desired interactions pair correctly. Lastly, we have verified the approach by characterising all 36 pairs within the interactome. In analysing the output, we hypothesised that several sequences are capable of adopting antiparallel orientations. We subsequently improved the software by removing sequences where doing so led to fully complementary electrostatic pairings. Our approach can be used to derive increasingly large and therefore complex sets of heterospecific PPIs with a wide range of potential downstream applications from disease modulation to the design of biomaterials and peptides in synthetic biology. Copyright © 2015 The Authors. Published by Elsevier Ltd.. All rights reserved.
Interactions between the R2R3-MYB Transcription Factor, AtMYB61, and Target DNA Binding Sites
Prouse, Michael B.; Campbell, Malcolm M.
2013-01-01
Despite the prominent roles played by R2R3-MYB transcription factors in the regulation of plant gene expression, little is known about the details of how these proteins interact with their DNA targets. For example, while Arabidopsis thaliana R2R3-MYB protein AtMYB61 is known to alter transcript abundance of a specific set of target genes, little is known about the specific DNA sequences to which AtMYB61 binds. To address this gap in knowledge, DNA sequences bound by AtMYB61 were identified using cyclic amplification and selection of targets (CASTing). The DNA targets identified using this approach corresponded to AC elements, sequences enriched in adenosine and cytosine nucleotides. The preferred target sequence that bound with the greatest affinity to AtMYB61 recombinant protein was ACCTAC, the AC-I element. Mutational analyses based on the AC-I element showed that ACC nucleotides in the AC-I element served as the core recognition motif, critical for AtMYB61 binding. Molecular modelling predicted interactions between AtMYB61 amino acid residues and corresponding nucleotides in the DNA targets. The affinity between AtMYB61 and specific target DNA sequences did not correlate with AtMYB61-driven transcriptional activation with each of the target sequences. CASTing-selected motifs were found in the regulatory regions of genes previously shown to be regulated by AtMYB61. Taken together, these findings are consistent with the hypothesis that AtMYB61 regulates transcription from specific cis-acting AC elements in vivo. The results shed light on the specifics of DNA binding by an important family of plant-specific transcriptional regulators. PMID:23741471
Determination of the Gene Sequence of Poliovirus with Pactamycin
Summers, D. F.; Maizel, J. V.
1971-01-01
By examination of the virus-specific polypeptides formed after the addition of pactamycin, an inhibitor of protein chain initiation, to infected cells, it has been possible to tentatively locate the virus coat proteins at the amino terminus of the large, virus-specific protein precursor, and, therefore, to assign the coat protein cistron to the 5′ end of the RNA. PMID:4330946
DLocalMotif: a discriminative approach for discovering local motifs in protein sequences.
Mehdi, Ahmed M; Sehgal, Muhammad Shoaib B; Kobe, Bostjan; Bailey, Timothy L; Bodén, Mikael
2013-01-01
Local motifs are patterns of DNA or protein sequences that occur within a sequence interval relative to a biologically defined anchor or landmark. Current protein motif discovery methods do not adequately consider such constraints to identify biologically significant motifs that are only weakly over-represented but spatially confined. Using negatives, i.e. sequences known to not contain a local motif, can further increase the specificity of their discovery. This article introduces the method DLocalMotif that makes use of positional information and negative data for local motif discovery in protein sequences. DLocalMotif combines three scoring functions, measuring degrees of motif over-representation, entropy and spatial confinement, specifically designed to discriminatively exploit the availability of negative data. The method is shown to outperform current methods that use only a subset of these motif characteristics. We apply the method to several biological datasets. The analysis of peroxisomal targeting signals uncovers several novel motifs that occur immediately upstream of the dominant peroxisomal targeting signal-1 signal. The analysis of proline-tyrosine nuclear localization signals uncovers multiple novel motifs that overlap with C2H2 zinc finger domains. We also evaluate the method on classical nuclear localization signals and endoplasmic reticulum retention signals and find that DLocalMotif successfully recovers biologically relevant sequence properties. http://bioinf.scmb.uq.edu.au/dlocalmotif/
Lahr, Roni M; Mack, Seshat M; Héroux, Annie; Blagden, Sarah P; Bousquet-Antonelli, Cécile; Deragon, Jean-Marc; Berman, Andrea J
2015-09-18
La-related protein 1 (LARP1) regulates the stability of many mRNAs. These include 5'TOPs, mTOR-kinase responsive mRNAs with pyrimidine-rich 5' UTRs, which encode ribosomal proteins and translation factors. We determined that the highly conserved LARP1-specific C-terminal DM15 region of human LARP1 directly binds a 5'TOP sequence. The crystal structure of this DM15 region refined to 1.86 Å resolution has three structurally related and evolutionarily conserved helix-turn-helix modules within each monomer. These motifs resemble HEAT repeats, ubiquitous helical protein-binding structures, but their sequences are inconsistent with consensus sequences of known HEAT modules, suggesting this structure has been repurposed for RNA interactions. A putative mTORC1-recognition sequence sits within a flexible loop C-terminal to these repeats. We also present modelling of pyrimidine-rich single-stranded RNA onto the highly conserved surface of the DM15 region. These studies lay the foundation necessary for proceeding toward a structural mechanism by which LARP1 links mTOR signalling to ribosome biogenesis. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Lahr, Roni M.; Mack, Seshat M.; Heroux, Annie; ...
2015-07-22
La-related protein 1 (LARP1) regulates the stability of many mRNAs. These include 5'TOPs, mTOR-kinase responsive mRNAs with pyrimidine-rich 5' UTRs, which encode ribosomal proteins and translation factors. We determined that the highly conserved LARP1-specific C-terminal DM15 region of human LARP1 directly binds a 5'TOP sequence. The crystal structure of this DM15 region refined to 1.86 Å resolution has three structurally related and evolutionarily conserved helix-turn-helix modules within each monomer. These motifs resemble HEAT repeats, ubiquitous helical protein-binding structures, but their sequences are inconsistent with consensus sequences of known HEAT modules, suggesting this structure has been repurposed for RNA interactions. Amore » putative mTORC1-recognition sequence sits within a flexible loop C-terminal to these repeats. We also present modelling of pyrimidine-rich single-stranded RNA onto the highly conserved surface of the DM15 region. Ultimately, these studies lay the foundation necessary for proceeding toward a structural mechanism by which LARP1 links mTOR signalling to ribosome biogenesis.« less
Foggetti, Giorgia; Raimondi, Ivan; Campomenosi, Paola; Menichini, Paola
2014-01-01
TP63 is a member of the TP53 gene family that encodes for up to ten different TA and ΔN isoforms through alternative promoter usage and alternative splicing. Besides being a master regulator of gene expression for squamous epithelial proliferation, differentiation and maintenance, P63, through differential expression of its isoforms, plays important roles in tumorigenesis. All P63 isoforms share an immunoglobulin-like folded DNA binding domain responsible for binding to sequence-specific response elements (REs), whose overall consensus sequence is similar to that of the canonical p53 RE. Using a defined assay in yeast, where P63 isoforms and RE sequences are the only variables, and gene expression assays in human cell lines, we demonstrated that human TA- and ΔN-P63α proteins exhibited differences in transactivation specificity not observed with the corresponding P73 or P53 protein isoforms. These differences 1) were dependent on specific features of the RE sequence, 2) could be related to intrinsic differences in their oligomeric state and cooperative DNA binding, and 3) appeared to be conserved in evolution. Since genotoxic stress can change relative ratio of TA- and ΔN-P63α protein levels, the different transactivation specificity of each P63 isoform could potentially influence cellular responses to specific stresses. PMID:24926492
Sunflower centromeres consist of a centromere-specific LINE and a chromosome-specific tandem repeat.
Nagaki, Kiyotaka; Tanaka, Keisuke; Yamaji, Naoki; Kobayashi, Hisato; Murata, Minoru
2015-01-01
The kinetochore is a protein complex including kinetochore-specific proteins that plays a role in chromatid segregation during mitosis and meiosis. The complex associates with centromeric DNA sequences that are usually species-specific. In plant species, tandem repeats including satellite DNA sequences and retrotransposons have been reported as centromeric DNA sequences. In this study on sunflowers, a cDNA-encoding centromere-specific histone H3 (CENH3) was isolated from a cDNA pool from a seedling, and an antibody was raised against a peptide synthesized from the deduced cDNA. The antibody specifically recognized the sunflower CENH3 (HaCENH3) and showed centromeric signals by immunostaining and immunohistochemical staining analysis. The antibody was also applied in chromatin immunoprecipitation (ChIP)-Seq to isolate centromeric DNA sequences and two different types of repetitive DNA sequences were identified. One was a long interspersed nuclear element (LINE)-like sequence, which showed centromere-specific signals on almost all chromosomes in sunflowers. This is the first report of a centromeric LINE sequence, suggesting possible centromere targeting ability. Another type of identified repetitive DNA was a tandem repeat sequence with a 187-bp unit that was found only on a pair of chromosomes. The HaCENH3 content of the tandem repeats was estimated to be much higher than that of the LINE, which implies centromere evolution from LINE-based centromeres to more stable tandem-repeat-based centromeres. In addition, the epigenetic status of the sunflower centromeres was investigated by immunohistochemical staining and ChIP, and it was found that centromeres were heterochromatic.
Structure of human POFUT2: insights into thrombospondin type 1 repeat fold and O-fucosylation
Chen, Chun-I; Keusch, Jeremy J; Klein, Dominique; Hess, Daniel; Hofsteenge, Jan; Gut, Heinz
2012-01-01
Protein O-fucosylation is a post-translational modification found on serine/threonine residues of thrombospondin type 1 repeats (TSR). The fucose transfer is catalysed by the enzyme protein O-fucosyltransferase 2 (POFUT2) and >40 human proteins contain the TSR consensus sequence for POFUT2-dependent fucosylation. To better understand O-fucosylation on TSR, we carried out a structural and functional analysis of human POFUT2 and its TSR substrate. Crystal structures of POFUT2 reveal a variation of the classical GT-B fold and identify sugar donor and TSR acceptor binding sites. Structural findings are correlated with steady-state kinetic measurements of wild-type and mutant POFUT2 and TSR and give insight into the catalytic mechanism and substrate specificity. By using an artificial mini-TSR substrate, we show that specificity is not primarily encoded in the TSR protein sequence but rather in the unusual 3D structure of a small part of the TSR. Our findings uncover that recognition of distinct conserved 3D fold motifs can be used as a mechanism to achieve substrate specificity by enzymes modifying completely folded proteins of very wide sequence diversity and biological function. PMID:22588082
Structural hot spots for the solubility of globular proteins
Ganesan, Ashok; Siekierska, Aleksandra; Beerten, Jacinte; Brams, Marijke; Van Durme, Joost; De Baets, Greet; Van der Kant, Rob; Gallardo, Rodrigo; Ramakers, Meine; Langenberg, Tobias; Wilkinson, Hannah; De Smet, Frederik; Ulens, Chris; Rousseau, Frederic; Schymkowitz, Joost
2016-01-01
Natural selection shapes protein solubility to physiological requirements and recombinant applications that require higher protein concentrations are often problematic. This raises the question whether the solubility of natural protein sequences can be improved. We here show an anti-correlation between the number of aggregation prone regions (APRs) in a protein sequence and its solubility, suggesting that mutational suppression of APRs provides a simple strategy to increase protein solubility. We show that mutations at specific positions within a protein structure can act as APR suppressors without affecting protein stability. These hot spots for protein solubility are both structure and sequence dependent but can be computationally predicted. We demonstrate this by reducing the aggregation of human α-galactosidase and protective antigen of Bacillus anthracis through mutation. Our results indicate that many proteins possess hot spots allowing to adapt protein solubility independently of structure and function. PMID:26905391
Merlaen, Britt; De Keyser, Ellen; Van Labeke, Marie-Christine
2018-01-01
The newly identified aquaporin coding sequences presented here pave the way for further insights into the plant-water relations in the commercial strawberry ( Fragaria x ananassa ). Aquaporins are water channel proteins that allow water to cross (intra)cellular membranes. In Fragaria x ananassa , few of them have been identified hitherto, hampering the exploration of the water transport regulation at cellular level. Here, we present new aquaporin coding sequences belonging to different subclasses: plasma membrane intrinsic proteins subtype 1 and subtype 2 (PIP1 and PIP2) and tonoplast intrinsic proteins (TIP). The classification is based on phylogenetic analysis and is confirmed by the presence of conserved residues. Substrate-specific signature sequences (SSSSs) and specificity-determining positions (SDPs) predict the substrate specificity of each new aquaporin. Expression profiling in leaves, petioles and developing fruits reveals distinct patterns, even within the same (sub)class. Expression profiles range from leaf-specific expression over constitutive expression to fruit-specific expression. Both upregulation and downregulation during fruit ripening occur. Substrate specificity and expression profiles suggest that functional specialization exists among aquaporins belonging to a different but also to the same (sub)class.
APOBEC3G Interacts with ssDNA by Two Modes: AFM Studies
Shlyakhtenko, Luda S.; Dutta, Samrat; Banga, Jaspreet; Li, Ming; Harris, Reuben S.; Lyubchenko, Yuri L.
2015-01-01
APOBEC3G (A3G) protein has antiviral activity against HIV and other pathogenic retroviruses. A3G has two domains: a catalytic C-terminal domain (CTD) that deaminates cytidine, and a N-terminal domain (NTD) that binds to ssDNA. Although abundant information exists about the biological activities of A3G protein, the interplay between sequence specific deaminase activity and A3G binding to ssDNA remains controversial. We used the topographic imaging and force spectroscopy modalities of Atomic Force Spectroscopy (AFM) to characterize the interaction of A3G protein with deaminase specific and nonspecific ssDNA substrates. AFM imaging demonstrated that A3G has elevated affinity for deaminase specific ssDNA than for nonspecific ssDNA. AFM force spectroscopy revealed two distinct binding modes by which A3G interacts with ssDNA. One mode requires sequence specificity, as demonstrated by stronger and more stable complexes with deaminase specific ssDNA than with nonspecific ssDNA. Overall these observations enforce prior studies suggesting that both domains of A3G contribute to the sequence specific binding of ssDNA. PMID:26503602
A TALE-inspired computational screen for proteins that contain approximate tandem repeats.
Perycz, Malgorzata; Krwawicz, Joanna; Bochtler, Matthias
2017-01-01
TAL (transcription activator-like) effectors (TALEs) are bacterial proteins that are secreted from bacteria to plant cells to act as transcriptional activators. TALEs and related proteins (RipTALs, BurrH, MOrTL1 and MOrTL2) contain approximate tandem repeats that differ in conserved positions that define specificity. Using PERL, we screened ~47 million protein sequences for TALE-like architecture characterized by approximate tandem repeats (between 30 and 43 amino acids in length) and sequence variability in conserved positions, without requiring sequence similarity to TALEs. Candidate proteins were scored according to their propensity for nuclear localization, secondary structure, repeat sequence complexity, as well as covariation and predicted structural proximity of variable residues. Biological context was tentatively inferred from co-occurrence of other domains and interactome predictions. Approximate repeats with TALE-like features that merit experimental characterization were found in a protein of chestnut blight fungus, a eukaryotic plant pathogen.
A TALE-inspired computational screen for proteins that contain approximate tandem repeats
Krwawicz, Joanna
2017-01-01
TAL (transcription activator-like) effectors (TALEs) are bacterial proteins that are secreted from bacteria to plant cells to act as transcriptional activators. TALEs and related proteins (RipTALs, BurrH, MOrTL1 and MOrTL2) contain approximate tandem repeats that differ in conserved positions that define specificity. Using PERL, we screened ~47 million protein sequences for TALE-like architecture characterized by approximate tandem repeats (between 30 and 43 amino acids in length) and sequence variability in conserved positions, without requiring sequence similarity to TALEs. Candidate proteins were scored according to their propensity for nuclear localization, secondary structure, repeat sequence complexity, as well as covariation and predicted structural proximity of variable residues. Biological context was tentatively inferred from co-occurrence of other domains and interactome predictions. Approximate repeats with TALE-like features that merit experimental characterization were found in a protein of chestnut blight fungus, a eukaryotic plant pathogen. PMID:28617832
Lin, Jiangguo; Countryman, Preston; Buncher, Noah; Kaur, Parminder; E, Longjiang; Zhang, Yiyun; Gibson, Greg; You, Changjiang; Watkins, Simon C; Piehler, Jacob; Opresko, Patricia L; Kad, Neil M; Wang, Hong
2014-02-01
Human telomeres are maintained by the shelterin protein complex in which TRF1 and TRF2 bind directly to duplex telomeric DNA. How these proteins find telomeric sequences among a genome of billions of base pairs and how they find protein partners to form the shelterin complex remains uncertain. Using single-molecule fluorescence imaging of quantum dot-labeled TRF1 and TRF2, we study how these proteins locate TTAGGG repeats on DNA tightropes. By virtue of its basic domain TRF2 performs an extensive 1D search on nontelomeric DNA, whereas TRF1's 1D search is limited. Unlike the stable and static associations observed for other proteins at specific binding sites, TRF proteins possess reduced binding stability marked by transient binding (∼ 9-17 s) and slow 1D diffusion on specific telomeric regions. These slow diffusion constants yield activation energy barriers to sliding ∼ 2.8-3.6 κ(B)T greater than those for nontelomeric DNA. We propose that the TRF proteins use 1D sliding to find protein partners and assemble the shelterin complex, which in turn stabilizes the interaction with specific telomeric DNA. This 'tag-team proofreading' represents a more general mechanism to ensure a specific set of proteins interact with each other on long repetitive specific DNA sequences without requiring external energy sources.
Mapping specificity landscapes of RNA-protein interactions by high throughput sequencing.
Jankowsky, Eckhard; Harris, Michael E
2017-04-15
To function in a biological setting, RNA binding proteins (RBPs) have to discriminate between alternative binding sites in RNAs. This discrimination can occur in the ground state of an RNA-protein binding reaction, in its transition state, or in both. The extent by which RBPs discriminate at these reaction states defines RBP specificity landscapes. Here, we describe the HiTS-Kin and HiTS-EQ techniques, which combine kinetic and equilibrium binding experiments with high throughput sequencing to quantitatively assess substrate discrimination for large numbers of substrate variants at ground and transition states of RNA-protein binding reactions. We discuss experimental design, practical considerations and data analysis and outline how a combination of HiTS-Kin and HiTS-EQ allows the mapping of RBP specificity landscapes. Copyright © 2017 Elsevier Inc. All rights reserved.
Dodds, Peter N.; Lawrence, Gregory J.; Catanzariti, Ann-Maree; Teh, Trazel; Wang, Ching-I. A.; Ayliffe, Michael A.; Kobe, Bostjan; Ellis, Jeffrey G.
2006-01-01
Plant resistance proteins (R proteins) recognize corresponding pathogen avirulence (Avr) proteins either indirectly through detection of changes in their host protein targets or through direct R–Avr protein interaction. Although indirect recognition imposes selection against Avr effector function, pathogen effector molecules recognized through direct interaction may overcome resistance through sequence diversification rather than loss of function. Here we show that the flax rust fungus AvrL567 genes, whose products are recognized by the L5, L6, and L7 R proteins of flax, are highly diverse, with 12 sequence variants identified from six rust strains. Seven AvrL567 variants derived from Avr alleles induce necrotic responses when expressed in flax plants containing corresponding resistance genes (R genes), whereas five variants from avr alleles do not. Differences in recognition specificity between AvrL567 variants and evidence for diversifying selection acting on these genes suggest they have been involved in a gene-specific arms race with the corresponding flax R genes. Yeast two-hybrid assays indicate that recognition is based on direct R–Avr protein interaction and recapitulate the interaction specificity observed in planta. Biochemical analysis of Escherichia coli-produced AvrL567 proteins shows that variants that escape recognition nevertheless maintain a conserved structure and stability, suggesting that the amino acid sequence differences directly affect the R–Avr protein interaction. We suggest that direct recognition associated with high genetic diversity at corresponding R and Avr gene loci represents an alternative outcome of plant–pathogen coevolution to indirect recognition associated with simple balanced polymorphisms for functional and nonfunctional R and Avr genes. PMID:16731621
Warfield, Linda; Tuttle, Lisa M; Pacheco, Derek; Klevit, Rachel E; Hahn, Steven
2014-08-26
Although many transcription activators contact the same set of coactivator complexes, the mechanism and specificity of these interactions have been unclear. For example, do intrinsically disordered transcription activation domains (ADs) use sequence-specific motifs, or do ADs of seemingly different sequence have common properties that encode activation function? We find that the central activation domain (cAD) of the yeast activator Gcn4 functions through a short, conserved sequence-specific motif. Optimizing the residues surrounding this short motif by inserting additional hydrophobic residues creates very powerful ADs that bind the Mediator subunit Gal11/Med15 with high affinity via a "fuzzy" protein interface. In contrast to Gcn4, the activity of these synthetic ADs is not strongly dependent on any one residue of the AD, and this redundancy is similar to that of some natural ADs in which few if any sequence-specific residues have been identified. The additional hydrophobic residues in the synthetic ADs likely allow multiple faces of the AD helix to interact with the Gal11 activator-binding domain, effectively forming a fuzzier interface than that of the wild-type cAD.
Bosselut, R; Levin, J; Adjadj, E; Ghysdael, J
1993-11-11
Ets proteins form a family of sequence specific DNA binding proteins which bind DNA through a 85 aminoacids conserved domain, the Ets domain, whose sequence is unrelated to any other characterized DNA binding domain. Unlike all other known Ets proteins, which bind specific DNA sequences centered over either GGAA or GGAT core motifs, E74 and Elf1 selectively bind to GGAA corecontaining sites. Elf1 and E74 differ from other Ets proteins in three residues located in an otherwise highly conserved region of the Ets domain, referred to as conserved region III (CRIII). We show that a restricted selectivity for GGAA core-containing sites could be conferred to Ets1 upon changing a single lysine residue within CRIII to the threonine found in Elf1 and E74 at this position. Conversely, the reciprocal mutation in Elf1 confers to this protein the ability to bind to GGAT core containing EBS. This, together with the fact that mutation of two invariant arginine residues in CRIII abolishes DNA binding, indicates that CRIII plays a key role in Ets domain recognition of the GGAA/T core motif and lead us to discuss a model of Ets proteins--core motif interaction.
Sarkar, Debasree; Patra, Piya; Ghosh, Abhirupa; Saha, Sudipto
2016-01-01
A considerable proportion of protein-protein interactions (PPIs) in the cell are estimated to be mediated by very short peptide segments that approximately conform to specific sequence patterns known as linear motifs (LMs), often present in the disordered regions in the eukaryotic proteins. These peptides have been found to interact with low affinity and are able bind to multiple interactors, thus playing an important role in the PPI networks involving date hubs. In this work, PPI data and de novo motif identification based method (MEME) were used to identify such peptides in three cancer-associated hub proteins-MYC, APC and MDM2. The peptides corresponding to the significant LMs identified for each hub protein were aligned, the overlapping regions across these peptides being termed as overlapping linear peptides (OLPs). These OLPs were thus predicted to be responsible for multiple PPIs of the corresponding hub proteins and a scoring system was developed to rank them. We predicted six OLPs in MYC and five OLPs in MDM2 that scored higher than OLP predictions from randomly generated protein sets. Two OLP sequences from the C-terminal of MYC were predicted to bind with FBXW7, component of an E3 ubiquitin-protein ligase complex involved in proteasomal degradation of MYC. Similarly, we identified peptides in the C-terminal of MDM2 interacting with FKBP3, which has a specific role in auto-ubiquitinylation of MDM2. The peptide sequences predicted in MYC and MDM2 look promising for designing orthosteric inhibitors against possible disease-associated PPIs. Since these OLPs can interact with other proteins as well, these inhibitors should be specific to the targeted interactor to prevent undesired side-effects. This computational framework has been designed to predict and rank the peptide regions that may mediate multiple PPIs and can be applied to other disease-associated date hub proteins for prediction of novel therapeutic targets of small molecule PPI modulators.
Xu, Hanfu; Deng, Dangjun; Yuan, Lin; Wang, Yuancheng; Wang, Feng; Xia, Qingyou
2014-08-01
30K proteins are a group of structurally related proteins that play important roles in the life cycle of the silkworm Bombyx mori and are largely synthesized and regulated in a time-dependent manner in the fat body. Little is known about the upstream regulatory elements associated with the genes encoding these proteins. In the present study, the promoter of Bmlp3, a fat body-specific gene encoding a 30K protein family member, was characterized by joining sequences containing the Bmlp3 promoter with various amounts of 5' upstream sequences to a luciferase reporter gene. The results indicated that the sequences from -150 to -250bp and -597 to -675bp upstream of the Bmlp3 transcription start site were necessary for high levels of luciferase activity. Further analysis showed that a 21-bp sequence located between -230 and -250 was specifically recognized by nuclear factors from silkworm fat bodies and BmE cells, and could enhance luciferase reporter-gene expression 2.8-fold in BmE cells. This study provides new insights into the Bmlp3 promoter and contributes to the further clarification of the function and developmental regulation of Bmlp3. Copyright © 2014. Published by Elsevier B.V.
NKX3.1 Genotype and IGF-1 Interact in Prostate Cancer Risk
2009-05-01
Steadman DJ, Giuffrida D, Gelmann EP. DNA-binding sequence of the human prostate-specific homeodomain protein NKX3.1. Nucleic Acids Res 2000;28...Gelmann EP. DNA-binding sequence of the human prostate-specific homeodomain protein NKX3.1. Nucleic Acids Res 2000;28:2389–95. 20. Wu X, Senechal K...3212836 /UG=Hs.21765 fatty acid desaturase 3 204733_at 5.74 gb:NM_002774.1 /DEF=Homo sapiens kallikrein 6 (neurosin, zyme) (KLK6), mRNA. /FEA=mRNA /GEN
Adelman, K; Salmon, B; Baines, J D
2001-03-13
The product of the herpes simplex virus type 1 U(L)28 gene is essential for cleavage of concatemeric viral DNA into genome-length units and packaging of this DNA into viral procapsids. To address the role of U(L)28 in this process, purified U(L)28 protein was assayed for the ability to recognize conserved herpesvirus DNA packaging sequences. We report that DNA fragments containing the pac1 DNA packaging motif can be induced by heat treatment to adopt novel DNA conformations that migrate faster than the corresponding duplex in nondenaturing gels. Surprisingly, these novel DNA structures are high-affinity substrates for U(L)28 protein binding, whereas double-stranded DNA of identical sequence composition is not recognized by U(L)28 protein. We demonstrate that only one strand of the pac1 motif is responsible for the formation of novel DNA structures that are bound tightly and specifically by U(L)28 protein. To determine the relevance of the observed U(L)28 protein-pac1 interaction to the cleavage and packaging process, we have analyzed the binding affinity of U(L)28 protein for pac1 mutants previously shown to be deficient in cleavage and packaging in vivo. Each of the pac1 mutants exhibited a decrease in DNA binding by U(L)28 protein that correlated directly with the reported reduction in cleavage and packaging efficiency, thereby supporting a role for the U(L)28 protein-pac1 interaction in vivo. These data therefore suggest that the formation of novel DNA structures by the pac1 motif confers added specificity on recognition of DNA packaging sequences by the U(L)28-encoded component of the herpesvirus cleavage and packaging machinery.
Mining sequential patterns for protein fold recognition.
Exarchos, Themis P; Papaloukas, Costas; Lampros, Christos; Fotiadis, Dimitrios I
2008-02-01
Protein data contain discriminative patterns that can be used in many beneficial applications if they are defined correctly. In this work sequential pattern mining (SPM) is utilized for sequence-based fold recognition. Protein classification in terms of fold recognition plays an important role in computational protein analysis, since it can contribute to the determination of the function of a protein whose structure is unknown. Specifically, one of the most efficient SPM algorithms, cSPADE, is employed for the analysis of protein sequence. A classifier uses the extracted sequential patterns to classify proteins in the appropriate fold category. For training and evaluating the proposed method we used the protein sequences from the Protein Data Bank and the annotation of the SCOP database. The method exhibited an overall accuracy of 25% in a classification problem with 36 candidate categories. The classification performance reaches up to 56% when the five most probable protein folds are considered.
The proteome: structure, function and evolution
Fleming, Keiran; Kelley, Lawrence A; Islam, Suhail A; MacCallum, Robert M; Muller, Arne; Pazos, Florencio; Sternberg, Michael J.E
2006-01-01
This paper reports two studies to model the inter-relationships between protein sequence, structure and function. First, an automated pipeline to provide a structural annotation of proteomes in the major genomes is described. The results are stored in a database at Imperial College, London (3D-GENOMICS) that can be accessed at www.sbg.bio.ic.ac.uk. Analysis of the assignments to structural superfamilies provides evolutionary insights. 3D-GENOMICS is being integrated with related proteome annotation data at University College London and the European Bioinformatics Institute in a project known as e-protein (http://www.e-protein.org/). The second topic is motivated by the developments in structural genomics projects in which the structure of a protein is determined prior to knowledge of its function. We have developed a new approach PHUNCTIONER that uses the gene ontology (GO) classification to supervise the extraction of the sequence signal responsible for protein function from a structure-based sequence alignment. Using GO we can obtain profiles for a range of specificities described in the ontology. In the region of low sequence similarity (around 15%), our method is more accurate than assignment from the closest structural homologue. The method is also able to identify the specific residues associated with the function of the protein family. PMID:16524832
The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families.
Yooseph, Shibu; Sutton, Granger; Rusch, Douglas B; Halpern, Aaron L; Williamson, Shannon J; Remington, Karin; Eisen, Jonathan A; Heidelberg, Karla B; Manning, Gerard; Li, Weizhong; Jaroszewski, Lukasz; Cieplak, Piotr; Miller, Christopher S; Li, Huiying; Mashiyama, Susan T; Joachimiak, Marcin P; van Belle, Christopher; Chandonia, John-Marc; Soergel, David A; Zhai, Yufeng; Natarajan, Kannan; Lee, Shaun; Raphael, Benjamin J; Bafna, Vineet; Friedman, Robert; Brenner, Steven E; Godzik, Adam; Eisenberg, David; Dixon, Jack E; Taylor, Susan S; Strausberg, Robert L; Frazier, Marvin; Venter, J Craig
2007-03-01
Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting a poor sampling of viral diversity until now. Protein domain distributions in the GOS dataset and current protein databases show distinct biases. Several protein domains that were previously categorized as kingdom specific are shown to have GOS examples in other kingdoms. About 6,000 sequences (ORFans) from the literature that heretofore lacked similarity to known proteins have matches in the GOS data. The GOS dataset is also used to improve remote homology detection. Overall, besides nearly doubling the number of current proteins, the predicted GOS proteins also add a great deal of diversity to known protein families and shed light on their evolution. These observations are illustrated using several protein families, including phosphatases, proteases, ultraviolet-irradiation DNA damage repair enzymes, glutamine synthetase, and RuBisCO. The diversity added by GOS data has implications for choosing targets for experimental structure characterization as part of structural genomics efforts. Our analysis indicates that new families are being discovered at a rate that is linear or almost linear with the addition of new sequences, implying that we are still far from discovering all protein families in nature.
Rodriguez-Rivas, Juan; Marsili, Simone; Juan, David; Valencia, Alfonso
2016-01-01
Protein–protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue coevolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that coevolution points to structurally conserved contacts at protein–protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a domain-centered protocol to study the interplay between residue coevolution and structural conservation of protein–protein interfaces. We show that sequence-based coevolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein–protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high divergence in sequence (the twilight zone), where standard homology modeling of protein complexes is unreliable, our approach provides sequence-based accurate information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic coevolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this approach. PMID:27965389
Mizumoto, Shuji; Yamada, Shuhei; Sugahara, Kazuyuki
2015-10-01
Recent functional studies on chondroitin sulfate-dermatan sulfate (CS-DS) demonstrated its indispensable roles in various biological events including brain development and cancer. CS-DS proteoglycans exert their physiological activity through interactions with specific proteins including growth factors, cell surface receptors, and matrix proteins. The characterization of these interactions is essential for regulating the biological functions of CS-DS proteoglycans. Although amino acid sequences on the bioactive proteins required for these interactions have already been elucidated, the specific saccharide sequences involved in the binding of CS-DS to target proteins have not yet been sufficiently identified. In this review, recent findings are described on the interaction between CS-DS and some proteins which are especially involved in the central nervous system and cancer development/metastasis. Copyright © 2015. Published by Elsevier Ltd.
A new subfamily LIP of the major intrinsic proteins.
Khabudaev, Kirill Vladimirovich; Petrova, Darya Petrovna; Grachev, Mikhail Aleksandrovich; Likhoshway, Yelena Valentinovna
2014-03-04
Proteins of the major intrinsic protein (MIP) family, or aquaporins, have been detected in almost all organisms. These proteins are important in cells and organisms because they allow for passive transmembrane transport of water and other small, uncharged polar molecules. We compared the predicted amino acid sequences of 20 MIPs from several algae species of the phylum Heterokontophyta (Kingdom Chromista) with the sequences of MIPs from other organisms. Multiple sequence alignments revealed motifs that were homologous to functionally important NPA motifs and the so-called ar/R-selective filter of glyceroporins and aquaporins. The MIP sequences of the studied chromists fell into several clusters that belonged to different groups of MIPs from a wide variety of organisms from different Kingdoms. Two of these proteins belong to Plasma membrane intrinsic proteins (PIPs), four of them belong to GlpF-like intrinsic proteins (GIPs), and one of them belongs to a specific MIPE subfamily from green algae. Three proteins belong to the unclassified MIPs, two of which are of bacterial origin. Eight of the studied MIPs contain an NPM-motif in place of the second conserved NPA-motif typical of the majority of MIPs. The MIPs of heterokonts within all detected clusters can differ from other MIPs in the same cluster regarding the structure of the ar/R-selective filter and other generally conserved motifs. We proposed placing nine MIPs from heterokonts into a new group, which we have named the LIPs (large intrinsic proteins). The possible substrate specificities of the studied MIPs are discussed.
An approach to large scale identification of non-obvious structural similarities between proteins
Cherkasov, Artem; Jones, Steven JM
2004-01-01
Background A new sequence independent bioinformatics approach allowing genome-wide search for proteins with similar three dimensional structures has been developed. By utilizing the numerical output of the sequence threading it establishes putative non-obvious structural similarities between proteins. When applied to the testing set of proteins with known three dimensional structures the developed approach was able to recognize structurally similar proteins with high accuracy. Results The method has been developed to identify pathogenic proteins with low sequence identity and high structural similarity to host analogues. Such protein structure relationships would be hypothesized to arise through convergent evolution or through ancient horizontal gene transfer events, now undetectable using current sequence alignment techniques. The pathogen proteins, which could mimic or interfere with host activities, would represent candidate virulence factors. The developed approach utilizes the numerical outputs from the sequence-structure threading. It identifies the potential structural similarity between a pair of proteins by correlating the threading scores of the corresponding two primary sequences against the library of the standard folds. This approach allowed up to 64% sensitivity and 99.9% specificity in distinguishing protein pairs with high structural similarity. Conclusion Preliminary results obtained by comparison of the genomes of Homo sapiens and several strains of Chlamydia trachomatis have demonstrated the potential usefulness of the method in the identification of bacterial proteins with known or potential roles in virulence. PMID:15147578
Bandyopadhyay, Deepak; Huan, Jun; Prins, Jan; Snoeyink, Jack; Wang, Wei; Tropsha, Alexander
2009-11-01
Protein function prediction is one of the central problems in computational biology. We present a novel automated protein structure-based function prediction method using libraries of local residue packing patterns that are common to most proteins in a known functional family. Critical to this approach is the representation of a protein structure as a graph where residue vertices (residue name used as a vertex label) are connected by geometrical proximity edges. The approach employs two steps. First, it uses a fast subgraph mining algorithm to find all occurrences of family-specific labeled subgraphs for all well characterized protein structural and functional families. Second, it queries a new structure for occurrences of a set of motifs characteristic of a known family, using a graph index to speed up Ullman's subgraph isomorphism algorithm. The confidence of function inference from structure depends on the number of family-specific motifs found in the query structure compared with their distribution in a large non-redundant database of proteins. This method can assign a new structure to a specific functional family in cases where sequence alignments, sequence patterns, structural superposition and active site templates fail to provide accurate annotation.
Tutorial on Protein Ontology Resources
Arighi, Cecilia; Drabkin, Harold; Christie, Karen R.; Ross, Karen; Natale, Darren
2017-01-01
The Protein Ontology (PRO) is the reference ontology for proteins in the Open Biomedical Ontologies (OBO) foundry and consists of three sub-ontologies representing protein classes of homologous genes, proteoforms (e.g., splice isoforms, sequence variants, and post-translationally modified forms), and protein complexes. PRO defines classes of proteins and protein complexes, both species-specific and species non-specific, and indicates their relationships in a hierarchical framework, supporting accurate protein annotation at the appropriate level of granularity, analyses of protein conservation across species, and semantic reasoning. In this first section of this chapter, we describe the PRO framework including categories of PRO terms and the relationship of PRO to other ontologies and protein resources. Next, we provide a tutorial about the PRO website (proconsortium.org) where users can browse and search the PRO hierarchy, view reports on individual PRO terms, and visualize relationships among PRO terms in a hierarchical table view, a multiple sequence alignment view, and a Cytoscape network view. Finally, we describe several examples illustrating the unique and rich information available in PRO. PMID:28150233
Song, Jiangning; Tan, Hao; Wang, Mingjun; Webb, Geoffrey I.; Akutsu, Tatsuya
2012-01-01
Protein backbone torsion angles (Phi) and (Psi) involve two rotation angles rotating around the Cα-N bond (Phi) and the Cα-C bond (Psi). Due to the planarity of the linked rigid peptide bonds, these two angles can essentially determine the backbone geometry of proteins. Accordingly, the accurate prediction of protein backbone torsion angle from sequence information can assist the prediction of protein structures. In this study, we develop a new approach called TANGLE (Torsion ANGLE predictor) to predict the protein backbone torsion angles from amino acid sequences. TANGLE uses a two-level support vector regression approach to perform real-value torsion angle prediction using a variety of features derived from amino acid sequences, including the evolutionary profiles in the form of position-specific scoring matrices, predicted secondary structure, solvent accessibility and natively disordered region as well as other global sequence features. When evaluated based on a large benchmark dataset of 1,526 non-homologous proteins, the mean absolute errors (MAEs) of the Phi and Psi angle prediction are 27.8° and 44.6°, respectively, which are 1% and 3% respectively lower than that using one of the state-of-the-art prediction tools ANGLOR. Moreover, the prediction of TANGLE is significantly better than a random predictor that was built on the amino acid-specific basis, with the p-value<1.46e-147 and 7.97e-150, respectively by the Wilcoxon signed rank test. As a complementary approach to the current torsion angle prediction algorithms, TANGLE should prove useful in predicting protein structural properties and assisting protein fold recognition by applying the predicted torsion angles as useful restraints. TANGLE is freely accessible at http://sunflower.kuicr.kyoto-u.ac.jp/~sjn/TANGLE/. PMID:22319565
Gene calling and bacterial genome annotation with BG7.
Tobes, Raquel; Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Kovach, Evdokim; Alekhin, Alexey; Pareja, Eduardo
2015-01-01
New massive sequencing technologies are providing many bacterial genome sequences from diverse taxa but a refined annotation of these genomes is crucial for obtaining scientific findings and new knowledge. Thus, bacterial genome annotation has emerged as a key point to investigate in bacteria. Any efficient tool designed specifically to annotate bacterial genomes sequenced with massively parallel technologies has to consider the specific features of bacterial genomes (absence of introns and scarcity of nonprotein-coding sequence) and of next-generation sequencing (NGS) technologies (presence of errors and not perfectly assembled genomes). These features make it convenient to focus on coding regions and, hence, on protein sequences that are the elements directly related with biological functions. In this chapter we describe how to annotate bacterial genomes with BG7, an open-source tool based on a protein-centered gene calling/annotation paradigm. BG7 is specifically designed for the annotation of bacterial genomes sequenced with NGS. This tool is sequence error tolerant maintaining their capabilities for the annotation of highly fragmented genomes or for annotating mixed sequences coming from several genomes (as those obtained through metagenomics samples). BG7 has been designed with scalability as a requirement, with a computing infrastructure completely based on cloud computing (Amazon Web Services).
Zhang, Lina; Zhang, Chengjin; Gao, Rui; Yang, Runtao
2015-09-09
Bacteriophage virion proteins and non-virion proteins have distinct functions in biological processes, such as specificity determination for host bacteria, bacteriophage replication and transcription. Accurate identification of bacteriophage virion proteins from bacteriophage protein sequences is significant to understand the complex virulence mechanism in host bacteria and the influence of bacteriophages on the development of antibacterial drugs. In this study, an ensemble method for bacteriophage virion protein prediction from bacteriophage protein sequences is put forward with hybrid feature spaces incorporating CTD (composition, transition and distribution), bi-profile Bayes, PseAAC (pseudo-amino acid composition) and PSSM (position-specific scoring matrix). When performing on the training dataset 10-fold cross-validation, the presented method achieves a satisfactory prediction result with a sensitivity of 0.870, a specificity of 0.830, an accuracy of 0.850 and Matthew's correlation coefficient (MCC) of 0.701, respectively. To evaluate the prediction performance objectively, an independent testing dataset is used to evaluate the proposed method. Encouragingly, our proposed method performs better than previous studies with a sensitivity of 0.853, a specificity of 0.815, an accuracy of 0.831 and MCC of 0.662 on the independent testing dataset. These results suggest that the proposed method can be a potential candidate for bacteriophage virion protein prediction, which may provide a useful tool to find novel antibacterial drugs and to understand the relationship between bacteriophage and host bacteria. For the convenience of the vast majority of experimental Int. J. Mol. Sci. 2015, 16,21735 scientists, a user-friendly and publicly-accessible web-server for the proposed ensemble method is established.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yusim, Karina; Korber, Bette Tina Marie; Barouch, Dan
HIV Molecular Immunology is a companion volume to HIV Sequence Compendium. This publication, the 2014 edition, is the PDF version of the web-based HIV Immunology Database (http://www.hiv.lanl.gov/content/immunology/). The web interface for this relational database has many search options, as well as interactive tools to help immunologists design reagents and interpret their results. In the HIV Immunology Database, HIV-specific B-cell and T-cell responses are summarized and annotated. Immunological responses are divided into three parts, CTL, T helper, and antibody. Within these parts, defined epitopes are organized by protein and binding sites within each protein, moving from left to right through themore » coding regions spanning the HIV genome. We include human responses to natural HIV infections, as well as vaccine studies in a range of animal models and human trials. Responses that are not specifically defined, such as responses to whole proteins or monoclonal antibody responses to discontinuous epitopes, are summarized at the end of each protein section. Studies describing general HIV responses to the virus, but not to any specific protein, are included at the end of each part. The annotation includes information such as crossreactivity, escape mutations, antibody sequence, TCR usage, functional domains that overlap with an epitope, immune response associations with rates of progression and therapy, and how specific epitopes were experimentally defined. Basic information such as HLA specificities for T-cell epitopes, isotypes of monoclonal antibodies, and epitope sequences are included whenever possible. All studies that we can find that incorporate the use of a specific monoclonal antibody are included in the entry for that antibody. A single T-cell epitope can have multiple entries, generally one entry per study. Finally, maps of all defined linear epitopes relative to the HXB2 reference proteins are provided.« less
Pasricha, Gunisha; Mishra, Akhilesh C; Chakrabarti, Alok K
2013-07-01
PB1F2 is the 11th protein of influenza A virus translated from +1 alternate reading frame of PB1 gene. Since the discovery, varying sizes and functions of the PB1F2 protein of influenza A viruses have been reported. Selection of PB1 gene segment in the pandemics, variable size and pleiotropic effect of PB1F2 intrigued us to analyze amino acid sequences of this protein in various influenza A viruses. Amino acid sequences for PB1F2 protein of influenza A H5N1, H1N1, H2N2, and H3N2 subtypes were obtained from Influenza Research Database. Multiple sequence alignments of the PB1F2 protein sequences of the aforementioned subtypes were used to determine the size, variable and conserved domains and to perform mutational analysis. Analysis showed that 96·4% of the H5N1 influenza viruses harbored full-length PB1F2 protein. Except for the 2009 pandemic H1N1 virus, all the subtypes of the 20th-century pandemic influenza viruses contained full-length PB1F2 protein. Through the years, PB1F2 protein of the H1N1 and H3N2 viruses has undergone much variation. PB1F2 protein sequences of H5N1 viruses showed both human- and avian host-specific conserved domains. Global database of PB1F2 protein revealed that N66S mutation was present only in 3·8% of the H5N1 strains. We found a novel mutation, N84S in the PB1F2 protein of 9·35% of the highly pathogenic avian influenza H5N1 influenza viruses. Varying sizes and mutations of the PB1F2 protein in different influenza A virus subtypes with pandemic potential were obtained. There was genetic divergence of the protein in various hosts which highlighted the host-specific evolution of the virus. However, studies are required to correlate this sequence variability with the virulence and pathogenicity. © 2012 John Wiley & Sons Ltd.
2011-01-01
Background Wheat flour is one of the world's major food ingredients, in part because of the unique end-use qualities conferred by the abundant glutamine- and proline-rich gluten proteins. Many wheat flour proteins also present dietary problems for consumers with celiac disease or wheat allergies. Despite the importance of these proteins it has been particularly challenging to use MS/MS to distinguish the many proteins in a flour sample and relate them to gene sequences. Results Grain from the extensively characterized spring wheat cultivar Triticum aestivum 'Butte 86' was milled to white flour from which proteins were extracted, then separated and quantified by 2-DE. Protein spots were identified by separate digestions with three proteases, followed by tandem mass spectrometry analysis of the peptides. The spectra were used to interrogate an improved protein sequence database and results were integrated using the Scaffold program. Inclusion of cultivar specific sequences in the database greatly improved the results, and 233 spots were identified, accounting for 93.1% of normalized spot volume. Identified proteins were assigned to 157 wheat sequences, many for proteins unique to wheat and nearly 40% from Butte 86. Alpha-gliadins accounted for 20.4% of flour protein, low molecular weight glutenin subunits 18.0%, high molecular weight glutenin subunits 17.1%, gamma-gliadins 12.2%, omega-gliadins 10.5%, amylase/protease inhibitors 4.1%, triticins 1.6%, serpins 1.6%, purinins 0.9%, farinins 0.8%, beta-amylase 0.5%, globulins 0.4%, other enzymes and factors 1.9%, and all other 3%. Conclusions This is the first successful effort to identify the majority of abundant flour proteins for a single wheat cultivar, relate them to individual gene sequences and estimate their relative levels. Many genes for wheat flour proteins are not expressed, so this study represents further progress in describing the expressed wheat genome. Use of cultivar-specific contigs helped to overcome the difficulties of matching peptides to gene sequences for members of highly similar, rapidly evolving storage protein families. Prospects for simplifying this process for routine analyses are discussed. The ability to measure expression levels for individual flour protein genes complements information gained from efforts to sequence the wheat genome and is essential for studies of effects of environment on gene expression. PMID:21314956
Regulatory sequence of cupin family gene
Hood, Elizabeth; Teoh, Thomas
2017-07-25
This invention is in the field of plant biology and agriculture and relates to novel seed specific promoter regions. The present invention further provide methods of producing proteins and other products of interest and methods of controlling expression of nucleic acid sequences of interest using the seed specific promoter regions.
Recombinant soluble adenovirus receptor
Freimuth, Paul I.
2002-01-01
Disclosed are isolated polypeptides from human CAR (coxsackievirus and adenovirus receptor) protein which bind adenovirus. Specifically disclosed are amino acid sequences which corresponds to adenovirus binding domain D1 and the entire extracellular domain of human CAR protein comprising D1 and D2. In other aspects, the disclosure relates to nucleic acid sequences encoding these domains as well as expression vectors which encode the domains and bacterial cells containing such vectors. Also disclosed is an isolated fusion protein comprised of the D1 polypeptide sequence fused to a polypeptide sequence which facilitates folding of D1 into a functional, soluble domain when expressed in bacteria. The functional D1 domain finds application for example in a therapeutic method for treating a patient infected with a virus which binds to D1, and also in a method for identifying an antiviral compound which interferes with viral attachment. Also included is a method for specifically targeting a cell for infection by a virus which binds to D1.
Lua, Rhonald C; Wilson, Stephen J; Konecki, Daniel M; Wilkins, Angela D; Venner, Eric; Morgan, Daniel H; Lichtarge, Olivier
2016-01-04
The structure and function of proteins underlie most aspects of biology and their mutational perturbations often cause disease. To identify the molecular determinants of function as well as targets for drugs, it is central to characterize the important residues and how they cluster to form functional sites. The Evolutionary Trace (ET) achieves this by ranking the functional and structural importance of the protein sequence positions. ET uses evolutionary distances to estimate functional distances and correlates genotype variations with those in the fitness phenotype. Thus, ET ranks are worse for sequence positions that vary among evolutionarily closer homologs but better for positions that vary mostly among distant homologs. This approach identifies functional determinants, predicts function, guides the mutational redesign of functional and allosteric specificity, and interprets the action of coding sequence variations in proteins, people and populations. Now, the UET database offers pre-computed ET analyses for the protein structure databank, and on-the-fly analysis of any protein sequence. A web interface retrieves ET rankings of sequence positions and maps results to a structure to identify functionally important regions. This UET database integrates several ways of viewing the results on the protein sequence or structure and can be found at http://mammoth.bcm.tmc.edu/uet/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
The arbuscular mycorrhizal fungal protein glomalin is a putative homolog of heat shock protein 60.
Gadkar, Vijay; Rillig, Matthias C
2006-10-01
Work on glomalin-related soil protein produced by arbuscular mycorrhizal (AM) fungi (AMF) has been limited because of the unknown identity of the protein. A protein band cross-reactive with the glomalin-specific antibody MAb32B11 from the AM fungus Glomus intraradices was partially sequenced using tandem liquid chromatography-mass spectrometry. A 17 amino acid sequence showing similarity to heat shock protein 60 (hsp 60) was obtained. Based on degenerate PCR, a full-length cDNA of 1773 bp length encoding the hsp 60 gene was isolated from a G. intraradices cDNA library. The ORF was predicted to encode a protein of 590 amino acids. The protein sequence had three N-terminal glycosylation sites and a string of GGM motifs at the C-terminal end. The GiHsp 60 ORF had three introns of 67, 76 and 131 bp length. The GiHsp 60 was expressed using an in vitro translation system, and the protein was purified using the 6xHis-tag system. A dot-blot assay on the purified protein showed that it was highly cross-reactive with the glomalin-specific antibody MAb32B11. The present work provides the first evidence for the identity of the glomalin protein in the model AMF G. intraradices, thus facilitating further characterization of this protein, which is of great interest in soil ecology.
Control of total GFP expression by alterations to the 3′ region nucleotide sequence
2013-01-01
Background Previously, we distinguished the Escherichia coli type II cytoplasmic membrane translocation pathways of Tat, Yid, and Sec for unfolded and folded soluble target proteins. The translocation of folded protein to the periplasm for soluble expression via the Tat pathway was controlled by an N-terminal hydrophilic leader sequence. In this study, we investigated the effect of the hydrophilic C-terminal end and its nucleotide sequence on total and soluble protein expression. Results The native hydrophilic C-terminal end of GFP was obtained by deleting the C-terminal peptide LeuGlu-6×His, derived from pET22b(+). The corresponding clones induced total and soluble GFP expression that was either slightly increased or dramatically reduced, apparently through reconstruction of the nucleotide sequence around the stop codon in the 3′ region. In the expression-induced clones, the hydrophilic C-terminus showed increased Tat pathway specificity for soluble expression. However, in the expression-reduced clone, after analyzing the role of the 5′ poly(A) coding sequence with a substituted synonymous codon, we proved that the longer 5′ poly(A) coding sequence interacted with the reconstructed 3′ region nucleotide sequence to create a new mRNA tertiary structure between the 5′ and 3′ regions, which resulted in reduced total GFP expression. Further, to recover the reduced expression by changing the 3′ nucleotide sequence, after replacing selected C-terminal 5′ codons and the stop codon in the ORF with synonymous codons, total GFP expression in most of the clones was recovered to the undeleted control level. The insertion of trinucleotides after the stop codon in the 3′-UTR recovered or reduced total GFP expression. RT-PCR revealed that the level of total protein expression was controlled by changes in translational or transcriptional regulation, which were induced or reduced by the substitution or insertion of 3′ region nucleotides. Conclusions We found that the hydrophilic C-terminal end of GFP increased Tat pathway specificity and that the 3′ nucleotide sequence played an important role in total protein expression through translational and transcriptional regulation. These findings may be useful for efficiently producing recombinant proteins as well as for potentially controlling the expression level of specific genes in the body for therapeutic purposes. PMID:23834827
Automated design evolution of stereochemically randomized protein foldamers
NASA Astrophysics Data System (ADS)
Ranbhor, Ranjit; Kumar, Anil; Patel, Kirti; Ramakrishnan, Vibin; Durani, Susheel
2018-05-01
Diversification of chain stereochemistry opens up the possibilities of an ‘in principle’ increase in the design space of proteins. This huge increase in the sequence and consequent structural variation is aimed at the generation of smart materials. To diversify protein structure stereochemically, we introduced L- and D-α-amino acids as the design alphabet. With a sequence design algorithm, we explored the usage of specific variables such as chirality and the sequence of this alphabet in independent steps. With molecular dynamics, we folded stereochemically diverse homopolypeptides and evaluated their ‘fitness’ for possible design as protein-like foldamers. We propose a fitness function to prune the most optimal fold among 1000 structures simulated with an automated repetitive simulated annealing molecular dynamics (AR-SAMD) approach. The highly scored poly-leucine fold with sequence lengths of 24 and 30 amino acids were later sequence-optimized using a Dead End Elimination cum Monte Carlo based optimization tool. This paper demonstrates a novel approach for the de novo design of protein-like foldamers.
The eukaryotic signal sequence, YGRL, targets the chlamydial inclusion
Kabeiseman, Emily J.; Cichos, Kyle H.; Moore, Elizabeth R.
2014-01-01
Understanding how host proteins are targeted to pathogen-specified organelles, like the chlamydial inclusion, is fundamentally important to understanding the biogenesis of these unique subcellular compartments and how they maintain autonomy within the cell. Syntaxin 6, which localizes to the chlamydial inclusion, contains an YGRL signal sequence. The YGRL functions to return syntaxin 6 to the trans-Golgi from the plasma membrane, and deletion of the YGRL signal sequence from syntaxin 6 also prevents the protein from localizing to the chlamydial inclusion. YGRL is one of three YXXL (YGRL, YQRL, and YKGL) signal sequences which target proteins to the trans-Golgi. We designed various constructs of eukaryotic proteins to test the specificity and propensity of YXXL sequences to target the inclusion. The YGRL signal sequence redirects proteins (e.g., Tgn38, furin, syntaxin 4) that normally do not localize to the chlamydial inclusion. Further, the requirement of the YGRL signal sequence for syntaxin 6 localization to inclusions formed by different species of Chlamydia is conserved. These data indicate that there is an inherent property of the chlamydial inclusion, which allows it to recognize the YGRL signal sequence. To examine whether this “inherent property” was protein or lipid in nature, we asked if deletion of the YGRL signal sequence from syntaxin 6 altered the ability of the protein to interact with proteins or lipids. Deletion or alteration of the YGRL from syntaxin 6 does not appreciably impact syntaxin 6-protein interactions, but does decrease syntaxin 6-lipid interactions. Intriguingly, data also demonstrate that YKGL or YQRL can successfully substitute for YGRL in localization of syntaxin 6 to the chlamydial inclusion. Importantly and for the first time, we are establishing that a eukaryotic signal sequence targets the chlamydial inclusion. PMID:25309881
2014-01-01
Background Deciphering of the information content of eukaryotic promoters has remained confined to universal landmarks and conserved sequence elements such as enhancers and transcription factor binding motifs, which are considered sufficient for gene activation and regulation. Gene-specific sequences, interspersed between the canonical transacting factor binding sites or adjoining them within a promoter, are generally taken to be devoid of any regulatory information and have therefore been largely ignored. An unanswered question therefore is, do gene-specific sequences within a eukaryotic promoter have a role in gene activation? Here, we present an exhaustive experimental analysis of a gene-specific sequence adjoining the heat shock element (HSE) in the proximal promoter of the small heat shock protein gene, αB-crystallin (cryab). These sequences are highly conserved between the rodents and the humans. Results Using human retinal pigment epithelial cells in culture as the host, we have identified a 10-bp gene-specific promoter sequence (GPS), which, unlike an enhancer, controls expression from the promoter of this gene, only when in appropriate position and orientation. Notably, the data suggests that GPS in comparison with the HSE works in a context-independent fashion. Additionally, when moved upstream, about a nucleosome length of DNA (−154 bp) from the transcription start site (TSS), the activity of the promoter is markedly inhibited, suggesting its involvement in local promoter access. Importantly, we demonstrate that deletion of the GPS results in complete loss of cryab promoter activity in transgenic mice. Conclusions These data suggest that gene-specific sequences such as the GPS, identified here, may have critical roles in regulating gene-specific activity from eukaryotic promoters. PMID:24589182
Li, Wei; Jiang, Wei; Wang, Lei
2016-10-12
In this work, a novel self-locked aptamer probe mediated cascade amplification strategy has been constructed for highly sensitive and specific detection of protein. First, the self-locked aptamer probe was designed with three functions: one was specific molecular recognition attributed to the aptamer sequence, the second was signal transduction owing to the transduction sequence, and the third was self-locking through the hybridization of the transduction sequence and part of the aptamer sequence. Then, the aptamer sequence specific recognized the target and folded into a three-way helix junction, leading to the release of the transduction sequence. Next, the 3'-end of this three-way junction acted as primer to trigger the strand displacement amplification (SDA), yielding a large amount of primers. Finally, the primers initiated the dual-exponential rolling circle amplification (DE-RCA) and generated numerous G-quadruples sequences. By inserting the fluorescent dye N-methyl mesoporphyrin IX (NMM), enhanced fluorescence signal was achieved. In this strategy, the self-locked aptamer probe was more stable to reduce the interference signals generated by the uncontrollable folding in unbounded state. Through the cascade amplification of SDA and DE-RCA, the sensitivity was further improved with a detection limit of 3.8 × 10(-16) mol/L for protein detection. Furthermore, by changing the aptamer sequence of the probe, sensitive and selective detection of adenosine has been also achieved, suggesting that the proposed strategy has good versatility and can be widely used in sensitive and selective detection of biomolecules. Copyright © 2016 Elsevier B.V. All rights reserved.
RNA regulatory networks diversified through curvature of the PUF protein scaffold
Wilinski, Daniel; Qiu, Chen; Lapointe, Christopher P.; ...
2015-09-14
Proteins bind and control mRNAs, directing their localization, translation and stability. Members of the PUF family of RNA-binding proteins control multiple mRNAs in a single cell, and play key roles in development, stem cell maintenance and memory formation. Here we identified the mRNA targets of a S. cerevisiae PUF protein, Puf5p, by ultraviolet-crosslinking-affinity purification and high-throughput sequencing (HITS-CLIP). The binding sites recognized by Puf5p are diverse, with variable spacer lengths between two specific sequences. Each length of site correlates with a distinct biological function. Crystal structures of Puf5p–RNA complexes reveal that the protein scaffold presents an exceptionally flat and extendedmore » interaction surface relative to other PUF proteins. In complexes with RNAs of different lengths, the protein is unchanged. A single PUF protein repeat is sufficient to induce broadening of specificity. Changes in protein architecture, such as alterations in curvature, may lead to evolution of mRNA regulatory networks.« less
RNA regulatory networks diversified through curvature of the PUF protein scaffold
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wilinski, Daniel; Qiu, Chen; Lapointe, Christopher P.
Proteins bind and control mRNAs, directing their localization, translation and stability. Members of the PUF family of RNA-binding proteins control multiple mRNAs in a single cell, and play key roles in development, stem cell maintenance and memory formation. Here we identified the mRNA targets of a S. cerevisiae PUF protein, Puf5p, by ultraviolet-crosslinking-affinity purification and high-throughput sequencing (HITS-CLIP). The binding sites recognized by Puf5p are diverse, with variable spacer lengths between two specific sequences. Each length of site correlates with a distinct biological function. Crystal structures of Puf5p–RNA complexes reveal that the protein scaffold presents an exceptionally flat and extendedmore » interaction surface relative to other PUF proteins. In complexes with RNAs of different lengths, the protein is unchanged. A single PUF protein repeat is sufficient to induce broadening of specificity. Changes in protein architecture, such as alterations in curvature, may lead to evolution of mRNA regulatory networks.« less
1990-05-01
Sta58 antigen and the Sta56 strain- GroES, C. burnetii HtpA, Mycobacterium tuberculosis 12- specific major antigen of R. tsutsugamushi (strain Karp...kb HindlIl fragment carrying the gene for the Sta58 tuberculosis, and Mycobacterium smegmatis (65-kDa anti- protein was subjected to DNA sequence...the Hsp6O and HsplO proteins. R. tsu., R. isutsugamushi; M. lep., Mvtcobacteriutn leprae : C. bur., C. burneiii; Synech.. Synechococcus strain 6301; T
Albornos, Lucía; Martín, Ignacio; Iglesias, Rebeca; Jiménez, Teresa; Labrador, Emilia; Dopico, Berta
2012-11-07
Many proteins with tandem repeats in their sequence have been described and classified according to the length of the repeats: I) Repeats of short oligopeptides (from 2 to 20 amino acids), including structural cell wall proteins and arabinogalactan proteins. II) Repeats that range in length from 20 to 40 residues, including proteins with a well-established three-dimensional structure often involved in mediating protein-protein interactions. (III) Longer repeats in the order of 100 amino acids that constitute structurally and functionally independent units. Here we analyse ShooT specific (ST) proteins, a family of proteins with tandem repeats of unknown function that were first found in Leguminosae, and their possible similarities to other proteins with tandem repeats. ST protein sequences were only found in dicotyledonous plants, limited to several plant families, mainly the Fabaceae and the Asteraceae. ST mRNAs accumulate mainly in the roots and under biotic interactions. Most ST proteins have one or several Domain(s) of Unknown Function 2775 (DUF2775). All deduced ST proteins have a signal peptide, indicating that these proteins enter the secretory pathway, and the mature proteins have tandem repeat oligopeptides that share a hexapeptide (E/D)FEPRP followed by 4 partially conserved amino acids, which could determine a putative N-glycosylation signal, and a fully conserved tyrosine. In a phylogenetic tree, the sequences clade according to taxonomic group. A possible involvement in symbiosis and abiotic stress as well as in plant cell elongation is suggested, although different STs could play different roles in plant development. We describe a new family of proteins called ST whose presence is limited to the plant kingdom, specifically to a few families of dicotyledonous plants. They present 20 to 40 amino acid tandem repeat sequences with different characteristics (signal peptide, DUF2775 domain, conservative repeat regions) from the described group of 20 to 40 amino acid tandem repeat proteins and also from known cell wall proteins with repeat sequences. Several putative roles in plant physiology can be inferred from the characteristics found.
2012-01-01
Background Many proteins with tandem repeats in their sequence have been described and classified according to the length of the repeats: I) Repeats of short oligopeptides (from 2 to 20 amino acids), including structural cell wall proteins and arabinogalactan proteins. II) Repeats that range in length from 20 to 40 residues, including proteins with a well-established three-dimensional structure often involved in mediating protein-protein interactions. (III) Longer repeats in the order of 100 amino acids that constitute structurally and functionally independent units. Here we analyse ShooT specific (ST) proteins, a family of proteins with tandem repeats of unknown function that were first found in Leguminosae, and their possible similarities to other proteins with tandem repeats. Results ST protein sequences were only found in dicotyledonous plants, limited to several plant families, mainly the Fabaceae and the Asteraceae. ST mRNAs accumulate mainly in the roots and under biotic interactions. Most ST proteins have one or several Domain(s) of Unknown Function 2775 (DUF2775). All deduced ST proteins have a signal peptide, indicating that these proteins enter the secretory pathway, and the mature proteins have tandem repeat oligopeptides that share a hexapeptide (E/D)FEPRP followed by 4 partially conserved amino acids, which could determine a putative N-glycosylation signal, and a fully conserved tyrosine. In a phylogenetic tree, the sequences clade according to taxonomic group. A possible involvement in symbiosis and abiotic stress as well as in plant cell elongation is suggested, although different STs could play different roles in plant development. Conclusions We describe a new family of proteins called ST whose presence is limited to the plant kingdom, specifically to a few families of dicotyledonous plants. They present 20 to 40 amino acid tandem repeat sequences with different characteristics (signal peptide, DUF2775 domain, conservative repeat regions) from the described group of 20 to 40 amino acid tandem repeat proteins and also from known cell wall proteins with repeat sequences. Several putative roles in plant physiology can be inferred from the characteristics found. PMID:23134664
Strain conformation controls the specificity of cross-species prion transmission in the yeast model.
Grizel, Anastasia V; Rubel, Aleksandr A; Chernoff, Yury O
2016-07-03
Transmissible self-assembled fibrous cross-β polymer infectious proteins (prions) cause neurodegenerative diseases in mammals and control non-Mendelian heritable traits in yeast. Cross-species prion transmission is frequently impaired, due to sequence differences in prion-forming proteins. Recent studies of prion species barrier on the model of closely related yeast species show that colocalization of divergent proteins is not sufficient for the cross-species prion transmission, and that an identity of specific amino acid sequences and a type of prion conformational variant (strain) play a major role in the control of transmission specificity. In contrast, chemical compounds primarily influence transmission specificity via favoring certain strain conformations, while the species origin of the host cell has only a relatively minor input. Strain alterations may occur during cross-species prion conversion in some combinations. The model is discussed which suggests that different recipient proteins can acquire different spectra of prion strain conformations, which could be either compatible or incompatible with a particular donor strain.
Context influences on TALE–DNA binding revealed by quantitative profiling
Rogers, Julia M.; Barrera, Luis A.; Reyon, Deepak; Sander, Jeffry D.; Kellis, Manolis; Joung, J Keith; Bulyk, Martha L.
2015-01-01
Transcription activator-like effector (TALE) proteins recognize DNA using a seemingly simple DNA-binding code, which makes them attractive for use in genome engineering technologies that require precise targeting. Although this code is used successfully to design TALEs to target specific sequences, off-target binding has been observed and is difficult to predict. Here we explore TALE–DNA interactions comprehensively by quantitatively assaying the DNA-binding specificities of 21 representative TALEs to ∼5,000–20,000 unique DNA sequences per protein using custom-designed protein-binding microarrays (PBMs). We find that protein context features exert significant influences on binding. Thus, the canonical recognition code does not fully capture the complexity of TALE–DNA binding. We used the PBM data to develop a computational model, Specificity Inference For TAL-Effector Design (SIFTED), to predict the DNA-binding specificity of any TALE. We provide SIFTED as a publicly available web tool that predicts potential genomic off-target sites for improved TALE design. PMID:26067805
Context influences on TALE-DNA binding revealed by quantitative profiling.
Rogers, Julia M; Barrera, Luis A; Reyon, Deepak; Sander, Jeffry D; Kellis, Manolis; Joung, J Keith; Bulyk, Martha L
2015-06-11
Transcription activator-like effector (TALE) proteins recognize DNA using a seemingly simple DNA-binding code, which makes them attractive for use in genome engineering technologies that require precise targeting. Although this code is used successfully to design TALEs to target specific sequences, off-target binding has been observed and is difficult to predict. Here we explore TALE-DNA interactions comprehensively by quantitatively assaying the DNA-binding specificities of 21 representative TALEs to ∼5,000-20,000 unique DNA sequences per protein using custom-designed protein-binding microarrays (PBMs). We find that protein context features exert significant influences on binding. Thus, the canonical recognition code does not fully capture the complexity of TALE-DNA binding. We used the PBM data to develop a computational model, Specificity Inference For TAL-Effector Design (SIFTED), to predict the DNA-binding specificity of any TALE. We provide SIFTED as a publicly available web tool that predicts potential genomic off-target sites for improved TALE design.
MIPS: analysis and annotation of proteins from whole genomes.
Mewes, H W; Amid, C; Arnold, R; Frishman, D; Güldener, U; Mannhaupt, G; Münsterkötter, M; Pagel, P; Strack, N; Stümpflen, V; Warfsmann, J; Ruepp, A
2004-01-01
The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein-protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).
Method for nucleic acid hybridization using single-stranded DNA binding protein
Tabor, Stanley; Richardson, Charles C.
1996-01-01
Method of nucleic acid hybridization for detecting the presence of a specific nucleic acid sequence in a population of different nucleic acid sequences using a nucleic acid probe. The nucleic acid probe hybridizes with the specific nucleic acid sequence but not with other nucleic acid sequences in the population. The method includes contacting a sample (potentially including the nucleic acid sequence) with the nucleic acid probe under hybridizing conditions in the presence of a single-stranded DNA binding protein provided in an amount which stimulates renaturation of a dilute solution (i.e., one in which the t.sub.1/2 of renaturation is longer than 3 weeks) of single-stranded DNA greater than 500 fold (i.e., to a t.sub.1/2 less than 60 min, preferably less than 5 min, and most preferably about 1 min.) in the absence of nucleotide triphosphates.
Abseq: Ultrahigh-throughput single cell protein profiling with droplet microfluidic barcoding.
Shahi, Payam; Kim, Samuel C; Haliburton, John R; Gartner, Zev J; Abate, Adam R
2017-03-14
Proteins are the primary effectors of cellular function, including cellular metabolism, structural dynamics, and information processing. However, quantitative characterization of proteins at the single-cell level is challenging due to the tiny amount of protein available. Here, we present Abseq, a method to detect and quantitate proteins in single cells at ultrahigh throughput. Like flow and mass cytometry, Abseq uses specific antibodies to detect epitopes of interest; however, unlike these methods, antibodies are labeled with sequence tags that can be read out with microfluidic barcoding and DNA sequencing. We demonstrate this novel approach by characterizing surface proteins of different cell types at the single-cell level and distinguishing between the cells by their protein expression profiles. DNA-tagged antibodies provide multiple advantages for profiling proteins in single cells, including the ability to amplify low-abundance tags to make them detectable with sequencing, to use molecular indices for quantitative results, and essentially limitless multiplexing.
Abseq: Ultrahigh-throughput single cell protein profiling with droplet microfluidic barcoding
NASA Astrophysics Data System (ADS)
Shahi, Payam; Kim, Samuel C.; Haliburton, John R.; Gartner, Zev J.; Abate, Adam R.
2017-03-01
Proteins are the primary effectors of cellular function, including cellular metabolism, structural dynamics, and information processing. However, quantitative characterization of proteins at the single-cell level is challenging due to the tiny amount of protein available. Here, we present Abseq, a method to detect and quantitate proteins in single cells at ultrahigh throughput. Like flow and mass cytometry, Abseq uses specific antibodies to detect epitopes of interest; however, unlike these methods, antibodies are labeled with sequence tags that can be read out with microfluidic barcoding and DNA sequencing. We demonstrate this novel approach by characterizing surface proteins of different cell types at the single-cell level and distinguishing between the cells by their protein expression profiles. DNA-tagged antibodies provide multiple advantages for profiling proteins in single cells, including the ability to amplify low-abundance tags to make them detectable with sequencing, to use molecular indices for quantitative results, and essentially limitless multiplexing.
Abseq: Ultrahigh-throughput single cell protein profiling with droplet microfluidic barcoding
Shahi, Payam; Kim, Samuel C.; Haliburton, John R.; Gartner, Zev J.; Abate, Adam R.
2017-01-01
Proteins are the primary effectors of cellular function, including cellular metabolism, structural dynamics, and information processing. However, quantitative characterization of proteins at the single-cell level is challenging due to the tiny amount of protein available. Here, we present Abseq, a method to detect and quantitate proteins in single cells at ultrahigh throughput. Like flow and mass cytometry, Abseq uses specific antibodies to detect epitopes of interest; however, unlike these methods, antibodies are labeled with sequence tags that can be read out with microfluidic barcoding and DNA sequencing. We demonstrate this novel approach by characterizing surface proteins of different cell types at the single-cell level and distinguishing between the cells by their protein expression profiles. DNA-tagged antibodies provide multiple advantages for profiling proteins in single cells, including the ability to amplify low-abundance tags to make them detectable with sequencing, to use molecular indices for quantitative results, and essentially limitless multiplexing. PMID:28290550
Liu, Zhong-Yuan; Wang, Yun; Lü, Guo-Dong; Wang, Xian-Lei; Zhang, Fu-Chun; Ma, Ji
2006-12-01
The partial cDNA sequence coding for the antifreeze proteins in the Tenebrio molitor was obtained by RT-PCR. Sequence analysis revealed nine putative cDNAs with a high degree of homology to Tenebrio molitor antifreeze proteins. The recombinant pGEX-4T-1-tmafp-XJ430 was introduced into E. coli BL21 to induce a GST fusion protein by IPTG. SDS-PAGE of the fusion protein demonstrated that the antifreeze protein migrated at a size of 38 kDa. The immunization was performed by intra-muscular injection of pCDNA3-tmafp-XJ430, and then antiserum was detected by ELISA. The titer of the antibody was 1:2,000. Western blotting analysis showed the antiserum was specific against the antifreeze protein. This finding could lead to further investigation of the properties and function of antifreeze proteins.
Liang, Xue-Hai; Shen, Wen; Crooke, Stanley T
2017-01-01
A number of diseases are caused by low levels of key proteins; therefore, increasing the amount of specific proteins in human bodies is of therapeutic interest. Protein expression is downregulated by some structural or sequence elements present in the 5' UTR of mRNAs, such as upstream open reading frames (uORF). Translation initiation from uORF(s) reduces translation from the downstream primary ORF encoding the main protein product in the same mRNA, leading to a less efficient protein expression. Therefore, it is possible to use antisense oligonucleotides (ASOs) to specifically inhibit translation of the uORF by base-pairing with the uAUG region of the mRNA, redirecting translation machinery to initiate from the primary AUG site. Here we review the recent findings that translation of specific mRNAs can be enhanced using ASOs targeting uORF regions. Appropriately designed and optimized ASOs are highly specific, and they act in a sequence- and position-dependent manner, with very minor off-target effects. Protein levels can be increased using this approach in different types of human and mouse cells, and, importantly, also in mice. Since uORFs are present in around half of human mRNAs, the uORF-targeting ASOs may thus have valuable potential as research tools and as therapeutics to increase the levels of proteins for a variety of genes.
Sequence signatures of allosteric proteins towards rational design.
Namboodiri, Saritha; Verma, Chandra; Dhar, Pawan K; Giuliani, Alessandro; Nair, Achuthsankar S
2010-12-01
Allostery is the phenomenon of changes in the structure and activity of proteins that appear as a consequence of ligand binding at sites other than the active site. Studying mechanistic basis of allostery leading to protein design with predetermined functional endpoints is an important unmet need of synthetic biology. Here, we screened the amino acid sequence landscape in search of sequence-signatures of allostery using Recurrence Quantitative Analysis (RQA) method. A characteristic vector, comprised of 10 features extracted from RQA was defined for amino acid sequences. Using Principal Component Analysis, four factors were found to be important determinants of allosteric behavior. Our sequence-based predictor method shows 82.6% accuracy, 85.7% sensitivity and 77.9% specificity with the current dataset. Further, we show that Laminarity-Mean-hydrophobicity representing repeated hydrophobic patches is the most crucial indicator of allostery. To our best knowledge this is the first report that describes sequence determinants of allostery based on hydrophobicity. As an outcome of these findings, we plan to explore possibility of inducing allostery in proteins.
FastRNABindR: Fast and Accurate Prediction of Protein-RNA Interface Residues.
El-Manzalawy, Yasser; Abbas, Mostafa; Malluhi, Qutaibah; Honavar, Vasant
2016-01-01
A wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses are mediated by RNA-protein interactions. However, experimental determination of the structures of protein-RNA complexes is expensive and technically challenging. Hence, a number of computational tools have been developed for predicting protein-RNA interfaces. Some of the state-of-the-art protein-RNA interface predictors rely on position-specific scoring matrix (PSSM)-based encoding of the protein sequences. The computational efforts needed for generating PSSMs severely limits the practical utility of protein-RNA interface prediction servers. In this work, we experiment with two approaches, random sampling and sequence similarity reduction, for extracting a representative reference database of protein sequences from more than 50 million protein sequences in UniRef100. Our results suggest that random sampled databases produce better PSSM profiles (in terms of the number of hits used to generate the profile and the distance of the generated profile to the corresponding profile generated using the entire UniRef100 data as well as the accuracy of the machine learning classifier trained using these profiles). Based on our results, we developed FastRNABindR, an improved version of RNABindR for predicting protein-RNA interface residues using PSSM profiles generated using 1% of the UniRef100 sequences sampled uniformly at random. To the best of our knowledge, FastRNABindR is the only protein-RNA interface residue prediction online server that requires generation of PSSM profiles for query sequences and accepts hundreds of protein sequences per submission. Our approach for determining the optimal BLAST database for a protein-RNA interface residue classification task has the potential of substantially speeding up, and hence increasing the practical utility of, other amino acid sequence based predictors of protein-protein and protein-DNA interfaces.
Sequence-dependent DNA deformability studied using molecular dynamics simulations.
Fujii, Satoshi; Kono, Hidetoshi; Takenaka, Shigeori; Go, Nobuhiro; Sarai, Akinori
2007-01-01
Proteins recognize specific DNA sequences not only through direct contact between amino acids and bases, but also indirectly based on the sequence-dependent conformation and deformability of the DNA (indirect readout). We used molecular dynamics simulations to analyze the sequence-dependent DNA conformations of all 136 possible tetrameric sequences sandwiched between CGCG sequences. The deformability of dimeric steps obtained by the simulations is consistent with that by the crystal structures. The simulation results further showed that the conformation and deformability of the tetramers can highly depend on the flanking base pairs. The conformations of xATx tetramers show the most rigidity and are not affected by the flanking base pairs and the xYRx show by contrast the greatest flexibility and change their conformations depending on the base pairs at both ends, suggesting tetramers with the same central dimer can show different deformabilities. These results suggest that analysis of dimeric steps alone may overlook some conformational features of DNA and provide insight into the mechanism of indirect readout during protein-DNA recognition. Moreover, the sequence dependence of DNA conformation and deformability may be used to estimate the contribution of indirect readout to the specificity of protein-DNA recognition as well as nucleosome positioning and large-scale behavior of nucleic acids.
Kitahara, Kei; Kajiura, Akimasa; Sato, Neuza Satomi; Suzuki, Tsutomu
2007-01-01
Ribosomal protein L2 is a highly conserved primary 23S rRNA-binding protein. L2 specifically recognizes the internal bulge sequence in Helix 66 (H66) of 23S rRNA and is localized to the intersubunit space through formation of bridge B7b with 16S rRNA. The L2-binding site in H66 is highly conserved in prokaryotic ribosomes, whereas the corresponding site in eukaryotic ribosomes has evolved into distinct classes of sequences. We performed a systematic genetic selection of randomized rRNA sequences in Escherichia coli, and isolated 20 functional variants of the L2-binding site. The isolated variants consisted of eukaryotic sequences, in addition to prokaryotic sequences. These results suggest that L2/L8e does not recognize a specific base sequence of H66, but rather a characteristic architecture of H66. The growth phenotype of the isolated variants correlated well with their ability of subunit association. Upon continuous cultivation of a deleterious variant, we isolated two spontaneous mutations within domain IV of 23S rRNA that compensated for its weak subunit association, and alleviated its growth defect, implying that functional interactions between intersubunit bridges compensate ribosomal function. PMID:17553838
Specific identification of Bacillus anthracis strains
NASA Astrophysics Data System (ADS)
Krishnamurthy, Thaiya; Deshpande, Samir; Hewel, Johannes; Liu, Hongbin; Wick, Charles H.; Yates, John R., III
2007-01-01
Accurate identification of human pathogens is the initial vital step in treating the civilian terrorism victims and military personnel afflicted in biological threat situations. We have applied a powerful multi-dimensional protein identification technology (MudPIT) along with newly generated software termed Profiler to identify the sequences of specific proteins observed for few strains of Bacillus anthracis, a human pathogen. Software termed Profiler was created to initially screen the MudPIT data of B. anthracis strains and establish the observed proteins specific for its strains. A database was also generated using Profiler containing marker proteins of B. anthracis and its strains, which in turn could be used for detecting the organism and its corresponding strains in samples. Analysis of the unknowns by our methodology, combining MudPIT and Profiler, led to the accurate identification of the anthracis strains present in samples. Thus, a new approach for the identification of B. anthracis strains in unknown samples, based on the molecular mass and sequences of marker proteins, has been ascertained.
Siede, W; Friedberg, E C
1992-03-01
In the yeast Saccharomyces cerevisiae the RAD2 gene is absolutely required for damage-specific incision of DNA during nucleotide excision repair and is inducible by DNA-damaging agents. In the present study we correlated sensitivity to killing by DNA-damaging agents with the deletion of previously defined specific promoter elements. Deletion of the element DRE2 increased the UV sensitivity of cells in both the G1/early S and S/G2 phases of the cell cycle as well as in stationary phase. On the other hand, increased UV sensitivity associated with deletion of the sequence-related element DRE1 was restricted to cells irradiated in G1/S. Specific binding of protein(s) to the promoter elements DRE1 and DRE2 was observed under non-inducing conditions using gel retardation assays. Exposure of cells to DNA-damaging agents resulted in increased protein binding that was dependent on de novo protein synthesis.
Dissecting substrate specificities of the mitochondrial AFG3L2 protease.
Ding, Bojian; Martin, Dwight W; Rampello, Anthony J; Glynn, Steven E
2018-06-22
Human AFG3L2 is a compartmental AAA+ protease that performs ATP-fueled degradation at the matrix face of the inner mitochondrial membrane. Identifying how AFG3L2 selects substrates from the diverse complement of matrix-localized proteins is essential for understanding mitochondrial protein biogenesis and quality control. Here, we create solubilized forms of AFG3L2 to examine the enzyme's substrate specificity mechanisms. We show that conserved residues within the pre-sequence of the mitochondrial ribosomal protein, MrpL32, target the subunit to the protease for processing into a mature form. Moreover, these residues can act as a degron, delivering diverse model proteins to AFG3L2 for degradation. By determining the sequence of degra-dation products from multiple substrates using mass spectrometry, we construct a peptidase specificity pro-file that displays constrained product lengths and is dominated by the identity of the residue at the P1' posi-tion, with a strong preference for hydrophobic and small polar residues. This specificity profile is validated by examining the cleavage of both fluorogenic reporter peptides and full polypeptide substrates bearing different P1' residues. Together, these results demonstrate that AFG3L2 contains multiple modes of specificity, dis-criminating between potential substrates by recognizing accessible degron sequences, and performing peptide bond cleavage at preferred patterns of residues within the compartmental chamber.
Making the Bend: DNA Tertiary Structure and Protein-DNA Interactions
Harteis, Sabrina; Schneider, Sabine
2014-01-01
DNA structure functions as an overlapping code to the DNA sequence. Rapid progress in understanding the role of DNA structure in gene regulation, DNA damage recognition and genome stability has been made. The three dimensional structure of both proteins and DNA plays a crucial role for their specific interaction, and proteins can recognise the chemical signature of DNA sequence (“base readout”) as well as the intrinsic DNA structure (“shape recognition”). These recognition mechanisms do not exist in isolation but, depending on the individual interaction partners, are combined to various extents. Driving force for the interaction between protein and DNA remain the unique thermodynamics of each individual DNA-protein pair. In this review we focus on the structures and conformations adopted by DNA, both influenced by and influencing the specific interaction with the corresponding protein binding partner, as well as their underlying thermodynamics. PMID:25026169
Camproux, A C; Tufféry, P
2005-08-05
Understanding and predicting protein structures depend on the complexity and the accuracy of the models used to represent them. We have recently set up a Hidden Markov Model to optimally compress protein three-dimensional conformations into a one-dimensional series of letters of a structural alphabet. Such a model learns simultaneously the shape of representative structural letters describing the local conformation and the logic of their connections, i.e. the transition matrix between the letters. Here, we move one step further and report some evidence that such a model of protein local architecture also captures some accurate amino acid features. All the letters have specific and distinct amino acid distributions. Moreover, we show that words of amino acids can have significant propensities for some letters. Perspectives point towards the prediction of the series of letters describing the structure of a protein from its amino acid sequence.
Marciniak, R A; Garcia-Blanco, M A; Sharp, P A
1990-01-01
Human immunodeficiency virus type 1 RNAs contain a sequence, trans-activation-response (TAR) element, which is required for tat protein-mediated trans-activation of viral gene expression. We have identified a nuclear protein from extracts of HeLa cells that binds to the TAR element RNA in a sequence-specific manner. The binding of this 68-kDa polypeptide was detected by UV cross-linking proteins to TAR element RNA transcribed in vitro. Competition experiments were performed by using a partially purified preparation of the protein to quantify the relative binding affinities of TAR element RNA mutants. The binding affinity of the TAR mutants paralleled the reported ability of those mutants to support tat trans-activation in vivo. We propose that this cellular protein moderates TAR activity in vivo. Images PMID:2333305
Insights in connecting phenotypes in bacteria to coevolutionary information
NASA Astrophysics Data System (ADS)
Cheng, Ryan; Morcos, Faruck; Hayes, Ryan; Helm, Rodney; Levine, Herbert; Onuchic, Jose
It has long been known that protein sequences are far from random. These sequences have been evolutionarily selected to maintain their ability to fold into stable, three-dimensional folded structures as well as their ability to form macromolecular assemblies, perform catalytic functions, etc. For these reasons, there exist quantifiable mutational patterns in the collection of sequence data for a protein family arising from the need to maintain favorable residue-residue interactions to facilitate folding as well as cellular function. Here, we focus on studying the correlated mutational patterns that give rise to interaction specificity in bacterial two-component signaling (TCS) systems. TCS proteins have evolved to be able to preferentially bind and transfer a phosphate group to their signaling partner while avoiding phosphotransfer with non-partners. We infer a Potts model Hamiltonian governing the correlated mutational patterns that are observed in the sequence data of TCS partners and apply this model to recently published in vivo mutational data. Our findings further support the notion that statistical models built from sequence data can be used to predict bacterial phenotypes as well as engineer interaction specificity between non-partner TCS proteins. This research has been supported by the NSF INSPIRE Award (MCB-1241332) and by the CTBP sponsored by the NSF (Grant PHY- 1427654).
Pal Choudhury, Pabitra
2017-01-01
Periplasmic c7 type cytochrome A (PpcA) protein is determined in Geobacter sulfurreducens along with its other four homologs (PpcB-E). From the crystal structure viewpoint the observation emerges that PpcA protein can bind with Deoxycholate (DXCA), while its other homologs do not. But it is yet to be established with certainty the reason behind this from primary protein sequence information. This study is primarily based on primary protein sequence analysis through the chemical basis of embedded amino acids. Firstly, we look for the chemical group specific score of amino acids. Along with this, we have developed a new methodology for the phylogenetic analysis based on chemical group dissimilarities of amino acids. This new methodology is applied to the cytochrome c7 family members and pinpoint how a particular sequence is differing with others. Secondly, we build a graph theoretic model on using amino acid sequences which is also applied to the cytochrome c7 family members and some unique characteristics and their domains are highlighted. Thirdly, we search for unique patterns as subsequences which are common among the group or specific individual member. In all the cases, we are able to show some distinct features of PpcA that emerges PpcA as an outstanding protein compared to its other homologs, resulting towards its binding with deoxycholate. Similarly, some notable features for the structurally dissimilar protein PpcD compared to the other homologs are also brought out. Further, the five members of cytochrome family being homolog proteins, they must have some common significant features which are also enumerated in this study. PMID:28362850
Rajendran, Senthilnathan; Jothi, Arunachalam
2018-05-16
The Three-dimensional structure of a protein depends on the interaction between their amino acid residues. These interactions are in turn influenced by various biophysical properties of the amino acids. There are several examples of proteins that share the same fold but are very dissimilar at the sequence level. For proteins to share a common fold some crucial interactions should be maintained despite insignificant sequence similarity. Since the interactions are because of the biophysical properties of the amino acids, we should be able to detect descriptive patterns for folds at such a property level. In this line, the main focus of our research is to analyze such proteins and to characterize them in terms of their biophysical properties. Protein structures with sequence similarity lesser than 40% were selected for ten different subfolds from three different mainfolds (according to CATH classification) and were used for this analysis. We used the normalized values of the 49 physio-chemical, energetic and conformational properties of amino acids. We characterize the folds based on the average biophysical property values. We also observed a fold specific correlational behavior of biophysical properties despite a very low sequence similarity in our data. We further trained three different binary classification models (Naive Bayes-NB, Support Vector Machines-SVM and Bayesian Generalized Linear Model-BGLM) which could discriminate mainfold based on the biophysical properties. We also show that among the three generated models, the BGLM classifier model was able to discriminate protein sequences coming under all beta category with 81.43% accuracy and all alpha, alpha-beta proteins with 83.37% accuracy. Copyright © 2018 Elsevier Ltd. All rights reserved.
Human HOXA5 homeodomain enhances protein transduction and its application to vascular inflammation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Ji Young; Park, Kyoung sook; Cho, Eun Jung
2011-07-01
Highlights: {yields} We have developed an E. coli protein expression vector including human specific gene sequences for protein cellular delivery. {yields} The plasmid was generated by ligation the nucleotides 770-817 of the homeobox A5 mRNA sequence. {yields} HOXA5-APE1/Ref-1 inhibited TNF-alpha-induced monocyte adhesion to endothelial cells. {yields} Human HOXA5-PTD vector provides a powerful research tools for uncovering cellular functions of proteins or for the generation of human PTD-containing proteins. -- Abstract: Cellular protein delivery is an emerging technique by which exogenous recombinant proteins are delivered into mammalian cells across the membrane. We have developed an Escherichia coli expression vector including humanmore » specific gene sequences for protein cellular delivery. The plasmid was generated by ligation the nucleotides 770-817 of the homeobox A5 mRNA sequence which was matched with protein transduction domain (PTD) of homeodomain protein A5 (HOXA5) into pET expression vector. The cellular uptake of HOXA5-PTD-EGFP was detected in 1 min and its transduction reached a maximum at 1 h within cell lysates. The cellular uptake of HOXA5-EGFP at 37 {sup o}C was greater than in 4 {sup o}C. For study for the functional role of human HOXA5-PTD, we purified HOXA5-APE1/Ref-1 and applied it on monocyte adhesion. Pretreatment with HOXA5-APE1/Ref-1 (100 nM) inhibited TNF-{alpha}-induced monocyte adhesion to endothelial cells, compared with HOXA5-EGFP. Taken together, our data suggested that human HOXA5-PTD vector provides a powerful research tools for uncovering cellular functions of proteins or for the generation of human PTD-containing proteins.« less
Structure based re-design of the binding specificity of anti-apoptotic Bcl-xL
Chen, T. Scott; Palacios, Hector; Keating, Amy E.
2012-01-01
Many native proteins are multi-specific and interact with numerous partners, which can confound analysis of their functions. Protein design provides a potential route to generating synthetic variants of native proteins with more selective binding profiles. Re-designed proteins could be used as research tools, diagnostics or therapeutics. In this work, we used a library screening approach to re-engineer the multi-specific anti-apoptotic protein Bcl-xL to remove its interactions with many of its binding partners, making it a high affinity and selective binder of the BH3 region of pro-apoptotic protein Bad. To overcome the enormity of the potential Bcl-xL sequence space, we developed and applied a computational/experimental framework that used protein structure information to generate focused combinatorial libraries. Sequence features were identified using structure-based modeling, and an optimization algorithm based on integer programming was used to select degenerate codons that maximally covered these features. A constraint on library size was used to ensure thorough sampling. Using yeast surface display to screen a designed library of Bcl-xL variants, we successfully identified a protein with ~1,000-fold improvement in binding specificity for the BH3 region of Bad over the BH3 region of Bim. Although negative design was targeted only against the BH3 region of Bim, the best re-designed protein was globally specific against binding to 10 other peptides corresponding to native BH3 motifs. Our design framework demonstrates an efficient route to highly specific protein binders and may readily be adapted for application to other design problems. PMID:23154169
Smith, Colin A; Kortemme, Tanja
2011-01-01
Predicting the set of sequences that are tolerated by a protein or protein interface, while maintaining a desired function, is useful for characterizing protein interaction specificity and for computationally designing sequence libraries to engineer proteins with new functions. Here we provide a general method, a detailed set of protocols, and several benchmarks and analyses for estimating tolerated sequences using flexible backbone protein design implemented in the Rosetta molecular modeling software suite. The input to the method is at least one experimentally determined three-dimensional protein structure or high-quality model. The starting structure(s) are expanded or refined into a conformational ensemble using Monte Carlo simulations consisting of backrub backbone and side chain moves in Rosetta. The method then uses a combination of simulated annealing and genetic algorithm optimization methods to enrich for low-energy sequences for the individual members of the ensemble. To emphasize certain functional requirements (e.g. forming a binding interface), interactions between and within parts of the structure (e.g. domains) can be reweighted in the scoring function. Results from each backbone structure are merged together to create a single estimate for the tolerated sequence space. We provide an extensive description of the protocol and its parameters, all source code, example analysis scripts and three tests applying this method to finding sequences predicted to stabilize proteins or protein interfaces. The generality of this method makes many other applications possible, for example stabilizing interactions with small molecules, DNA, or RNA. Through the use of within-domain reweighting and/or multistate design, it may also be possible to use this method to find sequences that stabilize particular protein conformations or binding interactions over others.
DWARF – a data warehouse system for analyzing protein families
Fischer, Markus; Thai, Quan K; Grieb, Melanie; Pleiss, Jürgen
2006-01-01
Background The emerging field of integrative bioinformatics provides the tools to organize and systematically analyze vast amounts of highly diverse biological data and thus allows to gain a novel understanding of complex biological systems. The data warehouse DWARF applies integrative bioinformatics approaches to the analysis of large protein families. Description The data warehouse system DWARF integrates data on sequence, structure, and functional annotation for protein fold families. The underlying relational data model consists of three major sections representing entities related to the protein (biochemical function, source organism, classification to homologous families and superfamilies), the protein sequence (position-specific annotation, mutant information), and the protein structure (secondary structure information, superimposed tertiary structure). Tools for extracting, transforming and loading data from public available resources (ExPDB, GenBank, DSSP) are provided to populate the database. The data can be accessed by an interface for searching and browsing, and by analysis tools that operate on annotation, sequence, or structure. We applied DWARF to the family of α/β-hydrolases to host the Lipase Engineering database. Release 2.3 contains 6138 sequences and 167 experimentally determined protein structures, which are assigned to 37 superfamilies 103 homologous families. Conclusion DWARF has been designed for constructing databases of large structurally related protein families and for evaluating their sequence-structure-function relationships by a systematic analysis of sequence, structure and functional annotation. It has been applied to predict biochemical properties from sequence, and serves as a valuable tool for protein engineering. PMID:17094801
DNA binding specificity of the basic-helix-loop-helix protein MASH-1.
Meierhan, D; el-Ariss, C; Neuenschwander, M; Sieber, M; Stackhouse, J F; Allemann, R K
1995-09-05
Despite the high degree of sequence similarity in their basic-helix-loop-helix (BHLH) domains, MASH-1 and MyoD are involved in different biological processes. In order to define possible differences between the DNA binding specificities of these two proteins, we investigated the DNA binding properties of MASH-1 by circular dichroism spectroscopy and by electrophoretic mobility shift assays (EMSA). Upon binding to DNA, the BHLH domain of MASH-1 underwent a conformational change from a mainly unfolded to a largely alpha-helical form, and surprisingly, this change was independent of the specific DNA sequence. The same conformational transition could be induced by the addition of 20% 2,2,2-trifluoroethanol. The apparent dissociation constants (KD) of the complexes of full-length MASH-1 with various oligonucleotides were determined from half-saturation points in EMSAs. MASH-1 bound as a dimer to DNA sequences containing an E-box with high affinity KD = 1.4-4.1 x 10(-14) M2). However, the specificity of DNA binding was low. The dissociation constant for the complex between MASH-1 and the highest affinity E-box sequence (KD = 1.4 x 10(-14) M2) was only a factor of 10 smaller than for completely unrelated DNA sequences (KD = approximately 1 x 10(-13) M2). The DNA binding specificity of MASH-1 was not significantly increased by the formation of an heterodimer with the ubiquitous E12 protein. MASH-1 and MyoD displayed similar binding site preferences, suggesting that their different target gene specificities cannot be explained solely by differential DNA binding. An explanation for these findings is provided on the basis of the known crystal structure of the BHLH domain of MyoD.
The solution structure of the pentatricopeptide repeat protein PPR10 upon binding atpH RNA
Gully, Benjamin S.; Cowieson, Nathan; Stanley, Will A.; Shearston, Kate; Small, Ian D.; Barkan, Alice; Bond, Charles S.
2015-01-01
The pentatricopeptide repeat (PPR) protein family is a large family of RNA-binding proteins that is characterized by tandem arrays of a degenerate 35-amino-acid motif which form an α-solenoid structure. PPR proteins influence the editing, splicing, translation and stability of specific RNAs in mitochondria and chloroplasts. Zea mays PPR10 is amongst the best studied PPR proteins, where sequence-specific binding to two RNA transcripts, atpH and psaJ, has been demonstrated to follow a recognition code where the identity of two amino acids per repeat determines the base-specificity. A recently solved ZmPPR10:psaJ complex crystal structure suggested a homodimeric complex with considerably fewer sequence-specific protein–RNA contacts than inferred previously. Here we describe the solution structure of the ZmPPR10:atpH complex using size-exclusion chromatography-coupled synchrotron small-angle X-ray scattering (SEC-SY-SAXS). Our results support prior evidence that PPR10 binds RNA as a monomer, and that it does so in a manner that is commensurate with a canonical and predictable RNA-binding mode across much of the RNA–protein interface. PMID:25609698
Liang, Yunyun; Liu, Sanyang; Zhang, Shengli
2015-01-01
Prediction of protein structural classes for low-similarity sequences is useful for understanding fold patterns, regulation, functions, and interactions of proteins. It is well known that feature extraction is significant to prediction of protein structural class and it mainly uses protein primary sequence, predicted secondary structure sequence, and position-specific scoring matrix (PSSM). Currently, prediction solely based on the PSSM has played a key role in improving the prediction accuracy. In this paper, we propose a novel method called CSP-SegPseP-SegACP by fusing consensus sequence (CS), segmented PsePSSM, and segmented autocovariance transformation (ACT) based on PSSM. Three widely used low-similarity datasets (1189, 25PDB, and 640) are adopted in this paper. Then a 700-dimensional (700D) feature vector is constructed and the dimension is decreased to 224D by using principal component analysis (PCA). To verify the performance of our method, rigorous jackknife cross-validation tests are performed on 1189, 25PDB, and 640 datasets. Comparison of our results with the existing PSSM-based methods demonstrates that our method achieves the favorable and competitive performance. This will offer an important complementary to other PSSM-based methods for prediction of protein structural classes for low-similarity sequences.
Pitre, S; North, C; Alamgir, M; Jessulat, M; Chan, A; Luo, X; Green, J R; Dumontier, M; Dehne, F; Golshani, A
2008-08-01
Protein-protein interaction (PPI) maps provide insight into cellular biology and have received considerable attention in the post-genomic era. While large-scale experimental approaches have generated large collections of experimentally determined PPIs, technical limitations preclude certain PPIs from detection. Recently, we demonstrated that yeast PPIs can be computationally predicted using re-occurring short polypeptide sequences between known interacting protein pairs. However, the computational requirements and low specificity made this method unsuitable for large-scale investigations. Here, we report an improved approach, which exhibits a specificity of approximately 99.95% and executes 16,000 times faster. Importantly, we report the first all-to-all sequence-based computational screen of PPIs in yeast, Saccharomyces cerevisiae in which we identify 29,589 high confidence interactions of approximately 2 x 10(7) possible pairs. Of these, 14,438 PPIs have not been previously reported and may represent novel interactions. In particular, these results reveal a richer set of membrane protein interactions, not readily amenable to experimental investigations. From the novel PPIs, a novel putative protein complex comprised largely of membrane proteins was revealed. In addition, two novel gene functions were predicted and experimentally confirmed to affect the efficiency of non-homologous end-joining, providing further support for the usefulness of the identified PPIs in biological investigations.
NASA Astrophysics Data System (ADS)
Agarwal, Sonya; Döring, Kristina; Gierusz, Leszek A.; Iyer, Pooja; Lane, Fiona M.; Graham, James F.; Goldmann, Wilfred; Pinheiro, Teresa J. T.; Gill, Andrew C.
2015-10-01
The β2-α2 loop of PrPC is a key modulator of disease-associated prion protein misfolding. Amino acids that differentiate mouse (Ser169, Asn173) and deer (Asn169, Thr173) PrPC appear to confer dramatically different structural properties in this region and it has been suggested that amino acid sequences associated with structural rigidity of the loop also confer susceptibility to prion disease. Using mouse recombinant PrP, we show that mutating residue 173 from Asn to Thr alters protein stability and misfolding only subtly, whilst changing Ser to Asn at codon 169 causes instability in the protein, promotes oligomer formation and dramatically potentiates fibril formation. The doubly mutated protein exhibits more complex folding and misfolding behaviour than either single mutant, suggestive of differential effects of the β2-α2 loop sequence on both protein stability and on specific misfolding pathways. Molecular dynamics simulation of protein structure suggests a key role for the solvent accessibility of Tyr168 in promoting molecular interactions that may lead to prion protein misfolding. Thus, we conclude that ‘rigidity’ in the β2-α2 loop region of the normal conformer of PrP has less effect on misfolding than other sequence-related effects in this region.
Arur, Swathi; Schedl, Tim
2014-01-01
Post-translational modifications alter protein structure, affecting activity, stability, localization and/or binding partners. Antibodies that specifically recognize post-translationally modified proteins have a number of uses including immuno-cytochemistry and immuno-precipitation of the modified protein to purify protein-protein and protein-nucleic acid complexes. However, antibodies directed at modified sites on individual proteins are often non-specific. Here we describe a protocol to purify polyclonal antibodies that specifically detect the modified protein of interest. The approach uses iterative rounds of subtraction and affinity purification, using stringent washes to remove antibodies that recognize the unmodified protein and low sequence complexity epitopes containing the modified amino acid. Dot and western blots assays are employed to assess antibody preparation specificity. The approach is designed to overcome the common occurrence that a single round of subtraction and affinity purification is not sufficient to obtain a modified protein specific antibody preparation. One full round of antibody purification and specificity testing takes 6 days of discontinuous time. PMID:24457330
The COG database: a tool for genome-scale analysis of protein functions and evolution
Tatusov, Roman L.; Galperin, Michael Y.; Natale, Darren A.; Koonin, Eugene V.
2000-01-01
Rational classification of proteins encoded in sequenced genomes is critical for making the genome sequences maximally useful for functional and evolutionary studies. The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt on a phylogenetic classification of the proteins encoded in 21 complete genomes of bacteria, archaea and eukaryotes (http://www.ncbi.nlm.nih.gov/COG ). The COGs were constructed by applying the criterion of consistency of genome-specific best hits to the results of an exhaustive comparison of all protein sequences from these genomes. The database comprises 2091 COGs that include 56–83% of the gene products from each of the complete bacterial and archaeal genomes and ~35% of those from the yeast Saccharomyces cerevisiae genome. The COG database is accompanied by the COGNITOR program that is used to fit new proteins into the COGs and can be applied to functional and phylogenetic annotation of newly sequenced genomes. PMID:10592175
RaptorX server: a resource for template-based protein structure modeling.
Källberg, Morten; Margaryan, Gohar; Wang, Sheng; Ma, Jianzhu; Xu, Jinbo
2014-01-01
Assigning functional properties to a newly discovered protein is a key challenge in modern biology. To this end, computational modeling of the three-dimensional atomic arrangement of the amino acid chain is often crucial in determining the role of the protein in biological processes. We present a community-wide web-based protocol, RaptorX server ( http://raptorx.uchicago.edu ), for automated protein secondary structure prediction, template-based tertiary structure modeling, and probabilistic alignment sampling.Given a target sequence, RaptorX server is able to detect even remotely related template sequences by means of a novel nonlinear context-specific alignment potential and probabilistic consistency algorithm. Using the protocol presented here it is thus possible to obtain high-quality structural models for many target protein sequences when only distantly related protein domains have experimentally solved structures. At present, RaptorX server can perform secondary and tertiary structure prediction of a 200 amino acid target sequence in approximately 30 min.
Upadhyay, Atul Kumar; Sowdhamini, Ramanathan
2016-01-01
3D-domain swapping is one of the mechanisms of protein oligomerization and the proteins exhibiting this phenomenon have many biological functions. These proteins, which undergo domain swapping, have acquired much attention owing to their involvement in human diseases, such as conformational diseases, amyloidosis, serpinopathies, proteionopathies etc. Early realisation of proteins in the whole human genome that retain tendency to domain swap will enable many aspects of disease control management. Predictive models were developed by using machine learning approaches with an average accuracy of 78% (85.6% of sensitivity, 87.5% of specificity and an MCC value of 0.72) to predict putative domain swapping in protein sequences. These models were applied to many complete genomes with special emphasis on the human genome. Nearly 44% of the protein sequences in the human genome were predicted positive for domain swapping. Enrichment analysis was performed on the positively predicted sequences from human genome for their domain distribution, disease association and functional importance based on Gene Ontology (GO). Enrichment analysis was also performed to infer a better understanding of the functional importance of these sequences. Finally, we developed hinge region prediction, in the given putative domain swapped sequence, by using important physicochemical properties of amino acids.
Aureobasidium pullulans xylanase, gene and signal sequence
Xin-Liang, Li; Ljungdahl, Lars G.
1997-01-01
A xylanase from Aureobasidium pullulans having a high specific activity is provided as well as a signal protein for controlling excretion into cell culture medium of proteins to which it is attached. DNA encoding these proteins is also provided.
ZifBASE: a database of zinc finger proteins and associated resources.
Jayakanthan, Mannu; Muthukumaran, Jayaraman; Chandrasekar, Sanniyasi; Chawla, Konika; Punetha, Ankita; Sundar, Durai
2009-09-09
Information on the occurrence of zinc finger protein motifs in genomes is crucial to the developing field of molecular genome engineering. The knowledge of their target DNA-binding sequences is vital to develop chimeric proteins for targeted genome engineering and site-specific gene correction. There is a need to develop a computational resource of zinc finger proteins (ZFP) to identify the potential binding sites and its location, which reduce the time of in vivo task, and overcome the difficulties in selecting the specific type of zinc finger protein and the target site in the DNA sequence. ZifBASE provides an extensive collection of various natural and engineered ZFP. It uses standard names and a genetic and structural classification scheme to present data retrieved from UniProtKB, GenBank, Protein Data Bank, ModBase, Protein Model Portal and the literature. It also incorporates specialized features of ZFP including finger sequences and positions, number of fingers, physiochemical properties, classes, framework, PubMed citations with links to experimental structures (PDB, if available) and modeled structures of natural zinc finger proteins. ZifBASE provides information on zinc finger proteins (both natural and engineered ones), the number of finger units in each of the zinc finger proteins (with multiple fingers), the synergy between the adjacent fingers and their positions. Additionally, it gives the individual finger sequence and their target DNA site to which it binds for better and clear understanding on the interactions of adjacent fingers. The current version of ZifBASE contains 139 entries of which 89 are engineered ZFPs, containing 3-7F totaling to 296 fingers. There are 50 natural zinc finger protein entries ranging from 2-13F, totaling to 307 fingers. It has sequences and structures from literature, Protein Data Bank, ModBase and Protein Model Portal. The interface is cross linked to other public databases like UniprotKB, PDB, ModBase and Protein Model Portal and PubMed for making it more informative. A database is established to maintain the information of the sequence features, including the class, framework, number of fingers, residues, position, recognition site and physio-chemical properties (molecular weight, isoelectric point) of both natural and engineered zinc finger proteins and dissociation constant of few. ZifBASE can provide more effective and efficient way of accessing the zinc finger protein sequences and their target binding sites with the links to their three-dimensional structures. All the data and functions are available at the advanced web-based search interface http://web.iitd.ac.in/~sundar/zifbase.
Van Hoeck, Arne; Horemans, Nele; Monsieurs, Pieter; Cao, Hieu Xuan; Vandenhove, Hildegarde; Blust, Ronny
2015-01-01
Freshwater duckweed, comprising the smallest, fastest growing and simplest macrophytes has various applications in agriculture, phytoremediation and energy production. Lemna minor, the so-called common duckweed, is a model system of these aquatic plants for ecotoxicological bioassays, genetic transformation tools and industrial applications. Given the ecotoxic relevance and high potential for biomass production, whole-genome information of this cosmopolitan duckweed is needed. The 472 Mbp assembly of the L. minor genome (2n = 40; estimated 481 Mbp; 98.1 %) contains 22,382 protein-coding genes and 61.5 % repetitive sequences. The repeat content explains 94.5 % of the genome size difference in comparison with the greater duckweed, Spirodela polyrhiza (2n = 40; 158 Mbp; 19,623 protein-coding genes; and 15.79 % repetitive sequences). Comparison of proteins from other monocot plants, protein ortholog identification, OrthoMCL, suggests 1356 duckweed-specific groups (3367 proteins, 15.0 % total L. minor proteins) and 795 Lemna-specific groups (2897 proteins, 12.9 % total L. minor proteins). Interestingly, proteins involved in biosynthetic processes in response to various stimuli and hydrolase activities are enriched in the Lemna proteome in comparison with the Spirodela proteome. The genome sequence and annotation of L. minor protein-coding genes provide new insights in biological understanding and biomass production applications of Lemna species.
Millar, A H; Knorpp, C; Leaver, C J; Hill, S A
1998-01-01
The pyruvate dehydrogenase complex (mPDC) from potato (Solanum tuberosum cv. Romano) tuber mitochondria was purified 40-fold to a specific activity of 5.60 micromol/min per mg of protein. The activity of the complex depended on pyruvate, divalent cations, NAD+ and CoA and was competitively inhibited by both NADH and acetyl-CoA. SDS/PAGE revealed the complex consisted of seven polypeptide bands with apparent molecular masses of 78, 60, 58, 55, 43, 41 and 37 kDa. N-terminal sequencing revealed that the 78 kDa protein was dihydrolipoamide transacetylase (E2), the 58 kDa protein was dihydrolipoamide dehydrogenase (E3), the 43 and 41 kDa proteins were alpha subunits of pyruvate dehydrogenase, and the 37 kDa protein was the beta subunit of pyruvate dehydrogenase. N-terminal sequencing of the 55 kDa protein band yielded two protein sequences: one was another E3; the other was similar to the sequence of E2 from plant and yeast sources but was distinctly different from the sequence of the 78 kDa protein. Incubation of the mPDC with [2-14C]pyruvate resulted in the acetylation of both the 78 and 55 kDa proteins. PMID:9729464
PUF Proteins: Cellular Functions and Potential Applications.
Kiani, Seyed Jalal; Taheri, Tahereh; Rafati, Sima; Samimi-Rad, Katayoun
2017-01-01
RNA-binding proteins play critical roles in the regulation of gene expression. Among several families of RNA-binding proteins, PUF (Pumilio and FBF) proteins have been the subject of extensive investigations, as they can bind RNA in a sequence-specific manner and they are evolutionarily conserved among a wide range of organisms. The outstanding feature of these proteins is a highly conserved RNA-binding domain, which is known as the Pumilio-homology domain (PUM-HD) that mostly consists of eight tandem repeats. Each repeat recognizes an RNA base with a simple three-letter code that can be programmed in order to change the sequence-specificity of the protein. Using this tailored architecture, researchers have been able to change the specificity of the PUM-HD and target desired transcripts in the cell, even in subcellular compartments. The potential applications of this versatile tool in molecular cell biology seem unbounded and the use of these factors in pharmaceutics might be an interesting field of study in near future. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
In vitro fluorescence studies of transcription factor IIB-DNA interaction.
Górecki, Andrzej; Figiel, Małgorzata; Dziedzicka-Wasylewska, Marta
2015-01-01
General transcription factor TFIIB is one of the basal constituents of the preinitiation complex of eukaryotic RNA polymerase II, acting as a bridge between the preinitiation complex and the polymerase, and binding promoter DNA in an asymmetric manner, thereby defining the direction of the transcription. Methods of fluorescence spectroscopy together with circular dichroism spectroscopy were used to observe conformational changes in the structure of recombinant human TFIIB after binding to specific DNA sequence. To facilitate the exploration of the structural changes, several site-directed mutations have been introduced altering the fluorescence properties of the protein. Our observations showed that binding of specific DNA sequences changed the protein structure and dynamics, and TFIIB may exist in two conformational states, which can be described by a different microenvironment of W52. Fluorescence studies using both intrinsic and exogenous fluorophores showed that these changes significantly depended on the recognition sequence and concerned various regions of the protein, including those interacting with other transcription factors and RNA polymerase II. DNA binding can cause rearrangements in regions of proteins interacting with the polymerase in a manner dependent on the recognized sequences, and therefore, influence the gene expression.
Applying the Concept of Peptide Uniqueness to Anti-Polio Vaccination.
Kanduc, Darja; Fasano, Candida; Capone, Giovanni; Pesce Delfino, Antonella; Calabrò, Michele; Polimeno, Lorenzo
2015-01-01
Although rare, adverse events may associate with anti-poliovirus vaccination thus possibly hampering global polio eradication worldwide. To design peptide-based anti-polio vaccines exempt from potential cross-reactivity risks and possibly able to reduce rare potential adverse events such as the postvaccine paralytic poliomyelitis due to the tendency of the poliovirus genome to mutate. Proteins from poliovirus type 1, strain Mahoney, were analyzed for amino acid sequence identity to the human proteome at the pentapeptide level, searching for sequences that (1) have zero percent of identity to human proteins, (2) are potentially endowed with an immunologic potential, and (3) are highly conserved among poliovirus strains. Sequence analyses produced a set of consensus epitopic peptides potentially able to generate specific anti-polio immune responses exempt from cross-reactivity with the human host. Peptide sequences unique to poliovirus proteins and conserved among polio strains might help formulate a specific and universal anti-polio vaccine able to react with multiple viral strains and exempt from the burden of possible cross-reactions with human proteins. As an additional advantage, using a peptide-based vaccine instead of current anti-polio DNA vaccines would eliminate the rare post-polio poliomyelitis cases and other disabling symptoms that may appear following vaccination.
MIPS: analysis and annotation of proteins from whole genomes
Mewes, H. W.; Amid, C.; Arnold, R.; Frishman, D.; Güldener, U.; Mannhaupt, G.; Münsterkötter, M.; Pagel, P.; Strack, N.; Stümpflen, V.; Warfsmann, J.; Ruepp, A.
2004-01-01
The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein–protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de). PMID:14681354
Okuda, A; Imagawa, M; Maeda, Y; Sakai, M; Muramatsu, M
1989-10-05
We have recently identified a typical enhancer, termed GPEI, located about 2.5 kilobases upstream from the transcription initiation site of the rat glutathione transferase P gene. Analyses of 5' and 3' deletion mutants revealed that the cis-acting sequence of GPEI contained the phorbol 12-O-tetradecanoate 13-acetate responsive element (TRE)-like sequence in it. For the maximal activity, however, GPEI required an adjacent upstream sequence of about 19 base pairs in addition to the TRE-like sequence. With the DNA binding gel-shift assay, we could detect protein(s) that specifically binds to the TRE-like sequence of GPEI fragment, which was possibly c-jun.c-fos complex or a similar protein complex. The sequence immediately upstream of the TRE-like sequence did not have any activity by itself, but augmented the latter activity by about 5-fold.
Daher, Rana K; Stewart, Gale; Boissinot, Maurice; Boudreau, Dominique K; Bergeron, Michel G
2015-04-01
Recombinase polymerase amplification (RPA) technology relies on three major proteins, recombinase proteins, single-strand binding proteins, and polymerases, to specifically amplify nucleic acid sequences in an isothermal format. The performance of RPA with respect to sequence mismatches of closely-related non-target molecules is not well documented and the influence of the number and distribution of mismatches in DNA sequences on RPA amplification reaction is not well understood. We investigated the specificity of RPA by testing closely-related species bearing naturally occurring mismatches for the tuf gene sequence of Pseudomonas aeruginosa and/or Mycobacterium tuberculosis and for the cfb gene sequence of Streptococcus agalactiae. In addition, the impact of the number and distribution of mismatches on RPA efficiency was assessed by synthetically generating 14 types of mismatched forward primers for detecting five bacterial species of high diagnostic relevance such as Clostridium difficile, Staphylococcus aureus, S. agalactiae, P. aeruginosa, and M. tuberculosis as well as Bacillus atropheus subsp. globigii for which we use the spores as internal control in diagnostic assays. A total of 87 mismatched primers were tested in this study. We observed that target specific RPA primers with mismatches (n > 1) at their 3'extrimity hampered RPA reaction. In addition, 3 mismatches covering both extremities and the center of the primer sequence negatively affected RPA yield. We demonstrated that the specificity of RPA was multifactorial. Therefore its application in clinical settings must be selected and validated a priori. We recommend that the selection of a target gene must consider the presence of closely-related non-target genes. It is advisable to choose target regions with a high number of mismatches (≥36%, relative to the size of amplicon) with respect to closely-related species and the best case scenario would be by choosing a unique target gene. Copyright © 2014 Elsevier Ltd. All rights reserved.
NASA Technical Reports Server (NTRS)
McMillan, R. Andrew; Howard, Jeanie; Zaluzec, Nestor J.; Kagawa, Hiromi K.; Li, Yi-Fen; Paavola, Chad D.; Trent, Jonathan D.
2004-01-01
Self-assembling biomolecules that form highly ordered structures have attracted interest as potential alternatives to conventional lithographic processes for patterning materials. Here we introduce a general technique for patterning materials on the nanoscale using genetically modified protein cage structures called chaperonins that self-assemble into crystalline templates. Constrained chemical synthesis of transition metal nanoparticles is specific to templates genetically functionalized with poly-Histidine sequences. These arrays of materials are ordered by the nanoscale structure of the crystallized protein. This system may be easily adapted to pattern a variety of materials given the rapidly growing list of peptide sequences selected by screening for specificity for inorganic materials.
NASA Technical Reports Server (NTRS)
Cai, X.; Henry, R. L.; Takemoto, L. J.; Guikema, J. A.; Wong, P. P.; Spooner, B. S. (Principal Investigator)
1992-01-01
The amino acid sequences of the beta and gamma subunit polypeptides of glutamine synthetase from bean (Phaseolus vulgaris L.) root nodules are very similar. However, there are small regions within the sequences that are significantly different between the two polypeptides. The sequences between amino acids 2 and 9 and between 264 and 274 are examples. Three peptides (gamma 2-9, gamma 264-274, and beta 264-274) corresponding to these sequences were synthesized. Antibodies against these peptides were raised in rabbits and purified with corresponding peptide-Sepharose affinity chromatography. Western blot analysis of polyacrylamide gel electrophoresis of bean nodule proteins demonstrated that the anti-beta 264-274 antibodies reacted specifically with the beta polypeptide and the anti-gamma 264-274 and anti-gamma 2-9 antibodies reacted specifically with the gamma polypeptide of the native and denatured glutamine synthetase. These results showed the feasibility of using synthetic peptides in developing antibodies that are capable of distinguishing proteins with similar primary structures.
Uhrig, R Glen; Kerk, David; Moorhead, Greg B
2013-12-01
Protein phosphorylation is a reversible regulatory process catalyzed by the opposing reactions of protein kinases and phosphatases, which are central to the proper functioning of the cell. Dysfunction of members in either the protein kinase or phosphatase family can have wide-ranging deleterious effects in both metazoans and plants alike. Previously, three bacterial-like phosphoprotein phosphatase classes were uncovered in eukaryotes and named according to the bacterial sequences with which they have the greatest similarity: Shewanella-like (SLP), Rhizobiales-like (RLPH), and ApaH-like (ALPH) phosphatases. Utilizing the wealth of data resulting from recently sequenced complete eukaryotic genomes, we conducted database searching by hidden Markov models, multiple sequence alignment, and phylogenetic tree inference with Bayesian and maximum likelihood methods to elucidate the pattern of evolution of eukaryotic bacterial-like phosphoprotein phosphatase sequences, which are predominantly distributed in photosynthetic eukaryotes. We uncovered a pattern of ancestral mitochondrial (SLP and RLPH) or archaeal (ALPH) gene entry into eukaryotes, supplemented by possible instances of lateral gene transfer between bacteria and eukaryotes. In addition to the previously known green algal and plant SLP1 and SLP2 protein forms, a more ancestral third form (SLP3) was found in green algae. Data from in silico subcellular localization predictions revealed class-specific differences in plants likely to result in distinct functions, and for SLP sequences, distinctive and possibly functionally significant differences between plants and nonphotosynthetic eukaryotes. Conserved carboxyl-terminal sequence motifs with class-specific patterns of residue substitutions, most prominent in photosynthetic organisms, raise the possibility of complex interactions with regulatory proteins.
Terminal region sequence variations in variola virus DNA.
Massung, R F; Loparev, V N; Knight, J C; Totmenin, A V; Chizhikov, V E; Parsons, J M; Safronov, P F; Gutorov, V V; Shchelkunov, S N; Esposito, J J
1996-07-15
Genome DNA terminal region sequences were determined for a Brazilian alastrim variola minor virus strain Garcia-1966 that was associated with an 0.8% case-fatality rate and African smallpox strains Congo-1970 and Somalia-1977 associated with variola major (9.6%) and minor (0.4%) mortality rates, respectively. A base sequence identity of > or = 98.8% was determined after aligning 30 kb of the left- or right-end region sequences with cognate sequences previously determined for Asian variola major strains India-1967 (31% death rate) and Bangladesh-1975 (18.5% death rate). The deduced amino acid sequences of putative proteins of > or = 65 amino acids also showed relatively high identity, although the Asian and African viruses were clearly more related to each other than to alastrim virus. Alastrim virus contained only 10 of 70 proteins that were 100% identical to homologs in Asian strains, and 7 alastrim-specific proteins were noted.
Battig, Mark R; Soontornworajit, Boonchoy; Wang, Yong
2012-08-01
Polymeric delivery systems have been extensively studied to achieve localized and controlled release of protein drugs. However, it is still challenging to control the release of multiple protein drugs in distinct stages according to the progress of disease or treatment. This study successfully demonstrates that multiple protein drugs can be released from aptamer-functionalized hydrogels with adjustable release rates at predetermined time points using complementary sequences (CSs) as biomolecular triggers. Because both aptamer-protein interactions and aptamer-CS hybridization are sequence-specific, aptamer-functionalized hydrogels constitute a promising polymeric delivery system for the programmable release of multiple protein drugs to treat complex human diseases.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Valley, Cary T.; Porter, Douglas F.; Qiu, Chen
2012-06-28
mRNA control hinges on the specificity and affinity of proteins for their RNA binding sites. Regulatory proteins must bind their own sites and reject even closely related noncognate sites. In the PUF [Pumilio and fem-3 binding factor (FBF)] family of RNA binding proteins, individual proteins discriminate differences in the length and sequence of binding sites, allowing each PUF to bind a distinct battery of mRNAs. Here, we show that despite these differences, the pattern of RNA interactions is conserved among PUF proteins: the two ends of the PUF protein make critical contacts with the two ends of the RNA sites.more » Despite this conserved 'two-handed' pattern of recognition, the RNA sequence is flexible. Among the binding sites of yeast Puf4p, RNA sequence dictates the pattern in which RNA bases are flipped away from the binding surface of the protein. Small differences in RNA sequence allow new modes of control, recruiting Puf5p in addition to Puf4p to a single site. This embedded information adds a new layer of biological meaning to the connections between RNA targets and PUF proteins.« less
Uversky, Vladimir N
2015-03-01
Intrinsically disordered proteins (IDPs) and intrinsically disordered protein regions (IDPRs) are functional proteins or regions that do not have unique 3D structures under functional conditions. Therefore, from the viewpoint of their lack of stable 3D structure, IDPs/IDPRs are inherently unstable. As much as structure and function of normal ordered globular proteins are determined by their amino acid sequences, the lack of unique 3D structure in IDPs/IDPRs and their disorder-based functionality are also encoded in the amino acid sequences. Because of their specific sequence features and distinctive conformational behavior, these intrinsically unstable proteins or regions have several applications in biotechnology. This review introduces some of the most characteristic features of IDPs/IDPRs (such as peculiarities of amino acid sequences of these proteins and regions, their major structural features, and peculiar responses to changes in their environment) and describes how these features can be used in the biotechnology, for example for the proteome-wide analysis of the abundance of extended IDPs, for recombinant protein isolation and purification, as polypeptide nanoparticles for drug delivery, as solubilization tools, and as thermally sensitive carriers of active peptides and proteins. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Herrera, Victoria L M; Steffen, Martin; Moran, Ann Marie; Tan, Glaiza A; Pasion, Khristine A; Rivera, Keith; Pappin, Darryl J; Ruiz-Opazo, Nelson
2016-06-14
In contrast to rat and mouse databases, the NCBI gene database lists the human dual-endothelin1/VEGFsp receptor (DEspR, formerly Dear) as a unitary transcribed pseudogene due to a stop [TGA]-codon at codon#14 in automated DNA and RNA sequences. However, re-analysis is needed given prior single gene studies detected a tryptophan [TGG]-codon#14 by manual Sanger sequencing, demonstrated DEspR translatability and functionality, and since the demonstration of actual non-translatability through expression studies, the standard-of-excellence for pseudogene designation, has not been performed. Re-analysis must meet UNIPROT criteria for demonstration of a protein's existence at the highest (protein) level, which a priori, would override DNA- or RNA-based deductions. To dissect the nucleotide sequence discrepancy, we performed Maxam-Gilbert sequencing and reviewed 727 RNA-seq entries. To comply with the highest level multiple UNIPROT criteria for determining DEspR's existence, we performed various experiments using multiple anti-DEspR monoclonal antibodies (mAbs) targeting distinct DEspR epitopes with one spanning the contested tryptophan [TGG]-codon#14, assessing: (a) DEspR protein expression, (b) predicted full-length protein size, (c) sequence-predicted protein-specific properties beyond codon#14: receptor glycosylation and internalization, (d) protein-partner interactions, and (e) DEspR functionality via DEspR-inhibition effects. Maxam-Gilbert sequencing and some RNA-seq entries demonstrate two guanines, hence a tryptophan [TGG]-codon#14 within a compression site spanning an error-prone compression sequence motif. Western blot analysis using anti-DEspR mAbs targeting distinct DEspR epitopes detect the identical glycosylated 17.5 kDa pull-down protein. Decrease in DEspR-protein size after PNGase-F digest demonstrates post-translational glycosylation, concordant with the consensus-glycosylation site beyond codon#14. Like other small single-transmembrane proteins, mass spectrometry analysis of anti-DEspR mAb pull-down proteins do not detect DEspR, but detect DEspR-protein interactions with proteins implicated in intracellular trafficking and cancer. FACS analyses also detect DEspR-protein in different human cancer stem-like cells (CSCs). DEspR-inhibition studies identify DEspR-roles in CSC survival and growth. Live cell imaging detects fluorescently-labeled anti-DEspR mAb targeted-receptor internalization, concordant with the single internalization-recognition sequence also located beyond codon#14. Data confirm translatability of DEspR, the full-length DEspR protein beyond codon#14, and elucidate DEspR-specific functionality. Along with detection of the tryptophan [TGG]-codon#14 within an error-prone compression site, cumulative data demonstrating DEspR protein existence fulfill multiple UNIPROT criteria, thus refuting its pseudogene designation.
Zhu, Xun; Xie, Shangbo; Armengaud, Jean; Xie, Wen; Guo, Zhaojiang; Kang, Shi; Wu, Qingjun; Wang, Shaoli; Xia, Jixing; He, Rongjun; Zhang, Youjun
2016-06-01
The diamondback moth, Plutella xylostella (L.), is the major cosmopolitan pest of brassica and other cruciferous crops. Its larval midgut is a dynamic tissue that interfaces with a wide variety of toxicological and physiological processes. The draft sequence of the P. xylostella genome was recently released, but its annotation remains challenging because of the low sequence coverage of this branch of life and the poor description of exon/intron splicing rules for these insects. Peptide sequencing by computational assignment of tandem mass spectra to genome sequence information provides an experimental independent approach for confirming or refuting protein predictions, a concept that has been termed proteogenomics. In this study, we carried out an in-depth proteogenomic analysis to complement genome annotation of P. xylostella larval midgut based on shotgun HPLC-ESI-MS/MS data by means of a multialgorithm pipeline. A total of 876,341 tandem mass spectra were searched against the predicted P. xylostella protein sequences and a whole-genome six-frame translation database. Based on a data set comprising 2694 novel genome search specific peptides, we discovered 439 novel protein-coding genes and corrected 128 existing gene models. To get the most accurate data to seed further insect genome annotation, more than half of the novel protein-coding genes, i.e. 235 over 439, were further validated after RT-PCR amplification and sequencing of the corresponding transcripts. Furthermore, we validated 53 novel alternative splicings. Finally, a total of 6764 proteins were identified, resulting in one of the most comprehensive proteogenomic study of a nonmodel animal. As the first tissue-specific proteogenomics analysis of P. xylostella, this study provides the fundamental basis for high-throughput proteomics and functional genomics approaches aimed at deciphering the molecular mechanisms of resistance and controlling this pest. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.
Watada, Hirotaka; Mirmira, Raghavendra G.; Kalamaras, Julie; German, Michael S.
2000-01-01
The developmentally important homeodomain transcription factors of the NK-2 class contain a highly conserved region, the NK2-specific domain (NK2-SD). The function of this domain, however, remains unknown. The primary structure of the NK2-SD suggests that it might function as an accessory DNA-binding domain or as a protein–protein interaction interface. To assess the possibility that the NK2-SD may contribute to DNA-binding specificity, we used a PCR-based approach to identify a consensus DNA-binding sequences for Nkx2.2, an NK-2 family member involved in pancreas and central nervous system development. The consensus sequence (TCTAAGTGAGCTT) is similar to the known binding sequences for other NK-2 homeodomain proteins, but we show that the NK2-SD does not contribute significantly to specific DNA binding to this sequence. To determine whether the NK2-SD contributes to transactivation, we used GAL4-Nkx2.2 fusion constructs to map a powerful transcriptional activation domain in the C-terminal region beyond the conserved NK2-SD. Interestingly, this C-terminal region functions as a transcriptional activator only in the absence of an intact NK2-SD. The NK2-SD also can mask transactivation from the paired homeodomain transcription factor Pax6, but it has no effect on transcription by itself. These results demonstrate that the NK2-SD functions as an intramolecular regulator of the C-terminal activation domain in Nkx2.2 and support a model in which interactions through the NK2-SD regulate the ability of NK-2-class proteins to activate specific genes during development. PMID:10944215
Spatially restricted G protein-coupled receptor activity via divergent endocytic compartments.
Jean-Alphonse, Frederic; Bowersox, Shanna; Chen, Stanford; Beard, Gemma; Puthenveedu, Manojkumar A; Hanyaloglu, Aylin C
2014-02-14
Postendocytic sorting of G protein-coupled receptors (GPCRs) is driven by their interactions between highly diverse receptor sequence motifs with their interacting proteins, such as postsynaptic density protein (PSD95), Drosophila disc large tumor suppressor (Dlg1), zonula occludens-1 protein (zo-1) (PDZ) domain proteins. However, whether these diverse interactions provide an underlying functional specificity, in addition to driving sorting, is unknown. Here we identify GPCRs that recycle via distinct PDZ ligand/PDZ protein pairs that exploit their recycling machinery primarily for targeted endosomal localization and signaling specificity. The luteinizing hormone receptor (LHR) and β2-adrenergic receptor (B2AR), two GPCRs sorted to the regulated recycling pathway, underwent divergent trafficking to distinct endosomal compartments. Unlike B2AR, which traffics to early endosomes (EE), LHR internalizes to distinct pre-early endosomes (pre-EEs) for its recycling. Pre-EE localization required interactions of the LHR C-terminal tail with the PDZ protein GAIP-interacting protein C terminus, inhibiting its traffic to EEs. Rerouting the LHR to EEs, or EE-localized GPCRs to pre-EEs, spatially reprograms MAPK signaling. Furthermore, LHR-mediated activation of MAPK signaling requires internalization and is maintained upon loss of the EE compartment. We propose that combinatorial specificity between GPCR sorting sequences and interacting proteins dictates an unprecedented spatiotemporal control in GPCR signal activity.
PASS2: an automated database of protein alignments organised as structural superfamilies.
Bhaduri, Anirban; Pugalenthi, Ganesan; Sowdhamini, Ramanathan
2004-04-02
The functional selection and three-dimensional structural constraints of proteins in nature often relates to the retention of significant sequence similarity between proteins of similar fold and function despite poor sequence identity. Organization of structure-based sequence alignments for distantly related proteins, provides a map of the conserved and critical regions of the protein universe that is useful for the analysis of folding principles, for the evolutionary unification of protein families and for maximizing the information return from experimental structure determination. The Protein Alignment organised as Structural Superfamily (PASS2) database represents continuously updated, structural alignments for evolutionary related, sequentially distant proteins. An automated and updated version of PASS2 is, in direct correspondence with SCOP 1.63, consisting of sequences having identity below 40% among themselves. Protein domains have been grouped into 628 multi-member superfamilies and 566 single member superfamilies. Structure-based sequence alignments for the superfamilies have been obtained using COMPARER, while initial equivalencies have been derived from a preliminary superposition using LSQMAN or STAMP 4.0. The final sequence alignments have been annotated for structural features using JOY4.0. The database is supplemented with sequence relatives belonging to different genomes, conserved spatially interacting and structural motifs, probabilistic hidden markov models of superfamilies based on the alignments and useful links to other databases. Probabilistic models and sensitive position specific profiles obtained from reliable superfamily alignments aid annotation of remote homologues and are useful tools in structural and functional genomics. PASS2 presents the phylogeny of its members both based on sequence and structural dissimilarities. Clustering of members allows us to understand diversification of the family members. The search engine has been improved for simpler browsing of the database. The database resolves alignments among the structural domains consisting of evolutionarily diverged set of sequences. Availability of reliable sequence alignments of distantly related proteins despite poor sequence identity and single-member superfamilies permit better sampling of structures in libraries for fold recognition of new sequences and for the understanding of protein structure-function relationships of individual superfamilies. PASS2 is accessible at http://www.ncbs.res.in/~faculty/mini/campass/pass2.html
Characterization and biological properties of a new staphylococcal exotoxin
1994-01-01
Staphylococcus aureus strain D4508 is a toxic shock syndrome toxin 1- negative clinical isolate from a nonmenstrual case of toxic shock syndrome (TSS). In the present study, we have purified and characterized a new exotoxin from the extracellular products of this strain. This toxin was found to have a molecular mass of 25.14 kD by mass spectrometry and an isoelectric point of 5.65 by isoelectric focusing. We have also cloned and sequenced its corresponding genomic determinant. The DNA sequence encoding the mature protein was found to be 654 base pairs and is predicted to encode a polypeptide of 218 amino acids. The deduced protein contains an NH2-terminal sequence identical to that of the native protein. The calculated molecular weight (25.21 kD) of the recombinant mature protein is also consistent with that of the native molecules. When injected intravenously into rabbits, both the native and recombinant toxins induce an acute TSS-like illness characterized by high fever, hypotension, diarrhea, shock, and in some cases death, with classical histological findings of TSS. Furthermore, the activity of the toxin is specifically enhanced by low quantities of endotoxins. The toxicity can be blocked by rabbit immunoglobulin G antibody specific for the toxin. Western blotting and DNA sequencing data confirm that the protein is a unique staphylococcal exotoxin, yet shares significant sequence homology with known staphylococcal enterotoxins, especially the SEA, SED, and SEE toxins. We conclude therefore that this 25-kD protein belongs to the staphylococcal enterotoxin gene family that is capable of inducing a TSS-like illness in rabbits. PMID:7964453
Analysis of Epstein-Barr Virus Genomes and Expression Profiles in Gastric Adenocarcinoma.
Borozan, Ivan; Zapatka, Marc; Frappier, Lori; Ferretti, Vincent
2018-01-15
Epstein-Barr virus (EBV) is a causative agent of a variety of lymphomas, nasopharyngeal carcinoma (NPC), and ∼9% of gastric carcinomas (GCs). An important question is whether particular EBV variants are more oncogenic than others, but conclusions are currently hampered by the lack of sequenced EBV genomes. Here, we contribute to this question by mining whole-genome sequences of 201 GCs to identify 13 EBV-positive GCs and by assembling 13 new EBV genome sequences, almost doubling the number of available GC-derived EBV genome sequences and providing the first non-Asian EBV genome sequences from GC. Whole-genome sequence comparisons of all EBV isolates sequenced to date (85 from tumors and 57 from healthy individuals) showed that most GC and NPC EBV isolates were closely related although American Caucasian GC samples were more distant, suggesting a geographical component. However, EBV GC isolates were found to contain some consistent changes in protein sequences regardless of geographical origin. In addition, transcriptome data available for eight of the EBV-positive GCs were analyzed to determine which EBV genes are expressed in GC. In addition to the expected latency proteins (EBNA1, LMP1, and LMP2A), specific subsets of lytic genes were consistently expressed that did not reflect a typical lytic or abortive lytic infection, suggesting a novel mechanism of EBV gene regulation in the context of GC. These results are consistent with a model in which a combination of specific latent and lytic EBV proteins promotes tumorigenesis. IMPORTANCE Epstein-Barr virus (EBV) is a widespread virus that causes cancer, including gastric carcinoma (GC), in a small subset of individuals. An important question is whether particular EBV variants are more cancer associated than others, but more EBV sequences are required to address this question. Here, we have generated 13 new EBV genome sequences from GC, almost doubling the number of EBV sequences from GC isolates and providing the first EBV sequences from non-Asian GC. We further identify sequence changes in some EBV proteins common to GC isolates. In addition, gene expression analysis of eight of the EBV-positive GCs showed consistent expression of both the expected latency proteins and a subset of lytic proteins that was not consistent with typical lytic or abortive lytic expression. These results suggest that novel mechanisms activate expression of some EBV lytic proteins and that their expression may contribute to oncogenesis. Copyright © 2018 American Society for Microbiology.
Ruff, Kiersten M; Roberts, Stefan; Chilkoti, Ashutosh; Pappu, Rohit V
2018-06-24
Proteins and synthetic polymers can undergo phase transitions in response to changes to intensive solution parameters such as temperature, proton chemical potentials (pH), and hydrostatic pressure. For proteins and protein-based polymers, the information required for stimulus responsive phase transitions is encoded in their amino acid sequence. Here, we review some of the key physical principles that govern the phase transitions of archetypal intrinsically disordered protein polymers (IDPPs). These are disordered proteins with highly repetitive amino acid sequences. Advances in recombinant technologies have enabled the design and synthesis of protein sequences of a variety of sequence complexities and lengths. We summarize insights that have been gleaned from the design and characterization of IDPPs that undergo thermo-responsive phase transitions and build on these insights to present a general framework for IDPPs with pH and pressure responsive phase behavior. In doing so, we connect the stimulus responsive phase behavior of IDPPs with repetitive sequences to the coil-to-globule transitions that these sequences undergo at the single chain level in response to changes in stimuli. The proposed framework and ongoing studies of stimulus responsive phase behavior of designed IDPPs have direct implications in bioengineering, where designing sequences with bespoke material properties broadens the spectrum of applications, and in biology and medicine for understanding the sequence-specific driving forces for the formation of protein-based membraneless organelles as well as biological matrices that act as scaffolds for cells and mediators of cell-to-cell communication. Copyright © 2018. Published by Elsevier Ltd.
Busk, Peter Kamp; Lange, Lene
2013-06-01
Functional prediction of carbohydrate-active enzymes is difficult due to low sequence identity. However, similar enzymes often share a few short motifs, e.g., around the active site, even when the overall sequences are very different. To exploit this notion for functional prediction of carbohydrate-active enzymes, we developed a simple algorithm, peptide pattern recognition (PPR), that can divide proteins into groups of sequences that share a set of short conserved sequences. When this method was used on 118 glycoside hydrolase 5 proteins with 9% average pairwise identity and representing four characterized enzymatic functions, 97% of the proteins were sorted into groups correlating with their enzymatic activity. Furthermore, we analyzed 8,138 glycoside hydrolase 13 proteins including 204 experimentally characterized enzymes with 28 different functions. There was a 91% correlation between group and enzyme activity. These results indicate that the function of carbohydrate-active enzymes can be predicted with high precision by finding short, conserved motifs in their sequences. The glycoside hydrolase 61 family is important for fungal biomass conversion, but only a few proteins of this family have been functionally characterized. Interestingly, PPR divided 743 glycoside hydrolase 61 proteins into 16 subfamilies useful for targeted investigation of the function of these proteins and pinpointed three conserved motifs with putative importance for enzyme activity. Furthermore, the conserved sequences were useful for cloning of new, subfamily-specific glycoside hydrolase 61 proteins from 14 fungi. In conclusion, identification of conserved sequence motifs is a new approach to sequence analysis that can predict carbohydrate-active enzyme functions with high precision.
Aureobasidium pullulans xylanase, gene and signal sequence
Li Xinliang; Ljungdahl, L.G.
1997-01-07
A xylanase from Aureobasidium pullulans having a high specific activity is provided, as well as a signal protein for controlling excretion into cell culture medium of proteins to which it is attached. DNA encoding these proteins is also provided. 4 figs.
Data mining of enzymes using specific peptides
2009-01-01
Background Predicting the function of a protein from its sequence is a long-standing challenge of bioinformatic research, typically addressed using either sequence-similarity or sequence-motifs. We employ the novel motif method that consists of Specific Peptides (SPs) that are unique to specific branches of the Enzyme Commission (EC) functional classification. We devise the Data Mining of Enzymes (DME) methodology that allows for searching SPs on arbitrary proteins, determining from its sequence whether a protein is an enzyme and what the enzyme's EC classification is. Results We extract novel SP sets from Swiss-Prot enzyme data. Using a training set of July 2006, and test sets of July 2008, we find that the predictive power of SPs, both for true-positives (enzymes) and true-negatives (non-enzymes), depends on the coverage length of all SP matches (the number of amino-acids matched on the protein sequence). DME is quite different from BLAST. Comparing the two on an enzyme test set of July 2008, we find that DME has lower recall. On the other hand, DME can provide predictions for proteins regarded by BLAST as having low homologies with known enzymes, thus supplying complementary information. We test our method on a set of proteins belonging to 10 bacteria, dated July 2008, establishing the usefulness of the coverage-length cutoff to determine true-negatives. Moreover, sifting through our predictions we find that some of them have been substantiated by Swiss-Prot annotations by July 2009. Finally we extract, for production purposes, a novel SP set trained on all Swiss-Prot enzymes as of July 2009. This new set increases considerably the recall of DME. The new SP set is being applied to three metagenomes: Sargasso Sea with over 1,000,000 proteins, producing predictions of over 220,000 enzymes, and two human gut metagenomes. The outcome of these analyses can be characterized by the enzymatic profile of the metagenomes, describing the relative numbers of enzymes observed for different EC categories. Conclusions Employing SPs for predicting enzymatic activity of proteins works well once one utilizes coverage-length criteria. In our analysis, L ≥ 7 has led to highly accurate results. PMID:20034383
Emergence of novel domains in proteins
2013-01-01
Background Proteins are composed of a combination of discrete, well-defined, sequence domains, associated with specific functions that have arisen at different times during evolutionary history. The emergence of novel domains is related to protein functional diversification and adaptation. But currently little is known about how novel domains arise and how they subsequently evolve. Results To gain insights into the impact of recently emerged domains in protein evolution we have identified all human young protein domains that have emerged in approximately the past 550 million years. We have classified them into vertebrate-specific and mammalian-specific groups, and compared them to older domains. We have found 426 different annotated young domains, totalling 995 domain occurrences, which represent about 12.3% of all human domains. We have observed that 61.3% of them arose in newly formed genes, while the remaining 38.7% are found combined with older domains, and have very likely emerged in the context of a previously existing protein. Young domains are preferentially located at the N-terminus of the protein, indicating that, at least in vertebrates, novel functional sequences often emerge there. Furthermore, young domains show significantly higher non-synonymous to synonymous substitution rates than older domains using human and mouse orthologous sequence comparisons. This is also true when we compare young and old domains located in the same protein, suggesting that recently arisen domains tend to evolve in a less constrained manner than older domains. Conclusions We conclude that proteins tend to gain domains over time, becoming progressively longer. We show that many proteins are made of domains of different age, and that the fastest evolving parts correspond to the domains that have been acquired more recently. PMID:23425224
Sharan, Malvika; Förstner, Konrad U; Eulalio, Ana; Vogel, Jörg
2017-06-20
RNA-binding proteins (RBPs) have been established as core components of several post-transcriptional gene regulation mechanisms. Experimental techniques such as cross-linking and co-immunoprecipitation have enabled the identification of RBPs, RNA-binding domains (RBDs) and their regulatory roles in the eukaryotic species such as human and yeast in large-scale. In contrast, our knowledge of the number and potential diversity of RBPs in bacteria is poorer due to the technical challenges associated with the existing global screening approaches. We introduce APRICOT, a computational pipeline for the sequence-based identification and characterization of proteins using RBDs known from experimental studies. The pipeline identifies functional motifs in protein sequences using position-specific scoring matrices and Hidden Markov Models of the functional domains and statistically scores them based on a series of sequence-based features. Subsequently, APRICOT identifies putative RBPs and characterizes them by several biological properties. Here we demonstrate the application and adaptability of the pipeline on large-scale protein sets, including the bacterial proteome of Escherichia coli. APRICOT showed better performance on various datasets compared to other existing tools for the sequence-based prediction of RBPs by achieving an average sensitivity and specificity of 0.90 and 0.91 respectively. The command-line tool and its documentation are available at https://pypi.python.org/pypi/bio-apricot. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Parker, K A; Steitz, J A
1987-01-01
The human U3 ribonucleoprotein (RNP) has been analyzed to determine its protein constituents, sites of protein-RNA interaction, and RNA secondary structure. By using anti-U3 RNP antibodies and extracts prepared from HeLa cells labeled in vivo, the RNP was found to contain four nonphosphorylated proteins of 36, 30, 13, and 12.5 kilodaltons and two phosphorylated proteins of 74 and 59 kilodaltons. U3 nucleotides 72-90, 106-121, 154-166, and 190-217 must contain sites that interact with proteins since these regions are immunoprecipitated after treatment of the RNP with RNase A or T1. The secondary structure was probed with specific nucleases and by chemical modification with single-strand-specific reagents that block subsequent reverse transcription. Regions that are single stranded (and therefore potentially able to interact with a substrate RNA) include an evolutionarily conserved sequence at nucleotides 104-112 and nonconserved sequences at nucleotides 65-74, 80-84, and 88-93. Nucleotides 159-168 do not appear to be highly accessible, thus making it unlikely that this U3 sequence base pairs with sequences near the 5.8S rRNA-internal transcribed spacer II junction, as previously proposed. Alternative functions of the U3 RNP are discussed, including the possibility that U3 may participate in a processing event near the 3' end of 28S rRNA. Images PMID:2959855
Structure-Templated Predictions of Novel Protein Interactions from Sequence Information
Betel, Doron; Breitkreuz, Kevin E; Isserlin, Ruth; Dewar-Darch, Danielle; Tyers, Mike; Hogue, Christopher W. V
2007-01-01
The multitude of functions performed in the cell are largely controlled by a set of carefully orchestrated protein interactions often facilitated by specific binding of conserved domains in the interacting proteins. Interacting domains commonly exhibit distinct binding specificity to short and conserved recognition peptides called binding profiles. Although many conserved domains are known in nature, only a few have well-characterized binding profiles. Here, we describe a novel predictive method known as domain–motif interactions from structural topology (D-MIST) for elucidating the binding profiles of interacting domains. A set of domains and their corresponding binding profiles were derived from extant protein structures and protein interaction data and then used to predict novel protein interactions in yeast. A number of the predicted interactions were verified experimentally, including new interactions of the mitotic exit network, RNA polymerases, nucleotide metabolism enzymes, and the chaperone complex. These results demonstrate that new protein interactions can be predicted exclusively from sequence information. PMID:17892321
A close relative of the nuclear, chromosomal high-mobility group protein HMG1 in yeast mitochondria.
Diffley, J F; Stillman, B
1991-01-01
ABF2 (ARS-binding factor 2), a small, basic DNA-binding protein that binds specifically to the autonomously replicating sequence ARS1, is located primarily in the mitochondria of the yeast Saccharomyces cerevisiae. The abundance of ABF2 and the phenotype of abf2- null mutants argue that this protein plays a key role in the structure, maintenance, and expression of the yeast mitochondrial genome. The predicted amino acid sequence of ABF2 is closely related to the high-mobility group proteins HMG1 and HMG2 from vertebrate cell nuclei and to several other DNA-binding proteins. Additionally, ABF2 and the other HMG-related proteins are related to a globular domain from the heat shock protein hsp70 family. ABF2 interacts with DNA both nonspecifically and in a specific manner within regulatory regions, suggesting a mechanism whereby it may aid in compacting the mitochondrial genome without interfering with expression. Images PMID:1881919
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mangelsen, Elke; Kilian, Joachim; Berendzen, Kenneth W.
2008-02-01
WRKY proteins belong to the WRKY-GCM1 superfamily of zinc finger transcription factors that have been subject to a large plant-specific diversification. For the cereal crop barley (Hordeum vulgare), three different WRKY proteins have been characterized so far, as regulators in sucrose signaling, in pathogen defense, and in response to cold and drought, respectively. However, their phylogenetic relationship remained unresolved. In this study, we used the available sequence information to identify a minimum number of 45 barley WRKY transcription factor (HvWRKY) genes. According to their structural features the HvWRKY factors were classified into the previously defined polyphyletic WRKY subgroups 1 tomore » 3. Furthermore, we could assign putative orthologs of the HvWRKY proteins in Arabidopsis and rice. While in most cases clades of orthologous proteins were formed within each group or subgroup, other clades were composed of paralogous proteins for the grasses and Arabidopsis only, which is indicative of specific gene radiation events. To gain insight into their putative functions, we examined expression profiles of WRKY genes from publicly available microarray data resources and found group specific expression patterns. While putative orthologs of the HvWRKY transcription factors have been inferred from phylogenetic sequence analysis, we performed a comparative expression analysis of WRKY genes in Arabidopsis and barley. Indeed, highly correlative expression profiles were found between some of the putative orthologs. HvWRKY genes have not only undergone radiation in monocot or dicot species, but exhibit evolutionary traits specific to grasses. HvWRKY proteins exhibited not only sequence similarities between orthologs with Arabidopsis, but also relatedness in their expression patterns. This correlative expression is indicative for a putative conserved function of related WRKY proteins in mono- and dicot species.« less
Busslinger, M; Portmann, R; Irminger, J C; Birnstiel, M L
1980-01-01
The DNA sequences of the entire structural H4, H3, H2A and H2B genes and of their 5' flanking regions have been determined in the histone DNA clone h19 of the sea urchin Psammechinus miliaris. In clone h19 the polarity of transcription and the relative arrangement of the histone genes is identical to that in clone h22 of the same species. The histone proteins encoded by h19 DNA differ in their primary structure from those encoded by clone h22 and have been compared to histone protein sequences of other sea urchin species as well as other eukaryotes. A comparative analysis of the 5' flanking DNA sequences of the structural histone genes in both clones revealed four ubiquitous sequence motifs; a pentameric element GATCC, followed at short distance by the Hogness box GTATAAATAG, a conserved sequence PyCATTCPu, in or near which the 5' ends of the mRNAs map in h22 DNA and lastly a sequence A, containing the initiation codon. These sequences are also found, sometimes in modified version, in front of other eukaryotic genes transcribed by polymerase II. When prelude sequences of isocoding histone genes in clone h19 and h22 are compared areas of homology are seen to extend beyond the ubiquitous sequence motifs towards the divergent AT-rich spacer and terminate between approximately 140 and 240 nucleotides away from the structural gene. These prelude regions contain quite large conservative sequence blocks which are specific for each type of histone genes. Images PMID:7443547
Kurotani, Atsushi; Sakurai, Tetsuya
2015-01-01
Recent proteome analyses have reported that intrinsically disordered regions (IDRs) of proteins play important roles in biological processes. In higher plants whose genomes have been sequenced, the correlation between IDRs and post-translational modifications (PTMs) has been reported. The genomes of various eukaryotic algae as common ancestors of plants have also been sequenced. However, no analysis of the relationship to protein properties such as structure and PTMs in algae has been reported. Here, we describe correlations between IDR content and the number of PTM sites for phosphorylation, glycosylation, and ubiquitination, and between IDR content and regions rich in proline, glutamic acid, serine, and threonine (PEST) and transmembrane helices in the sequences of 20 algae proteomes. Phosphorylation, O-glycosylation, ubiquitination, and PEST preferentially occurred in disordered regions. In contrast, transmembrane helices were favored in ordered regions. N-glycosylation tended to occur in ordered regions in most of the studied algae; however, it correlated positively with disordered protein content in diatoms. Additionally, we observed that disordered protein content and the number of PTM sites were significantly increased in the species-specific protein clusters compared to common protein clusters among the algae. Moreover, there were specific relationships between IDRs and PTMs among the algae from different groups. PMID:26307970
Kurotani, Atsushi; Sakurai, Tetsuya
2015-08-20
Recent proteome analyses have reported that intrinsically disordered regions (IDRs) of proteins play important roles in biological processes. In higher plants whose genomes have been sequenced, the correlation between IDRs and post-translational modifications (PTMs) has been reported. The genomes of various eukaryotic algae as common ancestors of plants have also been sequenced. However, no analysis of the relationship to protein properties such as structure and PTMs in algae has been reported. Here, we describe correlations between IDR content and the number of PTM sites for phosphorylation, glycosylation, and ubiquitination, and between IDR content and regions rich in proline, glutamic acid, serine, and threonine (PEST) and transmembrane helices in the sequences of 20 algae proteomes. Phosphorylation, O-glycosylation, ubiquitination, and PEST preferentially occurred in disordered regions. In contrast, transmembrane helices were favored in ordered regions. N-glycosylation tended to occur in ordered regions in most of the studied algae; however, it correlated positively with disordered protein content in diatoms. Additionally, we observed that disordered protein content and the number of PTM sites were significantly increased in the species-specific protein clusters compared to common protein clusters among the algae. Moreover, there were specific relationships between IDRs and PTMs among the algae from different groups.
Evolutionary and biophysical relationships among the papillomavirus E2 proteins.
Blakaj, Dukagjin M; Fernandez-Fuentes, Narcis; Chen, Zigui; Hegde, Rashmi; Fiser, Andras; Burk, Robert D; Brenowitz, Michael
2009-01-01
Infection by human papillomavirus (HPV) may result in clinical conditions ranging from benign warts to invasive cancer. The HPV E2 protein represses oncoprotein transcription and is required for viral replication. HPV E2 binds to palindromic DNA sequences of highly conserved four base pair sequences flanking an identical length variable 'spacer'. E2 proteins directly contact the conserved but not the spacer DNA. Variation in naturally occurring spacer sequences results in differential protein affinity that is dependent on their sensitivity to the spacer DNA's unique conformational and/or dynamic properties. This article explores the biophysical character of this core viral protein with the goal of identifying characteristics that associated with risk of virally caused malignancy. The amino acid sequence, 3d structure and electrostatic features of the E2 protein DNA binding domain are highly conserved; specific interactions with DNA binding sites have also been conserved. In contrast, the E2 protein's transactivation domain does not have extensive surfaces of highly conserved residues. Rather, regions of high conservation are localized to small surface patches. Implications to cancer biology are discussed.
Pasricha, Gunisha; Mishra, Akhilesh C.; Chakrabarti, Alok K.
2012-01-01
Please cite this paper as: Pasricha et al. (2012) Comprehensive global amino acid sequence analysis of PB1F2 protein of influenza A H5N1 viruses and the Influenza A virus subtypes responsible for the 20th‐century pandemics. Influenza and Other Respiratory Viruses 7(4), 497–505. Background PB1F2 is the 11th protein of influenza A virus translated from +1 alternate reading frame of PB1 gene. Since the discovery, varying sizes and functions of the PB1F2 protein of influenza A viruses have been reported. Selection of PB1 gene segment in the pandemics, variable size and pleiotropic effect of PB1F2 intrigued us to analyze amino acid sequences of this protein in various influenza A viruses. Methods Amino acid sequences for PB1F2 protein of influenza A H5N1, H1N1, H2N2, and H3N2 subtypes were obtained from Influenza Research Database. Multiple sequence alignments of the PB1F2 protein sequences of the aforementioned subtypes were used to determine the size, variable and conserved domains and to perform mutational analysis. Results Analysis showed that 96·4% of the H5N1 influenza viruses harbored full‐length PB1F2 protein. Except for the 2009 pandemic H1N1 virus, all the subtypes of the 20th‐century pandemic influenza viruses contained full‐length PB1F2 protein. Through the years, PB1F2 protein of the H1N1 and H3N2 viruses has undergone much variation. PB1F2 protein sequences of H5N1 viruses showed both human‐ and avian host‐specific conserved domains. Global database of PB1F2 protein revealed that N66S mutation was present only in 3·8% of the H5N1 strains. We found a novel mutation, N84S in the PB1F2 protein of 9·35% of the highly pathogenic avian influenza H5N1 influenza viruses. Conclusions Varying sizes and mutations of the PB1F2 protein in different influenza A virus subtypes with pandemic potential were obtained. There was genetic divergence of the protein in various hosts which highlighted the host‐specific evolution of the virus. However, studies are required to correlate this sequence variability with the virulence and pathogenicity. PMID:22788742
Physical Model of the Genotype-to-Phenotype Map of Proteins
NASA Astrophysics Data System (ADS)
Tlusty, Tsvi; Libchaber, Albert; Eckmann, Jean-Pierre
2017-04-01
How DNA is mapped to functional proteins is a basic question of living matter. We introduce and study a physical model of protein evolution which suggests a mechanical basis for this map. Many proteins rely on large-scale motion to function. We therefore treat protein as learning amorphous matter that evolves towards such a mechanical function: Genes are binary sequences that encode the connectivity of the amino acid network that makes a protein. The gene is evolved until the network forms a shear band across the protein, which allows for long-range, soft modes required for protein function. The evolution reduces the high-dimensional sequence space to a low-dimensional space of mechanical modes, in accord with the observed dimensional reduction between genotype and phenotype of proteins. Spectral analysis of the space of 1 06 solutions shows a strong correspondence between localization around the shear band of both mechanical modes and the sequence structure. Specifically, our model shows how mutations are correlated among amino acids whose interactions determine the functional mode.
Pepperl, Sandra; Benninger-Döring, Gerlinde; Modrow, Susanne; Wolf, Hans; Jilg, Wolfgang
1998-01-01
We analyzed the immediate-early transactivator Rta of Epstein-Barr virus (EBV) for its role as a target for specific cytotoxic T lymphocytes (CTL). Panels of overlapping peptides covering the entire amino acid sequence of Rta were synthesized and used to induce and analyze specific CTL responses in EBV-positive donors. Using peptide-pulsed target cells, we found nine different CTL epitopes that are distributed over the entire protein sequence. One epitope restricted by HLA-A24 could be mapped to the decameric sequence DYCNVLNKEF between amino acid positions 28 and 37 of the Rta protein. A second epitope could be assigned to the same region of Rta (residues 25 to 39) and was shown to be restricted by HLA-B18. Another, minimal epitope could be mapped to the nonameric sequence ATIGTAMYK between amino acid positions 134 and 142; this peptide was restricted by HLA-A11. Another four epitopes were proven to be restricted by HLA-A2, -A3, -B61, and -Cw4 and were located between Rta residues 225 and 239, 145 and 159, 529 and 543, and 393 and 407, respectively. For two other epitopes, only the location within the Rta protein is known so far (residues 121 to 135 and 441 to 455); their exact HLA restriction patterns have not yet been identified. Using target cells infected with recombinant vaccinia virus containing the gene for Rta, we showed that six of eight Rta-specific CTL lines recognized the corresponding peptides also after endogenous processing. These data suggest that Rta comprises an important target for EBV-specific cellular cytotoxicity. Together with recent findings of other immediate-early and early proteins also acting as CTL targets, they reveal the role of proteins of the lytic cycle in the immune recognition of EBV-infected cells. PMID:9765404
Targeted Quantitation of Proteins by Mass Spectrometry
2013-01-01
Quantitative measurement of proteins is one of the most fundamental analytical tasks in a biochemistry laboratory, but widely used immunochemical methods often have limited specificity and high measurement variation. In this review, we discuss applications of multiple-reaction monitoring (MRM) mass spectrometry, which allows sensitive, precise quantitative analyses of peptides and the proteins from which they are derived. Systematic development of MRM assays is permitted by databases of peptide mass spectra and sequences, software tools for analysis design and data analysis, and rapid evolution of tandem mass spectrometer technology. Key advantages of MRM assays are the ability to target specific peptide sequences, including variants and modified forms, and the capacity for multiplexing that allows analysis of dozens to hundreds of peptides. Different quantitative standardization methods provide options that balance precision, sensitivity, and assay cost. Targeted protein quantitation by MRM and related mass spectrometry methods can advance biochemistry by transforming approaches to protein measurement. PMID:23517332
Targeted quantitation of proteins by mass spectrometry.
Liebler, Daniel C; Zimmerman, Lisa J
2013-06-04
Quantitative measurement of proteins is one of the most fundamental analytical tasks in a biochemistry laboratory, but widely used immunochemical methods often have limited specificity and high measurement variation. In this review, we discuss applications of multiple-reaction monitoring (MRM) mass spectrometry, which allows sensitive, precise quantitative analyses of peptides and the proteins from which they are derived. Systematic development of MRM assays is permitted by databases of peptide mass spectra and sequences, software tools for analysis design and data analysis, and rapid evolution of tandem mass spectrometer technology. Key advantages of MRM assays are the ability to target specific peptide sequences, including variants and modified forms, and the capacity for multiplexing that allows analysis of dozens to hundreds of peptides. Different quantitative standardization methods provide options that balance precision, sensitivity, and assay cost. Targeted protein quantitation by MRM and related mass spectrometry methods can advance biochemistry by transforming approaches to protein measurement.
Identification and application of self-binding zipper-like sequences in SARS-CoV spike protein.
Zhang, Si Min; Liao, Ying; Neo, Tuan Ling; Lu, Yanning; Liu, Ding Xiang; Vahlne, Anders; Tam, James P
2018-05-22
Self-binding peptides containing zipper-like sequences, such as the Leu/Ile zipper sequence within the coiled coil regions of proteins and the cross-β spine steric zippers within the amyloid-like fibrils, could bind to the protein-of-origin through homophilic sequence-specific zipper motifs. These self-binding sequences represent opportunities for the development of biochemical tools and/or therapeutics. Here, we report on the identification of a putative self-binding β-zipper-forming peptide within the severe acute respiratory syndrome-associated coronavirus spike (S) protein and its application in viral detection. Peptide array scanning of overlapping peptides covering the entire length of S protein identified 34 putative self-binding peptides of six clusters, five of which contained octapeptide core consensus sequences. The Cluster I consensus octapeptide sequence GINITNFR was predicted by the Eisenberg's 3D profile method to have high amyloid-like fibrillation potential through steric β-zipper formation. Peptide C6 containing the Cluster I consensus sequence was shown to oligomerize and form amyloid-like fibrils. Taking advantage of this, C6 was further applied to detect the S protein expression in vitro by fluorescence staining. Meanwhile, the coiled-coil-forming Leu/Ile heptad repeat sequences within the S protein were under-represented during peptide array scanning, in agreement with that long peptide lengths were required to attain high helix-mediated interaction avidity. The data suggest that short β-zipper-like self-binding peptides within the S protein could be identified through combining the peptide scanning and predictive methods, and could be exploited as biochemical detection reagents for viral infection. Copyright © 2018. Published by Elsevier Ltd.
Ilk, Nicola; Völlenkle, Christine; Egelseer, Eva M.; Breitwieser, Andreas; Sleytr, Uwe B.; Sára, Margit
2002-01-01
The nucleotide sequence encoding the crystalline bacterial cell surface (S-layer) protein SbpA of Bacillus sphaericus CCM 2177 was determined by a PCR-based technique using four overlapping fragments. The entire sbpA sequence indicated one open reading frame of 3,804 bp encoding a protein of 1,268 amino acids with a theoretical molecular mass of 132,062 Da and a calculated isoelectric point of 4.69. The N-terminal part of SbpA, which is involved in anchoring the S-layer subunits via a distinct type of secondary cell wall polymer to the rigid cell wall layer, comprises three S-layer-homologous motifs. For screening of amino acid positions located on the outer surface of the square S-layer lattice, the sequence encoding Strep-tag I, showing affinity to streptavidin, was linked to the 5′ end of the sequence encoding the recombinant S-layer protein (rSbpA) or a C-terminally truncated form (rSbpA31-1068). The deletion of 200 C-terminal amino acids did not interfere with the self-assembly properties of the S-layer protein but significantly increased the accessibility of Strep-tag I. Thus, the sequence encoding the major birch pollen allergen (Bet v1) was fused via a short linker to the sequence encoding the C-terminally truncated form rSpbA31-1068. Labeling of the square S-layer lattice formed by recrystallization of rSbpA31-1068/Bet v1 on peptidoglycan-containing sacculi with a Bet v1-specific monoclonal mouse antibody demonstrated the functionality of the fused protein sequence and its location on the outer surface of the S-layer lattice. The specific interactions between the N-terminal part of SbpA and the secondary cell wall polymer will be exploited for an oriented binding of the S-layer fusion protein on solid supports to generate regularly structured functional protein lattices. PMID:12089001
Kono, H; Saven, J G
2001-02-23
Combinatorial experiments provide new ways to probe the determinants of protein folding and to identify novel folding amino acid sequences. These types of experiments, however, are complicated both by enormous conformational complexity and by large numbers of possible sequences. Therefore, a quantitative computational theory would be helpful in designing and interpreting these types of experiment. Here, we present and apply a statistically based, computational approach for identifying the properties of sequences compatible with a given main-chain structure. Protein side-chain conformations are included in an atom-based fashion. Calculations are performed for a variety of similar backbone structures to identify sequence properties that are robust with respect to minor changes in main-chain structure. Rather than specific sequences, the method yields the likelihood of each of the amino acids at preselected positions in a given protein structure. The theory may be used to quantify the characteristics of sequence space for a chosen structure without explicitly tabulating sequences. To account for hydrophobic effects, we introduce an environmental energy that it is consistent with other simple hydrophobicity scales and show that it is effective for side-chain modeling. We apply the method to calculate the identity probabilities of selected positions of the immunoglobulin light chain-binding domain of protein L, for which many variant folding sequences are available. The calculations compare favorably with the experimentally observed identity probabilities.
A Bioinformatics Workflow for Variant Peptide Detection in Shotgun Proteomics*
Li, Jing; Su, Zengliu; Ma, Ze-Qiang; Slebos, Robbert J. C.; Halvey, Patrick; Tabb, David L.; Liebler, Daniel C.; Pao, William; Zhang, Bing
2011-01-01
Shotgun proteomics data analysis usually relies on database search. However, commonly used protein sequence databases do not contain information on protein variants and thus prevent variant peptides and proteins from been identified. Including known coding variations into protein sequence databases could help alleviate this problem. Based on our recently published human Cancer Proteome Variation Database, we have created a protein sequence database that comprehensively annotates thousands of cancer-related coding variants collected in the Cancer Proteome Variation Database as well as noncancer-specific ones from the Single Nucleotide Polymorphism Database (dbSNP). Using this database, we then developed a data analysis workflow for variant peptide identification in shotgun proteomics. The high risk of false positive variant identifications was addressed by a modified false discovery rate estimation method. Analysis of colorectal cancer cell lines SW480, RKO, and HCT-116 revealed a total of 81 peptides that contain either noncancer-specific or cancer-related variations. Twenty-three out of 26 variants randomly selected from the 81 were confirmed by genomic sequencing. We further applied the workflow on data sets from three individual colorectal tumor specimens. A total of 204 distinct variant peptides were detected, and five carried known cancer-related mutations. Each individual showed a specific pattern of cancer-related mutations, suggesting potential use of this type of information for personalized medicine. Compatibility of the workflow has been tested with four popular database search engines including Sequest, Mascot, X!Tandem, and MyriMatch. In summary, we have developed a workflow that effectively uses existing genomic data to enable variant peptide detection in proteomics. PMID:21389108
HMPAS: Human Membrane Protein Analysis System
2013-01-01
Background Membrane proteins perform essential roles in diverse cellular functions and are regarded as major pharmaceutical targets. The significance of membrane proteins has led to the developing dozens of resources related with membrane proteins. However, most of these resources are built for specific well-known membrane protein groups, making it difficult to find common and specific features of various membrane protein groups. Methods We collected human membrane proteins from the dispersed resources and predicted novel membrane protein candidates by using ortholog information and our membrane protein classifiers. The membrane proteins were classified according to the type of interaction with the membrane, subcellular localization, and molecular function. We also made new feature dataset to characterize the membrane proteins in various aspects including membrane protein topology, domain, biological process, disease, and drug. Moreover, protein structure and ICD-10-CM based integrated disease and drug information was newly included. To analyze the comprehensive information of membrane proteins, we implemented analysis tools to identify novel sequence and functional features of the classified membrane protein groups and to extract features from protein sequences. Results We constructed HMPAS with 28,509 collected known membrane proteins and 8,076 newly predicted candidates. This system provides integrated information of human membrane proteins individually and in groups organized by 45 subcellular locations and 1,401 molecular functions. As a case study, we identified associations between the membrane proteins and diseases and present that membrane proteins are promising targets for diseases related with nervous system and circulatory system. A web-based interface of this system was constructed to facilitate researchers not only to retrieve organized information of individual proteins but also to use the tools to analyze the membrane proteins. Conclusions HMPAS provides comprehensive information about human membrane proteins including specific features of certain membrane protein groups. In this system, user can acquire the information of individual proteins and specified groups focused on their conserved sequence features, involved cellular processes, and diseases. HMPAS may contribute as a valuable resource for the inference of novel cellular mechanisms and pharmaceutical targets associated with the human membrane proteins. HMPAS is freely available at http://fcode.kaist.ac.kr/hmpas. PMID:24564858
Itzhaki, H; Maxson, J M; Woodson, W R
1994-09-13
The increased production of ethylene during carnation petal senescence regulates the transcription of the GST1 gene encoding a subunit of glutathione-S-transferase. We have investigated the molecular basis for this ethylene-responsive transcription by examining the cis elements and trans-acting factors involved in the expression of the GST1 gene. Transient expression assays following delivery of GST1 5' flanking DNA fused to a beta-glucuronidase receptor gene were used to functionally define sequences responsible for ethylene-responsive expression. Deletion analysis of the 5' flanking sequences of GST1 identified a single positive regulatory element of 197 bp between -667 and -470 necessary for ethylene-responsive expression. The sequences within this ethylene-responsive region were further localized to 126 bp between -596 and -470. The ethylene-responsive element (ERE) within this region conferred ethylene-regulated expression upon a minimal cauliflower mosaic virus-35S TATA-box promoter in an orientation-independent manner. Gel electrophoresis mobility-shift assays and DNase I footprinting were used to identify proteins that bind to sequences within the ERE. Nuclear proteins from carnation petals were shown to specifically interact with the 126-bp ERE and the presence and binding of these proteins were independent of ethylene or petal senescence. DNase I footprinting defined DNA sequences between -510 and -488 within the ERE specifically protected by bound protein. An 8-bp sequence (ATTTCAAA) within the protected region shares significant homology with promoter sequences required for ethylene responsiveness from the tomato fruit-ripening E4 gene.
Engineering and Evolution of Molecular Chaperones and Protein Disaggregases with Enhanced Activity
Mack, Korrie L.; Shorter, James
2016-01-01
Cells have evolved a sophisticated proteostasis network to ensure that proteins acquire and retain their native structure and function. Critical components of this network include molecular chaperones and protein disaggregases, which function to prevent and reverse deleterious protein misfolding. Nevertheless, proteostasis networks have limits, which when exceeded can have fatal consequences as in various neurodegenerative disorders, including Parkinson's disease and amyotrophic lateral sclerosis. A promising strategy is to engineer proteostasis networks to counter challenges presented by specific diseases or specific proteins. Here, we review efforts to enhance the activity of individual molecular chaperones or protein disaggregases via engineering and directed evolution. Remarkably, enhanced global activity or altered substrate specificity of various molecular chaperones, including GroEL, Hsp70, ClpX, and Spy, can be achieved by minor changes in primary sequence and often a single missense mutation. Likewise, small changes in the primary sequence of Hsp104 yield potentiated protein disaggregases that reverse the aggregation and buffer toxicity of various neurodegenerative disease proteins, including α-synuclein, TDP-43, and FUS. Collectively, these advances have revealed key mechanistic and functional insights into chaperone and disaggregase biology. They also suggest that enhanced chaperones and disaggregases could have important applications in treating human disease as well as in the purification of valuable proteins in the pharmaceutical sector. PMID:27014702
Multi-level machine learning prediction of protein-protein interactions in Saccharomyces cerevisiae.
Zubek, Julian; Tatjewski, Marcin; Boniecki, Adam; Mnich, Maciej; Basu, Subhadip; Plewczynski, Dariusz
2015-01-01
Accurate identification of protein-protein interactions (PPI) is the key step in understanding proteins' biological functions, which are typically context-dependent. Many existing PPI predictors rely on aggregated features from protein sequences, however only a few methods exploit local information about specific residue contacts. In this work we present a two-stage machine learning approach for prediction of protein-protein interactions. We start with the carefully filtered data on protein complexes available for Saccharomyces cerevisiae in the Protein Data Bank (PDB) database. First, we build linear descriptions of interacting and non-interacting sequence segment pairs based on their inter-residue distances. Secondly, we train machine learning classifiers to predict binary segment interactions for any two short sequence fragments. The final prediction of the protein-protein interaction is done using the 2D matrix representation of all-against-all possible interacting sequence segments of both analysed proteins. The level-I predictor achieves 0.88 AUC for micro-scale, i.e., residue-level prediction. The level-II predictor improves the results further by a more complex learning paradigm. We perform 30-fold macro-scale, i.e., protein-level cross-validation experiment. The level-II predictor using PSIPRED-predicted secondary structure reaches 0.70 precision, 0.68 recall, and 0.70 AUC, whereas other popular methods provide results below 0.6 threshold (recall, precision, AUC). Our results demonstrate that multi-scale sequence features aggregation procedure is able to improve the machine learning results by more than 10% as compared to other sequence representations. Prepared datasets and source code for our experimental pipeline are freely available for download from: http://zubekj.github.io/mlppi/ (open source Python implementation, OS independent).
Isvoran, Adriana; Craciun, Dana; Martiny, Virginie; Sperandio, Olivier; Miteva, Maria A
2013-06-14
Protein-Protein Interactions (PPIs) are key for many cellular processes. The characterization of PPI interfaces and the prediction of putative ligand binding sites and hot spot residues are essential to design efficient small-molecule modulators of PPI. Terphenyl and its derivatives are small organic molecules known to mimic one face of protein-binding alpha-helical peptides. In this work we focus on several PPIs mediated by alpha-helical peptides. We performed computational sequence- and structure-based analyses in order to evaluate several key physicochemical and surface properties of proteins known to interact with alpha-helical peptides and/or terphenyl and its derivatives. Sequence-based analysis revealed low sequence identity between some of the analyzed proteins binding alpha-helical peptides. Structure-based analysis was performed to calculate the volume, the fractal dimension roughness and the hydrophobicity of the binding regions. Besides the overall hydrophobic character of the binding pockets, some specificities were detected. We showed that the hydrophobicity is not uniformly distributed in different alpha-helix binding pockets that can help to identify key hydrophobic hot spots. The presence of hydrophobic cavities at the protein surface with a more complex shape than the entire protein surface seems to be an important property related to the ability of proteins to bind alpha-helical peptides and low molecular weight mimetics. Characterization of similarities and specificities of PPI binding sites can be helpful for further development of small molecules targeting alpha-helix binding proteins.
Lee, Mei-Ling Ting; Bulyk, Martha L; Whitmore, G A; Church, George M
2002-12-01
There is considerable scientific interest in knowing the probability that a site-specific transcription factor will bind to a given DNA sequence. Microarray methods provide an effective means for assessing the binding affinities of a large number of DNA sequences as demonstrated by Bulyk et al. (2001, Proceedings of the National Academy of Sciences, USA 98, 7158-7163) in their study of the DNA-binding specificities of Zif268 zinc fingers using microarray technology. In a follow-up investigation, Bulyk, Johnson, and Church (2002, Nucleic Acid Research 30, 1255-1261) studied the interdependence of nucleotides on the binding affinities of transcription proteins. Our article is motivated by this pair of studies. We present a general statistical methodology for analyzing microarray intensity measurements reflecting DNA-protein interactions. The log probability of a protein binding to a DNA sequence on an array is modeled using a linear ANOVA model. This model is convenient because it employs familiar statistical concepts and procedures and also because it is effective for investigating the probability structure of the binding mechanism.
Jakubec, David; Laskowski, Roman A.; Vondrasek, Jiri
2016-01-01
Decades of intensive experimental studies of the recognition of DNA sequences by proteins have provided us with a view of a diverse and complicated world in which few to no features are shared between individual DNA-binding protein families. The originally conceived direct readout of DNA residue sequences by amino acid side chains offers very limited capacity for sequence recognition, while the effects of the dynamic properties of the interacting partners remain difficult to quantify and almost impossible to generalise. In this work we investigated the energetic characteristics of all DNA residue—amino acid side chain combinations in the conformations found at the interaction interface in a very large set of protein—DNA complexes by the means of empirical potential-based calculations. General specificity-defining criteria were derived and utilised to look beyond the binding motifs considered in previous studies. Linking energetic favourability to the observed geometrical preferences, our approach reveals several additional amino acid motifs which can distinguish between individual DNA bases. Our results remained valid in environments with various dielectric properties. PMID:27384774
Meereis, Florian; Kaufmann, Michael
2004-10-15
The rapidly increasing number of completely sequenced genomes led to the establishment of the COG-database which, based on sequence homologies, assigns similar proteins from different organisms to clusters of orthologous groups (COGs). There are several bioinformatic studies that made use of this database to determine (hyper)thermophile-specific proteins by searching for COGs containing (almost) exclusively proteins from (hyper)thermophilic genomes. However, public software to perform individually definable group-specific searches is not available. The tool described here exactly fills this gap. The software is accessible at http://www.uni-wh.de/pcogr and is linked to the COG-database. The user can freely define two groups of organisms by selecting for each of the (current) 66 organisms to belong either to groupA, to the reference groupB or to be ignored by the algorithm. Then, for all COGs a specificity index is calculated with respect to the specificity to groupA, i. e. high scoring COGs contain proteins from the most of groupA organisms while proteins from the most organisms assigned to groupB are absent. In addition to ranking all COGs according to the user defined specificity criteria, a graphical visualization shows the distribution of all COGs by displaying their abundance as a function of their specificity indexes. This software allows detecting COGs specific to a predefined group of organisms. All COGs are ranked in the order of their specificity and a graphical visualization allows recognizing (i) the presence and abundance of such COGs and (ii) the phylogenetic relationship between groupA- and groupB-organisms. The software also allows detecting putative protein-protein interactions, novel enzymes involved in only partially known biochemical pathways, and alternate enzymes originated by convergent evolution.
Using peptide array to identify binding motifs and interaction networks for modular domains.
Li, Shawn S-C; Wu, Chenggang
2009-01-01
Specific protein-protein interactions underlie all essential biological processes and form the basis of cellular signal transduction. The recognition of a short, linear peptide sequence in one protein by a modular domain in another represents a common theme of macromolecular recognition in cells, and the importance of this mode of protein-protein interaction is highlighted by the large number of peptide-binding domains encoded by the human genome. This phenomenon also provides a unique opportunity to identify protein-protein binding events using peptide arrays and complementary biochemical assays. Accordingly, high-density peptide array has emerged as a useful tool by which to map domain-mediated protein-protein interaction networks at the proteome level. Using the Src-homology 2 (SH2) and 3 (SH3) domains as examples, we describe the application of oriented peptide array libraries in uncovering specific motifs recognized by an SH2 domain and the use of high-density peptide arrays in identifying interaction networks mediated by the SH3 domain. Methods reviewed here could also be applied to other modular domains, including catalytic domains, that recognize linear peptide sequences.
Metagenomics and the protein universe
Godzik, Adam
2011-01-01
Metagenomics sequencing projects have dramatically increased our knowledge of the protein universe and provided over one-half of currently known protein sequences; they have also introduced a much broader phylogenetic diversity into the protein databases. The full analysis of metagenomic datasets is only beginning, but it has already led to the discovery of thousands of new protein families, likely representing novel functions specific to given environments. At the same time, a deeper analysis of such novel families, including experimental structure determination of some representatives, suggests that most of them represent distant homologs of already characterized protein families, and thus most of the protein diversity present in the new environments are due to functional divergence of the known protein families rather than the emergence of new ones. PMID:21497084
Schlessinger, J
1994-02-01
SH2 and SH3 domains are small protein modules that mediate protein-protein interactions in signal transduction pathways that are activated by protein tyrosine kinases. SH2 domains bind to short phosphotyrosine-containing sequences in growth factor receptors and other phosphoproteins. SH3 domains bind to target proteins through sequences containing proline and hydrophobic amino acids. SH2 and SH3 domain containing proteins, such as Grb2 and phospholipase C gamma, utilize these modules in order to link receptor and cytoplasmic protein tyrosine kinases to the Ras signaling pathway and to phosphatidylinositol hydrolysis, respectively. The three-dimensional structures of several SH2 and SH3 domains have been determined by NMR and X-ray crystallography, and the molecular basis of their specificity is beginning to be unveiled.
Mohtar, M Aiman; Hernychova, Lenka; O'Neill, J Robert; Lawrence, Melanie L; Murray, Euan; Vojtesek, Borek; Hupp, Ted R
2018-04-01
AGR2 is an oncogenic endoplasmic reticulum (ER)-resident protein disulfide isomerase. AGR2 protein has a relatively unique property for a chaperone in that it can bind sequence-specifically to a specific peptide motif (TTIYY). A synthetic TTIYY-containing peptide column was used to affinity-purify AGR2 from crude lysates highlighting peptide selectivity in complex mixtures. Hydrogen-deuterium exchange mass spectrometry localized the dominant region in AGR2 that interacts with the TTIYY peptide to within a structural loop from amino acids 131-135 (VDPSL). A peptide binding site consensus of Tx[IL][YF][YF] was developed for AGR2 by measuring its activity against a mutant peptide library. Screening the human proteome for proteins harboring this motif revealed an enrichment in transmembrane proteins and we focused on validating EpCAM as a potential AGR2-interacting protein. AGR2 and EpCAM proteins formed a dose-dependent protein-protein interaction in vitro Proximity ligation assays demonstrated that endogenous AGR2 and EpCAM protein associate in cells. Introducing a single alanine mutation in EpCAM at Tyr251 attenuated its binding to AGR2 in vitro and in cells. Hydrogen-deuterium exchange mass spectrometry was used to identify a stable binding site for AGR2 on EpCAM, adjacent to the TLIYY motif and surrounding EpCAM's detergent binding site. These data define a dominant site on AGR2 that mediates its specific peptide-binding function. EpCAM forms a model client protein for AGR2 to study how an ER-resident chaperone can dock specifically to a peptide motif and regulate the trafficking a protein destined for the secretory pathway. © 2018 by The American Society for Biochemistry and Molecular Biology, Inc.
Defining Differential Genetic Signatures in CXCR4- and the CCR5-Utilizing HIV-1 Co-Linear Sequences
Aiamkitsumrit, Benjamas; Dampier, Will; Martin-Garcia, Julio; Nonnemacher, Michael R.; Pirrone, Vanessa; Ivanova, Tatyana; Zhong, Wen; Kilareski, Evelyn; Aldigun, Hazeez; Frantz, Brian; Rimbey, Matthew; Wojno, Adam; Passic, Shendra; Williams, Jean W.; Shah, Sonia; Blakey, Brandon; Parikh, Nirzari; Jacobson, Jeffrey M.; Moldover, Brian; Wigdahl, Brian
2014-01-01
The adaptation of human immunodeficiency virus type-1 (HIV-1) to an array of physiologic niches is advantaged by the plasticity of the viral genome, encoded proteins, and promoter. CXCR4-utilizing (X4) viruses preferentially, but not universally, infect CD4+ T cells, generating high levels of virus within activated HIV-1-infected T cells that can be detected in regional lymph nodes and peripheral blood. By comparison, the CCR5-utilizing (R5) viruses have a greater preference for cells of the monocyte-macrophage lineage; however, while R5 viruses also display a propensity to enter and replicate in T cells, they infect a smaller percentage of CD4+ T cells in comparison to X4 viruses. Additionally, R5 viruses have been associated with viral transmission and CNS disease and are also more prevalent during HIV-1 disease. Specific adaptive changes associated with X4 and R5 viruses were identified in co-linear viral sequences beyond the Env-V3. The in silico position-specific scoring matrix (PSSM) algorithm was used to define distinct groups of X4 and R5 sequences based solely on sequences in Env-V3. Bioinformatic tools were used to identify genetic signatures involving specific protein domains or long terminal repeat (LTR) transcription factor sites within co-linear viral protein R (Vpr), trans-activator of transcription (Tat), or LTR sequences that were preferentially associated with X4 or R5 Env-V3 sequences. A number of differential amino acid and nucleotide changes were identified across the co-linear Vpr, Tat, and LTR sequences, suggesting the presence of specific genetic signatures that preferentially associate with X4 or R5 viruses. Investigation of the genetic relatedness between X4 and R5 viruses utilizing phylogenetic analyses of complete sequences could not be used to definitively and uniquely identify groups of R5 or X4 sequences; in contrast, differences in the genetic diversities between X4 and R5 were readily identified within these co-linear sequences in HIV-1-infected patients. PMID:25265194
Wallace, A. C.; Borkakoti, N.; Thornton, J. M.
1997-01-01
It is well established that sequence templates such as those in the PROSITE and PRINTS databases are powerful tools for predicting the biological function and tertiary structure for newly derived protein sequences. The number of X-ray and NMR protein structures is increasing rapidly and it is apparent that a 3D equivalent of the sequence templates is needed. Here, we describe an algorithm called TESS that automatically derives 3D templates from structures deposited in the Brookhaven Protein Data Bank. While a new sequence can be searched for sequence patterns, a new structure can be scanned against these 3D templates to identify functional sites. As examples, 3D templates are derived for enzymes with an O-His-O "catalytic triad" and for the ribonucleases and lysozymes. When these 3D templates are applied to a large data set of nonidentical proteins, several interesting hits are located. This suggests that the development of a 3D template database may help to identify the function of new protein structures, if unknown, as well as to design proteins with specific functions. PMID:9385633
Martínez-Castilla, León P.; Rodríguez-Sotres, Rogelio
2010-01-01
Background Despite the remarkable progress of bioinformatics, how the primary structure of a protein leads to a three-dimensional fold, and in turn determines its function remains an elusive question. Alignments of sequences with known function can be used to identify proteins with the same or similar function with high success. However, identification of function-related and structure-related amino acid positions is only possible after a detailed study of every protein. Folding pattern diversity seems to be much narrower than sequence diversity, and the amino acid sequences of natural proteins have evolved under a selective pressure comprising structural and functional requirements acting in parallel. Principal Findings The approach described in this work begins by generating a large number of amino acid sequences using ROSETTA [Dantas G et al. (2003) J Mol Biol 332:449–460], a program with notable robustness in the assignment of amino acids to a known three-dimensional structure. The resulting sequence-sets showed no conservation of amino acids at active sites, or protein-protein interfaces. Hidden Markov models built from the resulting sequence sets were used to search sequence databases. Surprisingly, the models retrieved from the database sequences belonged to proteins with the same or a very similar function. Given an appropriate cutoff, the rate of false positives was zero. According to our results, this protocol, here referred to as Rd.HMM, detects fine structural details on the folding patterns, that seem to be tightly linked to the fitness of a structural framework for a specific biological function. Conclusion Because the sequence of the native protein used to create the Rd.HMM model was always amongst the top hits, the procedure is a reliable tool to score, very accurately, the quality and appropriateness of computer-modeled 3D-structures, without the need for spectroscopy data. However, Rd.HMM is very sensitive to the conformational features of the models' backbone. PMID:20830209
Random heteropolymers preserve protein function in foreign environments
NASA Astrophysics Data System (ADS)
Panganiban, Brian; Qiao, Baofu; Jiang, Tao; DelRe, Christopher; Obadia, Mona M.; Nguyen, Trung Dac; Smith, Anton A. A.; Hall, Aaron; Sit, Izaac; Crosby, Marquise G.; Dennis, Patrick B.; Drockenmuller, Eric; Olvera de la Cruz, Monica; Xu, Ting
2018-03-01
The successful incorporation of active proteins into synthetic polymers could lead to a new class of materials with functions found only in living systems. However, proteins rarely function under the conditions suitable for polymer processing. On the basis of an analysis of trends in protein sequences and characteristic chemical patterns on protein surfaces, we designed four-monomer random heteropolymers to mimic intrinsically disordered proteins for protein solubilization and stabilization in non-native environments. The heteropolymers, with optimized composition and statistical monomer distribution, enable cell-free synthesis of membrane proteins with proper protein folding for transport and enzyme-containing plastics for toxin bioremediation. Controlling the statistical monomer distribution in a heteropolymer, rather than the specific monomer sequence, affords a new strategy to interface with biological systems for protein-based biomaterials.
Genome-wide identification of lineage-specific genes in Arabidopsis, Oryza and Populus
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yang, Xiaohan; Jawdy, Sara; Tschaplinski, Timothy J
2009-01-01
Protein sequences were compared among Arabidopsis, Oryza and Populus to identify differential gene (DG) sets that are in one but not the other two genomes. The DG sets were screened against a plant transcript database, the NR protein database and six newly-sequenced genomes (Carica, Glycine, Medicago, Sorghum, Vitis and Zea) to identify a set of species-specific genes (SS). Gene expression, protein motif and intron number were examined. 192, 641 and 109 SS genes were identified in Arabidopsis, Oryza and Populus, respectively. Some SS genes were preferentially expressed in flowers, roots, xylem and cambium or up-regulated by stress. Six conserved motifsmore » in Arabidopsis and Oryza SS proteins were found in other distant lineages. The SS gene sets were enriched with intronless genes. The results reflect functional and/or anatomical differences between monocots and eudicots or between herbaceous and woody plants. The Populus-specific genes are candidates for carbon sequestration and biofuel research.« less
He, Ruifeng; Kim, Min-Jeong; Nelson, William; Balbuena, Tiago S; Kim, Ryan; Kramer, Robin; Crow, John A; May, Greg D; Thelen, Jay J; Soderlund, Carol A; Gang, David R
2012-02-01
The common reed (Phragmites australis), one of the most widely distributed of all angiosperms, uses its rhizomes (underground stems) to invade new territory, making it one of the most successful weedy species worldwide. Characterization of the rhizome transcriptome and proteome is needed to identify candidate genes and proteins involved in rhizome growth, development, metabolism, and invasiveness. We employed next-generation sequencing technologies including 454 and Illumina platforms to characterize the reed rhizome transcriptome and used quantitative proteomics techniques to identify the rhizome proteome. Combining 336514 Roche 454 Titanium reads and 103350802 Illumina paired-end reads in a de novo hybrid assembly yielded 124450 unique transcripts with an average length of 549 bp, of which 54317 were annotated. Rhizome-specific and differentially expressed transcripts were identified between rhizome apical tips (apical meristematic region) and rhizome elongation zones. A total of 1280 nonredundant proteins were identified and quantified using GeLC-MS/MS based label-free proteomics, where 174 and 77 proteins were preferentially expressed in the rhizome elongation zone and apical tip tissues, respectively. Genes involved in allelopathy and in controlling development and potentially invasiveness were identified. In addition to being a valuable sequence and protein data resource for studying plant rhizome species, our results provide useful insights into identifying specific genes and proteins with potential roles in rhizome differentiation, development, and function.
Global Analysis of Transcription Factor-Binding Sites in Yeast Using ChIP-Seq
Lefrançois, Philippe; Gallagher, Jennifer E. G.; Snyder, Michael
2016-01-01
Transcription factors influence gene expression through their ability to bind DNA at specific regulatory elements. Specific DNA-protein interactions can be isolated through the chromatin immunoprecipitation (ChIP) procedure, in which DNA fragments bound by the protein of interest are recovered. ChIP is followed by high-throughput DNA sequencing (Seq) to determine the genomic provenance of ChIP DNA fragments and their relative abundance in the sample. This chapter describes a ChIP-Seq strategy adapted for budding yeast to enable the genome-wide characterization of binding sites of transcription factors (TFs) and other DNA-binding proteins in an efficient and cost-effective way. Yeast strains with epitope-tagged TFs are most commonly used for ChIP-Seq, along with their matching untagged control strains. The initial step of ChIP involves the cross-linking of DNA and proteins. Next, yeast cells are lysed and sonicated to shear chromatin into smaller fragments. An antibody against an epitope-tagged TF is used to pull down chromatin complexes containing DNA and the TF of interest. DNA is then purified and proteins degraded. Specific barcoded adapters for multiplex DNA sequencing are ligated to ChIP DNA. Short DNA sequence reads (28–36 base pairs) are parsed according to the barcode and aligned against the yeast reference genome, thus generating a nucleotide-resolution map of transcription factor-binding sites and their occupancy. PMID:25213249
Structure of adenovirus bound to cellular receptor car
Freimuth, Paul I.
2007-01-02
Disclosed is a mutant CAR-DI-binding adenovirus which has a genome comprising one or more mutations in sequences which encode the fiber protein knob domain wherein the mutation causes the encoded viral particle to have a significantly weakened binding affinity for CAR-DI relative to wild-type adenovirus. Such mutations may be in sequences which encode either the AB loop, or the HI loop of the fiber protein knob domain. Specific residues and mutations are described. Also disclosed is a method for generating a mutant adenovirus which is characterized by a receptor binding affinity or specificity which differs substantially from wild type.
Theory on the mechanism of site-specific DNA-protein interactions in the presence of traps
NASA Astrophysics Data System (ADS)
Niranjani, G.; Murugan, R.
2016-08-01
The speed of site-specific binding of transcription factor (TFs) proteins with genomic DNA seems to be strongly retarded by the randomly occurring sequence traps. Traps are those DNA sequences sharing significant similarity with the original specific binding sites (SBSs). It is an intriguing question how the naturally occurring TFs and their SBSs are designed to manage the retarding effects of such randomly occurring traps. We develop a simple random walk model on the site-specific binding of TFs with genomic DNA in the presence of sequence traps. Our dynamical model predicts that (a) the retarding effects of traps will be minimum when the traps are arranged around the SBS such that there is a negative correlation between the binding strength of TFs with traps and the distance of traps from the SBS and (b) the retarding effects of sequence traps can be appeased by the condensed conformational state of DNA. Our computational analysis results on the distribution of sequence traps around the putative binding sites of various TFs in mouse and human genome clearly agree well the theoretical predictions. We propose that the distribution of traps can be used as an additional metric to efficiently identify the SBSs of TFs on genomic DNA.
Yang, Qin; Gilmartin, Gregory M.; Doublié, Sylvie
2010-01-01
Human Cleavage Factor Im (CFIm) is an essential component of the pre-mRNA 3′ processing complex that functions in the regulation of poly(A) site selection through the recognition of UGUA sequences upstream of the poly(A) site. Although the highly conserved 25 kDa subunit (CFIm25) of the CFIm complex possesses a characteristic α/β/α Nudix fold, CFIm25 has no detectable hydrolase activity. Here we report the crystal structures of the human CFIm25 homodimer in complex with UGUAAA and UUGUAU RNA sequences. CFIm25 is the first Nudix protein to be reported to bind RNA in a sequence-specific manner. The UGUA sequence contributes to binding specificity through an intramolecular G:A Watson–Crick/sugar-edge base interaction, an unusual pairing previously found to be involved in the binding specificity of the SAM-III riboswitch. The structures, together with mutational data, suggest a novel mechanism for the simultaneous sequence-specific recognition of two UGUA elements within the pre-mRNA. Furthermore, the mutually exclusive binding of RNA and the signaling molecule Ap4A (diadenosine tetraphosphate) by CFIm25 suggests a potential role for small molecules in the regulation of mRNA 3′ processing. PMID:20479262
Yang, Qin; Gilmartin, Gregory M; Doublié, Sylvie
2010-06-01
Human Cleavage Factor Im (CFI(m)) is an essential component of the pre-mRNA 3' processing complex that functions in the regulation of poly(A) site selection through the recognition of UGUA sequences upstream of the poly(A) site. Although the highly conserved 25 kDa subunit (CFI(m)25) of the CFI(m) complex possesses a characteristic alpha/beta/alpha Nudix fold, CFI(m)25 has no detectable hydrolase activity. Here we report the crystal structures of the human CFI(m)25 homodimer in complex with UGUAAA and UUGUAU RNA sequences. CFI(m)25 is the first Nudix protein to be reported to bind RNA in a sequence-specific manner. The UGUA sequence contributes to binding specificity through an intramolecular G:A Watson-Crick/sugar-edge base interaction, an unusual pairing previously found to be involved in the binding specificity of the SAM-III riboswitch. The structures, together with mutational data, suggest a novel mechanism for the simultaneous sequence-specific recognition of two UGUA elements within the pre-mRNA. Furthermore, the mutually exclusive binding of RNA and the signaling molecule Ap(4)A (diadenosine tetraphosphate) by CFI(m)25 suggests a potential role for small molecules in the regulation of mRNA 3' processing.
Liu, Qian; Xu, Xue-Nian; Zhou, Yan; Cheng, Na; Dong, Yu-Ting; Zheng, Hua-Jun; Zhu, Yong-Qiang; Zhu, Yong-Qiang
2013-08-01
To find and clone new antigen genes from the lambda-ZAP cDNA expression library of adult Clonorchis sinensis, and determine the immunological characteristics of the recombinant proteins. The cDNA expression library of adult C. sinensis was screened by pooled sera of clonorchiasis patients. The sequences of the positive phage clones were compared with the sequences in EST database, and the full-length sequence of the gene (Cs22 gene) was obtained by RT-PCR. cDNA fragments containing 2 and 3 times tandem repeat sequences were generated by jumping PCR. The sequence encoding the mature peptide or the tandem repeat sequence was respectively cloned into the prokaryotic expression vector pET28a (+), and then transformed into E. coli Rosetta DE3 cells for expression. The recombinant proteins (rCs22-2r, rCs22-3r, rCs22M-2r, and rCs22M-3r) were purified by His-bind-resin (Ni-NTA) affinity chromatography. The immunogenicity of rCs22-2r and rCs22-3r was identified by ELISA. To evaluate the immunological diagnostic value of rCs22-2r and rCs22-3r, serum samples from 35 clonorchiasis patients, 31 healthy individuals, 15 schistosomiasis patients, 15 paragonimiasis westermani patients and 13 cysticercosis patients were examined by ELISA. To locate antigenic determinants, the pooled sera of clonorchiasis patients and healthy persons were analyzed for specific antibodies by ELISA with recombinant protein rCs22M-2r and rCs22M-3r containing the tandem repeat sequences. The full-length sequence of Cs22 antigen gene of C. sinensis was obtained. It contained 13 times tandem repeat sequences of EQQDGDEEGMGGDGGRGKEKGKVEGEDGAGEQKEQA. Bioinformatics analysis indicated that the protein (Cs22) belonged to GPI-anchored proteins family. The recombinant proteins rCs22-2r and rCs22-3r showed a certain level of immunogenicity. The positive rate by ELISA coated with the purified PrCs22-2r and PrCs22-3r for sera of clonorchiasis patients both were 45.7% (16/35), and 3.2% (1/31) for those of healthy persons. There was no cross reaction with sera of schistosomiasis and cysticercosis patients. The cross reaction with sera of paragonimiasis westermani patients was 1/15. The recombinant proteins rCs22M-2r and rCs22M-3r which only contained tandem repeats were specifically recognized by pooled sera of clonorchiasis patients. The Cs22 antigen gene of Clonorchis sinensis is obtained, and the recombinant proteins have certain diagnostic value. The antigenic determinant is located in tandem repeat sequences.
Recombinant Collagenlike Proteins
NASA Technical Reports Server (NTRS)
Fertala, Andzej
2007-01-01
A group of collagenlike recombinant proteins containing high densities of biologically active sites has been invented. The method used to express these proteins is similar to a method of expressing recombinant procollagens and collagens described in U. S. Patent 5,593,859, "Synthesis of human procollagens and collagens in recombinant DNA systems." Customized collagenous proteins are needed for biomedical applications. In particular, fibrillar collagens are attractive for production of matrices needed for tissue engineering and drug delivery. Prior to this invention, there was no way of producing customized collagenous proteins for these and other applications. Heretofore, collagenous proteins have been produced by use of such biological systems as yeasts, bacteria, and transgenic animals and plants. These products are normal collagens that can also be extracted from such sources as tendons, bones, and hides. These products cannot be made to consist only of biologically active, specific amino acid sequences that may be needed for specific applications. Prior to this invention, it had been established that fibrillar collagens consist of domains that are responsible for such processes as interaction with cells, binding of growth factors, and interaction with a number of structural proteins present in the extracellular matrix. A normal collagen consists of a sequence of domains that can be represented by a corresponding sequence of labels, e.g., D1D2D3D4. A collagenlike protein of the present invention contains regions of collagen II that contain multiples of a single domain (e.g., D1D1D1D1 or D4D4D4D4) chosen for its specific biological activity. By virtue of the multiplicity of the chosen domain, the density of sites having that specific biological activity is greater than it is in a normal collagen. A collagenlike protein according to this invention can thus be made to have properties that are necessary for tissue engineering.
Gabsalilow, Lilia; Schierling, Benno; Friedhoff, Peter; Pingoud, Alfred; Wende, Wolfgang
2013-04-01
Targeted genome engineering requires nucleases that introduce a highly specific double-strand break in the genome that is either processed by homology-directed repair in the presence of a homologous repair template or by non-homologous end-joining (NHEJ) that usually results in insertions or deletions. The error-prone NHEJ can be efficiently suppressed by 'nickases' that produce a single-strand break rather than a double-strand break. Highly specific nickases have been produced by engineering of homing endonucleases and more recently by modifying zinc finger nucleases (ZFNs) composed of a zinc finger array and the catalytic domain of the restriction endonuclease FokI. These ZF-nickases work as heterodimers in which one subunit has a catalytically inactive FokI domain. We present two different approaches to engineer highly specific nickases; both rely on the sequence-specific nicking activity of the DNA mismatch repair endonuclease MutH which we fused to a DNA-binding module, either a catalytically inactive variant of the homing endonuclease I-SceI or the DNA-binding domain of the TALE protein AvrBs4. The fusion proteins nick strand specifically a bipartite recognition sequence consisting of the MutH and the I-SceI or TALE recognition sequences, respectively, with a more than 1000-fold preference over a stand-alone MutH site. TALE-MutH is a programmable nickase.
Chromosome ends: different sequences may provide conserved functions.
Louis, Edward J; Vershinin, Alexander V
2005-07-01
The structures of specific chromosome regions, centromeres and telomeres, present a number of puzzles. As functions performed by these regions are ubiquitous and essential, their DNA, proteins and chromatin structure are expected to be conserved. Recent studies of centromeric DNA from human, Drosophila and plant species have demonstrated that a hidden universal centromere-specific sequence is highly unlikely. The DNA of telomeres is more conserved consisting of a tandemly repeated 6-8 bp Arabidopsis-like sequence in a majority of organisms as diverse as protozoan, fungi, mammals and plants. However, there are alternatives to short DNA repeats at the ends of chromosomes and for telomere elongation by telomerase. Here we focus on the similarities and diversity that exist among the structural elements, DNA sequences and proteins, that make up terminal domains (telomeres and subtelomeres), and how organisms use these in different ways to fulfil the functions of end-replication and end-protection. Copyright (c) 2005 Wiley Periodicals, Inc.
Tappaz, M; Bitoun, M; Reymond, I; Sergeant, A
1999-09-01
Cysteine sulfinate decarboxylase (CSD) is considered as the rate-limiting enzyme in the biosynthesis of taurine, a possible osmoregulator in brain. Through cloning and sequencing of RT-PCR and RACE-PCR products of rat brain mRNAs, a 2,396-bp cDNA sequence was obtained encoding a protein of 493 amino acids (calculated molecular mass, 55.2 kDa). The corresponding fusion protein showed a substrate specificity similar to that of the endogenous enzyme. The sequence of the encoded protein is identical to that encoded by liver CSD cDNA. Among other characterized amino acid decarboxylases, CSD shows the highest homology (54%) with either isoform of glutamic acid decarboxylase (GAD65 and GAD67). A single mRNA band, approximately 2.5 kb, was detected by northern blot in RNA extracts of brain, liver, and kidney. However, brain and liver CSD cDNA sequences differed in the 5' untranslated region. This indicates two forms of CSD mRNA. Analysis of PCR-amplified products of genomic DNA suggests that the brain form results from the use of a 3' alternative internal splicing site within an exon specifically found in liver CSD mRNA. Through selective RT-PCR the brain form was detected in brain only, whereas the liver form was found in liver and kidney. These results indicate a tissue-specific regulation of CSD genomic expression.
Simone, Domenico; Bay, Denice C.; Leach, Thorin; Turner, Raymond J.
2013-01-01
Background The twin-arginine translocation (Tat) protein export system enables the transport of fully folded proteins across a membrane. This system is composed of two integral membrane proteins belonging to TatA and TatC protein families and in some systems a third component, TatB, a homolog of TatA. TatC participates in substrate protein recognition through its interaction with a twin arginine leader peptide sequence. Methodology/Principal Findings The aim of this study was to explore TatC diversity, evolution and sequence conservation in bacteria to identify how TatC is evolving and diversifying in various bacterial phyla. Surveying bacterial genomes revealed that 77% of all species possess one or more tatC loci and half of these classes possessed only tatC and tatA genes. Phylogenetic analysis of diverse TatC homologues showed that they were primarily inherited but identified a small subset of taxonomically unrelated bacteria that exhibited evidence supporting lateral gene transfer within an ecological niche. Examination of bacilli tatCd/tatCy isoform operons identified a number of known and potentially new Tat substrate genes based on their frequent association to tatC loci. Evolutionary analysis of these Bacilli isoforms determined that TatCy was the progenitor of TatCd. A bacterial TatC consensus sequence was determined and highlighted conserved and variable regions within a three dimensional model of the Escherichia coli TatC protein. Comparative analysis between the TatC consensus sequence and Bacilli TatCd/y isoform consensus sequences revealed unique sites that may contribute to isoform substrate specificity or make TatA specific contacts. Synonymous to non-synonymous nucleotide substitution analyses of bacterial tatC homologues determined that tatC sequence variation differs dramatically between various classes and suggests TatC specialization in these species. Conclusions/Significance TatC proteins appear to be diversifying within particular bacterial classes and its specialization may be driven by the substrates it transports and the environment of its host. PMID:24236045
Kulmanov, Maxat; Khan, Mohammed Asif; Hoehndorf, Robert; Wren, Jonathan
2018-02-15
A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often only done rigorously for few selected model organisms. Computational function prediction approaches have been suggested to fill this gap. The functions of proteins are classified using the Gene Ontology (GO), which contains over 40 000 classes. Additionally, proteins have multiple functions, making function prediction a large-scale, multi-class, multi-label problem. We have developed a novel method to predict protein function from sequence. We use deep learning to learn features from protein sequences as well as a cross-species protein-protein interaction network. Our approach specifically outputs information in the structure of the GO and utilizes the dependencies between GO classes as background information to construct a deep learning model. We evaluate our method using the standards established by the Computational Assessment of Function Annotation (CAFA) and demonstrate a significant improvement over baseline methods such as BLAST, in particular for predicting cellular locations. Web server: http://deepgo.bio2vec.net, Source code: https://github.com/bio-ontology-research-group/deepgo. robert.hoehndorf@kaust.edu.sa. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
Crystal Structure of the Minimalist Max-E47 Protein Chimera
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ahmadpour, Faraz; Ghirlando, Rodolfo; De Jong, Antonia T.
Max-E47 is a protein chimera generated from the fusion of the DNA-binding basic region of Max and the dimerization region of E47, both members of the basic region/helix-loop-helix (bHLH) superfamily of transcription factors. Like native Max, Max-E47 binds with high affinity and specificity to the E-box site, 5'-CACGTG, both in vivo and in vitro. We have determined the crystal structure of Max-E47 at 1.7 Å resolution, and found that it associates to form a well-structured dimer even in the absence of its cognate DNA. Analytical ultracentrifugation confirms that Max-E47 is dimeric even at low micromolar concentrations, indicating that the Max-E47more » dimer is stable in the absence of DNA. Circular dichroism analysis demonstrates that both non-specific DNA and the E-box site induce similar levels of helical secondary structure in Max-E47. These results suggest that Max-E47 may bind to the E-box following the two-step mechanism proposed for other bHLH proteins. In this mechanism, a rapid step where protein binds to DNA without sequence specificity is followed by a slow step where specific protein:DNA interactions are fine-tuned, leading to sequence-specific recognition. Collectively, these results show that the designed Max-E47 protein chimera behaves both structurally and functionally like its native counterparts.« less
UFO: a web server for ultra-fast functional profiling of whole genome protein sequences.
Meinicke, Peter
2009-09-02
Functional profiling is a key technique to characterize and compare the functional potential of entire genomes. The estimation of profiles according to an assignment of sequences to functional categories is a computationally expensive task because it requires the comparison of all protein sequences from a genome with a usually large database of annotated sequences or sequence families. Based on machine learning techniques for Pfam domain detection, the UFO web server for ultra-fast functional profiling allows researchers to process large protein sequence collections instantaneously. Besides the frequencies of Pfam and GO categories, the user also obtains the sequence specific assignments to Pfam domain families. In addition, a comparison with existing genomes provides dissimilarity scores with respect to 821 reference proteomes. Considering the underlying UFO domain detection, the results on 206 test genomes indicate a high sensitivity of the approach. In comparison with current state-of-the-art HMMs, the runtime measurements show a considerable speed up in the range of four orders of magnitude. For an average size prokaryotic genome, the computation of a functional profile together with its comparison typically requires about 10 seconds of processing time. For the first time the UFO web server makes it possible to get a quick overview on the functional inventory of newly sequenced organisms. The genome scale comparison with a large number of precomputed profiles allows a first guess about functionally related organisms. The service is freely available and does not require user registration or specification of a valid email address.
Yu, Chuanhe; Gan, Haiyun; Zhang, Zhiguo
2018-01-01
DNA replication initiates at DNA replication origins after unwinding of double-strand DNA(dsDNA) by replicative helicase to generate single-stranded DNA (ssDNA) templates for the continuous synthesis of leading-strand and the discontinuous synthesis of lagging-strand. Therefore, methods capable of detecting strand-specific information will likely yield insight into the association of proteins at leading and lagging strand of DNA replication forks and the regulation of leading and lagging strand synthesis during DNA replication. The enrichment and Sequencing of Protein-Associated Nascent DNA (eSPAN), which measure the relative amounts of proteins at nascent leading and lagging strands of DNA replication forks, is a step-wise procedure involving the chromatin immunoprecipitation (ChIP) of a protein of interest followed by the enrichment of protein-associated nascent DNA through BrdU immunoprecipitation. The isolated ssDNA is then subjected to strand-specific sequencing. This method can detect whether a protein is enriched at leading or lagging strand of DNA replication forks. In addition to eSPAN, two other strand-specific methods, (ChIP-ssSeq), which detects potential protein-ssDNA binding and BrdU-IP-ssSeq, which can measure synthesis of both leading and lagging strand, were developed along the way. These methods can provide strand-specific and complementary information about the association of the target protein with DNA replication forks as well as synthesis of leading and lagging strands genome wide. Below, we describe the detailed eSPAN, ChIP-ssSeq, and BrdU-IP-ssSeq protocols.
Roy, Susmita; Bagchi, Biman
2014-05-29
Elucidation of possible pathways between folded (native) and unfolded states of a protein is a challenging task, as the intermediates are often hard to detect. Here, we alter the solvent environment in a controlled manner by choosing two different cosolvents of water, urea, and dimethyl sulfoxide (DMSO) and study unfolding of four different proteins to understand the respective sequence of melting by computer simulation methods. We indeed find interesting differences in the sequence of melting of α helices and β sheets in these two solvents. For example, in 8 M urea solution, β-sheet parts of a protein are found to unfold preferentially, followed by the unfolding of α helices. In contrast, 8 M DMSO solution unfolds α helices first, followed by the separation of β sheets for the majority of proteins. Sequence of unfolding events in four different α/β proteins and also in chicken villin head piece (HP-36) both in urea and DMSO solutions demonstrate that the unfolding pathways are determined jointly by relative exposure of polar and nonpolar residues of a protein and the mode of molecular action of a solvent on that protein.
Soto, M; Requena, J M; Quijada, L; García, M; Guzman, F; Patarroyo, M E; Alonso, C
1995-12-01
Antibodies reacting against the H2A histone protein were frequently observed in the sera from dogs naturally infected with the protozoan parasite Leishmania infantum. Using synthetic peptides covering the complete sequence of the protein we have identified the amino terminal region, comprising from amino acids 1 to 20, and the carboxyl terminal region, comprising from amino acids 106 to 132, as conforming the antigenic determinants of the protein. Those regions, exposed in the nucleosome surface, are highly divergent in sequence relative to the mammalian H2A histones. The anti-H2A histone antibodies present in the sera of these dogs specifically recognize the L. infantum H2A histone and they do not react with mammalian histones. The present data indicate that, in spite of the evolutionary conservation of the H2A histone protein among eukaryotic organisms, the humoral response against this protein during natural infection is specifically triggered by the parasite protein antigenic determinants.
Subrahmanyam, S; Cronan, J E
1999-01-21
We report an efficient and flexible in vitro method for the isolation of genomic DNA sequences that are the binding targets of a given DNA binding protein. This method takes advantage of the fact that binding of a protein to a DNA molecule generally increases the rate of migration of the protein in nondenaturing gel electrophoresis. By the use of a radioactively labeled DNA-binding protein and nonradioactive DNA coupled with PCR amplification from gel slices, we show that specific binding sites can be isolated from Escherichia coli genomic DNA. We have applied this method to isolate a binding site for FadR, a global regulator of fatty acid metabolism in E. coli. We have also isolated a second binding site for BirA, the biotin operon repressor/biotin ligase, from the E. coli genome that has a very low binding efficiency compared with the bio operator region.
Computational Design of DNA-Binding Proteins.
Thyme, Summer; Song, Yifan
2016-01-01
Predicting the outcome of engineered and naturally occurring sequence perturbations to protein-DNA interfaces requires accurate computational modeling technologies. It has been well established that computational design to accommodate small numbers of DNA target site substitutions is possible. This chapter details the basic method of design used in the Rosetta macromolecular modeling program that has been successfully used to modulate the specificity of DNA-binding proteins. More recently, combining computational design and directed evolution has become a common approach for increasing the success rate of protein engineering projects. The power of such high-throughput screening depends on computational methods producing multiple potential solutions. Therefore, this chapter describes several protocols for increasing the diversity of designed output. Lastly, we describe an approach for building comparative models of protein-DNA complexes in order to utilize information from homologous sequences. These models can be used to explore how nature modulates specificity of protein-DNA interfaces and potentially can even be used as starting templates for further engineering.
Identification and characterisation of seed storage protein transcripts from Lupinus angustifolius
2011-01-01
Background In legumes, seed storage proteins are important for the developing seedling and are an important source of protein for humans and animals. Lupinus angustifolius (L.), also known as narrow-leaf lupin (NLL) is a grain legume crop that is gaining recognition as a potential human health food as the grain is high in protein and dietary fibre, gluten-free and low in fat and starch. Results Genes encoding the seed storage proteins of NLL were characterised by sequencing cDNA clones derived from developing seeds. Four families of seed storage proteins were identified and comprised three unique α, seven β, two γ and four δ conglutins. This study added eleven new expressed storage protein genes for the species. A comparison of the deduced amino acid sequences of NLL conglutins with those available for the storage proteins of Lupinus albus (L.), Pisum sativum (L.), Medicago truncatula (L.), Arachis hypogaea (L.) and Glycine max (L.) permitted the analysis of a phylogenetic relationships between proteins and demonstrated, in general, that the strongest conservation occurred within species. In the case of 7S globulin (β conglutins) and 2S sulphur-rich albumin (δ conglutins), the analysis suggests that gene duplication occurred after legume speciation. This contrasted with 11S globulin (α conglutin) and basic 7S (γ conglutin) sequences where some of these sequences appear to have diverged prior to speciation. The most abundant NLL conglutin family was β (56%), followed by α (24%), δ (15%) and γ (6%) and the transcript levels of these genes increased 103 to 106 fold during seed development. We used the 16 NLL conglutin sequences identified here to determine that for individuals specifically allergic to lupin, all seven members of the β conglutin family were potential allergens. Conclusion This study has characterised 16 seed storage protein genes in NLL including 11 newly-identified members. It has helped lay the foundation for efforts to use molecular breeding approaches to improve lupins, for example by reducing allergens or increasing the expression of specific seed storage protein(s) with desirable nutritional properties. PMID:21457583
Identification and characterisation of seed storage protein transcripts from Lupinus angustifolius.
Foley, Rhonda C; Gao, Ling-Ling; Spriggs, Andrew; Soo, Lena Y C; Goggin, Danica E; Smith, Penelope M C; Atkins, Craig A; Singh, Karam B
2011-04-04
In legumes, seed storage proteins are important for the developing seedling and are an important source of protein for humans and animals. Lupinus angustifolius (L.), also known as narrow-leaf lupin (NLL) is a grain legume crop that is gaining recognition as a potential human health food as the grain is high in protein and dietary fibre, gluten-free and low in fat and starch. Genes encoding the seed storage proteins of NLL were characterised by sequencing cDNA clones derived from developing seeds. Four families of seed storage proteins were identified and comprised three unique α, seven β, two γ and four δ conglutins. This study added eleven new expressed storage protein genes for the species. A comparison of the deduced amino acid sequences of NLL conglutins with those available for the storage proteins of Lupinus albus (L.), Pisum sativum (L.), Medicago truncatula (L.), Arachis hypogaea (L.) and Glycine max (L.) permitted the analysis of a phylogenetic relationships between proteins and demonstrated, in general, that the strongest conservation occurred within species. In the case of 7S globulin (β conglutins) and 2S sulphur-rich albumin (δ conglutins), the analysis suggests that gene duplication occurred after legume speciation. This contrasted with 11S globulin (α conglutin) and basic 7S (γ conglutin) sequences where some of these sequences appear to have diverged prior to speciation. The most abundant NLL conglutin family was β (56%), followed by α (24%), δ (15%) and γ (6%) and the transcript levels of these genes increased 103 to 106 fold during seed development. We used the 16 NLL conglutin sequences identified here to determine that for individuals specifically allergic to lupin, all seven members of the β conglutin family were potential allergens. This study has characterised 16 seed storage protein genes in NLL including 11 newly-identified members. It has helped lay the foundation for efforts to use molecular breeding approaches to improve lupins, for example by reducing allergens or increasing the expression of specific seed storage protein(s) with desirable nutritional properties.
Hogan, Daniel J; Riordan, Daniel P; Gerber, André P; Herschlag, Daniel; Brown, Patrick O
2008-10-28
RNA-binding proteins (RBPs) have roles in the regulation of many post-transcriptional steps in gene expression, but relatively few RBPs have been systematically studied. We searched for the RNA targets of 40 proteins in the yeast Saccharomyces cerevisiae: a selective sample of the approximately 600 annotated and predicted RBPs, as well as several proteins not annotated as RBPs. At least 33 of these 40 proteins, including three of the four proteins that were not previously known or predicted to be RBPs, were reproducibly associated with specific sets of a few to several hundred RNAs. Remarkably, many of the RBPs we studied bound mRNAs whose protein products share identifiable functional or cytotopic features. We identified specific sequences or predicted structures significantly enriched in target mRNAs of 16 RBPs. These potential RNA-recognition elements were diverse in sequence, structure, and location: some were found predominantly in 3'-untranslated regions, others in 5'-untranslated regions, some in coding sequences, and many in two or more of these features. Although this study only examined a small fraction of the universe of yeast RBPs, 70% of the mRNA transcriptome had significant associations with at least one of these RBPs, and on average, each distinct yeast mRNA interacted with three of the RBPs, suggesting the potential for a rich, multidimensional network of regulation. These results strongly suggest that combinatorial binding of RBPs to specific recognition elements in mRNAs is a pervasive mechanism for multi-dimensional regulation of their post-transcriptional fate.
Abraham, Paul E; Wang, Xiaojing; Ranjan, Priya; Nookaew, Intawat; Zhang, Bing; Tuskan, Gerald A; Hettich, Robert L
2015-12-04
Next-generation sequencing has transformed the ability to link genotypes to phenotypes and facilitates the dissection of genetic contribution to complex traits. However, it is challenging to link genetic variants with the perturbed functional effects on proteins encoded by such genes. Here we show how RNA sequencing can be exploited to construct genotype-specific protein sequence databases to assess natural variation in proteins, providing information about the molecular toolbox driving cellular processes. For this study, we used two natural genotypes selected from a recent genome-wide association study of Populus trichocarpa, an obligate outcrosser with tremendous phenotypic variation across the natural population. This strategy allowed us to comprehensively catalogue proteins containing single amino acid polymorphisms (SAAPs), as well as insertions and deletions. We profiled the frequency of 128 types of naturally occurring amino acid substitutions, including both expected (neutral) and unexpected (non-neutral) SAAPs, with a subset occurring in regions of the genome having strong polymorphism patterns consistent with recent positive and/or divergent selection. By zeroing in on the molecular signatures of these important regions that might have previously been uncharacterized, we now provide a high-resolution molecular inventory that should improve accessibility and subsequent identification of natural protein variants in future genotype-to-phenotype studies.
The biological activity of ABA-1-like protein from Ascaris lumbricoides.
Muto, R; Imai, S; Tezuka, H; Furuhashi, Y; Fujita, K
2001-09-01
The elevation of non-specific IgE (total IgE) in Ascaris infection can be seen one week after infection, and reaches a peak after approximately two weeks. It has been reported that ABA-1 protein is the main constituent in the pseudocoelomic fluid of Ascaris suum. To investigate the effect of the ABA-1-like protein from Ascaris lumbricoides (ALB), the cDNA was cloned by reverse transcriptase polymerase chain reaction, using original primers based on the consensus sequences of ABA-1 and TBA-1, that is an ABA-1-like protein from Toxocara canis. The clone was sequenced, we constructed the recombinant polyprotein of ALB (rALB14 and rALB7) based on the ALB sequence, and rALB was administrated to BALB/c mice. Fourteen days after inoculation with rALB14 which is the full length of ALB, the elevation of total IgE which we supposed to contain non-specific IgE was observed, and the results were as we expected. Furthermore, in an in-vitro experiment, we confirmed that the spleen cells proliferated when stimulated by rALB14 and concanavalin A. Therefore, the whole conformation of ALB is considered to be involved in the elevation of non-specific IgE, and is involved in the activation of T cells.
Rose, Annkatrin; Schraegle, Shannon J; Stahlberg, Eric A; Meier, Iris
2005-11-16
Long alpha-helical coiled-coil proteins are involved in diverse organizational and regulatory processes in eukaryotic cells. They provide cables and networks in the cyto- and nucleoskeleton, molecular scaffolds that organize membrane systems and tissues, motors, levers, rotating arms, and possibly springs. Mutations in long coiled-coil proteins have been implemented in a growing number of human diseases. Using the coiled-coil prediction program MultiCoil, we have previously identified all long coiled-coil proteins from the model plant Arabidopsis thaliana and have established a searchable Arabidopsis coiled-coil protein database. Here, we have identified all proteins with long coiled-coil domains from 21 additional fully sequenced genomes. Because regions predicted to form coiled-coils interfere with sequence homology determination, we have developed a sequence comparison and clustering strategy based on masking predicted coiled-coil domains. Comparing and grouping all long coiled-coil proteins from 22 genomes, the kingdom-specificity of coiled-coil protein families was determined. At the same time, a number of proteins with unknown function could be grouped with already characterized proteins from other organisms. MultiCoil predicts proteins with extended coiled-coil domains (more than 250 amino acids) to be largely absent from bacterial genomes, but present in archaea and eukaryotes. The structural maintenance of chromosomes proteins and their relatives are the only long coiled-coil protein family clearly conserved throughout all kingdoms, indicating their ancient nature. Motor proteins, membrane tethering and vesicle transport proteins are the dominant eukaryote-specific long coiled-coil proteins, suggesting that coiled-coil proteins have gained functions in the increasingly complex processes of subcellular infrastructure maintenance and trafficking control of the eukaryotic cell.
Rose, Annkatrin; Schraegle, Shannon J; Stahlberg, Eric A; Meier, Iris
2005-01-01
Background Long alpha-helical coiled-coil proteins are involved in diverse organizational and regulatory processes in eukaryotic cells. They provide cables and networks in the cyto- and nucleoskeleton, molecular scaffolds that organize membrane systems and tissues, motors, levers, rotating arms, and possibly springs. Mutations in long coiled-coil proteins have been implemented in a growing number of human diseases. Using the coiled-coil prediction program MultiCoil, we have previously identified all long coiled-coil proteins from the model plant Arabidopsis thaliana and have established a searchable Arabidopsis coiled-coil protein database. Results Here, we have identified all proteins with long coiled-coil domains from 21 additional fully sequenced genomes. Because regions predicted to form coiled-coils interfere with sequence homology determination, we have developed a sequence comparison and clustering strategy based on masking predicted coiled-coil domains. Comparing and grouping all long coiled-coil proteins from 22 genomes, the kingdom-specificity of coiled-coil protein families was determined. At the same time, a number of proteins with unknown function could be grouped with already characterized proteins from other organisms. Conclusion MultiCoil predicts proteins with extended coiled-coil domains (more than 250 amino acids) to be largely absent from bacterial genomes, but present in archaea and eukaryotes. The structural maintenance of chromosomes proteins and their relatives are the only long coiled-coil protein family clearly conserved throughout all kingdoms, indicating their ancient nature. Motor proteins, membrane tethering and vesicle transport proteins are the dominant eukaryote-specific long coiled-coil proteins, suggesting that coiled-coil proteins have gained functions in the increasingly complex processes of subcellular infrastructure maintenance and trafficking control of the eukaryotic cell. PMID:16288662
Efficient Feature Selection and Classification of Protein Sequence Data in Bioinformatics
Faye, Ibrahima; Samir, Brahim Belhaouari; Md Said, Abas
2014-01-01
Bioinformatics has been an emerging area of research for the last three decades. The ultimate aims of bioinformatics were to store and manage the biological data, and develop and analyze computational tools to enhance their understanding. The size of data accumulated under various sequencing projects is increasing exponentially, which presents difficulties for the experimental methods. To reduce the gap between newly sequenced protein and proteins with known functions, many computational techniques involving classification and clustering algorithms were proposed in the past. The classification of protein sequences into existing superfamilies is helpful in predicting the structure and function of large amount of newly discovered proteins. The existing classification results are unsatisfactory due to a huge size of features obtained through various feature encoding methods. In this work, a statistical metric-based feature selection technique has been proposed in order to reduce the size of the extracted feature vector. The proposed method of protein classification shows significant improvement in terms of performance measure metrics: accuracy, sensitivity, specificity, recall, F-measure, and so forth. PMID:25045727
NASA Technical Reports Server (NTRS)
Haney, P. J.; Badger, J. H.; Buldak, G. L.; Reich, C. I.; Woese, C. R.; Olsen, G. J.
1999-01-01
The genome sequence of the extremely thermophilic archaeon Methanococcus jannaschii provides a wealth of data on proteins from a thermophile. In this paper, sequences of 115 proteins from M. jannaschii are compared with their homologs from mesophilic Methanococcus species. Although the growth temperatures of the mesophiles are about 50 degrees C below that of M. jannaschii, their genomic G+C contents are nearly identical. The properties most correlated with the proteins of the thermophile include higher residue volume, higher residue hydrophobicity, more charged amino acids (especially Glu, Arg, and Lys), and fewer uncharged polar residues (Ser, Thr, Asn, and Gln). These are recurring themes, with all trends applying to 83-92% of the proteins for which complete sequences were available. Nearly all of the amino acid replacements most significantly correlated with the temperature change are the same relatively conservative changes observed in all proteins, but in the case of the mesophile/thermophile comparison there is a directional bias. We identify 26 specific pairs of amino acids with a statistically significant (P < 0.01) preferred direction of replacement.
Lineage-Specific Biology Revealed by a Finished Genome Assembly of the Mouse
Hillier, LaDeana W.; Zody, Michael C.; Goldstein, Steve; She, Xinwe; Bult, Carol J.; Agarwala, Richa; Cherry, Joshua L.; DiCuccio, Michael; Hlavina, Wratko; Kapustin, Yuri; Meric, Peter; Maglott, Donna; Birtle, Zoë; Marques, Ana C.; Graves, Tina; Zhou, Shiguo; Teague, Brian; Potamousis, Konstantinos; Churas, Christopher; Place, Michael; Herschleb, Jill; Runnheim, Ron; Forrest, Daniel; Amos-Landgraf, James; Schwartz, David C.; Cheng, Ze; Lindblad-Toh, Kerstin; Eichler, Evan E.; Ponting, Chris P.
2009-01-01
The mouse (Mus musculus) is the premier animal model for understanding human disease and development. Here we show that a comprehensive understanding of mouse biology is only possible with the availability of a finished, high-quality genome assembly. The finished clone-based assembly of the mouse strain C57BL/6J reported here has over 175,000 fewer gaps and over 139 Mb more of novel sequence, compared with the earlier MGSCv3 draft genome assembly. In a comprehensive analysis of this revised genome sequence, we are now able to define 20,210 protein-coding genes, over a thousand more than predicted in the human genome (19,042 genes). In addition, we identified 439 long, non–protein-coding RNAs with evidence for transcribed orthologs in human. We analyzed the complex and repetitive landscape of 267 Mb of sequence that was missing or misassembled in the previously published assembly, and we provide insights into the reasons for its resistance to sequencing and assembly by whole-genome shotgun approaches. Duplicated regions within newly assembled sequence tend to be of more recent ancestry than duplicates in the published draft, correcting our initial understanding of recent evolution on the mouse lineage. These duplicates appear to be largely composed of sequence regions containing transposable elements and duplicated protein-coding genes; of these, some may be fixed in the mouse population, but at least 40% of segmentally duplicated sequences are copy number variable even among laboratory mouse strains. Mouse lineage-specific regions contain 3,767 genes drawn mainly from rapidly-changing gene families associated with reproductive functions. The finished mouse genome assembly, therefore, greatly improves our understanding of rodent-specific biology and allows the delineation of ancestral biological functions that are shared with human from derived functions that are not. PMID:19468303
Marshall, Owen J; Southall, Tony D; Cheetham, Seth W; Brand, Andrea H
2016-09-01
This protocol is an extension to: Nat. Protoc. 2, 1467-1478 (2007); doi:10.1038/nprot.2007.148; published online 7 June 2007The ability to profile transcription and chromatin binding in a cell-type-specific manner is a powerful aid to understanding cell-fate specification and cellular function in multicellular organisms. We recently developed targeted DamID (TaDa) to enable genome-wide, cell-type-specific profiling of DNA- and chromatin-binding proteins in vivo without cell isolation. As a protocol extension, this article describes substantial modifications to an existing protocol, and it offers additional applications. TaDa builds upon DamID, a technique for detecting genome-wide DNA-binding profiles of proteins, by coupling it with the GAL4 system in Drosophila to enable both temporal and spatial resolution. TaDa ensures that Dam-fusion proteins are expressed at very low levels, thus avoiding toxicity and potential artifacts from overexpression. The modifications to the core DamID technique presented here also increase the speed of sample processing and throughput, and adapt the method to next-generation sequencing technology. TaDa is robust, reproducible and highly sensitive. Compared with other methods for cell-type-specific profiling, the technique requires no cell-sorting, cross-linking or antisera, and binding profiles can be generated from as few as 10,000 total induced cells. By profiling the genome-wide binding of RNA polymerase II (Pol II), TaDa can also identify transcribed genes in a cell-type-specific manner. Here we describe a detailed protocol for carrying out TaDa experiments and preparing the material for next-generation sequencing. Although we developed TaDa in Drosophila, it should be easily adapted to other organisms with an inducible expression system. Once transgenic animals are obtained, the entire experimental procedure-from collecting tissue samples to generating sequencing libraries-can be accomplished within 5 d.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carbonell, Alberto; Martinez de Alba, Angel-Emilio; Flores, Ricardo
2008-02-05
Infection by viroids, non-protein-coding circular RNAs, occurs with the accumulation of 21-24 nt viroid-derived small RNAs (vd-sRNAs) with characteristic properties of small interfering RNAs (siRNAs) associated to RNA silencing. The vd-sRNAs most likely derive from dicer-like (DCL) enzymes acting on viroid-specific dsRNA, the key elicitor of RNA silencing, or on the highly structured genomic RNA. Previously, viral dsRNAs delivered mechanically or agroinoculated have been shown to interfere with virus infection in a sequence-specific manner. Here, we report similar results with members of the two families of nuclear- and chloroplast-replicating viroids. Moreover, homologous vd-sRNAs co-delivered mechanically also interfered with one ofmore » the viroids examined. The interference was sequence-specific, temperature-dependent and, in some cases, also dependent on the dose of the co-inoculated dsRNA or vd-sRNAs. The sequence-specific nature of these effects suggests the involvement of the RNA induced silencing complex (RISC), which provides sequence specificity to RNA silencing machinery. Therefore, viroid titer in natural infections might be regulated by the concerted action of DCL and RISC. Viroids could have evolved their secondary structure as a compromise between resistance to DCL and RISC, which act preferentially against RNAs with compact and relaxed secondary structures, respectively. In addition, compartmentation, association with proteins or active replication might also help viroids to elude their host RNA silencing machinery.« less
Walia, Rasna R; Caragea, Cornelia; Lewis, Benjamin A; Towfic, Fadi; Terribilini, Michael; El-Manzalawy, Yasser; Dobbs, Drena; Honavar, Vasant
2012-05-10
RNA molecules play diverse functional and structural roles in cells. They function as messengers for transferring genetic information from DNA to proteins, as the primary genetic material in many viruses, as catalysts (ribozymes) important for protein synthesis and RNA processing, and as essential and ubiquitous regulators of gene expression in living organisms. Many of these functions depend on precisely orchestrated interactions between RNA molecules and specific proteins in cells. Understanding the molecular mechanisms by which proteins recognize and bind RNA is essential for comprehending the functional implications of these interactions, but the recognition 'code' that mediates interactions between proteins and RNA is not yet understood. Success in deciphering this code would dramatically impact the development of new therapeutic strategies for intervening in devastating diseases such as AIDS and cancer. Because of the high cost of experimental determination of protein-RNA interfaces, there is an increasing reliance on statistical machine learning methods for training predictors of RNA-binding residues in proteins. However, because of differences in the choice of datasets, performance measures, and data representations used, it has been difficult to obtain an accurate assessment of the current state of the art in protein-RNA interface prediction. We provide a review of published approaches for predicting RNA-binding residues in proteins and a systematic comparison and critical assessment of protein-RNA interface residue predictors trained using these approaches on three carefully curated non-redundant datasets. We directly compare two widely used machine learning algorithms (Naïve Bayes (NB) and Support Vector Machine (SVM)) using three different data representations in which features are encoded using either sequence- or structure-based windows. Our results show that (i) Sequence-based classifiers that use a position-specific scoring matrix (PSSM)-based representation (PSSMSeq) outperform those that use an amino acid identity based representation (IDSeq) or a smoothed PSSM (SmoPSSMSeq); (ii) Structure-based classifiers that use smoothed PSSM representation (SmoPSSMStr) outperform those that use PSSM (PSSMStr) as well as sequence identity based representation (IDStr). PSSMSeq classifiers, when tested on an independent test set of 44 proteins, achieve performance that is comparable to that of three state-of-the-art structure-based predictors (including those that exploit geometric features) in terms of Matthews Correlation Coefficient (MCC), although the structure-based methods achieve substantially higher Specificity (albeit at the expense of Sensitivity) compared to sequence-based methods. We also find that the expected performance of the classifiers on a residue level can be markedly different from that on a protein level. Our experiments show that the classifiers trained on three different non-redundant protein-RNA interface datasets achieve comparable cross-validation performance. However, we find that the results are significantly affected by differences in the distance threshold used to define interface residues. Our results demonstrate that protein-RNA interface residue predictors that use a PSSM-based encoding of sequence windows outperform classifiers that use other encodings of sequence windows. While structure-based methods that exploit geometric features can yield significant increases in the Specificity of protein-RNA interface residue predictions, such increases are offset by decreases in Sensitivity. These results underscore the importance of comparing alternative methods using rigorous statistical procedures, multiple performance measures, and datasets that are constructed based on several alternative definitions of interface residues and redundancy cutoffs as well as including evaluations on independent test sets into the comparisons.
Kobayashi, Takehito; Yagi, Yusuke; Nakamura, Takahiro
2016-01-01
The pentatricopeptide repeat (PPR) motif is a sequence-specific RNA/DNA-binding module. Elucidation of the RNA/DNA recognition mechanism has enabled engineering of PPR motifs as new RNA/DNA manipulation tools in living cells, including for genome editing. However, the biochemical characteristics of PPR proteins remain unknown, mostly due to the instability and/or unfolding propensities of PPR proteins in heterologous expression systems such as bacteria and yeast. To overcome this issue, we constructed reporter systems using animal cultured cells. The cell-based system has highly attractive features for PPR engineering: robust eukaryotic gene expression; availability of various vectors, reagents, and antibodies; highly efficient DNA delivery ratio (>80 %); and rapid, high-throughput data production. In this chapter, we introduce an example of such reporter systems: a PPR-based sequence-specific translational activation system. The cell-based reporter system can be applied to characterize plant genes of interested and to PPR engineering.
Musinova, Yana R; Kananykhina, Eugenia Y; Potashnikova, Daria M; Lisitsyna, Olga M; Sheval, Eugene V
2015-01-01
The majority of known nucleolar proteins are freely exchanged between the nucleolus and the surrounding nucleoplasm. One way proteins are retained in the nucleoli is by the presence of specific amino acid sequences, namely nucleolar localization signals (NoLSs). The mechanism by which NoLSs retain proteins inside the nucleoli is still unclear. Here, we present data showing that the charge-dependent (electrostatic) interactions of NoLSs with nucleolar components lead to nucleolar accumulation as follows: (i) known NoLSs are enriched in positively charged amino acids, but the NoLS structure is highly heterogeneous, and it is not possible to identify a consensus sequence for this type of signal; (ii) in two analyzed proteins (NF-κB-inducing kinase and HIV-1 Tat), the NoLS corresponds to a region that is enriched for positively charged amino acid residues; substituting charged amino acids with non-charged ones reduced the nucleolar accumulation in proportion to the charge reduction, and nucleolar accumulation efficiency was strongly correlated with the predicted charge of the tested sequences; and (iii) sequences containing only lysine or arginine residues (which were referred to as imitative NoLSs, or iNoLSs) are accumulated in the nucleoli in a charge-dependent manner. The results of experiments with iNoLSs suggested that charge-dependent accumulation inside the nucleoli was dependent on interactions with nucleolar RNAs. The results of this work are consistent with the hypothesis that nucleolar protein accumulation by NoLSs can be determined by the electrostatic interaction of positively charged regions with nucleolar RNAs rather than by any sequence-specific mechanism. Copyright © 2014 Elsevier B.V. All rights reserved.
Stratification of co-evolving genomic groups using ranked phylogenetic profiles
Freilich, Shiri; Goldovsky, Leon; Gottlieb, Assaf; Blanc, Eric; Tsoka, Sophia; Ouzounis, Christos A
2009-01-01
Background Previous methods of detecting the taxonomic origins of arbitrary sequence collections, with a significant impact to genome analysis and in particular metagenomics, have primarily focused on compositional features of genomes. The evolutionary patterns of phylogenetic distribution of genes or proteins, represented by phylogenetic profiles, provide an alternative approach for the detection of taxonomic origins, but typically suffer from low accuracy. Herein, we present rank-BLAST, a novel approach for the assignment of protein sequences into genomic groups of the same taxonomic origin, based on the ranking order of phylogenetic profiles of target genes or proteins across the reference database. Results The rank-BLAST approach is validated by computing the phylogenetic profiles of all sequences for five distinct microbial species of varying degrees of phylogenetic proximity, against a reference database of 243 fully sequenced genomes. The approach - a combination of sequence searches, statistical estimation and clustering - analyses the degree of sequence divergence between sets of protein sequences and allows the classification of protein sequences according to the species of origin with high accuracy, allowing taxonomic classification of 64% of the proteins studied. In most cases, a main cluster is detected, representing the corresponding species. Secondary, functionally distinct and species-specific clusters exhibit different patterns of phylogenetic distribution, thus flagging gene groups of interest. Detailed analyses of such cases are provided as examples. Conclusion Our results indicate that the rank-BLAST approach can capture the taxonomic origins of sequence collections in an accurate and efficient manner. The approach can be useful both for the analysis of genome evolution and the detection of species groups in metagenomics samples. PMID:19860884
de Souza, C R; Aragão, F J; Moreira, E C O; Costa, C N M; Nascimento, S B; Carvalho, L J
2009-03-24
Cassava is one of the most important tropical food crops for more than 600 million people worldwide. Transgenic technologies can be useful for increasing its nutritional value and its resistance to viral diseases and insect pests. However, tissue-specific promoters that guarantee correct expression of transgenes would be necessary. We used inverse polymerase chain reaction to isolate a promoter sequence of the Mec1 gene coding for Pt2L4, a glutamic acid-rich protein differentially expressed in cassava storage roots. In silico analysis revealed putative cis-acting regulatory elements within this promoter sequence, including root-specific elements that may be required for its expression in vascular tissues. Transient expression experiments showed that the Mec1 promoter is functional, since this sequence was able to drive GUS expression in bean embryonic axes. Results from our computational analysis can serve as a guide for functional experiments to identify regions with tissue-specific Mec1 promoter activity. The DNA sequence that we identified is a new promoter that could be a candidate for genetic engineering of cassava roots.
Nature of the protein universe
Levitt, Michael
2009-01-01
The protein universe is the set of all proteins of all organisms. Here, all currently known sequences are analyzed in terms of families that have single-domain or multidomain architectures and whether they have a known three-dimensional structure. Growth of new single-domain families is very slow: Almost all growth comes from new multidomain architectures that are combinations of domains characterized by ≈15,000 sequence profiles. Single-domain families are mostly shared by the major groups of organisms, whereas multidomain architectures are specific and account for species diversity. There are known structures for a quarter of the single-domain families, and >70% of all sequences can be partially modeled thanks to their membership in these families. PMID:19541617
Gastrointestinal Endogenous Proteins as a Source of Bioactive Peptides - An In Silico Study
Dave, Lakshmi A.; Montoya, Carlos A.; Rutherfurd, Shane M.; Moughan, Paul J.
2014-01-01
Dietary proteins are known to contain bioactive peptides that are released during digestion. Endogenous proteins secreted into the gastrointestinal tract represent a quantitatively greater supply of protein to the gut lumen than those of dietary origin. Many of these endogenous proteins are digested in the gastrointestinal tract but the possibility that these are also a source of bioactive peptides has not been considered. An in silico prediction method was used to test if bioactive peptides could be derived from the gastrointestinal digestion of gut endogenous proteins. Twenty six gut endogenous proteins and seven dietary proteins were evaluated. The peptides present after gastric and intestinal digestion were predicted based on the amino acid sequence of the proteins and the known specificities of the major gastrointestinal proteases. The predicted resultant peptides possessing amino acid sequences identical to those of known bioactive peptides were identified. After gastrointestinal digestion (based on the in silico simulation), the total number of bioactive peptides predicted to be released ranged from 1 (gliadin) to 55 (myosin) for the selected dietary proteins and from 1 (secretin) to 39 (mucin-5AC) for the selected gut endogenous proteins. Within the intact proteins and after simulated gastrointestinal digestion, angiotensin converting enzyme (ACE)-inhibitory peptide sequences were the most frequently observed in both the dietary and endogenous proteins. Among the dietary proteins, after in silico simulated gastrointestinal digestion, myosin was found to have the highest number of ACE-inhibitory peptide sequences (49 peptides), while for the gut endogenous proteins, mucin-5AC had the greatest number of ACE-inhibitory peptide sequences (38 peptides). Gut endogenous proteins may be an important source of bioactive peptides in the gut particularly since gut endogenous proteins represent a quantitatively large and consistent source of protein. PMID:24901416
Feldhoff, A; Wetzel, T; Peters, D; Kellner, R; Krczal, G
1998-01-01
With the introduction of cutting-grown Petunia x hybrida plants on the European market, a new potyvirus which showed no serological reaction with antisera against any other potyviruses infecting petunias was discovered. Infected leaves contained flexuous rod-shaped virus particles of 750-800 nm in length and inclusion bodies (pinwheel structures) typical for potyviruses in ultrathin leaf sections. The purified coat protein with a Mr of approximately 36 kDa could be detected in Western immunoblots with a specific antibody to the coat protein of the petunia-infecting virus. The 3' end of the viral genome encompassing the 3' non-coding region, the coat protein gene, and part of the NIb gene was amplified from infected leaf material by IC/PCR using degenerate and specific primers. Sequences of PCR-generated cDNA clones were compared to other known sequences of potyviruses. Maximum homology of 56% was found in the 3' non-coding region between the petunia isolate and other potyviruses. A maximum homology of 69% was found between the amino acid sequence of the coat protein of the petunia isolate and corresponding sequences of other potyviruses. These data indicate that the petunia-infecting virus is a previously undescribed potyvirus and the name petunia flower mottle virus (PetFMV) is suggested.
Applying the Concept of Peptide Uniqueness to Anti-Polio Vaccination
Kanduc, Darja; Fasano, Candida; Capone, Giovanni; Pesce Delfino, Antonella; Calabrò, Michele; Polimeno, Lorenzo
2015-01-01
Background. Although rare, adverse events may associate with anti-poliovirus vaccination thus possibly hampering global polio eradication worldwide. Objective. To design peptide-based anti-polio vaccines exempt from potential cross-reactivity risks and possibly able to reduce rare potential adverse events such as the postvaccine paralytic poliomyelitis due to the tendency of the poliovirus genome to mutate. Methods. Proteins from poliovirus type 1, strain Mahoney, were analyzed for amino acid sequence identity to the human proteome at the pentapeptide level, searching for sequences that (1) have zero percent of identity to human proteins, (2) are potentially endowed with an immunologic potential, and (3) are highly conserved among poliovirus strains. Results. Sequence analyses produced a set of consensus epitopic peptides potentially able to generate specific anti-polio immune responses exempt from cross-reactivity with the human host. Conclusion. Peptide sequences unique to poliovirus proteins and conserved among polio strains might help formulate a specific and universal anti-polio vaccine able to react with multiple viral strains and exempt from the burden of possible cross-reactions with human proteins. As an additional advantage, using a peptide-based vaccine instead of current anti-polio DNA vaccines would eliminate the rare post-polio poliomyelitis cases and other disabling symptoms that may appear following vaccination. PMID:26568962
Modahl, Cassandra M.; Mackessy, Stephen P.
2016-01-01
Envenomation of humans by snakes is a complex and continuously evolving medical emergency, and treatment is made that much more difficult by the diverse biochemical composition of many venoms. Venomous snakes and their venoms also provide models for the study of molecular evolutionary processes leading to adaptation and genotype-phenotype relationships. To compare venom complexity and protein sequences, venom gland transcriptomes are assembled, which usually requires the sacrifice of snakes for tissue. However, toxin transcripts are also present in venoms, offering the possibility of obtaining cDNA sequences directly from venom. This study provides evidence that unknown full-length venom protein transcripts can be obtained from the venoms of multiple species from all major venomous snake families. These unknown venom protein cDNAs are obtained by the use of primers designed from conserved signal peptide sequences within each venom protein superfamily. This technique was used to assemble a partial venom gland transcriptome for the Middle American Rattlesnake (Crotalus simus tzabcan) by amplifying sequences for phospholipases A2, serine proteases, C-lectins, and metalloproteinases from within venom. Phospholipase A2 sequences were also recovered from the venoms of several rattlesnakes and an elapid snake (Pseudechis porphyriacus), and three-finger toxin sequences were recovered from multiple rear-fanged snake species, demonstrating that the three major clades of advanced snakes (Elapidae, Viperidae, Colubridae) have stable mRNA present in their venoms. These cDNA sequences from venom were then used to explore potential activities derived from protein sequence similarities and evolutionary histories within these large multigene superfamilies. Venom-derived sequences can also be used to aid in characterizing venoms that lack proteomic profiles and identify sequence characteristics indicating specific envenomation profiles. This approach, requiring only venom, provides access to cDNA sequences in the absence of living specimens, even from commercial venom sources, to evaluate important regional differences in venom composition and to study snake venom protein evolution. PMID:27280639
Pastor, N; Pardo, L; Weinstein, H
1997-01-01
The binding of the TATA box-binding protein (TBP) to a TATA sequence in DNA is essential for eukaryotic basal transcription. TBP binds in the minor groove of DNA, causing a large distortion of the DNA helix. Given the apparent stereochemical equivalence of AT and TA basepairs in the minor groove, DNA deformability must play a significant role in binding site selection, because not all AT-rich sequences are bound effectively by TBP. To gain insight into the precise role that the properties of the TATA sequence have in determining the specificity of the DNA substrates of TBP, the solution structure and dynamics of seven DNA dodecamers have been studied by using molecular dynamics simulations. The analysis of the structural properties of basepair steps in these TATA sequences suggests a reason for the preference for alternating pyrimidine-purine (YR) sequences, but indicates that these properties cannot be the sole determinant of the sequence specificity of TBP. Rather, recognition depends on the interplay between the inherent deformability of the DNA and steric complementarity at the molecular interface. Images FIGURE 2 PMID:9251783
Formin homology 2 domains occur in multiple contexts in angiosperms
Cvrčková, Fatima; Novotný, Marian; Pícková, Denisa; Žárský, Viktor
2004-01-01
Background Involvement of conservative molecular modules and cellular mechanisms in the widely diversified processes of eukaryotic cell morphogenesis leads to the intriguing question: how do similar proteins contribute to dissimilar morphogenetic outputs. Formins (FH2 proteins) play a central part in the control of actin organization and dynamics, providing a good example of evolutionarily versatile use of a conserved protein domain in the context of a variety of lineage-specific structural and signalling interactions. Results In order to identify possible plant-specific sequence features within the FH2 protein family, we performed a detailed analysis of angiosperm formin-related sequences available in public databases, with particular focus on the complete Arabidopsis genome and the nearly finished rice genome sequence. This has led to revision of the current annotation of half of the 22 Arabidopsis formin-related genes. Comparative analysis of the two plant genomes revealed a good conservation of the previously described two subfamilies of plant formins (Class I and Class II), as well as several subfamilies within them that appear to predate the separation of monocot and dicot plants. Moreover, a number of plant Class II formins share an additional conserved domain, related to the protein phosphatase/tensin/auxilin fold. However, considerable inter-species variability sets limits to generalization of any functional conclusions reached on a single species such as Arabidopsis. Conclusions The plant-specific domain context of the conserved FH2 domain, as well as plant-specific features of the domain itself, may reflect distinct functional requirements in plant cells. The variability of formin structures found in plants far exceeds that known from both fungi and metazoans, suggesting a possible contribution of FH2 proteins in the evolution of the plant type of multicellularity. PMID:15256004
Development of a dedicated peptide tandem mass spectral library for conservation science.
Fremout, Wim; Dhaenens, Maarten; Saverwyns, Steven; Sanyova, Jana; Vandenabeele, Peter; Deforce, Dieter; Moens, Luc
2012-05-30
In recent years, the use of liquid chromatography tandem mass spectrometry (LC-MS/MS) on tryptic digests of cultural heritage objects has attracted much attention. It allows for unambiguous identification of peptides and proteins, and even in complex mixtures species-specific identification becomes feasible with minimal sample consumption. Determination of the peptides is commonly based on theoretical cleavage of known protein sequences and on comparison of the expected peptide fragments with those found in the MS/MS spectra. In this approach, complex computer programs, such as Mascot, perform well identifying known proteins, but fail when protein sequences are unknown or incomplete. Often, when trying to distinguish evolutionarily well preserved collagens of different species, Mascot lacks the required specificity. Complementary and often more accurate information on the proteins can be obtained using a reference library of MS/MS spectra of species-specific peptides. Therefore, a library dedicated to various sources of proteins in works of art was set up, with an initial focus on collagen rich materials. This paper discusses the construction and the advantages of this spectral library for conservation science, and its application on a number of samples from historical works of art. Copyright © 2012 Elsevier B.V. All rights reserved.
Cathepsins L and Z Are Critical in Degrading Polyglutamine-containing Proteins within Lysosomes*
Bhutani, Nidhi; Piccirillo, Rosanna; Hourez, Raphael; Venkatraman, Prasanna; Goldberg, Alfred L.
2012-01-01
In neurodegenerative diseases caused by extended polyglutamine (polyQ) sequences in proteins, aggregation-prone polyQ proteins accumulate in intraneuronal inclusions. PolyQ proteins can be degraded by lysosomes or proteasomes. Proteasomes are unable to hydrolyze polyQ repeat sequences, and during breakdown of polyQ proteins, they release polyQ repeat fragments for degradation by other cellular enzymes. This study was undertaken to identify the responsible proteases. Lysosomal extracts (unlike cytosolic enzymes) were found to rapidly hydrolyze polyQ sequences in peptides, proteins, or insoluble aggregates. Using specific inhibitors against lysosomal proteases, enzyme-deficient extracts, and pure cathepsins, we identified cathepsins L and Z as the lysosomal cysteine proteases that digest polyQ proteins and peptides. RNAi for cathepsins L and Z in different cell lines and adult mouse muscles confirmed that they are critical in degrading polyQ proteins (expanded huntingtin exon 1) but not other types of aggregation-prone proteins (e.g. mutant SOD1). Therefore, the activities of these two lysosomal cysteine proteases are important in host defense against toxic accumulation of polyQ proteins. PMID:22451661
DOE Office of Scientific and Technical Information (OSTI.GOV)
Whitehead, Timothy A.; Chevalier, Aaron; Song, Yifan
2012-06-19
We show that comprehensive sequence-function maps obtained by deep sequencing can be used to reprogram interaction specificity and to leapfrog over bottlenecks in affinity maturation by combining many individually small contributions not detectable in conventional approaches. We use this approach to optimize two computationally designed inhibitors against H1N1 influenza hemagglutinin and, in both cases, obtain variants with subnanomolar binding affinity. The most potent of these, a 51-residue protein, is broadly cross-reactive against all influenza group 1 hemagglutinins, including human H2, and neutralizes H1N1 viruses with a potency that rivals that of several human monoclonal antibodies, demonstrating that computational design followedmore » by comprehensive energy landscape mapping can generate proteins with potential therapeutic utility.« less
Cell proteins bind to multiple sites within the 5' untranslated region of poliovirus RNA.
del Angel, R M; Papavassiliou, A G; Fernández-Tomás, C; Silverstein, S J; Racaniello, V R
1989-01-01
The 5' noncoding region of poliovirus RNA contains sequences necessary for translation and replication. These functions are probably carried out by recognition of poliovirus RNA by cellular and/or viral proteins. Using a mobility-shift electrophoresis assay and 1,10-phenanthroline/Cu+ footprinting, we demonstrate specific binding of cytoplasmic factors with a sequence from nucleotides 510-629 within the 5' untranslated region (UTR). Complex formation was also observed with a second sequence (nucleotides 97-182) within the 5' UTR. These two regions of the 5' UTR appear to be recognized by distinct cell factors as determined by competition analysis and the effects of ionic strength on complex formation. However, both complexes contain eukaryotic initiation factor 2 alpha, as revealed by their reaction with specific antibody. Images PMID:2554308
Mukherjee, Koel; Pandey, Dev Mani; Vidyarthi, Ambarish Saran
2015-02-06
Gaining access to sequence and structure information of telomere binding proteins helps in understanding the essential biological processes involve in conserved sequence specific interaction between DNA and the proteins. Rice telomere binding protein (RTBP1) and Nicotiana glutinosa telomere repeat binding factor (NgTRF1) are helix turn helix motif type of proteins that plays role in telomeric DNA protection and length regulation. Both the proteins share same type of domain but till now there is very less communication on the in silico studies of these complete proteins.Here we intend to do a comparative study between two proteins through modeling of the complete proteins, physiochemical characterization, MD simulation and DNA-protein docking. I-TASSER and CLC protein work bench was performed to find out the protein 3D structure as well as the different parameters to characterize the proteins. MD simulation was completed by GROMOS forcefield of GROMACS for 10 ns of time stretch. The simulated 3D structures were docked with template DNA (3D DNA modeled through 3D-DART) of TTTAGGG conserved sequence motif using HADDOCK web server.Digging up all the facts about the proteins it was reveled that around 120 amino acids in the tail part was showing a good sequence similarity between the proteins. Molecular modeling, sequence characterization and secondary structure prediction also indicates the similarity between the protein's structure and sequence. The result of MD simulation highlights on the RMSD, RMSF, Rg, PCA and Energy plots which also conveys the similar type of motional behavior between them. The best complex formation for both the proteins in docking result also indicates for the first interaction site which is mainly the helix3 region of the DNA binding domain. The overall computational analysis reveals that RTBP1 and NgTRF1 proteins display good amount of similarity in their physicochemical properties, structure, dynamics and binding mode.
Mukherjee, Koel; Pandey, Dev Mani; Vidyarthi, Ambarish Saran
2015-09-01
Gaining access to sequence and structure information of telomere-binding proteins helps in understanding the essential biological processes involve in conserved sequence-specific interaction between DNA and the proteins. Rice telomere-binding protein (RTBP1) and Nicotiana glutinosa telomere repeat binding factor (NgTRF1) are helix-turn-helix motif type of proteins that plays role in telomeric DNA protection and length regulation. Both the proteins share same type of domain, but till now there is very less communication on the in silico studies of these complete proteins. Here we intend to do a comparative study between two proteins through modeling of the complete proteins, physiochemical characterization, MD simulation and DNA-protein docking. I-TASSER and CLC protein work bench was performed to find out the protein 3D structure as well as the different parameters to characterize the proteins. MD simulation was completed by GROMOS forcefield of GROMACS for 10 ns of time stretch. The simulated 3D structures were docked with template DNA (3D DNA modeled through 3D-DART) of TTTAGGG conserved sequence motif using HADDOCK Web server. By digging up all the facts about the proteins, it was revealed that around 120 amino acids in the tail part were showing a good sequence similarity between the proteins. Molecular modeling, sequence characterization and secondary structure prediction also indicate the similarity between the protein's structure and sequence. The result of MD simulation highlights on the RMSD, RMSF, Rg, PCA and energy plots which also conveys the similar type of motional behavior between them. The best complex formation for both the proteins in docking result also indicates for the first interaction site which is mainly the helix3 region of the DNA-binding domain. The overall computational analysis reveals that RTBP1 and NgTRF1 proteins display good amount of similarity in their physicochemical properties, structure, dynamics and binding mode.
Poliovirus serotype-specific VP1 sequencing primers.
Kilpatrick, David R; Iber, Jane C; Chen, Qi; Ching, Karen; Yang, Su-Ju; De, Lina; Mandelbaum, Mark D; Emery, Brian; Campagnoli, Ray; Burns, Cara C; Kew, Olen
2011-06-01
The Global Polio Laboratory Network routinely uses poliovirus-specific PCR primers and probes to determine the serotype and genotype of poliovirus isolates obtained as part of global poliovirus surveillance. To provide detailed molecular epidemiologic information, poliovirus isolates are further characterized by sequencing the ~900-nucleotide region encoding the major capsid protein, VP1. It is difficult to obtain quality sequence information when clinical or environmental samples contain poliovirus mixtures. As an alternative to conventional methods for resolving poliovirus mixtures, sets of serotype-specific primers were developed for amplifying and sequencing the VP1 regions of individual components of mixed populations of vaccine-vaccine, vaccine-wild, and wild-wild polioviruses. Published by Elsevier B.V.
Huh, T L; Ryu, J H; Huh, J W; Sung, H C; Oh, I U; Song, B J; Veech, R L
1993-01-01
Mitochondrial NADP(+)-specific isocitrate dehydrogenase (IDP) was co-purified with the pyruvate dehydrogenase complex from bovine kidney mitochondria. The determination of its N-terminal 16-amino-acid sequence revealed that it is highly similar to the IDP from yeast. A cDNA clone (1.8 kb long) encoding this protein was isolated from a bovine kidney lambda gt11 cDNA library using a synthetic oligodeoxynucleotide. The deduced protein sequence of this cDNA clone rendered a precursor protein of 452 amino-acid residues (50,830 Da) and a mature protein of 413 amino-acid residues (46,519 Da). It is 100% identical to the internal tryptic peptide sequences of the autologous form from pig heart and 62% similar to that from yeast. However, it shares little similarity with the mitochondrial NAD(+)-specific isoenzyme from yeast. Structural analyses of the deduced proteins of IDP isoenzymes from different species indicated that similarity exists in certain regions, which may represent the common domains for the active sites or coenzyme-binding sites. In Northern-blot analysis, one species of mRNA (about 2.2 kb for both bovine and human) was hybridized with a 32P-labelled cDNA probe. Southern-blot analysis of genomic DNAs verified simple patterns of hybridization with this cDNA. These results strongly indicate that the mitochondrial IDP may be derived from a single gene family which does not appear to be closely related to that of the NAD(+)-specific isoenzyme. Images Figure 1 Figure 3 Figure 4 Figure 5 PMID:8318002
Computer-based prediction of mitochondria-targeting peptides.
Martelli, Pier Luigi; Savojardo, Castrense; Fariselli, Piero; Tasco, Gianluca; Casadio, Rita
2015-01-01
Computational methods are invaluable when protein sequences, directly derived from genomic data, need functional and structural annotation. Subcellular localization is a feature necessary for understanding the protein role and the compartment where the mature protein is active and very difficult to characterize experimentally. Mitochondrial proteins encoded on the cytosolic ribosomes carry specific patterns in the precursor sequence from where it is possible to recognize a peptide targeting the protein to its final destination. Here we discuss to which extent it is feasible to develop computational methods for detecting mitochondrial targeting peptides in the precursor sequences and benchmark our and other methods on the human mitochondrial proteins endowed with experimentally characterized targeting peptides. Furthermore, we illustrate our newly implemented web server and its usage on the whole human proteome in order to infer mitochondrial targeting peptides, their cleavage sites, and whether the targeting peptide regions contain or not arginine-rich recurrent motifs. By this, we add some other 2,800 human proteins to the 124 ones already experimentally annotated with a mitochondrial targeting peptide.
Yamane, Asaka; Fukui, Mina; Sugimura, Yoshiaki; Itoh, Miho; Alea, Mileidys Perez; Thomas, Vincent; El Alaoui, Said; Akiyama, Masashi; Hitomi, Kiyotaka
2010-09-01
Transglutaminases (TGases) are a family of enzymes that catalyze cross-linking reactions between proteins. During epidermal differentiation, these enzymatic reactions are essential for formation of the cornified envelope, which consists of cross-linked structural proteins. Two main transglutaminases isoforms, epidermal-type (TGase 3) and keratinocyte-type (TGase 1), are cooperatively involved in this process of differentiating keratinocytes. Information regarding their substrate preference is of great importance to determine the functional role of these isozymes and clarify their possible co-operative action. Thus far, we have identified highly reactive peptide sequences specifically recognized by TGases isozymes such as TGase 1, TGase 2 (tissue-type isozyme) and the blood coagulation isozyme, Factor XIII. In this study, several substrate peptide sequences for human TGase 3 were screened from a phage-displayed peptide library. The preferred substrate sequences for TGase 3 were selected and evaluated as fusion proteins with mutated glutathione S-transferase. From these studies, a highly reactive and isozyme-specific sequence (E51) was identified. Furthermore, this sequence was found to be a prominent substrate in the peptide form and was suitable for detection of in situ TGase 3 activity in the mouse epidermis. TGase 3 enzymatic activity was detected in the layers of differentiating keratinocytes and hair follicles with patterns distinct from those of TGase 1. Our findings provide new information on the specific distribution of TGase 3 and constitute a useful tool to clarify its functional role in the epidermis.
GuiTope: an application for mapping random-sequence peptides to protein sequences.
Halperin, Rebecca F; Stafford, Phillip; Emery, Jack S; Navalkar, Krupa Arun; Johnston, Stephen Albert
2012-01-03
Random-sequence peptide libraries are a commonly used tool to identify novel ligands for binding antibodies, other proteins, and small molecules. It is often of interest to compare the selected peptide sequences to the natural protein binding partners to infer the exact binding site or the importance of particular residues. The ability to search a set of sequences for similarity to a set of peptides may sometimes enable the prediction of an antibody epitope or a novel binding partner. We have developed a software application designed specifically for this task. GuiTope provides a graphical user interface for aligning peptide sequences to protein sequences. All alignment parameters are accessible to the user including the ability to specify the amino acid frequency in the peptide library; these frequencies often differ significantly from those assumed by popular alignment programs. It also includes a novel feature to align di-peptide inversions, which we have found improves the accuracy of antibody epitope prediction from peptide microarray data and shows utility in analyzing phage display datasets. Finally, GuiTope can randomly select peptides from a given library to estimate a null distribution of scores and calculate statistical significance. GuiTope provides a convenient method for comparing selected peptide sequences to protein sequences, including flexible alignment parameters, novel alignment features, ability to search a database, and statistical significance of results. The software is available as an executable (for PC) at http://www.immunosignature.com/software and ongoing updates and source code will be available at sourceforge.net.
TFBSshape: a motif database for DNA shape features of transcription factor binding sites.
Yang, Lin; Zhou, Tianyin; Dror, Iris; Mathelier, Anthony; Wasserman, Wyeth W; Gordân, Raluca; Rohs, Remo
2014-01-01
Transcription factor binding sites (TFBSs) are most commonly characterized by the nucleotide preferences at each position of the DNA target. Whereas these sequence motifs are quite accurate descriptions of DNA binding specificities of transcription factors (TFs), proteins recognize DNA as a three-dimensional object. DNA structural features refine the description of TF binding specificities and provide mechanistic insights into protein-DNA recognition. Existing motif databases contain extensive nucleotide sequences identified in binding experiments based on their selection by a TF. To utilize DNA shape information when analysing the DNA binding specificities of TFs, we developed a new tool, the TFBSshape database (available at http://rohslab.cmb.usc.edu/TFBSshape/), for calculating DNA structural features from nucleotide sequences provided by motif databases. The TFBSshape database can be used to generate heat maps and quantitative data for DNA structural features (i.e., minor groove width, roll, propeller twist and helix twist) for 739 TF datasets from 23 different species derived from the motif databases JASPAR and UniPROBE. As demonstrated for the basic helix-loop-helix and homeodomain TF families, our TFBSshape database can be used to compare, qualitatively and quantitatively, the DNA binding specificities of closely related TFs and, thus, uncover differential DNA binding specificities that are not apparent from nucleotide sequence alone.
Chen, Dana; Orenstein, Yaron; Golodnitsky, Rada; Pellach, Michal; Avrahami, Dorit; Wachtel, Chaim; Ovadia-Shochat, Avital; Shir-Shapira, Hila; Kedmi, Adi; Juven-Gershon, Tamar; Shamir, Ron; Gerber, Doron
2016-01-01
Transcription factors (TFs) alter gene expression in response to changes in the environment through sequence-specific interactions with the DNA. These interactions are best portrayed as a landscape of TF binding affinities. Current methods to study sequence-specific binding preferences suffer from limited dynamic range, sequence bias, lack of specificity and limited throughput. We have developed a microfluidic-based device for SELEX Affinity Landscape MAPping (SELMAP) of TF binding, which allows high-throughput measurement of 16 proteins in parallel. We used it to measure the relative affinities of Pho4, AtERF2 and Btd full-length proteins to millions of different DNA binding sites, and detected both high and low-affinity interactions in equilibrium conditions, generating a comprehensive landscape of the relative TF affinities to all possible DNA 6-mers, and even DNA10-mers with increased sequencing depth. Low quantities of both the TFs and DNA oligomers were sufficient for obtaining high-quality results, significantly reducing experimental costs. SELMAP allows in-depth screening of hundreds of TFs, and provides a means for better understanding of the regulatory processes that govern gene expression. PMID:27628341
Hess, M A; Duncan, R F
1996-01-01
Preferential translation of Drosophila heat shock protein 70 (Hsp70) mRNA requires only the 5'-untranslated region (5'-UTR). The sequence of this region suggests that it has relatively little secondary structure, which may facilitate efficient protein synthesis initiation. To determine whether minimal 5'-UTR secondary structure is required for preferential translation during heat shock, the effect of introducing stem-loops into the Hsp70 mRNA 5'-UTR was measured. Stem-loops of -11 kcal/mol abolished translation during heat shock, but did not reduce translation in non-heat shocked cells. A -22 kcal/mol stem-loop was required to comparably inhibit translation during growth at normal temperatures. To investigate whether specific sequence elements are also required for efficient preferential translation, deletion and mutation analyses were conducted in a truncated Hsp70 5'-UTR containing only the cap-proximal and AUG-proximal segments. Linker-scanner mutations in the cap-proximal segment (+1 to +37) did not impair translation. Re-ordering the segments reduced mRNA translational efficiency by 50%. Deleting the AUG-proximal segment severely inhibited translation. A 5-extension of the full-length leader specifically impaired heat shock translation. These results indicate that heat shock reduces the capacity to unwind 5-UTR secondary structure, allowing only mRNAs with minimal 5'-UTR secondary structure to be efficiently translated. A function for specific sequences is also suggested. PMID:8710519
Detecting Coevolution in and among Protein Domains
Yeang, Chen-Hsiang; Haussler, David
2007-01-01
Correlated changes of nucleic or amino acids have provided strong information about the structures and interactions of molecules. Despite the rich literature in coevolutionary sequence analysis, previous methods often have to trade off between generality, simplicity, phylogenetic information, and specific knowledge about interactions. Furthermore, despite the evidence of coevolution in selected protein families, a comprehensive screening of coevolution among all protein domains is still lacking. We propose an augmented continuous-time Markov process model for sequence coevolution. The model can handle different types of interactions, incorporate phylogenetic information and sequence substitution, has only one extra free parameter, and requires no knowledge about interaction rules. We employ this model to large-scale screenings on the entire protein domain database (Pfam). Strikingly, with 0.1 trillion tests executed, the majority of the inferred coevolving protein domains are functionally related, and the coevolving amino acid residues are spatially coupled. Moreover, many of the coevolving positions are located at functionally important sites of proteins/protein complexes, such as the subunit linkers of superoxide dismutase, the tRNA binding sites of ribosomes, the DNA binding region of RNA polymerase, and the active and ligand binding sites of various enzymes. The results suggest sequence coevolution manifests structural and functional constraints of proteins. The intricate relations between sequence coevolution and various selective constraints are worth pursuing at a deeper level. PMID:17983264
Positive modulator of bone morphogenic protein-2
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zamora, Paul O.; Pena, Louis A.; Lin, Xinhua
Compounds of the present invention of formula I and formula II are disclosed in the specification and wherein the compounds are modulators of Bone Morphogenic Protein activity. Compounds are synthetic peptides having a non-growth factor heparin binding region, a linker, and sequences that bind specifically to a receptor for Bone Morphogenic Protein. Uses of compounds of the present invention in the treatment of bone lesions, degenerative joint disease and to enhance bone formation are disclosed.
Positive modulator of bone morphogenic protein-2
Zamora, Paul O [Gaithersburg, MD; Pena, Louis A [Poquott, NY; Lin, Xinhua [Plainview, NY; Takahashi, Kazuyuki [Germantown, MD
2009-01-27
Compounds of the present invention of formula I and formula II are disclosed in the specification and wherein the compounds are modulators of Bone Morphogenic Protein activity. Compounds are synthetic peptides having a non-growth factor heparin binding region, a linker, and sequences that bind specifically to a receptor for Bone Morphogenic Protein. Uses of compounds of the present invention in the treatment of bone lesions, degenerative joint disease and to enhance bone formation are disclosed.
A bacterial Argonaute with noncanonical guide RNA specificity
Kaya, Emine; Doxzen, Kevin W.; Knoll, Kilian R.; Wilson, Ross C.; Strutt, Steven C.; Kranzusch, Philip J.; Doudna, Jennifer A.
2016-01-01
Eukaryotic Argonaute proteins induce gene silencing by small RNA-guided recognition and cleavage of mRNA targets. Although structural similarities between human and prokaryotic Argonautes are consistent with shared mechanistic properties, sequence and structure-based alignments suggested that Argonautes encoded within CRISPR-cas [clustered regularly interspaced short palindromic repeats (CRISPR)-associated] bacterial immunity operons have divergent activities. We show here that the CRISPR-associated Marinitoga piezophila Argonaute (MpAgo) protein cleaves single-stranded target sequences using 5′-hydroxylated guide RNAs rather than the 5′-phosphorylated guides used by all known Argonautes. The 2.0-Å resolution crystal structure of an MpAgo–RNA complex reveals a guide strand binding site comprising residues that block 5′ phosphate interactions. Using structure-based sequence alignment, we were able to identify other putative MpAgo-like proteins, all of which are encoded within CRISPR-cas loci. Taken together, our data suggest the evolution of an Argonaute subclass with noncanonical specificity for a 5′-hydroxylated guide. PMID:27035975
Fiore, Nicola; Fajardo, Thor V M; Prodan, Simona; Herranz, María Carmen; Aparicio, Frederic; Montealegre, Jaime; Elena, Santiago F; Pallás, Vicente; Sánchez-Navarro, Jesús
2008-01-01
Prunus necrotic ringspot virus (PNRSV) is distributed worldwide, but no molecular data have been previously reported from South American isolates. The nucleotide sequences corresponding to the movement (MP) and coat (CP) proteins of 23 isolates of PNRSV from Chile, Brazil, and Uruguay, and from different Prunus species, have been obtained. Phylogenetic analysis performed with full-length MP and CP sequences from all the PNRSV isolates confirmed the clustering of the isolates into the previously reported PV32-I, PV96-II and PE5-III phylogroups. No association was found between specific sequences and host, geographic origin or symptomatology. Comparative analysis showed that both MP and CP have phylogroup-specific amino acids and all of the motifs previously characterized for both proteins. The study of the distribution of synonymous and nonsynonymous changes along both open reading frames revealed that most amino acid sites are under the effect of negative purifying selection.
Sypabekova, Marzhan; Bekmurzayeva, Aliya; Wang, Ronghui; Li, Yanbin; Nogues, Claude; Kanayeva, Damira
2017-05-01
Rapid detection of Mycobacterium tuberculosis (Mtb), an etiological agent of tuberculosis (TB), is important for global control of this disease. Aptamers have emerged as a potential rival for antibodies in therapeutics, diagnostics and biosensing due to their inherent characteristics. The aim of the current study was to select and characterize single-stranded DNA aptamers against MPT64 protein, one of the predominant secreted proteins of Mtb pathogen. Aptamers specific to MPT64 protein were selected in vitro using systematic evolution of ligands through exponential enrichment (SELEX) method. The selection was started with a pool of ssDNA library with randomized 40-nucleotide region. A total of 10 cycles were performed and seventeen aptamers with unique sequences were identified by sequencing. Dot Blot analysis was performed to monitor the SELEX process and to conduct the preliminary tests on the affinity and specificity of aptamers. Enzyme linked oligonucleotide assay (ELONA) showed that most of the aptamers were specific to the MPT64 protein with a linear correlation of R 2 = 0.94 for the most selective. Using Surface Plasmon Resonance (SPR), dissociation equilibrium constant K D of 8.92 nM was obtained. Bioinformatics analysis of the most specific aptamers revealed the existence of a conserved as well as distinct sequences and possible binding site on MPT64. The specificity was determined by testing non-target ESAT-6 and CFP-10. Negligible cross-reactivity confirmed the high specificity of the selected aptamer. The selected aptamer was further tested on clinical sputum samples using ELONA and had sensitivity and specificity of 91.3% and 90%, respectively. Microscopy, culture positivity and nucleotide amplification methods were used as reference standards. The aptamers studied could be further used for the development of medical diagnostic tools and detection assays for Mtb. Copyright © 2017 The Authors. Published by Elsevier Ltd.. All rights reserved.
GPS-PAIL: prediction of lysine acetyltransferase-specific modification sites from protein sequences.
Deng, Wankun; Wang, Chenwei; Zhang, Ying; Xu, Yang; Zhang, Shuang; Liu, Zexian; Xue, Yu
2016-12-22
Protein acetylation catalyzed by specific histone acetyltransferases (HATs) is an essential post-translational modification (PTM) and involved in the regulation a broad spectrum of biological processes in eukaryotes. Although several ten thousands of acetylation sites have been experimentally identified, the upstream HATs for most of the sites are unclear. Thus, the identification of HAT-specific acetylation sites is fundamental for understanding the regulatory mechanisms of protein acetylation. In this work, we first collected 702 known HAT-specific acetylation sites of 205 proteins from the literature and public data resources, and a motif-based analysis demonstrated that different types of HATs exhibit similar but considerably distinct sequence preferences for substrate recognition. Using 544 human HAT-specific sites for training, we constructed a highly useful tool of GPS-PAIL for the prediction of HAT-specific sites for up to seven HATs, including CREBBP, EP300, HAT1, KAT2A, KAT2B, KAT5 and KAT8. The prediction accuracy of GPS-PAIL was critically evaluated, with a satisfying performance. Using GPS-PAIL, we also performed a large-scale prediction of potential HATs for known acetylation sites identified from high-throughput experiments in nine eukaryotes. Both online service and local packages were implemented, and GPS-PAIL is freely available at: http://pail.biocuckoo.org.
GPS-PAIL: prediction of lysine acetyltransferase-specific modification sites from protein sequences
Deng, Wankun; Wang, Chenwei; Zhang, Ying; Xu, Yang; Zhang, Shuang; Liu, Zexian; Xue, Yu
2016-01-01
Protein acetylation catalyzed by specific histone acetyltransferases (HATs) is an essential post-translational modification (PTM) and involved in the regulation a broad spectrum of biological processes in eukaryotes. Although several ten thousands of acetylation sites have been experimentally identified, the upstream HATs for most of the sites are unclear. Thus, the identification of HAT-specific acetylation sites is fundamental for understanding the regulatory mechanisms of protein acetylation. In this work, we first collected 702 known HAT-specific acetylation sites of 205 proteins from the literature and public data resources, and a motif-based analysis demonstrated that different types of HATs exhibit similar but considerably distinct sequence preferences for substrate recognition. Using 544 human HAT-specific sites for training, we constructed a highly useful tool of GPS-PAIL for the prediction of HAT-specific sites for up to seven HATs, including CREBBP, EP300, HAT1, KAT2A, KAT2B, KAT5 and KAT8. The prediction accuracy of GPS-PAIL was critically evaluated, with a satisfying performance. Using GPS-PAIL, we also performed a large-scale prediction of potential HATs for known acetylation sites identified from high-throughput experiments in nine eukaryotes. Both online service and local packages were implemented, and GPS-PAIL is freely available at: http://pail.biocuckoo.org. PMID:28004786
Novel numerical and graphical representation of DNA sequences and proteins.
Randić, M; Novic, M; Vikić-Topić, D; Plavsić, D
2006-12-01
We have introduced novel numerical and graphical representations of DNA, which offer a simple and unique characterization of DNA sequences. The numerical representation of a DNA sequence is given as a sequence of real numbers derived from a unique graphical representation of the standard genetic code. There is no loss of information on the primary structure of a DNA sequence associated with this numerical representation. The novel representations are illustrated with the coding sequences of the first exon of beta-globin gene of half a dozen species in addition to human. The method can be extended to proteins as is exemplified by humanin, a 24-aa peptide that has recently been identified as a specific inhibitor of neuronal cell death induced by familial Alzheimer's disease mutant genes.
1978-01-01
Three immunologically cross-reactive and non-cross-reactive streptococcal M proteins were analyzed by a chromatographic tryptic peptide mapping system. The results indicate that cross-reactions correlate with the extent of structural similarity among the M protein molecules analyzed. The data also reveal that free lysine is released by the action of trypsin from these three M proteins, suggesting a common lys-lys or arg-lys sequence. In addition, only one peptide has been found to be common within all three M types. This limited structural relatedness among the three M proteins examined indicates that sequence variation plays a major role in the immunological specificity of the M antigens. However, despite sequence variation, all M protein molecules have a common antiphagocytic activity. The fact that no common opsonic antibody has yet been found, even against limited M types, argues against this biological activity being solely the result of a common sequence. Based on these data, it is suggested that the antiphagocytic effect of M protein may be due to a conformationally created environment on the surface of the molecule which is selected by both immunological and biological pressure. PMID:355596
Capturing novel mouse genes encoding chromosomal and other nuclear proteins.
Tate, P; Lee, M; Tweedie, S; Skarnes, W C; Bickmore, W A
1998-09-01
The burgeoning wealth of gene sequences contrasts with our ignorance of gene function. One route to assigning function is by determining the sub-cellular location of proteins. We describe the identification of mouse genes encoding proteins that are confined to nuclear compartments by splicing endogeneous gene sequences to a promoterless betageo reporter, using a gene trap approach. Mouse ES (embryonic stem) cell lines were identified that express betageo fusions located within sub-nuclear compartments, including chromosomes, the nucleolus and foci containing splicing factors. The sequences of 11 trapped genes were ascertained, and characterisation of endogenous protein distribution in two cases confirmed the validity of the approach. Three novel proteins concentrated within distinct chromosomal domains were identified, one of which appears to be a serine/threonine kinase. The sequence of a gene whose product co-localises with splicesome components suggests that this protein may be an E3 ubiquitin-protein ligase. The majority of the other genes isolated represent novel genes. This approach is shown to be a powerful tool for identifying genes encoding novel proteins with specific sub-nuclear localisations and exposes our ignorance of the protein composition of the nucleus. Motifs in two of the isolated genes suggest new links between cellular regulatory mechanisms (ubiquitination and phosphorylation) and mRNA splicing and chromosome structure/function.
Regulation of the Human Endogenous Retrovirus K (HML-2) Transcriptome by the HIV-1 Tat Protein
Gonzalez-Hernandez, Marta J.; Cavalcoli, James D.; Sartor, Maureen A.; Contreras-Galindo, Rafael; Meng, Fan; Dai, Manhong; Dube, Derek; Saha, Anjan K.; Gitlin, Scott D.; Omenn, Gilbert S.; Kaplan, Mark H.
2014-01-01
ABSTRACT Approximately 8% of the human genome is made up of endogenous retroviral sequences. As the HIV-1 Tat protein activates the overall expression of the human endogenous retrovirus type K (HERV-K) (HML-2), we used next-generation sequencing to determine which of the 91 currently annotated HERV-K (HML-2) proviruses are regulated by Tat. Transcriptome sequencing of total RNA isolated from Tat- and vehicle-treated peripheral blood lymphocytes from a healthy donor showed that Tat significantly activates expression of 26 unique HERV-K (HML-2) proviruses, silences 12, and does not significantly alter the expression of the remaining proviruses. Quantitative reverse transcription-PCR validation of the sequencing data was performed on Tat-treated PBLs of seven donors using provirus-specific primers and corroborated the results with a substantial degree of quantitative similarity. IMPORTANCE The expression of HERV-K (HML-2) is tightly regulated but becomes markedly increased following infection with HIV-1, in part due to the HIV-1 Tat protein. The findings reported here demonstrate the complexity of the genome-wide regulation of HERV-K (HML-2) expression by Tat. This work also demonstrates that although HERV-K (HML-2) proviruses in the human genome are highly similar in terms of DNA sequence, modulation of the expression of specific proviruses in a given biological situation can be ascertained using next-generation sequencing and bioinformatics analysis. PMID:24872592
Campion, S R; Ameen, A S; Lai, L; King, J M; Munzenmaier, T N
2001-08-15
This report describes the application of a simple computational tool, AAPAIR.TAB, for the systematic analysis of the cysteine-rich EGF, Sushi, and Laminin motif/sequence families at the two-amino acid level. Automated dipeptide frequency/bias analysis detects preferences in the distribution of amino acids in established protein families, by determining which "ordered dipeptides" occur most frequently in comprehensive motif-specific sequence data sets. Graphic display of the dipeptide frequency/bias data revealed family-specific preferences for certain dipeptides, but more importantly detected a shared preference for employment of the ordered dipeptides Gly-Tyr (GY) and Gly-Phe (GF) in all three protein families. The dipeptide Asn-Gly (NG) also exhibited high-frequency and bias in the EGF and Sushi motif families, whereas Asn-Thr (NT) was distinguished in the Laminin family. Evaluation of the distribution of dipeptides identified by frequency/bias analysis subsequently revealed the highly restricted localization of the G(F/Y) and N(G/T) sequence elements at two separate sites of extreme conservation in the consensus sequence of all three sequence families. The similar employment of the high-frequency/bias dipeptides in three distinct protein sequence families was further correlated with the concurrence of these shared molecular determinants at similar positions within the distinctive scaffolds of three structurally divergent, but similarly employed, motif modules.
Chikenji, George; Fujitsuka, Yoshimi; Takada, Shoji
2006-02-28
Predicting protein tertiary structure by folding-like simulations is one of the most stringent tests of how much we understand the principle of protein folding. Currently, the most successful method for folding-based structure prediction is the fragment assembly (FA) method. Here, we address why the FA method is so successful and its lesson for the folding problem. To do so, using the FA method, we designed a structure prediction test of "chimera proteins." In the chimera proteins, local structural preference is specific to the target sequences, whereas nonlocal interactions are only sequence-independent compaction forces. We find that these chimera proteins can find the native folds of the intact sequences with high probability indicating dominant roles of the local interactions. We further explore roles of local structural preference by exact calculation of the HP lattice model of proteins. From these results, we suggest principles of protein folding: For small proteins, compact structures that are fully compatible with local structural preference are few, one of which is the native fold. These local biases shape up the funnel-like energy landscape.
Shaping up the protein folding funnel by local interaction: Lesson from a structure prediction study
Chikenji, George; Fujitsuka, Yoshimi; Takada, Shoji
2006-01-01
Predicting protein tertiary structure by folding-like simulations is one of the most stringent tests of how much we understand the principle of protein folding. Currently, the most successful method for folding-based structure prediction is the fragment assembly (FA) method. Here, we address why the FA method is so successful and its lesson for the folding problem. To do so, using the FA method, we designed a structure prediction test of “chimera proteins.” In the chimera proteins, local structural preference is specific to the target sequences, whereas nonlocal interactions are only sequence-independent compaction forces. We find that these chimera proteins can find the native folds of the intact sequences with high probability indicating dominant roles of the local interactions. We further explore roles of local structural preference by exact calculation of the HP lattice model of proteins. From these results, we suggest principles of protein folding: For small proteins, compact structures that are fully compatible with local structural preference are few, one of which is the native fold. These local biases shape up the funnel-like energy landscape. PMID:16488978
Zhang, Yun; Liu, Fang; Nie, Jinfang; Jiang, Fuyang; Zhou, Caibin; Yang, Jiani; Fan, Jinlong; Li, Jianping
2014-05-07
In this paper, we report for the first time an electrochemical biosensor for single-step, reagentless, and picomolar detection of a sequence-specific DNA-binding protein using a double-stranded, electrode-bound DNA probe terminally modified with a redox active label close to the electrode surface. This new methodology is based upon local repression of electrolyte diffusion associated with protein-DNA binding that leads to reduction of the electrochemical response of the label. In the proof-of-concept study, the resulting electrochemical biosensor was quantitatively sensitive to the concentrations of the TATA binding protein (TBP, a model analyte) ranging from 40 pM to 25.4 nM with an estimated detection limit of ∼10.6 pM (∼80 to 400-fold improvement on the detection limit over previous electrochemical analytical systems).
TANDEM: matching proteins with tandem mass spectra.
Craig, Robertson; Beavis, Ronald C
2004-06-12
Tandem mass spectra obtained from fragmenting peptide ions contain some peptide sequence specific information, but often there is not enough information to sequence the original peptide completely. Several proprietary software applications have been developed to attempt to match the spectra with a list of protein sequences that may contain the sequence of the peptide. The application TANDEM was written to provide the proteomics research community with a set of components that can be used to test new methods and algorithms for performing this type of sequence-to-data matching. The source code and binaries for this software are available at http://www.proteome.ca/opensource.html, for Windows, Linux and Macintosh OSX. The source code is made available under the Artistic License, from the authors.
Dang, Xibei; Young, Nicolas L
2014-05-01
Ultraviolet photodissociation (UVPD) is a compelling fragmentation technique with great potential to enhance proteomics generally and top-down MS specifically. In this issue, Cannon et al. (Proteomics 2014, 14, XXXX-XXXX) use UVPD to perform top-down MS on several sequence variants of green fluorescent protein and compare the results to CID, higher energy collision induced dissociation, and electron transfer dissociation. As compared to the other techniques UVPD produces a wider variety of fragment ion types that are relatively evenly distributed across the protein sequences. Overall, their results demonstrate enhanced sequence coverage and higher confidence in sequence assignment via UVPD MS. Based on these and other recent results UVPD is certain to become an increasingly widespread and valuable tool for top-down proteomics. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Parish, D.; Benach, J; Liu, G
2008-01-01
The structure of the 142-residue protein Q8ZP25 SALTY encoded in the genome of Salmonella typhimurium LT2 was determined independently by NMR and X-ray crystallography, and the structure of the 140-residue protein HYAE ECOLI encoded in the genome of Escherichia coli was determined by NMR. The two proteins belong to Pfam (Finn et al. 34:D247-D251, 2006) PF07449, which currently comprises 50 members, and belongs itself to the 'thioredoxin-like clan'. However, protein HYAE ECOLI and the other proteins of Pfam PF07449 do not contain the canonical Cys-X-X-Cys active site sequence motif of thioredoxin. Protein HYAE ECOLI was previously classified as a (NiFe)more » hydrogenase-1 specific chaperone interacting with the twin-arginine translocation (Tat) signal peptide. The structures presented here exhibit the expected thioredoxin-like fold and support the view that members of Pfam family PF07449 specifically interact with Tat signal peptides.« less
Biologically active LIL proteins built with minimal chemical diversity
Heim, Erin N.; Marston, Jez L.; Federman, Ross S.; Edwards, Anne P. B.; Karabadzhak, Alexander G.; Petti, Lisa M.; Engelman, Donald M.; DiMaio, Daniel
2015-01-01
We have constructed 26-amino acid transmembrane proteins that specifically transform cells but consist of only two different amino acids. Most proteins are long polymers of amino acids with 20 or more chemically distinct side-chains. The artificial transmembrane proteins reported here are the simplest known proteins with specific biological activity, consisting solely of an initiating methionine followed by specific sequences of leucines and isoleucines, two hydrophobic amino acids that differ only by the position of a methyl group. We designate these proteins containing leucine (L) and isoleucine (I) as LIL proteins. These proteins functionally interact with the transmembrane domain of the platelet-derived growth factor β-receptor and specifically activate the receptor to transform cells. Complete mutagenesis of these proteins identified individual amino acids required for activity, and a protein consisting solely of leucines, except for a single isoleucine at a particular position, transformed cells. These surprisingly simple proteins define the minimal chemical diversity sufficient to construct proteins with specific biological activity and change our view of what can constitute an active protein in a cellular context. PMID:26261320
Predictive Bcl-2 Family Binding Models Rooted in Experiment or Structure
DeBartolo, Joe; Dutta, Sanjib; Reich, Lothar; Keating, Amy E.
2013-01-01
Proteins of the Bcl-2 family either enhance or suppress programmed cell death and are centrally involved in cancer development and resistance to chemotherapy. BH3 (Bcl-2 homology 3)-only Bcl-2 proteins promote cell death by docking an α-helix into a hydrophobic groove on the surface of one or more of five pro-survival Bcl-2 receptor proteins. There is high structural homology within the pro-death and pro-survival families, yet a high degree of interaction specificity is nevertheless encoded, posing an interesting and important molecular recognition problem. Understanding protein features that dictate Bcl-2 interaction specificity is critical for designing peptide-based cancer therapeutics and diagnostics. In this study, we present peptide SPOT arrays and deep sequencing data from yeast display screening experiments that significantly expand the BH3 sequence space that has been experimentally tested for interaction with five human anti-apoptotic receptors. These data provide rich information about the determinants of Bcl-2 family specificity. To interpret and use the information, we constructed two simple data-based models that can predict affinity and specificity when evaluated on independent data sets within a limited sequence space. We also constructed a novel structure-based statistical potential, called STATIUM, which is remarkably good at predicting Bcl-2 affinity and specificity, especially considering it is not trained on experimental data. We compare the performance of our three models to each other and to alternative structure-based methods and discuss how such tools can guide prediction and design of new Bcl-2 family complexes. PMID:22617328
Identification of tissue-specific targeting peptide
NASA Astrophysics Data System (ADS)
Jung, Eunkyoung; Lee, Nam Kyung; Kang, Sang-Kee; Choi, Seung-Hoon; Kim, Daejin; Park, Kisoo; Choi, Kihang; Choi, Yun-Jaie; Jung, Dong Hyun
2012-11-01
Using phage display technique, we identified tissue-targeting peptide sets that recognize specific tissues (bone-marrow dendritic cell, kidney, liver, lung, spleen and visceral adipose tissue). In order to rapidly evaluate tissue-specific targeting peptides, we performed machine learning studies for predicting the tissue-specific targeting activity of peptides on the basis of peptide sequence information using four machine learning models and isolated the groups of peptides capable of mediating selective targeting to specific tissues. As a representative liver-specific targeting sequence, the peptide "DKNLQLH" was selected by the sequence similarity analysis. This peptide has a high degree of homology with protein ligands which can interact with corresponding membrane counterparts. We anticipate that our models will be applicable to the prediction of tissue-specific targeting peptides which can recognize the endothelial markers of target tissues.
Richard, François D; Kajava, Andrey V
2014-06-01
The dramatic growth of sequencing data evokes an urgent need to improve bioinformatics tools for large-scale proteome analysis. Over the last two decades, the foremost efforts of computer scientists were devoted to proteins with aperiodic sequences having globular 3D structures. However, a large portion of proteins contain periodic sequences representing arrays of repeats that are directly adjacent to each other (so called tandem repeats or TRs). These proteins frequently fold into elongated fibrous structures carrying different fundamental functions. Algorithms specific to the analysis of these regions are urgently required since the conventional approaches developed for globular domains have had limited success when applied to the TR regions. The protein TRs are frequently not perfect, containing a number of mutations, and some of them cannot be easily identified. To detect such "hidden" repeats several algorithms have been developed. However, the most sensitive among them are time-consuming and, therefore, inappropriate for large scale proteome analysis. To speed up the TR detection we developed a rapid filter that is based on the comparison of composition and order of short strings in the adjacent sequence motifs. Tests show that our filter discards up to 22.5% of proteins which are known to be without TRs while keeping almost all (99.2%) TR-containing sequences. Thus, we are able to decrease the size of the initial sequence dataset enriching it with TR-containing proteins which allows a faster subsequent TR detection by other methods. The program is available upon request. Copyright © 2014 Elsevier Inc. All rights reserved.
2018-01-01
New, as yet undiscovered aptamers for Protein A were identified by applying next generation sequencing (NGS) to a previously selected aptamer pool. This pool was obtained in a classical SELEX (Systematic Evolution of Ligands by EXponential enrichment) experiment using the FluMag-SELEX procedure followed by cloning and Sanger sequencing. PA#2/8 was identified as the only Protein A-binding aptamer from the Sanger sequence pool, and was shown to be able to bind intact cells of Staphylococcus aureus. In this study, we show the extension of the SELEX results by re-sequencing of the same aptamer pool using a medium throughput NGS approach and data analysis. Both data pools were compared. They confirm the selection of a highly complex and heterogeneous oligonucleotide pool and show consistently a high content of orphans as well as a similar relative frequency of certain sequence groups. But in contrast to the Sanger data pool, the NGS pool was clearly dominated by one sequence group containing the known Protein A-binding aptamer PA#2/8 as the most frequent sequence in this group. In addition, we found two new sequence groups in the NGS pool represented by PA-C10 and PA-C8, respectively, which also have high specificity for Protein A. Comparative affinity studies reveal differences between the aptamers and confirm that PA#2/8 remains the most potent sequence within the selected aptamer pool reaching affinities in the low nanomolar range of KD = 20 ± 1 nM. PMID:29495282
Stoltenburg, Regina; Strehlitz, Beate
2018-02-24
New, as yet undiscovered aptamers for Protein A were identified by applying next generation sequencing (NGS) to a previously selected aptamer pool. This pool was obtained in a classical SELEX (Systematic Evolution of Ligands by EXponential enrichment) experiment using the FluMag-SELEX procedure followed by cloning and Sanger sequencing. PA#2/8 was identified as the only Protein A-binding aptamer from the Sanger sequence pool, and was shown to be able to bind intact cells of Staphylococcus aureus . In this study, we show the extension of the SELEX results by re-sequencing of the same aptamer pool using a medium throughput NGS approach and data analysis. Both data pools were compared. They confirm the selection of a highly complex and heterogeneous oligonucleotide pool and show consistently a high content of orphans as well as a similar relative frequency of certain sequence groups. But in contrast to the Sanger data pool, the NGS pool was clearly dominated by one sequence group containing the known Protein A-binding aptamer PA#2/8 as the most frequent sequence in this group. In addition, we found two new sequence groups in the NGS pool represented by PA-C10 and PA-C8, respectively, which also have high specificity for Protein A. Comparative affinity studies reveal differences between the aptamers and confirm that PA#2/8 remains the most potent sequence within the selected aptamer pool reaching affinities in the low nanomolar range of K D = 20 ± 1 nM.
Informational structure of genetic sequences and nature of gene splicing
NASA Astrophysics Data System (ADS)
Trifonov, E. N.
1991-10-01
Only about 1/20 of DNA of higher organisms codes for proteins, by means of classical triplet code. The rest of DNA sequences is largely silent, with unclear functions, if any. The triplet code is not the only code (message) carried by the sequences. There are three levels of molecular communication, where the same sequence ``talks'' to various bimolecules, while having, respectively, three different appearances: DNA, RNA and protein. Since the molecular structures and, hence, sequence specific preferences of these are substantially different, the original DNA sequence has to carry simultaneously three types of sequence patterns (codes, messages), thus, being a composite structure in which one had the same letter (nucleotide) is frequently involved in several overlapping codes of different nature. This multiplicity and overlapping of the codes is a unique feature of the Gnomic, language of genetic sequences. The coexisting codes have to be degenerate in various degrees to allow an optimal and concerted performance of all the encoded functions. There is an obvious conflict between the best possible performance of a given function and necessity to compromise the quality of a given sequence pattern in favor of other patterns. It appears that the major role of various changes in the sequences on their ``ontogenetic'' way from DNA to RNA to protein, like RNA editing and splicing, or protein post-translational modifications is to resolve such conflicts. New data are presented strongly indicating that the gene splicing is such a device to resolve the conflict between the code of DNA folding in chromatin and the triplet code for protein synthesis.
The pig X and Y Chromosomes: structure, sequence, and evolution
Skinner, Benjamin M.; Sargent, Carole A.; Churcher, Carol; Hunt, Toby; Herrero, Javier; Loveland, Jane E.; Dunn, Matt; Louzada, Sandra; Fu, Beiyuan; Chow, William; Gilbert, James; Austin-Guest, Siobhan; Beal, Kathryn; Carvalho-Silva, Denise; Cheng, William; Gordon, Daria; Grafham, Darren; Hardy, Matt; Harley, Jo; Hauser, Heidi; Howden, Philip; Howe, Kerstin; Lachani, Kim; Ellis, Peter J.I.; Kelly, Daniel; Kerry, Giselle; Kerwin, James; Ng, Bee Ling; Threadgold, Glen; Wileman, Thomas; Wood, Jonathan M.D.; Yang, Fengtang; Harrow, Jen; Affara, Nabeel A.; Tyler-Smith, Chris
2016-01-01
We have generated an improved assembly and gene annotation of the pig X Chromosome, and a first draft assembly of the pig Y Chromosome, by sequencing BAC and fosmid clones from Duroc animals and incorporating information from optical mapping and fiber-FISH. The X Chromosome carries 1033 annotated genes, 690 of which are protein coding. Gene order closely matches that found in primates (including humans) and carnivores (including cats and dogs), which is inferred to be ancestral. Nevertheless, several protein-coding genes present on the human X Chromosome were absent from the pig, and 38 pig-specific X-chromosomal genes were annotated, 22 of which were olfactory receptors. The pig Y-specific Chromosome sequence generated here comprises 30 megabases (Mb). A 15-Mb subset of this sequence was assembled, revealing two clusters of male-specific low copy number genes, separated by an ampliconic region including the HSFY gene family, which together make up most of the short arm. Both clusters contain palindromes with high sequence identity, presumably maintained by gene conversion. Many of the ancestral X-related genes previously reported in at least one mammalian Y Chromosome are represented either as active genes or partial sequences. This sequencing project has allowed us to identify genes—both single copy and amplified—on the pig Y Chromosome, to compare the pig X and Y Chromosomes for homologous sequences, and thereby to reveal mechanisms underlying pig X and Y Chromosome evolution. PMID:26560630
Environmental Mycobiome Modifiers of Inflammation and Fibrosis in Systemic Sclerosis
2016-09-01
TUBB), and ribosomal proteins), while others are considered specific to SSc despite trace level detection in controls. For ex- ample, multiple SSc...Strong re- activity was seen against all five proteins in SSc with only trace levels detected in controls (Fig. 3a), indicating widespread immune...sequences in SSc RNA-seq data was used to detect microbial sequences in human tissues in an unbiased, quantitative manner. Our studies suggest that
DOE Office of Scientific and Technical Information (OSTI.GOV)
Karpinets, Tatiana V; Park, Byung; Syed, Mustafa H
2010-01-01
The Carbohydrate-Active Enzyme (CAZy) database provides a rich set of manually annotated enzymes that degrade, modify, or create glycosidic bonds. Despite rich and invaluable information stored in the database, software tools utilizing this information for annotation of newly sequenced genomes by CAZy families are limited. We have employed two annotation approaches to fill the gap between manually curated high-quality protein sequences collected in the CAZy database and the growing number of other protein sequences produced by genome or metagenome sequencing projects. The first approach is based on a similarity search against the entire non-redundant sequences of the CAZy database. Themore » second approach performs annotation using links or correspondences between the CAZy families and protein family domains. The links were discovered using the association rule learning algorithm applied to sequences from the CAZy database. The approaches complement each other and in combination achieved high specificity and sensitivity when cross-evaluated with the manually curated genomes of Clostridium thermocellum ATCC 27405 and Saccharophagus degradans 2-40. The capability of the proposed framework to predict the function of unknown protein domains (DUF) and of hypothetical proteins in the genome of Neurospora crassa is demonstrated. The framework is implemented as a Web service, the CAZymes Analysis Toolkit (CAT), and is available at http://cricket.ornl.gov/cgi-bin/cat.cgi.« less
Park, Byung H; Karpinets, Tatiana V; Syed, Mustafa H; Leuze, Michael R; Uberbacher, Edward C
2010-12-01
The Carbohydrate-Active Enzyme (CAZy) database provides a rich set of manually annotated enzymes that degrade, modify, or create glycosidic bonds. Despite rich and invaluable information stored in the database, software tools utilizing this information for annotation of newly sequenced genomes by CAZy families are limited. We have employed two annotation approaches to fill the gap between manually curated high-quality protein sequences collected in the CAZy database and the growing number of other protein sequences produced by genome or metagenome sequencing projects. The first approach is based on a similarity search against the entire nonredundant sequences of the CAZy database. The second approach performs annotation using links or correspondences between the CAZy families and protein family domains. The links were discovered using the association rule learning algorithm applied to sequences from the CAZy database. The approaches complement each other and in combination achieved high specificity and sensitivity when cross-evaluated with the manually curated genomes of Clostridium thermocellum ATCC 27405 and Saccharophagus degradans 2-40. The capability of the proposed framework to predict the function of unknown protein domains and of hypothetical proteins in the genome of Neurospora crassa is demonstrated. The framework is implemented as a Web service, the CAZymes Analysis Toolkit, and is available at http://cricket.ornl.gov/cgi-bin/cat.cgi.
Labudde, Dirk
2015-01-01
The importance of short membrane sequence motifs has been shown in many works and emphasizes the related sequence motif analysis. Together with specific transmembrane helix-helix interactions, the analysis of interacting sequence parts is helpful for understanding the process during membrane protein folding and in retaining the three-dimensional fold. Here we present a simple high-throughput analysis method for deriving mutational information of interacting sequence parts. Applied on aquaporin water channel proteins, our approach supports the analysis of mutational variants within different interacting subsequences and finally the investigation of natural variants which cause diseases like, for example, nephrogenic diabetes insipidus. In this work we demonstrate a simple method for massive membrane protein data analysis. As shown, the presented in silico analyses provide information about interacting sequence parts which are constrained by protein evolution. We present a simple graphical visualization medium for the representation of evolutionary influenced interaction pattern pairs (EIPPs) adapted to mutagen investigations of aquaporin-2, a protein whose mutants are involved in the rare endocrine disorder known as nephrogenic diabetes insipidus, and membrane proteins in general. Furthermore, we present a new method to derive new evolutionary variations within EIPPs which can be used for further mutagen laboratory investigations. PMID:26180540
Grunert, Steffen; Labudde, Dirk
2015-01-01
The importance of short membrane sequence motifs has been shown in many works and emphasizes the related sequence motif analysis. Together with specific transmembrane helix-helix interactions, the analysis of interacting sequence parts is helpful for understanding the process during membrane protein folding and in retaining the three-dimensional fold. Here we present a simple high-throughput analysis method for deriving mutational information of interacting sequence parts. Applied on aquaporin water channel proteins, our approach supports the analysis of mutational variants within different interacting subsequences and finally the investigation of natural variants which cause diseases like, for example, nephrogenic diabetes insipidus. In this work we demonstrate a simple method for massive membrane protein data analysis. As shown, the presented in silico analyses provide information about interacting sequence parts which are constrained by protein evolution. We present a simple graphical visualization medium for the representation of evolutionary influenced interaction pattern pairs (EIPPs) adapted to mutagen investigations of aquaporin-2, a protein whose mutants are involved in the rare endocrine disorder known as nephrogenic diabetes insipidus, and membrane proteins in general. Furthermore, we present a new method to derive new evolutionary variations within EIPPs which can be used for further mutagen laboratory investigations.
Peyretaillade, E; Broussolle, V; Peyret, P; Méténier, G; Gouy, M; Vivarès, C P
1998-06-01
An intronless gene encoding a protein of 592 amino acid residues with similarity to 70-kDa heat shock proteins (HSP70s) has been cloned and sequenced from the amitochondrial protist Encephalitozoon cuniculi (phylum Microsporidia). Southern blot analyses show the presence of a single gene copy located on chromosome XI. The encoded protein exhibits an N-terminal hydrophobic leader sequence and two motifs shared by proteobacterial and mitochondrially expressed HSP70 homologs. Phylogenetic analysis using maximum likelihood and evolutionary distances place the E. cuniculi sequence in the cluster of mitochondrially expressed HSP70s, with a higher evolutionary rate than those of homologous sequences. Similar results were obtained after cloning a fragment of the homologous gene in the closely related species E. hellem. The presence of a nuclear targeting signal-like sequence supports a role of the Encephalitozoon HSP70 as a molecular chaperone of nuclear proteins. No evidence for cytosolic or endoplasmic reticulum forms of HSP70 was obtained through PCR amplification. These data suggest that Encephalitozoon species have evolved from an ancestor bearing mitochondria, which is in disagreement with the postulated presymbiotic origin of Microsporidia. The specific role and intracellular localization of the mitochondrial HSP70-like protein remain to be elucidated.
ssrA (tmRNA) Plays a Role in Salmonella enterica Serovar Typhimurium Pathogenesis
Julio, Steven M.; Heithoff, Douglas M.; Mahan, Michael J.
2000-01-01
Escherichia coli ssrA encodes a small stable RNA molecule, tmRNA, that has many diverse functions, including tagging abnormal proteins for degradation, supporting phage growth, and modulating the activity of DNA binding proteins. Here we show that ssrA plays a role in Salmonella enterica serovar Typhimurium pathogenesis and in the expression of several genes known to be induced during infection. Moreover, the phage-like attachment site, attL, encoded within ssrA, serves as the site of integration of a region of Salmonella-specific sequence; adjacent to the 5′ end of ssrA is another region of Salmonella-specific sequence with extensive homology to predicted proteins encoded within the unlinked Salmonella pathogenicity island SPI4. S. enterica serovar Typhimurium ssrA mutants fail to support the growth of phage P22 and are delayed in their ability to form viable phage particles following induction of a phage P22 lysogen. These data indicate that ssrA plays a role in the pathogenesis of Salmonella, serves as an attachment site for Salmonella-specific sequences, and is required for the growth of phage P22. PMID:10692360
Learning cellular sorting pathways using protein interactions and sequence motifs.
Lin, Tien-Ho; Bar-Joseph, Ziv; Murphy, Robert F
2011-11-01
Proper subcellular localization is critical for proteins to perform their roles in cellular functions. Proteins are transported by different cellular sorting pathways, some of which take a protein through several intermediate locations until reaching its final destination. The pathway a protein is transported through is determined by carrier proteins that bind to specific sequence motifs. In this article, we present a new method that integrates protein interaction and sequence motif data to model how proteins are sorted through these sorting pathways. We use a hidden Markov model (HMM) to represent protein sorting pathways. The model is able to determine intermediate sorting states and to assign carrier proteins and motifs to the sorting pathways. In simulation studies, we show that the method can accurately recover an underlying sorting model. Using data for yeast, we show that our model leads to accurate prediction of subcellular localization. We also show that the pathways learned by our model recover many known sorting pathways and correctly assign proteins to the path they utilize. The learned model identified new pathways and their putative carriers and motifs and these may represent novel protein sorting mechanisms. Supplementary results and software implementation are available from http://murphylab.web.cmu.edu/software/2010_RECOMB_pathways/.
Gileva, Irina P; Nepomnyashchikh, Tatiana S; Antonets, Denis V; Lebedev, Leonid R; Kochneva, Galina V; Grazhdantseva, Antonina V; Shchelkunov, Sergei N
2006-11-01
Tumor necrosis factor (TNF), a potent proinflammatory and antiviral cytokine, is a critical extracellular immune regulator targeted by poxviruses through the activity of virus-encoded family of TNF-binding proteins (CrmB, CrmC, CrmD, and CrmE). The only TNF-binding protein from variola virus (VARV), the causative agent of smallpox, infecting exclusively humans, is CrmB. Here we have aligned the amino acid sequences of CrmB proteins from 10 VARV, 14 cowpox virus (CPXV), and 22 monkeypox virus (MPXV) strains. Sequence analyses demonstrated a high homology of these proteins. The regions homologous to cd00185 domain of the TNF receptor family, determining the specificity of ligand-receptor binding, were found in the sequences of CrmB proteins. In addition, a comparative analysis of the C-terminal SECRET domain sequences of CrmB proteins was performed. The differences in the amino acid sequences of these domains characteristic of each particular orthopoxvirus species were detected. It was assumed that the species-specific distinctions between the CrmB proteins might underlie the differences in these physicochemical and biological properties. The individual recombinant proteins VARV-CrmB, MPXV-CrmB, and CPXV-CrmB were synthesized in a baculovirus expression system in insect cells and isolated. Purified VARV-CrmB was detectable as a dimer with a molecular weight of 90 kDa, while MPXV- and CPXV-CrmBs, as monomers when fractioned by non-reducing SDS-PAGE. The CrmB proteins of VARV, MPXV, and CPXV differed in the efficiencies of inhibition of the cytotoxic effects of human, mouse, or rabbit TNFs in L929 mouse fibroblast cell line. Testing of CrmBs in the experimental model of LPS-induced shock using SPF BALB/c mice detected a pronounced protective effect of VARV-CrmB. Thus, our data demonstrated the difference in anti-TNF activities of VARV-, MPXV-, and CPXV-CrmBs and efficiency of VARV-CrmB rather than CPXV- or MPXV-CrmBs against LPS-induced mortality in mice.
Huntley, Stuart; Baggott, Daniel M.; Hamilton, Aaron T.; Tran-Gyamfi, Mary; Yang, Shan; Kim, Joomyeong; Gordon, Laurie; Branscomb, Elbert; Stubbs, Lisa
2006-01-01
Krüppel-type zinc finger (ZNF) motifs are prevalent components of transcription factor proteins in all eukaryotes. KRAB-ZNF proteins, in which a potent repressor domain is attached to a tandem array of DNA-binding zinc-finger motifs, are specific to tetrapod vertebrates and represent the largest class of ZNF proteins in mammals. To define the full repertoire of human KRAB-ZNF proteins, we searched the genome sequence for key motifs and then constructed and manually curated gene models incorporating those sequences. The resulting gene catalog contains 423 KRAB-ZNF protein-coding loci, yielding alternative transcripts that altogether predict at least 742 structurally distinct proteins. Active rounds of segmental duplication, involving single genes or larger regions and including both tandem and distributed duplication events, have driven the expansion of this mammalian gene family. Comparisons between the human genes and ZNF loci mined from the draft mouse, dog, and chimpanzee genomes not only identified 103 KRAB-ZNF genes that are conserved in mammals but also highlighted a substantial level of lineage-specific change; at least 136 KRAB-ZNF coding genes are primate specific, including many recent duplicates. KRAB-ZNF genes are widely expressed and clustered genes are typically not coregulated, indicating that paralogs have evolved to fill roles in many different biological processes. To facilitate further study, we have developed a Web-based public resource with access to gene models, sequences, and other data, including visualization tools to provide genomic context and interaction with other public data sets. PMID:16606702
An Efficient Site-Specific Method for Irreversible Covalent Labeling of Proteins with a Fluorophore.
Liu, Jiaquan; Hanne, Jeungphill; Britton, Brooke M; Shoffner, Matthew; Albers, Aaron E; Bennett, Jared; Zatezalo, Rachel; Barfield, Robyn; Rabuka, David; Lee, Jong-Bong; Fishel, Richard
2015-11-19
Fluorophore labeling of proteins while preserving native functions is essential for bulk Förster resonance energy transfer (FRET) interaction and single molecule imaging analysis. Here we describe a versatile, efficient, specific, irreversible, gentle and low-cost method for labeling proteins with fluorophores that appears substantially more robust than a similar but chemically distinct procedure. The method employs the controlled enzymatic conversion of a central Cys to a reactive formylglycine (fGly) aldehyde within a six amino acid Formylglycine Generating Enzyme (FGE) recognition sequence in vitro. The fluorophore is then irreversibly linked to the fGly residue using a Hydrazinyl-Iso-Pictet-Spengler (HIPS) ligation reaction. We demonstrate the robust large-scale fluorophore labeling and purification of E.coli (Ec) mismatch repair (MMR) components. Fluorophore labeling did not alter the native functions of these MMR proteins in vitro or in singulo. Because the FGE recognition sequence is easily portable, FGE-HIPS fluorophore-labeling may be easily extended to other proteins.
The NPC2 protein: A novel dog allergen.
Khurana, Taruna; Newman-Lindsay, Shoshana; Young, Philip R; Slater, Jay E
2016-05-01
Dogs are an important source of indoor allergens that cause rhinoconjunctivitis, urticaria, and asthma in sensitized individuals. Can f 1 is reported as a major dog allergen, but other allergens have also been identified. Identification of immunologically important allergens is important for both the diagnosis and treatment of dog allergy. To identify and characterize the canine NPC2 protein, a novel dog allergen. We screened commercial and laboratory-generated aqueous dog extracts by 2-dimensional polyacrylamide gel electrophoresis with IgE immunoblotting using human serum samples from 71 dog-allergic individuals. A target of interest was excised from the gel and sequenced. Canine NPC2 sequence was generated, and recombinant proteins expressed in yeast and bacteria were used to determine allergenicity. An IgE enzyme-linked immunosorbent assay was used for screening 71 dog-positive and 30 dog-negative serum samples. A 16-kDa protein (pK = 8.5) in dog allergen extracts was recognized by specific IgE. The protein was identified by sequencing as a CE1 protein or NPC2 protein. Human IgE bound to recombinant protein was expressed in both yeast and bacteria. Ten (14%) of 71 individuals had specific IgE to NPC2 protein from bacteria, and 12 (17%) had IgE to NPC2 protein from yeast. Binding of pooled dog-allergic serum IgE to the dust mite protein Der p 2 was partially inhibited by recombinant NPC2 protein. NPC2 protein, a member of the MD-2-related lipid recognition family, is identified as a dog allergen (Can f 7), with an apparent seroprevalence of 10% to 20%. Published by Elsevier Inc.
Yoga, Yano M. K.; Traore, Daouda A. K.; Sidiqi, Mahjooba; Szeto, Chris; Pendini, Nicole R.; Barker, Andrew; Leedman, Peter J.; Wilce, Jacqueline A.; Wilce, Matthew C. J.
2012-01-01
Poly-C-binding proteins are triple KH (hnRNP K homology) domain proteins with specificity for single stranded C-rich RNA and DNA. They play diverse roles in the regulation of protein expression at both transcriptional and translational levels. Here, we analyse the contributions of individual αCP1 KH domains to binding C-rich oligonucleotides using biophysical and structural methods. Using surface plasmon resonance (SPR), we demonstrate that KH1 makes the most stable interactions with both RNA and DNA, KH3 binds with intermediate affinity and KH2 only interacts detectibly with DNA. The crystal structure of KH1 bound to a 5′-CCCTCCCT-3′ DNA sequence shows a 2:1 protein:DNA stoichiometry and demonstrates a molecular arrangement of KH domains bound to immediately adjacent oligonucleotide target sites. SPR experiments, with a series of poly-C-sequences reveals that cytosine is preferred at all four positions in the oligonucleotide binding cleft and that a C-tetrad binds KH1 with 10 times higher affinity than a C-triplet. The basis for this high affinity interaction is finally detailed with the structure determination of a KH1.W.C54S mutant bound to 5′-ACCCCA-3′ DNA sequence. Together, these data establish the lead role of KH1 in oligonucleotide binding by αCP1 and reveal the molecular basis of its specificity for a C-rich tetrad. PMID:22344691
Yoga, Yano M K; Traore, Daouda A K; Sidiqi, Mahjooba; Szeto, Chris; Pendini, Nicole R; Barker, Andrew; Leedman, Peter J; Wilce, Jacqueline A; Wilce, Matthew C J
2012-06-01
Poly-C-binding proteins are triple KH (hnRNP K homology) domain proteins with specificity for single stranded C-rich RNA and DNA. They play diverse roles in the regulation of protein expression at both transcriptional and translational levels. Here, we analyse the contributions of individual αCP1 KH domains to binding C-rich oligonucleotides using biophysical and structural methods. Using surface plasmon resonance (SPR), we demonstrate that KH1 makes the most stable interactions with both RNA and DNA, KH3 binds with intermediate affinity and KH2 only interacts detectibly with DNA. The crystal structure of KH1 bound to a 5'-CCCTCCCT-3' DNA sequence shows a 2:1 protein:DNA stoichiometry and demonstrates a molecular arrangement of KH domains bound to immediately adjacent oligonucleotide target sites. SPR experiments, with a series of poly-C-sequences reveals that cytosine is preferred at all four positions in the oligonucleotide binding cleft and that a C-tetrad binds KH1 with 10 times higher affinity than a C-triplet. The basis for this high affinity interaction is finally detailed with the structure determination of a KH1.W.C54S mutant bound to 5'-ACCCCA-3' DNA sequence. Together, these data establish the lead role of KH1 in oligonucleotide binding by αCP1 and reveal the molecular basis of its specificity for a C-rich tetrad.
Hayat, Maqsood; Khan, Asifullah
2011-02-21
Membrane proteins are vital type of proteins that serve as channels, receptors, and energy transducers in a cell. Prediction of membrane protein types is an important research area in bioinformatics. Knowledge of membrane protein types provides some valuable information for predicting novel example of the membrane protein types. However, classification of membrane protein types can be both time consuming and susceptible to errors due to the inherent similarity of membrane protein types. In this paper, neural networks based membrane protein type prediction system is proposed. Composite protein sequence representation (CPSR) is used to extract the features of a protein sequence, which includes seven feature sets; amino acid composition, sequence length, 2 gram exchange group frequency, hydrophobic group, electronic group, sum of hydrophobicity, and R-group. Principal component analysis is then employed to reduce the dimensionality of the feature vector. The probabilistic neural network (PNN), generalized regression neural network, and support vector machine (SVM) are used as classifiers. A high success rate of 86.01% is obtained using SVM for the jackknife test. In case of independent dataset test, PNN yields the highest accuracy of 95.73%. These classifiers exhibit improved performance using other performance measures such as sensitivity, specificity, Mathew's correlation coefficient, and F-measure. The experimental results show that the prediction performance of the proposed scheme for classifying membrane protein types is the best reported, so far. This performance improvement may largely be credited to the learning capabilities of neural networks and the composite feature extraction strategy, which exploits seven different properties of protein sequences. The proposed Mem-Predictor can be accessed at http://111.68.99.218/Mem-Predictor. Copyright © 2010 Elsevier Ltd. All rights reserved.
Sittka, Alexandra; Sharma, Cynthia M; Rolle, Katarzyna; Vogel, Jörg
2009-01-01
The bacterial Sm-like protein, Hfq, is a key factor for the stability and function of small non-coding RNAs (sRNAs) in Escherichia coli. Homologues of this protein have been predicted in many distantly related organisms yet their functional conservation as sRNA-binding proteins has not entirely been clear. To address this, we expressed in Salmonella the Hfq proteins of two eubacteria (Neisseria meningitides, Aquifex aeolicus) and an archaeon (Methanocaldococcus jannaschii), and analyzed the associated RNA by deep sequencing. This in vivo approach identified endogenous Salmonella sRNAs as a major target of the foreign Hfq proteins. New Salmonella sRNA species were also identified, and some of these accumulated specifically in the presence of a foreign Hfq protein. In addition, we observed specific RNA processing defects, e.g., suppression of precursor processing of SraH sRNA by Methanocaldococcus Hfq, or aberrant accumulation of extracytoplasmic target mRNAs of the Salmonella GcvB, MicA or RybB sRNAs. Taken together, our study provides evidence of a conserved inherent sRNA-binding property of Hfq, which may facilitate the lateral transmission of regulatory sRNAs among distantly related species. It also suggests that the expression of heterologous RNA-binding proteins combined with deep sequencing analysis of RNA ligands can be used as a molecular tool to dissect individual steps of RNA metabolism in vivo.
Heyduk, E; Baichoo, N; Heyduk, T
2001-11-30
The alpha-subunit of Escherichia coli RNA polymerase plays an important role in the activity of many promoters by providing a direct protein-DNA contact with a specific sequence (UP element) located upstream of the core promoter sequence. To obtain insight into the nature of thermodynamic forces involved in the formation of this protein-DNA contact, the binding of the alpha-subunit of E. coli RNA polymerase to a fluorochrome-labeled DNA fragment containing the rrnB P1 promoter UP element sequence was quantitatively studied using fluorescence polarization. The alpha dimer and DNA formed a 1:1 complex in solution. Complex formation at 25 degrees C was enthalpy-driven, the binding was accompanied by a net release of 1-2 ions, and no significant specific ion effects were observed. The van't Hoff plot of temperature dependence of binding was linear suggesting that the heat capacity change (Deltac(p)) was close to zero. Protein footprinting with hydroxyradicals showed that the protein did not change its conformation upon protein-DNA contact formation. No conformational changes in the DNA molecule were detected by CD spectroscopy upon protein-DNA complex formation. The thermodynamic characteristics of the binding together with the lack of significant conformational changes in the protein and in the DNA suggested that the alpha-subunit formed a rigid body-like contact with the DNA in which a tight complementary recognition interface between alpha-subunit and DNA was not formed.
Evolution of the arginase fold and functional diversity
Dowling, Daniel P.; Costanzo, Luigi Di; Gennadios, Heather A.; Christianson, David W.
2009-01-01
The large number of protein structures deposited in the Protein Data Bank allows for the identification of novel structural superfamilies based on conservation of fold in addition to conservation of amino acid sequence. Since sequence diverges more rapidly than fold in protein evolution, proteins with little or no significant sequence identity are occasionally observed to adopt similar folds, thereby reflecting unanticipated evolutionary relationships. Here, we review the unique α/β fold first observed in the manganese metalloenzyme rat liver arginase, consisting of a parallel 8 stranded β-sheet surrounded by several helices, and its evolutionary relationship with the zinc-requiring and/or iron-requiring histone deacetylases and acetylpolyamine amidohydrolases. Structural comparisons reveal key features of the core α/β fold that contribute to the divergent metal ion specificity and stoichiometry required for the chemical and biological functions of these enzymes. PMID:18360740
Bozzoni, I; Beccari, E; Luo, Z X; Amaldi, F
1981-01-01
Poly-A+ mRNA from Xenopus laevis oocytes, partially enriched for r-protein coding capacity has been used as starting material for preparing a cDNA bank in plasmid pBR322. The clones containing sequences specific for r-proteins have been selected by translation of the complementary mRNAs. Clones for six different r-proteins have been identified and utilized as probes for studying their genomic organization. Two gene copies per haploid genome were found for r-proteins L1, L14, S19, and four-five for protein S1, S8 and L32. Moreover a population polymorphism has been observed for the genomic regions containing sequences for r-protein S1, S8 and L14. Images PMID:6112733
O'Brien, Frances G.; Yui Eto, Karina; Murphy, Riley J. T.; Fairhurst, Heather M.; Coombs, Geoffrey W.; Grubb, Warren B.; Ramsay, Joshua P.
2015-01-01
Staphylococcus aureus is a common cause of hospital, community and livestock-associated infections and is increasingly resistant to multiple antimicrobials. A significant proportion of antimicrobial-resistance genes are plasmid-borne, but only a minority of S. aureus plasmids encode proteins required for conjugative transfer or Mob relaxase proteins required for mobilisation. The pWBG749 family of S. aureus conjugative plasmids can facilitate the horizontal transfer of diverse antimicrobial-resistance plasmids that lack Mob genes. Here we reveal that these mobilisable plasmids carry copies of the pWBG749 origin-of-transfer (oriT) sequence and that these oriT sequences facilitate mobilisation by pWBG749. Sequences resembling the pWBG749 oriT were identified on half of all sequenced S. aureus plasmids, including the most prevalent large antimicrobial-resistance/virulence-gene plasmids, pIB485, pMW2 and pUSA300HOUMR. oriT sequences formed five subfamilies with distinct inverted-repeat-2 (IR2) sequences. pWBG749-family plasmids encoding each IR2 were identified and pWBG749 mobilisation was found to be specific for plasmids carrying matching IR2 sequences. Specificity of mobilisation was conferred by a putative ribbon-helix-helix-protein gene smpO. Several plasmids carried 2–3 oriT variants and pWBG749-mediated recombination occurred between distinct oriT sites during mobilisation. These observations suggest this relaxase-in trans mechanism of mobilisation by pWBG749-family plasmids is a common mechanism of plasmid dissemination in S. aureus. PMID:26243776
Shared epitopes of glycoprotein A and protein 4.1 defined by antibody NaM10-3C10.
Rasamoelisolo, M; Czerwinski, M; Willem, C; Blanchard, D
1998-06-01
We have produced the murine monoclonal antibody (MAb) NaM70-3C10 (IgM) from splenocytes of mice immunized with human red blood cells (RBCs). The MAb agglutinated untreated as well as trypsin, chymotrypsin, neuraminidase, or ficin-treated RBCs from controls. In contrast, control RBCs treated with papaine or bromelaine were not agglutinated. On immunoblots, the MAb bound to glycophorin A (GPA) and to a 80 kDa protein identified as protein 4.1. Analysis by agglutination of variant RBCs carrying hybrid glycophorins made of the N-terminus (amino acids 1-58) of GPA and of the C-terminus (amino acids 27-72) of glycophorin B (GPB) and competition-inhibition test using purified GPA and a synthetic peptide corresponding to the amino acid sequence 48-58 of GPA demonstrated that the epitope is located within residues 48-58 of GPA. Epitope analysis with immobilized peptides showed that the MAb recognizes the sequence 53Pro-Pro-Glu-Glu-GIu58 of GPA. A homologous sequence is also present within amino acids 395 to 405 of protein 4.1. Finally, the MAb bound to 16 kDa chymotryptic peptide of protein 4.1, which carries the above amino acid sequence. In conclusion, it may be assumed that NaM70-3C10 specifically recognizes a common epitope on the extracellular domain of GPA and on the intracellular protein 4.1; this specificity explains the persistence of the 80 kDa band on blots when RBCs are treated with papain.
Application of the MIDAS approach for analysis of lysine acetylation sites.
Evans, Caroline A; Griffiths, John R; Unwin, Richard D; Whetton, Anthony D; Corfe, Bernard M
2013-01-01
Multiple Reaction Monitoring Initiated Detection and Sequencing (MIDAS™) is a mass spectrometry-based technique for the detection and characterization of specific post-translational modifications (Unwin et al. 4:1134-1144, 2005), for example acetylated lysine residues (Griffiths et al. 18:1423-1428, 2007). The MIDAS™ technique has application for discovery and analysis of acetylation sites. It is a hypothesis-driven approach that requires a priori knowledge of the primary sequence of the target protein and a proteolytic digest of this protein. MIDAS essentially performs a targeted search for the presence of modified, for example acetylated, peptides. The detection is based on the combination of the predicted molecular weight (measured as mass-charge ratio) of the acetylated proteolytic peptide and a diagnostic fragment (product ion of m/z 126.1), which is generated by specific fragmentation of acetylated peptides during collision induced dissociation performed in tandem mass spectrometry (MS) analysis. Sequence information is subsequently obtained which enables acetylation site assignment. The technique of MIDAS was later trademarked by ABSciex for targeted protein analysis where an MRM scan is combined with full MS/MS product ion scan to enable sequence confirmation.
Ai, X; Butts, B; Vora, K; Li, W; Tache-Talmadge, C; Fridman, A; Mehmet, H
2011-01-01
Apoptosis research has been significantly aided by the generation of antibodies against caspase-cleaved peptide neo-epitopes. However, most of these antibodies recognize the N-terminal fragment and are specific for the protein in question. The aim of this project was to create antibodies, which could identify caspase-cleaved proteins without a priori knowledge of the cleavage sites or even the proteins themselves. We hypothesized that many caspase-cleavage products might have a common antigenic shape, given that they must all fit into the same active site of caspases. Rabbits were immunized with the eight most prevalent exposed C-terminal tetrapeptide sequences following caspase cleavage. After purification of the antibodies we demonstrated (1) their specificity for exposed C-terminal (but not internal) peptides, (2) their ability to detect known caspase-cleaved proteins from apoptotic cell lysates or supernatants from apoptotic cell culture and (3) their ability to detect a caspase-cleaved protein whose tetrapeptide sequence differs from the eight tetrapeptides used to generate the antibodies. These antibodies have the potential to identify novel neo-epitopes produced by caspase cleavage and so can be used to identify pathway-specific caspase cleavage events in a specific cell type. Additionally this methodology may be applied to generate antibodies against products of other proteases, which have a well-defined and non-promiscuous cleavage activity. PMID:21881607
Toward a General Approach for RNA-Templated Hierarchical Assembly of Split-Proteins
Furman, Jennifer L.; Badran, Ahmed H.; Ajulo, Oluyomi; Porter, Jason R.; Stains, Cliff I.; Segal, David J.; Ghosh, Indraneel
2010-01-01
The ability to conditionally turn on a signal or induce a function in the presence of a user-defined RNA target has potential applications in medicine and synthetic biology. Although sequence-specific pumilio repeat proteins can target a limited set of ssRNA sequences, there are no general methods for targeting ssRNA with designed proteins. As a first step toward RNA recognition, we utilized the RNA binding domain of argonaute, implicated in RNA interference, for specifically targeting generic 2-nucleotide, 3' overhangs of any dsRNA. We tested the reassembly of a split-luciferase enzyme guided by argonaute-mediated recognition of newly generated nucleotide overhangs when ssRNA is targeted by a designed complementary guide sequence. This approach was successful when argonaute was utilized in conjunction with a pumilio repeat and expanded the scope of potential ssRNA targets. However, targeting any desired ssRNA remained elusive as two argonaute domains provided minimal reassembled split-luciferase. We next designed and tested a second hierarchical assembly, wherein ssDNA guides are appended to DNA hairpins that serve as a scaffold for high affinity zinc fingers attached to split-luciferase. In the presence of a ssRNA target containing adjacent sequences complementary to the guides, the hairpins are brought into proximity, allowing for zinc finger binding and concomitant reassembly of the fragmented luciferase. The scope of this new approach was validated by specifically targeting RNA encoding VEGF, hDM2, and HER2. These approaches provide potentially general design paradigms for the conditional reassembly of fragmented proteins in the presence of any desired ssRNA target. PMID:20681585
Hou, Peili; Zhao, Guimin; He, Chengqiang; Wang, Hongmei; He, Hongbin
2018-01-04
The bovine ephemeral fever virus (BEFV) glycoprotein neutralization site 1 (also referred as G 1 protein), is a critical protein responsible for virus infectivity and eliciting immune-protection, however, binding peptides of BEFV G 1 protein are still unclear. Thus, the aim of the present study was to screen specific polypeptides, which bind BEFV G 1 protein with high-affinity and inhibit BEFV replication. The purified BEFV G 1 was coated and then reacted with the M13-based Ph.D.-7 phage random display library. The peptides for target binding were automated sequenced after four rounds of enrichment biopanning. The amino acid sequences of polypeptide displayed on positive clones were deduced and the affinity of positive polypeptides with BEFV G 1 was assayed by ELISA. Then the roles of specific G 1 -binding peptides in the context of BEFV infection were analyzed. The results showed that 27 specific peptide ligands displaying 11 different amino acid sequences were obtained, and the T18 and T25 clone had a higher affinity to G 1 protein than the other clones. Then their antiviral roles of two phage clones (T25 and T18) showed that both phage polypeptide T25 and T18 exerted inhibition on BEFV replication compared to control group. Moreover, synthetic peptide based on T18 (HSIRYDF) and T25 (YSLRSDY) alone or combined use on BEFV replication showed that the synthetic peptides could effectively inhibit the formation of cytopathic plaque and significantly inhibit BEFV RNA replication in a dose-dependent manner. Two antiviral peptide ligands binding to bovine ephemeral fever virus G 1 protein from phage display peptide library were identified, which may provide a potential research tool for diagnostic reagents and novel antiviral agents.
Zemla, Adam T; Lang, Dorothy M; Kostova, Tanya; Andino, Raul; Ecale Zhou, Carol L
2011-06-02
Most of the currently used methods for protein function prediction rely on sequence-based comparisons between a query protein and those for which a functional annotation is provided. A serious limitation of sequence similarity-based approaches for identifying residue conservation among proteins is the low confidence in assigning residue-residue correspondences among proteins when the level of sequence identity between the compared proteins is poor. Multiple sequence alignment methods are more satisfactory--still, they cannot provide reliable results at low levels of sequence identity. Our goal in the current work was to develop an algorithm that could help overcome these difficulties by facilitating the identification of structurally (and possibly functionally) relevant residue-residue correspondences between compared protein structures. Here we present StralSV (structure-alignment sequence variability), a new algorithm for detecting closely related structure fragments and quantifying residue frequency from tight local structure alignments. We apply StralSV in a study of the RNA-dependent RNA polymerase of poliovirus, and we demonstrate that the algorithm can be used to determine regions of the protein that are relatively unique, or that share structural similarity with proteins that would be considered distantly related. By quantifying residue frequencies among many residue-residue pairs extracted from local structural alignments, one can infer potential structural or functional importance of specific residues that are determined to be highly conserved or that deviate from a consensus. We further demonstrate that considerable detailed structural and phylogenetic information can be derived from StralSV analyses. StralSV is a new structure-based algorithm for identifying and aligning structure fragments that have similarity to a reference protein. StralSV analysis can be used to quantify residue-residue correspondences and identify residues that may be of particular structural or functional importance, as well as unusual or unexpected residues at a given sequence position. StralSV is provided as a web service at http://proteinmodel.org/AS2TS/STRALSV/.
da Fonseca, Néli José; Lima Afonso, Marcelo Querino; Pedersolli, Natan Gonçalves; de Oliveira, Lucas Carrijo; Andrade, Dhiego Souto; Bleicher, Lucas
2017-10-28
Flaviviruses are responsible for serious diseases such as dengue, yellow fever, and zika fever. Their genomes encode a polyprotein which, after cleavage, results in three structural and seven non-structural proteins. Homologous proteins can be studied by conservation and coevolution analysis as detected in multiple sequence alignments, usually reporting positions which are strictly necessary for the structure and/or function of all members in a protein family or which are involved in a specific sub-class feature requiring the coevolution of residue sets. This study provides a complete conservation and coevolution analysis on all flaviviruses non-structural proteins, with results mapped on all well-annotated available sequences. A literature review on the residues found in the analysis enabled us to compile available information on their roles and distribution among different flaviviruses. Also, we provide the mapping of conserved and coevolved residues for all sequences currently in SwissProt as a supplementary material, so that particularities in different viruses can be easily analyzed. Copyright © 2017 Elsevier Inc. All rights reserved.
Are plant formins integral membrane proteins?
Cvrcková, F
2000-01-01
The formin family of proteins has been implicated in signaling pathways of cellular morphogenesis in both animals and fungi; in the latter case, at least, they participate in communication between the actin cytoskeleton and the cell surface. Nevertheless, they appear to be cytoplasmic or nuclear proteins, and it is not clear whether they communicate with the plasma membrane, and if so, how. Because nothing is known about formin function in plants, I performed a systematic search for putative Arabidopsis thaliana formin homologs. I found eight putative formin-coding genes in the publicly available part of the Arabidopsis genome sequence and analyzed their predicted protein sequences. Surprisingly, some of them lack parts of the conserved formin-homology 2 (FH2) domain and the majority of them seem to have signal sequences and putative transmembrane segments that are not found in yeast or animals formins. Plant formins define a distinct subfamily. The presence in most Arabidopsis formins of sequence motifs typical or transmembrane proteins suggests a mechanism of membrane attachment that may be specific to plant formins, and indicates an unexpected evolutionary flexibility of the conserved formin domain.
Bioinformatic prediction and in vivo validation of residue-residue interactions in human proteins
NASA Astrophysics Data System (ADS)
Jordan, Daniel; Davis, Erica; Katsanis, Nicholas; Sunyaev, Shamil
2014-03-01
Identifying residue-residue interactions in protein molecules is important for understanding both protein structure and function in the context of evolutionary dynamics and medical genetics. Such interactions can be difficult to predict using existing empirical or physical potentials, especially when residues are far from each other in sequence space. Using a multiple sequence alignment of 46 diverse vertebrate species we explore the space of allowed sequences for orthologous protein families. Amino acid changes that are known to damage protein function allow us to identify specific changes that are likely to have interacting partners. We fit the parameters of the continuous-time Markov process used in the alignment to conclude that these interactions are primarily pairwise, rather than higher order. Candidates for sites under pairwise epistasis are predicted, which can then be tested by experiment. We report the results of an initial round of in vivo experiments in a zebrafish model that verify the presence of multiple pairwise interactions predicted by our model. These experimentally validated interactions are novel, distant in sequence, and are not readily explained by known biochemical or biophysical features.
Comparative analysis of the XopD T3S effector family in plant pathogenic bacteria
Kim, Jung-Gun; Taylor, Kyle W.; Mudgett, Mary Beth
2011-01-01
SUMMARY XopD is a type III effector protein that is required for Xanthomonas campestris pathovar vesicatoria (Xcv) growth in tomato. It is a modular protein consisting of an N-terminal DNA-binding domain, two EAR transcriptional repressor motifs, and a C-terminal SUMO protease. In tomato, XopD functions as a transcriptional repressor, resulting in the suppression of defense responses at late stages of infection. A survey of available genome sequences for phytopathogenic bacteria revealed that XopD homologs are limited to species within three Genera of Proteobacteria – Xanthomonas, Acidovorax, and Pseudomonas. While the EAR motif(s) and SUMO protease domain are conserved in all the XopD-like proteins, variation exists in the length and sequence identity of the N-terminal domains. Comparative analysis of the DNA sequences surrounding xopD and xopD-like genes led to revised annotation of the xopD gene. Edman degradation sequence analysis and functional complementation studies confirmed that the xopD gene from Xcv encodes a 760 amino acid protein with a longer N-terminal domain than previously predicted. None of the XopD-like proteins studied complemented Xcv ΔxopD mutant phenotypes in tomato leaves suggesting that the N-terminus of XopD defines functional specificity. Xcv ΔxopD strains expressing chimeric fusion proteins containing the N-terminus of XopD fused to the EAR motif(s) and SUMO protease domain of the XopD-like protein from Xanthomonas campestris pathovar campestris strain B100 were fully virulent in tomato demonstrating that the N-terminus of XopD controls specificity in tomato. PMID:21726373
Rahman, K Shamsur; Chowdhury, Erfan U; Poudel, Anil; Ruettger, Anke; Sachse, Konrad; Kaltenboeck, Bernhard
2015-05-01
Urgently needed species-specific enzyme-linked immunosorbent assays (ELISAs) for the detection of antibodies against Chlamydia spp. have been elusive due to high cross-reactivity of chlamydial antigens. To identify Chlamydia species-specific B cell epitopes for such assays, we ranked the potential epitopes of immunodominant chlamydial proteins that are polymorphic among all Chlamydia species. High-scoring peptides were synthesized with N-terminal biotin, followed by a serine-glycine-serine-glycine spacer, immobilized onto streptavidin-coated microtiter plates, and tested with mono-specific mouse hyperimmune sera against each Chlamydia species in chemiluminescent ELISAs. For each of nine Chlamydia species, three to nine dominant polymorphic B cell epitope regions were identified on OmpA, CT618, PmpD, IncA, CT529, CT442, IncG, Omp2, TarP, and IncE proteins. Peptides corresponding to 16- to 40-amino-acid species-specific sequences of these epitopes reacted highly and with absolute specificity with homologous, but not heterologous, Chlamydia monospecies-specific sera. Host-independent reactivity of such epitopes was confirmed by testing of six C. pecorum-specific peptides from five proteins with C. pecorum-reactive sera from cattle, the natural host of C. pecorum. The probability of cross-reactivity of peptide antigens from closely related chlamydial species or strains correlated with percent sequence identity and declined to zero at <50% sequence identity. Thus, phylograms of B cell epitope regions predict the specificity of peptide antigens for rational use in the genus-, species-, or serovar-specific molecular serology of Chlamydia spp. We anticipate that these peptide antigens will improve chlamydial serology by providing easily accessible assays to nonspecialist laboratories. Our approach also lends itself to the identification of relevant epitopes of other microbial pathogens. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
Defining Species-Specific Immunodominant B Cell Epitopes for Molecular Serology of Chlamydia Species
Rahman, K. Shamsur; Chowdhury, Erfan U.; Poudel, Anil; Ruettger, Anke; Sachse, Konrad
2015-01-01
Urgently needed species-specific enzyme-linked immunosorbent assays (ELISAs) for the detection of antibodies against Chlamydia spp. have been elusive due to high cross-reactivity of chlamydial antigens. To identify Chlamydia species-specific B cell epitopes for such assays, we ranked the potential epitopes of immunodominant chlamydial proteins that are polymorphic among all Chlamydia species. High-scoring peptides were synthesized with N-terminal biotin, followed by a serine-glycine-serine-glycine spacer, immobilized onto streptavidin-coated microtiter plates, and tested with mono-specific mouse hyperimmune sera against each Chlamydia species in chemiluminescent ELISAs. For each of nine Chlamydia species, three to nine dominant polymorphic B cell epitope regions were identified on OmpA, CT618, PmpD, IncA, CT529, CT442, IncG, Omp2, TarP, and IncE proteins. Peptides corresponding to 16- to 40-amino-acid species-specific sequences of these epitopes reacted highly and with absolute specificity with homologous, but not heterologous, Chlamydia monospecies-specific sera. Host-independent reactivity of such epitopes was confirmed by testing of six C. pecorum-specific peptides from five proteins with C. pecorum-reactive sera from cattle, the natural host of C. pecorum. The probability of cross-reactivity of peptide antigens from closely related chlamydial species or strains correlated with percent sequence identity and declined to zero at <50% sequence identity. Thus, phylograms of B cell epitope regions predict the specificity of peptide antigens for rational use in the genus-, species-, or serovar-specific molecular serology of Chlamydia spp. We anticipate that these peptide antigens will improve chlamydial serology by providing easily accessible assays to nonspecialist laboratories. Our approach also lends itself to the identification of relevant epitopes of other microbial pathogens. PMID:25761461
Glareosin: a novel sexually dimorphic urinary lipocalin in the bank vole, Myodes glareolus.
Loxley, Grace M; Unsworth, Jennifer; Turton, Michael J; Jebb, Alexandra; Lilley, Kathryn S; Simpson, Deborah M; Rigden, Daniel J; Hurst, Jane L; Beynon, Robert J
2017-09-01
The urine of bank voles ( Myodes glareolus ) contains substantial quantities of a small protein that is expressed at much higher levels in males than females, and at higher levels in males in the breeding season. This protein was purified and completely sequenced at the protein level by mass spectrometry. Leucine/isoleucine ambiguity was completely resolved by metabolic labelling, monitoring the incorporation of dietary deuterated leucine into specific sites in the protein. The predicted mass of the sequenced protein was exactly consonant with the mass of the protein measured in bank vole urine samples, correcting for the formation of two disulfide bonds. The sequence of the protein revealed that it was a lipocalin related to aphrodisin and other odorant-binding proteins (OBPs), but differed from all OBPs previously described. The pattern of secretion in urine used for scent marking by male bank voles, and the similarity to other lipocalins used as chemical signals in rodents, suggest that this protein plays a role in male sexual and/or competitive communication. We propose the name glareosin for this novel protein to reflect the origin of the protein and to emphasize the distinction from known OBPs. © 2017 The Authors.
Ahmed, Mumdooh A M; Bamm, Vladimir V; Harauz, George; Ladizhansky, Vladimir
2007-08-28
The genes of the oligodendrocyte lineage (Golli) encode a family of developmentally regulated isoforms of myelin basic protein. The "classic" MBP isoforms arise from transcription start site 3, whereas Golli-specific isoforms arise from transcription start site 1, and comprise both Golli-specific and classic MBP sequences. The Golli isoform BG21 has been suggested to play roles in myelination and T cell activation pathways. It is an intrinsically disordered protein, thereby presenting a large effective surface area for interaction with other proteins such as Golli-interacting protein. We have used multidimensional heteronuclear NMR spectroscopy to achieve sequence-specific resonance assignments of the recombinant murine BG21 in physiologically relevant buffer, to analyze its secondary structure using chemical shift indexing (CSI), and to investigate its backbone dynamics using 15N spin relaxation measurements. We have assigned 184 out of 199 residues unambiguously. The CSI analysis revealed little ordered secondary structure under these conditions, with only some small fragments having a slight tendency toward alpha-helicity, which may represent putative recognition motifs. The 15N relaxation and NOE measurements confirmed the general behavior of the protein as an extended polypeptide chain, with the N-terminal Golli-specific portion (residues S5-T69) being exceptionally flexible, even in comparison to other intrinsically disordered proteins that have been studied this way. The high degree of flexibility of this N-terminal region may be to provide additional plasticity, or conformational adaptability, in protein-protein interactions. Another highly mobile segment, A126-S127-G128-G129, may function as a hinge.
SMARTIV: combined sequence and structure de-novo motif discovery for in-vivo RNA binding data.
Polishchuk, Maya; Paz, Inbal; Yakhini, Zohar; Mandel-Gutfreund, Yael
2018-05-25
Gene expression regulation is highly dependent on binding of RNA-binding proteins (RBPs) to their RNA targets. Growing evidence supports the notion that both RNA primary sequence and its local secondary structure play a role in specific Protein-RNA recognition and binding. Despite the great advance in high-throughput experimental methods for identifying sequence targets of RBPs, predicting the specific sequence and structure binding preferences of RBPs remains a major challenge. We present a novel webserver, SMARTIV, designed for discovering and visualizing combined RNA sequence and structure motifs from high-throughput RNA-binding data, generated from in-vivo experiments. The uniqueness of SMARTIV is that it predicts motifs from enriched k-mers that combine information from ranked RNA sequences and their predicted secondary structure, obtained using various folding methods. Consequently, SMARTIV generates Position Weight Matrices (PWMs) in a combined sequence and structure alphabet with assigned P-values. SMARTIV concisely represents the sequence and structure motif content as a single graphical logo, which is informative and easy for visual perception. SMARTIV was examined extensively on a variety of high-throughput binding experiments for RBPs from different families, generated from different technologies, showing consistent and accurate results. Finally, SMARTIV is a user-friendly webserver, highly efficient in run-time and freely accessible via http://smartiv.technion.ac.il/.
Diversity in Requirement of Genetic and Epigenetic Factors for Centromere Function in Fungi ▿
Roy, Babhrubahan; Sanyal, Kaustuv
2011-01-01
A centromere is a chromosomal region on which several proteins assemble to form the kinetochore. The centromere-kinetochore complex helps in the attachment of chromosomes to spindle microtubules to mediate segregation of chromosomes to daughter cells during mitosis and meiosis. In several budding yeast species, the centromere forms in a DNA sequence-dependent manner, whereas in most other fungi, factors other than the DNA sequence also determine the centromere location, as centromeres were able to form on nonnative sequences (neocentromeres) when native centromeres were deleted in engineered strains. Thus, in the absence of a common DNA sequence, the cues that have facilitated centromere formation on a specific DNA sequence for millions of years remain a mystery. Kinetochore formation is facilitated by binding of a centromere-specific histone protein member of the centromeric protein A (CENP-A) family that replaces a canonical histone H3 to form a specialized centromeric chromatin structure. However, the process of kinetochore formation on the rapidly evolving and seemingly diverse centromere DNAs in different fungal species is largely unknown. More interestingly, studies in various yeasts suggest that the factors required for de novo centromere formation (establishment) may be different from those required for maintenance (propagation) of an already established centromere. Apart from the DNA sequence and CENP-A, many other factors, such as posttranslational modification (PTM) of histones at centric and pericentric chromatin, RNA interference, and DNA methylation, are also involved in centromere formation, albeit in a species-specific manner. In this review, we discuss how several genetic and epigenetic factors influence the evolution of structure and function of centromeres in fungal species. PMID:21908596
Ivanov, Stefan M; Huber, Roland G; Warwicker, Jim; Bond, Peter J
2016-11-01
Critical regulatory pathways are replete with instances of intra- and interfamily protein-protein interactions due to the pervasiveness of gene duplication throughout evolution. Discerning the specificity determinants within these systems has proven a challenging task. Here, we present an energetic analysis of the specificity determinants within the Bcl-2 family of proteins (key regulators of the intrinsic apoptotic pathway) via a total of ∼20 μs of simulation of 60 distinct protein-protein complexes. We demonstrate where affinity and specificity of protein-protein interactions arise across the family, and corroborate our conclusions with extensive experimental evidence. We identify energy and specificity hotspots that may offer valuable guidance in the design of targeted therapeutics for manipulating the protein-protein interactions within the apoptosis-regulating pathway. Moreover, we propose a conceptual framework that allows us to quantify the relationship between sequence, structure, and binding energetics. This approach may represent a general methodology for investigating other paralogous protein-protein interaction sites. Copyright © 2016 Elsevier Ltd. All rights reserved.
Liu, Yi-Ke; Li, He-Ping; Huang, Tao; Cheng, Wei; Gao, Chun-Sheng; Zuo, Dong-Yun; Zhao, Zheng-Xi; Liao, Yu-Cai
2014-10-29
Wheat-specific ribosomal protein L21 (RPL21) is an endogenous reference gene suitable for genetically modified (GM) wheat identification. This taxon-specific RPL21 sequence displayed high homogeneity in different wheat varieties. Southern blots revealed 1 or 3 copies, and sequence analyses showed one amplicon in common wheat. Combined analyses with sequences from common wheat (AABBDD) and three diploid ancestral species, Triticum urartu (AA), Aegilops speltoides (BB), and Aegilops tauschii (DD), demonstrated the presence of this amplicon in the AA genome. Using conventional qualitative polymerase chain reaction (PCR), the limit of detection was 2 copies of wheat haploid genome per reaction. In the quantitative real-time PCR assay, limits of detection and quantification were about 2 and 8 haploid genome copies, respectively, the latter of which is 2.5-4-fold lower than other reported wheat endogenous reference genes. Construct-specific PCR assays were developed using RPL21 as an endogenous reference gene, and as little as 0.5% of GM wheat contents containing Arabidopsis NPR1 were properly quantified.
Real-time functional imaging for monitoring miR-133 during myogenic differentiation.
Kato, Yoshio; Miyaki, Shigeru; Yokoyama, Shigetoshi; Omori, Shin; Inoue, Atsushi; Horiuchi, Machiko; Asahara, Hiroshi
2009-11-01
MicroRNAs (miRNAs) are a class of non-coding small RNAs that act as negative regulators of gene expression through sequence-specific interactions with the 3' untranslated regions (UTRs) of target mRNA and play various biological roles. miR-133 was identified as a muscle-specific miRNA that enhanced the proliferation of myoblasts during myogenic differentiation, although its activity in myogenesis has not been fully characterized. Here, we developed a novel retroviral vector system for monitoring muscle-specific miRNA in living cells by using a green fluorescent protein (GFP) that is connected to the target sequence of miR-133 via the UTR and a red fluorescent protein for normalization. We demonstrated that the functional promotion of miR-133 during myogenesis is visualized by the reduction of GFP carrying the miR-133 target sequence, suggesting that miR-133 specifically down-regulates its targets during myogenesis in accordance with its expression. Our cell-based miRNA functional assay monitoring miR-133 activity should be a useful tool in elucidating the role of miRNAs in various biological events.
The active site of O-GlcNAc transferase imposes constraints on substrate sequence
Rafie, Karim; Blair, David E.; Borodkin, Vladimir S.; Albarbarawi, Osama; van Aalten, Daan M. F.
2016-01-01
O-GlcNAc transferase (OGT) glycosylates a diverse range of intracellular proteins with O-linked N-acetylglucosamine (O-GlcNAc), an essential and dynamic post-translational modification in metazoa. Although this enzyme modifies hundreds of proteins with O-GlcNAc, it is not understood how OGT achieves substrate specificity. In this study, we describe the application of a high-throughput OGT assay on a library of peptides. The sites of O-GlcNAc modification were mapped by ETD-mass spectrometry, and found to correlate with previously detected O-GlcNAc sites. Crystal structures of four acceptor peptides in complex with human OGT suggest that a combination of size and conformational restriction defines sequence specificity in the −3 to +2 subsites. This work reveals that while the N-terminal TPR repeats of hOGT may play a role in substrate recognition, the sequence restriction imposed by the peptide-binding site makes a significant contribution to O-GlcNAc site specificity. PMID:26237509
A Novel Helicase-Type Protein in the Nucleolus: Protein NOH61
Zirwes, Rudolf F.; Eilbracht, Jens; Kneissel, Sandra; Schmidt-Zachmann, Marion S.
2000-01-01
We report the identification, cDNA cloning, and molecular characterization of a novel, constitutive nucleolar protein. The cDNA-deduced amino acid sequence of the human protein defines a polypeptide of a calculated mass of 61.5 kDa and an isoelectric point of 9.9. Inspection of the primary sequence disclosed that the protein is a member of the family of “DEAD-box” proteins, representing a subgroup of putative ATP-dependent RNA helicases. ATPase activity of the recombinant protein is evident and stimulated by a variety of polynucleotides tested. Immunolocalization studies revealed that protein NOH61 (nucleolar helicase of 61 kDa) is highly conserved during evolution and shows a strong accumulation in nucleoli. Biochemical experiments have shown that protein NOH61 synthesized in vitro sediments with ∼11.5 S, i.e., apparently as homo-oligomeric structures. By contrast, sucrose gradient centrifugation analysis of cellular extracts obtained with buffers of elevated ionic strength (600 mM NaCl) revealed that the solubilized native protein sediments with ∼4 S, suggestive of the monomeric form. Interestingly, protein NOH61 has also been identified as a specific constituent of free nucleoplasmic 65S preribosomal particles but is absent from cytoplasmic ribosomes. Treatment of cultured cells with 1) the transcription inhibitor actinomycin D and 2) RNase A results in a complete dissociation of NOH61 from nucleolar structures. The specific intracellular localization and its striking sequence homology to other known RNA helicases lead to the hypothesis that protein NOH61 might be involved in ribosome synthesis, most likely during the assembly process of the large (60S) ribosomal subunit. PMID:10749921
Muda, Marco; Worby, Carolyn A; Simonson-Leff, Nancy; Clemens, James C; Dixon, Jack E
2002-08-15
Despite the wealth of information generated by genome-sequencing projects, the identification of in vivo substrates of specific protein kinases and phosphatases is hampered by the large number of candidate enzymes, overlapping enzyme specificity and sequence similarity. In the present study, we demonstrate the power of RNA interference (RNAi) to dissect signal transduction cascades involving specific kinases and phosphatases. RNAi is used to identify the cellular tyrosine kinases upstream of the phosphorylation of Down-Syndrome cell-adhesion molecule (Dscam), a novel cell-surface molecule of the immunoglobulin-fibronectin super family, which has been shown to be important for axonal path-finding in Drosophila. Tyrosine phosphorylation of Dscam recruits the Src homology 2 domain of the adaptor protein Dock to the receptor. Dock, the ortho- logue of mammalian Nck, is also essential for correct axonal path-finding in Drosophila. We further determined that Dock is tyrosine-phosphorylated in vivo and identified DPTP61F as the protein tyrosine phosphatase responsible for maintaining Dock in its non-phosphorylated state. The present study illustrates the versatility of RNAi in the identification of the physiological substrates for protein kinases and phosphatases.
Giessen, Tobias W
2016-10-01
Compartmentalization is one of the defining features of life. Cells use protein compartments to exert spatial control over their metabolism, store nutrients and create unique microenvironments needed for essential physiological processes. Encapsulins are a recently discovered class of protein nanocompartments found in bacteria and archaea that naturally encapsulate cargo proteins. A short C-terminal targeting sequence directs the highly specific encapsulation process in vivo. Here, I will initially discuss the properties, diversity and putative function of encapsulins. The unique characteristics and potential uses of the self-sorting cargo-packaging process found in encapsulin systems will then be highlighted. Examples for the application of encapsulins as cell-specific optical nanoprobes and targeted therapeutic delivery systems will be discussed with an emphasis on the ability to integrate multiple functionalities within a single nanodevice. By fusing targeting sequences to non-native proteins, encapsulins can also be used as specific nanocontainers and enzymatic nanoreactors in vivo. I will end by briefly discussing future avenues for encapsulin research related to both basic microbial metabolism and applications in biomedicine, catalysis and materials science. Copyright © 2016 Elsevier Ltd. All rights reserved.
A framework for classification of prokaryotic protein kinases.
Tyagi, Nidhi; Anamika, Krishanpal; Srinivasan, Narayanaswamy
2010-05-26
Overwhelming majority of the Serine/Threonine protein kinases identified by gleaning archaeal and eubacterial genomes could not be classified into any of the well known Hanks and Hunter subfamilies of protein kinases. This is owing to the development of Hanks and Hunter classification scheme based on eukaryotic protein kinases which are highly divergent from their prokaryotic homologues. A large dataset of prokaryotic Serine/Threonine protein kinases recognized from genomes of prokaryotes have been used to develop a classification framework for prokaryotic Ser/Thr protein kinases. We have used traditional sequence alignment and phylogenetic approaches and clustered the prokaryotic kinases which represent 72 subfamilies with at least 4 members in each. Such a clustering enables classification of prokaryotic Ser/Thr kinases and it can be used as a framework to classify newly identified prokaryotic Ser/Thr kinases. After series of searches in a comprehensive sequence database we recognized that 38 subfamilies of prokaryotic protein kinases are associated to a specific taxonomic level. For example 4, 6 and 3 subfamilies have been identified that are currently specific to phylum proteobacteria, cyanobacteria and actinobacteria respectively. Similarly subfamilies which are specific to an order, sub-order, class, family and genus have also been identified. In addition to these, we also identify organism-diverse subfamilies. Members of these clusters are from organisms of different taxonomic levels, such as archaea, bacteria, eukaryotes and viruses. Interestingly, occurrence of several taxonomic level specific subfamilies of prokaryotic kinases contrasts with classification of eukaryotic protein kinases in which most of the popular subfamilies of eukaryotic protein kinases occur diversely in several eukaryotes. Many prokaryotic Ser/Thr kinases exhibit a wide variety of modular organization which indicates a degree of complexity and protein-protein interactions in the signaling pathways in these microbes.
The Classification of Protein Domains.
Dawson, Natalie; Sillitoe, Ian; Marsden, Russell L; Orengo, Christine A
2017-01-01
The significant expansion in protein sequence and structure data that we are now witnessing brings with it a pressing need to bring order to the protein world. Such order enables us to gain insights into the evolution of proteins, their function and the extent to which the functional repertoire can vary across the three kingdoms of life. This has lead to the creation of a wide range of protein family classifications that aim to group proteins based upon their evolutionary relationships.In this chapter we discuss the approaches and methods that are frequently used in the classification of proteins, with a specific emphasis on the classification of protein domains. The construction of both domain sequence and domain structure databases is considered and we show how the use of domain family annotations to assign structural and functional information is enhancing our understanding of genomes.
A new fusion protein platform for quantitatively measuring activity of multiple proteases
2014-01-01
Background Recombinant proteins fused with specific cleavage sequences are widely used as substrate for quantitatively analyzing the activity of proteases. Here we propose a new fusion platform for multiple proteases, by using diaminopropionate ammonia-lyase (DAL) as the fusion protein. It was based on the finding that a fused His6-tag could significantly decreases the activities of DAL from E. coli (eDAL) and Salmonella typhimurium (sDAL). Previously, we have shown that His6GST-tagged eDAL could be used to determine the activity of tobacco etch virus protease (TEVp) under different temperatures or in the denaturant at different concentrations. In this report, we will assay different tags and cleavage sequences on DAL for expressing yield in E. coli, stability of the fused proteins and performance of substrate of other common proteases. Results We tested seven different protease cleavage sequences (rhinovirus 3C, TEV protease, factor Xa, Ssp DnaB intein, Sce VMA1 intein, thrombin and enterokinase), three different tags (His6, GST, CBD and MBP) and two different DALs (eDAL and sDAL), for their performance as substrate to the seven corresponding proteases. Among them, we found four active DAL-fusion substrates suitable for TEVp, factor Xa, thrombin and DnaB intein. Enterokinase cleaved eDAL at undesired positions and did not process sDAL. Substitution of GST with MBP increase the expression level of the fused eDAL and this fusion protein was suitable as a substrate for analyzing activity of rhinovirus 3C. We demonstrated that SUMO protease Ulp1 with a N-terminal His6-tag or MBP tag displayed different activity using the designed His6SUMO-eDAL as substrate. Finally, owing to the high level of the DAL-fusion protein in E. coli, these protein substrates can also be detected directly from the crude extract. Conclusion The results show that our designed DAL-fusion proteins can be used to quantify the activities of both sequence- and conformational-specific proteases, with sufficient substrate specificity. PMID:24649897
Detection of nucleic acids by multiple sequential invasive cleavages
Hall, Jeff G.; Lyamichev, Victor I.; Mast, Andrea L.; Brow, Mary Ann D.
1999-01-01
The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge. The present invention also provides methods for the detection of non-target cleavage products via the formation of a complete and activated protein binding region. The invention further provides sensitive and specific methods for the detection of human cytomegalovirus nucleic acid in a sample.
Hall, Jeff G.; Lyamichev, Victor I.; Mast, Andrea L.; Brow, Mary Ann; Kwiatkowski, Robert W.; Vavra, Stephanie H.
2005-03-29
The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge. The present invention also provides methods for the detection of non-target cleavage products via the formation of a complete and activated protein binding region. The invention further provides sensitive and specific methods for the detection of nucleic acid from various viruses in a sample.
Detection of nucleic acids by multiple sequential invasive cleavages 02
Hall, Jeff G.; Lyamichev, Victor I.; Mast, Andrea L.; Brow, Mary Ann D.
2002-01-01
The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge. The present invention also provides methods for the detection of non-target cleavage products via the formation of a complete and activated protein binding region. The invention further provides sensitive and specific methods for the detection of human cytomegalovirus nucleic acid in a sample.
Detection of nucleic acids by multiple sequential invasive cleavages
Hall, Jeff G; Lyamichev, Victor I; Mast, Andrea L; Brow, Mary Ann D
2012-10-16
The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge. The present invention also provides methods for the detection of non-target cleavage products via the formation of a complete and activated protein binding region. The invention further provides sensitive and specific methods for the detection of human cytomegalovirus nucleic acid in a sample.
Peng, Tao; Free, Paul; Fernig, David G.; Lim, Sierin; Tomczak, Nikodem
2016-01-01
Porous protein cages are supramolecular protein self-assemblies presenting pores that allow the access of surrounding molecules and ions into their core in order to store and transport them in biological environments. Protein cages’ pores are attractive channels for the internalisation of inorganic nanoparticles and an alternative for the preparation of hybrid bioinspired nanoparticles. However, strategies based on nanoparticle transport through the pores are largely unexplored, due to the difficulty of tailoring nanoparticles that have diameters commensurate with the pores size and simultaneously displaying specific affinity to the cages’ core and low non-specific binding to the cages’ outer surface. We evaluated the specific internalisation of single small gold nanoparticles, 3.9 nm in diameter, into porous protein cages via affinity binding. The E2 protein cage derived from the Geobacillus stearothermophilus presents 12 pores, 6 nm in diameter, and an empty core of 13 nm in diameter. We engineered the E2 protein by site-directed mutagenesis with oligohistidine sequences exposing them into the cage’s core. Dynamic light scattering and electron microscopy analysis show that the structures of E2 protein cages mutated with bis- or penta-histidine sequences are well conserved. The surface of the gold nanoparticles was passivated with a self-assembled monolayer made of a mixture of short peptidols and thiolated alkane ethylene glycol ligands. Such monolayers are found to provide thin coatings preventing non-specific binding to proteins. Further functionalisation of the peptide coated gold nanoparticles with Ni2+ nitrilotriacetic moieties enabled the specific binding to oligohistidine tagged cages. The internalisation via affinity binding was evaluated by electron microscopy analysis. From the various mutations tested, only the penta-histidine mutated E2 protein cage showed repeatable and stable internalisation. The present work overcomes the limitations of currently available approaches and provides a new route to design tailored and well-controlled hybrid nanoparticles. PMID:27622533
A strategy for detecting the conservation of folding-nucleus residues in protein superfamilies.
Michnick, S W; Shakhnovich, E
1998-01-01
Nucleation-growth theory predicts that fast-folding peptide sequences fold to their native structure via structures in a transition-state ensemble that share a small number of native contacts (the folding nucleus). Experimental and theoretical studies of proteins suggest that residues participating in folding nuclei are conserved among homologs. We attempted to determine if this is true in proteins with highly diverged sequences but identical folds (superfamilies). We describe a strategy based on comparisons of residue conservation in natural superfamily sequences with simulated sequences (generated with a Monte-Carlo sequence design strategy) for the same proteins. The basic assumptions of the strategy were that natural sequences will conserve residues needed for folding and stability plus function, the simulated sequences contain no functional conservation, and nucleus residues make native contacts with each other. Based on these assumptions, we identified seven potential nucleus residues in ubiquitin superfamily members. Non-nucleus conserved residues were also identified; these are proposed to be involved in stabilizing native interactions. We found that all superfamily members conserved the same potential nucleus residue positions, except those for which the structural topology is significantly different. Our results suggest that the conservation of the nucleus of a specific fold can be predicted by comparing designed simulated sequences with natural highly diverged sequences that fold to the same structure. We suggest that such a strategy could be used to help plan protein folding and design experiments, to identify new superfamily members, and to subdivide superfamilies further into classes having a similar folding mechanism.
GDAP: a web tool for genome-wide protein disulfide bond prediction.
O'Connor, Brian D; Yeates, Todd O
2004-07-01
The Genomic Disulfide Analysis Program (GDAP) provides web access to computationally predicted protein disulfide bonds for over one hundred microbial genomes, including both bacterial and achaeal species. In the GDAP process, sequences of unknown structure are mapped, when possible, to known homologous Protein Data Bank (PDB) structures, after which specific distance criteria are applied to predict disulfide bonds. GDAP also accepts user-supplied protein sequences and subsequently queries the PDB sequence database for the best matches, scans for possible disulfide bonds and returns the results to the client. These predictions are useful for a variety of applications and have previously been used to show a dramatic preference in certain thermophilic archaea and bacteria for disulfide bonds within intracellular proteins. Given the central role these stabilizing, covalent bonds play in such organisms, the predictions available from GDAP provide a rich data source for designing site-directed mutants with more stable thermal profiles. The GDAP web application is a gateway to this information and can be used to understand the role disulfide bonds play in protein stability both in these unusual organisms and in sequences of interest to the individual researcher. The prediction server can be accessed at http://www.doe-mbi.ucla.edu/Services/GDAP.
Takeuchi, Takeshi; Koyanagi, Ryo; Gyoja, Fuki; Kanda, Miyuki; Hisata, Kanako; Fujie, Manabu; Goto, Hiroki; Yamasaki, Shinichi; Nagai, Kiyohito; Morino, Yoshiaki; Miyamoto, Hiroshi; Endo, Kazuyoshi; Endo, Hirotoshi; Nagasawa, Hiromichi; Kinoshita, Shigeharu; Asakawa, Shuichi; Watabe, Shugo; Satoh, Noriyuki; Kawashima, Takeshi
2016-01-01
Bivalve molluscs have flourished in marine environments, and many species constitute important aquatic resources. Recently, whole genome sequences from two bivalves, the pearl oyster, Pinctada fucata, and the Pacific oyster, Crassostrea gigas, have been decoded, making it possible to compare genomic sequences among molluscs, and to explore general and lineage-specific genetic features and trends in bivalves. In order to improve the quality of sequence data for these purposes, we have updated the entire P. fucata genome assembly. We present a new genome assembly of the pearl oyster, Pinctada fucata (version 2.0). To update the assembly, we conducted additional sequencing, obtaining accumulated sequence data amounting to 193× the P. fucata genome. Sequence redundancy in contigs that was caused by heterozygosity was removed in silico, which significantly improved subsequent scaffolding. Gene model version 2.0 was generated with the aid of manual gene annotations supplied by the P. fucata research community. Comparison of mollusc and other bilaterian genomes shows that gene arrangements of Hox, ParaHox, and Wnt clusters in the P. fucata genome are similar to those of other molluscs. Like the Pacific oyster, P. fucata possesses many genes involved in environmental responses and in immune defense. Phylogenetic analyses of heat shock protein70 and C1q domain-containing protein families indicate that extensive expansion of genes occurred independently in each lineage. Several gene duplication events prior to the split between the pearl oyster and the Pacific oyster are also evident. In addition, a number of tandem duplications of genes that encode shell matrix proteins are also well characterized in the P. fucata genome. Both the Pinctada and Crassostrea lineages have expanded specific gene families in a lineage-specific manner. Frequent duplication of genes responsible for shell formation in the P. fucata genome explains the diversity of mollusc shell structures. These duplications reveal dynamic genome evolution to forge the complex physiology that enables bivalves to employ a sessile lifestyle in the intertidal zone.
Modularity of Protein Folds as a Tool for Template-Free Modeling of Structures.
Vallat, Brinda; Madrid-Aliste, Carlos; Fiser, Andras
2015-08-01
Predicting the three-dimensional structure of proteins from their amino acid sequences remains a challenging problem in molecular biology. While the current structural coverage of proteins is almost exclusively provided by template-based techniques, the modeling of the rest of the protein sequences increasingly require template-free methods. However, template-free modeling methods are much less reliable and are usually applicable for smaller proteins, leaving much space for improvement. We present here a novel computational method that uses a library of supersecondary structure fragments, known as Smotifs, to model protein structures. The library of Smotifs has saturated over time, providing a theoretical foundation for efficient modeling. The method relies on weak sequence signals from remotely related protein structures to create a library of Smotif fragments specific to the target protein sequence. This Smotif library is exploited in a fragment assembly protocol to sample decoys, which are assessed by a composite scoring function. Since the Smotif fragments are larger in size compared to the ones used in other fragment-based methods, the proposed modeling algorithm, SmotifTF, can employ an exhaustive sampling during decoy assembly. SmotifTF successfully predicts the overall fold of the target proteins in about 50% of the test cases and performs competitively when compared to other state of the art prediction methods, especially when sequence signal to remote homologs is diminishing. Smotif-based modeling is complementary to current prediction methods and provides a promising direction in addressing the structure prediction problem, especially when targeting larger proteins for modeling.
Yu, J S; Chen, W J; Ni, M H; Chan, W H; Yang, S D
1998-08-15
Autophosphorylation-dependent protein kinase (auto-kinase) was identified from pig brain and liver on the basis of its unique autophosphorylation/activation property [Yang, Fong, Yu and Liu (1987) J. Biol. Chem. 262, 7034-7040; Yang, Chang and Soderling (1987) J. Biol. Chem. 262, 9421-9427]. Its substrate consensus sequence motif was determined as being -R-X-(X)-S*/T*-X3-S/T-. To characterize auto-kinase further, we partly sequenced the kinase purified from pig liver. The N-terminal sequence (VDGGAKTSDKQKKKAXMTDE) and two internal peptide sequences (EKLRTIV and LQNPEK/ILTP/FI) of auto-kinase were obtained. These sequences identify auto-kinase as a C-terminal catalytic fragment of p21-activated protein kinase 2 (PAK2 or gamma-PAK) lacking its N-terminal regulatory region. Auto-kinase can be recognized by an antibody raised against the C-terminal peptide of human PAK2 by immunoblotting. Furthermore the autophosphorylation site sequence of auto-kinase was successfully predicted on the basis of its substrate consensus sequence motif and the known PAK2 sequence, and was further demonstrated to be RST(P)MVGTPYWMAPEVVTR by phosphoamino acid analysis, manual Edman degradation and phosphopeptide mapping via the help of phosphorylation site analysis of a synthetic peptide corresponding to the sequence of PAK2 from residues 396 to 418. During the activation process, auto-kinase autophosphorylates mainly on a single threonine residue Thr402 (according to the sequence numbering of human PAK2). In addition, a phospho-specific antibody against a synthetic phosphopeptide containing this identified sequence was generated and shown to be able to differentially recognize the activated auto-kinase autophosphorylated at Thr402 but not the non-phosphorylated/inactive auto-kinase. Immunoblot analysis with this phospho-specific antibody further revealed that the change in phosphorylation level of Thr402 of auto-kinase was well correlated with the activity change of the kinase during both autophosphorylation/activation and protein phosphatase-mediated dephosphorylation/inactivation processes. Taken together, our results identify Thr402 as the regulatory autophosphorylation site of auto-kinase, which is a C-terminal catalytic fragment of PAK2.
Yu, J S; Chen, W J; Ni, M H; Chan, W H; Yang, S D
1998-01-01
Autophosphorylation-dependent protein kinase (auto-kinase) was identified from pig brain and liver on the basis of its unique autophosphorylation/activation property [Yang, Fong, Yu and Liu (1987) J. Biol. Chem. 262, 7034-7040; Yang, Chang and Soderling (1987) J. Biol. Chem. 262, 9421-9427]. Its substrate consensus sequence motif was determined as being -R-X-(X)-S*/T*-X3-S/T-. To characterize auto-kinase further, we partly sequenced the kinase purified from pig liver. The N-terminal sequence (VDGGAKTSDKQKKKAXMTDE) and two internal peptide sequences (EKLRTIV and LQNPEK/ILTP/FI) of auto-kinase were obtained. These sequences identify auto-kinase as a C-terminal catalytic fragment of p21-activated protein kinase 2 (PAK2 or gamma-PAK) lacking its N-terminal regulatory region. Auto-kinase can be recognized by an antibody raised against the C-terminal peptide of human PAK2 by immunoblotting. Furthermore the autophosphorylation site sequence of auto-kinase was successfully predicted on the basis of its substrate consensus sequence motif and the known PAK2 sequence, and was further demonstrated to be RST(P)MVGTPYWMAPEVVTR by phosphoamino acid analysis, manual Edman degradation and phosphopeptide mapping via the help of phosphorylation site analysis of a synthetic peptide corresponding to the sequence of PAK2 from residues 396 to 418. During the activation process, auto-kinase autophosphorylates mainly on a single threonine residue Thr402 (according to the sequence numbering of human PAK2). In addition, a phospho-specific antibody against a synthetic phosphopeptide containing this identified sequence was generated and shown to be able to differentially recognize the activated auto-kinase autophosphorylated at Thr402 but not the non-phosphorylated/inactive auto-kinase. Immunoblot analysis with this phospho-specific antibody further revealed that the change in phosphorylation level of Thr402 of auto-kinase was well correlated with the activity change of the kinase during both autophosphorylation/activation and protein phosphatase-mediated dephosphorylation/inactivation processes. Taken together, our results identify Thr402 as the regulatory autophosphorylation site of auto-kinase, which is a C-terminal catalytic fragment of PAK2. PMID:9693111
Gumucio, D L; Rood, K L; Gray, T A; Riordan, M F; Sartor, C I; Collins, F S
1988-01-01
The molecular mechanisms responsible for the human fetal-to-adult hemoglobin switch have not yet been elucidated. Point mutations identified in the promoter regions of gamma-globin genes from individuals with nondeletion hereditary persistence of fetal hemoglobin (HPFH) may mark cis-acting sequences important for this switch, and the trans-acting factors which interact with these sequences may be integral parts in the puzzle of gamma-globin gene regulation. We have used gel retardation and footprinting strategies to define nuclear proteins which bind to the normal gamma-globin promoter and to determine the effect of HPFH mutations on the binding of a subset of these proteins. We have identified five proteins in human erythroleukemia cells (K562 and HEL) which bind to the proximal promoter region of the normal gamma-globin gene. One factor, gamma CAAT, binds the duplicated CCAAT box sequences; the -117 HPFH mutation increases the affinity of interaction between gamma CAAT and its cognate site. Two proteins, gamma CAC1 and gamma CAC2, bind the CACCC sequence. These proteins require divalent cations for binding. The -175 HPFH mutation interferes with the binding of a fourth protein, gamma OBP, which binds an octamer sequence (ATGCAAAT) in the normal gamma-globin promoter. The HPFH phenotype of the -175 mutation indicates that the octamer-binding protein may play a negative regulatory role in this setting. A fifth protein, EF gamma a, binds to sequences which overlap the octamer-binding site. The erythroid-specific distribution of EF gamma a and its close approximation to an apparent repressor-binding site suggest that it may be important in gamma-globin regulation. Images PMID:2468996
Pagan, Rafael F; Massey, Steven E
2014-02-01
Proteins are regarded as being robust to the deleterious effects of mutations. Here, the neutral emergence of mutational robustness in a population of single domain proteins is explored using computer simulations. A pairwise contact model was used to calculate the ΔG of folding (ΔG folding) using the three dimensional protein structure of leech eglin C. A random amino acid sequence with low mutational robustness, defined as the average ΔΔG resulting from a point mutation (ΔΔG average), was threaded onto the structure. A population of 1,000 threaded sequences was evolved under selection for stability, using an upper and lower energy threshold. Under these conditions, mutational robustness increased over time in the most common sequence in the population. In contrast, when the wild type sequence was used it did not show an increase in robustness. This implies that the emergence of mutational robustness is sequence specific and that wild type sequences may be close to maximal robustness. In addition, an inverse relationship between ∆∆G average and protein stability is shown, resulting partly from a larger average effect of point mutations in more stable proteins. The emergence of mutational robustness was also observed in the Escherichia coli colE1 Rop and human CD59 proteins, implying that the property may be common in single domain proteins under certain simulation conditions. The results indicate that at least a portion of mutational robustness in small globular proteins might have arisen by a process of neutral emergence, and could be an example of a beneficial trait that has not been directly selected for, termed a "pseudaptation."
ERIC Educational Resources Information Center
Kugel, Jennifer F.
2008-01-01
An undergraduate biochemistry laboratory experiment that will teach the technique of fluorescence resonance energy transfer (FRET) while analyzing protein-induced DNA bending is described. The experiment uses the protein TATA binding protein (TBP), which is a general transcription factor that recognizes and binds specific DNA sequences known as…
Electrophoretic mobility shift scanning using an automated infrared DNA sequencer.
Sano, M; Ohyama, A; Takase, K; Yamamoto, M; Machida, M
2001-11-01
Electrophoretic mobility shift assay (EMSA) is widely used in the study of sequence-specific DNA-binding proteins, including transcription factors and mismatch binding proteins. We have established a non-radioisotope-based protocol for EMSA that features an automated DNA sequencer with an infrared fluorescent dye (IRDye) detection unit. Our modification of the elec- trophoresis unit, which includes cooling the gel plates with a reduced well-to-read length, has made it possible to detect shifted bands within 1 h. Further, we have developed a rapid ligation-based method for generating IRDye-labeled probes with an approximately 60% cost reduction. This method has the advantages of real-time scanning, stability of labeled probes, and better safety associated with nonradioactive methods of detection. Analysis of a promoter from an industrially important filamentous fungus, Aspergillus oryzae, in a prototype experiment revealed that the method we describe has potential for use in systematic scanning and identification of the functionally important elements to which cellular factors bind in a sequence-specific manner.
Harris, Katherine E; Aldred, Shelley Force; Davison, Laura M; Ogana, Heather Anne N; Boudreau, Andrew; Brüggemann, Marianne; Osborn, Michael; Ma, Biao; Buelow, Benjamin; Clarke, Starlynn C; Dang, Kevin H; Iyer, Suhasini; Jorgensen, Brett; Pham, Duy T; Pratap, Payal P; Rangaswamy, Udaya S; Schellenberger, Ute; van Schooten, Wim C; Ugamraj, Harshad S; Vafa, Omid; Buelow, Roland; Trinklein, Nathan D
2018-01-01
We created a novel transgenic rat that expresses human antibodies comprising a diverse repertoire of heavy chains with a single common rearranged kappa light chain (IgKV3-15-JK1). This fixed light chain animal, called OmniFlic, presents a unique system for human therapeutic antibody discovery and a model to study heavy chain repertoire diversity in the context of a constant light chain. The purpose of this study was to analyze heavy chain variable gene usage, clonotype diversity, and to describe the sequence characteristics of antigen-specific monoclonal antibodies (mAbs) isolated from immunized OmniFlic animals. Using next-generation sequencing antibody repertoire analysis, we measured heavy chain variable gene usage and the diversity of clonotypes present in the lymph node germinal centers of 75 OmniFlic rats immunized with 9 different protein antigens. Furthermore, we expressed 2,560 unique heavy chain sequences sampled from a diverse set of clonotypes as fixed light chain antibody proteins and measured their binding to antigen by ELISA. Finally, we measured patterns and overall levels of somatic hypermutation in the full B-cell repertoire and in the 2,560 mAbs tested for binding. The results demonstrate that OmniFlic animals produce an abundance of antigen-specific antibodies with heavy chain clonotype diversity that is similar to what has been described with unrestricted light chain use in mammals. In addition, we show that sequence-based discovery is a highly effective and efficient way to identify a large number of diverse monoclonal antibodies to a protein target of interest.
Automated design of degenerate codon libraries.
Mena, Marco A; Daugherty, Patrick S
2005-12-01
Degenerate codon libraries are frequently used in protein engineering and evolution studies but are often limited to targeting a small number of positions to adequately limit the search space. To mitigate this, codon degeneracy can be limited using heuristics or previous knowledge of the targeted positions. To automate design of libraries given a set of amino acid sequences, an algorithm (LibDesign) was developed that generates a set of possible degenerate codon libraries, their resulting size, and their score relative to a user-defined scoring function. A gene library of a specified size can then be constructed that is representative of the given amino acid distribution or that includes specific sequences or combinations thereof. LibDesign provides a new tool for automated design of high-quality protein libraries that more effectively harness existing sequence-structure information derived from multiple sequence alignment or computational protein design data.
de Graaf, M; Boven, E; Oosterhoff, D; van der Meulen-Muileman, I H; Huls, G A; Gerritsen, W R; Haisma, H J; Pinedo, H M
2002-03-04
Monoclonal antibodies against tumour-associated antigens could be useful to deliver enzymes selectively to the site of a tumour for activation of a non-toxic prodrug. A completely human fusion protein may be advantageous for repeated administration, as host immune responses may be avoided. We have constructed a fusion protein consisting of a human single chain Fv antibody, C28, against the epithelial cell adhesion molecule and the human enzyme beta-glucuronidase. The sequences encoding C28 and human enzyme beta-glucuronidase were joined by a sequence encoding a flexible linker, and were preceded by the IgGkappa signal sequence for secretion of the fusion protein. A CHO cell line was engineered to secrete C28-beta-glucuronidase fusion protein. Antibody specificity and enzyme activity were retained in the secreted fusion protein that had an apparent molecular mass of 100 kDa under denaturing conditions. The fusion protein was able to convert a non-toxic prodrug of doxorubicin, N-[4-doxorubicin-N-carbonyl(oxymethyl)phenyl]-O-beta-glucuronyl carbamate to doxorubicin, resulting in cytotoxicity. A bystander effect was demonstrated, as doxorubicin was detected in all cells after N-[4-doxorubicin-N-carbonyl(oxymethyl)phenyl]-O-beta-glucuronyl carbamate administration when only 10% of the cells expressed the fusion protein. This is the first fully human and functional fusion protein consisting of an scFv against epithelial cell adhesion molecule and human enzyme beta-glucuronidase for future use in tumour-specific activation of a non-toxic glucuronide prodrug. Copyright 2002 Cancer Research UK
Multiplex single-molecule interaction profiling of DNA-barcoded proteins.
Gu, Liangcai; Li, Chao; Aach, John; Hill, David E; Vidal, Marc; Church, George M
2014-11-27
In contrast with advances in massively parallel DNA sequencing, high-throughput protein analyses are often limited by ensemble measurements, individual analyte purification and hence compromised quality and cost-effectiveness. Single-molecule protein detection using optical methods is limited by the number of spectrally non-overlapping chromophores. Here we introduce a single-molecular-interaction sequencing (SMI-seq) technology for parallel protein interaction profiling leveraging single-molecule advantages. DNA barcodes are attached to proteins collectively via ribosome display or individually via enzymatic conjugation. Barcoded proteins are assayed en masse in aqueous solution and subsequently immobilized in a polyacrylamide thin film to construct a random single-molecule array, where barcoding DNAs are amplified into in situ polymerase colonies (polonies) and analysed by DNA sequencing. This method allows precise quantification of various proteins with a theoretical maximum array density of over one million polonies per square millimetre. Furthermore, protein interactions can be measured on the basis of the statistics of colocalized polonies arising from barcoding DNAs of interacting proteins. Two demanding applications, G-protein coupled receptor and antibody-binding profiling, are demonstrated. SMI-seq enables 'library versus library' screening in a one-pot assay, simultaneously interrogating molecular binding affinity and specificity.
Multiplex single-molecule interaction profiling of DNA barcoded proteins
Gu, Liangcai; Li, Chao; Aach, John; Hill, David E.; Vidal, Marc; Church, George M.
2014-01-01
In contrast with advances in massively parallel DNA sequencing1, high-throughput protein analyses2-4 are often limited by ensemble measurements, individual analyte purification and hence compromised quality and cost-effectiveness. Single-molecule (SM) protein detection achieved using optical methods5 is limited by the number of spectrally nonoverlapping chromophores. Here, we introduce a single molecular interaction-sequencing (SMI-Seq) technology for parallel protein interaction profiling leveraging SM advantages. DNA barcodes are attached to proteins collectively via ribosome display6 or individually via enzymatic conjugation. Barcoded proteins are assayed en masse in aqueous solution and subsequently immobilized in a polyacrylamide (PAA) thin film to construct a random SM array, where barcoding DNAs are amplified into in situ polymerase colonies (polonies)7 and analyzed by DNA sequencing. This method allows precise quantification of various proteins with a theoretical maximum array density of over one million polonies per square millimeter. Furthermore, protein interactions can be measured based on the statistics of colocalized polonies arising from barcoding DNAs of interacting proteins. Two demanding applications, G-protein coupled receptor (GPCR) and antibody binding profiling, were demonstrated. SMI-Seq enables “library vs. library” screening in a one-pot assay, simultaneously interrogating molecular binding affinity and specificity. PMID:25252978
Jail fever (epidemic typhus) outbreak in Burundi.
Raoult, D; Roux, V; Ndihokubwayo, J B; Bise, G; Baudon, D; Marte, G; Birtles, R
1997-01-01
We recently investigated a suspected outbreak of epidemic typhus in a jail in Burundi. We tested sera of nine patients by microimmunofluorescence for antibodies to Rickettsia prowazekii and Rickettsia typhi. We also amplified and sequenced from lice gene portions specific for two R. prowazekii proteins: the gene encoding for citrate synthase and the gene encoding for the rickettsial outer membrane protein. All patients exhibited antibodies specific for R. prowazekii. Specific gene sequences were amplified in two lice from one patient. The patients had typical clinical manifestations, and two died. Molecular techniques provided a convenient and reliable means of examining lice and confirming this outbreak. The jail-associated outbreak predates an extensive ongoing outbreak of louse-borne typhus in central eastern Africa after civil war and in refugee camps in Rwanda, Burundi (1), and Zaire.
Jail fever (epidemic typhus) outbreak in Burundi.
Raoult, D.; Roux, V.; Ndihokubwayo, J. B.; Bise, G.; Baudon, D.; Marte, G.; Birtles, R.
1997-01-01
We recently investigated a suspected outbreak of epidemic typhus in a jail in Burundi. We tested sera of nine patients by microimmunofluorescence for antibodies to Rickettsia prowazekii and Rickettsia typhi. We also amplified and sequenced from lice gene portions specific for two R. prowazekii proteins: the gene encoding for citrate synthase and the gene encoding for the rickettsial outer membrane protein. All patients exhibited antibodies specific for R. prowazekii. Specific gene sequences were amplified in two lice from one patient. The patients had typical clinical manifestations, and two died. Molecular techniques provided a convenient and reliable means of examining lice and confirming this outbreak. The jail-associated outbreak predates an extensive ongoing outbreak of louse-borne typhus in central eastern Africa after civil war and in refugee camps in Rwanda, Burundi (1), and Zaire. PMID:9284381
Molecular cloning of chitinase 33 (chit33) gene from Trichoderma atroviride
Matroudi, S.; Zamani, M.R.; Motallebi, M.
2008-01-01
In this study Trichoderma atroviride was selected as over producer of chitinase enzyme among 30 different isolates of Trichoderma sp. on the basis of chitinase specific activity. From this isolate the genomic and cDNA clones encoding chit33 have been isolated and sequenced. Comparison of genomic and cDNA sequences for defining gene structure indicates that this gene contains three short introns and also an open reading frame coding for a protein of 321 amino acids. The deduced amino acid sequence includes a 19 aa putative signal peptide. Homology between this sequence and other reported Trichoderma Chit33 proteins are discussed. The coding sequence of chit33 gene was cloned in pEt26b(+) expression vector and expressed in E. coli. PMID:24031242
Martin, Natalia; Patel, Satyakam; Segre, Julia A.
2004-01-01
Mammalian epidermis provides a permeability barrier between an organism and its environment. Under homeostatic conditions, epidermal cells produce structural proteins, which are cross-linked in an orderly fashion to form a cornified envelope (CE). However, under genetic or environmental stress, specific genes are induced to rapidly build a temporary barrier. Small proline-rich (SPRR) proteins are the primary constituents of the CE. Under stress the entire family of 14 Sprr genes is upregulated. The Sprr genes are clustered within the larger epidermal differentiation complex on mouse chromosome 3, human chromosome 1q21. The clustering of the Sprr genes and their upregulation under stress suggest that these genes may be coordinately regulated. To identify enhancer elements that regulate this stress response activation of the Sprr locus, we utilized bioinformatic tools and classical biochemical dissection. Long-range comparative sequence analysis identified conserved noncoding sequences (CNSs). Clusters of epidermal-specific DNaseI-hypersensitive sites (HSs) mapped to specific CNSs. Increased prevalence of these HSs in barrier-deficient epidermis provides in vivo evidence of the regulation of the Sprr locus by these conserved sequences. Individual components of these HSs were cloned, and one was shown to have strong enhancer activity specific to conditions when the Sprr genes are coordinately upregulated. PMID:15574822
An RRM–ZnF RNA recognition module targets RBM10 to exonic sequences to promote exon exclusion
Collins, Katherine M.; Kainov, Yaroslav A.; Christodolou, Evangelos; Ray, Debashish; Morris, Quaid; Hughes, Timothy; Taylor, Ian A.
2017-01-01
Abstract RBM10 is an RNA-binding protein that plays an essential role in development and is frequently mutated in the context of human disease. RBM10 recognizes a diverse set of RNA motifs in introns and exons and regulates alternative splicing. However, the molecular mechanisms underlying this seemingly relaxed sequence specificity are not understood and functional studies have focused on 3΄ intronic sites only. Here, we dissect the RNA code recognized by RBM10 and relate it to the splicing regulatory function of this protein. We show that a two-domain RRM1–ZnF unit recognizes a GGA-centered motif enriched in RBM10 exonic sites with high affinity and specificity and test that the interaction with these exonic sequences promotes exon skipping. Importantly, a second RRM domain (RRM2) of RBM10 recognizes a C-rich sequence, which explains its known interaction with the intronic 3΄ site of NUMB exon 9 contributing to regulation of the Notch pathway in cancer. Together, these findings explain RBM10's broad RNA specificity and suggest that RBM10 functions as a splicing regulator using two RNA-binding units with different specificities to promote exon skipping. PMID:28379442
An RRM-ZnF RNA recognition module targets RBM10 to exonic sequences to promote exon exclusion.
Collins, Katherine M; Kainov, Yaroslav A; Christodolou, Evangelos; Ray, Debashish; Morris, Quaid; Hughes, Timothy; Taylor, Ian A; Makeyev, Eugene V; Ramos, Andres
2017-06-20
RBM10 is an RNA-binding protein that plays an essential role in development and is frequently mutated in the context of human disease. RBM10 recognizes a diverse set of RNA motifs in introns and exons and regulates alternative splicing. However, the molecular mechanisms underlying this seemingly relaxed sequence specificity are not understood and functional studies have focused on 3΄ intronic sites only. Here, we dissect the RNA code recognized by RBM10 and relate it to the splicing regulatory function of this protein. We show that a two-domain RRM1-ZnF unit recognizes a GGA-centered motif enriched in RBM10 exonic sites with high affinity and specificity and test that the interaction with these exonic sequences promotes exon skipping. Importantly, a second RRM domain (RRM2) of RBM10 recognizes a C-rich sequence, which explains its known interaction with the intronic 3΄ site of NUMB exon 9 contributing to regulation of the Notch pathway in cancer. Together, these findings explain RBM10's broad RNA specificity and suggest that RBM10 functions as a splicing regulator using two RNA-binding units with different specificities to promote exon skipping. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Determining Zebrafish Epitope Reactivity to Commercially Available Antibodies.
Villarreal, Michael A; Biediger, Nicole M; Bonner, Natalie A; Miller, Jennifer N; Zepeda, Samantha K; Ricard, Benjamin J; García, Dana M; Lewis, Karen A
2017-08-01
Antibodies raised against mammalian proteins may exhibit cross-reactivity with zebrafish proteins, making these antibodies useful for fish studies. However, zebrafish may express multiple paralogues of similar sequence and size, making them difficult to distinguish by traditional Western blot analysis. To identify the zebrafish proteins that are recognized by an antimammalian antibody, we developed a system to screen putative epitopes by cloning the sequences between the yeast SUMO protein and a C-terminal 6xHis tag. The recombinant fusion protein was expressed in Escherichia coli and analyzed by Western blot to conclusively identify epitopes that exhibit cross-reactivity with the antibodies of interest. This approach can be used to determine the species cross-reactivity and epitope specificity of a wide variety of peptide antigen-derived antibodies.
A genomewide survey of basic helix–loop–helix factors in Drosophila
Moore, Adrian W.; Barbel, Sandra; Jan, Lily Yeh; Jan, Yuh Nung
2000-01-01
The basic helix–loop–helix (bHLH) transcription factors play important roles in the specification of tissue type during the development of animals. We have used the information contained in the recently published genomic sequence of Drosophila melanogaster to identify 12 additional bHLH proteins. By sequence analysis we have assigned these proteins to families defined by Atonal, Hairy-Enhancer of Split, Hand, p48, Mesp, MYC/USF, and the bHLH-Per, Arnt, Sim (PAS) domain. In addition, one single protein represents a unique family of bHLH proteins. mRNA in situ analysis demonstrates that the genes encoding these proteins are expressed in several tissue types but are particularly concentrated in the developing nervous system and mesoderm. PMID:10973473
Lu, Y; Xu, W; Kang, A; Luo, Y; Guo, F; Yang, R; Zhang, J; Huang, K
2007-09-01
The hygromycin B phosphotransferase gene (hpt) has been widely used in the process of plant genetic engineering to produce plants that can secrete the HPT protein. As part of a safety assessment, sufficient quantities of the protein were produced in Escherichia coli to conduct in vitro digestibility and animal studies. Western blotting analysis showed that the HPT protein was digested by simulated gastric fluid within 40 s. ELISA demonstrated that the protein did not induce detectable levels of specific IgE antibodies or histamine in test animals. Alignment of the amino acid sequence of HPT with those of known allergens did not produce evidence of sequence similarities between these allergens and the HPT protein. We conclude that HPT has a low probability to induce allergenicity.
Protocols for efficient simulations of long-time protein dynamics using coarse-grained CABS model.
Jamroz, Michal; Kolinski, Andrzej; Kmiecik, Sebastian
2014-01-01
Coarse-grained (CG) modeling is a well-acknowledged simulation approach for getting insight into long-time scale protein folding events at reasonable computational cost. Depending on the design of a CG model, the simulation protocols vary from highly case-specific-requiring user-defined assumptions about the folding scenario-to more sophisticated blind prediction methods for which only a protein sequence is required. Here we describe the framework protocol for the simulations of long-term dynamics of globular proteins, with the use of the CABS CG protein model and sequence data. The simulations can start from a random or a selected (e.g., native) structure. The described protocol has been validated using experimental data for protein folding model systems-the prediction results agreed well with the experimental results.
Dickman, M B; Ha, Y S; Yang, Z; Adams, B; Huang, C
2003-05-01
When certain phytopathogenic fungi contact plant surfaces, specialized infection structures (appressoria) are produced that facilitate penetration of the plant external barrier; the cuticle. Recognition of this hydrophobic host surface must be sensed by the fungus, initiating the appropriate signaling pathway or pathways for pathogenic development. Using polymerase chain reaction and primers designed from mammalian protein kinase C sequences (PKC), we have isolated, cloned, and characterized a protein kinase from Colletotrichum trifolii, causal agent of alfalfa anthracnose. Though sequence analysis indicated conserved sequences in mammalian PKC genes, we were unable to induce activity of the fungal protein using known activators of PKC. Instead, we show that the C. trifolii gene, designated LIPK (lipid-induced protein kinase) is induced specifically by purified plant cutin or long-chain fatty acids which are monomeric constituents of cutin. PKC inhibitors prevented appressorium formation and, to a lesser extent, spore germination. Overexpression of LIPK resulted in multiple, abnormally shaped appressoria. Gene replacement of lipk yielded strains which were unable to develop appressoria and were unable to infect intact host plant tissue. However, these mutants were able to colonize host tissue following artificial wounding, resulting in typical anthracnose lesions. Taken together, these data indicate a central role in triggering infection structure formation for this protein kinase, which is induced specifically by components of the plant cuticle. Thus, the fungus is able to sense and use host surface chemistry to induce a protein kinase-mediated pathway that is required for pathogenic development.
Pang, Siew Wai; Lahiri, Chandrajit; Poh, Chit Laa; Tan, Kuan Onn
2018-05-01
Paraneoplastic Ma Family (PNMA) comprises a growing number of family members which share relatively conserved protein sequences encoded by the human genome and is localized to several human chromosomes, including the X-chromosome. Based on sequence analysis, PNMA family members share sequence homology to the Gag protein of LTR retrotransposon, and several family members with aberrant protein expressions have been reported to be closely associated with the human Paraneoplastic Disorder (PND). In addition, gene mutations of specific members of PNMA family are known to be associated with human mental retardation or 3-M syndrome consisting of restrictive post-natal growth or dwarfism, and development of skeletal abnormalities. Other than sequence homology, the physiological function of many members in this family remains unclear. However, several members of this family have been characterized, including cell signalling events mediated by these proteins that are associated with apoptosis, and cancer in different cell types. Furthermore, while certain PNMA family members show restricted gene expression in the human brain and testis, other PNMA family members exhibit broader gene expression or preferential and selective protein interaction profiles, suggesting functional divergence within the family. Functional analysis of some members of this family have identified protein domains that are required for subcellular localization, protein-protein interactions, and cell signalling events which are the focus of this review paper. Copyright © 2018 Elsevier Inc. All rights reserved.
Schmidt Am Busch, Marcel; Sedano, Audrey; Simonson, Thomas
2010-05-05
Protein fold recognition usually relies on a statistical model of each fold; each model is constructed from an ensemble of natural sequences belonging to that fold. A complementary strategy may be to employ sequence ensembles produced by computational protein design. Designed sequences can be more diverse than natural sequences, possibly avoiding some limitations of experimental databases. WE EXPLORE THIS STRATEGY FOR FOUR SCOP FAMILIES: Small Kunitz-type inhibitors (SKIs), Interleukin-8 chemokines, PDZ domains, and large Caspase catalytic subunits, represented by 43 structures. An automated procedure is used to redesign the 43 proteins. We use the experimental backbones as fixed templates in the folded state and a molecular mechanics model to compute the interaction energies between sidechain and backbone groups. Calculations are done with the Proteins@Home volunteer computing platform. A heuristic algorithm is used to scan the sequence and conformational space, yielding 200,000-300,000 sequences per backbone template. The results confirm and generalize our earlier study of SH2 and SH3 domains. The designed sequences ressemble moderately-distant, natural homologues of the initial templates; e.g., the SUPERFAMILY, profile Hidden-Markov Model library recognizes 85% of the low-energy sequences as native-like. Conversely, Position Specific Scoring Matrices derived from the sequences can be used to detect natural homologues within the SwissProt database: 60% of known PDZ domains are detected and around 90% of known SKIs and chemokines. Energy components and inter-residue correlations are analyzed and ways to improve the method are discussed. For some families, designed sequences can be a useful complement to experimental ones for homologue searching. However, improved tools are needed to extract more information from the designed profiles before the method can be of general use.
2011-01-01
Background Mapping protein primary sequences to their three dimensional folds referred to as the 'second genetic code' remains an unsolved scientific problem. A crucial part of the problem concerns the geometrical specificity in side chain association leading to densely packed protein cores, a hallmark of correctly folded native structures. Thus, any model of packing within proteins should constitute an indispensable component of protein folding and design. Results In this study an attempt has been made to find, characterize and classify recurring patterns in the packing of side chain atoms within a protein which sustains its native fold. The interaction of side chain atoms within the protein core has been represented as a contact network based on the surface complementarity and overlap between associating side chain surfaces. Some network topologies definitely appear to be preferred and they have been termed 'packing motifs', analogous to super secondary structures in proteins. Study of the distribution of these motifs reveals the ubiquitous presence of typical smaller graphs, which appear to get linked or coalesce to give larger graphs, reminiscent of the nucleation-condensation model in protein folding. One such frequently occurring motif, also envisaged as the unit of clustering, the three residue clique was invariably found in regions of dense packing. Finally, topological measures based on surface contact networks appeared to be effective in discriminating sequences native to a specific fold amongst a set of decoys. Conclusions Out of innumerable topological possibilities, only a finite number of specific packing motifs are actually realized in proteins. This small number of motifs could serve as a basis set in the construction of larger networks. Of these, the triplet clique exhibits distinct preference both in terms of composition and geometry. PMID:21605466
The Enigmatic Origin of Papillomavirus Protein Domains
Kirsip, Heleri; Gaston, Kevin
2017-01-01
Almost a century has passed since the discovery of papillomaviruses. A few decades of research have given a wealth of information on the molecular biology of papillomaviruses. Several excellent studies have been performed looking at the long- and short-term evolution of these viruses. However, when and how papillomaviruses originate is still a mystery. In this study, we systematically searched the (sequenced) biosphere to find distant homologs of papillomaviral protein domains. Our data show that, even including structural information, which allows us to find deeper evolutionary relationships compared to sequence-only based methods, only half of the protein domains in papillomaviruses have relatives in the rest of the biosphere. We show that the major capsid protein L1 and the replication protein E1 have relatives in several viral families, sharing three protein domains with Polyomaviridae and Parvoviridae. However, only the E1 replication protein has connections with cellular organisms. Most likely, the papillomavirus ancestor is of marine origin, a biotope that is not very well sequenced at the present time. Nevertheless, there is no evidence as to how papillomaviruses originated and how they became vertebrate and epithelium specific. PMID:28832519
The Enigmatic Origin of Papillomavirus Protein Domains.
Puustusmaa, Mikk; Kirsip, Heleri; Gaston, Kevin; Abroi, Aare
2017-08-23
Almost a century has passed since the discovery of papillomaviruses. A few decades of research have given a wealth of information on the molecular biology of papillomaviruses. Several excellent studies have been performed looking at the long- and short-term evolution of these viruses. However, when and how papillomaviruses originate is still a mystery. In this study, we systematically searched the (sequenced) biosphere to find distant homologs of papillomaviral protein domains. Our data show that, even including structural information, which allows us to find deeper evolutionary relationships compared to sequence-only based methods, only half of the protein domains in papillomaviruses have relatives in the rest of the biosphere. We show that the major capsid protein L1 and the replication protein E1 have relatives in several viral families, sharing three protein domains with Polyomaviridae and Parvoviridae . However, only the E1 replication protein has connections with cellular organisms. Most likely, the papillomavirus ancestor is of marine origin, a biotope that is not very well sequenced at the present time. Nevertheless, there is no evidence as to how papillomaviruses originated and how they became vertebrate and epithelium specific.
Crystal structure of bacillus subtilis YdaF protein : a putative ribosomal N-acetyltransferase.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brunzelle, J. S.; Wu, R.; Korolev, S. V.
2004-12-01
Comparative sequence analysis suggests that the ydaF gene encodes a protein (YdaF) that functions as an N-acetyltransferase, more specifically, a ribosomal N-acetyltransferase. Sequence analysis using basic local alignment search tool (BLAST) suggests that YdaF belongs to a large family of proteins (199 proteins found in 88 unique species of bacteria, archaea, and eukaryotes). YdaF also belongs to the COG1670, which includes the Escherichia coli RimL protein that is known to acetylate ribosomal protein L12. N-acetylation (NAT) has been found in all kingdoms. NAT enzymes catalyze the transfer of an acetyl group from acetyl-CoA (AcCoA) to a primary amino group. Formore » example, NATs can acetylate the N-terminal {alpha}-amino group, the {epsilon}-amino group of lysine residues, aminoglycoside antibiotics, spermine/speridine, or arylalkylamines such as serotonin. The crystal structure of the alleged ribosomal NAT protein, YdaF, from Bacillus subtilis presented here was determined as a part of the Midwest Center for Structural Genomics. The structure maintains the conserved tertiary structure of other known NATs and a high sequence similarity in the presumed AcCoA binding pocket in spite of a very low overall level of sequence identity to other NATs of known structure.« less
Non-B-Form DNA Is Enriched at Centromeres
Henikoff, Steven
2018-01-01
Abstract Animal and plant centromeres are embedded in repetitive “satellite” DNA, but are thought to be epigenetically specified. To define genetic characteristics of centromeres, we surveyed satellite DNA from diverse eukaryotes and identified variation in <10-bp dyad symmetries predicted to adopt non-B-form conformations. Organisms lacking centromeric dyad symmetries had binding sites for sequence-specific DNA-binding proteins with DNA-bending activity. For example, human and mouse centromeres are depleted for dyad symmetries, but are enriched for non-B-form DNA and are associated with binding sites for the conserved DNA-binding protein CENP-B, which is required for artificial centromere function but is paradoxically nonessential. We also detected dyad symmetries and predicted non-B-form DNA structures at neocentromeres, which form at ectopic loci. We propose that centromeres form at non-B-form DNA because of dyad symmetries or are strengthened by sequence-specific DNA binding proteins. This may resolve the CENP-B paradox and provide a general basis for centromere specification. PMID:29365169
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yusim, Karina; Korber, Bette Tina; Brander, Christian
The scope and purpose of the HIV molecular immunology database: HIV Molecular Immunology is a companion volume to HIV Sequence Compendium. This publication, the 2015 edition, is the PDF version of the web-based HIV Immunology Database (http://www.hiv.lanl.gov/ content/immunology/). The web interface for this relational database has many search options, as well as interactive tools to help immunologists design reagents and interpret their results. In the HIV Immunology Database, HIV-specific B-cell and T-cell responses are summarized and annotated. Immunological responses are divided into three parts, CTL, T helper, and antibody. Within these parts, defined epitopes are organized by protein and bindingmore » sites within each protein, moving from left to right through the coding regions spanning the HIV genome. We include human responses to natural HIV infections, as well as vaccine studies in a range of animal models and human trials. Responses that are not specifically defined, such as responses to whole proteins or monoclonal antibody responses to discontinuous epitopes, are summarized at the end of each protein section. Studies describing general HIV responses to the virus, but not to any specific protein, are included at the end of each part. The annotation includes information such as cross-reactivity, escape mutations, antibody sequence, TCR usage, functional domains that overlap with an epitope, immune response associations with rates of progression and therapy, and how specific epitopes were experimentally defined. Basic information such as HLA specificities for T-cell epitopes, isotypes of monoclonal antibodies, and epitope sequences are included whenever possible. All studies that we can find that incorporate the use of a specific monoclonal antibody are included in the entry for that antibody. A single T-cell epitope can have multiple entries, generally one entry per study. Finally, maps of all defined linear epitopes relative to the HXB2 reference proteins are provided. Alignments of CTL, helper T-cell, and antibody epitopes are available through the search interface on our web site at http:// www.hiv.lanl.gov/content/immunology.« less
Roosendaal, E; Jacobs, A A; Rathman, P; Sondermeyer, C; Stegehuis, F; Oudega, B; de Graaf, F K
1987-09-01
Analysis of the nucleotide sequence of the distal part of the fan gene cluster encoding the proteins involved in the biosynthesis of the fibrillar adhesin, K99, revealed the presence of two structural genes, fanG and fanH. The amino acid sequence of the gene products (FanG and FanH) showed significant homology to the amino acid sequence of the fibrillar subunit protein (FanC). Introduction of a site-specific frameshift mutation in fanG or fanH resulted in a simultaneous decrease in fibrillae production and adhesive capacity. Analysis of subcellular fractions showed that, in contrast to the K99 fibrillar subunit (FanC), both the FanH and the FanG protein were loosely associated with the outer membrane, possibly on the periplasmic side, but were not components of the fimbriae themselves.
Hayashi, T; Makino, K; Ohnishi, M; Kurokawa, K; Ishii, K; Yokoyama, K; Han, C G; Ohtsubo, E; Nakayama, K; Murata, T; Tanaka, M; Tobe, T; Iida, T; Takami, H; Honda, T; Sasakawa, C; Ogasawara, N; Yasunaga, T; Kuhara, S; Shiba, T; Hattori, M; Shinagawa, H
2001-02-28
Escherichia coli O157:H7 is a major food-borne infectious pathogen that causes diarrhea, hemorrhagic colitis, and hemolytic uremic syndrome. Here we report the complete chromosome sequence of an O157:H7 strain isolated from the Sakai outbreak, and the results of genomic comparison with a benign laboratory strain, K-12 MG1655. The chromosome is 5.5 Mb in size, 859 Kb larger than that of K-12. We identified a 4.1-Mb sequence highly conserved between the two strains, which may represent the fundamental backbone of the E. coli chromosome. The remaining 1.4-Mb sequence comprises of O157:H7-specific sequences, most of which are horizontally transferred foreign DNAs. The predominant roles of bacteriophages in the emergence of O157:H7 is evident by the presence of 24 prophages and prophage-like elements that occupy more than half of the O157:H7-specific sequences. The O157:H7 chromosome encodes 1632 proteins and 20 tRNAs that are not present in K-12. Among these, at least 131 proteins are assumed to have virulence-related functions. Genome-wide codon usage analysis suggested that the O157:H7-specific tRNAs are involved in the efficient expression of the strain-specific genes. A complete set of the genes specific to O157:H7 presented here sheds new insight into the pathogenicity and the physiology of O157:H7, and will open a way to fully understand the molecular mechanisms underlying the O157:H7 infection.
Shelar, Ashish; Bansal, Manju
2014-12-01
α-Helices are amongst the most common secondary structural elements seen in membrane proteins and are packed in the form of helix bundles. These α-helices encounter varying external environments (hydrophobic, hydrophilic) that may influence the sequence preferences at their N and C-termini. The role of the external environment in stabilization of the helix termini in membrane proteins is still unknown. Here we analyze α-helices in a high-resolution dataset of integral α-helical membrane proteins and establish that their sequence and conformational preferences differ from those in globular proteins. We specifically examine these preferences at the N and C-termini in helices initiating/terminating inside the membrane core as well as in linkers connecting these transmembrane helices. We find that the sequence preferences and structural motifs at capping (Ncap and Ccap) and near-helical (N' and C') positions are influenced by a combination of features including the membrane environment and the innate helix initiation and termination property of residues forming structural motifs. We also find that a large number of helix termini which do not form any particular capping motif are stabilized by formation of hydrogen bonds and hydrophobic interactions contributed from the neighboring helices in the membrane protein. We further validate the sequence preferences obtained from our analysis with data from an ultradeep sequencing study that identifies evolutionarily conserved amino acids in the rat neurotensin receptor. The results from our analysis provide insights for the secondary structure prediction, modeling and design of membrane proteins. © 2014 Wiley Periodicals, Inc.
Enhancer activity of Helitron in sericin-1 gene promoter from Bombyx mori.
Huang, Ke; Li, Chun-Feng; Wu, Jie; Wei, Jun-Hong; Zou, Yong; Han, Min-Jin; Zhou, Ze-Yang
2016-06-01
Sericin is a kind of water-soluble protein expressed specifically in the middle silk gland of Bombyx mori. When the sericin-1 gene promoter was cloned and a transgenic vector was constructed to express a foreign protein, a specific Helitron, Bmhel-8, was identified in the sericin-1 gene promoter sequence in some genotypes of Bombyx mori and Bombyx mandarina. Given that the Bmhel-8 Helitron transposon was present only in some genotypes, it could be the source of allelic variation in the sericin-1 promoter. The length of the sericin-1 promoter sequence is approximately 1063 or 643 bp. The larger size of the sequence or allele is ascribed to the presence of Bmhel-8. Silkworm genotypes can be homozygous for either the shorter or larger promoter sequence or heterozygous, containing both alleles. Bmhel-8 in the sericin-1 promoter exhibits enhancer activity, as demonstrated by a dual-luciferase reporter system in BmE cell lines. Furthermore, Bmhel-8 displays enhancer activity in a sericin-1 promoter-driven gene expression system but does not regulate the tissue-specific expression of sericin-1. © 2016 Institute of Zoology, Chinese Academy of Sciences.
Taketa, Kazuhisa; Azuma, Masaki; Yamamoto, Ritsu; Fujimoto, Seiichiro; Nishi, Shinzo
1996-01-01
A high serum α‐fetoprotein (AFP) level was found in a patient with endometrial adenocarcinoma of the uterus, which appeared to be hepatoid on histological examination. The AFP of this unusual patient was purified hy immunoaffinity chromatography and characterized. The electrophoretic profiles on sodium dodecyl sulfate‐polyacrylamide gel electrophoresis both before and after glycopeptidase F treatment were indistinguishable from those of a hepatoma AFP. This indicates that the patient's AFP was also composed of a single polypeptide chain of Mr 67,000 and an N‐linked sugar chain of Mr 3,000. Amino acid sequence analyses of this AFP, and of AFP from hepatoma and umbilical cord serum indicated that the N‐terminal sequences were essentially the same. The sequence, Arg‐Thr‐Leu‐His‐Arg‐Asn‐Glu‐Tyr‐Gly‐Ile, was slightly different from previous reports, but matched that deduced from the cDNA sequence. AFP isoforms due to microheterogeneity of the sugar chain were analyzed by lectin affinity electrophoresis using a series of lectins. The AFP isoform profiles were distinct from those of proteins derived from cord serum, hepatoma, yolk sac tumor and gastric cancer. The reverse‐transcription of RNA from the tumor tissue followed by a polymerase chain reaction using primers with AFP‐specific sequences gave a product of the size and nucleotide sequence expected for AFP. mRNAs possessing the requisite sequences for albumin and transferrin syntheses were also detected in the tumor. The expression of these hepatocyte‐specific proteins supported the hepatoid nature of this tumor. PMID:8766525
Prediction of TF target sites based on atomistic models of protein-DNA complexes
Angarica, Vladimir Espinosa; Pérez, Abel González; Vasconcelos, Ana T; Collado-Vides, Julio; Contreras-Moreira, Bruno
2008-01-01
Background The specific recognition of genomic cis-regulatory elements by transcription factors (TFs) plays an essential role in the regulation of coordinated gene expression. Studying the mechanisms determining binding specificity in protein-DNA interactions is thus an important goal. Most current approaches for modeling TF specific recognition rely on the knowledge of large sets of cognate target sites and consider only the information contained in their primary sequence. Results Here we describe a structure-based methodology for predicting sequence motifs starting from the coordinates of a TF-DNA complex. Our algorithm combines information regarding the direct and indirect readout of DNA into an atomistic statistical model, which is used to estimate the interaction potential. We first measure the ability of our method to correctly estimate the binding specificities of eight prokaryotic and eukaryotic TFs that belong to different structural superfamilies. Secondly, the method is applied to two homology models, finding that sampling of interface side-chain rotamers remarkably improves the results. Thirdly, the algorithm is compared with a reference structural method based on contact counts, obtaining comparable predictions for the experimental complexes and more accurate sequence motifs for the homology models. Conclusion Our results demonstrate that atomic-detail structural information can be feasibly used to predict TF binding sites. The computational method presented here is universal and might be applied to other systems involving protein-DNA recognition. PMID:18922190
Hussey, Richard S; Huang, Guozhong; Allen, Rex
2011-01-01
Identifying parasitism genes encoding proteins secreted from a plant-parasitic nematode's esophageal gland cells and injected through its stylet into plant tissue is the key to understanding the molecular basis of nematode parasitism of plants. Parasitism genes have been cloned by directly microaspirating the cytoplasm from the esophageal gland cells of different parasitic stages of cyst or root-knot nematodes to provide mRNA to create a gland cell-specific cDNA library by long-distance reverse-transcriptase polymerase chain reaction. cDNA clones are sequenced and deduced protein sequences with a signal peptide for secretion are identified for high-throughput in situ hybridization to confirm gland-specific expression.
Neshich, Goran; Togawa, Roberto C.; Mancini, Adauto L.; Kuser, Paula R.; Yamagishi, Michel E. B.; Pappas, Georgios; Torres, Wellington V.; Campos, Tharsis Fonseca e; Ferreira, Leonardo L.; Luna, Fabio M.; Oliveira, Adilton G.; Miura, Ronald T.; Inoue, Marcus K.; Horita, Luiz G.; de Souza, Dimas F.; Dominiquini, Fabiana; Álvaro, Alexandre; Lima, Cleber S.; Ogawa, Fabio O.; Gomes, Gabriel B.; Palandrani, Juliana F.; dos Santos, Gabriela F.; de Freitas, Esther M.; Mattiuz, Amanda R.; Costa, Ivan C.; de Almeida, Celso L.; Souza, Savio; Baudet, Christian; Higa, Roberto H.
2003-01-01
STING Millennium Suite (SMS) is a new web-based suite of programs and databases providing visualization and a complex analysis of molecular sequence and structure for the data deposited at the Protein Data Bank (PDB). SMS operates with a collection of both publicly available data (PDB, HSSP, Prosite) and its own data (contacts, interface contacts, surface accessibility). Biologists find SMS useful because it provides a variety of algorithms and validated data, wrapped-up in a user friendly web interface. Using SMS it is now possible to analyze sequence to structure relationships, the quality of the structure, nature and volume of atomic contacts of intra and inter chain type, relative conservation of amino acids at the specific sequence position based on multiple sequence alignment, indications of folding essential residue (FER) based on the relationship of the residue conservation to the intra-chain contacts and Cα–Cα and Cβ–Cβ distance geometry. Specific emphasis in SMS is given to interface forming residues (IFR)—amino acids that define the interactive portion of the protein surfaces. SMS may simultaneously display and analyze previously superimposed structures. PDB updates trigger SMS updates in a synchronized fashion. SMS is freely accessible for public data at http://www.cbi.cnptia.embrapa.br, http://mirrors.rcsb.org/SMS and http://trantor.bioc.columbia.edu/SMS. PMID:12824333
Miotto, Olivo; Heiny, AT; Tan, Tin Wee; August, J Thomas; Brusic, Vladimir
2008-01-01
Background The identification of mutations that confer unique properties to a pathogen, such as host range, is of fundamental importance in the fight against disease. This paper describes a novel method for identifying amino acid sites that distinguish specific sets of protein sequences, by comparative analysis of matched alignments. The use of mutual information to identify distinctive residues responsible for functional variants makes this approach highly suitable for analyzing large sets of sequences. To support mutual information analysis, we developed the AVANA software, which utilizes sequence annotations to select sets for comparison, according to user-specified criteria. The method presented was applied to an analysis of influenza A PB2 protein sequences, with the objective of identifying the components of adaptation to human-to-human transmission, and reconstructing the mutation history of these components. Results We compared over 3,000 PB2 protein sequences of human-transmissible and avian isolates, to produce a catalogue of sites involved in adaptation to human-to-human transmission. This analysis identified 17 characteristic sites, five of which have been present in human-transmissible strains since the 1918 Spanish flu pandemic. Sixteen of these sites are located in functional domains, suggesting they may play functional roles in host-range specificity. The catalogue of characteristic sites was used to derive sequence signatures from historical isolates. These signatures, arranged in chronological order, reveal an evolutionary timeline for the adaptation of the PB2 protein to human hosts. Conclusion By providing the most complete elucidation to date of the functional components participating in PB2 protein adaptation to humans, this study demonstrates that mutual information is a powerful tool for comparative characterization of sequence sets. In addition to confirming previously reported findings, several novel characteristic sites within PB2 are reported. Sequence signatures generated using the characteristic sites catalogue characterize concisely the adaptation characteristics of individual isolates. Evolutionary timelines derived from signatures of early human influenza isolates suggest that characteristic variants emerged rapidly, and remained remarkably stable through subsequent pandemics. In addition, the signatures of human-infecting H5N1 isolates suggest that this avian subtype has low pandemic potential at present, although it presents more human adaptation components than most avian subtypes. PMID:18315849
Kanai, Akio; Oida, Hanako; Matsuura, Nana; Doi, Hirofumi
2003-01-01
We systematically screened a genomic DNA library to identify proteins of the hyperthermophilic archaeon Pyrococcus furiosus using an expression cloning method. One gene product, which we named FAU-1 (P. furiosus AU-binding), demonstrated the strongest binding activity of all the genomic library-derived proteins tested against an AU-rich RNA sequence. The protein was purified to near homogeneity as a 54 kDa single polypeptide, and the gene locus corresponding to this FAU-1 activity was also sequenced. The FAU-1 gene encoded a 472-amino-acid protein that was characterized by highly charged domains consisting of both acidic and basic amino acids. The N-terminal half of the gene had a degree of similarity (25%) with RNase E from Escherichia coli. Five rounds of RNA-binding-site selection and footprinting analysis showed that the FAU-1 protein binds specifically to the AU-rich sequence in a loop region of a possible RNA ligand. Moreover, we demonstrated that the FAU-1 protein acts as an oligomer, and mainly as a trimer. These results showed that the FAU-1 protein is a novel heat-stable protein with an RNA loop-binding characteristic. PMID:12614195
A coevolution analysis for identifying protein-protein interactions by Fourier transform.
Yin, Changchuan; Yau, Stephen S-T
2017-01-01
Protein-protein interactions (PPIs) play key roles in life processes, such as signal transduction, transcription regulations, and immune response, etc. Identification of PPIs enables better understanding of the functional networks within a cell. Common experimental methods for identifying PPIs are time consuming and expensive. However, recent developments in computational approaches for inferring PPIs from protein sequences based on coevolution theory avoid these problems. In the coevolution theory model, interacted proteins may show coevolutionary mutations and have similar phylogenetic trees. The existing coevolution methods depend on multiple sequence alignments (MSA); however, the MSA-based coevolution methods often produce high false positive interactions. In this paper, we present a computational method using an alignment-free approach to accurately detect PPIs and reduce false positives. In the method, protein sequences are numerically represented by biochemical properties of amino acids, which reflect the structural and functional differences of proteins. Fourier transform is applied to the numerical representation of protein sequences to capture the dissimilarities of protein sequences in biophysical context. The method is assessed for predicting PPIs in Ebola virus. The results indicate strong coevolution between the protein pairs (NP-VP24, NP-VP30, NP-VP40, VP24-VP30, VP24-VP40, and VP30-VP40). The method is also validated for PPIs in influenza and E.coli genomes. Since our method can reduce false positive and increase the specificity of PPI prediction, it offers an effective tool to understand mechanisms of disease pathogens and find potential targets for drug design. The Python programs in this study are available to public at URL (https://github.com/cyinbox/PPI).
A coevolution analysis for identifying protein-protein interactions by Fourier transform
Yin, Changchuan; Yau, Stephen S. -T.
2017-01-01
Protein-protein interactions (PPIs) play key roles in life processes, such as signal transduction, transcription regulations, and immune response, etc. Identification of PPIs enables better understanding of the functional networks within a cell. Common experimental methods for identifying PPIs are time consuming and expensive. However, recent developments in computational approaches for inferring PPIs from protein sequences based on coevolution theory avoid these problems. In the coevolution theory model, interacted proteins may show coevolutionary mutations and have similar phylogenetic trees. The existing coevolution methods depend on multiple sequence alignments (MSA); however, the MSA-based coevolution methods often produce high false positive interactions. In this paper, we present a computational method using an alignment-free approach to accurately detect PPIs and reduce false positives. In the method, protein sequences are numerically represented by biochemical properties of amino acids, which reflect the structural and functional differences of proteins. Fourier transform is applied to the numerical representation of protein sequences to capture the dissimilarities of protein sequences in biophysical context. The method is assessed for predicting PPIs in Ebola virus. The results indicate strong coevolution between the protein pairs (NP-VP24, NP-VP30, NP-VP40, VP24-VP30, VP24-VP40, and VP30-VP40). The method is also validated for PPIs in influenza and E.coli genomes. Since our method can reduce false positive and increase the specificity of PPI prediction, it offers an effective tool to understand mechanisms of disease pathogens and find potential targets for drug design. The Python programs in this study are available to public at URL (https://github.com/cyinbox/PPI). PMID:28430779
Biosynthesis of riboflavin: an unusual riboflavin synthase of Methanobacterium thermoautotrophicum.
Eberhardt, S; Korn, S; Lottspeich, F; Bacher, A
1997-01-01
Riboflavin synthase was purified by a factor of about 1,500 from cell extract of Methanobacterium thermoautotrophicum. The enzyme had a specific activity of about 2,700 nmol mg(-1) h(-1) at 65 degrees C, which is relatively low compared to those of riboflavin synthases of eubacteria and yeast. Amino acid sequences obtained after proteolytic cleavage had no similarity with known riboflavin synthases. The gene coding for riboflavin synthase (designated ribC) was subsequently cloned by marker rescue with a ribC mutant of Escherichia coli. The ribC gene of M. thermoautotrophicum specifies a protein of 153 amino acid residues. The predicted amino acid sequence agrees with the information gleaned from Edman degradation of the isolated protein and shows 67% identity with the sequence predicted for the unannotated reading frame MJ1184 of Methanococcus jannaschii. The ribC gene is adjacent to a cluster of four genes with similarity to the genes cbiMNQO of Salmonella typhimurium, which form part of the cob operon (this operon contains most of the genes involved in the biosynthesis of vitamin B12). The amino acid sequence predicted by the ribC gene of M. thermoautotrophicum shows no similarity whatsoever to the sequences of riboflavin synthases of eubacteria and yeast. Most notably, the M. thermoautotrophicum protein does not show the internal sequence homology characteristic of eubacterial and yeast riboflavin synthases. The protein of M. thermoautotrophicum can be expressed efficiently in a recombinant E. coli strain. The specific activity of the purified, recombinant protein is 1,900 nmol mg(-1) h(-1) at 65 degrees C. In contrast to riboflavin synthases from eubacteria and fungi, the methanobacterial enzyme has an absolute requirement for magnesium ions. The 5' phosphate of 6,7-dimethyl-8-ribityllumazine does not act as a substrate. The findings suggest that riboflavin synthase has evolved independently in eubacteria and methanobacteria. PMID:9139911
Wytynck, Pieter; Rougé, Pierre; Van Damme, Els J M
2017-11-01
Ribosome-inactivating proteins (RIPs) are cytotoxic enzymes capable of halting protein synthesis by irreversible modification of ribosomes. Although RIPs are widespread they are not ubiquitous in the plant kingdom. The physiological importance of RIPs is not fully elucidated, but evidence suggests a role in the protection of the plant against biotic and abiotic stresses. Searches in the rice genome revealed a large and highly complex family of proteins with a RIP domain. A comparative analysis retrieved 38 RIP sequences from the genome sequence of Oryza sativa subspecies japonica and 34 sequences from the subspecies indica. The RIP sequences are scattered over different chromosomes but are mostly found on the third chromosome. The phylogenetic tree revealed the pairwise clustering of RIPs from japonica and indica. Molecular modeling and sequence analysis yielded information on the catalytic site of the enzyme, and suggested that a large part of RIP domains probably possess N-glycosidase activity. Several RIPs are differentially expressed in plant tissues and in response to specific abiotic stresses. This study provides an overview of RIP motifs in rice and will help to understand their biological role(s) and evolutionary relationships. Copyright © 2017 Elsevier Ltd. All rights reserved.
Molecular Characterization of Bombyx mori Cytoplasmic Polyhedrosis Virus Genome Segment 4
Ikeda, Keiko; Nagaoka, Sumiharu; Winkler, Stefan; Kotani, Kumiko; Yagi, Hiroaki; Nakanishi, Kae; Miyajima, Shigetoshi; Kobayashi, Jun; Mori, Hajime
2001-01-01
The complete nucleotide sequence of the genome segment 4 (S4) of Bombyx mori cytoplasmic polyhedrosis virus (BmCPV) was determined. The 3,259-nucleotide sequence contains a single long open reading frame which spans nucleotides 14 to 3187 and which is predicted to encode a protein with a molecular mass of about 130 kDa. Western blot analysis showed that S4 encodes BmCPV protein VP3, which is one of the outer components of the BmCPV virion. Sequence analysis of the deduced amino acid sequence of BmCPV VP3 revealed possible sequence homology with proteins from rice ragged stunt virus (RRSV) S2, Nilaparvata lugens reovirus S4, and Fiji disease fijivirus S4. This may suggest that plant reoviruses originated from insect viruses and that RRSV emerged more recently than other plant reoviruses. A chimeric protein consisting of BmCPV VP3 and green fluorescent protein (GFP) was constructed and expressed with BmCPV polyhedrin using a baculovirus expression vector. The VP3-GFP chimera was incorporated into BmCPV polyhedra and released under alkaline conditions. The results indicate that specific interactions occur between BmCPV polyhedrin and VP3 which might facilitate BmCPV virion occlusion into the polyhedra. PMID:11134312
Characterization of tannase protein sequences of bacteria and fungi: an in silico study.
Banerjee, Amrita; Jana, Arijit; Pati, Bikash R; Mondal, Keshab C; Das Mohapatra, Pradeep K
2012-04-01
The tannase protein sequences of 149 bacteria and 36 fungi were retrieved from NCBI database. Among them only 77 bacterial and 31 fungal tannase sequences were taken which have different amino acid compositions. These sequences were analysed for different physical and chemical properties, superfamily search, multiple sequence alignment, phylogenetic tree construction and motif finding to find out the functional motif and the evolutionary relationship among them. The superfamily search for these tannase exposed the occurrence of proline iminopeptidase-like, biotin biosynthesis protein BioH, O-acetyltransferase, carboxylesterase/thioesterase 1, carbon-carbon bond hydrolase, haloperoxidase, prolyl oligopeptidase, C-terminal domain and mycobacterial antigens families and alpha/beta hydrolase superfamily. Some bacterial and fungal sequence showed similarity with different families individually. The multiple sequence alignment of these tannase protein sequences showed conserved regions at different stretches with maximum homology from amino acid residues 389-469 and 482-523 which could be used for designing degenerate primers or probes specific for tannase producing bacterial and fungal species. Phylogenetic tree showed two different clusters; one has only bacteria and another have both fungi and bacteria showing some relationship between these different genera. Although in second cluster near about all fungal species were found together in a corner which indicates the sequence level similarity among fungal genera. The distributions of fourteen motifs analysis revealed Motif 1 with a signature amino acid sequence of 29 amino acids, i.e. GCSTGGREALKQAQRWPHDYDGIIANNPA, was uniformly observed in 83.3 % of studied tannase sequences representing its participation with the structure and enzymatic function.
Dykeman, Eric C; Stockley, Peter G; Twarock, Reidun
2013-09-09
The current paradigm for assembly of single-stranded RNA viruses is based on a mechanism involving non-sequence-specific packaging of genomic RNA driven by electrostatic interactions. Recent experiments, however, provide compelling evidence for sequence specificity in this process both in vitro and in vivo. The existence of multiple RNA packaging signals (PSs) within viral genomes has been proposed, which facilitates assembly by binding coat proteins in such a way that they promote the protein-protein contacts needed to build the capsid. The binding energy from these interactions enables the confinement or compaction of the genomic RNAs. Identifying the nature of such PSs is crucial for a full understanding of assembly, which is an as yet untapped potential drug target for this important class of pathogens. Here, for two related bacterial viruses, we determine the sequences and locations of their PSs using Hamiltonian paths, a concept from graph theory, in combination with bioinformatics and structural studies. Their PSs have a common secondary structure motif but distinct consensus sequences and positions within the respective genomes. Despite these differences, the distributions of PSs in both viruses imply defined conformations for the packaged RNA genomes in contact with the protein shell in the capsid, consistent with a recent asymmetric structure determination of the MS2 virion. The PS distributions identified moreover imply a preferred, evolutionarily conserved assembly pathway with respect to the RNA sequence with potentially profound implications for other single-stranded RNA viruses known to have RNA PSs, including many animal and human pathogens. Copyright © 2013 Elsevier Ltd. All rights reserved.
A sequence-based hybrid predictor for identifying conformationally ambivalent regions in proteins.
Liu, Yu-Cheng; Yang, Meng-Han; Lin, Win-Li; Huang, Chien-Kang; Oyang, Yen-Jen
2009-12-03
Proteins are dynamic macromolecules which may undergo conformational transitions upon changes in environment. As it has been observed in laboratories that protein flexibility is correlated to essential biological functions, scientists have been designing various types of predictors for identifying structurally flexible regions in proteins. In this respect, there are two major categories of predictors. One category of predictors attempts to identify conformationally flexible regions through analysis of protein tertiary structures. Another category of predictors works completely based on analysis of the polypeptide sequences. As the availability of protein tertiary structures is generally limited, the design of predictors that work completely based on sequence information is crucial for advances of molecular biology research. In this article, we propose a novel approach to design a sequence-based predictor for identifying conformationally ambivalent regions in proteins. The novelty in the design stems from incorporating two classifiers based on two distinctive supervised learning algorithms that provide complementary prediction powers. Experimental results show that the overall performance delivered by the hybrid predictor proposed in this article is superior to the performance delivered by the existing predictors. Furthermore, the case study presented in this article demonstrates that the proposed hybrid predictor is capable of providing the biologists with valuable clues about the functional sites in a protein chain. The proposed hybrid predictor provides the users with two optional modes, namely, the high-sensitivity mode and the high-specificity mode. The experimental results with an independent testing data set show that the proposed hybrid predictor is capable of delivering sensitivity of 0.710 and specificity of 0.608 under the high-sensitivity mode, while delivering sensitivity of 0.451 and specificity of 0.787 under the high-specificity mode. Though experimental results show that the hybrid approach designed to exploit the complementary prediction powers of distinctive supervised learning algorithms works more effectively than conventional approaches, there exists a large room for further improvement with respect to the achieved performance. In this respect, it is of interest to investigate the effects of exploiting additional physiochemical properties that are related to conformational ambivalence. Furthermore, it is of interest to investigate the effects of incorporating lately-developed machine learning approaches, e.g. the random forest design and the multi-stage design. As conformational transition plays a key role in carrying out several essential types of biological functions, the design of more advanced predictors for identifying conformationally ambivalent regions in proteins deserves our continuous attention.
Regulatory elements in vivo in the promoter of the abscisic acid responsive gene rab17 from maize.
Busk, P K; Jensen, A B; Pagès, M
1997-06-01
The rab17 gene from maize is transcribed in late embryonic development and is responsive to abscisic acid and water stress in embryo and vegetative tissues. In vivo footprinting and transient transformation of rab17 were performed in embryos and vegetative tissues to characterize the cis-elements involved in regulation of the gene. By in vivo footprinting, protein binding was observed to nine elements in the promoter, which correspond to five putative ABREs (abscisic acid responsive elements) and four other sequences. The footprints indicated that distinct proteins interact with these elements in the two developmental stages. In transient transformation, six of the elements were important for high level expression of the rab17 promoter in embryos, whereas only three elements were important in leaves. The cis-acting sequences can be divided in embryo-specific, ABA-specific and leaf-specific elements on the basis of protein binding and the ability to confer expression of rab17. We found one positive, new element, called GRA, with the sequence CACTGGCCGCCC. This element was important for transcription in leaves but not in embryos. Two other non-ABRE elements that stimulated transcription from the rab17 promoter resemble previously described abscisic acid and drought-inducible elements. There were differences in protein binding and function of the five ABREs in the rab17 promoter. The possible reasons for these differences are discussed. The in vivo data obtained suggest that an embryo-specific pathway regulates transcription of the rab genes during development, whereas another pathway is responsible for induction in response to ABA and drought in vegetative tissues.
Kristie, T M; LeBowitz, J H; Sharp, P A
1989-01-01
The herpes simplex virus transactivator, alpha TIF, stimulates transcription of the alpha/immediate early genes via a cis-acting site containing an octamer element and a conserved flanking sequence. The alpha TIF protein, produced in a baculovirus expression system, nucleates the formation of at least two DNA--protein complexes on this regulatory element. Both of these complexes contain the ubiquitous Oct-1 protein, whose POU domain alone is sufficient to allow assembly of the alpha TIF-dependent complexes. A second member of the POU domain family, the lymphoid specific Oct-2 protein, can also be assembled into similar complexes at high concentrations of alpha TIF protein. These complexes contain at least two cellular proteins in addition to Oct-1. One of these proteins is present in both insect and HeLa cells and probably recognizes sequences in the cis element. The second cellular protein, only present in HeLa cells, probably binds by protein-protein interactions. Images PMID:2556266
Kristie, T M; LeBowitz, J H; Sharp, P A
1989-12-20
The herpes simplex virus transactivator, alpha TIF, stimulates transcription of the alpha/immediate early genes via a cis-acting site containing an octamer element and a conserved flanking sequence. The alpha TIF protein, produced in a baculovirus expression system, nucleates the formation of at least two DNA--protein complexes on this regulatory element. Both of these complexes contain the ubiquitous Oct-1 protein, whose POU domain alone is sufficient to allow assembly of the alpha TIF-dependent complexes. A second member of the POU domain family, the lymphoid specific Oct-2 protein, can also be assembled into similar complexes at high concentrations of alpha TIF protein. These complexes contain at least two cellular proteins in addition to Oct-1. One of these proteins is present in both insect and HeLa cells and probably recognizes sequences in the cis element. The second cellular protein, only present in HeLa cells, probably binds by protein-protein interactions.
Maurer-Stroh, Sebastian; Gao, He; Han, Hao; Baeten, Lies; Schymkowitz, Joost; Rousseau, Frederic; Zhang, Louxin; Eisenhaber, Frank
2013-02-01
Data mining in protein databases, derivatives from more fundamental protein 3D structure and sequence databases, has considerable unearthed potential for the discovery of sequence motif--structural motif--function relationships as the finding of the U-shape (Huf-Zinc) motif, originally a small student's project, exemplifies. The metal ion zinc is critically involved in universal biological processes, ranging from protein-DNA complexes and transcription regulation to enzymatic catalysis and metabolic pathways. Proteins have evolved a series of motifs to specifically recognize and bind zinc ions. Many of these, so called zinc fingers, are structurally independent globular domains with discontinuous binding motifs made up of residues mostly far apart in sequence. Through a systematic approach starting from the BRIX structure fragment database, we discovered that there exists another predictable subset of zinc-binding motifs that not only have a conserved continuous sequence pattern but also share a characteristic local conformation, despite being included in totally different overall folds. While this does not allow general prediction of all Zn binding motifs, a HMM-based web server, Huf-Zinc, is available for prediction of these novel, as well as conventional, zinc finger motifs in protein sequences. The Huf-Zinc webserver can be freely accessed through this URL (http://mendel.bii.a-star.edu.sg/METHODS/hufzinc/).
Understanding the mechanisms of protein-DNA interactions
NASA Astrophysics Data System (ADS)
Lavery, Richard
2004-03-01
Structural, biochemical and thermodynamic data on protein-DNA interactions show that specific recognition cannot be reduced to a simple set of binary interactions between the partners (such as hydrogen bonds, ion pairs or steric contacts). The mechanical properties of the partners also play a role and, in the case of DNA, variations in both conformation and flexibility as a function of base sequence can be a significant factor in guiding a protein to the correct binding site. All-atom molecular modeling offers a means of analyzing the role of different binding mechanisms within protein-DNA complexes of known structure. This however requires estimating the binding strengths for the full range of sequences with which a given protein can interact. Since this number grows exponentially with the length of the binding site it is necessary to find a method to accelerate the calculations. We have achieved this by using a multi-copy approach (ADAPT) which allows us to build a DNA fragment with a variable base sequence. The results obtained with this method correlate well with experimental consensus binding sequences. They enable us to show that indirect recognition mechanisms involving the sequence dependent properties of DNA play a significant role in many complexes. This approach also offers a means of predicting protein binding sites on the basis of binding energies, which is complementary to conventional lexical techniques.
Rampello, Anthony J; Glynn, Steven E
2017-03-24
The i-AAA protease is a component of the mitochondrial quality control machinery that regulates respiration, mitochondrial dynamics, and protein import. The protease is required to select specific substrates for degradation from among the diverse complement of proteins present in mitochondria, yet the rules that govern this selection are unclear. Here, we reconstruct the yeast i-AAA protease, Yme1p, to examine the in vitro degradation of two intermembrane space chaperone subunits, Tim9 and Tim10. Yme1p degrades Tim10 more rapidly than Tim9 despite high sequence and structural similarity, and loss of Tim10 is accelerated by the disruption of conserved disulfide bonds within the substrate. An unstructured N-terminal region of Tim10 is necessary and sufficient to target the substrate to the protease through recognition of a short phenylalanine-rich motif, and the presence of similar motifs in other small Tim proteins predicts robust degradation by the protease. Together, these results identify the first specific degron sequence within a native i-AAA protease substrate. Copyright © 2017 Elsevier Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Reiche, B.; Frank, R.; Deutscher, J.
1988-08-23
Enzyme III/sup mtl/ is part of the mannitol phosphotransferase system of Staphylococcus aureus and Staphylococcus carnosus and is phosphorylated by phosphoenolpyruvate in a reaction sequence requiring enzyme I (phosphoenolpyruvate-protein phosphotransferase) and the histidine-containing protein HPr. In this paper, the authors report the isolation of III/sup mtl/ from both S. aureus and S. carnosus and the characterization of the active center. After phosphorylation of III/sup mtl/ with (/sup 32/P)PEP, enzyme I, and HPr, the phosphorylated protein was cleaved with endoproteinase GLu(C). The amino acid sequence of the S. aureus peptide carrying the phosphoryl group was found to be Gln-Val-Val-Ser-Thr-Phe-Met-Gly-Asn-Gly-Leu-Ala-Ile-Pro-His-Gly-Thr-Asp-Asp. The correspondingmore » peptide from S. carnosus shows an equal sequence except that the first residue is Ala instead of Gln. These peptides both contain a single histidyl residue which they assume to carry the phosphoryl group. All proteins of the PTS so far investigated indeed carry the phosphoryl group attached to a histidyl residue. According to sodium dodecyl sulfate gels, the molecular weight of the III/sup mtl/ proteins was found to be 15,000. They have also determined the N-terminal sequence of both proteins. Comparison of the III/sup mtl/ peptide sequences and the C-terminal part of the enzyme II/sup mtl/ of Escherichia coli reveals considerable sequence homology, which supports the suggestion that II/sup mtl/ of E. coli is a fusion protein of a soluble III protein with a membrane-bound enzyme II.« less
Learning Cellular Sorting Pathways Using Protein Interactions and Sequence Motifs
Lin, Tien-Ho; Bar-Joseph, Ziv
2011-01-01
Abstract Proper subcellular localization is critical for proteins to perform their roles in cellular functions. Proteins are transported by different cellular sorting pathways, some of which take a protein through several intermediate locations until reaching its final destination. The pathway a protein is transported through is determined by carrier proteins that bind to specific sequence motifs. In this article, we present a new method that integrates protein interaction and sequence motif data to model how proteins are sorted through these sorting pathways. We use a hidden Markov model (HMM) to represent protein sorting pathways. The model is able to determine intermediate sorting states and to assign carrier proteins and motifs to the sorting pathways. In simulation studies, we show that the method can accurately recover an underlying sorting model. Using data for yeast, we show that our model leads to accurate prediction of subcellular localization. We also show that the pathways learned by our model recover many known sorting pathways and correctly assign proteins to the path they utilize. The learned model identified new pathways and their putative carriers and motifs and these may represent novel protein sorting mechanisms. Supplementary results and software implementation are available from http://murphylab.web.cmu.edu/software/2010_RECOMB_pathways/. PMID:21999284
Nirasawa, Satoru; Nakahara, Kazuhiko; Takahashi, Saori
2018-02-27
Paenidase is the first microorganism-derived D-aspartyl endopeptidase that specifically recognizes an internal D-Asp residue to cleave [D-Asp]-X peptide bonds. Using peptide sequences obtained from the protein, we performed PCR with degenerate primers to amplify the paenidase I-encoding gene. Nucleotide sequencing revealed that mature paenidase I consists of 322 amino acid residues and that the protein is encoded as a pro-protein with a 197-amino-acid N-terminal extension compared to the mature protein. Paenidase I exhibits amino acid sequence similarity to several penicillin-binding proteins. In addition, paenidase I was classified into peptidase family S12 based on a MEROPS database search. Family S12 contains serine-type D-Ala-D-Ala carboxypeptidases that have three active site residues (Ser, Lys, and Tyr) in the conserved motifs Ser-Xaa-Thr-Lys and Tyr-Xaa-Asn. These motifs were conserved in the primary structure of paenidase I, and the role of these residues was confirmed by site-directed mutagenesis.
An Autonomous BMP2 Regulatory Element in Mesenchymal Cells
Kruithof, Boudewijn P.T.; Fritz, David T.; Liu, Yijun; Garsetti, Diane E.; Frank, David B.; Pregizer, Steven K.; Gaussin, Vinciane; Mortlock, Douglas P.; Rogers, Melissa B.
2014-01-01
BMP2 is a morphogen that controls mesenchymal cell differentiation and behavior. For example, BMP2 concentration controls the differentiation of mesenchymal precursors into myocytes, adipocytes, chondrocytes, and osteoblasts. Sequences within the 3′untranslated region (UTR) of the Bmp2 mRNA mediate a post-transcriptional block of protein synthesis. Interaction of cell and developmental stage-specific trans-regulatory factors with the 3′UTR is a nimble and versatile mechanism for modulating this potent morphogen in different cell types. We show here, that an ultra-conserved sequence in the 3′UTR functions independently of promoter, coding region, and 3′UTR context in primary and immortalized tissue culture cells and in transgenic mice. Our findings indicate that the ultra-conserved sequence is an autonomously functioning post-transcriptional element that may be used to modulate the level of BMP2 and other proteins while retaining tissue specific regulatory elements. PMID:21268088
Prediction of molecular mimicry candidates in human pathogenic bacteria.
Doxey, Andrew C; McConkey, Brendan J
2013-08-15
Molecular mimicry of host proteins is a common strategy adopted by bacterial pathogens to interfere with and exploit host processes. Despite the availability of pathogen genomes, few studies have attempted to predict virulence-associated mimicry relationships directly from genomic sequences. Here, we analyzed the proteomes of 62 pathogenic and 66 non-pathogenic bacterial species, and screened for the top pathogen-specific or pathogen-enriched sequence similarities to human proteins. The screen identified approximately 100 potential mimicry relationships including well-characterized examples among the top-scoring hits (e.g., RalF, internalin, yopH, and others), with about 1/3 of predicted relationships supported by existing literature. Examination of homology to virulence factors, statistically enriched functions, and comparison with literature indicated that the detected mimics target key host structures (e.g., extracellular matrix, ECM) and pathways (e.g., cell adhesion, lipid metabolism, and immune signaling). The top-scoring and most widespread mimicry pattern detected among pathogens consisted of elevated sequence similarities to ECM proteins including collagens and leucine-rich repeat proteins. Unexpectedly, analysis of the pathogen counterparts of these proteins revealed that they have evolved independently in different species of bacterial pathogens from separate repeat amplifications. Thus, our analysis provides evidence for two classes of mimics: complex proteins such as enzymes that have been acquired by eukaryote-to-pathogen horizontal transfer, and simpler repeat proteins that have independently evolved to mimic the host ECM. Ultimately, computational detection of pathogen-specific and pathogen-enriched similarities to host proteins provides insights into potentially novel mimicry-mediated virulence mechanisms of pathogenic bacteria.
Prediction of molecular mimicry candidates in human pathogenic bacteria
Doxey, Andrew C; McConkey, Brendan J
2013-01-01
Molecular mimicry of host proteins is a common strategy adopted by bacterial pathogens to interfere with and exploit host processes. Despite the availability of pathogen genomes, few studies have attempted to predict virulence-associated mimicry relationships directly from genomic sequences. Here, we analyzed the proteomes of 62 pathogenic and 66 non-pathogenic bacterial species, and screened for the top pathogen-specific or pathogen-enriched sequence similarities to human proteins. The screen identified approximately 100 potential mimicry relationships including well-characterized examples among the top-scoring hits (e.g., RalF, internalin, yopH, and others), with about 1/3 of predicted relationships supported by existing literature. Examination of homology to virulence factors, statistically enriched functions, and comparison with literature indicated that the detected mimics target key host structures (e.g., extracellular matrix, ECM) and pathways (e.g., cell adhesion, lipid metabolism, and immune signaling). The top-scoring and most widespread mimicry pattern detected among pathogens consisted of elevated sequence similarities to ECM proteins including collagens and leucine-rich repeat proteins. Unexpectedly, analysis of the pathogen counterparts of these proteins revealed that they have evolved independently in different species of bacterial pathogens from separate repeat amplifications. Thus, our analysis provides evidence for two classes of mimics: complex proteins such as enzymes that have been acquired by eukaryote-to-pathogen horizontal transfer, and simpler repeat proteins that have independently evolved to mimic the host ECM. Ultimately, computational detection of pathogen-specific and pathogen-enriched similarities to host proteins provides insights into potentially novel mimicry-mediated virulence mechanisms of pathogenic bacteria. PMID:23715053
Stefanowicz, Karolina; Lannoo, Nausicaä; Proost, Paul; Van Damme, Els J M
2012-01-01
The Arabidopsis thaliana genome contains a small group of bipartite F-box proteins, consisting of an N-terminal F-box domain and a C-terminal domain sharing sequence similarity with Nictaba, the jasmonate-induced glycan-binding protein (lectin) from tobacco. Based on the high sequence similarity between the C-terminal domain of these proteins and Nictaba, the hypothesis was put forward that the so-called F-box-Nictaba proteins possess carbohydrate-binding activity and accordingly can be considered functional homologs of the mammalian sugar-binding F-box or Fbs proteins which are involved in proteasomal degradation of glycoproteins. To obtain experimental evidence for the carbohydrate-binding activity and specificity of the A. thaliana F-box-Nictaba proteins, both the complete F-box-Nictaba sequence of one selected Arabidopsis F-box protein (in casu At2g02360) as well as the Nictaba-like domain only were expressed in Pichia pastoris and analyzed by affinity chromatography, agglutination assays and glycan micro-array binding assays. These results demonstrated that the C-terminal Nictaba-like domain provides the F-box-protein with a carbohydrate-binding activity that is specifically directed against N- and O-glycans containing N-acetyllactosamine (Galβ1-3GlcNAc and Galβ1-4GlcNAc) and poly-N-acetyllactosamine ([Galβ1-4GlcNAc]n) as well as Lewis A (Galβ1-3(Fucα1-4)GlcNAc), Lewis X (Galβ1-4(Fucα1-3)GlcNAc, Lewis Y (Fucα1-2Galβ1-4(Fucα1-3)GlcNAc) and blood type B (Galα1-3(Fucα1-2)Galβ1-3GlcNAc) motifs. Based on these findings one can reasonably conclude that at least the A. thaliana F-box-Nictaba protein encoded by At2g02360 can act as a carbohydrate-binding protein. The results from the glycan array assays revealed differences in sugar-binding specificity between the F-box protein and Nictaba, indicating that the same carbohydrate-binding motif can accommodate unrelated oligosaccharides.
2012-01-01
Background The feline genome is valuable to the veterinary and model organism genomics communities because the cat is an obligate carnivore and a model for endangered felids. The initial public release of the Felis catus genome assembly provided a framework for investigating the genomic basis of feline biology. However, the entire set of protein coding genes has not been elucidated. Results We identified and characterized 1227 protein coding feline sequences, of which 913 map to public sequences and 314 are novel. These sequences have been deposited into NCBI's genbank database and complement public genomic resources by providing additional protein coding sequences that fill in some of the gaps in the feline genome assembly. Through functional and comparative genomic analyses, we gained an understanding of the role of these sequences in feline development, nutrition and health. Specifically, we identified 104 orthologs of human genes associated with Mendelian disorders. We detected negative selection within sequences with gene ontology annotations associated with intracellular trafficking, cytoskeleton and muscle functions. We detected relatively less negative selection on protein sequences encoding extracellular networks, apoptotic pathways and mitochondrial gene ontology annotations. Additionally, we characterized feline cDNA sequences that have mouse orthologs associated with clinical, nutritional and developmental phenotypes. Together, this analysis provides an overview of the value of our cDNA sequences and enhances our understanding of how the feline genome is similar to, and different from other mammalian genomes. Conclusions The cDNA sequences reported here expand existing feline genomic resources by providing high-quality sequences annotated with comparative genomic information providing functional, clinical, nutritional and orthologous gene information. PMID:22257742
Identification of the neurofibromatosis type 1 gene product
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gutmann, D.H.; Wood, D.L.; Collins, F.S.
The gene for neurofibromatosis type 1 (NF1) was recently identified by positional cloning. The complete cDNA encodes a polypeptide of 2818 amino acids. To study the NF1 gene product, antibodies were raised against both fusion proteins and synthetic peptides. Initial characterization of two anti-peptide antibodies and one fusion-protein antibody demonstrated a specific protein of {approx}250 kDa by both immunoprecipitation and immunoblotting. This protein was found in all tissues and cell lines examined and is detected in human, rat, and mouse tissues. To demonstrate that these antibodies specifically recognize the NF1 protein, additional fusion proteins containing the sequence specific to themore » synthetic peptide were generated. Both peptide antisera recognize the proper specific fusion proteins so generated. Immunoprecipitates using the peptide antisera were shown to recognize the same protein detected by immunoblotting with either the other peptide antiserum or the fusion-protein antiserum. Immunoblotting using antiserum specific to spatially distinct epitopes conducted on tissue homogenates demonstrated the NF1 protein in all adult tissues. Based on the homology between the NF1 gene product and members of the GTPase-activating protein (GAP) superfamily, the name NF1-GAP-related protein (NF1GRP) is suggested.« less
Specific and non-specific interactions of ParB with DNA: implications for chromosome segregation
Taylor, James A.; Pastrana, Cesar L.; Butterer, Annika; Pernstich, Christian; Gwynn, Emma J.; Sobott, Frank; Moreno-Herrero, Fernando; Dillingham, Mark S.
2015-01-01
The segregation of many bacterial chromosomes is dependent on the interactions of ParB proteins with centromere-like DNA sequences called parS that are located close to the origin of replication. In this work, we have investigated the binding of Bacillus subtilis ParB to DNA in vitro using a variety of biochemical and biophysical techniques. We observe tight and specific binding of a ParB homodimer to the parS sequence. Binding of ParB to non-specific DNA is more complex and displays apparent positive co-operativity that is associated with the formation of larger, poorly defined, nucleoprotein complexes. Experiments with magnetic tweezers demonstrate that non-specific binding leads to DNA condensation that is reversible by protein unbinding or force. The condensed DNA structure is not well ordered and we infer that it is formed by many looping interactions between neighbouring DNA segments. Consistent with this view, ParB is also able to stabilize writhe in single supercoiled DNA molecules and to bridge segments from two different DNA molecules in trans. The experiments provide no evidence for the promotion of non-specific DNA binding and/or condensation events by the presence of parS sequences. The implications of these observations for chromosome segregation are discussed. PMID:25572315
Tokuhiro, Keizo; Miyagawa, Yasushi; Yamada, Shuichi; Hirose, Mika; Ohta, Hiroshi; Nishimune, Yoshitake; Tanaka, Hiromitsu
2007-03-01
Haspin is a unique protein kinase expressed predominantly in haploid male germ cells. The genomic structure of haspin (Gsg2) has revealed it to be intronless, and the entire transcription unit is in an intron of the integrin alphaE (Itgae) gene. Transcription occurs from a bidirectional promoter that also generates an alternatively spliced integrin alphaE-derived mRNA (Aed). In mice, the testis-specific alternative splicing of Aed is expressed bidirectionally downstream from the Gsg2 transcription initiation site, and a segment consisting of 26 bp transcribes both genomic DNA strands between Gsg2 and the Aed transcription initiation sites. To investigate the mechanisms for this unique gene regulation, we cloned and characterized the Gsg2 promoter region. The 193-bp genomic fragment from the 5' end of the Gsg2 and Aed genes, fused with EGFP and DsRed genes, drove the expression of both proteins in haploid germ cells of transgenic mice. This promoter element contained only a GC-rich sequence, and not the previously reported DNA sequences known to bind various transcription factors--with the exception of E2F1, TCFAP2A1 (AP2), and SP1. Here, we show that the 193-bp DNA sequence is sufficient for the specific, bidirectional, and synchronous expression in germ cells in the testis. We also demonstrate the existence of germ cell nuclear factors specifically bound to the promoter sequence. This activity may be regulated by binding to the promoter sequence with germ cell-specific nuclear complex(es) without regulation via DNA methylation.
Paca-Uccaralertkun, S; Zhao, L J; Adya, N; Cross, J V; Cullen, B R; Boros, I M; Giam, C Z
1994-01-01
The human T-cell lymphotropic virus type I (HTLV-I) transactivator, Tax, the ubiquitous transcriptional factor cyclic AMP (cAMP) response element-binding protein (CREB protein), and the 21-bp repeats in the HTLV-I transcriptional enhancer form a ternary nucleoprotein complex (L. J. Zhao and C. Z. Giam, Proc. Natl. Acad. Sci. USA 89:7070-7074, 1992). Using an antibody directed against the COOH-terminal region of Tax along with purified Tax and CREB proteins, we selected DNA elements bound specifically by the Tax-CREB complex in vitro. Two distinct but related groups of sequences containing the cAMP response element (CRE) flanked by long runs of G and C residues in the 5' and 3' regions, respectively, were preferentially recognized by Tax-CREB. In contrast, CREB alone binds only to CRE motifs (GNTGACG[T/C]) without neighboring G- or C-rich sequences. The Tax-CREB-selected sequences bear a striking resemblance to the 5' or 3' two-thirds of the HTLV-I 21-bp repeats and are highly inducible by Tax. Gel electrophoretic mobility shift assays, DNA transfection, and DNase I footprinting analyses indicated that the G- and C-rich sequences flanking the CRE motif are crucial for Tax-CREB-DNA ternary complex assembly and Tax transactivation but are not in direct contact with the Tax-CREB complex. These data show that Tax recruits CREB to form a multiprotein complex that specifically recognizes the viral 21-bp repeats. The expanded DNA binding specificity of Tax-CREB and the obligatory role the ternary Tax-CREB-DNA complex plays in transactivation reveal a novel mechanism for regulating the transcriptional activity of leucine zipper proteins like CREB.
Mimtags: the use of phage display technology to produce novel protein-specific probes.
Ahmed, Nayyar; Dhanapala, Pathum; Sadli, Nadia; Barrow, Colin J; Suphioglu, Cenk
2014-03-01
In recent times the use of protein-specific probes in the field of proteomics has undergone evolutionary changes leading to the discovery of new probing techniques. Protein-specific probes serve two main purposes: epitope mapping and detection assays. One such technique is the use of phage display in the random selection of peptide mimotopes (mimtags) that can tag epitopes of proteins, replacing the use of monoclonal antibodies in detection systems. In this study, phage display technology was used to screen a random peptide library with a biologically active purified human interleukin-4 receptor (IL-4R) and interleukin-13 (IL-13) to identify mimtag candidates that interacted with these proteins. Once identified, the mimtags were commercially synthesised, biotinylated and used for in vitro immunoassays. We have used phage display to identify M13 phage clones that demonstrated specific binding to IL-4R and IL-13 cytokine. A consensus in binding sequences was observed and phage clones characterised had identical peptide sequence motifs. Only one was synthesised for use in further immunoassays, demonstrating significant binding to either IL-4R or IL-13. We have successfully shown the use of phage display to identify and characterise mimtags that specifically bind to their target epitope. Thus, this new method of probing proteins can be used in the future as a novel tool for immunoassay and detection technique, which is cheaper and more rapidly produced and therefore a better alternative to the use of monoclonal antibodies. Copyright © 2014 Elsevier B.V. All rights reserved.
Conservation and variability of West Nile virus proteins.
Koo, Qi Ying; Khan, Asif M; Jung, Keun-Ok; Ramdas, Shweta; Miotto, Olivo; Tan, Tin Wee; Brusic, Vladimir; Salmon, Jerome; August, J Thomas
2009-01-01
West Nile virus (WNV) has emerged globally as an increasingly important pathogen for humans and domestic animals. Studies of the evolutionary diversity of the virus over its known history will help to elucidate conserved sites, and characterize their correspondence to other pathogens and their relevance to the immune system. We describe a large-scale analysis of the entire WNV proteome, aimed at identifying and characterizing evolutionarily conserved amino acid sequences. This study, which used 2,746 WNV protein sequences collected from the NCBI GenPept database, focused on analysis of peptides of length 9 amino acids or more, which are immunologically relevant as potential T-cell epitopes. Entropy-based analysis of the diversity of WNV sequences, revealed the presence of numerous evolutionarily stable nonamer positions across the proteome (entropy value of < or = 1). The representation (frequency) of nonamers variant to the predominant peptide at these stable positions was, generally, low (< or = 10% of the WNV sequences analyzed). Eighty-eight fragments of length 9-29 amino acids, representing approximately 34% of the WNV polyprotein length, were identified to be identical and evolutionarily stable in all analyzed WNV sequences. Of the 88 completely conserved sequences, 67 are also present in other flaviviruses, and several have been associated with the functional and structural properties of viral proteins. Immunoinformatic analysis revealed that the majority (78/88) of conserved sequences are potentially immunogenic, while 44 contained experimentally confirmed human T-cell epitopes. This study identified a comprehensive catalogue of completely conserved WNV sequences, many of which are shared by other flaviviruses, and majority are potential epitopes. The complete conservation of these immunologically relevant sequences through the entire recorded WNV history suggests they will be valuable as components of peptide-specific vaccines or other therapeutic applications, for sequence-specific diagnosis of a wide-range of Flavivirus infections, and for studies of homologous sequences among other flaviviruses.
Nong, Rachel Yuan; Wu, Di; Yan, Junhong; Hammond, Maria; Gu, Gucci Jijuan; Kamali-Moghaddam, Masood; Landegren, Ulf; Darmanis, Spyros
2013-06-01
Solid-phase proximity ligation assays share properties with the classical sandwich immunoassays for protein detection. The proteins captured via antibodies on solid supports are, however, detected not by single antibodies with detectable functions, but by pairs of antibodies with attached DNA strands. Upon recognition by these sets of three antibodies, pairs of DNA strands brought in proximity are joined by ligation. The ligated reporter DNA strands are then detected via methods such as real-time PCR or next-generation sequencing (NGS). We describe how to construct assays that can offer improved detection specificity by virtue of recognition by three antibodies, as well as enhanced sensitivity owing to reduced background and amplified detection. Finally, we also illustrate how the assays can be applied for parallel detection of proteins, taking advantage of the oligonucleotide ligation step to avoid background problems that might arise with multiplexing. The protocol for the singleplex solid-phase proximity ligation assay takes ~5 h. The multiplex version of the assay takes 7-8 h depending on whether quantitative PCR (qPCR) or sequencing is used as the readout. The time for the sequencing-based protocol includes the library preparation but not the actual sequencing, as times may vary based on the choice of sequencing platform.
Sequence similarity is more relevant than species specificity in probabilistic backtranslation.
Ferro, Alfredo; Giugno, Rosalba; Pigola, Giuseppe; Pulvirenti, Alfredo; Di Pietro, Cinzia; Purrello, Michele; Ragusa, Marco
2007-02-21
Backtranslation is the process of decoding a sequence of amino acids into the corresponding codons. All synthetic gene design systems include a backtranslation module. The degeneracy of the genetic code makes backtranslation potentially ambiguous since most amino acids are encoded by multiple codons. The common approach to overcome this difficulty is based on imitation of codon usage within the target species. This paper describes EasyBack, a new parameter-free, fully-automated software for backtranslation using Hidden Markov Models. EasyBack is not based on imitation of codon usage within the target species, but instead uses a sequence-similarity criterion. The model is trained with a set of proteins with known cDNA coding sequences, constructed from the input protein by querying the NCBI databases with BLAST. Unlike existing software, the proposed method allows the quality of prediction to be estimated. When tested on a group of proteins that show different degrees of sequence conservation, EasyBack outperforms other published methods in terms of precision. The prediction quality of a protein backtranslation methis markedly increased by replacing the criterion of most used codon in the same species with a Hidden Markov Model trained with a set of most similar sequences from all species. Moreover, the proposed method allows the quality of prediction to be estimated probabilistically.
Pi, J; Wookey, P J; Pittard, A J
1991-01-01
The phenylalanine-specific permease gene (pheP) of Escherichia coli has been cloned and sequenced. The gene was isolated on a 6-kb Sau3AI fragment from a chromosomal library, and its presence was verified by complementation of a mutant lacking the functional phenylalanine-specific permease. Subcloning from this fragment localized the pheP gene on a 2.7-kb HindIII-HindII fragment. The nucleotide sequence of this 2.7-kb region was determined. An open reading frame was identified which extends from a putative start point of translation (GTG at position 636) to a termination signal (TAA at position 2010). The assignment of the GTG as the initiation codon was verified by site-directed mutagenesis of the initiation codon and by introducing a chain termination mutation into the pheP-lacZ fusion construct. A single initiation site of transcription 30 bp upstream of the start point of translation was identified by the primer extension analysis. The pheP structural gene consists of 1,374 nucleotides specifying a protein of 458 amino acid residues. The PheP protein is very hydrophobic (71% nonpolar residues). A topological model predicted from the sequence analysis defines 12 transmembrane segments. This protein is highly homologous with the AroP (general aromatic transport) system of E. coli (59.6% identity) and to a lesser extent with the yeast permeases CAN1 (arginine), PUT4 (proline), and HIP1 (histidine) of Saccharomyces cerevisiae. Images PMID:1711024
Zhao, Xiaowei; Ning, Qiao; Ai, Meiyue; Chai, Haiting; Yang, Guifu
2016-06-07
As a selective and reversible protein post-translational modification, S-glutathionylation generates mixed disulfides between glutathione (GSH) and cysteine residues, and plays an important role in regulating protein activity, stability, and redox regulation. To fully understand S-glutathionylation mechanisms, identification of substrates and specific S-Glutathionylated sites is crucial. Experimental identification of S-glutathionylated sites is labor-intensive and time consuming, so establishing an effective computational method is much desirable due to their convenient and fast speed. Therefore, in this study, a new bioinformatics tool named SSGlu (Species-Specific identification of Protein S-glutathionylation Sites) was developed to identify species-specific protein S-glutathionylated sites, utilizing support vector machines that combine multiple sequence-derived features with a two-step feature selection. By 5-fold cross validation, the performance of SSGlu was measured with an AUC of 0.8105 and 0.8041 for Homo sapiens and Mus musculus, respectively. Additionally, SSGlu was compared with the existing methods, and the higher MCC and AUC of SSGlu demonstrated that SSGlu was very promising to predict S-glutathionylated sites. Furthermore, a site-specific analysis showed that S-glutathionylation intimately correlated with the features derived from its surrounding sites. The conclusions derived from this study might help to understand more of the S-glutathionylation mechanism and guide the related experimental validation. For public access, SSGlu is freely accessible at http://59.73.198.144:8080/SSGlu/. Copyright © 2016 Elsevier Ltd. All rights reserved.
Aggarwal, A; Adam, R D; Nash, T E
1989-01-01
The amino acid sequence of a 29.4-kilodalton [corrected] structural protein located in the ventral disk and axostyle of Giardia lamblia was determined. Clone lambda M16 from a mung bean expression library in lambda gt11 expressed a fusion protein recognized by three different isolate-specific antisera and sera from G. lamblia-infected gerbils. One of the three EcoRI fragments (M16; 1.26 kilobases) encoded the recognized protein. Sequence analysis revealed a single open reading frame of 813 base pairs. Two areas showed conservation of the positions of some amino acids. The abundance of arginine, glutamic acid, and threonine was increased. Two potential alpha-helical regions were deduced in the regions of repeats. Antisera to the M16 fusion protein reacted specifically with internal components of the ventral disk and axostyle, as well as Giardia fractions enriched for ventral disk structural proteins. An identical protein was recognized in different isolates by anti-M16, and a single identical band was recognized in Southern blots using the M16 1.26-kilobase fragment as a probe. Therefore, the 29.4-kilodaltion [corrected] protein appears to be highly conserved compared with variant surface proteins. Images PMID:2925253