Sample records for protein promoter sequences

  1. Inferring gene expression from ribosomal promoter sequences, a crowdsourcing approach

    PubMed Central

    Meyer, Pablo; Siwo, Geoffrey; Zeevi, Danny; Sharon, Eilon; Norel, Raquel; Segal, Eran; Stolovitzky, Gustavo; Siwo, Geoffrey; Rider, Andrew K.; Tan, Asako; Pinapati, Richard S.; Emrich, Scott; Chawla, Nitesh; Ferdig, Michael T.; Tung, Yi-An; Chen, Yong-Syuan; Chen, Mei-Ju May; Chen, Chien-Yu; Knight, Jason M.; Sahraeian, Sayed Mohammad Ebrahim; Esfahani, Mohammad Shahrokh; Dreos, Rene; Bucher, Philipp; Maier, Ezekiel; Saeys, Yvan; Szczurek, Ewa; Myšičková, Alena; Vingron, Martin; Klein, Holger; Kiełbasa, Szymon M.; Knisley, Jeff; Bonnell, Jeff; Knisley, Debra; Kursa, Miron B.; Rudnicki, Witold R.; Bhattacharjee, Madhuchhanda; Sillanpää, Mikko J.; Yeung, James; Meysman, Pieter; Rodríguez, Aminael Sánchez; Engelen, Kristof; Marchal, Kathleen; Huang, Yezhou; Mordelet, Fantine; Hartemink, Alexander; Pinello, Luca; Yuan, Guo-Cheng

    2013-01-01

    The Gene Promoter Expression Prediction challenge consisted of predicting gene expression from promoter sequences in a previously unknown experimentally generated data set. The challenge was presented to the community in the framework of the sixth Dialogue for Reverse Engineering Assessments and Methods (DREAM6), a community effort to evaluate the status of systems biology modeling methodologies. Nucleotide-specific promoter activity was obtained by measuring fluorescence from promoter sequences fused upstream of a gene for yellow fluorescence protein and inserted in the same genomic site of yeast Saccharomyces cerevisiae. Twenty-one teams submitted results predicting the expression levels of 53 different promoters from yeast ribosomal protein genes. Analysis of participant predictions shows that accurate values for low-expressed and mutated promoters were difficult to obtain, although in the latter case, only when the mutation induced a large change in promoter activity compared to the wild-type sequence. As in previous DREAM challenges, we found that aggregation of participant predictions provided robust results, but did not fare better than the three best algorithms. Finally, this study not only provides a benchmark for the assessment of methods predicting activity of a specific set of promoters from their sequence, but it also shows that the top performing algorithm, which used machine-learning approaches, can be improved by the addition of biological features such as transcription factor binding sites. PMID:23950146
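
    The abstract above reports that aggregating participant predictions gave robust results but does not say how the aggregation was done. As a minimal sketch, assuming a simple mean-rank scheme (an assumption, not necessarily the DREAM6 method), each team's predictions can be converted to ranks and averaged per promoter. The function names and toy numbers are hypothetical.

```python
def rank(values):
    """Rank values from 1 (lowest) upward; ties are broken by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


def aggregate_by_mean_rank(predictions):
    """Average the per-team ranks of each promoter.

    predictions: one list per team, all with the same promoter order.
    Mean-rank aggregation is an illustrative assumption, not the
    scheme described in the paper.
    """
    per_team = [rank(p) for p in predictions]
    n_promoters = len(predictions[0])
    return [
        sum(team[k] for team in per_team) / len(per_team)
        for k in range(n_promoters)
    ]


# Toy predictions from three hypothetical teams for three promoters.
toy = [[0.1, 0.5, 0.9], [0.2, 0.8, 0.4], [0.3, 0.6, 0.7]]
print(aggregate_by_mean_rank(toy))
```

    Working on ranks rather than raw values keeps teams that predict on different scales comparable, which is one reason rank averaging is a common baseline.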

  2. A gene-specific non-enhancer sequence is critical for expression from the promoter of the small heat shock protein gene αB-crystallin

    PubMed Central

    2014-01-01

    Background: Deciphering of the information content of eukaryotic promoters has remained confined to universal landmarks and conserved sequence elements such as enhancers and transcription factor binding motifs, which are considered sufficient for gene activation and regulation. Gene-specific sequences, interspersed between the canonical transacting factor binding sites or adjoining them within a promoter, are generally taken to be devoid of any regulatory information and have therefore been largely ignored. An unanswered question therefore is, do gene-specific sequences within a eukaryotic promoter have a role in gene activation? Here, we present an exhaustive experimental analysis of a gene-specific sequence adjoining the heat shock element (HSE) in the proximal promoter of the small heat shock protein gene, αB-crystallin (cryab). These sequences are highly conserved between the rodents and the humans. Results: Using human retinal pigment epithelial cells in culture as the host, we have identified a 10-bp gene-specific promoter sequence (GPS), which, unlike an enhancer, controls expression from the promoter of this gene, only when in appropriate position and orientation. Notably, the data suggests that GPS in comparison with the HSE works in a context-independent fashion. Additionally, when moved upstream, about a nucleosome length of DNA (−154 bp) from the transcription start site (TSS), the activity of the promoter is markedly inhibited, suggesting its involvement in local promoter access. Importantly, we demonstrate that deletion of the GPS results in complete loss of cryab promoter activity in transgenic mice. Conclusions: These data suggest that gene-specific sequences such as the GPS, identified here, may have critical roles in regulating gene-specific activity from eukaryotic promoters. PMID:24589182

  3. Cross-Specificities between cII-like Proteins and pRE-like Promoters of Lambdoid Bacteriophages

    PubMed Central

    Wulff, Daniel L.; Mahoney, Michael E.

    1987-01-01

    We have investigated the activation of transcription from the pRE promoters of phages λ, 21 and P22 by the λ and 21 cII proteins and the P22 c1 (cII-like) protein, using an in vivo system in which cII protein from a derepressed prophage activates transcription from a pRE DNA fragment on a multicopy plasmid. We find that each protein is highly specific for its own cognate pRE promoter, although measurable cross-reactions are observed. The primary recognition sequence for cII protein on λ pRE is a pair of TTGC repeat sequences in the sequence 5'-TTGCN6TTGC-3' at the -35 region of the promoter. This same sequence is found in 21 pRE, while P22 pRE has the sequence 5'-TTGCN6TTGT-3', which is the same as that of λctr1, a pRE+ variant of λ. λctr1 pRE is half as active as λ+ pRE when assayed with either the λ cII or the P22 c1 proteins. Therefore, the single base change in the P22 repeat sequence cannot explain why the P22 c1 protein is much more active with P22 pRE than λ pRE. The dya5 mutation, a G→A change at position -43 of pRE, makes pRE a stronger promoter when assayed with either the λ or 21 cII proteins or the P22 c1 protein. We conclude that efficient activation of a cII-dependent promoter by a cII protein requires sequence information in addition to the TTGC repeat sequences. We do not know the characteristics of the proteins which are responsible for the specificity of each protein for its own cognate promoter. However, λdya8, which has a Glu27→Lys alteration in the λ cII protein and a cII+ phenotype, results in a mutant cII protein that is much more highly specific than wild-type cII protein for its own cognate λ pRE promoter. This is especially remarkable because the dya8 amino acid alteration makes the helix-2 region (the region of the protein predicted to make contact with the phosphodiester backbone of the DNA) of λ cII protein conform exactly with the helix-2 region of the P22 c1 protein in both charge and charge distribution. PMID
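
    The recognition sequence quoted in the abstract above, TTGC followed by any six bases and a second TTGC, maps directly onto a regular expression. The sketch below is illustrative only: the function name and the toy sequence are hypothetical, and the toy sequence is not the real λ pRE sequence.

```python
import re

# TTGC, any six bases, then TTGC, as quoted in the abstract.
# The lookahead lets overlapping sites be reported as well.
CII_SITE = re.compile(r"(?=(TTGC[ACGT]{6}TTGC))")


def find_cii_sites(seq):
    """Return (0-based start, matched text) for each TTGC-N6-TTGC site."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in CII_SITE.finditer(seq)]


# Made-up test sequence, not taken from any phage genome.
toy = "AAATTGCACGTACTTGCGGG"
print(find_cii_sites(toy))
```

    A sequence carrying the P22-type second repeat (TTGT instead of TTGC) is deliberately not matched by this pattern; relaxing the last base would be a one-character change to the expression.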

  4. Effective Feature Selection for Classification of Promoter Sequences.

    PubMed

    K, Kouser; P G, Lavanya; Rangarajan, Lalitha; K, Acharya Kshitish

    2016-01-01

    Exploring novel computational methods in making sense of biological data has not only been a necessity, but also productive. A part of this trend is the search for more efficient in silico methods/tools for analysis of promoters, which are parts of DNA sequences that are involved in regulation of expression of genes into other functional molecules. Promoter regions vary greatly in their function based on the sequence of nucleotides and the arrangement of protein-binding short-regions called motifs. In fact, the regulatory nature of the promoters seems to be largely driven by the selective presence and/or the arrangement of these motifs. Here, we explore computational classification of promoter sequences based on the pattern of motif distributions, as such classification can pave a new way of functional analysis of promoters and to discover the functionally crucial motifs. We make use of Position Specific Motif Matrix (PSMM) features for exploring the possibility of accurately classifying promoter sequences using some of the popular classification techniques. The classification results on the complete feature set are low, perhaps due to the huge number of features. We propose two ways of reducing features. Our test results show improvement in the classification output after the reduction of features. The results also show that decision trees outperform SVM (Support Vector Machine), KNN (K Nearest Neighbor) and ensemble classifier LibD3C, particularly with reduced features. The proposed feature selection methods outperform some of the popular feature transformation methods such as PCA and SVD. Also, the methods proposed are as accurate as MRMR (feature selection method) but much faster than MRMR. Such methods could be useful to categorize new promoters and explore regulatory mechanisms of gene expressions in complex eukaryotic species.

  5. Avoidance of truncated proteins from unintended ribosome binding sites within heterologous protein coding sequences.

    PubMed

    Whitaker, Weston R; Lee, Hanson; Arkin, Adam P; Dueber, John E

    2015-03-20

    Genetic sequences ported into non-native hosts for synthetic biology applications can gain unexpected properties. In this study, we explored sequences functioning as ribosome binding sites (RBSs) within protein coding DNA sequences (CDSs) that cause internal translation, resulting in truncated proteins. Genome-wide prediction of bacterial RBSs, based on biophysical calculations employed by the RBS calculator, suggests a selection against internal RBSs within CDSs in Escherichia coli, but not those in Saccharomyces cerevisiae. Based on these calculations, silent mutations aimed at removing internal RBSs can effectively reduce truncation products from internal translation. However, a solution for complete elimination of internal translation initiation is not always feasible due to constraints of available coding sequences. Fluorescence assays and Western blot analysis showed that in genes with internal RBSs, increasing the strength of the intended upstream RBS had little influence on the internal translation strength. Another strategy to minimize truncated products from an internal RBS is to increase the relative strength of the upstream RBS with a concomitant reduction in promoter strength to achieve the same protein expression level. Unfortunately, lower transcription levels result in increased noise at the single cell level due to stochasticity in gene expression. At the low expression regimes desired for many synthetic biology applications, this problem becomes particularly pronounced. We found that balancing promoter strengths and upstream RBS strengths to intermediate levels can achieve the target protein concentration while avoiding both excessive noise and truncated protein.
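
    The abstract above relies on the biophysical model of the RBS calculator, which is not reproduced here. As a much cruder sketch, one can flag in-frame ATG codons inside a coding sequence that have a Shine-Dalgarno-like 4-mer a short distance upstream. The 4-mers, window sizes, function name and toy sequence below are all assumptions chosen for illustration.

```python
# 4-mers taken from the AGGAGG Shine-Dalgarno consensus (assumed heuristic).
SD_LIKE = ("AGGA", "GGAG", "GAGG")


def internal_start_candidates(cds, upstream=16, min_gap=4):
    """Flag in-frame internal ATG codons with an SD-like 4-mer upstream.

    cds: coding sequence beginning at the annotated start codon.
    The window covers bases from `upstream` to `min_gap` nt before the ATG.
    This is a motif heuristic, not the RBS calculator's biophysical model.
    """
    cds = cds.upper()
    hits = []
    # Step codon by codon, skipping the annotated start codon at position 0.
    for atg in range(3, len(cds) - 2, 3):
        if cds[atg:atg + 3] != "ATG":
            continue
        window = cds[max(0, atg - upstream):max(0, atg - min_gap)]
        found = [kmer for kmer in SD_LIKE if kmer in window]
        if found:
            hits.append((atg, found))
    return hits


# Toy coding sequence with an AGGAGG stretch upstream of an internal ATG.
toy_cds = "ATGGCTAAGGAGGCTGCAATGAAATAA"
print(internal_start_candidates(toy_cds))
```

    Candidates flagged this way would be the positions where silent mutations, as discussed in the abstract, might be tried first; any real design would need the full thermodynamic calculation.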

  6. Protein Hydrolysates as Promoters of Non-Haem Iron Absorption

    PubMed Central

    Li, Yanan; Jiang, Han; Huang, Guangrong

    2017-01-01

    Iron (Fe) is an essential micronutrient for human growth and health. Organic iron is an excellent iron supplement due to its bioavailability. Both amino acids and peptides improve iron bioavailability and absorption and are therefore valuable components of iron supplements. This review focuses on protein hydrolysates as potential promoters of iron absorption. The ability of protein hydrolysates to chelate iron is thought to be a key attribute for the promotion of iron absorption. Iron-chelatable protein hydrolysates are categorized by their absorption forms: amino acids, di- and tri-peptides and polypeptides. Their structural characteristics, including their size and amino acid sequence, as well as the presence of special amino acids, influence their iron chelation abilities and bioavailabilities. Protein hydrolysates promote iron absorption by keeping iron soluble, reducing ferric iron to ferrous iron, and promoting transport across cell membranes into the gut. We also discuss the use and relative merits of protein hydrolysates as iron supplements. PMID:28617327

  7. Isolation and characterization of the promoter sequence of a cassava gene coding for Pt2L4, a glutamic acid-rich protein differentially expressed in storage roots.

    PubMed

    de Souza, C R; Aragão, F J; Moreira, E C O; Costa, C N M; Nascimento, S B; Carvalho, L J

    2009-03-24

    Cassava is one of the most important tropical food crops for more than 600 million people worldwide. Transgenic technologies can be useful for increasing its nutritional value and its resistance to viral diseases and insect pests. However, tissue-specific promoters that guarantee correct expression of transgenes would be necessary. We used inverse polymerase chain reaction to isolate a promoter sequence of the Mec1 gene coding for Pt2L4, a glutamic acid-rich protein differentially expressed in cassava storage roots. In silico analysis revealed putative cis-acting regulatory elements within this promoter sequence, including root-specific elements that may be required for its expression in vascular tissues. Transient expression experiments showed that the Mec1 promoter is functional, since this sequence was able to drive GUS expression in bean embryonic axes. Results from our computational analysis can serve as a guide for functional experiments to identify regions with tissue-specific Mec1 promoter activity. The DNA sequence that we identified is a new promoter that could be a candidate for genetic engineering of cassava roots.

  8. Special AT-rich sequence binding protein 1 promotes tumor growth and metastasis of esophageal squamous cell carcinoma.

    PubMed

    Ma, Jun; Wu, Kaiming; Zhao, Zhenxian; Miao, Rong; Xu, Zhe

    2017-03-01

    Esophageal squamous cell carcinoma is one of the most aggressive malignancies worldwide. Special AT-rich sequence binding protein 1 is a nuclear matrix attachment region binding protein which participates in higher order chromatin organization and tissue-specific gene expression. However, the role of special AT-rich sequence binding protein 1 in esophageal squamous cell carcinoma remains unknown. In this study, western blot and quantitative real-time polymerase chain reaction analysis were performed to identify differentially expressed special AT-rich sequence binding protein 1 in a series of esophageal squamous cell carcinoma tissue samples. The effects of special AT-rich sequence binding protein 1 silencing by two short-hairpin RNAs on cell proliferation, migration, and invasion were assessed by the CCK-8 assay and transwell assays in esophageal squamous cell carcinoma in vitro. Special AT-rich sequence binding protein 1 was significantly upregulated in esophageal squamous cell carcinoma tissue samples and cell lines. Silencing of special AT-rich sequence binding protein 1 inhibited the proliferation of KYSE450 and EC9706 cells which have a relatively high level of special AT-rich sequence binding protein 1, and the ability of migration and invasion of KYSE450 and EC9706 cells was distinctly suppressed. Special AT-rich sequence binding protein 1 could be a potential target for the treatment of esophageal squamous cell carcinoma and inhibition of special AT-rich sequence binding protein 1 may provide a new strategy for the prevention of esophageal squamous cell carcinoma invasion and metastasis.

  9. Interplay between Chaperones and Protein Disorder Promotes the Evolution of Protein Networks

    PubMed Central

    Pechmann, Sebastian; Frydman, Judith

    2014-01-01

    Evolution is driven by mutations, which lead to new protein functions but come at a cost to protein stability. Non-conservative substitutions are of interest in this regard because they may most profoundly affect both function and stability. Accordingly, organisms must balance the benefit of accepting advantageous substitutions with the possible cost of deleterious effects on protein folding and stability. We here examine factors that systematically promote non-conservative mutations at the proteome level. Intrinsically disordered regions in proteins play pivotal roles in protein interactions, but many questions regarding their evolution remain unanswered. Similarly, whether and how molecular chaperones, which have been shown to buffer destabilizing mutations in individual proteins, generally provide robustness during proteome evolution remains unclear. To this end, we introduce an evolutionary parameter λ that directly estimates the rate of non-conservative substitutions. Our analysis of λ in Escherichia coli, Saccharomyces cerevisiae, and Homo sapiens sequences reveals how co- and post-translationally acting chaperones differentially promote non-conservative substitutions in their substrates, likely through buffering of their destabilizing effects. We further find that λ serves well to quantify the evolution of intrinsically disordered proteins even though the unstructured, thus generally variable regions in proteins are often flanked by very conserved sequences. Crucially, we show that both intrinsically disordered proteins and highly re-wired proteins in protein interaction networks, which have evolved new interactions and functions, exhibit a higher λ at the expense of enhanced chaperone assistance. Our findings thus highlight an intricate interplay of molecular chaperones and protein disorder in the evolvability of protein networks. Our results illuminate the role of chaperones in enabling protein evolution, and underline the importance of the cellular

  10. Database-independent Protein Sequencing (DiPS) Enables Full-length de Novo Protein and Antibody Sequence Determination.

    PubMed

    Savidor, Alon; Barzilay, Rotem; Elinger, Dalia; Yarden, Yosef; Lindzen, Moshit; Gabashvili, Alexandra; Adiv Tal, Ophir; Levin, Yishai

    2017-06-01

    Traditional "bottom-up" proteomic approaches use proteolytic digestion, LC-MS/MS, and database searching to elucidate peptide identities and their parent proteins. Protein sequences absent from the database cannot be identified, and even if present in the database, complete sequence coverage is rarely achieved even for the most abundant proteins in the sample. Thus, sequencing of unknown proteins such as antibodies or constituents of metaproteomes remains a challenging problem. To date, there is no available method for full-length protein sequencing, independent of a reference database, in high throughput. Here, we present Database-independent Protein Sequencing, a method for unambiguous, rapid, database-independent, full-length protein sequencing. The method is a novel combination of non-enzymatic, semi-random cleavage of the protein, LC-MS/MS analysis, peptide de novo sequencing, extraction of peptide tags, and their assembly into a consensus sequence using an algorithm named "Peptide Tag Assembler." As proof-of-concept, the method was applied to samples of three known proteins representing three size classes and to a previously un-sequenced, clinically relevant monoclonal antibody. Excluding leucine/isoleucine and glutamic acid/deamidated glutamine ambiguities, end-to-end full-length de novo sequencing was achieved with 99-100% accuracy for all benchmarking proteins and the antibody light chain. Accuracy of the sequenced antibody heavy chain, including the entire variable region, was also 100%, but there was a 23-residue gap in the constant region sequence. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.

  11. Nucleotide sequence of the gene encoding the nitrogenase iron protein of Thiobacillus ferrooxidans

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pretorius, I.M.; Rawlings, D.E.; O'Neill, E.G.

    1987-01-01

    The DNA sequence was determined for the cloned Thiobacillus ferrooxidans nifH and part of the nifD genes. The DNA chains were radiolabeled with (α-³²P)dCTP (3000 Ci/mmol) or (α-³⁵S)dCTP (400 Ci/mmol). A putative T. ferrooxidans nifH promoter was identified whose sequences showed perfect consensus with those of the Klebsiella pneumoniae nif promoter. Two putative consensus upstream activator sequences were also identified. The amino acid sequence was deduced from the DNA sequence. In a comparison of nifH DNA sequences from T. ferrooxidans and eight other nitrogen-fixing microbes, a Rhizobium sp. isolated from Parasponia andersonii showed the greatest homology (74%) and Clostridium pasteurianum (nifH1) showed the least homology (54%). In the comparison of the amino acid sequences of the Fe proteins, the Rhizobium sp. and Rhizobium japonicum showed the greatest homology (both 86%) and C. pasteurianum (nifH1 gene product) demonstrated the least homology (56%) to the T. ferrooxidans Fe protein.

  12. Sequence repeats and protein structure

    NASA Astrophysics Data System (ADS)

    Hoang, Trinh X.; Trovato, Antonio; Seno, Flavio; Banavar, Jayanth R.; Maritan, Amos

    2012-11-01

    Repeats are frequently found in known protein sequences. The level of sequence conservation in tandem repeats correlates with their propensities to be intrinsically disordered. We employ a coarse-grained model of a protein with a two-letter amino acid alphabet, hydrophobic (H) and polar (P), to examine the sequence-structure relationship in the realm of repeated sequences. A fraction of repeated sequences comprises a distinct class of bad folders, whose folding temperatures are much lower than those of random sequences. Imperfection in sequence repetition improves the folding properties of the bad folders while deteriorating those of the good folders. Our results may explain why nature has utilized repeated sequences for their versatility and especially to design functional proteins that are intrinsically unstructured at physiological temperatures.
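
    The two-letter hydrophobic/polar (H/P) alphabet mentioned in the abstract above can be illustrated with a small sketch that maps a protein sequence to H/P letters and measures how well it repeats at a given period. The choice of hydrophobic residues and the repeat score are assumptions made here for illustration; the paper's coarse-grained folding model is not reproduced.

```python
# Assumed hydrophobic set; other H/P partitions of the 20 residues exist.
HYDROPHOBIC = set("AVLIMFWC")


def to_hp(protein):
    """Map a one-letter amino acid sequence to the H/P alphabet."""
    return "".join("H" if aa in HYDROPHOBIC else "P" for aa in protein.upper())


def repeat_identity(hp, period):
    """Fraction of positions i where hp[i] equals hp[i + period].

    1.0 means a perfect repeat with that period; lower values indicate
    imperfection in the repetition.
    """
    pairs = len(hp) - period
    if period <= 0 or pairs <= 0:
        raise ValueError("period must be positive and shorter than the sequence")
    same = sum(hp[i] == hp[i + period] for i in range(pairs))
    return same / pairs


hp = to_hp("LKLKLK")
print(hp, repeat_identity(hp, 2))
```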

  13. COUP-TF (chicken ovalbumin upstream promoter transcription factor)-interacting protein 1 (CTIP1) is a sequence-specific DNA binding protein.

    PubMed Central

    Avram, Dorina; Fields, Andrew; Senawong, Thanaset; Topark-Ngarm, Acharawan; Leid, Mark

    2002-01-01

    Chicken ovalbumin upstream promoter transcription factor (COUP-TF)-interacting proteins 1 and 2 [CTIP1/Evi9/B cell leukaemia (Bcl) 11a and CTIP2/Bcl11b respectively] are highly related C2H2 zinc finger proteins that are abundantly expressed in brain and the immune system, and are associated with immune system malignancies. A selection procedure was employed to isolate high-affinity DNA binding sites for CTIP1. The core binding site on DNA identified in these studies, 5'-GGCCGG-3' (upper strand), is highly related to the canonical GC box and was bound by a CTIP1 oligomeric complex(es) in vitro. Furthermore, both CTIP1 and CTIP2 repressed transcription of a reporter gene harbouring a multimerized CTIP binding site, and this repression was neither reversed by trichostatin A (an inhibitor of known class I and II histone deacetylases) nor stimulated by co-transfection of a COUP-TF family member. These results demonstrate that CTIP1 is a sequence-specific DNA binding protein and a bona fide transcriptional repressor that is capable of functioning independently of COUP-TF family members. These findings may be relevant to the physiological and/or pathological action(s) of CTIPs in cells that do not express COUP-TF family members, such as cells of the haematopoietic and immune systems. PMID:12196208
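
    Because the core site in the abstract above is given for the upper strand (5'-GGCCGG-3'), a search of a double-stranded sequence also has to look for its reverse complement. The following is a minimal sketch of such a scan; the function names and toy sequences are hypothetical, and exact matching is an assumption (real binding sites tolerate some variation).

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_core_sites(seq, core="GGCCGG"):
    """Return (0-based start, strand) for exact matches to the core site.

    '+' marks a match on the given strand, '-' a match to the
    reverse complement (a site on the opposite strand).
    """
    seq = seq.upper()
    rc = reverse_complement(core)
    hits = []
    for i in range(len(seq) - len(core) + 1):
        window = seq[i:i + len(core)]
        if window == core:
            hits.append((i, "+"))
        if window == rc:
            hits.append((i, "-"))
    return hits


print(find_core_sites("AAGGCCGGTT"))  # toy sequence with one upper-strand site
```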

  14. Cloning, sequencing, and expression of dnaK-operon proteins from the thermophilic bacterium Thermus thermophilus.

    PubMed

    Osipiuk, J; Joachimiak, A

    1997-09-12

    We propose that the dnaK operon of Thermus thermophilus HB8 is composed of three functionally linked genes: dnaK, grpE, and dnaJ. The dnaK and dnaJ gene products are most closely related to their cyanobacterial homologs. The DnaK protein sequence places T. thermophilus in the plastid Hsp70 subfamily. In contrast, the grpE translated sequence is most similar to GrpE from Clostridium acetobutylicum, a Gram-positive anaerobic bacterium. A single promoter region, with homology to the Escherichia coli consensus promoter sequences recognized by the sigma70 and sigma32 transcription factors, precedes the postulated operon. This promoter is heat-shock inducible. The dnaK mRNA level increased more than 30 times upon 10 min of heat shock (from 70 degrees C to 85 degrees C). A strong transcription terminating sequence was found between the dnaK and grpE genes. The individual genes were cloned into pET expression vectors and the thermophilic proteins were overproduced at high levels in E. coli and purified to homogeneity. The recombinant T. thermophilus DnaK protein was shown to have a weak ATP-hydrolytic activity, with an optimum at 90 degrees C. The ATPase was stimulated by the presence of GrpE and DnaJ. Another open reading frame, coding for ClpB heat-shock protein, was found downstream of the dnaK operon.

  15. Varicella Zoster Virus Promoter Sequences

    DTIC Science & Technology

    1994-01-01

    Fragments from each of the recombinant plasmids were excised to determine the optimal sequences used for promoter function. The ExonucleaseIII/Mung ... activation of these two herpesvirus late promoters by VZV is strikingly similar. 4. Definition of the functional promoter for the early/late ILl

  16. Conserved regulatory elements of the promoter sequence of the gene rpoH of enteric bacteria

    PubMed Central

    Ramírez-Santos, Jesús; Collado-Vides, Julio; García-Varela, Martin; Gómez-Eichelmann, M. Carmen

    2001-01-01

    The rpoH regulatory region of different members of the enteric bacteria family was sequenced or downloaded from GenBank and compared. In addition, the transcriptional start sites of rpoH of Yersinia frederiksenii and Proteus mirabilis, two distant members of this family, were determined. Sequences similar to the σ70 promoters P1, P4 and P5, to the σE promoter P3 and to boxes DnaA1, DnaA2, cAMP receptor protein (CRP) boxes CRP1, CRP2 and box CytR present in Escherichia coli K12, were identified in sequences of closely related bacteria such as: E. coli, Shigella flexneri, Salmonella enterica serovar Typhimurium, Citrobacter freundii, Enterobacter cloacae and Klebsiella pneumoniae. In more distant bacteria, Y. frederiksenii and P. mirabilis, the rpoH regulatory region has a distal P1-like σ70 promoter and two proximal promoters: a heat-induced σE-like promoter and a σ70 promoter. Sequences similar to the regulatory boxes were not identified in these bacteria. This study suggests that the general pattern of transcription of the rpoH gene in enteric bacteria includes a distal σ70 promoter, >200 nt upstream of the initiation codon, and two proximal promoters: a heat-induced σE-like promoter and a σ70 promoter. A second proximal σ70 promoter under catabolite-regulation is probably present only in bacteria closely related to E. coli. PMID:11139607

  17. Promoter Sequences Prediction Using Relational Association Rule Mining

    PubMed Central

    Czibula, Gabriela; Bocicor, Maria-Iuliana; Czibula, Istvan Gergely

    2012-01-01

    In this paper we approach, from a computational perspective, the problem of promoter sequence prediction, an important problem within the field of bioinformatics. As the conditions for a DNA sequence to function as a promoter are not known, machine learning based classification models are still developed to approach the problem of promoter identification in DNA. We propose a classification model based on relational association rule mining. Relational association rules are a particular type of association rules and describe numerical orderings between attributes that commonly occur over a data set. Our classifier is based on the discovery of relational association rules for predicting whether or not a DNA sequence contains a promoter region. An experimental evaluation of the proposed model and a comparison with similar existing approaches are provided. The obtained results show that our classifier outperforms the existing techniques for identifying promoter sequences, confirming the potential of our proposal. PMID:22563233
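
    The abstract above describes relational association rules as numerical orderings between attributes that commonly occur over a data set. A minimal sketch of that idea, restricted to rules between two attributes and to the relations "<" and ">", is shown below. The function name, the support threshold and the toy rows are assumptions; the paper's full mining algorithm and its encoding of DNA into numeric attributes are not reproduced.

```python
from itertools import combinations


def binary_ordering_rules(rows, min_support=0.9):
    """Find attribute pairs (i, j) whose ordering holds in most rows.

    rows: equal-length tuples of numeric attribute values.
    Returns (i, relation, j, support) for every ordering whose support,
    the fraction of rows where it holds, is at least min_support.
    Only length-2 rules are considered in this simplified sketch.
    """
    n_attrs = len(rows[0])
    rules = []
    for i, j in combinations(range(n_attrs), 2):
        less = sum(r[i] < r[j] for r in rows) / len(rows)
        greater = sum(r[i] > r[j] for r in rows) / len(rows)
        if less >= min_support:
            rules.append((i, "<", j, less))
        if greater >= min_support:
            rules.append((i, ">", j, greater))
    return rules


# Toy data: three instances with three numeric attributes each.
toy_rows = [(1, 2, 5), (0, 3, 4), (2, 4, 1)]
print(binary_ordering_rules(toy_rows, min_support=1.0))
```

    One way such rules could feed a classifier is to mine them separately on promoter and non-promoter training sets and score a new sequence by which rule set it satisfies more closely; that scoring step is left out here.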

  18. A two-step recognition of signal sequences determines the translocation efficiency of proteins.

    PubMed Central

    Belin, D; Bost, S; Vassalli, J D; Strub, K

    1996-01-01

    The cytosolic and secreted, N-glycosylated, forms of plasminogen activator inhibitor-2 (PAI-2) are generated by facultative translocation. To study the molecular events that result in the bi-topological distribution of proteins, we determined in vitro the capacities of several signal sequences to bind the signal recognition particle (SRP) during targeting, and to promote vectorial transport of murine PAI-2 (mPAI-2). Interestingly, the six signal sequences we compared (mPAI-2 and three mutated derivatives thereof, ovalbumin and preprolactin) were found to have differential activities in the two events. For example, the mPAI-2 signal sequence first binds SRP with moderate efficiency and secondly promotes the vectorial transport of only a fraction of the SRP-bound nascent chains. Our results provide evidence that the translocation efficiency of proteins can be controlled by the recognition of their signal sequences at two steps: during SRP-mediated targeting and during formation of a committed translocation complex. This second recognition may occur at several time points during the insertion/translocation step. In conclusion, signal sequences have a more complex structure than previously anticipated, allowing for multiple and independent interactions with the translocation machinery. PMID:8599930

  19. A two-step recognition of signal sequences determines the translocation efficiency of proteins.

    PubMed

    Belin, D; Bost, S; Vassalli, J D; Strub, K

    1996-02-01

    The cytosolic and secreted, N-glycosylated, forms of plasminogen activator inhibitor-2 (PAI-2) are generated by facultative translocation. To study the molecular events that result in the bi-topological distribution of proteins, we determined in vitro the capacities of several signal sequences to bind the signal recognition particle (SRP) during targeting, and to promote vectorial transport of murine PAI-2 (mPAI-2). Interestingly, the six signal sequences we compared (mPAI-2 and three mutated derivatives thereof, ovalbumin and preprolactin) were found to have differential activities in the two events. For example, the mPAI-2 signal sequence first binds SRP with moderate efficiency and secondly promotes the vectorial transport of only a fraction of the SRP-bound nascent chains. Our results provide evidence that the translocation efficiency of proteins can be controlled by the recognition of their signal sequences at two steps: during SRP-mediated targeting and during formation of a committed translocation complex. This second recognition may occur at several time points during the insertion/translocation step. In conclusion, signal sequences have a more complex structure than previously anticipated, allowing for multiple and independent interactions with the translocation machinery.

  20. Recognising promoter sequences using an artificial immune system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cooke, D.E.; Hunt, J.E.

    1995-12-31

    We have developed an artificial immune system (AIS) which is based on the human immune system. The AIS possesses an adaptive learning mechanism which enables antibodies to emerge which can be used for classification tasks. In this paper, we describe how the AIS has been used to evolve antibodies which can classify promoter containing and promoter negative DNA sequences. The DNA sequences used for teaching were 57 nucleotides in length and contained procaryotic promoters. The system classified previously unseen DNA sequences with an accuracy of approximately 90%.

  1. Protein Interaction Profile Sequencing (PIP-seq).

    PubMed

    Foley, Shawn W; Gregory, Brian D

    2016-10-10

    Every eukaryotic RNA transcript undergoes extensive post-transcriptional processing from the moment of transcription up through degradation. This regulation is performed by a distinct cohort of RNA-binding proteins which recognize their target transcript by both its primary sequence and secondary structure. Here, we describe protein interaction profile sequencing (PIP-seq), a technique that uses ribonuclease-based footprinting followed by high-throughput sequencing to globally assess both protein-bound RNA sequences and RNA secondary structure. PIP-seq utilizes single- and double-stranded RNA-specific nucleases in the absence of proteins to infer RNA secondary structure. These libraries are also compared to samples that undergo nuclease digestion in the presence of proteins in order to find enriched protein-bound sequences. Combined, these four libraries provide a comprehensive, transcriptome-wide view of RNA secondary structure and RNA protein interaction sites from a single experimental technique. © 2016 by John Wiley & Sons, Inc.

  2. Graphene Nanopores for Protein Sequencing.

    PubMed

    Wilson, James; Sloman, Leila; He, Zhiren; Aksimentiev, Aleksei

    2016-07-19

    An inexpensive, reliable method for protein sequencing is essential to unraveling the biological mechanisms governing cellular behavior and disease. Current protein sequencing methods suffer from limitations associated with the size of proteins that can be sequenced, the time, and the cost of the sequencing procedures. Here, we report the results of all-atom molecular dynamics simulations that investigated the feasibility of using graphene nanopores for protein sequencing. We focus our study on the biologically significant phenylalanine-glycine repeat peptides (FG-nups), which are parts of the nuclear pore transport machinery. Surprisingly, we found FG-nups to behave similarly to single-stranded DNA: the peptides adhere to graphene and exhibit step-wise translocation when subject to a transmembrane bias or a hydrostatic pressure gradient. Reducing the peptide's charge density or increasing the peptide's hydrophobicity was found to decrease the translocation speed. Yet, unidirectional and stepwise translocation driven by a transmembrane bias was observed even when the ratio of charged to hydrophobic amino acids was as low as 1:8. The nanopore transport of the peptides was found to produce stepwise modulations of the nanopore ionic current correlated with the type of amino acids present in the nanopore, suggesting that protein sequencing by measuring ionic current blockades may be possible.

  3. Seed storage protein gene promoters contain conserved DNA motifs in Brassicaceae, Fabaceae and Poaceae

    PubMed Central

    Fauteux, François; Strömvik, Martina V

    2009-01-01

    Background Accurate computational identification of cis-regulatory motifs is difficult, particularly in eukaryotic promoters, which typically contain multiple short and degenerate DNA sequences bound by several interacting factors. Enrichment in combinations of rare motifs in the promoter sequence of functionally or evolutionarily related genes among several species is an indicator of conserved transcriptional regulatory mechanisms. This provides a basis for the computational identification of cis-regulatory motifs. Results We have used a discriminative seeding DNA motif discovery algorithm for an in-depth analysis of 54 seed storage protein (SSP) gene promoters from three plant families, namely Brassicaceae (mustards), Fabaceae (legumes) and Poaceae (grasses) using backgrounds based on complete sets of promoters from a representative species in each family, namely Arabidopsis (Arabidopsis thaliana (L.) Heynh.), soybean (Glycine max (L.) Merr.) and rice (Oryza sativa L.) respectively. We have identified three conserved motifs (two RY-like and one ACGT-like) in Brassicaceae and Fabaceae SSP gene promoters that are similar to experimentally characterized seed-specific cis-regulatory elements. Fabaceae SSP gene promoter sequences are also enriched in a novel, seed-specific E2Fb-like motif. Conserved motifs identified in Poaceae SSP gene promoters include a GCN4-like motif, two prolamin-box-like motifs and an Skn-1-like motif. Evidence of the presence of a variant of the TATA-box is found in the SSP gene promoters from the three plant families. Motifs discovered in SSP gene promoters were used to score whole-genome sets of promoters from Arabidopsis, soybean and rice. The highest-scoring promoters are associated with genes coding for different subunits or precursors of seed storage proteins. 
    Conclusion Seed storage protein gene promoter motifs are conserved in diverse species, and different plant families are characterized by a distinct combination of conserved motifs.

  4. Predicting protein crystallization propensity from protein sequence

    PubMed Central

    2011-01-01

    The high-throughput structure determination pipelines developed by structural genomics programs offer a unique opportunity for data mining. One important question is how protein properties derived from a primary sequence correlate with the protein’s propensity to yield X-ray quality crystals (crystallizability) and 3D X-ray structures. A set of protein properties were computed for over 1,300 proteins that expressed well but were insoluble, and for ~720 unique proteins that resulted in X-ray structures. The correlation of the protein’s iso-electric point and grand average hydropathy (GRAVY) with crystallizability was analyzed for full length and domain constructs of protein targets. In a second step, several additional properties that can be calculated from the protein sequence were added and evaluated. Using statistical analyses we have identified a set of the attributes correlating with a protein’s propensity to crystallize and implemented a Support Vector Machine (SVM) classifier based on these. We have created applications to analyze and provide optimal boundary information for query sequences and to visualize the data. These tools are available via the web site http://bioinformatics.anl.gov/cgi-bin/tools/pdpredictor. PMID:20177794
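
    As an illustration of one of the sequence-derived properties named in this abstract, the sketch below computes the grand average of hydropathy (GRAVY) as the mean Kyte-Doolittle hydropathy value over all residues. It is a minimal sketch, not the authors' implementation: the function name `gravy` is hypothetical, and the isoelectric point, the additional attributes, and the SVM classifier described in the abstract are not reproduced here.

```python
# Kyte-Doolittle hydropathy scale (one value per standard amino acid).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Return the mean Kyte-Doolittle hydropathy of a protein sequence.

    Hypothetical helper for illustration; rejects empty sequences and
    residues outside the 20 standard amino acids.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence is empty")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)
```

    Positive values indicate a more hydrophobic sequence on average and negative values a more hydrophilic one; how such a value relates to crystallizability is the question the study itself addresses.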

  5. Shotgun Protein Sequencing with Meta-contig Assembly

    PubMed Central

    Guthals, Adrian; Clauser, Karl R.; Bandeira, Nuno

    2012-01-01

    Full-length de novo sequencing from tandem mass (MS/MS) spectra of unknown proteins such as antibodies or proteins from organisms with unsequenced genomes remains a challenging open problem. Conventional algorithms designed to individually sequence each MS/MS spectrum are limited by incomplete peptide fragmentation or low signal to noise ratios and tend to result in short de novo sequences at low sequencing accuracy. Our shotgun protein sequencing (SPS) approach was developed to ameliorate these limitations by first finding groups of unidentified spectra from the same peptides (contigs) and then deriving a consensus de novo sequence for each assembled set of spectra (contig sequences). But whereas SPS enables much more accurate reconstruction of de novo sequences longer than can be recovered from individual MS/MS spectra, it still requires error-tolerant matching to homologous proteins to group smaller contig sequences into full-length protein sequences, thus limiting its effectiveness on sequences from poorly annotated proteins. Using low and high resolution CID and high resolution HCD MS/MS spectra, we address this limitation with a Meta-SPS algorithm designed to overlap and further assemble SPS contigs into Meta-SPS de novo contig sequences extending as long as 100 amino acids at over 97% accuracy without requiring any knowledge of homologous protein sequences. We demonstrate Meta-SPS using distinct MS/MS data sets obtained with separate enzymatic digestions and discuss how the remaining de novo sequencing limitations relate to MS/MS acquisition settings. PMID:22798278

  6. Shotgun protein sequencing with meta-contig assembly.

    PubMed

    Guthals, Adrian; Clauser, Karl R; Bandeira, Nuno

    2012-10-01

    Full-length de novo sequencing from tandem mass (MS/MS) spectra of unknown proteins such as antibodies or proteins from organisms with unsequenced genomes remains a challenging open problem. Conventional algorithms designed to individually sequence each MS/MS spectrum are limited by incomplete peptide fragmentation or low signal to noise ratios and tend to result in short de novo sequences at low sequencing accuracy. Our shotgun protein sequencing (SPS) approach was developed to ameliorate these limitations by first finding groups of unidentified spectra from the same peptides (contigs) and then deriving a consensus de novo sequence for each assembled set of spectra (contig sequences). But whereas SPS enables much more accurate reconstruction of de novo sequences longer than can be recovered from individual MS/MS spectra, it still requires error-tolerant matching to homologous proteins to group smaller contig sequences into full-length protein sequences, thus limiting its effectiveness on sequences from poorly annotated proteins. Using low and high resolution CID and high resolution HCD MS/MS spectra, we address this limitation with a Meta-SPS algorithm designed to overlap and further assemble SPS contigs into Meta-SPS de novo contig sequences extending as long as 100 amino acids at over 97% accuracy without requiring any knowledge of homologous protein sequences. We demonstrate Meta-SPS using distinct MS/MS data sets obtained with separate enzymatic digestions and discuss how the remaining de novo sequencing limitations relate to MS/MS acquisition settings.

  7. PUTATIVE GENE PROMOTER SEQUENCES IN THE CHLORELLA VIRUSES

    PubMed Central

    Fitzgerald, Lisa A.; Boucher, Philip T.; Yanai-Balser, Giane; Suhre, Karsten; Graves, Michael V.; Van Etten, James L.

    2008-01-01

    Three short (7 to 9 nucleotides) highly conserved nucleotide sequences were identified in the putative promoter regions (150 bp upstream and 50 bp downstream of the ATG translation start site) of three members of the genus Chlorovirus, family Phycodnaviridae. Most of these sequences occurred in similar locations within the defined promoter regions. The sequence and location of the motifs were often conserved among homologous ORFs within the Chlorovirus family. One of these conserved sequences (AATGACA) is predominately associated with genes expressed early in virus replication. PMID:18768195
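
    The promoter window used in this abstract (150 bp upstream and 50 bp downstream of the ATG) can be scanned for a motif such as AATGACA with a few lines of code. The sketch below is illustrative only: the function name `motif_offsets` is hypothetical, it searches the forward strand only, and it assumes that the downstream 50 bp are counted from the A of the ATG, which the abstract does not specify.

```python
def motif_offsets(genome: str, atg_index: int, motif: str,
                  upstream: int = 150, downstream: int = 50) -> list[int]:
    """Return motif start positions relative to the A of the ATG codon.

    Negative offsets lie upstream of the ATG. Overlapping occurrences are
    reported. Only the forward strand is searched (an assumption).
    """
    start = max(0, atg_index - upstream)
    end = min(len(genome), atg_index + downstream)
    region = genome[start:end].upper()
    motif = motif.upper()

    offsets = []
    pos = region.find(motif)
    while pos != -1:
        offsets.append(start + pos - atg_index)
        pos = region.find(motif, pos + 1)  # step by 1 to allow overlaps
    return offsets
```

    For example, a motif starting 50 bp upstream of the ATG is reported at offset -50, and a motif lying outside the window is not reported at all.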

  8. Metamorphic Proteins: Emergence of Dual Protein Folds from One Primary Sequence.

    PubMed

    Lella, Muralikrishna; Mahalakshmi, Radhakrishnan

    2017-06-20

    Every amino acid exhibits a different propensity for distinct structural conformations. Hence, decoding how the primary amino acid sequence undergoes the transition to a defined secondary structure and its final three-dimensional fold is presently considered predictable with reasonable certainty. However, protein sequences that defy the first principles of secondary structure prediction (they attain two different folds) have recently been discovered. Such proteins, aptly named metamorphic proteins, decrease the conformational constraint by increasing flexibility in the secondary structure and thereby result in efficient functionality. In this review, we discuss the major factors driving the conformational switch related both to protein sequence and to structure using illustrative examples. We discuss the concept of an evolutionary transition in sequence and structure, the functional impact of the tertiary fold, and the pressure of intrinsic and external factors that give rise to metamorphic proteins. We mainly focus on the major components of protein architecture, namely, the α-helix and β-sheet segments, which are involved in conformational switching within the same or highly similar sequences. These chameleonic sequences are widespread in both cytosolic and membrane proteins, and these folds are equally important for protein structure and function. We discuss the implications of metamorphic proteins and chameleonic peptide sequences in de novo peptide design.

  9. Regulation of expression of the ada gene controlling the adaptive response. Interactions with the ada promoter of the Ada protein and RNA polymerase.

    PubMed

    Sakumi, K; Sekiguchi, M

    1989-01-20

    The Ada protein of Escherichia coli catalyzes transfer of methyl groups from methylated DNA to its own molecule, and the methylated form of Ada protein promotes transcription of its own gene, ada. Using an in vitro reconstituted system, we found that both the sigma factor and the methylated Ada protein are required for transcription of the ada gene. To elucidate molecular mechanisms involved in the regulation of the ada transcription, we investigated interactions of the non-methylated and methylated forms of Ada protein and the RNA polymerase holo enzyme (the core enzyme and sigma factor) with a DNA fragment carrying the ada promoter region. Footprinting analyses revealed that the methylated Ada protein binds to a region from positions -63 to -31, which includes the ada regulatory sequence AAAGCGCA. No firm binding was observed with the non-methylated Ada protein, although some DNase I-hypersensitive sites were produced in the promoter by both types of Ada protein. RNA polymerase did bind to the promoter once the methylated Ada protein had bound to the upstream sequence. To correlate these phenomena with the process in vivo, we used the DNAs derived from promoter-defective mutants. No binding of Ada protein nor of RNA polymerase occurred with a mutant DNA having a C to G substitution at position -47 within the ada regulatory sequence. In the case of a -35 box mutant with a T to A change at position -34, the methylated Ada protein did bind to the ada regulatory sequence, yet there was no RNA polymerase binding. Thus, the binding of the methylated Ada protein to the upstream region apparently facilitates binding of the RNA polymerase to the proper region of the promoter. The Ada protein possesses two known methyl acceptor sites, Cys69 and Cys321. The role of methylation of each cysteine residue was investigated using mutant forms of the Ada protein. The Ada protein with the cysteine residue at position 69 replaced by alanine was incapable of binding to the ada

  10. Folding and Stabilization of Native-Sequence-Reversed Proteins

    PubMed Central

    Zhang, Yuanzhao; Weber, Jeffrey K; Zhou, Ruhong

    2016-01-01

    Though the problem of sequence-reversed protein folding is largely unexplored, one might speculate that reversed native protein sequences should be significantly more foldable than purely random heteropolymer sequences. In this article, we investigate how the reverse-sequences of native proteins might fold by examining a series of small proteins of increasing structural complexity (α-helix, β-hairpin, α-helix bundle, and α/β-protein). Employing a tandem protein structure prediction algorithmic and molecular dynamics simulation approach, we find that the ability of reverse sequences to adopt native-like folds is strongly influenced by protein size and the flexibility of the native hydrophobic core. For β-hairpins with reverse-sequences that fail to fold, we employ a simple mutational strategy for guiding stable hairpin formation that involves the insertion of amino acids into the β-turn region. This systematic look at reverse sequence duality sheds new light on the problem of protein sequence-structure mapping and may serve to inspire new protein design and protein structure prediction protocols. PMID:27113844

  11. Folding and Stabilization of Native-Sequence-Reversed Proteins

    NASA Astrophysics Data System (ADS)

    Zhang, Yuanzhao; Weber, Jeffrey K.; Zhou, Ruhong

    2016-04-01

    Though the problem of sequence-reversed protein folding is largely unexplored, one might speculate that reversed native protein sequences should be significantly more foldable than purely random heteropolymer sequences. In this article, we investigate how the reverse-sequences of native proteins might fold by examining a series of small proteins of increasing structural complexity (α-helix, β-hairpin, α-helix bundle, and α/β-protein). Employing a tandem protein structure prediction algorithmic and molecular dynamics simulation approach, we find that the ability of reverse sequences to adopt native-like folds is strongly influenced by protein size and the flexibility of the native hydrophobic core. For β-hairpins with reverse-sequences that fail to fold, we employ a simple mutational strategy for guiding stable hairpin formation that involves the insertion of amino acids into the β-turn region. This systematic look at reverse sequence duality sheds new light on the problem of protein sequence-structure mapping and may serve to inspire new protein design and protein structure prediction protocols.

  12. Establishing homologies in protein sequences

    NASA Technical Reports Server (NTRS)

    Dayhoff, M. O.; Barker, W. C.; Hunt, L. T.

    1983-01-01

    Computer-based statistical techniques used to determine homologies between proteins occurring in different species are reviewed. The technique is based on comparison of two protein sequences, either by relating all segments of a given length in one sequence to all segments of the second or by finding the best alignment of the two sequences. Approaches discussed include selection using printed tabulations, identification of very similar sequences, and computer searches of a database. The use of the SEARCH, RELATE, and ALIGN programs (Dayhoff, 1979) is explained; sample data are presented in graphs, diagrams, and tables and the construction of scoring matrices is considered.
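
    The first comparison strategy mentioned in this abstract, relating all segments of a given length in one sequence to all segments of the other, can be sketched as follows. This is a simplified illustration and not the RELATE program: the function names are hypothetical, and segments are scored by counting identical residues, whereas the abstract refers to scoring matrices.

```python
def segment_scores(a: str, b: str, k: int):
    """Yield (i, j, score) for every pair of length-k segments of a and b.

    The score is the number of identical residues at matching positions
    (an assumption made for brevity; a scoring matrix could be used instead).
    """
    for i in range(len(a) - k + 1):
        for j in range(len(b) - k + 1):
            score = sum(x == y for x, y in zip(a[i:i + k], b[j:j + k]))
            yield i, j, score


def best_segment_pair(a: str, b: str, k: int) -> tuple[int, int, int]:
    """Return the (i, j, score) triple with the highest score."""
    if k <= 0 or k > min(len(a), len(b)):
        raise ValueError("k must be between 1 and the shorter sequence length")
    return max(segment_scores(a, b, k), key=lambda t: t[2])
```

    The exhaustive comparison costs on the order of len(a) × len(b) × k operations, which is acceptable for single protein pairs but not for database searches.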

  13. Mammalian amyloidogenic proteins promote prion nucleation in yeast.

    PubMed

    Chandramowlishwaran, Pavithra; Sun, Meng; Casey, Kristin L; Romanyuk, Andrey V; Grizel, Anastasiya V; Sopova, Julia V; Rubel, Aleksandr A; Nussbaum-Krammer, Carmen; Vorberg, Ina M; Chernoff, Yury O

    2018-03-02

    Fibrous cross-β aggregates (amyloids) and their transmissible forms (prions) cause diseases in mammals (including humans) and control heritable traits in yeast. Initial nucleation of a yeast prion by transiently overproduced prion-forming protein or its (typically, QN-rich) prion domain is efficient only in the presence of another aggregated (in most cases, QN-rich) protein. Here, we demonstrate that a fusion of the prion domain of yeast protein Sup35 to some non-QN-rich mammalian proteins, associated with amyloid diseases, promotes nucleation of Sup35 prions in the absence of pre-existing aggregates. In contrast, both a fusion of the Sup35 prion domain to a multimeric non-amyloidogenic protein and the expression of a mammalian amyloidogenic protein that is not fused to the Sup35 prion domain failed to promote prion nucleation, further indicating that physical linkage of a mammalian amyloidogenic protein to the prion domain of a yeast protein is required for the nucleation of a yeast prion. Biochemical and cytological approaches confirmed the nucleation of protein aggregates in the yeast cell. Sequence alterations antagonizing or enhancing amyloidogenicity of human amyloid-β (associated with Alzheimer's disease) and mouse prion protein (associated with prion diseases), respectively, antagonized or enhanced nucleation of a yeast prion by these proteins. The yeast-based prion nucleation assay, developed in our work, can be employed for mutational dissection of amyloidogenic proteins. We anticipate that it will aid in the identification of chemicals that influence initial amyloid nucleation and in searching for new amyloidogenic proteins in a variety of proteomes. © 2018 by The American Society for Biochemistry and Molecular Biology, Inc.

  14. Protein Sequencing with Tandem Mass Spectrometry

    NASA Astrophysics Data System (ADS)

    Ziady, Assem G.; Kinter, Michael

    The recent introduction of electrospray ionization techniques that are suitable for peptides and whole proteins has allowed for the design of mass spectrometric protocols that provide accurate sequence information for proteins. The advantages gained by these approaches over traditional Edman Degradation sequencing include faster analysis and femtomole, sometimes attomole, sensitivity. The ability to efficiently identify proteins has allowed investigators to conduct studies on their differential expression or modification in response to various treatments or disease states. In this chapter, we discuss the use of electrospray tandem mass spectrometry, a technique whereby protein-derived peptides are subjected to fragmentation in the gas phase, revealing sequence information for the protein. This powerful technique has been instrumental for the study of proteins and markers associated with various disorders, including heart disease, cancer, and cystic fibrosis. We use the study of protein expression in cystic fibrosis as an example.

  15. EHV-1 EICP22 protein sequences that mediate its physical interaction with the immediate-early protein are not sufficient to enhance the trans-activation activity of the IE protein.

    PubMed

    Derbigny, Wilbert A; Kim, Seong K; Jang, Hyung K; O'Callaghan, Dennis J

    2002-03-20

    The early 293 amino acid EICP22 protein (EICP22P) of equine herpesvirus 1 localizes within the nucleus and functions as an accessory regulatory protein (J. Virol. 68 (1994) 4329). Transient transfection assays indicated that although the EICP22P by itself only minimally trans-activates EHV-1 promoters, the EICP22P functions synergistically with the immediate-early protein (IEP) to enhance expression of EHV-1 early genes (J. Virol. 71 (1997) 1004). We previously showed that the EICP22 protein enhances the DNA-binding activity of the EHV-1 IEP and that it also physically interacts with the IEP (J. Virol. 74 (2000) 1425). In this communication, we employed transient trans-activation assays utilizing EICP22P deletion mutants to address whether the sequences required for EICP22P-IEP physical interactions are essential for EICP22P's ability to interact synergistically with the IEP. Assays employing various classes of the EHV-1 promoters fused to the chloramphenicol acetyl-transferase (CAT) reporter gene indicated that: (1) neither full length nor any of the EICP22P mutants tested was able to overcome repression of the IE promoter elicited by the IEP, (2) the full-length EICP22P interacted synergistically with the IEP to trans-activate the early and late promoters tested, and (3) all of the EICP22P mutants, including those that were able to physically interact with IEP and itself, failed to function synergistically with the IEP to trans-activate representative EHV-1 early and late promoters. The results suggest that EICP22P sequences required for its interaction with the IE protein are not sufficient to mediate its synergistic effect on the trans-activation function of the IEP. The possible explanations as to why sequences in addition to those that mediate EICP22P-IEP interaction and EICP22P self-interactions are essential for the synergistic function of EICP22P are discussed.

  16. The limits of protein sequence comparison?

    PubMed Central

    Pearson, William R; Sierk, Michael L

    2010-01-01

    Modern sequence alignment algorithms are used routinely to identify homologous proteins, proteins that share a common ancestor. Homologous proteins always share similar structures and often have similar functions. Over the past 20 years, sequence comparison has become both more sensitive, largely because of profile-based methods, and more reliable, because of more accurate statistical estimates. As sequence and structure databases become larger, and comparison methods become more powerful, reliable statistical estimates will become even more important for distinguishing similarities that are due to homology from those that are due to analogy (convergence). The newest sequence alignment methods are more sensitive than older methods, but more accurate statistical estimates are needed for their full power to be realized. PMID:15919194

  17. Genetic dissection of the consensus sequence for the class 2 and class 3 flagellar promoters

    PubMed Central

    Wozniak, Christopher E.; Hughes, Kelly T.

    2008-01-01

    Summary Computational searches for DNA binding sites often utilize consensus sequences. These search models make assumptions that the frequency of a base pair in an alignment relates to the base pair’s importance in binding and presume that base pairs contribute independently to the overall interaction with the DNA binding protein. These two assumptions have generally been found to be accurate for DNA binding sites. However, these assumptions are often not satisfied for promoters, which are involved in additional steps in transcription initiation after RNA polymerase has bound to the DNA. To test these assumptions for the flagellar regulatory hierarchy, class 2 and class 3 flagellar promoters were randomly mutagenized in Salmonella. Important positions were then saturated for mutagenesis and compared to scores calculated from the consensus sequence. Double mutants were constructed to determine how mutations combined for each promoter type. Mutations in the binding site for FlhD4C2, the activator of class 2 promoters, better satisfied the assumptions for the binding model than did mutations in the class 3 promoter, which is recognized by the σ28 transcription factor. These in vivo results indicate that the activator sites within flagellar promoters can be modeled using simple assumptions but that the DNA sequences recognized by the flagellar sigma factor require more complex models. PMID:18486950
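
    The two assumptions tested in this abstract (base frequency reflects importance, and positions contribute independently) are the ones built into a simple position weight matrix. The sketch below shows such a model under stated assumptions: the function names, the pseudocount of 1, and the uniform background of 0.25 are illustrative choices, not details taken from the study.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_pwm(sites: list[str], pseudocount: float = 1.0,
              background: float = 0.25) -> list[dict[str, float]]:
    """Build a log-odds position weight matrix from aligned binding sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for column in zip(*sites):
        counts = Counter(column)
        pwm.append({
            b: math.log2((counts[b] + pseudocount) / total / background)
            for b in BASES
        })
    return pwm


def consensus(pwm: list[dict[str, float]]) -> str:
    """Return the highest-scoring base at each position."""
    return "".join(max(BASES, key=col.get) for col in pwm)


def score(pwm: list[dict[str, float]], seq: str) -> float:
    """Sum per-position scores; this sum is the independence assumption."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match the matrix length")
    return sum(col[b] for col, b in zip(pwm, seq))
```

    Under this model the score change of a double mutant is exactly the sum of the two single-mutant changes, so measured double mutants that deviate from additivity indicate that a more complex model is needed.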

  18. Rosetta stone method for detecting protein function and protein-protein interactions from genome sequences

    DOEpatents

    Eisenberg, David; Marcotte, Edward M.; Pellegrini, Matteo; Thompson, Michael J.; Yeates, Todd O.

    2002-10-15

    A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequence of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method a combination of the above two methods is used to predict functional links.
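
    The phylogenetic profile described in this abstract can be represented as a presence/absence vector per protein. The sketch below builds such profiles and groups proteins whose profiles are identical. It is a minimal sketch under stated assumptions: the function names and input layout are hypothetical, and treating identical profiles as candidate functional links is a simplification, since the abstract states only that the profile provides information about such links.

```python
from collections import defaultdict


def build_profiles(presence: dict[str, set[str]],
                   genomes: list[str]) -> dict[str, tuple[int, ...]]:
    """Map each protein to a 0/1 vector over genomes (1 = homolog present)."""
    return {
        protein: tuple(int(g in found) for g in genomes)
        for protein, found in presence.items()
    }


def group_by_profile(profiles: dict[str, tuple[int, ...]]) -> list[list[str]]:
    """Return groups of two or more proteins sharing an identical profile."""
    groups = defaultdict(list)
    for protein, profile in profiles.items():
        groups[profile].append(protein)
    return [sorted(members) for members in groups.values() if len(members) > 1]
```

    A protein present in every genome, or in none, carries no discriminating signal, so in practice such uninformative profiles would likely be filtered out before grouping.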

  19. MALDI Top-Down sequencing: calling N- and C-terminal protein sequences with high confidence and speed.

    PubMed

    Suckau, Detlev; Resemann, Anja

    2009-12-01

    The ability to match Top-Down protein sequencing (TDS) results by MALDI-TOF to protein sequences by classical protein database searching was evaluated in this work. These analyses yielded the protein identity, the simultaneous assignment of the N- and C-termini, and protein sequences of up to 70 residues from either terminus. In combination with de novo sequencing using the MALDI-TDS data, even fusion proteins were assigned and the detailed sequence around the fusion site was elucidated. MALDI-TDS made it possible to match protein sequences quickly and to validate recombinant protein structures (in particular, protein termini) at the level of undigested proteins.

  20. Nuclear proteins that bind the human gamma-globin gene promoter: alterations in binding produced by point mutations associated with hereditary persistence of fetal hemoglobin.

    PubMed Central

    Gumucio, D L; Rood, K L; Gray, T A; Riordan, M F; Sartor, C I; Collins, F S

    1988-01-01

    The molecular mechanisms responsible for the human fetal-to-adult hemoglobin switch have not yet been elucidated. Point mutations identified in the promoter regions of gamma-globin genes from individuals with nondeletion hereditary persistence of fetal hemoglobin (HPFH) may mark cis-acting sequences important for this switch, and the trans-acting factors which interact with these sequences may be integral parts in the puzzle of gamma-globin gene regulation. We have used gel retardation and footprinting strategies to define nuclear proteins which bind to the normal gamma-globin promoter and to determine the effect of HPFH mutations on the binding of a subset of these proteins. We have identified five proteins in human erythroleukemia cells (K562 and HEL) which bind to the proximal promoter region of the normal gamma-globin gene. One factor, gamma CAAT, binds the duplicated CCAAT box sequences; the -117 HPFH mutation increases the affinity of interaction between gamma CAAT and its cognate site. Two proteins, gamma CAC1 and gamma CAC2, bind the CACCC sequence. These proteins require divalent cations for binding. The -175 HPFH mutation interferes with the binding of a fourth protein, gamma OBP, which binds an octamer sequence (ATGCAAAT) in the normal gamma-globin promoter. The HPFH phenotype of the -175 mutation indicates that the octamer-binding protein may play a negative regulatory role in this setting. A fifth protein, EF gamma a, binds to sequences which overlap the octamer-binding site. The erythroid-specific distribution of EF gamma a and its close approximation to an apparent repressor-binding site suggest that it may be important in gamma-globin regulation. PMID:2468996

  1. Analysis of sequence repeats of proteins in the PDB.

    PubMed

    Mary Rajathei, David; Selvaraj, Samuel

    2013-12-01

    Internal repeats in protein sequences play a significant role in the evolution of protein structure and function. Applications of different bioinformatics tools help in the identification and characterization of these repeats. In the present study, we analyzed sequence repeats in a non-redundant set of proteins available in the Protein Data Bank (PDB). We used RADAR for detecting internal repeats in a protein, PDBeFOLD for assessing structural similarity, PDBsum for finding functional involvement and Pfam for domain assignment of the repeats in a protein. Through the analysis of sequence repeats, we found that the identity of the sequence repeats falls in the range of 20-40% and that the superimposed structures of most of the sequence repeats maintain similar overall folding. Analysis of sequence repeats at the functional level reveals that most of the sequence repeats are involved in the function of the protein through functionally involved residues in the repeat regions. We also found that sequence repeats in single and two domain proteins often contained conserved sequence motifs for the function of the domain. Copyright © 2013 Elsevier Ltd. All rights reserved.

  2. Metagenome assembly through clustering of next-generation sequencing data using protein sequences.

    PubMed

    Sim, Mikang; Kim, Jaebum

    2015-02-01

    The study of environmental microbial communities, called metagenomics, has gained a lot of attention because of the recent advances in next-generation sequencing (NGS) technologies. Microbes play a critical role in changing their environments, and the mode of their effect can be solved by investigating metagenomes. However, the difficulty of metagenomes, such as the combination of multiple microbes and different species abundance, makes metagenome assembly tasks more challenging. In this paper, we developed a new metagenome assembly method by utilizing protein sequences, in addition to the NGS read sequences. Our method (i) builds read clusters by using mapping information against available protein sequences, and (ii) creates contig sequences by finding consensus sequences through probabilistic choices from the read clusters. By using simulated NGS read sequences from real microbial genome sequences, we evaluated our method in comparison with four existing assembly programs. We found that our method could generate relatively long and accurate metagenome assemblies, indicating that the idea of using protein sequences, as a guide for the assembly, is promising. Copyright © 2015 Elsevier B.V. All rights reserved.

  3. Use of designed sequences in protein structure recognition.

    PubMed

    Kumar, Gayatri; Mudgal, Richa; Srinivasan, Narayanaswamy; Sandhya, Sankaran

    2018-05-09

    Knowledge of the protein structure is a pre-requisite for improved understanding of molecular function. The gap in the sequence-structure space has increased in the post-genomic era. Grouping related protein sequences into families can aid in narrowing the gap. In the Pfam database, structure description is provided for part or full-length proteins of 7726 families. For the remaining 52% of the families, information on 3-D structure is not yet available. We use the computationally designed sequences that are intermediately related to two protein domain families, which are already known to share the same fold. These strategically designed sequences enable detection of distant relationships and here, we have employed them for the purpose of structure recognition of protein families of yet unknown structure. We first measured the success rate of our approach using a dataset of protein families of known fold and achieved a success rate of 88%. Next, for 1392 families of yet unknown structure, we made structural assignments for part/full length of the proteins. Fold association for 423 domains of unknown function (DUFs) are provided as a step towards functional annotation. The results indicate that knowledge-based filling of gaps in protein sequence space is a lucrative approach for structure recognition. Such sequences assist in traversal through protein sequence space and effectively function as 'linkers', where natural linkers between distant proteins are unavailable. This article was reviewed by Oliviero Carugo, Christine Orengo and Srikrishna Subramanian.

  4. MIPS: a database for genomes and protein sequences.

    PubMed Central

    Mewes, H W; Heumann, K; Kaps, A; Mayer, K; Pfeiffer, F; Stocker, S; Frishman, D

    1999-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Martinsried near Munich, Germany, develops and maintains genome oriented databases. It is commonplace that the amount of sequence data available increases rapidly, but not the capacity of qualified manual annotation at the sequence databases. Therefore, our strategy aims to cope with the data stream by the comprehensive application of analysis tools to sequences of complete genomes, the systematic classification of protein sequences and the active support of sequence analysis and functional genomics projects. This report describes the systematic and up-to-date analysis of genomes (PEDANT), a comprehensive database of the yeast genome (MYGD), a database reflecting the progress in sequencing the Arabidopsis thaliana genome (MATD), the database of assembled, annotated human EST clusters (MEST), and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). MIPS provides access through its WWW server (http://www.mips.biochem.mpg.de) to a spectrum of generic databases, including the above mentioned as well as a database of protein families (PROTFAM), the MITOP database, and the all-against-all FASTA database. PMID:9847138

  5. Shotgun protein sequencing: assembly of peptide tandem mass spectra from mixtures of modified proteins.

    PubMed

    Bandeira, Nuno; Clauser, Karl R; Pevzner, Pavel A

    2007-07-01

    Despite significant advances in the identification of known proteins, the analysis of unknown proteins by MS/MS still remains a challenging open problem. Although Klaus Biemann recognized the potential of MS/MS for sequencing of unknown proteins in the 1980s, low throughput Edman degradation followed by cloning still remains the main method to sequence unknown proteins. The automated interpretation of MS/MS spectra has been limited by a focus on individual spectra and has not capitalized on the information contained in spectra of overlapping peptides. Indeed the powerful shotgun DNA sequencing strategies have not been extended to automated protein sequencing. We demonstrate, for the first time, the feasibility of automated shotgun protein sequencing of protein mixtures by utilizing MS/MS spectra of overlapping and possibly modified peptides generated via multiple proteases of different specificities. We validate this approach by generating highly accurate de novo reconstructions of multiple regions of various proteins in western diamondback rattlesnake venom. We further argue that shotgun protein sequencing has the potential to overcome the limitations of current protein sequencing approaches and thus catalyze the otherwise impractical applications of proteomics methodologies in studies of unknown proteins.

  6. Neutrality and evolvability of designed protein sequences

    NASA Astrophysics Data System (ADS)

    Bhattacherjee, Arnab; Biswas, Parbati

    2010-07-01

    The effect of foldability on a protein’s evolvability is analyzed by a two-pronged approach consisting of a self-consistent mean-field theory and Monte Carlo simulations. Theory and simulation models representing protein sequences with binary patterning of amino acid residues compatible with a particular foldability criterion are used. This generalized foldability criterion is derived using the high temperature cumulant expansion approximating the free energy of folding. The effect of cumulative point mutations on these designed proteins is studied under neutral conditions. The robustness, i.e., a protein’s ability to tolerate random point mutations, is determined with a selective pressure of stability (ΔΔG) for the theory-designed sequences, which are found to be more robust than the Monte Carlo and mean-field-biased Monte Carlo generated sequences. The results show that this foldability criterion selects viable protein sequences more effectively than the Monte Carlo method, which has a marked effect on how the selective pressure shapes the evolutionary sequence space. These observations may impact de novo sequence design and its applications in protein engineering.

  7. Porcine MYF6 gene: sequence, homology analysis, and variation in the promoter region.

    PubMed

    Wyszyńska-Koko, J; Kurył, J

    2004-01-01

    The MYF6 gene codes for a bHLH transcription factor belonging to the MyoD family. Its expression accompanies the processes of differentiation and maturation of myotubes during embryogenesis and continues at a relatively high level after birth, affecting the muscle phenotype. The porcine MYF6 gene was amplified and sequenced and compared with MYF6 gene sequences of other species. The amino acid sequence was deduced and an interspecies homology analysis was performed. Myf-6 protein shows high conservation among species, with 99 and 97% identity when comparing pig with cow and human, respectively, and 93% when comparing pig with mouse and rat. A single nucleotide polymorphism (SNP) was revealed within the promoter region, which appeared to be a T→C transition recognized by the MspI restriction enzyme.
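
    Detecting a SNP with a restriction enzyme, as described above, can be sketched in a few lines. MspI recognizes CCGG and cuts after the first C. The 20-bp amplicon below is hypothetical (it is not the MYF6 promoter), and the sketch assumes that the C allele is the one that creates the site, which the abstract does not state explicitly.

```python
MSPI_SITE = "CCGG"  # MspI recognition sequence; the enzyme cuts C^CGG

def find_sites(seq, site=MSPI_SITE):
    """Return 0-based start positions of every occurrence of `site`."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions

def fragment_lengths(seq, site=MSPI_SITE, cut_offset=1):
    """Fragment lengths after complete digestion of a linear amplicon."""
    cuts = [p + cut_offset for p in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]

# Hypothetical alleles differing by a single T -> C transition at index 7.
allele_t = "ATGCATCTGGATTACGATCA"  # CTGG: no MspI site
allele_c = "ATGCATCCGGATTACGATCA"  # CCGG: MspI site created

uncut = fragment_lengths(allele_t)
cut = fragment_lengths(allele_c)
```

    On a gel, the two alleles would then be told apart by fragment pattern: one uncut band for the T allele and two shorter bands for the C allele.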

  8. Genome Sequence of Bacillus megaterium Strain YC4-R4, a Plant Growth-Promoting Rhizobacterium Isolated from a High-Salinity Environment.

    PubMed

    Vílchez, Juan Ignacio; Tang, Qiming; Kaushal, Richa; Wang, Wei; Lv, Suhui; He, Danxia; Chu, Zhaoqing; Zhang, Heng; Liu, Renyi; Zhang, Huiming

    2018-06-21

    Here, we report the complete genome sequence for Bacillus megaterium strain YC4-R4, a highly salt-tolerant rhizobacterium that promotes growth in plants. The sequencing process was performed by combining pyrosequencing and single-molecule sequencing techniques. The complete genome is estimated to be approximately 5.44 Mb, containing a total of 5,673 predicted protein-coding DNA sequences (CDSs). Copyright © 2018 Vílchez et al.

  9. Promoter activity of polypyrimidine tract-binding protein genes of potato responds to environmental cues.

    PubMed

    Butler, Nathaniel M; Hannapel, David J

    2012-12-01

    Polypyrimidine tract-binding (PTB) proteins are RNA-binding proteins that target specific RNAs for post-transcriptional processing by binding cytosine/uracil motifs. PTBs have established functions in a range of RNA processes including splicing, translation, stability and long-distance transport. Six PTB-like genes identified in potato have been grouped into two clades based on homology to other known plant PTBs. StPTB1 and StPTB6 are closely related to a PTB protein discovered in pumpkin, designated CmRBP50, and contain four canonical RNA-recognition motifs. CmRBP50 is expressed in phloem tissues and functions as the core protein of a phloem-mobile RNA/protein complex. Sequence from the potato genome database was used to clone the upstream sequence of these two PTB genes and analyzed to identify conserved cis-elements. The promoter of StPTB6 was enriched for regulatory elements for light and sucrose induction and defense. Upstream sequence of both PTB genes was fused to β-glucuronidase and monitored in transgenic potato lines. In whole plants, the StPTB1 promoter was most active in leaf veins and petioles, whereas StPTB6 was most active in leaf mesophyll. Both genes are active in new tubers and tuber sprouts. StPTB6 expression was induced in stems and stolon sections in response to sucrose and in leaves or petioles in response to light, heat, drought and mechanical wounding. These results show that CmRBP50-like genes of potato exhibit distinct expression patterns and respond to both developmental and environmental cues.

  10. Sequence Complexity of Amyloidogenic Regions in Intrinsically Disordered Human Proteins

    PubMed Central

    Das, Swagata; Pal, Uttam; Das, Supriya; Bagga, Khyati; Roy, Anupam; Mrigwani, Arpita; Maiti, Nakul C.

    2014-01-01

    An amyloidogenic region (AR) in a protein sequence plays a significant role in protein aggregation and amyloid formation. We have investigated the sequence complexity of ARs present in intrinsically disordered human proteins. More than 80% of human proteins in the disordered protein databases (DisProt+IDEAL) contained one or more ARs. AR content in the protein sequence decreased as protein disorder decreased. A probability density distribution analysis and discrete analysis of AR sequences showed that ~8% of the residues in a protein sequence were in ARs and that an AR was on average 8 residues long. The residues in ARs were high in sequence complexity, and ARs seldom overlapped with low complexity regions (LCRs), which were abundant in disordered proteins. The sequences in ARs showed mixed conformational adaptability towards α-helix, β-sheet/strand and coil conformations. PMID:24594841
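
    The abstract does not name the complexity measure that was used. As an illustration only, the sketch below uses Shannon entropy of residue composition, a common proxy for sequence complexity, over a sliding window. The window length of 8 is borrowed from the average AR length reported above; the function names are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(segment):
    """Shannon entropy (bits per residue) of the residue composition."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def window_complexity(seq, window=8):
    """Entropy of every length-`window` segment sliding along `seq`."""
    return [shannon_entropy(seq[i:i + window])
            for i in range(len(seq) - window + 1)]

low = shannon_entropy("QQQQQQQQ")   # homopolymer: lowest possible complexity
high = shannon_entropy("ACDEFGHI")  # eight distinct residues: log2(8) bits
```

    Under this measure, a low complexity region would show up as a run of windows with entropy near zero, whereas the ARs described above would be expected to score high.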

  11. The Complete Genome Sequence of the Plant Growth-Promoting Bacterium Pseudomonas sp. UW4

    PubMed Central

    Duan, Jin; Jiang, Wei; Cheng, Zhenyu; Heikkila, John J.; Glick, Bernard R.

    2013-01-01

    The plant growth-promoting bacterium (PGPB) Pseudomonas sp. UW4, previously isolated from the rhizosphere of common reeds growing on the campus of the University of Waterloo, promotes plant growth in the presence of different environmental stresses, such as flooding, high concentrations of salt, cold, heavy metals, drought and phytopathogens. In this work, the genome sequence of UW4 was obtained by pyrosequencing and the gaps between the contigs were closed by directed PCR. The P. sp. UW4 genome contains a single circular chromosome that is 6,183,388 bp with a 60.05% G+C content. The bacterial genome contains 5,423 predicted protein-coding sequences that occupy 87.2% of the genome. Nineteen genomic islands (GIs) were predicted and thirty one complete putative insertion sequences were identified. Genes potentially involved in plant growth promotion such as indole-3-acetic acid (IAA) biosynthesis, trehalose production, siderophore production, acetoin synthesis, and phosphate solubilization were determined. Moreover, genes that contribute to the environmental fitness of UW4 were also observed including genes responsible for heavy metal resistance such as nickel, copper, cadmium, zinc, molybdate, cobalt, arsenate, and chromate. Whole-genome comparison with other completely sequenced Pseudomonas strains and phylogeny of four concatenated “housekeeping” genes (16S rRNA, gyrB, rpoB and rpoD) of 128 Pseudomonas strains revealed that UW4 belongs to the fluorescens group, jessenii subgroup. PMID:23516524

  12. Predicting Protein-Protein Interactions by Combing Various Sequence-Derived.

    PubMed

    Zhao, Xiao-Wei; Ma, Zhi-Qiang; Yin, Ming-Hao

    2011-09-20

    Knowledge of protein-protein interactions (PPIs) plays an important role in constructing protein interaction networks and understanding the general machineries of biological systems. In this study, a new method is proposed to predict PPIs using a comprehensive set of 930 features based only on sequence information. These features measure, from different aspects, the interactions between residues a certain distance apart in the protein sequences. To achieve better performance, principal component analysis (PCA) is first employed to obtain an optimized feature subset. Then, the resulting 67-dimensional feature vectors are fed to a Support Vector Machine (SVM). Experimental results on Drosophila melanogaster and Helicobacter pylori datasets show that our method is very promising for predicting PPIs and may at least be a useful supplement to existing methods.

  13. Differential effects of simple repeating DNA sequences on gene expression from the SV40 early promoter.

    PubMed

    Amirhaeri, S; Wohlrab, F; Wells, R D

    1995-02-17

    The influence of simple repeat sequences, cloned into different positions relative to the SV40 early promoter/enhancer, on the transient expression of the chloramphenicol acetyltransferase (CAT) gene was investigated. Insertion of (G)29.(C)29 in either orientation into the 5'-untranslated region of the CAT gene reduced expression in CV-1 cells 50-100 fold when compared with controls with random sequence inserts. Analysis of CAT-specific mRNA levels demonstrated that the effect was due to a reduction of CAT mRNA production rather than to posttranscriptional events. In contrast, insertion of the same insert in either orientation upstream of the promoter-enhancer or downstream of the gene stimulated gene expression 2-3-fold. These effects could be reversed by cotransfection of a competitor plasmid carrying (G)25.(C)25 sequences. The results suggest that a G.C-binding transcription factor modulates gene expression in this system and that promoter strength can be regulated by providing protein-binding sites in trans. Although constructs containing longer tracts of alternating (C-G), (T-G), or (A-T) sequences inhibited CAT expression when inserted in the 5'-untranslated region of the CAT gene, the amount of CAT mRNA was unaffected. Hence, these inhibitions must be due to posttranscriptional events, presumably at the level of translation. These effects of microsatellite sequences on gene expression are discussed with respect to recent data on related simple repeat sequences which cause several human genetic diseases.

  14. Sequencing proteins with transverse ionic transport in nanochannels.

    PubMed

    Boynton, Paul; Di Ventra, Massimiliano

    2016-05-03

    De novo protein sequencing is essential for understanding cellular processes that govern the function of living organisms and all sequence modifications that occur after a protein has been constructed from its corresponding DNA code. By obtaining the order of the amino acids that compose a given protein one can then determine both its secondary and tertiary structures through structure prediction, which is used to create models for protein aggregation diseases such as Alzheimer's Disease. Here, we propose a new technique for de novo protein sequencing that involves translocating a polypeptide through a synthetic nanochannel and measuring the ionic current of each amino acid through an intersecting perpendicular nanochannel. We find that the distribution of ionic currents for each of the 20 proteinogenic amino acids encoded by eukaryotic genes is statistically distinct, showing this technique's potential for de novo protein sequencing.

  15. "Sequence space soup" of proteins and copolymers

    NASA Astrophysics Data System (ADS)

    Chan, Hue Sun; Dill, Ken A.

    1991-09-01

    To study the protein folding problem, we use exhaustive computer enumeration to explore "sequence space soup," an imaginary solution containing the "native" conformations (i.e., of lowest free energy) under folding conditions, of every possible copolymer sequence. The model is of short self-avoiding chains of hydrophobic (H) and polar (P) monomers configured on the two-dimensional square lattice. By exhaustive enumeration, we identify all native structures for every possible sequence. We find that random sequences of H/P copolymers will bear striking resemblance to known proteins: Most sequences under folding conditions will be approximately as compact as known proteins, will have considerable amounts of secondary structure, and it is most probable that an arbitrary sequence will fold to a number of lowest free energy conformations that is of order one. In these respects, this simple model shows that proteinlike behavior should arise simply in copolymers in which one monomer type is highly solvent averse. It suggests that the structures and uniquenesses of native proteins are not consequences of having 20 different monomer types, or of unique properties of amino acid monomers with regard to special packing or interactions, and thus that simple copolymers might be designable to collapse to proteinlike structures and properties. A good strategy for designing a sequence to have a minimum possible number of native states is to strategically insert many P monomers. Thus known proteins may be marginally stable due to a balance: More H residues stabilize the desired native state, but more P residues prevent simultaneous stabilization of undesired native states.
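
    The exhaustive enumeration described above can be sketched for very short chains. This is a minimal illustration, not the authors' program: it assumes the usual HP energy function, in which each H-H pair that is adjacent on the lattice but not along the chain contributes one favorable contact, so native conformations are those with the most such contacts. Rotations and reflections are removed by fixing the first bond and the direction of the first turn.

```python
def conformations(n):
    """Yield self-avoiding walks of n >= 2 monomers on the square lattice.

    The first bond is fixed along +x and the first off-axis step must be +y,
    so each shape is produced once (no rotated or mirrored copies).
    """
    moves = [(1, 0), (0, 1), (-1, 0), (0, -1)]

    def extend(path, turned):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in moves:
            if not turned and dy == -1:
                continue  # mirror image of a walk that turns +y first
            nxt = (x + dx, y + dy)
            if nxt in path:
                continue  # site already occupied: not self-avoiding
            path.append(nxt)
            yield from extend(path, turned or dy != 0)
            path.pop()

    yield from extend([(0, 0), (1, 0)], False)

def hh_contacts(seq, path):
    """Count H-H pairs that are lattice neighbours but not chain neighbours."""
    pos = {p: i for i, p in enumerate(path)}
    contacts = 0
    for i, (x, y) in enumerate(path):
        if seq[i] != "H":
            continue
        for nb in ((x + 1, y), (x, y + 1)):  # look +x and +y: each pair once
            j = pos.get(nb)
            if j is not None and seq[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return contacts

def native_states(seq):
    """Return (maximum H-H contacts, conformations that achieve it)."""
    best, natives = -1, []
    for path in conformations(len(seq)):
        c = hh_contacts(seq, path)
        if c > best:
            best, natives = c, [path]
        elif c == best:
            natives.append(path)
    return best, natives

best, natives = native_states("HPPH")
```

    For the four-monomer chain HPPH there are five distinct shapes, and only the U-shaped one brings the two H monomers into contact, so the sequence has a single native state. The number of walks grows exponentially with chain length, which is why this approach is limited to short chains.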

  16. Computationally mapping sequence space to understand evolutionary protein engineering.

    PubMed

    Armstrong, Kathryn A; Tidor, Bruce

    2008-01-01

    Evolutionary protein engineering has been dramatically successful, producing a wide variety of new proteins with altered stability, binding affinity, and enzymatic activity. However, the success of such procedures is often unreliable, and the impact of the choice of protein, engineering goal, and evolutionary procedure is not well understood. We have created a framework for understanding aspects of the protein engineering process by computationally mapping regions of feasible sequence space for three small proteins using structure-based design protocols. We then tested the ability of different evolutionary search strategies to explore these sequence spaces. The results point to a non-intuitive relationship between the error-prone PCR mutation rate and the number of rounds of replication. The evolutionary relationships among feasible sequences reveal hub-like sequences that serve as particularly fruitful starting sequences for evolutionary search. Moreover, genetic recombination procedures were examined, and tradeoffs relating sequence diversity and search efficiency were identified. This framework allows us to consider the impact of protein structure on the allowed sequence space and therefore on the challenges that each protein presents to error-prone PCR and genetic recombination procedures.

  17. Protein Sequence Classification with Improved Extreme Learning Machine Algorithms

    PubMed Central

    2014-01-01

    Precisely classifying a protein sequence from a large biological protein sequence database plays an important role in developing competitive pharmacological products. Conventional methods, which compare the unseen sequence with all the identified protein sequences and return the category index of the protein with the highest similarity score, are usually time-consuming. Therefore, it is urgent and necessary to build an efficient protein sequence classification system. In this paper, we study the performance of protein sequence classification using single-hidden-layer feedforward networks (SLFNs). The recent efficient extreme learning machine (ELM) and its variants are utilized as the training algorithms. The optimal pruned ELM (OP-ELM) is employed for protein sequence classification for the first time in this paper. To further enhance the performance, an ensemble-based SLFN structure is constructed in which multiple SLFNs with the same number of hidden nodes and the same activation function are used as ensembles. For each ensemble, the same training algorithm is adopted. The final category index is derived using the majority voting method. Two approaches, namely, the basic ELM and the OP-ELM, are adopted for the ensemble-based SLFNs. The performance is analyzed and compared with several existing methods using datasets obtained from the Protein Information Resource center. The experimental results show the superiority of the proposed algorithms. PMID:24795876
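
    The majority voting step described above can be sketched independently of how the ensemble members are trained. In the sketch below the members are arbitrary callables that stand in for trained SLFNs; the toy rules, the labels, and the tie-breaking policy are assumptions made for illustration only.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the most ensemble members.

    Ties are broken in favour of the label that appears first in
    `predictions` (an arbitrary choice; the abstract does not specify one).
    """
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label

def ensemble_predict(classifiers, sequence):
    """Apply each classifier (any callable) and combine by majority vote."""
    return majority_vote([clf(sequence) for clf in classifiers])

# Hypothetical stand-ins for trained networks; the rules are meaningless.
toy_ensemble = [
    lambda s: "globin" if s.count("H") >= 2 else "kinase",
    lambda s: "globin" if len(s) < 10 else "kinase",
    lambda s: "kinase",
]

label = ensemble_predict(toy_ensemble, "MVHLTPHE")
```

    Using an odd number of members is a simple way to avoid ties in a two-class problem.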

  18. Sequences in the intergenic spacer influence RNA Pol I transcription from the human rRNA promoter

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, W.M.; Sylvester, J.E.

    1994-09-01

    In most eucaryotic species, ribosomal genes are tandemly repeated about 100-5000 times per haploid genome. The 43 kb human rDNA repeat consists of a 13 kb coding region for the 18S, 5.8S, 28S ribosomal RNAs (rRNAs) and transcribed spacers separated by a 30 kb intergenic spacer. For species such as frog, mouse and rat, sequences in the intergenic spacer other than the gene promoter have been shown to modulate transcription of the ribosomal gene. These sequences are spacer promoters, enhancers and the terminator for spacer transcription. We are addressing whether the human ribosomal gene promoter is similarly influenced. In-vitro transcription run-off assays have revealed that the 4.5 kb region (CBE), directly upstream of the gene promoter, has cis-stimulation and trans-competition properties. This suggests that the CBE fragment contains an enhancer(s) for ribosomal gene transcription. Further experiments have shown that a fragment (~1.6 kb) within the CBE fragment also has trans-competition function. Deletion subclones of this region are being tested to delineate the exact sequences responsible for these modulating activities. Previous sequence analysis and functional studies have revealed that CBE contains regions of DNA capable of adopting alternative structures such as bent DNA, Z-DNA, and triple-stranded DNA. Whether these structures are required for modulating transcription remains to be determined as does the specific DNA-protein interaction involved.

  19. Sequence space and the ongoing expansion of the protein universe.

    PubMed

    Povolotskaya, Inna S; Kondrashov, Fyodor A

    2010-06-17

    The need to maintain the structural and functional integrity of an evolving protein severely restricts the repertoire of acceptable amino-acid substitutions. However, it is not known whether these restrictions impose a global limit on how far homologous protein sequences can diverge from each other. Here we explore the limits of protein evolution using sequence divergence data. We formulate a computational approach to study the rate of divergence of distant protein sequences and measure this rate for ancient proteins, those that were present in the last universal common ancestor. We show that ancient proteins are still diverging from each other, indicating an ongoing expansion of the protein sequence universe. The slow rate of this divergence is imposed by the sparseness of functional protein sequences in sequence space and the ruggedness of the protein fitness landscape: approximately 98 per cent of sites cannot accept an amino-acid substitution at any given moment but a vast majority of all sites may eventually be permitted to evolve when other, compensatory, changes occur. Thus, approximately 3.5 × 10^9 yr has not been enough to reach the limit of divergent evolution of proteins, and for most proteins the limit of sequence similarity imposed by common function may not exceed that of random sequences.

  20. Dissecting the relationship between protein structure and sequence variation

    NASA Astrophysics Data System (ADS)

    Shahmoradi, Amir; Wilke, Claus; Wilke Lab Team

    2015-03-01

    Over the past decade several independent works have shown that some structural properties of proteins are capable of predicting protein evolution. The strength and significance of these structure-sequence relations, however, appear to vary widely among different proteins, with absolute correlation strengths ranging from 0.1 to 0.8. Here we present the results from a comprehensive search for the potential biophysical and structural determinants of protein evolution by studying more than 200 structural and evolutionary properties in a dataset of 209 monomeric enzymes. We discuss the main protein characteristics responsible for the general patterns of protein evolution, and identify sequence divergence as the main determinant of the strengths of virtually all structure-evolution relationships, explaining ~10-30% of observed variation in sequence-structure relations. In addition to sequence divergence, we identify several protein structural properties that are moderately but significantly coupled with the strength of sequence-structure relations. In particular, proteins with more homogeneous backbone hydrogen bond energies, large fractions of helical secondary structures and low fractions of beta sheets tend to have the strongest sequence-structure relation. BEACON-NSF center for the study of evolution in action.

  1. HomPPI: a class of sequence homology based protein-protein interface prediction methods

    PubMed Central

    2011-01-01

    Background: Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate. Results: We studied more than 300,000 pair-wise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence. Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i) NPS-HomPPI (Non partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (Partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein. Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC) of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers including those that exploit the structure of the query proteins. The partner-specific classifier, PS-HomPPI can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of both the query and the

  2. Can natural proteins designed with 'inverted' peptide sequences adopt native-like protein folds?

    PubMed

    Sridhar, Settu; Guruprasad, Kunchur

    2014-01-01

    We have carried out a systematic computational analysis on a representative dataset of proteins of known three-dimensional structure, in order to evaluate whether it would be possible to 'swap' certain short peptide sequences in naturally occurring proteins with their corresponding 'inverted' peptides and generate 'artificial' proteins that are predicted to retain a native-like protein fold. The analysis of 3,967 representative proteins from the Protein Data Bank revealed 102,677 unique identical inverted peptide sequence pairs that vary in sequence length between 5-12 and 18 amino acid residues. Our analysis illustrates with examples that such 'artificial' proteins may be generated by identifying peptides with 'similar structural environment' and by using comparative protein modeling and validation studies. Our analysis suggests that natural proteins may be tolerant to accommodating such peptides.
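
    The search for inverted peptide pairs described above can be sketched as a k-mer lookup. This is an illustrative reading of the abstract, not the authors' pipeline: a pair is taken to be a peptide and its exact reverse, both occurring somewhere in the dataset, with palindromic peptides excluded. The two toy sequences and their identifiers are hypothetical.

```python
def kmers(seq, k):
    """Set of all length-k peptides in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def inverted_pairs(proteins, k=5):
    """Find peptides whose reversed sequence also occurs in the dataset.

    `proteins` maps an identifier to an amino-acid string. Palindromic
    peptides are skipped and each pair is reported once, sorted.
    """
    seen = set()
    for seq in proteins.values():
        seen |= kmers(seq, k)
    pairs = set()
    for pep in seen:
        rev = pep[::-1]
        if rev != pep and rev in seen:
            pairs.add(tuple(sorted((pep, rev))))
    return sorted(pairs)

# Hypothetical toy dataset: YIAKQ in protA is QKAIY reversed in protB.
proteins = {"protA": "MKTAYIAKQR", "protB": "GGQKAIYSS"}
pairs = inverted_pairs(proteins, k=5)
```

    Repeating the search for each peptide length of interest would cover the range of lengths reported in the abstract; checking the 'similar structural environment' of each pair would need structural data and is outside this sketch.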

  3. Predicting disulfide connectivity from protein sequence using multiple sequence feature vectors and secondary structure.

    PubMed

    Song, Jiangning; Yuan, Zheng; Tan, Hao; Huber, Thomas; Burrage, Kevin

    2007-12-01

    Disulfide bonds are primary covalent crosslinks between two cysteine residues in proteins that play critical roles in stabilizing the protein structures and are commonly found in extracytoplasmic or secreted proteins. In protein folding prediction, the localization of disulfide bonds can greatly reduce the search in conformational space. Therefore, there is a great need to develop computational methods capable of accurately predicting disulfide connectivity patterns in proteins that could have potentially important applications. We have developed a novel method to predict disulfide connectivity patterns from protein primary sequence, using a support vector regression (SVR) approach based on multiple sequence feature vectors and secondary structure predicted by the PSIPRED program. The results indicate that our method could achieve a prediction accuracy of 74.4% and 77.9%, respectively, when averaged on proteins with two to five disulfide bridges using 4-fold cross-validation, measured on the protein and cysteine pair on a well-defined non-homologous dataset. We assessed the effects of different sequence encoding schemes on the prediction performance of disulfide connectivity. It has been shown that the sequence encoding scheme based on multiple sequence feature vectors coupled with predicted secondary structure can significantly improve the prediction accuracy, thus enabling our method to outperform most other currently available predictors. Our work provides a complementary approach to the current algorithms that should be useful in computationally assigning disulfide connectivity patterns and helps in the annotation of protein sequences generated by large-scale whole-genome projects. The prediction web server and Supplementary Material are accessible at http://foo.maths.uq.edu.au/~huber/disulfide

  4. Protein 3D Structure Computed from Evolutionary Sequence Variation

    PubMed Central

    Sheridan, Robert; Hopf, Thomas A.; Pagnani, Andrea; Zecchina, Riccardo; Sander, Chris

    2011-01-01

    The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7–4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org). This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein

  5. Dynamics of domain coverage of the protein sequence universe.

    PubMed

    Rekapalli, Bhanu; Wuichet, Kristin; Peterson, Gregory D; Zhulin, Igor B

    2012-11-16

    The currently known protein sequence space consists of millions of sequences in public databases and is rapidly expanding. Assigning sequences to families leads to a better understanding of protein function and the nature of the protein universe. However, a large portion of the current protein space remains unassigned and is referred to as its "dark matter". Here we suggest that the true size of the "dark matter" is much larger than stated by current definitions. We propose an approach to reducing the size of the "dark matter" by identifying and subtracting regions in protein sequences that are not likely to contain any domain. Recent improvements in computational domain modeling result in a decrease, albeit a slow one, in the relative size of the "dark matter"; however, its absolute size increases substantially with the growth of sequence data.

  6. An RRM–ZnF RNA recognition module targets RBM10 to exonic sequences to promote exon exclusion

    PubMed Central

    Collins, Katherine M.; Kainov, Yaroslav A.; Christodolou, Evangelos; Ray, Debashish; Morris, Quaid; Hughes, Timothy; Taylor, Ian A.

    2017-01-01

    RBM10 is an RNA-binding protein that plays an essential role in development and is frequently mutated in the context of human disease. RBM10 recognizes a diverse set of RNA motifs in introns and exons and regulates alternative splicing. However, the molecular mechanisms underlying this seemingly relaxed sequence specificity are not understood and functional studies have focused on 3΄ intronic sites only. Here, we dissect the RNA code recognized by RBM10 and relate it to the splicing regulatory function of this protein. We show that a two-domain RRM1–ZnF unit recognizes a GGA-centered motif enriched in RBM10 exonic sites with high affinity and specificity and test that the interaction with these exonic sequences promotes exon skipping. Importantly, a second RRM domain (RRM2) of RBM10 recognizes a C-rich sequence, which explains its known interaction with the intronic 3΄ site of NUMB exon 9 contributing to regulation of the Notch pathway in cancer. Together, these findings explain RBM10's broad RNA specificity and suggest that RBM10 functions as a splicing regulator using two RNA-binding units with different specificities to promote exon skipping. PMID:28379442

  7. An RRM-ZnF RNA recognition module targets RBM10 to exonic sequences to promote exon exclusion.

    PubMed

    Collins, Katherine M; Kainov, Yaroslav A; Christodolou, Evangelos; Ray, Debashish; Morris, Quaid; Hughes, Timothy; Taylor, Ian A; Makeyev, Eugene V; Ramos, Andres

    2017-06-20

    RBM10 is an RNA-binding protein that plays an essential role in development and is frequently mutated in the context of human disease. RBM10 recognizes a diverse set of RNA motifs in introns and exons and regulates alternative splicing. However, the molecular mechanisms underlying this seemingly relaxed sequence specificity are not understood and functional studies have focused on 3΄ intronic sites only. Here, we dissect the RNA code recognized by RBM10 and relate it to the splicing regulatory function of this protein. We show that a two-domain RRM1-ZnF unit recognizes a GGA-centered motif enriched in RBM10 exonic sites with high affinity and specificity and test that the interaction with these exonic sequences promotes exon skipping. Importantly, a second RRM domain (RRM2) of RBM10 recognizes a C-rich sequence, which explains its known interaction with the intronic 3΄ site of NUMB exon 9 contributing to regulation of the Notch pathway in cancer. Together, these findings explain RBM10's broad RNA specificity and suggest that RBM10 functions as a splicing regulator using two RNA-binding units with different specificities to promote exon skipping. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  8. Visualizing and Clustering Protein Similarity Networks: Sequences, Structures, and Functions.

    PubMed

    Mai, Te-Lun; Hu, Geng-Ming; Chen, Chi-Ming

    2016-07-01

    Research in the recent decade has demonstrated the usefulness of protein network knowledge in furthering the study of molecular evolution of proteins, understanding the robustness of cells to perturbation, and annotating new protein functions. In this study, we aimed to provide a general clustering approach to visualize the sequence-structure-function relationship of protein networks, and investigate possible causes for inconsistency in the protein classifications based on sequences, structures, and functions. Such visualization of protein networks could facilitate our understanding of the overall relationship among proteins and help researchers comprehend various protein databases. As a demonstration, we clustered 1437 enzymes by their sequences and structures using the minimum span clustering (MSC) method. The general structure of this protein network was delineated at two clustering resolutions, and the second level MSC clustering was found to be highly similar to existing enzyme classifications. The clustering of these enzymes based on sequence, structure, and function information is consistent with each other. For proteases, the Jaccard's similarity coefficient is 0.86 between sequence and function classifications, 0.82 between sequence and structure classifications, and 0.78 between structure and function classifications. From our clustering results, we discussed possible examples of divergent evolution and convergent evolution of enzymes. Our clustering approach provides a panoramic view of the sequence-structure-function network of proteins, helps visualize the relation between related proteins intuitively, and is useful in predicting the structure and function of newly determined protein sequences.
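
    The Jaccard coefficients quoted above compare two classifications of the same proteins. One common way to do this, sketched below, is to compare the sets of protein pairs that each classification places in the same group. Whether the authors used this pair-counting variant is an assumption, and the protein IDs and labels are made up for illustration.

```python
from itertools import combinations


def co_clustered_pairs(labels):
    """Return the set of item pairs that share a cluster label.

    `labels` maps an item ID to its cluster label.
    """
    items = sorted(labels)
    return {
        (a, b)
        for a, b in combinations(items, 2)
        if labels[a] == labels[b]
    }


def jaccard_between_classifications(labels_a, labels_b):
    """Pair-counting Jaccard coefficient between two classifications.

    Both mappings are expected to cover the same item IDs.
    """
    pairs_a = co_clustered_pairs(labels_a)
    pairs_b = co_clustered_pairs(labels_b)
    union = pairs_a | pairs_b
    if not union:
        return 1.0  # neither classification groups anything together
    return len(pairs_a & pairs_b) / len(union)


# Hypothetical toy classifications of five proteins.
by_sequence = {"P1": "s1", "P2": "s1", "P3": "s1", "P4": "s2", "P5": "s2"}
by_function = {"P1": "f1", "P2": "f1", "P3": "f2", "P4": "f2", "P5": "f2"}
print(jaccard_between_classifications(by_sequence, by_function))
```

    A value of 1 means the two classifications group exactly the same pairs together; values near 0 mean they share almost no co-clustered pairs.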

  9. GuiTope: an application for mapping random-sequence peptides to protein sequences.

    PubMed

    Halperin, Rebecca F; Stafford, Phillip; Emery, Jack S; Navalkar, Krupa Arun; Johnston, Stephen Albert

    2012-01-03

    Random-sequence peptide libraries are a commonly used tool to identify novel ligands for binding antibodies, other proteins, and small molecules. It is often of interest to compare the selected peptide sequences to the natural protein binding partners to infer the exact binding site or the importance of particular residues. The ability to search a set of sequences for similarity to a set of peptides may sometimes enable the prediction of an antibody epitope or a novel binding partner. We have developed a software application designed specifically for this task. GuiTope provides a graphical user interface for aligning peptide sequences to protein sequences. All alignment parameters are accessible to the user including the ability to specify the amino acid frequency in the peptide library; these frequencies often differ significantly from those assumed by popular alignment programs. It also includes a novel feature to align di-peptide inversions, which we have found improves the accuracy of antibody epitope prediction from peptide microarray data and shows utility in analyzing phage display datasets. Finally, GuiTope can randomly select peptides from a given library to estimate a null distribution of scores and calculate statistical significance. GuiTope provides a convenient method for comparing selected peptide sequences to protein sequences, including flexible alignment parameters, novel alignment features, ability to search a database, and statistical significance of results. The software is available as an executable (for PC) at http://www.immunosignature.com/software and ongoing updates and source code will be available at sourceforge.net.
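
    The null-distribution step described above can be sketched as a simple resampling test: score the selected peptides against the protein, score many random draws of the same size from the library, and report how often a random draw scores at least as well. The scoring function here (best ungapped identity count) is a deliberately crude stand-in and not GuiTope's alignment scoring; all names and parameters are hypothetical.

```python
import random


def best_ungapped_matches(peptide, protein):
    """Highest number of identical positions over all ungapped placements."""
    best = 0
    for start in range(len(protein) - len(peptide) + 1):
        window = protein[start:start + len(peptide)]
        matches = sum(1 for p, q in zip(peptide, window) if p == q)
        best = max(best, matches)
    return best


def set_score(peptides, protein):
    """Total score of a peptide set against one protein."""
    return sum(best_ungapped_matches(p, protein) for p in peptides)


def empirical_p_value(selected, library, protein, n_draws=1000, seed=0):
    """Estimate significance by drawing random peptide sets from the library.

    Returns (observed score, p-value). The +1 terms keep the estimate
    away from exactly zero for a finite number of draws.
    """
    rng = random.Random(seed)
    observed = set_score(selected, protein)
    at_least = 0
    for _ in range(n_draws):
        draw = rng.sample(library, len(selected))
        if set_score(draw, protein) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_draws + 1)
```

    Drawing the null sets from the actual library, rather than from uniform random sequences, keeps the library's amino acid composition in the null model, which is the point the abstract makes about library-specific frequencies.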

  10. Unique nonstructural proteins of Pneumonia Virus of Mice (PVM) promote degradation of interferon (IFN) pathway components and IFN-stimulated gene proteins.

    PubMed

    Dhar, Jayeeta; Barik, Sailen

    2016-12-01

    Pneumonia Virus of Mice (PVM) is the only virus that shares the Pneumovirus genus of the Paramyxoviridae family with Respiratory Syncytial Virus (RSV). A deadly mouse pathogen, PVM has the potential to serve as a robust animal model of RSV infection, since human RSV does not fully replicate the human pathology in mice. Like RSV, PVM also encodes two nonstructural proteins that have been implicated to suppress the IFN pathway, but surprisingly, they exhibit no sequence similarity with their RSV equivalents. The molecular mechanism of PVM NS function, therefore, remains unknown. Here, we show that recombinant PVM NS proteins degrade the mouse counterparts of the IFN pathway components. Proteasomal degradation appears to be mediated by ubiquitination promoted by PVM NS proteins. Interestingly, NS proteins of PVM lowered the levels of several ISG (IFN-stimulated gene) proteins as well. These results provide a molecular foundation for the mechanisms by which PVM efficiently subverts the IFN response of the murine cell. They also reveal that in spite of their high sequence dissimilarity, the two pneumoviral NS proteins are functionally and mechanistically similar.

  11. KinFin: Software for Taxon-Aware Analysis of Clustered Protein Sequences.

    PubMed

    Laetsch, Dominik R; Blaxter, Mark L

    2017-10-05

    The field of comparative genomics is concerned with the study of similarities and differences between the information encoded in the genomes of organisms. A common approach is to define gene families by clustering protein sequences based on sequence similarity, and analyze protein cluster presence and absence in different species groups as a guide to biology. Due to the high dimensionality of these data, downstream analysis of protein clusters inferred from large numbers of species, or species with many genes, is nontrivial, and few solutions exist for transparent, reproducible, and customizable analyses. We present KinFin, a streamlined software solution capable of integrating data from common file formats and delivering aggregative annotation of protein clusters. KinFin delivers analyses based on systematic taxonomy of the species analyzed, or on user-defined, groupings of taxa, for example, sets based on attributes such as life history traits, organismal phenotypes, or competing phylogenetic hypotheses. Results are reported through graphical and detailed text output files. We illustrate the utility of the KinFin pipeline by addressing questions regarding the biology of filarial nematodes, which include parasites of veterinary and medical importance. We resolve the phylogenetic relationships between the species and explore functional annotation of proteins in clusters in key lineages and between custom taxon sets, identifying gene families of interest. KinFin can easily be integrated into existing comparative genomic workflows, and promotes transparent and reproducible analysis of clustered protein data. Copyright © 2017 Laetsch and Blaxter.

  12. [Screening specific recognition motif of RNA-binding proteins by SELEX in combination with next-generation sequencing technique].

    PubMed

    Zhang, Lu; Xu, Jinhao; Ma, Jinbiao

    2016-07-25

    RNA-binding proteins exert important biological functions by specifically recognizing RNA motifs. SELEX (systematic evolution of ligands by exponential enrichment), an in vitro selection method, can obtain consensus motifs with high affinity and specificity for many target molecules from DNA or RNA libraries. Here, we combined SELEX with next-generation sequencing to study protein-RNA interactions in vitro. A pool of RNAs with 20 bp random sequences was transcribed from a T7 promoter, and the target protein was inserted into a plasmid containing an SBP tag, which can be captured by streptavidin beads. The specific RNA motif can be obtained in only one cycle, which dramatically improves the selection efficiency. Using this method, we found that the human hnRNP A1 RRMs domain (UP1 domain) bound RNA motifs containing AGG and AG sequences. An EMSA experiment indicated that hnRNP A1 RRMs could bind the obtained RNA motif. Taken together, this method provides a rapid and effective way to study the RNA-binding specificity of proteins.

  13. Dynamics of domain coverage of the protein sequence universe

    PubMed Central

    2012-01-01

    Background: The currently known protein sequence space consists of millions of sequences in public databases and is rapidly expanding. Assigning sequences to families leads to a better understanding of protein function and the nature of the protein universe. However, a large portion of the current protein space remains unassigned and is referred to as its “dark matter”. Results: Here we suggest that the true size of “dark matter” is much larger than stated by current definitions. We propose an approach to reducing the size of “dark matter” by identifying and subtracting regions in protein sequences that are not likely to contain any domain. Conclusions: Recent improvements in computational domain modeling result in a decrease, albeit slowly, in the relative size of “dark matter”; however, its absolute size increases substantially with the growth of sequence data. PMID:23157439

  14. Sequence-similar, structure-dissimilar protein pairs in the PDB.

    PubMed

    Kosloff, Mickey; Kolodny, Rachel

    2008-05-01

    It is often assumed that in the Protein Data Bank (PDB), two proteins with similar sequences will also have similar structures. Accordingly, it has proved useful to develop subsets of the PDB from which "redundant" structures have been removed, based on a sequence-based criterion for similarity. Similarly, when predicting protein structure using homology modeling, if a template structure for modeling a target sequence is selected by sequence alone, this implicitly assumes that all sequence-similar templates are equivalent. Here, we show that this assumption is often not correct and that standard approaches to create subsets of the PDB can lead to the loss of structurally and functionally important information. We have carried out sequence-based structural superpositions and geometry-based structural alignments of a large number of protein pairs to determine the extent to which sequence similarity ensures structural similarity. We find many examples where two proteins that are similar in sequence have structures that differ significantly from one another. The source of the structural differences usually has a functional basis. The number of such proteins pairs that are identified and the magnitude of the dissimilarity depend on the approach that is used to calculate the differences; in particular sequence-based structure superpositioning will identify a larger number of structurally dissimilar pairs than geometry-based structural alignments. When two sequences can be aligned in a statistically meaningful way, sequence-based structural superpositioning provides a meaningful measure of structural differences. This approach and geometry-based structure alignments reveal somewhat different information and one or the other might be preferable in a given application. Our results suggest that in some cases, notably homology modeling, the common use of nonredundant datasets, culled from the PDB based on sequence, may mask important structural and functional information. 

  15. MIPS: a database for genomes and protein sequences.

    PubMed

    Mewes, H W; Frishman, D; Güldener, U; Mannhaupt, G; Mayer, K; Mokrejs, M; Morgenstern, B; Münsterkötter, M; Rudd, S; Weil, B

    2002-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz-Netzwerk Bioinformatik) networks. The Arabidopsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91-93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155-158; Barker et al. (2001) Nucleic Acids Res., 29, 29-32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de).

  16. A novel class of plant-specific zinc-dependent DNA-binding protein that binds to A/T-rich DNA sequences

    PubMed Central

    Nagano, Yukio; Furuhashi, Hirofumi; Inaba, Takehito; Sasaki, Yukiko

    2001-01-01

    Complementary DNA encoding a DNA-binding protein, designated PLATZ1 (plant AT-rich sequence- and zinc-binding protein 1), was isolated from peas. The amino acid sequence of the protein is similar to those of other uncharacterized proteins predicted from the genome sequences of higher plants. However, no paralogous sequences have been found outside the plant kingdom. Multiple alignments among these paralogous proteins show that several cysteine and histidine residues are invariant, suggesting that these proteins are a novel class of zinc-dependent DNA-binding proteins with two distantly located regions, C-x2-H-x11-C-x2-C-x(4–5)-C-x2-C-x(3–7)-H-x2-H and C-x2-C-x(10–11)-C-x3-C. In an electrophoretic mobility shift assay, the zinc chelator 1,10-o-phenanthroline inhibited DNA binding, and two distant zinc-binding regions were required for DNA binding. A protein blot with 65ZnCl2 showed that both regions are required for zinc-binding activity. The PLATZ1 protein non-specifically binds to A/T-rich sequences, including the upstream region of the pea GTPase pra2 and plastocyanin petE genes. Expression of the PLATZ1 repressed those of the reporter constructs containing the coding sequence of luciferase gene driven by the cauliflower mosaic virus (CaMV) 35S90 promoter fused to the tandem repeat of the A/T-rich sequences. These results indicate that PLATZ1 is a novel class of plant-specific zinc-dependent DNA-binding protein responsible for A/T-rich sequence-mediated transcriptional repression. PMID:11600698

  17. Coarse-grained sequences for protein folding and design

    PubMed Central

    Brown, Scott; Fawzi, Nicolas J.; Head-Gordon, Teresa

    2003-01-01

    We present the results of sequence design on our off-lattice minimalist model in which no specification of native-state tertiary contacts is needed. We start with a sequence that adopts a target topology and build on it through sequence mutation to produce new sequences that comprise distinct members within a target fold class. In this work, we use the α/β ubiquitin fold class and design two new sequences that, when characterized through folding simulations, reproduce the differences in folding mechanism seen experimentally for proteins L and G. The primary implication of this work is that patterning of hydrophobic and hydrophilic residues is the physical origin for the success of relative contact-order descriptions of folding, and that these physics-based potentials provide a predictive connection between free energy landscapes and amino acid sequence (the original protein folding problem). We present results of the sequence mapping from a 20- to the three-letter code for determining a sequence that folds into the WW domain topology to illustrate future extensions to protein design. PMID:12963815

  18. Coarse-grained sequences for protein folding and design.

    PubMed

    Brown, Scott; Fawzi, Nicolas J; Head-Gordon, Teresa

    2003-09-16

    We present the results of sequence design on our off-lattice minimalist model in which no specification of native-state tertiary contacts is needed. We start with a sequence that adopts a target topology and build on it through sequence mutation to produce new sequences that comprise distinct members within a target fold class. In this work, we use the alpha/beta ubiquitin fold class and design two new sequences that, when characterized through folding simulations, reproduce the differences in folding mechanism seen experimentally for proteins L and G. The primary implication of this work is that patterning of hydrophobic and hydrophilic residues is the physical origin for the success of relative contact-order descriptions of folding, and that these physics-based potentials provide a predictive connection between free energy landscapes and amino acid sequence (the original protein folding problem). We present results of the sequence mapping from a 20- to the three-letter code for determining a sequence that folds into the WW domain topology to illustrate future extensions to protein design.

  19. Protein Structure Determination using Metagenome sequence data

    PubMed Central

    Ovchinnikov, Sergey; Park, Hahnbeom; Varghese, Neha; Huang, Po-Ssu; Pavlopoulos, Georgios A.; Kim, David E.; Kamisetty, Hetunandan; Kyrpides, Nikos C.; Baker, David

    2017-01-01

    Despite decades of work by structural biologists, there are still ~5200 protein families with unknown structure outside the range of comparative modeling. We show that Rosetta structure prediction guided by residue-residue contacts inferred from evolutionary information can accurately model proteins that belong to large families, and that metagenome sequence data more than triples the number of protein families with sufficient sequences for accurate modeling. We then integrate metagenome data, contact based structure matching and Rosetta structure calculations to generate models for 614 protein families with currently unknown structures; 206 are membrane proteins and 137 have folds not represented in the PDB. This approach provides the representative models for large protein families originally envisioned as the goal of the protein structure initiative at a fraction of the cost. PMID:28104891

  20. Mercury BLASTP: Accelerating Protein Sequence Alignment

    PubMed Central

    Jacob, Arpith; Lancaster, Joseph; Buhler, Jeremy; Harris, Brandon; Chamberlain, Roger D.

    2008-01-01

    Large-scale protein sequence comparison is an important but compute-intensive task in molecular biology. BLASTP is the most popular tool for comparative analysis of protein sequences. In recent years, an exponential increase in the size of protein sequence databases has required either exponentially more running time or a cluster of machines to keep pace. To address this problem, we have designed and built a high-performance FPGA-accelerated version of BLASTP, Mercury BLASTP. In this paper, we describe the architecture of the portions of the application that are accelerated in the FPGA, and we also describe the integration of these FPGA-accelerated portions with the existing BLASTP software. We have implemented Mercury BLASTP on a commodity workstation with two Xilinx Virtex-II 6000 FPGAs. We show that the new design runs 11-15 times faster than software BLASTP on a modern CPU while delivering close to 99% identical results. PMID:19492068

  1. Prevalence of transcription promoters within archaeal operons and coding sequences.

    PubMed

    Koide, Tie; Reiss, David J; Bare, J Christopher; Pang, Wyming Lee; Facciotti, Marc T; Schmid, Amy K; Pan, Min; Marzolf, Bruz; Van, Phu T; Lo, Fang-Yin; Pratap, Abhishek; Deutsch, Eric W; Peterson, Amelia; Martin, Dan; Baliga, Nitin S

    2009-01-01

    Despite the knowledge of complex prokaryotic-transcription mechanisms, generalized rules, such as the simplified organization of genes into operons with well-defined promoters and terminators, have had a significant role in systems analysis of regulatory logic in both bacteria and archaea. Here, we have investigated the prevalence of alternate regulatory mechanisms through genome-wide characterization of transcript structures of approximately 64% of all genes, including putative non-coding RNAs in Halobacterium salinarum NRC-1. Our integrative analysis of transcriptome dynamics and protein-DNA interaction data sets showed widespread environment-dependent modulation of operon architectures, transcription initiation and termination inside coding sequences, and extensive overlap in 3' ends of transcripts for many convergently transcribed genes. A significant fraction of these alternate transcriptional events correlate to binding locations of 11 transcription factors and regulators (TFs) inside operons and annotated genes, events usually considered spurious or non-functional. Using experimental validation, we illustrate the prevalence of overlapping genomic signals in archaeal transcription, casting doubt on the general perception of rigid boundaries between coding sequences and regulatory elements.

  2. Screening a yeast promoter library leads to the isolation of the RP29/L32 and SNR17B/RPL37A divergent promoters and the discovery of a gene encoding ribosomal protein L37.

    PubMed

    Santangelo, G M; Tornow, J; McLaughlin, C S; Moldave, K

    1991-08-30

    Two promoters (A7 and A23), isolated at random from the Saccharomyces cerevisiae genome by virtue of their capacity to activate transcription, are identical to known intergenic bidirectional promoters. Sequence analysis of the genomic DNA adjacent to the A7 promoter identified a split gene encoding ribosomal (r) protein L37, which is homologous to the tRNA-binding r-proteins, L35a (from human and rat) and L32 (from frogs).

  3. Transcriptional activation signals found in the Epstein-Barr virus (EBV) latency C promoter are conserved in the latency C promoter sequences from baboon and Rhesus monkey EBV-like lymphocryptoviruses (cercopithicine herpesviruses 12 and 15).

    PubMed

    Fuentes-Pananá, E M; Swaminathan, S; Ling, P D

    1999-01-01

    The Epstein-Barr virus (EBV) EBNA2 protein is a transcriptional activator that controls viral latent gene expression and is essential for EBV-driven B-cell immortalization. EBNA2 is expressed from the viral C promoter (Cp) and regulates its own expression by activating Cp through interaction with the cellular DNA binding protein CBF1. Through regulation of Cp and EBNA2 expression, EBV controls the pattern of latent protein expression and the type of latency established. To gain further insight into the important regulatory elements that modulate Cp usage, we isolated and sequenced the Cp regions corresponding to nucleotides 10251 to 11479 of the EBV genome (-1079 to +144 relative to the transcription initiation site) from the EBV-like lymphocryptoviruses found in baboons (herpesvirus papio; HVP) and Rhesus macaques (RhEBV). Sequence comparison of the approximately 1,230-bp Cp regions from these primate viruses revealed that EBV and HVP Cp sequences are 64% conserved, EBV and RhEBV Cp sequences are 66% conserved, and HVP and RhEBV Cp sequences are 65% conserved relative to each other. Approximately 50% of the residues are conserved among all three sequences, yet all three viruses have retained response elements for glucocorticoids, two positionally conserved CCAAT boxes, and positionally conserved TATA boxes. The putative EBNA2 100-bp enhancers within these promoters contain 54 conserved residues, and the binding sites for CBF1 and CBF2 are well conserved. Cp usage in the HVP- and RhEBV-transformed cell lines was detected by S1 nuclease protection analysis. Transient-transfection analysis showed that promoters of both HVP and RhEBV are responsive to EBNA2 and that they bind CBF1 and CBF2 in gel mobility shift assays. These results suggest that similar mechanisms for regulation of latent gene expression are conserved among the EBV-related lymphocryptoviruses found in nonhuman primates.

  4. Escherichia coli promoter sequences predict in vitro RNA polymerase selectivity.

    PubMed

    Mulligan, M E; Hawley, D K; Entriken, R; McClure, W R

    1984-01-11

    We describe a simple algorithm for computing a homology score for Escherichia coli promoters based on DNA sequence alone. The homology score was related to 31 values, measured in vitro, of RNA polymerase selectivity, which we define as the product K_B k_2, the apparent second order rate constant for open complex formation. We found that promoter strength could be predicted to within a factor of ±4.1 in K_B k_2 over a range of 10^4 in the same parameter. The quantitative evaluation was linked to an automated (Apple II) procedure for searching and evaluating possible promoters in DNA sequence files.
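
    To give a feel for what a sequence-only promoter score looks like, the sketch below counts matches to the widely cited E. coli sigma-70 consensus hexamers (TTGACA at -35, TATAAT at -10) over a small range of spacer lengths. This is a much simpler unweighted match count, not the homology score of the paper; the function names and the spacer range of 15 to 19 bp are assumptions for illustration.

```python
MINUS_35 = "TTGACA"  # sigma-70 -35 consensus hexamer
MINUS_10 = "TATAAT"  # sigma-70 -10 consensus hexamer


def hexamer_matches(site, consensus):
    """Number of positions at which a site agrees with the consensus."""
    return sum(1 for a, b in zip(site, consensus) if a == b)


def best_promoter_hit(seq, spacers=range(15, 20)):
    """Scan a sequence for the best-matching -35/-10 hexamer pair.

    Returns (score, start of -35 hexamer, spacer length), with score out
    of 12, or None if the sequence is too short to hold both hexamers.
    Ties keep the first hit found.
    """
    seq = seq.upper()
    best = None
    for i in range(len(seq) - 6 + 1):
        m35 = hexamer_matches(seq[i:i + 6], MINUS_35)
        for spacer in spacers:
            j = i + 6 + spacer  # start of the candidate -10 hexamer
            if j + 6 > len(seq):
                continue
            m10 = hexamer_matches(seq[j:j + 6], MINUS_10)
            if best is None or m35 + m10 > best[0]:
                best = (m35 + m10, i, spacer)
    return best


# Synthetic sequence carrying a perfect consensus promoter with a 17 bp spacer.
toy = "GG" + MINUS_35 + "C" * 17 + MINUS_10 + "GG"
print(best_promoter_hit(toy))
```

    A score of this kind would still have to be calibrated against measured promoter strengths, as the abstract describes, before it said anything quantitative.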

  5. Induction and maintenance of DNA methylation in plant promoter sequences by apple latent spherical virus-induced transcriptional gene silencing

    PubMed Central

    Kon, Tatsuya; Yoshikawa, Nobuyuki

    2014-01-01

    Apple latent spherical virus (ALSV) is an efficient virus-induced gene silencing vector in functional genomics analyses of a broad range of plant species. Here, an Agrobacterium-mediated inoculation (agroinoculation) system was developed for the ALSV vector, and virus-induced transcriptional gene silencing (VITGS) is described in plants infected with the ALSV vector. The cDNAs of ALSV RNA1 and RNA2 were inserted between the cauliflower mosaic virus 35S promoter and the NOS-T sequences in a binary vector pCAMBIA1300 to produce pCALSR1 and pCALSR2-XSB or pCALSR2-XSB/MN. When these vector constructs were agroinoculated into Nicotiana benthamiana plants with a construct expressing a viral silencing suppressor, the infection efficiency of the vectors was 100%. A recombinant ALSV vector carrying part of the 35S promoter sequence induced transcriptional gene silencing of the green fluorescent protein gene in a line of N. benthamiana plants, resulting in the disappearance of green fluorescence of infected plants. Bisulfite sequencing showed that cytosine residues at CG and CHG sites of the 35S promoter sequence were highly methylated in the silenced generation zero plants infected with the ALSV carrying the promoter sequence as well as in progeny. The ALSV-mediated VITGS state was inherited by progeny for multiple generations. In addition, induction of VITGS of an endogenous gene (chalcone synthase-A) was demonstrated in petunia plants infected with an ALSV vector carrying the native promoter sequence. These results suggest that ALSV-based vectors can be applied to study DNA methylation in plant genomes, and provide a useful tool for plant breeding via epigenetic modification. PMID:25426109

  6. Gene Unprediction with Spurio: A tool to identify spurious protein sequences.

    PubMed

    Höps, Wolfram; Jeffryes, Matt; Bateman, Alex

    2018-01-01

    We now have access to the sequences of tens of millions of proteins. These protein sequences are essential for modern molecular biology and computational biology. The vast majority of protein sequences are derived from gene prediction tools and have no experimental supporting evidence for their translation. Despite the increasing accuracy of gene prediction tools there likely exists a large number of spurious protein predictions in the sequence databases. We have developed the Spurio tool to help identify spurious protein predictions in prokaryotes. Spurio searches the query protein sequence against a prokaryotic nucleotide database using tblastn and identifies homologous sequences. The tblastn matches are used to score the query sequence's likelihood of being a spurious protein prediction using a Gaussian process model. The most informative feature is the appearance of stop codons within the presumed translation of homologous DNA sequences. Benchmarking shows that the Spurio tool is able to distinguish spurious from true proteins. However, transposon proteins are prone to be predicted as spurious because of the frequency of degraded homologs found in the DNA sequence databases. Our initial experiments suggest that less than 1% of the proteins in the UniProtKB sequence database are likely to be spurious and that Spurio is able to identify over 60 times more spurious proteins than the AntiFam resource. The Spurio software and source code are available under an MIT license at the following URL: https://bitbucket.org/bateman-group/spurio.
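
    The feature singled out above, stop codons inside the presumed translation of homologous DNA, can be illustrated with a few lines of code. The sketch below only counts in-frame internal stop codons under the standard genetic code; it does not reproduce Spurio's tblastn search or its Gaussian process model, and the function names are hypothetical.

```python
# Stop codons of the standard genetic code. Some prokaryotes use other
# translation tables (for example, TGA can encode tryptophan), which this
# sketch ignores.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_stop_count(dna, frame=0):
    """Count in-frame stop codons, excluding the final codon of the region."""
    dna = dna.upper()
    codons = [dna[i:i + 3] for i in range(frame, len(dna) - 2, 3)]
    return sum(1 for codon in codons[:-1] if codon in STOP_CODONS)


def fraction_with_internal_stops(homolog_regions):
    """Fraction of homologous DNA regions interrupted by a stop codon."""
    if not homolog_regions:
        return 0.0
    flagged = sum(1 for region in homolog_regions
                  if internal_stop_count(region) > 0)
    return flagged / len(homolog_regions)


# Toy regions: the first is an intact ORF, the second has a premature TAG.
regions = ["ATGAAATAA", "ATGTAGAAATAA"]
print(fraction_with_internal_stops(regions))
```

    A high fraction of interrupted homologs is the kind of signal the abstract describes; it also shows why degraded transposon copies in the databases could produce false positives.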

  7. Structure and Sequence Search on Aptamer-Protein Docking

    NASA Astrophysics Data System (ADS)

    Xiao, Jiajie; Bonin, Keith; Guthold, Martin; Salsbury, Freddie

    2015-03-01

    Interactions between proteins and deoxyribonucleic acid (DNA) play a significant role in living systems, especially through gene regulation. However, short nucleic acid sequences (aptamers) with specific binding affinity to specific proteins exhibit clinical potential as therapeutics. Our capillary and gel electrophoresis selection experiments show that specific sequences of aptamers can be selected that bind specific proteins. Computationally, given the experimentally-determined structure and sequence of a thrombin-binding aptamer, we can successfully dock the aptamer onto thrombin in agreement with experimental structures of the complex. In order to further study the conformational flexibility of this thrombin-binding aptamer and to potentially develop a predictive computational model of aptamer-binding, we use GPU-enabled molecular dynamics simulations to both examine the conformational flexibility of the aptamer in the absence of binding to thrombin, and to determine our ability to fold an aptamer. This study should help further de-novo predictions of aptamer sequences by enabling the study of structural and sequence-dependent effects on aptamer-protein docking specificity.

  8. AlignMe—a membrane protein sequence alignment web server

    PubMed Central

    Stamm, Marcus; Staritzbichler, René; Khafizov, Kamil; Forrest, Lucy R.

    2014-01-01

    We present a web server for pair-wise alignment of membrane protein sequences, using the program AlignMe. The server makes available two operational modes of AlignMe: (i) sequence to sequence alignment, taking two sequences in fasta format as input, combining information about each sequence from multiple sources and producing a pair-wise alignment (PW mode); and (ii) alignment of two multiple sequence alignments to create family-averaged hydropathy profile alignments (HP mode). For the PW sequence alignment mode, four different optimized parameter sets are provided, each suited to pairs of sequences with a specific similarity level. These settings utilize different types of inputs: (position-specific) substitution matrices, secondary structure predictions and transmembrane propensities from transmembrane predictions or hydrophobicity scales. In the second (HP) mode, each input multiple sequence alignment is converted into a hydrophobicity profile averaged over the provided set of sequence homologs; the two profiles are then aligned. The HP mode enables qualitative comparison of transmembrane topologies (and therefore potentially of 3D folds) of two membrane proteins, which can be useful if the proteins have low sequence similarity. In summary, the AlignMe web server provides user-friendly access to a set of tools for analysis and comparison of membrane protein sequences. Access is available at http://www.bioinfo.mpg.de/AlignMe PMID:24753425
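
    The family-averaged hydropathy profile used in HP mode can be sketched as follows. This is a minimal illustration under stated assumptions, not AlignMe's implementation: the Kyte-Doolittle scale is swapped in as one example of a hydrophobicity scale, gaps are simply skipped, no window smoothing is applied, and the function name `averaged_hydropathy_profile` is hypothetical.

```python
# Kyte-Doolittle hydropathy values, used here only as an example scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def averaged_hydropathy_profile(msa):
    """Average hydropathy per alignment column, skipping gaps and unknowns."""
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        values = [KD[row[col]] for row in msa if row[col] in KD]
        # Columns made up entirely of gaps get a neutral 0.0 (assumption).
        profile.append(sum(values) / len(values) if values else 0.0)
    return profile


# Toy alignment of three homologs; "-" marks a gap.
toy_msa = ["LIV-K", "LLVAK", "IIVGR"]
print(averaged_hydropathy_profile(toy_msa))
```

    Two such profiles, one per input alignment, would then be aligned against each other; that alignment step is not shown.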

  9. MIPS: a database for genomes and protein sequences

    PubMed Central

    Mewes, H. W.; Frishman, D.; Güldener, U.; Mannhaupt, G.; Mayer, K.; Mokrejs, M.; Morgenstern, B.; Münsterkötter, M.; Rudd, S.; Weil, B.

    2002-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz–Netzwerk Bioinformatik) networks. The Arabidopsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91–93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155–158; Barker et al. (2001) Nucleic Acids Res., 29, 29–32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de). PMID:11752246

  10. MIPS: a database for protein sequences and complete genomes.

    PubMed Central

    Mewes, H W; Hani, J; Pfeiffer, F; Frishman, D

    1998-01-01

    The MIPS group [Munich Information Center for Protein Sequences of the German National Center for Environment and Health (GSF)] at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, is involved in a number of data collection activities, including a comprehensive database of the yeast genome, a database reflecting the progress in sequencing the Arabidopsis thaliana genome, the systematic analysis of other small genomes and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). Through its WWW server (http://www.mips.biochem.mpg.de ) MIPS provides access to a variety of generic databases, including a database of protein families as well as automatically generated data by the systematic application of sequence analysis algorithms. The yeast genome sequence and its related information was also compiled on CD-ROM to provide dynamic interactive access to the 16 chromosomes of the first eukaryotic genome unraveled. PMID:9399795

  11. Evaluating the protein coding potential of exonized transposable element sequences

    PubMed Central

    Piriyapongsa, Jittima; Rutledge, Mark T; Patel, Sanil; Borodovsky, Mark; Jordan, I King

    2007-01-01

    Background Transposable element (TE) sequences, once thought to be merely selfish or parasitic members of the genomic community, have been shown to contribute a wide variety of functional sequences to their host genomes. Analysis of complete genome sequences has turned up numerous cases where TE sequences have been incorporated as exons into mRNAs, and it is widely assumed that such 'exonized' TEs encode protein sequences. However, the extent to which TE-derived sequences actually encode proteins is unknown and a matter of some controversy. We have tried to address this outstanding issue from two perspectives: i-by evaluating ascertainment biases related to the search methods used to uncover TE-derived protein coding sequences (CDS) and ii-through a probabilistic codon-frequency based analysis of the protein coding potential of TE-derived exons. Results We compared the ability of three classes of sequence similarity search methods to detect TE-derived sequences among data sets of experimentally characterized proteins: 1-a profile-based hidden Markov model (HMM) approach, 2-BLAST methods and 3-RepeatMasker. Profile based methods are more sensitive and more selective than the other methods evaluated. However, the application of profile-based search methods to the detection of TE-derived sequences among well-curated experimentally characterized protein data sets did not turn up many more cases than had been previously detected and nowhere near as many cases as recent genome-wide searches have. We observed that the different search methods used were complementary in the sense that they yielded largely non-overlapping sets of hits and differed in their ability to recover known cases of TE-derived CDS. The probabilistic analysis of TE-derived exon sequences indicates that these sequences have low protein coding potential on average. In particular, non-autonomous TEs that do not encode protein sequences, such as Alu elements, are frequently exonized but unlikely to

  12. Conserved interdomain linker promotes phase separation of the multivalent adaptor protein Nck

    PubMed Central

    Banjade, Sudeep; Wu, Qiong; Mittal, Anuradha; Peeples, William B.; Pappu, Rohit V.; Rosen, Michael K.

    2015-01-01

    The organization of membranes, the cytosol, and the nucleus of eukaryotic cells can be controlled through phase separation of lipids, proteins, and nucleic acids. Collective interactions of multivalent molecules mediated by modular binding domains can induce gelation and phase separation in several cytosolic and membrane-associated systems. The adaptor protein Nck has three SRC-homology 3 (SH3) domains that bind multiple proline-rich segments in the actin regulatory protein neuronal Wiskott-Aldrich syndrome protein (N-WASP) and an SH2 domain that binds to multiple phosphotyrosine sites in the adhesion protein nephrin, leading to phase separation. Here, we show that the 50-residue linker between the first two SH3 domains of Nck enhances phase separation of Nck/N-WASP/nephrin assemblies. Two linear motifs within this element, as well as its overall positively charged character, are important for this effect. The linker increases the driving force for self-assembly of Nck, likely through weak interactions with the second SH3 domain, and this effect appears to promote phase separation. The linker sequence is highly conserved, suggesting that the sequence determinants of the driving forces for phase separation may be generally important to Nck functions. Our studies demonstrate that linker regions between modular domains can contribute to the driving forces for self-assembly and phase separation of multivalent proteins. PMID:26553976

  13. Correlation between protein sequence similarity and x-ray diffraction quality in the protein data bank.

    PubMed

    Lu, Hui-Meng; Yin, Da-Chuan; Ye, Ya-Jing; Luo, Hui-Min; Geng, Li-Qiang; Li, Hai-Sheng; Guo, Wei-Hong; Shang, Peng

    2009-01-01

    As the most widely utilized technique to determine the 3-dimensional structure of protein molecules, X-ray crystallography can provide structure of the highest resolution among the developed techniques. The resolution obtained via X-ray crystallography is known to be influenced by many factors, such as the crystal quality, diffraction techniques, and X-ray sources, etc. In this paper, the authors found that the protein sequence could also be one of the factors. We extracted information of the resolution and the sequence of proteins from the Protein Data Bank (PDB), classified the proteins into different clusters according to the sequence similarity, and statistically analyzed the relationship between the sequence similarity and the best resolution obtained. The results showed that there was a pronounced correlation between the sequence similarity and the obtained resolution. These results indicate that protein structure itself is one variable that may affect resolution when X-ray crystallography is used.

  14. Complete genome sequence of the rapeseed plant-growth promoting Serratia plymuthica strain AS9

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Neupane, Saraswoti; Hogberg, Nils; Alstrom, Sadhna

    2012-01-01

    Serratia plymuthica are plant-associated, plant beneficial species belonging to the family Enterobacteriaceae. The members of the genus Serratia are ubiquitous in nature and their life style varies from endophytic to free-living. S. plymuthica AS9 is of special interest for its ability to inhibit fungal pathogens of rapeseed and to promote plant growth. The genome of S. plymuthica AS9 comprises a 5,442,880 bp long circular chromosome that consists of 4,952 protein-coding genes, 87 tRNA genes and 7 rRNA operons. This genome is part of the project entitled Genomics of four rapeseed plant growth promoting bacteria with antagonistic effect on plant pathogens awarded through the 2010 DOE-JGI Community Sequencing Program (CSP2010).

  15. Rat leucine-rich protein binds and activates the promoter of the beta isoform of Ca2+/calmodulin-dependent protein kinase II gene.

    PubMed

    Ochiai, Nagahiro; Masumoto, Shuji; Sakagami, Hiroyuki; Yoshimura, Yoshiyuki; Yamauchi, Takashi

    2007-05-01

    We previously found the neuronal cell-type specific promoter and binding partner of the beta isoform of Ca(2+)/calmodulin-dependent protein kinase II (beta CaM kinase II) in rat brain [Donai, H., Morinaga, H., Yamauchi, T., 2001. Genomic organization and neuronal cell type specific promoter activity of beta isoform of Ca(2+)/calmodulin-dependent protein kinase II of rat brain. Mol. Brain Res. 94, 35-47]. In the present study, we purified a protein that binds specifically a promoter region of beta CaM kinase II gene from a nuclear extract of the rat cerebellum using DEAE-cellulose column chromatography, ammonium sulfate fractionation, gel filtration and polyacrylamide gel electrophoresis. The purified protein was identified as rat leucine-rich protein 157 (rLRP157) using tandem mass spectrometry. Then, we prepared its cDNA by reverse transcriptase-polymerase chain reaction (RT-PCR) from poly(A)(+)RNA of rat cerebellum. The rLRP157 cDNA was introduced into mouse neuroblastoma × rat glioma hybrid NG108-15 cells, and cells stably expressing rLRP157 (NG/LRP cells) were isolated. Binding of rLRP157 with the promoter sequence was confirmed by electrophoretic mobility shift assay using nuclear extract of NG/LRP cells. A luciferase reporter gene containing a promoter of beta CaM kinase II was transiently expressed in NG/LRP cells. Under the conditions, the promoter activity was enhanced about 2.6-fold in NG/LRP cells as compared with wild-type cells. The expression of rLRP157 mRNA was paralleled with that of beta CaM kinase II in the adult and embryo rat brain detected by in situ hybridization. Nuclear localization of rLRP157 was confirmed using GFP-rLRP157 fusion protein investigated under a confocal microscope. These results indicate that rLRP157 is one of the proteins binding to, and regulating the activity of, the promoter of beta CaM kinase II.

  16. SIBIS: a Bayesian model for inconsistent protein sequence estimation.

    PubMed

    Khenoussi, Walyd; Vanhoutrève, Renaud; Poch, Olivier; Thompson, Julie D

    2014-09-01

    The prediction of protein coding genes is a major challenge that depends on the quality of genome sequencing, the accuracy of the model used to elucidate the exonic structure of the genes and the complexity of the gene splicing process leading to different protein variants. As a consequence, today's protein databases contain a huge amount of inconsistency, due to both natural variants and sequence prediction errors. We have developed a new method, called SIBIS, to detect such inconsistencies based on the evolutionary information in multiple sequence alignments. A Bayesian framework, combined with Dirichlet mixture models, is used to estimate the probability of observing specific amino acids and to detect inconsistent or erroneous sequence segments. We evaluated the performance of SIBIS on a reference set of protein sequences with experimentally validated errors and showed that the sensitivity is significantly higher than previous methods, with only a small loss of specificity. We also assessed a large set of human sequences from the UniProt database and found evidence of inconsistency in 48% of the previously uncharacterized sequences. We conclude that the integration of quality control methods like SIBIS in automatic analysis pipelines will be critical for the robust inference of structural, functional and phylogenetic information from these sequences. Source code, implemented in C on a linux system, and the datasets of protein sequences are freely available for download at http://www.lbgi.fr/~julie/SIBIS. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
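
    The idea of estimating amino acid probabilities for an alignment column under a Dirichlet prior can be sketched in a few lines of Python. This is a deliberate simplification and not the SIBIS method: it uses a single symmetric Dirichlet prior instead of a Dirichlet mixture, and the function name `posterior_mean_probs` and the pseudocount `alpha` are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def posterior_mean_probs(column: str, alpha: float = 1.0) -> dict:
    """Posterior mean amino acid probabilities for one alignment column.

    Uses a single symmetric Dirichlet prior with pseudocount `alpha` per
    residue type (a simplification of a Dirichlet mixture).
    """
    # Gaps and non-standard symbols are ignored.
    counts = Counter(aa for aa in column.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values()) + alpha * len(AMINO_ACIDS)
    return {aa: (counts[aa] + alpha) / total for aa in AMINO_ACIDS}


# Toy column: mostly leucine, with a gap in the last sequence.
probs = posterior_mean_probs("LLLLIVLLM-")
print(round(probs["L"], 3), round(probs["W"], 3))
```

    A residue whose estimated probability is very low in a well-populated column is the kind of observation an inconsistency detector could flag; the thresholds and segment-level logic are not given in the abstract.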

  17. Promoters and proteins from Clostridium thermocellum and uses thereof

    DOEpatents

    Wu, J. H. David; Newcomb, Michael

    2012-11-13

    The present invention relates to an inducible and a high expression nucleic acid promoter isolated from Clostridium thermocellum. These promoters are useful for directing expression of a protein or polypeptide encoded by a nucleic acid molecule operably associated with the nucleic acid promoters. The present invention also relates to nucleic acid constructs including the C. thermocellum promoters, and expression vectors and hosts containing such nucleic acid constructs. The present invention also relates to protein isolated from Clostridium thermocellum, including a repressor protein. The present invention also provides methods of using the isolated promoters and proteins from Clostridium thermocellum, including methods for directing inducible in vitro and in vivo expression of a protein or polypeptide in a host, and methods of producing ethanol from a cellulosic biomass.

  18. ATP hydrolysis provides functions that promote rejection of pairings between different copies of long repeated sequences

    PubMed Central

    Danilowicz, Claudia; Hermans, Laura; Coljee, Vincent; Prévost, Chantal

    2017-01-01

    During DNA recombination and repair, RecA family proteins must promote rapid joining of homologous DNA. Repeated sequences with >100 base pair lengths occupy more than 1% of bacterial genomes; however, commitment to strand exchange was believed to occur after testing ∼20–30 bp. If that were true, pairings between different copies of long repeated sequences would usually become irreversible. Our experiments reveal that in the presence of ATP hydrolysis even 75 bp sequence-matched strand exchange products remain quite reversible. Experiments also indicate that when ATP hydrolysis is present, flanking heterologous dsDNA regions increase the reversibility of sequence matched strand exchange products with lengths up to ∼75 bp. Results of molecular dynamics simulations provide insight into how ATP hydrolysis destabilizes strand exchange products. These results inspired a model that shows how pairings between long repeated sequences could be efficiently rejected even though most homologous pairings form irreversible products. PMID:28854739

  19. How Many Protein Sequences Fold to a Given Structure? A Coevolutionary Analysis.

    PubMed

    Tian, Pengfei; Best, Robert B

    2017-10-17

    Quantifying the relationship between protein sequence and structure is key to understanding the protein universe. A fundamental measure of this relationship is the total number of amino acid sequences that can fold to a target protein structure, known as the "sequence capacity," which has been suggested as a proxy for how designable a given protein fold is. Although sequence capacity has been extensively studied using lattice models and theory, numerical estimates for real protein structures are currently lacking. In this work, we have quantitatively estimated the sequence capacity of 10 proteins with a variety of different structures using a statistical model based on residue-residue co-evolution to capture the variation of sequences from the same protein family. Remarkably, we find that even for the smallest protein folds, such as the WW domain, the number of foldable sequences is extremely large, exceeding the Avogadro constant. In agreement with earlier theoretical work, the calculated sequence capacity is positively correlated with the size of the protein, or better, the density of contacts. This allows the absolute sequence capacity of a given protein to be approximately predicted from its structure. On the other hand, the relative sequence capacity, i.e., normalized by the total number of possible sequences, is an extremely tiny number and is strongly anti-correlated with the protein length. Thus, although there may be more foldable sequences for larger proteins, it will be much harder to find them. Lastly, we have correlated the evolutionary age of proteins in the CATH database with their sequence capacity as predicted by our model. The results suggest a trade-off between the opposing requirements of high designability and the likelihood of a novel fold emerging by chance. Published by Elsevier Inc.
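
    The contrast between absolute and relative sequence capacity comes down to simple arithmetic, which the following Python sketch works through in log space. The inputs are assumptions for illustration only: a domain length of 35 residues and a foldable-sequence count set equal to the Avogadro constant, which the abstract gives only as a value that is exceeded. The function name `log10_relative_capacity` is hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # exact by SI definition


def log10_relative_capacity(n_foldable: float, length: int) -> float:
    """log10 of (foldable sequences) / (all 20**length possible sequences)."""
    return math.log10(n_foldable) - length * math.log10(20)


# Hypothetical inputs: a 35-residue domain with Avogadro's number of
# foldable sequences (the abstract only says the count exceeds this).
print(round(log10_relative_capacity(AVOGADRO, 35), 1))
```

    Under these assumptions the foldable fraction is on the order of 10^-22 of sequence space, which illustrates how an enormous absolute count can still correspond to a tiny relative capacity.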

  20. Cyclosporin A and FK-506 both affect DNA binding of regulatory nuclear proteins to the human interleukin-2 promoter.

    PubMed

    Baumann, G; Geisse, S; Sullivan, M

    1991-03-01

    The structurally unrelated immunosuppressive drugs cyclosporin A (Sandimmun) and FK-506 both interfere with the process of T-cell proliferation by blocking the transcription of the T-cell growth factor interleukin-2 (IL-2). Here we demonstrate that the transcriptional activation of this gene requires the binding of regulatory nuclear proteins to a promoter element with sequence similarity to the consensus binding site for NF-kappa B-related transcription factors. We present evidence that the binding by regulatory nuclear proteins to the kappa B element of the IL-2 promoter is affected negatively by cyclosporin A and FK-506 at concentrations paralleling their immunosuppressive activity in vivo. The decrease in DNA-protein complex formation induced by the immunosuppressive drugs correlates with a decrease in IL-2 production. FK-506 is 10 to 100 times more potent than cyclosporin A in its ability to inhibit sequence-specific DNA binding and IL-2 production. Our findings suggest that the actions of both drugs converge at the level of DNA-protein interaction.

  1. Predicting the tolerated sequences for proteins and protein interfaces using RosettaBackrub flexible backbone design.

    PubMed

    Smith, Colin A; Kortemme, Tanja

    2011-01-01

    Predicting the set of sequences that are tolerated by a protein or protein interface, while maintaining a desired function, is useful for characterizing protein interaction specificity and for computationally designing sequence libraries to engineer proteins with new functions. Here we provide a general method, a detailed set of protocols, and several benchmarks and analyses for estimating tolerated sequences using flexible backbone protein design implemented in the Rosetta molecular modeling software suite. The input to the method is at least one experimentally determined three-dimensional protein structure or high-quality model. The starting structure(s) are expanded or refined into a conformational ensemble using Monte Carlo simulations consisting of backrub backbone and side chain moves in Rosetta. The method then uses a combination of simulated annealing and genetic algorithm optimization methods to enrich for low-energy sequences for the individual members of the ensemble. To emphasize certain functional requirements (e.g. forming a binding interface), interactions between and within parts of the structure (e.g. domains) can be reweighted in the scoring function. Results from each backbone structure are merged together to create a single estimate for the tolerated sequence space. We provide an extensive description of the protocol and its parameters, all source code, example analysis scripts and three tests applying this method to finding sequences predicted to stabilize proteins or protein interfaces. The generality of this method makes many other applications possible, for example stabilizing interactions with small molecules, DNA, or RNA. Through the use of within-domain reweighting and/or multistate design, it may also be possible to use this method to find sequences that stabilize particular protein conformations or binding interactions over others.

  2. A sequence in the rat Pit-1 gene promoter confers synergistic activation by glucocorticoids and protein kinase-C.

    PubMed

    Jong, M T; Raaka, B M; Samuels, H H

    1994-10-01

    The 5'-flanking region of the gene for Pit-1, a pituitary-specific transcription factor, was isolated from a rat liver genomic library and sequenced. Expression of a reporter construct containing Pit-1 promoter sequences linked to the bacterial chloramphenicol acetyltransferase (CAT) gene was assessed by transient transfection in rat pituitary GH4C1 cells. Treatment of transfected cells with either dexamethasone (DEX) for 48 h or the phorbol ester 12-O-tetradecanoylphorbol 13-acetate (TPA) for the final 20 h of the 48-h posttransfection period had minimal effects on CAT expression. However, CAT activity was elevated about 20-fold when transfected cells were treated with both DEX and TPA. This apparent synergistic activation was lost when DEX treatment was also limited to the final 20 h of the 48-h posttransfection period, suggesting that a time-dependent accumulation of a DEX-induced gene product might be involved. This putative DEX-induced product appeared to be relatively stable, because synergistic activation was observed in cells treated with DEX alone for 36 h, followed by a 10-h incubation without DEX before the addition of TPA. The Pit-1 gene promoter region between -210 and -142 from the transcription start site conferred synergistic regulation by DEX and TPA when placed upstream of position -105 in the herpes viral thymidine kinase promoter. (ABSTRACT TRUNCATED AT 250 WORDS)

  3. Identification of Sequence Specificity of 5-Methylcytosine Oxidation by Tet1 Protein with High-Throughput Sequencing.

    PubMed

    Kizaki, Seiichiro; Chandran, Anandhakumar; Sugiyama, Hiroshi

    2016-03-02

    Tet (ten-eleven translocation) family proteins have the ability to oxidize 5-methylcytosine (mC) to 5-hydroxymethylcytosine (hmC), 5-formylcytosine (fC), and 5-carboxycytosine (caC). However, the oxidation reaction of Tet is not understood completely. Evaluation of genomic-level epigenetic changes by Tet protein requires unbiased identification of the highly selective oxidation sites. In this study, we used high-throughput sequencing to investigate the sequence specificity of mC oxidation by Tet1. A 6.6×10^4-member mC-containing random DNA-sequence library was constructed. The library was subjected to Tet-reactive pulldown followed by high-throughput sequencing. Analysis of the obtained sequence data identified the Tet1-reactive sequences. We identified mCpG as a highly reactive sequence of Tet1 protein. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
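
    The stated library size of about 6.6×10^4 members is close to 4^8 = 65,536, the number of distinct sequences over eight fully randomized nucleotide positions. The abstract does not describe the library design, so the eight-position reading is an assumption; the short Python sketch below only shows the arithmetic, and the function name `library_size` is hypothetical.

```python
def library_size(random_positions: int, alphabet_size: int = 4) -> int:
    """Number of distinct sequences with the given count of random positions."""
    return alphabet_size ** random_positions


# Compare a few candidate designs against the reported ~6.6e4 members.
for n in range(6, 10):
    print(n, library_size(n))
```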

  4. Adaptive compressive learning for prediction of protein-protein interactions from primary sequence.

    PubMed

    Zhang, Ya-Nan; Pan, Xiao-Yong; Huang, Yan; Shen, Hong-Bin

    2011-08-21

    Protein-protein interactions (PPIs) play an important role in biological processes. Although much effort has been devoted to identifying novel PPIs by integrating experimental biological knowledge, many difficulties remain because sufficient protein structural and functional information is lacking. Methods that predict PPIs from amino acid sequences alone are therefore highly desirable. However, sequence-based predictors often struggle with high dimensionality, which causes over-fitting and high computational complexity, as well as with redundancy in the sequential feature vectors. In this paper, a novel computational approach based on compressed sensing theory is proposed to predict yeast Saccharomyces cerevisiae PPIs from primary sequence, and it achieves promising results. The key advantage of the proposed compressed sensing algorithm is that it can compress the original high-dimensional protein sequential feature vector into a much lower-dimensional but more condensed space, taking the sparsity of the original signal into account. What makes compressed sensing especially attractive for protein sequence analysis is that the signal can be reconstructed from far fewer measurements than traditional Nyquist sampling theory usually considers necessary. Experimental results demonstrate that the proposed compressed sensing method is powerful for analyzing noisy biological data and reducing redundancy in feature vectors. The proposed method represents a new strategy for dealing with high-dimensional discrete models of proteins and has great potential to be extended to many other complicated biological systems. Copyright © 2011 Elsevier Ltd. All rights reserved.

  5. Design of Protein Multi-specificity Using an Independent Sequence Search Reduces the Barrier to Low Energy Sequences

    PubMed Central

    Sevy, Alexander M.; Jacobs, Tim M.; Crowe, James E.; Meiler, Jens

    2015-01-01

    Computational protein design has found great success in engineering proteins for thermodynamic stability, binding specificity, or enzymatic activity in a ‘single state’ design (SSD) paradigm. Multi-specificity design (MSD), on the other hand, involves considering the stability of multiple protein states simultaneously. We have developed a novel MSD algorithm, which we refer to as REstrained CONvergence in multi-specificity design (RECON). The algorithm allows each state to adopt its own sequence throughout the design process rather than enforcing a single sequence on all states. Convergence to a single sequence is encouraged through an incrementally increasing convergence restraint for corresponding positions. Compared to MSD algorithms that enforce (constrain) an identical sequence on all states the energy landscape is simplified, which accelerates the search drastically. As a result, RECON can readily be used in simulations with a flexible protein backbone. We have benchmarked RECON on two design tasks. First, we designed antibodies derived from a common germline gene against their diverse targets to assess recovery of the germline, polyspecific sequence. Second, we design “promiscuous”, polyspecific proteins against all binding partners and measure recovery of the native sequence. We show that RECON is able to efficiently recover native-like, biologically relevant sequences in this diverse set of protein complexes. PMID:26147100

  6. CaMV-35S promoter sequence-specific DNA methylation in lettuce.

    PubMed

    Okumura, Azusa; Shimada, Asahi; Yamasaki, Satoshi; Horino, Takuya; Iwata, Yuji; Koizumi, Nozomu; Nishihara, Masahiro; Mishiba, Kei-ichiro

    2016-01-01

    We found 35S promoter sequence-specific DNA methylation in lettuce. Additionally, transgenic lettuce plants having a modified 35S promoter lost methylation, suggesting the modified sequence is subjected to the methylation machinery. We previously reported that cauliflower mosaic virus 35S promoter-specific DNA methylation in transgenic gentian (Gentiana triflora × G. scabra) plants occurs irrespective of the copy number and the genomic location of T-DNA, and causes strong gene silencing. To confirm whether 35S-specific methylation can occur in other plant species, transgenic lettuce (Lactuca sativa L.) plants with a single copy of the 35S promoter-driven sGFP gene were produced and analyzed. Among 10 lines of transgenic plants, 3, 4, and 3 lines showed strong, weak, and no expression of sGFP mRNA, respectively. Bisulfite genomic sequencing of the 35S promoter region showed hypermethylation at CpG and CpWpG (where W is A or T) sites in 9 of 10 lines. Gentian-type de novo methylation pattern, consisting of methylated cytosines at CpHpH (where H is A, C, or T) sites, was also observed in the transgenic lettuce lines, suggesting that lettuce and gentian share similar methylation machinery. Four of five transgenic lettuce lines having a single copy of a modified 35S promoter, which was modified in the proposed core target of de novo methylation in gentian, exhibited 35S hypomethylation, indicating that the modified sequence may be the target of the 35S-specific methylation machinery.
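
    The methylation contexts named in the abstract are defined by the two bases that follow a cytosine. A minimal Python sketch of that classification is given below; it looks only at the forward strand of an unambiguous A/C/G/T sequence, and the function name `cytosine_context` is hypothetical. CpWpG (W is A or T) is a subset of the CHG class returned here.

```python
def cytosine_context(seq: str, pos: int) -> str:
    """Classify the cytosine at seq[pos] as CG, CHG, or CHH (forward strand)."""
    seq = seq.upper()
    if seq[pos] != "C":
        raise ValueError("position is not a cytosine")
    nxt = seq[pos + 1:pos + 3]  # up to two downstream bases
    if len(nxt) < 2 and not nxt.startswith("G"):
        return "unknown"  # too close to the 3' end to decide
    if nxt[0] == "G":
        return "CG"
    if nxt[1] == "G":
        return "CHG"
    return "CHH"


# Toy sequence with one cytosine in each context and one at the 3' end.
toy = "ACGTCAGCTTC"
for i, base in enumerate(toy):
    if base == "C":
        print(i, cytosine_context(toy, i))
```

    In a bisulfite sequencing analysis, per-context methylation levels would be tallied by applying such a classification to every cytosine in the reference promoter sequence on both strands; the strand handling is omitted here.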

  7. CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction.

    PubMed

    Cui, Xuefeng; Lu, Zhiwu; Wang, Sheng; Jing-Yan Wang, Jim; Gao, Xin

    2016-06-15

    Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades on sequence alignment, threading and alignment-free methods, protein homology detection remains a challenging open problem. Recently, network methods that try to find transitive paths in the protein structure space demonstrate the importance of incorporating network information of the structure space. Yet, current methods merge the sequence space and the structure space into a single space, and thus introduce inconsistency in combining different sources of information. We present a novel network-based protein homology detection method, CMsearch, based on cross-modal learning. Instead of exploring a single network built from the mixture of sequence and structure space information, CMsearch builds two separate networks to represent the sequence space and the structure space. It then learns sequence-structure correlation by simultaneously taking sequence information, structure information, sequence space information and structure space information into consideration. We tested CMsearch on two challenging tasks, protein homology detection and protein structure prediction, by querying all 8332 PDB40 proteins. Our results demonstrate that CMsearch is insensitive to the similarity metrics used to define the sequence and the structure spaces. By using HMM-HMM alignment as the sequence similarity metric, CMsearch clearly outperforms state-of-the-art homology detection methods and the CASP-winning template-based protein structure prediction methods. Our program is freely available for download from http://sfb.kaust.edu.sa/Pages/Software.aspx. Contact: xin.gao@kaust.edu.sa. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  8. Sequence signatures of allosteric proteins towards rational design.

    PubMed

    Namboodiri, Saritha; Verma, Chandra; Dhar, Pawan K; Giuliani, Alessandro; Nair, Achuthsankar S

    2010-12-01

    Allostery is the phenomenon of changes in the structure and activity of proteins that appear as a consequence of ligand binding at sites other than the active site. Studying the mechanistic basis of allostery, leading to protein design with predetermined functional endpoints, is an important unmet need of synthetic biology. Here, we screened the amino acid sequence landscape in search of sequence signatures of allostery using the Recurrence Quantitative Analysis (RQA) method. A characteristic vector, comprising 10 features extracted from RQA, was defined for amino acid sequences. Using Principal Component Analysis, four factors were found to be important determinants of allosteric behavior. Our sequence-based predictor method shows 82.6% accuracy, 85.7% sensitivity and 77.9% specificity with the current dataset. Further, we show that Laminarity-Mean-hydrophobicity, representing repeated hydrophobic patches, is the most crucial indicator of allostery. To the best of our knowledge, this is the first report that describes sequence determinants of allostery based on hydrophobicity. As an outcome of these findings, we plan to explore the possibility of inducing allostery in proteins.

  9. The complete sequence and promoter activity of the human A-raf-1 gene (ARAF1)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee, J.E.; Beck, T.W.; Brennscheidt, U.

    1994-03-01

    The raf proto-oncogenes encode cytoplasmic protein serine/threonine kinases, which play a critical role in cell growth and development. One of these, A-raf-1 (human gene symbol, ARAF1), which is predominantly expressed in mouse urogenital tissues, has been mapped to an evolutionarily conserved linkage group composed of ARAF1, SYN1, TIMP, and properdin located at human chromosome Xp11.2. The authors have isolated human genomic DNA clones containing the expressed gene (ARAF1) on the X chromosome and a pseudogene (ARAF2) on chromosome 7p12-q11.21. Analysis of the nucleotide sequence from the ARAF1 genomic clones demonstrated that it consists of 16 exons encoded by minimally 10,776 nucleotides. The major transcriptional start site (+1) was determined by RNase protection and primer extension assays. Promoter activity was confirmed by functional assays using DNA fragments fused to a CAT reporter gene. The ARAF1 minimal promoter, located between nucleotides -59 and +93, has a low G + C content and lacks consensus TATA and Inr sequences but shows sequence similarity at position -1 to the E box that is known to interact with USF and TFII-I transcription factors. 65 refs., 7 figs., 1 tab.

  10. Transcriptional Activation Signals Found in the Epstein-Barr Virus (EBV) Latency C Promoter Are Conserved in the Latency C Promoter Sequences from Baboon and Rhesus Monkey EBV-Like Lymphocryptoviruses (Cercopithicine Herpesviruses 12 and 15)

    PubMed Central

    Fuentes-Pananá, Ezequiel M.; Swaminathan, Sankar; Ling, Paul D.

    1999-01-01

    The Epstein-Barr virus (EBV) EBNA2 protein is a transcriptional activator that controls viral latent gene expression and is essential for EBV-driven B-cell immortalization. EBNA2 is expressed from the viral C promoter (Cp) and regulates its own expression by activating Cp through interaction with the cellular DNA binding protein CBF1. Through regulation of Cp and EBNA2 expression, EBV controls the pattern of latent protein expression and the type of latency established. To gain further insight into the important regulatory elements that modulate Cp usage, we isolated and sequenced the Cp regions corresponding to nucleotides 10251 to 11479 of the EBV genome (−1079 to +144 relative to the transcription initiation site) from the EBV-like lymphocryptoviruses found in baboons (herpesvirus papio; HVP) and Rhesus macaques (RhEBV). Sequence comparison of the approximately 1,230-bp Cp regions from these primate viruses revealed that EBV and HVP Cp sequences are 64% conserved, EBV and RhEBV Cp sequences are 66% conserved, and HVP and RhEBV Cp sequences are 65% conserved relative to each other. Approximately 50% of the residues are conserved among all three sequences, yet all three viruses have retained response elements for glucocorticoids, two positionally conserved CCAAT boxes, and positionally conserved TATA boxes. The putative EBNA2 100-bp enhancers within these promoters contain 54 conserved residues, and the binding sites for CBF1 and CBF2 are well conserved. Cp usage in the HVP- and RhEBV-transformed cell lines was detected by S1 nuclease protection analysis. Transient-transfection analysis showed that promoters of both HVP and RhEBV are responsive to EBNA2 and that they bind CBF1 and CBF2 in gel mobility shift assays. These results suggest that similar mechanisms for regulation of latent gene expression are conserved among the EBV-related lymphocryptoviruses found in nonhuman primates. PMID:9847397

  11. Molecular Design of Performance Proteins With Repetitive Sequences

    NASA Astrophysics Data System (ADS)

    Vendrely, Charlotte; Ackerschott, Christian; Römer, Lin; Scheibel, Thomas

    Most performance proteins responsible for the mechanical stability of cells and organisms reveal highly repetitive sequences. Mimicking such performance proteins is of high interest for the design of nanostructured biomaterials. In this article, flagelliform silk is introduced as an example to describe a general principle for designing genes of repetitive performance proteins for recombinant expression in Escherichia coli. In the first step, repeating amino acid sequence motifs are reverse-translated into DNA cassettes, which can in a second step be seamlessly ligated, yielding a designed gene. Recombinant expression thereof leads to proteins mimicking the natural ones. The recombinant proteins can be assembled into nanostructured materials in a controlled manner, allowing their use in several applications.

  12. Increasing Sequence Diversity with Flexible Backbone Protein Design: The Complete Redesign of a Protein Hydrophobic Core

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Murphy, Grant S.; Mills, Jeffrey L.; Miley, Michael J.

    2015-10-15

    Protein design tests our understanding of protein stability and structure. Successful design methods should allow the exploration of sequence space not found in nature. However, when redesigning naturally occurring protein structures, most fixed backbone design algorithms return amino acid sequences that share strong sequence identity with wild-type sequences, especially in the protein core. This behavior places a restriction on functional space that can be explored and is not consistent with observations from nature, where sequences of low identity have similar structures. Here, we allow backbone flexibility during design to mutate every position in the core (38 residues) of a four-helix bundle protein. Only small perturbations to the backbone, 12 Å, were needed to entirely mutate the core. The redesigned protein, DRNN, is exceptionally stable (melting point >140 °C). An NMR and X-ray crystal structure show that the side chains and backbone were accurately modeled (all-atom RMSD = 1.3 Å).

  13. Integration of promoters, inverted repeat sequences and proteomic data into a model for high silencing efficiency of coeliac disease related gliadins in bread wheat

    PubMed Central

    2013-01-01

    Background: Wheat gluten has unique nutritional and technological characteristics, but is also a major trigger of allergies and intolerances. One of the most severe diseases caused by gluten is coeliac disease. The peptides produced in the digestive tract by the incomplete digestion of gluten proteins trigger the disease. The majority of the epitopes responsible reside in the gliadin fraction of gluten. The location of the multiple gliadin genes in blocks has to date complicated their elimination by classical breeding techniques or by the use of biotechnological tools. As an approach to silence multiple gliadin genes we have produced 38 transgenic lines of bread wheat containing combinations of two endosperm-specific promoters and three different inverted repeat sequences to silence three fractions of gliadins by RNA interference. Results: The effects of the RNA interference constructs on the content of the gluten proteins, total protein and starch, thousand seed weights and SDSS quality tests of flour were analyzed in these transgenic lines in two consecutive years. The characteristics of the inverted repeat sequences were the main factor that determined the efficiency of silencing. The promoter used had less influence on silencing, although a synergy in silencing efficiency was observed when the two promoters were used simultaneously. Genotype and the environment also influenced silencing efficiency. Conclusions: We conclude that to obtain wheat lines with an optimum reduction of toxic gluten epitopes one needs to take into account the factors of inverted repeat sequence design, promoter choice and also the wheat background used. PMID:24044767

  14. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins

    PubMed Central

    Pruitt, Kim D.; Tatusova, Tatiana; Maglott, Donna R.

    2005-01-01

    The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes sequence data that are currently publicly available in the archival databases. The database incorporates data from over 2400 organisms and includes over one million proteins representing significant taxonomic diversity spanning prokaryotes, eukaryotes and viruses. Nucleotide and protein sequences are explicitly linked, and the sequences are linked to other resources including the NCBI Map Viewer and Gene. Sequences are annotated to include coding regions, conserved domains, variation, references, names, database cross-references, and other features using a combined approach of collaboration and other input from the scientific community, automated annotation, propagation from GenBank and curation by NCBI staff. PMID:15608248

  15. Molecular sled sequences are common in mammalian proteins.

    PubMed

    Xiong, Kan; Blainey, Paul C

    2016-03-18

    Recent work revealed a new class of molecular machines called molecular sleds, which are small basic molecules that bind and slide along DNA with the ability to carry cargo along DNA. Here, we performed biochemical and single-molecule flow stretching assays to investigate the basis of sliding activity in molecular sleds. In particular, we identified the functional core of pVIc, the first molecular sled characterized; peptide functional groups that control sliding activity; and propose a model for the sliding activity of molecular sleds. We also observed widespread DNA binding and sliding activity among basic polypeptide sequences that implicate mammalian nuclear localization sequences and many cell penetrating peptides as molecular sleds. These basic protein motifs exhibit weak but physiologically relevant sequence-nonspecific DNA affinity. Our findings indicate that many mammalian proteins contain molecular sled sequences and suggest the possibility that substantial undiscovered sliding activity exists among nuclear mammalian proteins. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. Adaptive Local Realignment of Protein Sequences.

    PubMed

    DeBlasio, Dan; Kececioglu, John

    2018-06-11

    While mutation rates can vary markedly over the residues of a protein, multiple sequence alignment tools typically use the same values for their scoring-function parameters across a protein's entire length. We present a new approach, called adaptive local realignment, that in contrast automatically adapts to the diversity of mutation rates along protein sequences. This builds upon a recent technique known as parameter advising, which finds global parameter settings for an aligner, to now adaptively find local settings. Our approach in essence identifies local regions with low estimated accuracy, constructs a set of candidate realignments using a carefully-chosen collection of parameter settings, and replaces the region if a realignment has higher estimated accuracy. This new method of local parameter advising, when combined with prior methods for global advising, boosts alignment accuracy as much as 26% over the best default setting on hard-to-align protein benchmarks, and by 6.4% over global advising alone. Adaptive local realignment has been implemented within the Opal aligner using the Facet accuracy estimator.

  17. Evolution of EF-hand calcium-modulated proteins. III. Exon sequences confirm most dendrograms based on protein sequences: calmodulin dendrograms show significant lack of parallelism

    NASA Technical Reports Server (NTRS)

    Nakayama, S.; Kretsinger, R. H.

    1993-01-01

    In the first report in this series we presented dendrograms based on 152 individual proteins of the EF-hand family. In the second we used sequences from 228 proteins, containing 835 domains, and showed that eight of the 29 subfamilies are congruent and that the EF-hand domains of the remaining 21 subfamilies have diverse evolutionary histories. In this study we have computed dendrograms within and among the EF-hand subfamilies using the encoding DNA sequences. In most instances the dendrograms based on protein and on DNA sequences are very similar. Significant differences between protein and DNA trees for calmodulin remain unexplained. In our fourth report we evaluate the sequences and the distribution of introns within the EF-hand family and conclude that exon shuffling did not play a significant role in its evolution.

  18. CAMELOT: A machine learning approach for coarse-grained simulations of aggregation of block-copolymeric protein sequences

    PubMed Central

    Ruff, Kiersten M.; Harmon, Tyler S.; Pappu, Rohit V.

    2015-01-01

    We report the development and deployment of a coarse-graining method that is well suited for computer simulations of aggregation and phase separation of protein sequences with block-copolymeric architectures. Our algorithm, named CAMELOT for Coarse-grained simulations Aided by MachinE Learning Optimization and Training, leverages information from converged all atom simulations that is used to determine a suitable resolution and parameterize the coarse-grained model. To parameterize a system-specific coarse-grained model, we use a combination of Boltzmann inversion, non-linear regression, and a Gaussian process Bayesian optimization approach. The accuracy of the coarse-grained model is demonstrated through direct comparisons to results from all atom simulations. We demonstrate the utility of our coarse-graining approach using the block-copolymeric sequence from the exon 1 encoded sequence of the huntingtin protein. This sequence comprises 17 residues from the N-terminal end of huntingtin (N17) followed by a polyglutamine (polyQ) tract. Simulations based on the CAMELOT approach are used to show that the adsorption and unfolding of the wild type N17 and its sequence variants on the surface of polyQ tracts engender a patchy colloid like architecture that promotes the formation of linear aggregates. These results provide a plausible explanation for experimental observations, which show that N17 accelerates the formation of linear aggregates in block-copolymeric N17-polyQ sequences. The CAMELOT approach is versatile and is generalizable for simulating the aggregation and phase behavior of a range of block-copolymeric protein sequences. PMID:26723608

  19. Deep sequencing methods for protein engineering and design.

    PubMed

    Wrenbeck, Emily E; Faber, Matthew S; Whitehead, Timothy A

    2017-08-01

    The advent of next-generation sequencing (NGS) has revolutionized protein science, and the development of complementary methods enabling NGS-driven protein engineering has followed. In general, these experiments address the functional consequences of thousands of protein variants in a massively parallel manner using genotype-phenotype linked high-throughput functional screens followed by DNA counting via deep sequencing. We highlight the use of information-rich datasets to engineer protein molecular recognition. Examples include the creation of multiple dual-affinity Fabs targeting structurally dissimilar epitopes and engineering of a broad germline-targeted anti-HIV-1 immunogen. Additionally, we highlight the generation of enzyme fitness landscapes for conducting fundamental studies of protein behavior and evolution. We conclude with discussion of technological advances. Copyright © 2016 Elsevier Ltd. All rights reserved.

  20. High level activity of the mouse CCAAT/enhancer binding protein (C/EBP alpha) gene promoter involves autoregulation and several ubiquitous transcription factors.

    PubMed Central

    Legraverend, C; Antonson, P; Flodby, P; Xanthopoulos, K G

    1993-01-01

    The promoter region of the mouse CCAAT-Enhancer Binding Protein (C/EBP alpha) gene is capable of directing high levels of expression of reporter constructs in various cell lines, even in cells that do not express their endogenous C/EBP alpha gene. To understand the molecular mechanisms underlying this ubiquitous expression, we have characterized the promoter region of the mouse C/EBP alpha gene by a variety of in vitro and in vivo methods. We show that three sites related in sequence to USF, BTE and C/EBP binding sites and present in promoter region -350/+3 are recognized by proteins from rat liver nuclear extracts. The sequence of the C/EBP alpha promoter that includes the USF binding site is also capable of forming stable complexes with purified Myc+Max heterodimers, and mutation of this site drastically reduces transcription of C/EBP alpha promoter luciferase constructs both in liver and non-liver cell lines. In addition, we identify three novel protein-binding sites, two of which display similarity to NF-1 and NF kappa B binding sites. The region located between nucleotides -197 and -178 forms several heat-stable complexes with liver nuclear proteins in vitro which are recognized mainly by antibodies specific for C/EBP alpha. Furthermore, transient expression of C/EBP alpha and, to a lesser extent, C/EBP beta expression vectors results in transactivation of a cotransfected C/EBP alpha promoter-luciferase reporter construct. These experiments support the notion that the C/EBP alpha gene is regulated by C/EBP alpha but other C/EBP-related proteins may also be involved. PMID:8493090

  1. Two distinct promoter architectures centered on dynamic nucleosomes control ribosomal protein gene transcription.

    PubMed

    Knight, Britta; Kubik, Slawomir; Ghosh, Bhaswar; Bruzzone, Maria Jessica; Geertz, Marcel; Martin, Victoria; Dénervaud, Nicolas; Jacquet, Philippe; Ozkan, Burak; Rougemont, Jacques; Maerkl, Sebastian J; Naef, Félix; Shore, David

    2014-08-01

    In yeast, ribosome production is controlled transcriptionally by tight coregulation of the 138 ribosomal protein genes (RPGs). RPG promoters display limited sequence homology, and the molecular basis for their coregulation remains largely unknown. Here we identify two prevalent RPG promoter types, both characterized by upstream binding of the general transcription factor (TF) Rap1 followed by the RPG-specific Fhl1/Ifh1 pair, with one type also binding the HMG-B protein Hmo1. We show that the regulatory properties of the two promoter types are remarkably similar, suggesting that they are determined to a large extent by Rap1 and the Fhl1/Ifh1 pair. Rapid depletion experiments allowed us to define a hierarchy of TF binding in which Rap1 acts as a pioneer factor required for binding of all other TFs. We also uncovered unexpected features underlying recruitment of Fhl1, whose forkhead DNA-binding domain is not required for binding at most promoters, and Hmo1, whose binding is supported by repeated motifs. Finally, we describe unusually micrococcal nuclease (MNase)-sensitive nucleosomes at all RPG promoters, located between the canonical +1 and -1 nucleosomes, which coincide with sites of Fhl1/Ifh1 and Hmo1 binding. We speculate that these "fragile" nucleosomes play an important role in regulating RPG transcriptional output. © 2014 Knight et al.; Published by Cold Spring Harbor Laboratory Press.

  2. Two distinct promoter architectures centered on dynamic nucleosomes control ribosomal protein gene transcription

    PubMed Central

    Knight, Britta; Kubik, Slawomir; Ghosh, Bhaswar; Bruzzone, Maria Jessica; Geertz, Marcel; Martin, Victoria; Dénervaud, Nicolas; Jacquet, Philippe; Ozkan, Burak; Rougemont, Jacques; Maerkl, Sebastian J.; Naef, Félix

    2014-01-01

    In yeast, ribosome production is controlled transcriptionally by tight coregulation of the 138 ribosomal protein genes (RPGs). RPG promoters display limited sequence homology, and the molecular basis for their coregulation remains largely unknown. Here we identify two prevalent RPG promoter types, both characterized by upstream binding of the general transcription factor (TF) Rap1 followed by the RPG-specific Fhl1/Ifh1 pair, with one type also binding the HMG-B protein Hmo1. We show that the regulatory properties of the two promoter types are remarkably similar, suggesting that they are determined to a large extent by Rap1 and the Fhl1/Ifh1 pair. Rapid depletion experiments allowed us to define a hierarchy of TF binding in which Rap1 acts as a pioneer factor required for binding of all other TFs. We also uncovered unexpected features underlying recruitment of Fhl1, whose forkhead DNA-binding domain is not required for binding at most promoters, and Hmo1, whose binding is supported by repeated motifs. Finally, we describe unusually micrococcal nuclease (MNase)-sensitive nucleosomes at all RPG promoters, located between the canonical +1 and −1 nucleosomes, which coincide with sites of Fhl1/Ifh1 and Hmo1 binding. We speculate that these “fragile” nucleosomes play an important role in regulating RPG transcriptional output. PMID:25085421

  3. Interactions of HIPPI, a molecular partner of Huntingtin interacting protein HIP1, with the specific motif present at the putative promoter sequence of the caspase-1, caspase-8 and caspase-10 genes.

    PubMed

    Majumder, P; Choudhury, A; Banerjee, M; Lahiri, A; Bhattacharyya, N P

    2007-08-01

    To investigate the mechanism of increased expression of caspase-1 caused by exogenous Hippi, observed earlier in HeLa and Neuro2A cells, in this work we identified a specific motif AAAGACATG (-101 to -93) at the caspase-1 gene upstream sequence where HIPPI could bind. Various mutations in this specific sequence compromised the interaction, showing the specificity of the interactions. In the luciferase reporter assay, when the reporter gene was driven by caspase-1 gene upstream sequences (-151 to -92) with the mutation G to T at position -98, luciferase activity was decreased significantly in green fluorescent protein-Hippi-expressing HeLa cells in comparison to that obtained with the wild-type caspase-1 gene 60 bp upstream sequence, indicating the biological significance of such binding. It was observed that the C-terminal 'pseudo' death effector domain of HIPPI interacted with the 60 bp (-151 to -92) upstream sequence of the caspase-1 gene containing the motif. We further observed that expression of caspase-8 and caspase-10 was increased in green fluorescent protein-Hippi-expressing HeLa cells. In addition, HIPPI interacted in vitro with putative promoter sequences of these genes, containing a similar motif. In summary, we identified a novel function of HIPPI; it binds to specific upstream sequences of the caspase-1, caspase-8 and caspase-10 genes and alters the expression of the genes. This result showed the motif-specific interaction of HIPPI with DNA, and indicates that it could act as a transcription regulator.

  4. MIPS: a database for protein sequences, homology data and yeast genome information.

    PubMed Central

    Mewes, H W; Albermann, K; Heumann, K; Liebl, S; Pfeiffer, F

    1997-01-01

    The MIPS group (Martinsried Institute for Protein Sequences) at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, collects, processes and distributes protein sequence data within the framework of the tripartite association of the PIR-International Protein Sequence Database. MIPS contributes nearly 50% of the data input to the PIR-International Protein Sequence Database. The database is distributed on CD-ROM together with PATCHX, an exhaustive supplement of unique, unverified protein sequences from external sources compiled by MIPS. Through its WWW server (http://www.mips.biochem.mpg.de/) MIPS permits internet access to sequence databases, homology data and to yeast genome information. (i) Sequence similarity results from the FASTA program are stored in the FASTA database for all proteins from PIR-International and PATCHX. The database is dynamically maintained and permits instant access to FASTA results. (ii) Starting with FASTA database queries, proteins have been classified into families and superfamilies (PROT-FAM). (iii) The HPT (hashed position tree) data structure developed at MIPS is a new approach for rapid sequence and pattern searching. (iv) MIPS provides access to the sequence and annotation of the complete yeast genome, the functional classification of yeast genes (FunCat) and its graphical display, the 'Genome Browser'. A CD-ROM based on the JAVA programming language providing dynamic interactive access to the yeast genome and the related protein sequences has been compiled and is available on request. PMID:9016498

  5. Determinants of the rate of protein sequence evolution

    PubMed Central

    Zhang, Jianzhi; Yang, Jian-Rong

    2015-01-01

    The rate and mechanism of protein sequence evolution have been central questions in evolutionary biology since the 1960s. Although the rate of protein sequence evolution depends primarily on the level of functional constraint, exactly what constitutes functional constraint has remained unclear. The increasing availability of genomic data has allowed for much needed empirical examinations on the nature of functional constraint. These studies found that the evolutionary rate of a protein is predominantly influenced by its expression level rather than functional importance. A combination of theoretical and empirical analyses have identified multiple mechanisms behind these observations and demonstrated a prominent role that selection against errors in molecular and cellular processes plays in protein evolution. PMID:26055156

  6. Mining for class-specific motifs in protein sequence classification

    PubMed Central

    2013-01-01

    Background: In protein sequence classification, identification of the sequence motifs or n-grams that can precisely discriminate between classes is a more interesting scientific question than the classification itself. A number of classification methods aim at accurate classification but fail to explain which sequence features indeed contribute to the accuracy. We hypothesize that sequences in lower denominations (n-grams) can be used to explore the sequence landscape and to identify class-specific motifs that discriminate between classes during classification. Discriminative n-grams are short peptide sequences that are highly frequent in one class but are either minimally present or absent in other classes. In this study, we present a new substitution-based scoring function for identifying discriminative n-grams that are highly specific to a class. Results: We present a scoring function based on discriminative n-grams that can effectively discriminate between classes. The scoring function, initially, harvests the entire set of 4- to 8-grams from the protein sequences of different classes in the dataset. Similar n-grams of the same size are combined to form new n-grams, where the similarity is defined by positive amino acid substitution scores in the BLOSUM62 matrix. Substitution has resulted in a large increase in the number of discriminatory n-grams harvested. Due to the unbalanced nature of the dataset, the frequencies of the n-grams are normalized using a dampening factor, which gives more weightage to the n-grams that appear in fewer classes and vice-versa. After the n-grams are normalized, the scoring function identifies discriminative 4- to 8-grams for each class that are frequent enough to be above a selection threshold. By mapping these discriminative n-grams back to the protein sequences, we obtained contiguous n-grams that represent short class-specific motifs in protein sequences.
Our method fared well compared to an existing motif finding method known as

  7. Single-molecule protein sequencing through fingerprinting: computational assessment

    NASA Astrophysics Data System (ADS)

    Yao, Yao; Docter, Margreet; van Ginkel, Jetty; de Ridder, Dick; Joo, Chirlmin

    2015-10-01

    Proteins are vital in all biological systems as they constitute the main structural and functional components of cells. Recent advances in mass spectrometry have brought the promise of complete proteomics by helping draft the human proteome. Yet, this commonly used protein sequencing technique has fundamental limitations in sensitivity. Here we propose a method for single-molecule (SM) protein sequencing. A major challenge lies in the fact that proteins are composed of 20 different amino acids, which demands 20 molecular reporters. We computationally demonstrate that it suffices to measure only two types of amino acids to identify proteins and suggest an experimental scheme using SM fluorescence. When achieved, this highly sensitive approach will result in a paradigm shift in proteomics, with major impact in the biological and medical sciences.
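
    The idea of identifying proteins from the order of only two residue types can be sketched as follows. This is an illustration under stated assumptions, not the authors' computational assessment: the choice of cysteine (C) and lysine (K) as the two read-out residues is an assumption made here (the abstract does not name them), the function names are hypothetical, and measurement noise is ignored.

```python
def fingerprint(protein, labeled=("C", "K")):
    """Reduce a protein sequence to the ordered string of its labeled residues.

    The default residue pair is an illustrative assumption.
    """
    return "".join(aa for aa in protein.upper() if aa in labeled)

def identifiable(proteins, labeled=("C", "K")):
    """Return the names of proteins whose fingerprint is unique in the collection."""
    by_fingerprint = {}
    for name, seq in proteins.items():
        by_fingerprint.setdefault(fingerprint(seq, labeled), []).append(name)
    return sorted(names[0] for names in by_fingerprint.values() if len(names) == 1)
```

    In a toy collection such as {"p1": "MKCAK", "p2": "MCKAK", "p3": "AKCGK"}, p1 and p3 share the fingerprint "KCK" and cannot be told apart, while p2 ("CKK") is identified uniquely.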

  8. Self-organized neural maps of human protein sequences.

    PubMed Central

    Ferrán, E. A.; Pflugfelder, B.; Ferrara, P.

    1994-01-01

    We have recently described a method based on artificial neural networks to cluster protein sequences into families. The network was trained with Kohonen's unsupervised learning algorithm using, as inputs, the matrix patterns derived from the dipeptide composition of the proteins. We present here a large-scale application of that method to classify the 1,758 human protein sequences stored in the SwissProt database (release 19.0), whose lengths are greater than 50 amino acids. In the final 2-dimensional topologically ordered map of 15 x 15 neurons, proteins belonging to known families were associated with the same neuron or with neighboring ones. Also, as an attempt to reduce the time-consuming learning procedure, we compared 2 learning protocols: one of 500 epochs (100 SUN CPU-hours [CPU-h]), and another one of 30 epochs (6.7 CPU-h). A further reduction of learning-computing time, by a factor of about 3.3, with similar protein clustering results, was achieved using a matrix of 11 x 11 components to represent the sequences. Although network training is time consuming, the classification of a new protein in the final ordered map is very fast (14.6 CPU-seconds). We also show a comparison between the artificial neural network approach and conventional methods of biosequence analysis. PMID:8019421
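
    The network input described above, a matrix pattern derived from dipeptide composition, could be computed along the following lines. This is a minimal sketch assuming a full 20 x 20 matrix over the standard amino acids, normalized to frequencies; the abstract does not state the normalization, and the function name `dipeptide_composition` is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def dipeptide_composition(sequence):
    """Return a 20 x 20 matrix of dipeptide frequencies.

    Rows index the first residue of each overlapping pair, columns the second.
    Pairs containing a non-standard residue are skipped (an assumption).
    """
    matrix = [[0.0] * 20 for _ in range(20)]
    total = 0
    seq = sequence.upper()
    for first, second in zip(seq, seq[1:]):
        if first in INDEX and second in INDEX:
            matrix[INDEX[first]][INDEX[second]] += 1.0
            total += 1
    if total:
        matrix = [[count / total for count in row] for row in matrix]
    return matrix
```

    Flattening the matrix row by row gives a 400-component vector that could serve as the input pattern for a self-organizing map; the 11 x 11 representation mentioned in the abstract would require a reduced alphabet or grouping that is not specified here.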

  9. High-Resolution Sequence-Function Mapping of Full-Length Proteins

    PubMed Central

    Kowalsky, Caitlin A.; Klesmith, Justin R.; Stapleton, James A.; Kelly, Vince; Reichkitzer, Nolan; Whitehead, Timothy A.

    2015-01-01

    Comprehensive sequence-function mapping involves detailing the fitness contribution of every possible single mutation to a gene by comparing the abundance of each library variant before and after selection for the phenotype of interest. Deep sequencing of library DNA allows frequency reconstruction for tens of thousands of variants in a single experiment, yet the short read lengths of current sequencers make it challenging to probe genes encoding full-length proteins. Here we extend the scope of sequence-function maps to entire protein sequences with a modular, universal sequence tiling method. We demonstrate the approach with both growth-based selections and FACS screening, offer parameters and best practices that simplify design of experiments, and present analytical solutions to normalize data across independent selections. Using this protocol, sequence-function maps covering full sequences can be obtained in four to six weeks. Best practices introduced in this manuscript are fully compatible with, and complementary to, other recently published sequence-function mapping protocols. PMID:25790064
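
    The comparison of variant abundance before and after selection can be sketched as a log2 enrichment ratio normalized to wild type. This is a common convention and an assumption here, not the analytical normalization derived in the paper; the pseudocount, the variant labels, and the function name `enrichment_scores` are likewise hypothetical.

```python
import math


def enrichment_scores(before, after, wild_type="WT", pseudocount=0.5):
    """Log2 enrichment of each variant relative to wild type.

    `before` and `after` map variant names to read counts in the unselected
    and selected libraries. The pseudocount (an assumption) avoids log(0)
    for variants that drop out after selection.
    """
    total_before = sum(before.values())
    total_after = sum(after.values())

    def log_ratio(variant):
        freq_before = (before.get(variant, 0) + pseudocount) / total_before
        freq_after = (after.get(variant, 0) + pseudocount) / total_after
        return math.log2(freq_after / freq_before)

    wt_ratio = log_ratio(wild_type)
    return {variant: log_ratio(variant) - wt_ratio for variant in before}


# Hypothetical read counts for a wild type and one single mutant.
counts_before = {"WT": 100, "A10G": 100}
counts_after = {"WT": 100, "A10G": 50}
scores = enrichment_scores(counts_before, counts_after)
```

    Under this convention wild type scores zero by construction, and the depleted variant `A10G` receives a negative score.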

  10. Genome Sequence of the Plant Growth Promoting Endophytic Bacterium Enterobacter sp. 638

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Taghavi, S.; van der Lelie, D.; Hoffman, A.

    2010-05-13

    Enterobacter sp. 638 is an endophytic plant growth promoting gamma-proteobacterium that was isolated from the stem of poplar (Populus trichocarpa x deltoides cv. H11-11), a potentially important biofuel feed stock plant. The Enterobacter sp. 638 genome sequence reveals the presence of a 4,518,712 bp chromosome and a 157,749 bp plasmid (pENT638-1). Genome annotation and comparative genomics allowed the identification of an extended set of genes specific to the plant niche adaptation of this bacterium. This includes genes that code for putative proteins involved in survival in the rhizosphere (to cope with oxidative stress or uptake of nutrients released by plant roots), root adhesion (pili, adhesion, hemagglutinin, cellulose biosynthesis), colonization/establishment inside the plant (chemiotaxis, flagella, cellobiose phosphorylase), plant protection against fungal and bacterial infections (siderophore production and synthesis of the antimicrobial compounds 4-hydroxybenzoate and 2-phenylethanol), and improved poplar growth and development through the production of the phytohormones indole acetic acid, acetoin, and 2,3-butanediol. Metabolite analysis confirmed by quantitative RT-PCR showed that the production of acetoin and 2,3-butanediol is induced by the presence of sucrose in the growth medium. Interestingly, both the genetic determinants required for sucrose metabolism and the synthesis of acetoin and 2,3-butanediol are clustered on a genomic island. These findings point to a close interaction between Enterobacter sp. 638 and its poplar host, where the availability of sucrose, a major plant sugar, affects the synthesis of plant growth promoting phytohormones by the endophytic bacterium. The availability of the genome sequence, combined with metabolome and transcriptome analysis, will provide a better understanding of the synergistic interactions between poplar and its growth promoting endophyte Enterobacter sp. 638. This information can be further

  11. Ubiquitin-like protein UBL5 promotes the functional integrity of the Fanconi anemia pathway.

    PubMed

    Oka, Yasuyoshi; Bekker-Jensen, Simon; Mailand, Niels

    2015-05-12

    Ubiquitin and ubiquitin-like proteins (UBLs) function in a wide array of cellular processes. UBL5 is an atypical UBL that does not form covalent conjugates with cellular proteins and which has a known role in modulating pre-mRNA splicing. Here, we report an unexpected involvement of human UBL5 in promoting the function of the Fanconi anemia (FA) pathway for repair of DNA interstrand crosslinks (ICLs), mediated by a specific interaction with the central FA pathway component FANCI. UBL5-deficient cells display spliceosome-independent reduction of FANCI protein stability, defective FANCI function in response to DNA damage and hypersensitivity to ICLs. By mapping the sequence determinants underlying UBL5-FANCI binding, we generated separation-of-function mutants to demonstrate that key aspects of FA pathway function, including FANCI-FANCD2 heterodimerization, FANCD2 and FANCI monoubiquitylation and maintenance of chromosome stability after ICLs, are compromised when the UBL5-FANCI interaction is selectively inhibited by mutations in either protein. Together, our findings establish UBL5 as a factor that promotes the functionality of the FA DNA repair pathway. © 2015 The Authors.

  12. Integrating mRNA and Protein Sequencing Enables the Detection and Quantitative Profiling of Natural Protein Sequence Variants of Populus trichocarpa.

    PubMed

    Abraham, Paul E; Wang, Xiaojing; Ranjan, Priya; Nookaew, Intawat; Zhang, Bing; Tuskan, Gerald A; Hettich, Robert L

    2015-12-04

    Next-generation sequencing has transformed the ability to link genotypes to phenotypes and facilitates the dissection of genetic contribution to complex traits. However, it is challenging to link genetic variants with the perturbed functional effects on proteins encoded by such genes. Here we show how RNA sequencing can be exploited to construct genotype-specific protein sequence databases to assess natural variation in proteins, providing information about the molecular toolbox driving cellular processes. For this study, we used two natural genotypes selected from a recent genome-wide association study of Populus trichocarpa, an obligate outcrosser with tremendous phenotypic variation across the natural population. This strategy allowed us to comprehensively catalogue proteins containing single amino acid polymorphisms (SAAPs), as well as insertions and deletions. We profiled the frequency of 128 types of naturally occurring amino acid substitutions, including both expected (neutral) and unexpected (non-neutral) SAAPs, with a subset occurring in regions of the genome having strong polymorphism patterns consistent with recent positive and/or divergent selection. By zeroing in on the molecular signatures of these important regions that might have previously been uncharacterized, we now provide a high-resolution molecular inventory that should improve accessibility and subsequent identification of natural protein variants in future genotype-to-phenotype studies.

  13. A Novel Cylindrical Representation for Characterizing Intrinsic Properties of Protein Sequences.

    PubMed

    Yu, Jia-Feng; Dou, Xiang-Hua; Wang, Hong-Bo; Sun, Xiao; Zhao, Hui-Ying; Wang, Ji-Hua

    2015-06-22

    The composition and sequence order of amino acid residues are the two most important characteristics to describe a protein sequence. Graphical representations facilitate visualization of biological sequences and produce biologically useful numerical descriptors. In this paper, we propose a novel cylindrical representation by placing the 20 amino acid residue types in a circle and sequence positions along the z axis. This representation allows visualization of the composition and sequence order of amino acids at the same time. Ten numerical descriptors and one weighted numerical descriptor have been developed to quantitatively describe intrinsic properties of protein sequences on the basis of the cylindrical model. Their applications to similarity/dissimilarity analysis of nine ND5 proteins indicated that these numerical descriptors are more effective than several classical numerical matrices. Thus, the cylindrical representation obtained here provides a new useful tool for visualizing and characterizing protein sequences. An online server is available at http://biophy.dzu.edu.cn:8080/CNumD/input.jsp .

  14. Prediction of Protein Structural Classes for Low-Similarity Sequences Based on Consensus Sequence and Segmented PSSM.

    PubMed

    Liang, Yunyun; Liu, Sanyang; Zhang, Shengli

    2015-01-01

    Prediction of protein structural classes for low-similarity sequences is useful for understanding fold patterns, regulation, functions, and interactions of proteins. It is well known that feature extraction is significant to prediction of protein structural class and it mainly uses protein primary sequence, predicted secondary structure sequence, and position-specific scoring matrix (PSSM). Currently, prediction solely based on the PSSM has played a key role in improving the prediction accuracy. In this paper, we propose a novel method called CSP-SegPseP-SegACP by fusing consensus sequence (CS), segmented PsePSSM, and segmented autocovariance transformation (ACT) based on PSSM. Three widely used low-similarity datasets (1189, 25PDB, and 640) are adopted in this paper. Then a 700-dimensional (700D) feature vector is constructed and the dimension is decreased to 224D by using principal component analysis (PCA). To verify the performance of our method, rigorous jackknife cross-validation tests are performed on the 1189, 25PDB, and 640 datasets. Comparison of our results with the existing PSSM-based methods demonstrates that our method achieves favorable and competitive performance. This will offer an important complement to other PSSM-based methods for prediction of protein structural classes for low-similarity sequences.

  15. Biophysical and structural considerations for protein sequence evolution

    PubMed Central

    2011-01-01

    Background Protein sequence evolution is constrained by the biophysics of folding and function, causing interdependence between interacting sites in the sequence. However, current site-independent models of sequence evolutions do not take this into account. Recent attempts to integrate the influence of structure and biophysics into phylogenetic models via statistical/informational approaches have not resulted in expected improvements in model performance. This suggests that further innovations are needed for progress in this field. Results Here we develop a coarse-grained physics-based model of protein folding and binding function, and compare it to a popular informational model. We find that both models violate the assumption of the native sequence being close to a thermodynamic optimum, causing directional selection away from the native state. Sampling and simulation show that the physics-based model is more specific for fold-defining interactions that vary less among residue type. The informational model diffuses further in sequence space with fewer barriers and tends to provide less support for an invariant sites model, although amino acid substitutions are generally conservative. Both approaches produce sequences with natural features like dN/dS < 1 and gamma-distributed rates across sites. Conclusions Simple coarse-grained models of protein folding can describe some natural features of evolving proteins but are currently not accurate enough to use in evolutionary inference. This is partly due to improper packing of the hydrophobic core. We suggest possible improvements on the representation of structure, folding energy, and binding function, as regards both native and non-native conformations, and describe a large number of possible applications for such a model. PMID:22171550

  16. Regulation of HFE expression by Poly(ADP-ribose) polymerase-1 (PARP1) through an inverted repeat DNA sequence in the distal promoter

    PubMed Central

    Rodova, Marianna; Rudolph, Angela; Chipps, Elizabeth; Islam, M. Rafiq

    2013-01-01

    Hereditary hemochromatosis (HH) is a common autosomal recessive disorder of iron overload among Caucasians of northern European descent. Over 85% of all cases with HH are due to mutations in the hemochromatosis protein (HFE) involved in iron metabolism. Although the importance in iron homeostasis is well recognized, the mechanism of sensing and regulating iron absorption by HFE, especially in the absence of an iron response element in its gene, is not fully understood. In this report, we have identified an inverted repeat sequence (ATGGTcttACCTA) within 1700 bp (−1675/+35) of the HFE promoter capable of forming a cruciform structure that binds PARP1 and strongly represses the HFE promoter. Knockdown of PARP1 increases HFE mRNA and protein. Similarly, hemin or FeCl3 treatments resulted in an increase in HFE expression by reducing the nuclear PARP1 pool via its apoptosis-induced cleavage, leading to upregulation of the iron regulatory hormone hepcidin mRNA. Thus, PARP1 binding to the inverted repeat sequence on the HFE promoter may serve as a novel iron sensing mechanism as increased iron level can trigger PARP1 cleavage and relief of HFE transcriptional repression. PMID:24184271

  17. Regulation of HFE expression by poly(ADP-ribose) polymerase-1 (PARP1) through an inverted repeat DNA sequence in the distal promoter.

    PubMed

    Pelham, Christopher; Jimenez, Tamara; Rodova, Marianna; Rudolph, Angela; Chipps, Elizabeth; Islam, M Rafiq

    2013-12-01

    Hereditary hemochromatosis (HH) is a common autosomal recessive disorder of iron overload among Caucasians of northern European descent. Over 85% of all cases with HH are due to mutations in the hemochromatosis protein (HFE) involved in iron metabolism. Although the importance in iron homeostasis is well recognized, the mechanism of sensing and regulating iron absorption by HFE, especially in the absence of an iron response element in its gene, is not fully understood. In this report, we have identified an inverted repeat sequence (ATGGTcttACCTA) within 1700 bp (-1675/+35) of the HFE promoter capable of forming a cruciform structure that binds PARP1 and strongly represses the HFE promoter. Knockdown of PARP1 increases HFE mRNA and protein. Similarly, hemin or FeCl3 treatments resulted in an increase in HFE expression by reducing the nuclear PARP1 pool via its apoptosis-induced cleavage, leading to upregulation of the iron regulatory hormone hepcidin mRNA. Thus, PARP1 binding to the inverted repeat sequence on the HFE promoter may serve as a novel iron sensing mechanism as increased iron level can trigger PARP1 cleavage and relief of HFE transcriptional repression. © 2013.

  18. Meta sequence analysis of human blood peptides and their parent proteins.

    PubMed

    Bowden, Peter; Pendrak, Voitek; Zhu, Peihong; Marshall, John G

    2010-04-18

    Sequence analysis of the blood peptides and their qualities will be key to understanding the mechanisms that contribute to error in LC-ESI-MS/MS. Analysis of peptides and their proteins at the level of sequences is much more direct and informative than the comparison of disparate accession numbers. A portable database of all blood peptide and protein sequences with descriptor fields and gene ontology terms might be useful for designing immunological or MRM assays from human blood. The results of twelve studies of human blood peptides and/or proteins identified by LC-MS/MS and correlated against a disparate array of genetic libraries were parsed and matched to proteins from the human ENSEMBL, SwissProt and RefSeq databases by SQL. The reported peptide and protein sequences were organized into an SQL database with full protein sequences and up to five unique peptides in order of prevalence along with the peptide count for each protein. Structured query language or BLAST was used to acquire descriptive information in current databases. Sampling error at the level of peptides is the largest source of disparity between groups. Chi Square analysis of peptide to protein distributions confirmed the significant agreement between groups on identified proteins. Copyright 2010. Published by Elsevier B.V.

  19. Regulated expression of the human cytomegalovirus pp65 gene: Octamer sequence in the promoter is required for activation by viral gene products

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Depto, A.S.; Stenberg, R.M.

    1989-03-01

    To better understand the regulation of late gene expression in human cytomegalovirus (CMV)-infected cells, the authors examined expression of the gene that codes for the 65-kilodalton lower-matrix phosphoprotein (pp65). Analysis of RNA isolated at 72 h from cells infected with CMV Towne or ts66, a DNA-negative temperature-sensitive mutant, supported the fact that pp65 is expressed at low levels prior to viral DNA replication but maximally expressed after the initiation of viral DNA replication. To investigate promoter activation in a transient expression assay, the pp65 promoter was cloned into the indicator plasmid containing the gene for chloramphenicol acetyltransferase (CAT). Transfection of the promoter-CAT construct and subsequent superinfection with CMV resulted in activation of the promoter at early times after infection. Cotransfection with plasmids capable of expressing immediate-early (IE) proteins demonstrated that the promoter was activated by IE proteins and that both IE regions 1 and 2 were necessary. These studies suggest that interactions between IE proteins and this octamer sequence may be important for the regulation and expression of this CMV gene.

  20. Inverse statistical physics of protein sequences: a key issues review.

    PubMed

    Cocco, Simona; Feinauer, Christoph; Figliuzzi, Matteo; Monasson, Rémi; Weigt, Martin

    2018-03-01

    In the course of evolution, proteins undergo important changes in their amino acid sequences, while their three-dimensional folded structure and their biological function remain remarkably conserved. Thanks to modern sequencing techniques, sequence data accumulate at unprecedented pace. This provides large sets of so-called homologous, i.e. evolutionarily related protein sequences, to which methods of inverse statistical physics can be applied. Using sequence data as the basis for the inference of Boltzmann distributions from samples of microscopic configurations or observables, it is possible to extract information about evolutionary constraints and thus protein function and structure. Here we give an overview over some biologically important questions, and how statistical-mechanics inspired modeling approaches can help to answer them. Finally, we discuss some open questions, which we expect to be addressed over the next years.
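
    For orientation, the Boltzmann distribution inferred from homologous sequences in this line of work is commonly written as a pairwise (Potts-type) model over an aligned sequence of length L. The form below is the standard one in the field and is given as an assumption about notation, not as a quotation from the review: h_i are site-specific fields, J_ij are pairwise couplings, and Z is the normalizing partition function.

```latex
P(a_1, \dots, a_L) = \frac{1}{Z}
  \exp\Big( \sum_{i=1}^{L} h_i(a_i) + \sum_{1 \le i < j \le L} J_{ij}(a_i, a_j) \Big),
\qquad
Z = \sum_{a_1, \dots, a_L}
  \exp\Big( \sum_{i=1}^{L} h_i(a_i) + \sum_{1 \le i < j \le L} J_{ij}(a_i, a_j) \Big)
```

    The inverse problem is then to infer the parameters h_i and J_ij from the single-site and pairwise residue frequencies observed in the alignment.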

  1. Inverse statistical physics of protein sequences: a key issues review

    NASA Astrophysics Data System (ADS)

    Cocco, Simona; Feinauer, Christoph; Figliuzzi, Matteo; Monasson, Rémi; Weigt, Martin

    2018-03-01

    In the course of evolution, proteins undergo important changes in their amino acid sequences, while their three-dimensional folded structure and their biological function remain remarkably conserved. Thanks to modern sequencing techniques, sequence data accumulate at unprecedented pace. This provides large sets of so-called homologous, i.e. evolutionarily related protein sequences, to which methods of inverse statistical physics can be applied. Using sequence data as the basis for the inference of Boltzmann distributions from samples of microscopic configurations or observables, it is possible to extract information about evolutionary constraints and thus protein function and structure. Here we give an overview over some biologically important questions, and how statistical-mechanics inspired modeling approaches can help to answer them. Finally, we discuss some open questions, which we expect to be addressed over the next years.

  2. A crustacean Ca2+-binding protein with a glutamate-rich sequence promotes CaCO3 crystallization.

    PubMed

    Endo, Hirotoshi; Takagi, Yasuaki; Ozaki, Noriaki; Kogure, Toshihiro; Watanabe, Toshiki

    2004-11-15

    The DD4 mRNA of the penaeid prawn Penaeus japonicus was shown previously to be expressed in the epidermis adjacent to the exoskeleton specifically during the post-moult period, when calcification of the exoskeleton took place. The encoded protein possessed a Ca2+-binding site, suggesting its involvement in the calcification of the exoskeleton. In the present study, an additional ORF (open reading frame) of 289 amino acids was identified at the 5' end of the previous ORF. The newly identified part of the encoded protein included a region of approx. 120 amino acids that was highly rich in glutamate residues, and contained one or more Ca2+-binding sites. In an immunohistochemical study, signals were detected within calcified regions in the endocuticular layer of the exoskeleton. Bacterially expressed partial segments of the protein induced CaCO3 crystallization in vitro. Finally, a reverse transcription-PCR study showed that the expression was limited to an early part of the post-moult period, preceding significant calcification of the exoskeleton. These observations argue for the possibility that the encoded protein, renamed crustocalcin (CCN), promotes formation of CaCO3 crystals in the exoskeleton by inducing nucleation.

  3. GASP: Gapped Ancestral Sequence Prediction for proteins

    PubMed Central

    Edwards, Richard J; Shields, Denis C

    2004-01-01

    Background The prediction of ancestral protein sequences from multiple sequence alignments is useful for many bioinformatics analyses. Predicting ancestral sequences is not a simple procedure and relies on accurate alignments and phylogenies. Several algorithms exist based on Maximum Parsimony or Maximum Likelihood methods but many current implementations are unable to process residues with gaps, which may represent insertion/deletion (indel) events or sequence fragments. Results Here we present a new algorithm, GASP (Gapped Ancestral Sequence Prediction), for predicting ancestral sequences from phylogenetic trees and the corresponding multiple sequence alignments. Alignments may be of any size and contain gaps. GASP first assigns the positions of gaps in the phylogeny before using a likelihood-based approach centred on amino acid substitution matrices to assign ancestral amino acids. Important outgroup information is used by first working down from the tips of the tree to the root, using descendant data only to assign probabilities, and then working back up from the root to the tips using descendant and outgroup data to make predictions. GASP was tested on a number of simulated datasets based on real phylogenies. Prediction accuracy for ungapped data was similar to three alternative algorithms tested, with GASP performing better in some cases and worse in others. Adding simple insertions and deletions to the simulated data did not have a detrimental effect on GASP accuracy. Conclusions GASP (Gapped Ancestral Sequence Prediction) will predict ancestral sequences from multiple protein alignments of any size. Although not as accurate in all cases as some of the more sophisticated maximum likelihood approaches, it can process a wide range of input phylogenies and will predict ancestral sequences for gapped and ungapped residues alike. PMID:15350199

  4. Rapid Identification of Sequences for Orphan Enzymes to Power Accurate Protein Annotation

    PubMed Central

    Ojha, Sunil; Watson, Douglas S.; Bomar, Martha G.; Galande, Amit K.; Shearer, Alexander G.

    2013-01-01

    The power of genome sequencing depends on the ability to understand what those genes and their protein products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the “back catalog” of enzymology – “orphan enzymes,” those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme “back catalog” is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology’s “back catalog” another powerful tool to drive accurate genome annotation. PMID:24386392

  5. Rapid identification of sequences for orphan enzymes to power accurate protein annotation.

    PubMed

    Ramkissoon, Kevin R; Miller, Jennifer K; Ojha, Sunil; Watson, Douglas S; Bomar, Martha G; Galande, Amit K; Shearer, Alexander G

    2013-01-01

    The power of genome sequencing depends on the ability to understand what those genes and their protein products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the "back catalog" of enzymology--"orphan enzymes," those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme "back catalog" is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology's "back catalog" another powerful tool to drive accurate genome annotation.

  6. A novel method based on new adaptive LVQ neural network for predicting protein-protein interactions from protein sequences.

    PubMed

    Yousef, Abdulaziz; Moghadam Charkari, Nasrollah

    2013-11-07

    Protein-Protein interaction (PPI) is one of the most important data in understanding the cellular processes. Many interesting methods have been proposed in order to predict PPIs. However, the methods which are based on the sequence of proteins as a prior knowledge are more universal. In this paper, a sequence-based, fast, and adaptive PPI prediction method is introduced to assign two proteins to an interaction class (yes, no). First, in order to improve the presentation of the sequences, twelve physicochemical properties of amino acid have been used by different representation methods to transform the sequence of protein pairs into different feature vectors. Then, for speeding up the learning process and reducing the effect of noise PPI data, principal component analysis (PCA) is carried out as a proper feature extraction algorithm. Finally, a new and adaptive Learning Vector Quantization (LVQ) predictor is designed to deal with different models of datasets that are classified into balanced and imbalanced datasets. The accuracy of 93.88%, 90.03%, and 89.72% has been found on S. cerevisiae, H. pylori, and independent datasets, respectively. The results of various experiments indicate the efficiency and validity of the method. © 2013 Published by Elsevier Ltd.

  7. Triple helix-forming oligonucleotide corresponding to the polypyrimidine sequence in the rat alpha 1(I) collagen promoter specifically inhibits factor binding and transcription.

    PubMed

    Kovacs, A; Kandala, J C; Weber, K T; Guntaka, R V

    1996-01-19

    Type I and III fibrillar collagens are the major structural proteins of the extracellular matrix found in various organs including the myocardium. Abnormal and progressive accumulation of fibrillar type I collagen in the interstitial spaces compromises organ function and therefore, the study of transcriptional regulation of this gene and specific targeting of its expression is of major interest. Transient transfection of adult cardiac fibroblasts indicate that the polypurine-polypyrimidine sequence of alpha 1(I) collagen promoter between nucleotides -200 and -140 represents an overall positive regulatory element. DNase I footprinting and electrophoretic mobility shift assays suggest that multiple factors bind to different elements of this promoter region. We further demonstrate that the unique polypyrimidine sequence between -172 and -138 of the promoter represents a suitable target for a single-stranded polypurine oligonucleotide (TFO) to form a triple helix DNA structure. Modified electrophoretic mobility shift assays show that this TFO specifically inhibits the protein-DNA interaction within the target region. In vitro transcription assays and transient transfection experiments demonstrate that the transcriptional activity of the promoter is inhibited by this oligonucleotide. We propose that TFOs represent a therapeutic potential to specifically influence the expression of alpha 1(I) collagen gene in various disease states where abnormal type I collagen accumulation is known to occur.

  8. Comparative analysis of the prion protein gene sequences in African lion.

    PubMed

    Wu, Chang-De; Pang, Wan-Yong; Zhao, De-Ming

    2006-10-01

    The prion protein gene of African lion (Panthera leo) was first cloned and polymorphisms screened. The results suggest that the prion protein gene of eight African lions is highly homogenous. The amino acid sequences of the prion protein (PrP) of all samples tested were identical. Four single nucleotide polymorphisms (C42T, C81A, C420T, T600C) in the prion protein gene (Prnp) of African lion were found, but no amino acid substitutions. Sequence analysis showed that higher homology is observed with Felis catus (GenBank accession AF003087; 96.7%) and with sheep (GenBank accession M31313.1; 96.2%). With respect to all the mammalian prion protein sequences compared, the African lion prion protein sequence has three amino acid substitutions. The homology might in turn affect the potential intermolecular interactions critical for cross species transmission of prion disease.

  9. Genome Sequence of the Plant Growth Promoting Endophytic Bacterium Enterobacter sp. 638

    PubMed Central

    Taghavi, Safiyh; van der Lelie, Daniel; Hoffman, Adam; Zhang, Yian-Biao; Walla, Michael D.; Vangronsveld, Jaco; Newman, Lee; Monchy, Sébastien

    2010-01-01

    Enterobacter sp. 638 is an endophytic plant growth promoting gamma-proteobacterium that was isolated from the stem of poplar (Populus trichocarpa×deltoides cv. H11-11), a potentially important biofuel feedstock plant. The Enterobacter sp. 638 genome sequence reveals the presence of a 4,518,712 bp chromosome and a 157,749 bp plasmid (pENT638-1). Genome annotation and comparative genomics allowed the identification of an extended set of genes specific to the plant niche adaptation of this bacterium. This includes genes that code for putative proteins involved in survival in the rhizosphere (to cope with oxidative stress or uptake of nutrients released by plant roots), root adhesion (pili, adhesion, hemagglutinin, cellulose biosynthesis), colonization/establishment inside the plant (chemotaxis, flagella, cellobiose phosphorylase), plant protection against fungal and bacterial infections (siderophore production and synthesis of the antimicrobial compounds 4-hydroxybenzoate and 2-phenylethanol), and improved poplar growth and development through the production of the phytohormones indole acetic acid, acetoin, and 2,3-butanediol. Metabolite analysis confirmed by quantitative RT–PCR showed that the production of acetoin and 2,3-butanediol is induced by the presence of sucrose in the growth medium. Interestingly, both the genetic determinants required for sucrose metabolism and the synthesis of acetoin and 2,3-butanediol are clustered on a genomic island. These findings point to a close interaction between Enterobacter sp. 638 and its poplar host, where the availability of sucrose, a major plant sugar, affects the synthesis of plant growth promoting phytohormones by the endophytic bacterium. The availability of the genome sequence, combined with metabolome and transcriptome analysis, will provide a better understanding of the synergistic interactions between poplar and its growth promoting endophyte Enterobacter sp. 638. This information can be further exploited to

  10. Protein Sequence Annotation Tool (PSAT): A centralized web-based meta-server for high-throughput sequence annotations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Leung, Elo; Huang, Amy; Cadag, Eithon

    In this study, we introduce the Protein Sequence Annotation Tool (PSAT), a web-based, sequence annotation meta-server for performing integrated, high-throughput, genome-wide sequence analyses. Our goals in building PSAT were to (1) create an extensible platform for integration of multiple sequence-based bioinformatics tools, (2) enable functional annotations and enzyme predictions over large input protein FASTA data sets, and (3) provide a web interface for convenient execution of the tools. In this paper, we demonstrate the utility of PSAT by annotating the predicted peptide gene products of Herbaspirillum sp. strain RV1423, importing the results of PSAT into EC2KEGG, and using the resulting functional comparisons to identify a putative catabolic pathway, thereby distinguishing RV1423 from a well-annotated Herbaspirillum species. This analysis demonstrates that high-throughput enzyme predictions, provided by PSAT processing, can be used to identify metabolic potential in an otherwise poorly annotated genome. Lastly, PSAT is a meta-server that combines the results from several sequence-based annotation and function prediction codes, and is available at http://psat.llnl.gov/psat/. PSAT stands apart from other sequence-based genome annotation systems in providing a high-throughput platform for rapid de novo enzyme predictions and sequence annotations over large input protein sequence data sets in FASTA. PSAT is most appropriately applied in annotation of large protein FASTA sets that may or may not be associated with a single genome.

  11. Protein Sequence Annotation Tool (PSAT): A centralized web-based meta-server for high-throughput sequence annotations

    DOE PAGES

    Leung, Elo; Huang, Amy; Cadag, Eithon; ...

    2016-01-20

    In this study, we introduce the Protein Sequence Annotation Tool (PSAT), a web-based, sequence annotation meta-server for performing integrated, high-throughput, genome-wide sequence analyses. Our goals in building PSAT were to (1) create an extensible platform for integration of multiple sequence-based bioinformatics tools, (2) enable functional annotations and enzyme predictions over large input protein FASTA data sets, and (3) provide a web interface for convenient execution of the tools. In this paper, we demonstrate the utility of PSAT by annotating the predicted peptide gene products of Herbaspirillum sp. strain RV1423, importing the results of PSAT into EC2KEGG, and using the resulting functional comparisons to identify a putative catabolic pathway, thereby distinguishing RV1423 from a well-annotated Herbaspirillum species. This analysis demonstrates that high-throughput enzyme predictions, provided by PSAT processing, can be used to identify metabolic potential in an otherwise poorly annotated genome. Lastly, PSAT is a meta-server that combines the results from several sequence-based annotation and function prediction codes, and is available at http://psat.llnl.gov/psat/. PSAT stands apart from other sequence-based genome annotation systems in providing a high-throughput platform for rapid de novo enzyme predictions and sequence annotations over large input protein sequence data sets in FASTA. PSAT is most appropriately applied in annotation of large protein FASTA sets that may or may not be associated with a single genome.

  12. SCPRED: accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences.

    PubMed

    Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke

    2008-05-01

    Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. The SCPRED can accurately find similar structures for sequences that share low identity with sequence used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are

  13. SCPRED: Accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences

    PubMed Central

    Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke

    2008-01-01

    Background: Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. Results: SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. Conclusion: The SCPRED can accurately find similar structures for sequences that share low identity with sequence used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of

  14. Gammaretroviral pol sequences act in cis to direct polysome loading and NXF1/NXT-dependent protein production by gag-encoded RNA.

    PubMed

    Bartels, Hanni; Luban, Jeremy

    2014-09-12

    All retroviruses synthesize essential proteins via alternatively spliced mRNAs. Retrovirus genera, though, exploit different mechanisms to coordinate the synthesis of proteins from alternatively spliced mRNAs. The best studied of these retroviral, post-transcriptional effectors are the trans-acting Rev protein of lentiviruses and the cis-acting constitutive transport element (CTE) of the betaretrovirus Mason-Pfizer monkey virus (MPMV). How members of the gammaretrovirus genus translate protein from unspliced RNA has not been elucidated. The mechanism by which two gammaretroviruses, XMRV and MLV, synthesize the Gag polyprotein (Pr65Gag) from full-length, unspliced mRNA was investigated here. The yield of Pr65Gag from a gag-only expression plasmid was found to be at least 30-fold less than that from an otherwise isogenic gag-pol expression plasmid. A frameshift mutation disrupting the pol open reading frame within the gag-pol expression plasmid did not decrease Pr65Gag production and 398 silent nucleotide changes engineered into gag rendered Pr65Gag synthesis pol-independent. These results are consistent with pol-encoded RNA acting in cis to promote Pr65Gag translation. Two independently-acting pol fragments were identified by screening 17 pol deletion mutations. To determine the mechanism by which pol promoted Pr65Gag synthesis, gag RNA in total and cytoplasmic fractions was quantitated by northern blot and by RT-PCR. The pol sequences caused, maximally, three-fold increase in total or cytoplasmic gag mRNA. Instead, pol sequences increased gag mRNA association with polyribosomes ~100-fold, a magnitude sufficient to explain the increase in Pr65Gag translation efficiency. The MPMV CTE, an NXF1-binding element, substituted for pol in promoting Pr65Gag synthesis. A pol RNA stem-loop resembling the CTE promoted Pr65Gag synthesis. Over-expression of NXF1 and NXT, host factors that bind to the MPMV CTE, synergized with pol to promote gammaretroviral gag RNA loading onto

  15. Organization, chromosomal localization and promoter analysis of the gene encoding human acidic fibroblast growth factor intracellular binding protein.

    PubMed Central

    Kolpakova, E; Frengen, E; Stokke, T; Olsnes, S

    2000-01-01

    Acidic fibroblast growth factor (aFGF) intracellular binding protein (FIBP) is a protein found mainly in the nucleus that might be involved in the intracellular function of aFGF. Here we present a comparative analysis of the deduced amino acid sequences of human, murine and Drosophila FIBP analogues and demonstrate that FIBP is an evolutionarily conserved protein. The human gene spans more than 5 kb, comprising ten exons and nine introns, and maps to chromosome 11q13.1. Two slightly different splice variants found in different tissues were isolated and characterized. Sequence analysis of the region surrounding the translation start revealed a CpG island, a classical feature of widely expressed genes. Functional studies of the promoter region with a luciferase reporter system suggested a strong transcriptional activity residing within 600 bp of the 5' flanking region. PMID:11104667

  16. Elman RNN based classification of proteins sequences on account of their mutual information.

    PubMed

    Mishra, Pooja; Nath Pandey, Paras

    2012-10-21

    In the present work we estimate residue correlation within protein sequences using the mutual information (MI) of adjacent residues, based on structural and solvent accessibility properties of amino acids. The treatment of long-range correlation between nonadjacent residues is improved by constructing a mutual information vector (MIV) for a single protein sequence, so that each protein sequence is associated with its corresponding MIVs. These MIVs are given to an Elman RNN to obtain the classification of protein sequences. The modeling power of MIV was shown to be significantly better, giving a new approach towards alignment-free classification of protein sequences. We also conclude that MIVs based on sequence structural and solvent accessibility properties are better predictors. Copyright © 2012 Elsevier Ltd. All rights reserved.
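
    To make the adjacent-residue mutual information concrete, here is a minimal Python sketch. It is not the authors' implementation: it computes MI in bits from the empirical frequencies of neighbouring symbol pairs in one sequence, and the optional `classes` mapping (for grouping residues by a property such as solvent accessibility) is a hypothetical parameter whose contents the abstract does not specify.

```python
from collections import Counter
from math import log2
from typing import Dict, Optional


def adjacent_mutual_information(seq: str,
                                classes: Optional[Dict[str, str]] = None) -> float:
    """Mutual information (bits) between symbols at positions i and i+1.

    If `classes` is given, each residue is first replaced by its class label
    (residues missing from the mapping are kept unchanged).
    """
    symbols = [classes.get(r, r) for r in seq] if classes else list(seq)
    pairs = list(zip(symbols, symbols[1:]))
    n = len(pairs)
    if n == 0:
        return 0.0
    pair_counts = Counter(pairs)
    left_counts = Counter(a for a, _ in pairs)    # marginal of position i
    right_counts = Counter(b for _, b in pairs)   # marginal of position i+1
    mi = 0.0
    for (a, b), c in pair_counts.items():
        p_ab = c / n
        # p_ab / (p_a * p_b) simplifies to c * n / (count_a * count_b)
        mi += p_ab * log2(c * n / (left_counts[a] * right_counts[b]))
    return mi
```

    A homopolymer such as "AAAA" gives zero MI because the neighbour is fully predictable from the marginal alone, while a strictly alternating sequence gives a positive value. A vector of such values, one per property encoding, would be one plausible reading of an MIV, but that construction is an assumption here.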

  17. Predicting membrane protein types by fusing composite protein sequence features into pseudo amino acid composition.

    PubMed

    Hayat, Maqsood; Khan, Asifullah

    2011-02-21

    Membrane proteins are a vital type of protein that serve as channels, receptors, and energy transducers in a cell. Prediction of membrane protein types is an important research area in bioinformatics. Knowledge of membrane protein types provides valuable information for predicting novel examples of the membrane protein types. However, classification of membrane protein types can be both time consuming and susceptible to errors due to the inherent similarity of membrane protein types. In this paper, a neural network-based membrane protein type prediction system is proposed. Composite protein sequence representation (CPSR) is used to extract the features of a protein sequence, which includes seven feature sets: amino acid composition, sequence length, 2-gram exchange group frequency, hydrophobic group, electronic group, sum of hydrophobicity, and R-group. Principal component analysis is then employed to reduce the dimensionality of the feature vector. The probabilistic neural network (PNN), generalized regression neural network, and support vector machine (SVM) are used as classifiers. A high success rate of 86.01% is obtained using SVM for the jackknife test. In the independent dataset test, PNN yields the highest accuracy of 95.73%. These classifiers exhibit improved performance using other performance measures such as sensitivity, specificity, Matthews correlation coefficient, and F-measure. The experimental results show that the prediction performance of the proposed scheme for classifying membrane protein types is the best reported so far. This performance improvement may largely be credited to the learning capabilities of neural networks and the composite feature extraction strategy, which exploits seven different properties of protein sequences. The proposed Mem-Predictor can be accessed at http://111.68.99.218/Mem-Predictor. Copyright © 2010 Elsevier Ltd. All rights reserved.
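
    The first of the seven CPSR feature sets, amino acid composition, can be sketched in a few lines of Python. This is an illustrative stand-in rather than the Mem-Predictor code: the function name, the residue ordering, and the choice to ignore non-standard letters are all assumptions.

```python
from typing import List

# The 20 standard amino acids in alphabetical one-letter order (an assumed ordering).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq: str) -> List[float]:
    """Return the fraction of each standard amino acid as a 20-element vector.

    Letters outside the standard alphabet (e.g. X, B, Z) are skipped, so the
    fractions sum to 1 whenever at least one standard residue is present.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    valid = 0
    for residue in seq.upper():
        if residue in counts:
            counts[residue] += 1
            valid += 1
    if valid == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / valid for aa in AMINO_ACIDS]
```

    In a pipeline like the one described, a vector of this kind would be concatenated with the other feature sets before dimensionality reduction; sequence length would be appended as a separate scalar feature.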

  18. Sequence patterns mediating functions of disordered proteins.

    PubMed

    Exarchos, Konstantinos P; Kourou, Konstantina; Exarchos, Themis P; Papaloukas, Costas; Karamouzis, Michalis V; Fotiadis, Dimitrios I

    2015-01-01

    Disordered proteins lack specific 3D structure in their native state and have been implicated with numerous cellular functions as well as with the induction of severe diseases, e.g., cardiovascular and neurodegenerative diseases as well as diabetes. Due to their conformational flexibility they are often found to interact with a multitude of protein molecules; this one-to-many interaction which is vital for their versatile functioning involves short consensus protein sequences, which are normally detected using slow and cumbersome experimental procedures. In this work we exploit information from disorder-oriented protein interaction networks focused specifically on humans, in order to assemble, by means of overrepresentation, a set of sequence patterns that mediate the functioning of disordered proteins; hence, we are able to identify how a single protein achieves such functional promiscuity. Next, we study the sequential characteristics of the extracted patterns, which exhibit a striking preference towards a very limited subset of amino acids; specifically, residues leucine, glutamic acid, and serine are particularly frequent among the extracted patterns, and we also observe a nontrivial propensity towards alanine and glycine. Furthermore, based on the extracted patterns we set off to infer potential functional implications in order to verify our findings and potentially further extrapolate our knowledge regarding the functioning of disordered proteins. We observe that the extracted patterns are primarily involved with regulation, binding and posttranslational modifications, which constitute the most prominent functions of disordered proteins.

  19. Comparative analysis of myostatin gene and promoter sequences of Qinchuan and Red Angus cattle.

    PubMed

    He, Y L; Wu, Y H; Quan, F S; Liu, Y G; Zhang, Y

    2013-09-04

    To better understand the function of the myostatin gene and its promoter region in bovine, we amplified and sequenced the myostatin gene and promoter from the blood of Qinchuan and Red Angus cattle by using polymerase chain reaction. The sequences of Qinchuan and Red Angus cattle were compared with those of other cattle breeds available in GenBank. Exon splice sites were confirmed by mRNA sequencing. Compared to the published sequence (GenBank accession No. AF320998), 69 single nucleotide polymorphisms (SNPs) were identified in the Qinchuan myostatin gene, only one of which was an insertion mutation in Qinchuan cattle. There was a 16-bp insertion in the first 705-bp intron in 3 Qinchuan cattle. A total of 7 SNPs were identified in exon 3, in which the mutation occurred in the third base of the codon and was synonymous. On comparing the Qinchuan myostatin gene sequence to that of Red Angus cattle, a total of 50 SNPs were identified in the first and third exons. In addition, there were 18 SNPs identified in the Qinchuan cattle promoter region compared with those of other cattle breeds (GenBank accession No. AF348479), but only 14 SNPs when compared to the Red Angus cattle myostatin promoter region.

  20. The HMMER Web Server for Protein Sequence Similarity Search.

    PubMed

    Prakash, Ananth; Jeffryes, Matt; Bateman, Alex; Finn, Robert D

    2017-12-08

    Protein sequence similarity search is one of the most commonly used bioinformatics methods for identifying evolutionarily related proteins. In general, sequences that are evolutionarily related share some degree of similarity, and sequence-search algorithms use this principle to identify homologs. The requirement for a fast and sensitive sequence search method led to the development of the HMMER software, which in the latest version (v3.1) uses a combination of sophisticated acceleration heuristics and mathematical and computational optimizations to enable the use of profile hidden Markov models (HMMs) for sequence analysis. The HMMER Web server provides a common platform by linking the HMMER algorithms to databases, thereby enabling the search for homologs, as well as providing sequence and functional annotation by linking external databases. This unit describes three basic protocols and two alternate protocols that explain how to use the HMMER Web server using various input formats and user defined parameters. © 2017 by John Wiley & Sons, Inc.

  1. Sequence Determines Degree of Knottedness in a Coarse-Grained Protein Model

    NASA Astrophysics Data System (ADS)

    Wüst, Thomas; Reith, Daniel; Virnau, Peter

    2015-01-01

    Knots are abundant in globular homopolymers but rare in globular proteins. To shed new light on this long-standing conundrum, we study the influence of sequence on the formation of knots in proteins under native conditions within the framework of the hydrophobic-polar lattice protein model. By employing large-scale Wang-Landau simulations combined with suitable Monte Carlo trial moves we show that even though knots are still abundant on average, sequence introduces large variability in the degree of self-entanglements. Moreover, we are able to design sequences which are either almost always or almost never knotted. Our findings serve as proof of concept that the introduction of just one additional degree of freedom per monomer (in our case sequence) facilitates evolution towards a protein universe in which knots are rare.

  2. Quantiprot - a Python package for quantitative analysis of protein sequences.

    PubMed

    Konopka, Bogumił M; Marciniak, Marta; Dyrka, Witold

    2017-07-17

    The field of protein sequence analysis is dominated by tools rooted in substitution matrices and alignments. A complementary approach is provided by methods of quantitative characterization. A major advantage of the approach is that quantitative properties define a multidimensional solution space, where sequences can be related to each other and differences can be meaningfully interpreted. Quantiprot is a software package in Python, which provides a simple and consistent interface to multiple methods for quantitative characterization of protein sequences. The package can be used to calculate dozens of characteristics directly from sequences or using physico-chemical properties of amino acids. Besides basic measures, Quantiprot performs quantitative analysis of recurrence and determinism in the sequence, calculates the distribution of n-grams and computes the Zipf's law coefficient. We propose three main fields of application of the Quantiprot package. First, quantitative characteristics can be used in alignment-free similarity searches, and in clustering of large and/or divergent sequence sets. Second, a feature space defined by quantitative properties can be used in comparative studies of protein families and organisms. Third, the feature space can be used for evaluating generative models, where a large number of sequences generated by the model can be compared to actually observed sequences.
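
    As an illustration of one of the measures mentioned, the following sketch computes an n-gram frequency distribution for a single sequence. It deliberately uses only the Python standard library and does not reproduce the Quantiprot API; the function name and return type are assumptions made for this example.

```python
from collections import Counter
from typing import Dict


def ngram_distribution(seq: str, n: int = 2) -> Dict[str, float]:
    """Relative frequency of every overlapping n-gram that occurs in `seq`."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Overlapping windows: a sequence of length L has L - n + 1 n-grams.
    grams = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}
    return {gram: count / total for gram, count in Counter(grams).items()}
```

    Distributions like this can be compared between sequences without any alignment, which is the kind of alignment-free use the abstract describes.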

  3. Correlation between protein expression of FOXP3 and level of FOXP3 promoter methylation in recurrent spontaneous abortion.

    PubMed

    Hou, Wenhui; Li, Zhuyu; Li, Yinguang; Fang, Liyuan; Li, Jie; Huang, Jia; Li, Xiaoqing; You, Zeshan

    2016-11-01

    The aim of this study was to investigate the level of Forkhead box P3 (FOXP3) promoter methylation and protein expression in recurrent spontaneous abortion and to elucidate the pathogenesis of unexplained recurrent spontaneous abortion (URSA). We assessed a total of 56 URSA patients with a normal embryo, 24 recurrent spontaneous abortion (RSA) patients with an abnormal embryo (as control group 1), and 39 normal pregnant women (as control group 2). The expression of FOXP3 protein in deciduas was assessed through Western blot, and the level of FOXP3 promoter methylation was detected using bisulfite-assisted genomic sequencing polymerase chain reaction. The expressing quantity of FOXP3 protein in the URSA group was significantly lower than that in control groups 1 and 2, both with a P-value < 0.05. By contrast, no statistical difference was observed in the expressing quantity of FOXP3 protein of the two control groups (P = 0.212). The FOXP3 promoter methylation level in the URSA group was significantly higher than that in the two control groups, both of which exhibited a statistical difference of P-values < 0.05. Meanwhile, no statistical difference was observed in the FOXP3 promoter methylation level of the two control groups (P = 0.141). A negative correlation was found between the FOXP3 promoter methylation level and the expressing quantity of FOXP3 protein (r = -0.861, P < 0.05). Increasing FOXP3 promoter methylation levels may cause abnormal immune tolerance through the downregulation expression of the FOXP3 protein, which in turn leads to URSA. © 2016 The Authors. Journal of Obstetrics and Gynaecology Research published by John Wiley & Sons Australia, Ltd on behalf of Japan Society of Obstetrics and Gynecology.

  4. Enhanced expression of cro-beta-galactosidase fusion proteins under the control of the PR promoter of bacteriophage lambda.

    PubMed Central

    Zabeau, M; Stanley, K K

    1982-01-01

    Hybrid plasmids carrying cro-lacZ gene fusions have been constructed by joining DNA segments carrying the PR promoter and the start of the cro gene of bacteriophage lambda to the lacZ gene fragment carried by plasmid pLG400. Plasmids in which the translational reading frames of the cro and lacZ genes are joined in-register (type I) direct the synthesis of elevated levels of cro-beta-galactosidase fusion protein amounting to 30% of the total cellular protein, while plasmids in which the genes are fused out-of-register (type II) produce a low level of beta-galactosidase protein. Sequence rearrangements downstream of the cro initiator AUG were found to influence the efficiency of translation, and have been correlated with alterations in the RNA secondary structure of the ribosome-binding site. Plasmids which direct the synthesis of high levels of beta-galactosidase are conditionally lethal and can only be propagated when the PR promoter is repressed. Deletion of sequences downstream of the lacZ gene restored viability, indicating that this region of the plasmid encodes a function which inhibits the growth of the cells. The different applications of these plasmids for expression of cloned genes are discussed. PMID:6327257

  5. Deciphering mRNA Sequence Determinants of Protein Production Rate

    NASA Astrophysics Data System (ADS)

    Szavits-Nossan, Juraj; Ciandrini, Luca; Romano, M. Carmen

    2018-03-01

    One of the greatest challenges in biophysical models of translation is to identify coding sequence features that affect the rate of translation and therefore the overall protein production in the cell. We propose an analytic method to solve a translation model based on the inhomogeneous totally asymmetric simple exclusion process, which allows us to unveil simple design principles of nucleotide sequences determining protein production rates. Our solution shows an excellent agreement when compared to numerical genome-wide simulations of S. cerevisiae transcript sequences and predicts that the first 10 codons, which is the ribosome footprint length on the mRNA, together with the value of the initiation rate, are the main determinants of protein production rate under physiological conditions. Finally, we interpret the obtained analytic results based on the evolutionary role of the codons' choice for regulating translation rates and ribosome densities.

  6. MutaBind estimates and interprets the effects of sequence variants on protein-protein interactions.

    PubMed

    Li, Minghui; Simonetti, Franco L; Goncearenco, Alexander; Panchenko, Anna R

    2016-07-08

    Proteins engage in highly selective interactions with their macromolecular partners. Sequence variants that alter protein binding affinity may cause significant perturbations or complete abolishment of function, potentially leading to diseases. There exists a persistent need to develop a mechanistic understanding of impacts of variants on proteins. To address this need we introduce a new computational method MutaBind to evaluate the effects of sequence variants and disease mutations on protein interactions and calculate the quantitative changes in binding affinity. The MutaBind method uses molecular mechanics force fields, statistical potentials and fast side-chain optimization algorithms. The MutaBind server maps mutations on a structural protein complex, calculates the associated changes in binding affinity, determines the deleterious effect of a mutation, estimates the confidence of this prediction and produces a mutant structural model for download. MutaBind can be applied to a large number of problems, including determination of potential driver mutations in cancer and other diseases, elucidation of the effects of sequence variants on protein fitness in evolution and protein design. MutaBind is available at http://www.ncbi.nlm.nih.gov/projects/mutabind/. Published by Oxford University Press on behalf of Nucleic Acids Research 2016. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  7. Structure-related statistical singularities along protein sequences: a correlation study.

    PubMed

    Colafranceschi, Mauro; Colosimo, Alfredo; Zbilut, Joseph P; Uversky, Vladimir N; Giuliani, Alessandro

    2005-01-01

    A data set composed of 1141 proteins representative of all eukaryotic protein sequences in the Swiss-Prot Protein Knowledge base was coded by seven physicochemical properties of amino acid residues. The resulting numerical profiles were submitted to correlation analysis after the application of a linear (simple mean) and a nonlinear (Recurrence Quantification Analysis, RQA) filter. The main RQA variables, Recurrence and Determinism, were subsequently analyzed by Principal Component Analysis. The RQA descriptors showed that (i) within protein sequences is embedded specific information neither present in the codes nor in the amino acid composition and (ii) the most sensitive code for detecting ordered recurrent (deterministic) patterns of residues in protein sequences is the Miyazawa-Jernigan hydrophobicity scale. The most deterministic proteins in terms of autocorrelation properties of primary structures were found (i) to be involved in protein-protein and protein-DNA interactions and (ii) to display a significantly higher proportion of structural disorder with respect to the average data set. A study of the scaling behavior of the average determinism with the setting parameters of RQA (embedding dimension and radius) allows for the identification of patterns of minimal length (six residues) as possible markers of zones specifically prone to inter- and intramolecular interactions.

  8. On the relationship between residue structural environment and sequence conservation in proteins.

    PubMed

    Liu, Jen-Wei; Lin, Jau-Ji; Cheng, Chih-Wen; Lin, Yu-Feng; Hwang, Jenn-Kang; Huang, Tsun-Tsao

    2017-09-01

    Residues that are crucial to protein function or structure are usually evolutionarily conserved. To identify the important residues in a protein, sequence conservation is estimated, and current methods rely upon the unbiased collection of homologous sequences. Surprisingly, our previous studies have shown that the sequence conservation is closely correlated with the weighted contact number (WCN), a measure of packing density for a residue's structural environment, calculated only based on the Cα positions of a protein structure. Moreover, studies have shown that sequence conservation is correlated with environment-related structural properties calculated based on different protein substructures, such as a protein's all atoms, backbone atoms, side-chain atoms, or side-chain centroid. To determine whether the Cα atomic positions are adequate to show the relationship between residue environment and sequence conservation, here we compared Cα atoms with other substructures in their contributions to the sequence conservation. Our results show that Cα positions are substantially equivalent to the other substructures in calculations of various measures of residue environment. As a result, the overlapping contributions between Cα atoms and the other substructures are high, yielding a similar structure-conservation relationship. Taking the WCN as an example, the average overlapping contribution to sequence conservation is 87% between Cα and all-atom substructures. These results indicate that Cα atoms of a protein structure alone could reflect sequence conservation at the residue level. © 2017 Wiley Periodicals, Inc.
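
    The abstract above does not state how the WCN is computed. A minimal sketch, assuming the commonly used inverse-square-distance form (each residue's WCN is the sum of 1/r² over all other Cα atoms), might look as follows. The function name and the toy coordinates are hypothetical and chosen only for illustration.

```python
import math

def weighted_contact_number(ca_coords):
    """Return one WCN value per residue.

    Assumed definition (not stated in the abstract):
    WCN_i = sum over j != i of 1 / r_ij**2, with r_ij the Cα-Cα distance.
    Coincident atoms (r_ij == 0) would raise ZeroDivisionError.
    """
    wcn = []
    for i, a in enumerate(ca_coords):
        total = 0.0
        for j, b in enumerate(ca_coords):
            if i != j:
                total += 1.0 / math.dist(a, b) ** 2
        wcn.append(total)
    return wcn

# Hypothetical toy chain: three Cα atoms on a line, 3.8 Å apart.
toy_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
wcn = weighted_contact_number(toy_ca)
```

    In this toy chain the middle residue has two close neighbours and therefore receives the largest value, which is the sense in which the WCN acts as a packing-density measure.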

  9. SeqRate: sequence-based protein folding type classification and rates prediction

    PubMed Central

    2010-01-01

    Background: Protein folding rate is an important property of a protein. Predicting protein folding rate is useful for understanding protein folding process and guiding protein design. Most previous methods of predicting protein folding rate require the tertiary structure of a protein as an input. And most methods do not distinguish the different kinetic nature (two-state folding or multi-state folding) of the proteins. Here we developed a method, SeqRate, to predict both protein folding kinetic type (two-state versus multi-state) and real-value folding rate using sequence length, amino acid composition, contact order, contact number, and secondary structure information predicted from only protein sequence with support vector machines. Results: We systematically studied the contributions of individual features to folding rate prediction. On a standard benchmark dataset, the accuracy of folding kinetic type classification is 80%. The Pearson correlation coefficient and the mean absolute difference between predicted and experimental folding rates (sec⁻¹) in the base-10 logarithmic scale are 0.81 and 0.79 for two-state protein folders, and 0.80 and 0.68 for three-state protein folders. SeqRate is the first sequence-based method for protein folding type classification and its accuracy of fold rate prediction is improved over previous sequence-based methods. Its performance can be further enhanced with additional information, such as structure-based geometric contacts, as inputs. Conclusions: Both the web server and software of predicting folding rate are publicly available at http://casp.rnet.missouri.edu/fold_rate/index.html. PMID:20438647
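
    The two evaluation measures quoted above (Pearson correlation and mean absolute difference on a base-10 logarithmic scale) can be sketched as below. This is an illustration of the metrics only, not SeqRate's code; the helper names and the folding rates are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def mean_abs_diff(xs, ys):
    """Mean absolute difference between paired values."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical folding rates in sec^-1 (not taken from the paper).
experimental = [1e1, 1e2, 1e3, 1e4]
predicted = [1e2, 1e2, 1e4, 1e4]

# Both measures are computed after a base-10 log transform of the rates.
log_exp = [math.log10(k) for k in experimental]
log_pred = [math.log10(k) for k in predicted]
r = pearson_r(log_exp, log_pred)
mad = mean_abs_diff(log_exp, log_pred)
```

    Working on the log scale means an error of 1.0 corresponds to a tenfold difference in rate, so the reported values of 0.79 and 0.68 are below one order of magnitude.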

  10. Conservation of Shannon's redundancy for proteins. [information theory applied to amino acid sequences]

    NASA Technical Reports Server (NTRS)

    Gatlin, L. L.

    1974-01-01

    Concepts of information theory are applied to examine various proteins in terms of their redundancy in natural originators such as animals and plants. The Monte Carlo method is used to derive information parameters for random protein sequences. Real protein sequence parameters are compared with the standard parameters of protein sequences having a specific length. The tendency of a chain to contain some amino acids more frequently than others and the tendency of a chain to contain certain amino acid pairs more frequently than other pairs are used as randomness measures of individual protein sequences. Non-periodic proteins are generally found to have random Shannon redundancies except in cases of constraints due to short chain length and genetic codes. Redundant characteristics of highly periodic proteins are discussed. A degree of periodicity parameter is derived.
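
    As a rough illustration of the composition-based part of such a measure, the sketch below computes a first-order Shannon redundancy, R = 1 - H / log2(20), from single-residue frequencies. It deliberately ignores the pair-frequency term mentioned in the abstract, so it is not the full measure used in the report. The function name and the example sequences are hypothetical.

```python
import math
from collections import Counter

def shannon_redundancy(sequence, alphabet_size=20):
    """First-order redundancy from residue composition only.

    H is the Shannon entropy (bits) of the observed residue frequencies;
    the maximum entropy for a 20-letter alphabet is log2(20).
    """
    counts = Counter(sequence)
    n = len(sequence)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return 1.0 - entropy / math.log2(alphabet_size)

# Hypothetical sequences: one uses every amino acid once, one is a homopolymer.
uniform = shannon_redundancy("ACDEFGHIKLMNPQRSTVWY")
homopolymer = shannon_redundancy("AAAAAAAA")
```

    A sequence that uses all 20 residues equally often has a redundancy near 0, while a homopolymer has a redundancy of 1. Short chains cannot sample the alphabet evenly, which is one reason chain length acts as a constraint on the measure.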

  11. Increase in Bacterial Colony Formation from a Permafrost Ice Wedge Dosed with a Tomitella biformata Recombinant Resuscitation-Promoting Factor Protein.

    PubMed

    Puspita, Indun Dewi; Kitagawa, Wataru; Kamagata, Yoichi; Tanaka, Michiko; Nakatsu, Cindy H

    2015-01-01

    Resuscitation-promoting factor (Rpf) is a protein that has been found in a number of different Actinobacteria species and has been shown to promote the growth of active cells and resuscitate dormant (non-dividing) cells. We previously reported the biological activity of an Rpf protein in Tomitella biformata AHU 1821(T), an Actinobacteria isolated from a permafrost ice wedge. This protein is excreted outside the cell; however, few studies have investigated its contribution in environmental samples to the growth or resuscitation of bacteria other than the original host. Therefore, the aim of the present study was to determine whether Rpf from T. biformata impacted the cultivation of other bacteria from the permafrost ice wedge from which it was originally isolated. All experiments used recombinant Rpf proteins produced using a Rhodococcus erythropolis expression system. Dilutions of melted surface sterilized ice wedge samples mixed with different doses of the purified recombinant Rpf (rRpf) protein indicated that the highest concentration tested, 1250 pM, had a significantly (p <0.05) higher number of CFUs on agar plates after 8 d, approximately 14-fold higher than that on control plates without rRpf. 16S rRNA gene sequences revealed that all the colonies on plates were mainly related to Brevibacterium antiquum strain VKM Ac-2118 (AY243344), with 98-99% sequence identity. This species is also a member of the phylum Actinobacteria and was originally isolated from Siberian permafrost sediments. The results of the present study demonstrated that rRpf not only promoted the growth of T. biformata from which it was isolated, but also enhanced colony formation by another Actinobacteria in an environmental sample.

  12. Increase in Bacterial Colony Formation from a Permafrost Ice Wedge Dosed with a Tomitella biformata Recombinant Resuscitation-Promoting Factor Protein

    PubMed Central

    Puspita, Indun Dewi; Kitagawa, Wataru; Kamagata, Yoichi; Tanaka, Michiko; Nakatsu, Cindy H.

    2015-01-01

    Resuscitation-promoting factor (Rpf) is a protein that has been found in a number of different Actinobacteria species and has been shown to promote the growth of active cells and resuscitate dormant (non-dividing) cells. We previously reported the biological activity of an Rpf protein in Tomitella biformata AHU 1821T, an Actinobacteria isolated from a permafrost ice wedge. This protein is excreted outside the cell; however, few studies have investigated its contribution in environmental samples to the growth or resuscitation of bacteria other than the original host. Therefore, the aim of the present study was to determine whether Rpf from T. biformata impacted the cultivation of other bacteria from the permafrost ice wedge from which it was originally isolated. All experiments used recombinant Rpf proteins produced using a Rhodococcus erythropolis expression system. Dilutions of melted surface sterilized ice wedge samples mixed with different doses of the purified recombinant Rpf (rRpf) protein indicated that the highest concentration tested, 1250 pM, had a significantly (p <0.05) higher number of CFUs on agar plates after 8 d, approximately 14-fold higher than that on control plates without rRpf. 16S rRNA gene sequences revealed that all the colonies on plates were mainly related to Brevibacterium antiquum strain VKM Ac-2118 (AY243344), with 98–99% sequence identity. This species is also a member of the phylum Actinobacteria and was originally isolated from Siberian permafrost sediments. The results of the present study demonstrated that rRpf not only promoted the growth of T. biformata from which it was isolated, but also enhanced colony formation by another Actinobacteria in an environmental sample. PMID:25843055

  13. Direct repeat sequences in the Streptomyces chitinase-63 promoter direct both glucose repression and chitin induction

    PubMed Central

    Ni, Xiangyang; Westpheling, Janet

    1997-01-01

    The chi63 promoter directs glucose-sensitive, chitin-dependent transcription of a gene involved in the utilization of chitin as carbon source. Analysis of 5′ and 3′ deletions of the promoter region revealed that a 350-bp segment is sufficient for wild-type levels of expression and regulation. The analysis of single base changes throughout the promoter region, introduced by random and site-directed mutagenesis, identified several sequences to be important for activity and regulation. Single base changes at −10, −12, −32, −33, −35, and −37 upstream of the transcription start site resulted in loss of activity from the promoter, suggesting that bases in these positions are important for RNA polymerase interaction. The sequences centered around −10 (TATTCT) and −35 (TTGACC) in this promoter are, in fact, prototypical of eubacterial promoters. Overlapping the RNA polymerase binding site is a perfect 12-bp direct repeat sequence. Some base changes within this direct repeat resulted in constitutive expression, suggesting that this sequence is an operator for negative regulation. Other base changes resulted in loss of glucose repression while retaining the requirement for chitin induction, suggesting that this sequence is also involved in glucose repression. The fact that cis-acting mutations resulted in glucose resistance but not inducer independence rules out the possibility that glucose repression acts exclusively by inducer exclusion. The fact that mutations that affect glucose repression and chitin induction fall within the same direct repeat sequence module suggests that the direct repeat sequence facilitates both chitin induction and glucose repression. PMID:9371809

  14. Characterization of the Structural Gene Promoter of Aedes aegypti Densovirus

    PubMed Central

    Ward, Todd W.; Kimmick, Michael W.; Afanasiev, Boris N.; Carlson, Jonathan O.

    2001-01-01

    Aedes aegypti densonucleosis virus (AeDNV) has two promoters that have been shown to be active by reporter gene expression analysis (B. N. Afanasiev, Y. V. Koslov, J. O. Carlson, and B. J. Beaty, Exp. Parasitol. 79:322–339, 1994). Northern blot analysis of cells infected with AeDNV revealed two transcripts 1,200 and 3,500 nucleotides in length that are assumed to express the structural protein (VP) gene and nonstructural protein genes, respectively. Primer extension was used to map the transcriptional start site of the structural protein gene. Surprisingly, the structural protein gene transcript began at an initiator consensus sequence, CAGT, 60 nucleotides upstream from the map unit 61 TATAA sequence previously thought to define the promoter. Constructs with the β-galactosidase gene fused to the structural protein gene were used to determine elements necessary for promoter function. Deletion or mutation of the initiator sequence, CAGT, reduced protein expression by 93%, whereas mutation of the TATAA sequence at map unit 61 had little effect. An additional open reading frame was observed upstream of the structural protein gene that can express β-galactosidase at a low level (20% of that of VP fusions). Expression of the AeDNV structural protein gene was shown to be stimulated by the major nonstructural protein NS1 (Afanasiev et al., Exp. parasitol., 1994). To determine the sequences required for transactivation, expression of structural protein gene–β-galactosidase gene fusion constructs differing in AeDNV genome content was measured with and without NS1. The presence of NS1 led to an 8- to 10-fold increase in expression when either genomic end was present, compared to a 2-fold increase with a construct lacking the genomic ends. An even higher (37-fold) increase in expression occurred with both genomic ends present; however, this was in part due to template replication as shown by Southern blot analysis. These data indicate the location and importance of

  15. Terminal sequence importance of de novo proteins from binary-patterned library: stable artificial proteins with 11- or 12-amino acid alphabet.

    PubMed

    Okura, Hiromichi; Takahashi, Tsuyoshi; Mihara, Hisakazu

    2012-06-01

    Successful approaches to de novo protein design suggest a great potential to create novel structural folds and to understand natural rules of protein folding. For these purposes, smaller and simpler de novo proteins have been developed. Here, we constructed smaller proteins by removing the terminal sequences from stable de novo vTAJ proteins and compared stabilities between mutant and original proteins. vTAJ proteins were screened from an α3β3 binary-patterned library which was designed with polar/nonpolar periodicities of α-helix and β-sheet. vTAJ proteins have additional terminal sequences due to the method of constructing the genetically repeated library sequences. By removing parts of these sequences, we successfully obtained stable smaller de novo protein mutants with fewer amino acid alphabets than the originals. However, these mutants showed differences in ANS binding properties and in stabilities against denaturant and pH change. The terminal sequences, which were designed just as flexible linkers not as secondary structure units, sufficiently affected these physicochemical details. This study showed implications for adjusting protein stabilities by designing N- and C-terminal sequences.

  16. Double promoter expression systems for recombinant protein production by industrial microorganisms.

    PubMed

    Öztürk, Sibel; Ergün, Burcu Gündüz; Çalık, Pınar

    2017-10-01

    Using double promoter expression systems is a promising approach to increase heterologous protein production. In this review, current double promoter expression systems for the production of recombinant proteins (r-proteins) by industrially important bacteria, Bacillus subtilis and Escherichia coli; and yeasts, Saccharomyces cerevisiae and Pichia pastoris, are discussed by assessing their potentials and drawbacks. Double promoter expression systems need to be designed to maintain a higher specific product formation rate within the production domain. While bacterial double promoter systems have been constructed as chimeric tandem promoters, yeast dual promoter systems have been developed as separate expression cassettes. To increase production and productivity, the optimal transcriptional activity should be justified either by simultaneously satisfying the requirements of both promoters, or by consecutively stimulating the changeover from one to another in a biphasic process or via successive-iterations. Thus, considering the dynamics of a fermentation process, double promoters can be classified according to their operational mechanisms, as: i) consecutively operating double promoter systems, and ii) simultaneously operating double promoter systems. Among these metabolic design strategies, extending the expression period with two promoters activated under different conditions, or enhancing the transcriptional activity with two promoters activated under similar conditions within the production domain, can be applied independently from the host. Novel studies with new insights, which aim a rational systematic design and construction of dual promoter expression vectors with tailored transcriptional activity, will empower r-protein production with enhanced production and productivity. Finally, the current state-of-the-art review emphasizes the advantages of double promoter systems along with the necessity for discovering new promoters for the development of more

  17. PredPPCrys: Accurate Prediction of Sequence Cloning, Protein Production, Purification and Crystallization Propensity from Protein Sequences Using Multi-Step Heterogeneous Feature Fusion and Selection

    PubMed Central

    Wang, Huilin; Wang, Mingjun; Tan, Hao; Li, Yuan; Zhang, Ziding; Song, Jiangning

    2014-01-01

    X-ray crystallography is the primary approach to solve the three-dimensional structure of a protein. However, a major bottleneck of this method is the failure of multi-step experimental procedures to yield diffraction-quality crystals, including sequence cloning, protein material production, purification, crystallization and ultimately, structural determination. Accordingly, prediction of the propensity of a protein to successfully undergo these experimental procedures based on the protein sequence may help narrow down laborious experimental efforts and facilitate target selection. A number of bioinformatics methods based on protein sequence information have been developed for this purpose. However, our knowledge on the important determinants of propensity for a protein sequence to produce high diffraction-quality crystals remains largely incomplete. In practice, most of the existing methods display poorer performance when evaluated on larger and updated datasets. To address this problem, we constructed an up-to-date dataset as the benchmark, and subsequently developed a new approach termed ‘PredPPCrys’ using the support vector machine (SVM). Using a comprehensive set of multifaceted sequence-derived features in combination with a novel multi-step feature selection strategy, we identified and characterized the relative importance and contribution of each feature type to the prediction performance of five individual experimental steps required for successful crystallization. The resulting optimal candidate features were used as inputs to build the first-level SVM predictor (PredPPCrys I). Next, prediction outputs of PredPPCrys I were used as the input to build second-level SVM classifiers (PredPPCrys II), which led to significantly enhanced prediction performance. Benchmarking experiments indicated that our PredPPCrys method outperforms most existing procedures on both up-to-date and previous datasets. In addition, the predicted crystallization targets of

  18. PredPPCrys: accurate prediction of sequence cloning, protein production, purification and crystallization propensity from protein sequences using multi-step heterogeneous feature fusion and selection.

    PubMed

    Wang, Huilin; Wang, Mingjun; Tan, Hao; Li, Yuan; Zhang, Ziding; Song, Jiangning

    2014-01-01

    X-ray crystallography is the primary approach to solve the three-dimensional structure of a protein. However, a major bottleneck of this method is the failure of multi-step experimental procedures to yield diffraction-quality crystals, including sequence cloning, protein material production, purification, crystallization and ultimately, structural determination. Accordingly, prediction of the propensity of a protein to successfully undergo these experimental procedures based on the protein sequence may help narrow down laborious experimental efforts and facilitate target selection. A number of bioinformatics methods based on protein sequence information have been developed for this purpose. However, our knowledge on the important determinants of propensity for a protein sequence to produce high diffraction-quality crystals remains largely incomplete. In practice, most of the existing methods display poorer performance when evaluated on larger and updated datasets. To address this problem, we constructed an up-to-date dataset as the benchmark, and subsequently developed a new approach termed 'PredPPCrys' using the support vector machine (SVM). Using a comprehensive set of multifaceted sequence-derived features in combination with a novel multi-step feature selection strategy, we identified and characterized the relative importance and contribution of each feature type to the prediction performance of five individual experimental steps required for successful crystallization. The resulting optimal candidate features were used as inputs to build the first-level SVM predictor (PredPPCrys I). Next, prediction outputs of PredPPCrys I were used as the input to build second-level SVM classifiers (PredPPCrys II), which led to significantly enhanced prediction performance. Benchmarking experiments indicated that our PredPPCrys method outperforms most existing procedures on both up-to-date and previous datasets. In addition, the predicted crystallization targets of currently

  19. Microwave-assisted acid and base hydrolysis of intact proteins containing disulfide bonds for protein sequence analysis by mass spectrometry.

    PubMed

    Reiz, Bela; Li, Liang

    2010-09-01

    Controlled hydrolysis of proteins to generate peptide ladders combined with mass spectrometric analysis of the resultant peptides can be used for protein sequencing. In this paper, two methods of improving the microwave-assisted protein hydrolysis process are described to enable rapid sequencing of proteins containing disulfide bonds and increase sequence coverage, respectively. It was demonstrated that proteins containing disulfide bonds could be sequenced by MS analysis by first performing hydrolysis for less than 2 min, followed by 1 h of reduction to release the peptides originally linked by disulfide bonds. It was shown that a strong base could be used as a catalyst for microwave-assisted protein hydrolysis, producing complementary sequence information to that generated by microwave-assisted acid hydrolysis. However, using either acid or base hydrolysis, amide bond breakages in small regions of the polypeptide chains of the model proteins (e.g., cytochrome c and lysozyme) were not detected. Dynamic light scattering measurement of the proteins solubilized in an acid or base indicated that protein-protein interaction or aggregation was not the cause of the failure to hydrolyze certain amide bonds. It was speculated that there were some unknown local structures that might play a role in preventing an acid or base from reacting with the peptide bonds therein. 2010 American Society for Mass Spectrometry. Published by Elsevier Inc. All rights reserved.

  20. Transactivation of a cellular promoter by the NS1 protein of the parvovirus minute virus of mice through a putative hormone-responsive element.

    PubMed Central

    Vanacker, J M; Corbau, R; Adelmant, G; Perros, M; Laudet, V; Rommelaere, J

    1996-01-01

    The promoter of the thyroid hormone receptor alpha gene (c-erbA-1) is activated by the nonstructural protein 1 (NS1) of parvovirus minute virus of mice (prototype strain [MVMp]) in ras-transformed FREJ4 cells that are permissive for lytic MVMp replication. This stimulation may be related to the sensitivity of host cells to MVMp, as it does not take place in parental FR3T3 cells, which are resistant to the parvovirus killing effect. The analysis of a series of deletion and point mutants of the c-erbA-1 promoter led to the identification of an upstream region that is necessary for NS1-driven transactivation. This sequence harbors a putative hormone-responsive element and is sufficient to render a minimal promoter NS1 inducible in FREJ4 but not in FR3T3 cells, and it is involved in distinct interactions with proteins from the respective cell lines. The NS1-responsive element of the c-erbA-1 promoter bears no homology with sequences that were previously reported to be necessary for NS1 DNA binding and transactivation. Altogether, our data point to a novel, cell-specific mechanism of promoter activation by NS1. PMID:8642664

  1. Sequence-based prediction of protein-binding sites in DNA: comparative study of two SVM models.

    PubMed

    Park, Byungkyu; Im, Jinyong; Tuvshinjargal, Narankhuu; Lee, Wook; Han, Kyungsook

    2014-11-01

    As many structures of protein-DNA complexes have become known in recent years, several computational methods have been developed to predict DNA-binding sites in proteins. However, the inverse problem (i.e., predicting protein-binding sites in DNA) has received much less attention. One of the reasons is that the differences between the interaction propensities of nucleotides are much smaller than those between amino acids. Another reason is that DNA exhibits less diverse sequence patterns than protein. Therefore, predicting protein-binding DNA nucleotides is much harder than predicting DNA-binding amino acids. We computed the interaction propensity (IP) of nucleotide triplets with amino acids using an extensive dataset of protein-DNA complexes, and developed two support vector machine (SVM) models that predict protein-binding nucleotides from sequence data alone. One SVM model predicts protein-binding nucleotides using DNA sequence data alone, and the other SVM model predicts protein-binding nucleotides using both DNA and protein sequences. In a 10-fold cross-validation with 1519 DNA sequences, the SVM model that uses DNA sequence data only predicted protein-binding nucleotides with an accuracy of 67.0%, an F-measure of 67.1%, and a Matthews correlation coefficient (MCC) of 0.340. With an independent dataset of 181 DNAs that were not used in training, it achieved an accuracy of 66.2%, an F-measure of 66.3% and an MCC of 0.324. Another SVM model that uses both DNA and protein sequences achieved an accuracy of 69.6%, an F-measure of 69.6%, and an MCC of 0.383 in a 10-fold cross-validation with 1519 DNA sequences and 859 protein sequences. With an independent dataset of 181 DNAs and 143 proteins, it showed an accuracy of 67.3%, an F-measure of 66.5% and an MCC of 0.329. Both in cross-validation and independent testing, the second SVM model that used both DNA and protein sequence data showed better performance than the first model that used DNA sequence data. To the best of
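
    For readers comparing the figures in this record, the three reported measures (accuracy, F-measure, and MCC) follow from the counts of a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name and the example counts are hypothetical and are not taken from the study.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, F-measure, MCC) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, f_measure, mcc

# Hypothetical counts for binding (positive) vs non-binding nucleotides.
accuracy, f_measure, mcc = binary_metrics(tp=40, fp=10, tn=35, fn=15)
```

    Unlike accuracy, the MCC ranges from -1 to 1 with 0 corresponding to chance-level prediction, so an MCC of about 0.33 alongside an accuracy of about 67% indicates a modest but real signal.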

  2. Protein sequences from mastodon and Tyrannosaurus rex revealed by mass spectrometry.

    PubMed

    Asara, John M; Schweitzer, Mary H; Freimark, Lisa M; Phillips, Matthew; Cantley, Lewis C

    2007-04-13

    Fossilized bones from extinct taxa harbor the potential for obtaining protein or DNA sequences that could reveal evolutionary links to extant species. We used mass spectrometry to obtain protein sequences from bones of a 160,000- to 600,000-year-old extinct mastodon (Mammut americanum) and a 68-million-year-old dinosaur (Tyrannosaurus rex). The presence of T. rex sequences indicates that their peptide bonds were remarkably stable. Mass spectrometry can thus be used to determine unique sequences from ancient organisms from peptide fragmentation patterns, a valuable tool to study the evolution and adaptation of ancient taxa from which genomic sequences are unlikely to be obtained.

  3. Simian virus 40 major late promoter: an upstream DNA sequence required for efficient in vitro transcription.

    PubMed Central

    Brady, J; Radonovich, M; Thoren, M; Das, G; Salzman, N P

    1984-01-01

    We have previously identified an 11-base DNA sequence, 5'-G-G-T-A-C-C-T-A-A-C-C-3' (simian virus 40 [SV40] map position 294 to 304), which is important in the control of SV40 late RNA expression in vitro and in vivo (Brady et al., Cell 31:625-633, 1982). We report here the identification of another domain of the SV40 late promoter. A series of mutants with deletions extending from SV40 map position 0 to 300 was prepared by nuclease BAL 31 treatment. The cloned templates were then analyzed for efficiency and accuracy of late SV40 RNA expression in the Manley in vitro transcription system. Our studies showed that, in addition to the promoter domain near map position 300, there are essential DNA sequences between nucleotide positions 74 and 95 that are required for efficient expression of late SV40 RNA. Included in this SV40 DNA sequence were two of the six GGGCGG SV40 repeat sequences and an 11-nucleotide segment which showed strong homology with the upstream sequences required for the efficient in vitro and in vivo expression of the histone H2A gene. This upstream promoter sequence supported transcription with the same efficiency even when it was moved 72 nucleotides closer to the major late cap site. In vitro promoter competition analysis demonstrated that the upstream promoter sequence, independent of the 294 to 304 promoter element, is capable of binding polymerase-transcription factors required for SV40 late gene transcription. Finally, we show that DNA sequences which control the specificity of RNA initiation at nucleotide 325 lie downstream of map position 294. PMID:6321950

  4. TALE factors poise promoters for activation by Hox proteins.

    PubMed

    Choe, Seong-Kyu; Ladam, Franck; Sagerström, Charles G

    2014-01-27

    Hox proteins form complexes with TALE cofactors from the Pbx and Prep/Meis families to control transcription, but it remains unclear how Hox:TALE complexes function. Examining a Hoxb1b:TALE complex that regulates zebrafish hoxb1a transcription, we find maternally deposited TALE proteins at the hoxb1a promoter already during blastula stages. These TALE factors recruit histone-modifying enzymes to promote an active chromatin profile at the hoxb1a promoter and also recruit RNA polymerase II (RNAPII) and P-TEFb. However, in the presence of TALE factors, RNAPII remains phosphorylated on serine 5 and hoxb1a transcription is inefficient. By gastrula stages, Hoxb1b binds together with TALE factors to the hoxb1a promoter. This triggers P-TEFb-mediated transitioning of RNAPII to the serine 2-phosphorylated form and efficient hoxb1a transcription. We conclude that TALE factors access promoters during early embryogenesis to poise them for activation but that Hox proteins are required to trigger efficient transcription. Copyright © 2014 Elsevier Inc. All rights reserved.

  5. Toscana virus NSs protein promotes degradation of double-stranded RNA-dependent protein kinase.

    PubMed

    Kalveram, Birte; Ikegami, Tetsuro

    2013-04-01

    Toscana virus (TOSV), which is transmitted by Phlebotomus spp. sandflies, is a major etiologic agent of aseptic meningitis and encephalitis in the Mediterranean. Like other members of the genus Phlebovirus of the family Bunyaviridae, TOSV encodes a nonstructural protein (NSs) in its small RNA segment. Although the NSs of Rift Valley fever virus (RVFV) has been identified as an important virulence factor, which suppresses host general transcription, inhibits transcription from the beta interferon promoter, and promotes the proteasomal degradation of double-stranded RNA-dependent protein kinase (PKR), little is known about the functions of NSs proteins encoded by less-pathogenic members of this genus. In this study we report that TOSV is able to downregulate PKR with similar efficiency as RVFV, while infection with the other phleboviruses (i.e., Punta Toro virus, sandfly fever Sicilian virus, or Frijoles virus) has no effect on cellular PKR levels. In contrast to RVFV, however, cellular transcription remains unaffected during TOSV infection. TOSV NSs protein promotes the proteasome-dependent downregulation of PKR and is able to interact with kinase-inactive PKR in infected cells.

  6. Toscana Virus NSs Protein Promotes Degradation of Double-Stranded RNA-Dependent Protein Kinase

    PubMed Central

    Kalveram, Birte

    2013-01-01

    Toscana virus (TOSV), which is transmitted by Phlebotomus spp. sandflies, is a major etiologic agent of aseptic meningitis and encephalitis in the Mediterranean. Like other members of the genus Phlebovirus of the family Bunyaviridae, TOSV encodes a nonstructural protein (NSs) in its small RNA segment. Although the NSs of Rift Valley fever virus (RVFV) has been identified as an important virulence factor, which suppresses host general transcription, inhibits transcription from the beta interferon promoter, and promotes the proteasomal degradation of double-stranded RNA-dependent protein kinase (PKR), little is known about the functions of NSs proteins encoded by less-pathogenic members of this genus. In this study we report that TOSV is able to downregulate PKR with similar efficiency as RVFV, while infection with the other phleboviruses—i.e., Punta Toro virus, sandfly fever Sicilian virus, or Frijoles virus—has no effect on cellular PKR levels. In contrast to RVFV, however, cellular transcription remains unaffected during TOSV infection. TOSV NSs protein promotes the proteasome-dependent downregulation of PKR and is able to interact with kinase-inactive PKR in infected cells. PMID:23325696

  7. Prediction of Spontaneous Protein Deamidation from Sequence-Derived Secondary Structure and Intrinsic Disorder.

    PubMed

    Lorenzo, J Ramiro; Alonso, Leonardo G; Sánchez, Ignacio E

    2015-01-01

    Asparagine residues in proteins undergo spontaneous deamidation, a post-translational modification that may act as a molecular clock for the regulation of protein function and turnover. Asparagine deamidation is modulated by protein local sequence, secondary structure and hydrogen bonding. We present NGOME, an algorithm able to predict non-enzymatic deamidation of internal asparagine residues in proteins in the absence of structural data, using sequence-based predictions of secondary structure and intrinsic disorder. Compared to previous algorithms, NGOME does not require three-dimensional structures yet yields better predictions than available sequence-only methods. Four case studies of specific proteins show how NGOME may help the user identify deamidation-prone asparagine residues, often related to protein gain of function, protein degradation or protein misfolding in pathological processes. A fifth case study applies NGOME at a proteomic scale and unveils a correlation between asparagine deamidation and protein degradation in yeast. NGOME is freely available as a webserver at the National EMBnet node Argentina, URL: http://www.embnet.qb.fcen.uba.ar/ in the subpage "Protein and nucleic acid structure and sequence analysis".

  8. Drosophila Suppressor of Sable Protein [Su(s)] Promotes Degradation of Aberrant and Transposon-Derived RNAs

    PubMed Central

    Kuan, Yung-Shu; Brewer-Jensen, Paul; Bai, Wen-Li; Hunter, Cedric; Wilson, Carrie B.; Bass, Sarah; Abernethy, John; Wing, James S.; Searles, Lillie L.

    2009-01-01

    RNA-binding proteins act at various stages of gene expression to regulate and fine-tune patterns of mRNA accumulation. One protein in this class is Drosophila Su(s), a nuclear protein that has been previously shown to inhibit the accumulation of mutant transcripts by an unknown mechanism. Here, we have identified several additional RNAs that are downregulated by Su(s). These Su(s) targets include cryptic wild-type transcripts from the developmentally regulated Sgs4 and ng1 genes, noncoding RNAs derived from tandemly repeated αβ/αγ elements within an Hsp70 locus, and aberrant transcripts induced by Hsp70 promoter transgenes inserted at ectopic sites. We used the αβ RNAs to investigate the mechanism of Su(s) function and obtained evidence that these transcripts are degraded by the nuclear exosome and that Su(s) promotes this process. Furthermore, we showed that the RNA binding domains of Su(s) are important for this effect and mapped the sequences involved to a 267-nucleotide region of an αβ element. Taken together, these results suggest that Su(s) binds to certain nascent transcripts and stimulates their degradation by the nuclear exosome. PMID:19687295

  9. DoOP: Databases of Orthologous Promoters, collections of clusters of orthologous upstream sequences from chordates and plants

    PubMed Central

    Barta, Endre; Sebestyén, Endre; Pálfy, Tamás B.; Tóth, Gábor; Ortutay, Csaba P.; Patthy, László

    2005-01-01

    DoOP (http://doop.abc.hu/) is a database of eukaryotic promoter sequences (upstream regions) aiming to facilitate the recognition of regulatory sites conserved between species. The annotated first exons of human and Arabidopsis thaliana genes were used as queries in BLAST searches to collect the most closely related orthologous first exon sequences from Chordata and Viridiplantae species. Up to 3000 bp DNA segments upstream from these first exons constitute the clusters in the chordate and plant sections of the Database of Orthologous Promoters. Release 1.0 of DoOP contains 21 061 chordate clusters from 284 different species and 7548 plant clusters from 269 different species. The database can be used to find and retrieve promoter sequences of a given gene from various species and it is also suitable to see the most trivial conserved sequence blocks in the orthologous upstream regions. Users can search DoOP with either sequence or text (annotation) to find promoter clusters of various genes. In addition to the sequence data, the positions of the conserved sequence blocks derived from multiple alignments, the positions of repetitive elements and the positions of transcription start sites known from the Eukaryotic Promoter Database (EPD) can be viewed graphically. PMID:15608291
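    The "up to 3000 bp upstream of the first exon" rule described above can be illustrated with a minimal sketch. The function name, the 0-based half-open coordinate convention, and the strand handling are assumptions made for this example; they are not taken from the DoOP pipeline itself.

```python
# Hypothetical helper: cut the upstream (promoter) segment of a first exon.
# Coordinates are assumed to be 0-based, half-open [exon_start, exon_end).

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_segment(chrom, exon_start, exon_end, strand="+", max_len=3000):
    """Return up to max_len bases upstream of the first exon.

    The segment is shorter than max_len when the exon lies close to
    the end of the sequence, which is why the limit is "up to" 3000 bp.
    """
    if strand == "+":
        begin = max(0, exon_start - max_len)
        return chrom[begin:exon_start]
    if strand == "-":
        # On the minus strand, "upstream" lies after the exon on the
        # plus-strand sequence and must be reverse-complemented.
        stop = min(len(chrom), exon_end + max_len)
        return reverse_complement(chrom[exon_end:stop])
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    toy = "AAACCCGGGTTT"  # toy sequence; the exon is the GGG at [6, 9)
    print(upstream_segment(toy, 6, 9, "+", max_len=4))
    print(upstream_segment(toy, 6, 9, "-", max_len=2))
```

    Clamping at the sequence boundaries rather than raising an error is a design choice for this sketch; a real pipeline might instead flag truncated upstream regions.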

  10. DoOP: Databases of Orthologous Promoters, collections of clusters of orthologous upstream sequences from chordates and plants.

    PubMed

    Barta, Endre; Sebestyén, Endre; Pálfy, Tamás B; Tóth, Gábor; Ortutay, Csaba P; Patthy, László

    2005-01-01

    DoOP (http://doop.abc.hu/) is a database of eukaryotic promoter sequences (upstream regions) aiming to facilitate the recognition of regulatory sites conserved between species. The annotated first exons of human and Arabidopsis thaliana genes were used as queries in BLAST searches to collect the most closely related orthologous first exon sequences from Chordata and Viridiplantae species. Up to 3000 bp DNA segments upstream from these first exons constitute the clusters in the chordate and plant sections of the Database of Orthologous Promoters. Release 1.0 of DoOP contains 21,061 chordate clusters from 284 different species and 7548 plant clusters from 269 different species. The database can be used to find and retrieve promoter sequences of a given gene from various species and it is also suitable to see the most trivial conserved sequence blocks in the orthologous upstream regions. Users can search DoOP with either sequence or text (annotation) to find promoter clusters of various genes. In addition to the sequence data, the positions of the conserved sequence blocks derived from multiple alignments, the positions of repetitive elements and the positions of transcription start sites known from the Eukaryotic Promoter Database (EPD) can be viewed graphically.

  11. Protein sequences bound to mineral surfaces persist into deep time

    PubMed Central

    Demarchi, Beatrice; Hall, Shaun; Roncal-Herrero, Teresa; Freeman, Colin L; Woolley, Jos; Crisp, Molly K; Wilson, Julie; Fotakis, Anna; Fischer, Roman; Kessler, Benedikt M; Rakownikow Jersie-Christensen, Rosa; Olsen, Jesper V; Haile, James; Thomas, Jessica; Marean, Curtis W; Parkington, John; Presslee, Samantha; Lee-Thorp, Julia; Ditchfield, Peter; Hamilton, Jacqueline F; Ward, Martyn W; Wang, Chunting Michelle; Shaw, Marvin D; Harrison, Terry; Domínguez-Rodrigo, Manuel; MacPhee, Ross DE; Kwekason, Amandus; Ecker, Michaela; Kolska Horwitz, Liora; Chazan, Michael; Kröger, Roland; Thomas-Oates, Jane; Harding, John H; Cappellini, Enrico; Penkman, Kirsty; Collins, Matthew J

    2016-01-01

    Proteins persist longer in the fossil record than DNA, but the longevity, survival mechanisms and substrates remain contested. Here, we demonstrate the role of mineral binding in preserving the protein sequence in ostrich (Struthionidae) eggshell, including from the palaeontological sites of Laetoli (3.8 Ma) and Olduvai Gorge (1.3 Ma) in Tanzania. By tracking protein diagenesis back in time we find consistent patterns of preservation, demonstrating authenticity of the surviving sequences. Molecular dynamics simulations of struthiocalcin-1 and -2, the dominant proteins within the eggshell, reveal that distinct domains bind to the mineral surface. It is the domain with the strongest calculated binding energy to the calcite surface that is selectively preserved. Thermal age calculations demonstrate that the Laetoli and Olduvai peptides are 50 times older than any previously authenticated sequence (equivalent to ~16 Ma at a constant 10°C). DOI: http://dx.doi.org/10.7554/eLife.17092.001 PMID:27668515

  12. Protein Science by DNA Sequencing: How Advances in Molecular Biology Are Accelerating Biochemistry.

    PubMed

    Higgins, Sean A; Savage, David F

    2018-01-09

    A fundamental goal of protein biochemistry is to determine the sequence-function relationship, but the vastness of sequence space makes comprehensive evaluation of this landscape difficult. However, advances in DNA synthesis and sequencing now allow researchers to assess the functional impact of every single mutation in many proteins, but challenges remain in library construction and the development of general assays applicable to a diverse range of protein functions. This Perspective briefly outlines the technical innovations in DNA manipulation that allow massively parallel protein biochemistry and then summarizes the methods currently available for library construction and the functional assays of protein variants. Areas in need of future innovation are highlighted with a particular focus on assay development and the use of computational analysis with machine learning to effectively traverse the sequence-function landscape. Finally, applications in the fundamentals of protein biochemistry, disease prediction, and protein engineering are presented.

  13. Epigenetic repression of regulator of G-protein signaling 2 promotes androgen-independent prostate cancer cell growth.

    PubMed

    Wolff, Dennis W; Xie, Yan; Deng, Caishu; Gatalica, Zoran; Yang, Mingjie; Wang, Bo; Wang, Jincheng; Lin, Ming-Fong; Abel, Peter W; Tu, Yaping

    2012-04-01

    G-protein-coupled receptor (GPCR)-stimulated androgen-independent activation of androgen receptor (AR) contributes to acquisition of a hormone-refractory phenotype by prostate cancer. We previously reported that regulator of G-protein signaling (RGS) 2, an inhibitor of GPCRs, inhibits androgen-independent AR activation (Cao et al., Oncogene 2006;25:3719-34). Here, we show reduced RGS2 protein expression in human prostate cancer specimens compared to adjacent normal or hyperplastic tissue. Methylation-specific PCR analysis and bisulfite sequencing indicated that methylation of the CpG island in the RGS2 gene promoter correlated with RGS2 downregulation in prostate cancer. In vitro methylation of this promoter suppressed reporter gene expression in transient transfection studies, whereas reversal of this promoter methylation with 5-aza-2'-deoxycytidine (5-Aza-dC) induced RGS2 reexpression in androgen-independent prostate cancer cells and inhibited their growth under androgen-deficient conditions. Interestingly, the inhibitory effect of 5-Aza-dC was significantly reduced by an RGS2-targeted short hairpin RNA, indicating that reexpressed RGS2 contributed to this growth inhibition. Restoration of RGS2 levels by ectopic expression in androgen-independent prostate cancer cells suppressed growth of xenografts in castrated mice. Thus, RGS2 promoter hypermethylation represses its expression and unmasks a latent pathway for AR transactivation in prostate cancer cells. Targeting this reversible process may provide a new strategy for suppressing prostate cancer progression by reestablishing its androgen sensitivity. Copyright © 2011 UICC.

  14. Sequence- and Interactome-Based Prediction of Viral Protein Hotspots Targeting Host Proteins: A Case Study for HIV Nef

    PubMed Central

    Sarmady, Mahdi; Dampier, William; Tozeren, Aydin

    2011-01-01

    Virus proteins alter protein pathways of the host toward the synthesis of viral particles by breaking and making edges via binding to host proteins. In this study, we developed a computational approach to predict viral sequence hotspots for binding to host proteins based on sequences of viral and host proteins and literature-curated virus-host protein interactome data. We use a motif discovery algorithm repeatedly on collections of sequences of viral proteins and immediate binding partners of their host targets and choose only those motifs that are conserved on viral sequences and highly statistically enriched among binding partners of virus protein targeted host proteins. Our results match experimental data on binding sites of Nef to host proteins such as MAPK1, VAV1, LCK, HCK, HLA-A, CD4, FYN, and GNB2L1 with high statistical significance, but the approach is a poor predictor of Nef binding sites on highly flexible, hoop-like regions. Predicted hotspots recapture CD8 cell epitopes of HIV Nef, highlighting their importance in modulating virus-host interactions. Host proteins potentially targeted or outcompeted by Nef appear to crowd the T cell receptor, natural killer cell mediated cytotoxicity, and neurotrophin signaling pathways. Scanning of HIV Nef motifs on multiple alignments of hepatitis C protein NS5A produces results consistent with literature, indicating the potential value of the hotspot discovery in advancing our understanding of virus-host crosstalk. PMID:21738584

  15. Transcriptional activation of the Escherichia coli adaptive response gene aidB is mediated by binding of methylated Ada protein. Evidence for a new consensus sequence for Ada-binding sites.

    PubMed

    Landini, P; Volkert, M R

    1995-04-07

    The Escherichia coli aidB gene is part of the adaptive response to DNA methylation damage. Genes belonging to the adaptive response are positively regulated by the ada gene; the Ada protein acts as a transcriptional activator when methylated in one of its cysteine residues at position 69. Through DNaseI protection assays, we show that methylated Ada (meAda) is able to bind a DNA sequence between 40 and 60 base pairs upstream of the aidB transcriptional startpoint. Binding of meAda is necessary to activate transcription of the adaptive response genes; accordingly, in vitro transcription of aidB is dependent on the presence of meAda. Unmethylated Ada protein shows no protection against DNaseI digestion in the aidB promoter region nor does it promote aidB in vitro transcription. The aidB Ada-binding site shows only weak homology to the proposed consensus sequences for Ada-binding sites in E. coli (AAANNAA and AAAGCGCA) but shares a higher degree of similarity with the Ada-binding regions from other bacterial species, such as Salmonella typhimurium and Bacillus subtilis. Based on the comparison of five different Ada-dependent promoter regions, we suggest that a possible recognition sequence for meAda might be AATnnnnnnG-CAA. Higher concentrations of Ada are required for the binding of aidB than for the ada promoter, suggesting lower affinity of the protein for the aidB Ada-binding site. Common features in the Ada-binding regions of ada and aidB are a high A/T content, the presence of an inverted repeat structure, and their position relative to the transcriptional start site. We propose that these elements, in addition to the proposed recognition sequence, are important for binding of the Ada protein.

  16. Regulatory sequence of cupin family gene

    DOEpatents

    Hood, Elizabeth; Teoh, Thomas

    2017-07-25

    This invention is in the field of plant biology and agriculture and relates to novel seed specific promoter regions. The present invention further provides methods of producing proteins and other products of interest and methods of controlling expression of nucleic acid sequences of interest using the seed specific promoter regions.

  17. Analysis of an osmotically regulated pathogenesis-related osmotin gene promoter.

    PubMed

    Raghothama, K G; Liu, D; Nelson, D E; Hasegawa, P M; Bressan, R A

    1993-12-01

    Osmotin is a small (24 kDa), basic, pathogenesis-related protein, that accumulates during adaptation of tobacco (Nicotiana tabacum) cells to osmotic stress. There are more than 10 inducers that activate the osmotin gene in various plant tissues. The osmotin promoter contains several sequences bearing a high degree of similarity to ABRE, as-1 and E-8 cis element sequences. Gel retardation studies indicated the presence of at least two regions in the osmotin promoter that show specific interactions with nuclear factors isolated from cultured cells or leaves. The abundance of these binding factors increased in response to salt, ABA and ethylene. Nuclear factors protected a 35 bp sequence of the promoter from DNase I digestion. Different 5' deletions of the osmotin promoter cloned into a promoter-less GUSNOS plasmid (pBI 201) were used in transient expression studies with a Biolistic gun. The transient expression studies revealed the presence of three distinct regions in the osmotin promoter. The promoter sequence from -108 to -248 bp is absolutely required for reporter gene activity, followed by a long stretch (up to -1052) of enhancer-like sequence and then a sequence upstream of -1052, which appears to contain negative elements. The responses to ABA, ethylene, salt, desiccation and wounding appear to be associated with the -248 bp sequence of the promoter. This region also contains a putative ABRE (CACTGTG) core element. Activation of the osmotin gene by various inducers is discussed in view of antifungal activity of the osmotin protein.

  18. Reference System of DNA and Protein Sequences on CD-ROM

    NASA Astrophysics Data System (ADS)

    Nasu, Hisanori; Ito, Toshiaki

    DNASIS-DBREF31 is a database of DNA and protein sequences distributed on optical Compact Disk (CD) ROM, developed and commercialized by Hitachi Software Engineering Co., Ltd. Both nucleic acid base sequences and protein amino acid sequences can be retrieved from a single CD-ROM. Existing databases are offered as on-line services, on floppy disks, or on magnetic tape, all of which have drawbacks in usability or storage capacity. DNASIS-DBREF31 adopts CD-ROM as the database medium to provide mass storage and personal use of the database.

  19. CapZyme-Seq Comprehensively Defines Promoter-Sequence Determinants for RNA 5' Capping with NAD.

    PubMed

    Vvedenskaya, Irina O; Bird, Jeremy G; Zhang, Yuanchao; Zhang, Yu; Jiao, Xinfu; Barvík, Ivan; Krásný, Libor; Kiledjian, Megerditch; Taylor, Deanne M; Ebright, Richard H; Nickels, Bryce E

    2018-05-03

    Nucleoside-containing metabolites such as NAD+ can be incorporated as 5' caps on RNA by serving as non-canonical initiating nucleotides (NCINs) for transcription initiation by RNA polymerase (RNAP). Here, we report CapZyme-seq, a high-throughput-sequencing method that employs NCIN-decapping enzymes NudC and Rai1 to detect and quantify NCIN-capped RNA. By combining CapZyme-seq with multiplexed transcriptomics, we determine efficiencies of NAD+ capping by Escherichia coli RNAP for ∼16,000 promoter sequences. The results define preferred transcription start site (TSS) positions for NAD+ capping and define a consensus promoter sequence for NAD+ capping: HRRASWW (TSS underlined). By applying CapZyme-seq to E. coli total cellular RNA, we establish that sequence determinants for NCIN capping in vivo match the NAD+-capping consensus defined in vitro, and we identify and quantify NCIN-capped small RNAs (sRNAs). Our findings define the promoter-sequence determinants for NCIN capping with NAD+ and provide a general method for analysis of NCIN capping in vitro and in vivo. Copyright © 2018 Elsevier Inc. All rights reserved.
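    A degenerate consensus such as HRRASWW is written in IUPAC nucleotide ambiguity codes (H = A, C or T; R = A or G; S = C or G; W = A or T). The sketch below shows one way to scan a DNA string for such a consensus. It is only an illustration: the function names and the toy sequence are assumptions, and it does not reproduce the CapZyme-seq analysis or mark the TSS position within the motif.

```python
import re

# Standard IUPAC nucleotide ambiguity codes as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(consensus):
    """Translate an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[base] for base in consensus.upper())


def find_consensus(sequence, consensus):
    """Return (start, matched_text) for every match, overlaps included."""
    # A lookahead with a capture group reports overlapping occurrences.
    pattern = re.compile("(?=(" + iupac_to_regex(consensus) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    toy = "GGTAGACTTCC"  # made-up sequence, not real promoter data
    print(find_consensus(toy, "HRRASWW"))
```

    Only the given strand is scanned here; searching the reverse complement as well would be a natural extension.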

  20. Protein Information Resource: a community resource for expert annotation of protein data

    PubMed Central

    Barker, Winona C.; Garavelli, John S.; Hou, Zhenglin; Huang, Hongzhan; Ledley, Robert S.; McGarvey, Peter B.; Mewes, Hans-Werner; Orcutt, Bruce C.; Pfeiffer, Friedhelm; Tsugita, Akira; Vinayaka, C. R.; Xiao, Chunlin; Yeh, Lai-Su L.; Wu, Cathy

    2001-01-01

    The Protein Information Resource, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the most comprehensive and expertly annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database. To provide timely and high quality annotation and promote database interoperability, the PIR-International employs rule-based and classification-driven procedures based on controlled vocabulary and standard nomenclature and includes status tags to distinguish experimentally determined from predicted protein features. The database contains about 200 000 non-redundant protein sequences, which are classified into families and superfamilies and their domains and motifs identified. Entries are extensively cross-referenced to other sequence, classification, genome, structure and activity databases. The PIR web site features search engines that use sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. The PIR-International databases and search tools are accessible on the PIR web site at http://pir.georgetown.edu/ and at the MIPS web site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP. PMID:11125041

  1. SIMAP--a comprehensive database of pre-calculated protein sequence similarities, domains, annotations and clusters.

    PubMed

    Rattei, Thomas; Tischler, Patrick; Götz, Stefan; Jehl, Marc-André; Hoser, Jonathan; Arnold, Roland; Conesa, Ana; Mewes, Hans-Werner

    2010-01-01

    The prediction of protein function as well as the reconstruction of evolutionary genesis employing sequence comparison at large is still the most powerful tool in sequence analysis. Due to the exponential growth of the number of known protein sequences and the subsequent quadratic growth of the similarity matrix, the computation of the Similarity Matrix of Proteins (SIMAP) becomes a computationally intensive task. The SIMAP database provides a comprehensive and up-to-date pre-calculation of the protein sequence similarity matrix, sequence-based features and sequence clusters. As of September 2009, SIMAP covers 48 million proteins and more than 23 million non-redundant sequences. Novel features of SIMAP include the expansion of the sequence space by including databases such as ENSEMBL as well as the integration of metagenomes based on their consistent processing and annotation. Furthermore, protein function predictions by Blast2GO are pre-calculated for all sequences in SIMAP and the data access and query functions have been improved. SIMAP assists biologists to query the up-to-date sequence space systematically and facilitates large-scale downstream projects in computational biology. Access to SIMAP is freely provided through the web portal for individuals (http://mips.gsf.de/simap/) and for programmatic access through DAS (http://webclu.bio.wzw.tum.de/das/) and Web-Service (http://mips.gsf.de/webservices/services/SimapService2.0?wsdl).
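    The "quadratic growth of the similarity matrix" can be made concrete with a small calculation. The sketch assumes a naive all-against-all comparison in which each unordered pair of distinct sequences is scored once; the abstract does not say that SIMAP computes the matrix this way, and the function name is hypothetical.

```python
def pairwise_comparisons(n_sequences):
    """Number of unordered pairs among n sequences: n * (n - 1) / 2."""
    return n_sequences * (n_sequences - 1) // 2


if __name__ == "__main__":
    # 23 million is the non-redundant sequence count quoted in the abstract;
    # the other values only show how the pair count scales.
    for n in (1_000_000, 23_000_000, 46_000_000):
        print(f"{n:>12,d} sequences -> {pairwise_comparisons(n):,d} pairs")
```

    Under this assumption, 23 million sequences give roughly 2.6 x 10^14 pairs, and doubling the number of sequences roughly quadruples the work.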

  2. Evidence that a sequence similar to TAR is important for induction of the JC virus late promoter by human immunodeficiency virus type 1 Tat.

    PubMed Central

    Chowdhury, M; Taylor, J P; Chang, C F; Rappaport, J; Khalili, K

    1992-01-01

    A specific RNA sequence located in the leader of all human immunodeficiency virus type 1 (HIV-1) mRNAs termed the transactivation response element, or TAR, is a primary target for induction of HIV-1 long terminal repeat activity by the HIV-1-derived trans-regulatory protein, Tat. Human neurotropic virus, JC virus (JCV), a causative agent of the degenerative demyelinating disease progressive multifocal leukoencephalopathy, contains sequences in the 5' end of the late RNA species with an extensive homology to HIV-1 TAR. In this study, we examined the possible role of the JCV-derived TAR-homologous sequence in Tat-mediated activation of the JCV late promoter (Tada et al., Proc. Natl. Acad. Sci. USA 87:3479-3483, 1990). Results from site-directed mutagenesis revealed that critical G residues required for the function of HIV-1 TAR that are conserved in the JCV TAR homolog play an important role in Tat activation of the JCV promoter. In addition, in vivo competition studies suggest that shared regulatory components mediate Tat activation of the JCV late and HIV-1 long terminal repeat promoters. Furthermore, we showed that the JCV-derived TAR sequence behaves in the same way as HIV-1 TAR in response to two distinct Tat mutants, one of which has no ability to bind to HIV-1 TAR and another that lacks transcriptional activity on a responsive promoter. These results suggest that the TAR homolog of the JCV late promoter is responsive to HIV-1 Tat induction and thus may participate in the overall activation of the JCV late promoter mediated by this transactivation. PMID:1331525

  3. How proteins bind to DNA: target discrimination and dynamic sequence search by the telomeric protein TRF1

    PubMed Central

    2017-01-01

    Target search as performed by DNA-binding proteins is a complex process, in which multiple factors contribute to both thermodynamic discrimination of the target sequence from overwhelmingly abundant off-target sites and kinetic acceleration of dynamic sequence interrogation. TRF1, the protein that binds to telomeric tandem repeats, faces an intriguing variant of the search problem where target sites are clustered within short fragments of chromosomal DNA. In this study, we use extensive (>0.5 ms in total) MD simulations to study the dynamical aspects of sequence-specific binding of TRF1 at both telomeric and non-cognate DNA. For the first time, we describe the spontaneous formation of a sequence-specific native protein–DNA complex in atomistic detail, and study the mechanism by which proteins avoid off-target binding while retaining high affinity for target sites. Our calculated free energy landscapes reproduce the thermodynamics of sequence-specific binding, while statistical approaches allow for a comprehensive description of intermediate stages of complex formation. PMID:28633355

  4. Cloning and Characterization of an Outer Membrane Protein of Vibrio vulnificus Required for Heme Utilization: Regulation of Expression and Determination of the Gene Sequence

    PubMed Central

    Litwin, Christine M.; Byrne, Burke L.

    1998-01-01

    Vibrio vulnificus is a halophilic, marine pathogen that has been associated with septicemia and serious wound infections in patients with iron overload and preexisting liver disease. For V. vulnificus, the ability to acquire iron from the host has been shown to correlate with virulence. V. vulnificus is able to use host iron sources such as hemoglobin and heme. We previously constructed a fur mutant of V. vulnificus which constitutively expresses at least two iron-regulated outer membrane proteins, of 72 and 77 kDa. The N-terminal amino acid sequence of the 77-kDa protein purified from the V. vulnificus fur mutant had 67% homology with the first 15 amino acids of the mature protein of the Vibrio cholerae heme receptor, HutA. In this report, we describe the cloning, DNA sequence, mutagenesis, and analysis of transcriptional regulation of the structural gene for HupA, the heme receptor of V. vulnificus. DNA sequencing of hupA demonstrated a single open reading frame of 712 amino acids that was 50% identical and 66% similar to the sequence of V. cholerae HutA and similar to those of other TonB-dependent outer membrane receptors. Primer extension analysis localized one promoter for the V. vulnificus hupA gene. Analysis of the promoter region of V. vulnificus hupA showed a sequence homologous to the consensus Fur box. Northern blot analysis showed that the transcript was strongly regulated by iron. An internal deletion in the V. vulnificus hupA gene, done by using marker exchange, resulted in the loss of expression of the 77-kDa protein and the loss of the ability to use hemin or hemoglobin as a source of iron. The hupA deletion mutant of V. vulnificus will be helpful in future studies of the role of heme iron in V. vulnificus pathogenesis. PMID:9632577

  5. De novo protein sequencing by combining top-down and bottom-up tandem mass spectra.

    PubMed

    Liu, Xiaowen; Dekker, Lennard J M; Wu, Si; Vanduijn, Martijn M; Luider, Theo M; Tolić, Nikola; Kou, Qiang; Dvorkin, Mikhail; Alexandrova, Sonya; Vyatkina, Kira; Paša-Tolić, Ljiljana; Pevzner, Pavel A

    2014-07-03

    There are two approaches for de novo protein sequencing: Edman degradation and mass spectrometry (MS). Existing MS-based methods characterize a novel protein by assembling tandem mass spectra of overlapping peptides generated from multiple proteolytic digestions of the protein. Because each tandem mass spectrum covers only a short peptide of the target protein, the key to high coverage protein sequencing is to find spectral pairs from overlapping peptides in order to assemble tandem mass spectra to long ones. However, overlapping regions of peptides may be too short to be confidently identified. High-resolution mass spectrometers have become accessible to many laboratories. These mass spectrometers are capable of analyzing molecules of large mass values, boosting the development of top-down MS. Top-down tandem mass spectra cover whole proteins. However, top-down tandem mass spectra, even combined, rarely provide full ion fragmentation coverage of a protein. We propose an algorithm, TBNovo, for de novo protein sequencing by combining top-down and bottom-up MS. In TBNovo, a top-down tandem mass spectrum is utilized as a scaffold, and bottom-up tandem mass spectra are aligned to the scaffold to increase sequence coverage. Experiments on data sets of two proteins showed that TBNovo achieved high sequence coverage and high sequence accuracy.

  6. Comparative analysis of ribosomal protein L5 sequences from bacteria of the genus Thermus.

    PubMed

    Jahn, O; Hartmann, R K; Boeckh, T; Erdmann, V A

    1991-06-01

    The genes for the ribosomal 5S rRNA binding protein L5 have been cloned from three extremely thermophilic eubacteria, Thermus flavus, Thermus thermophilus HB8 and Thermus aquaticus (Jahn et al, submitted). Genes for protein L5 from the three Thermus strains display 95% G/C in third positions of codons. Amino acid sequences deduced from the DNA sequence were shown to be identical for T flavus and T thermophilus, although the corresponding DNA sequences differed by two T to C transitions in the T thermophilus gene. Protein L5 sequences from T flavus and T thermophilus are 95% homologous to L5 from T aquaticus and 56.5% homologous to the corresponding E coli sequence. The lowest degrees of homology were found between the T flavus/T thermophilus L5 proteins and those of yeast L16 (27.5%), Halobacterium marismortui (34.0%) and Methanococcus vannielii (36.6%). From sequence comparison it becomes clear that thermostability of Thermus L5 proteins is achieved by an increase in hydrophobic interactions and/or by restriction of steric flexibility due to the introduction of amino acids with branched aliphatic side chains such as leucine. Alignment of the nine protein sequences equivalent to Thermus L5 proteins led to identification of a conserved internal segment, rich in acidic amino acids, which shows homology to subsequences of E coli L18 and L25. The occurrence of conserved sequence elements in 5S rRNA binding proteins and ribosomal proteins in general is discussed in terms of evolution and function.
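    The figure of "95% G/C in third positions of codons" refers to a statistic often called GC3. A minimal sketch of how it can be computed from an in-frame coding sequence follows; the function name and the toy sequences are assumptions for illustration and are not taken from the study.

```python
def gc3_fraction(cds):
    """Fraction of codons whose third base is G or C.

    The sequence is assumed to start in frame; trailing bases that do
    not complete a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop an incomplete final codon
    third_bases = cds[2:usable:3]     # every third base: indices 2, 5, 8, ...
    if not third_bases:
        raise ValueError("need at least one complete codon")
    return sum(base in "GC" for base in third_bases) / len(third_bases)


if __name__ == "__main__":
    # Toy sequences, not Thermus L5 genes.
    print(gc3_fraction("ATGGCCAAG"))  # codons ATG GCC AAG
    print(gc3_fraction("ATGGCAAAT"))  # codons ATG GCA AAT
```

    A GC3 value near 1.0, as reported for the Thermus genes, indicates that synonymous codon choice is strongly biased toward G- or C-ending codons.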

  7. Characterization of regulatory elements within the coat protein (CP) coding region of Tobacco mosaic virus affecting subgenomic transcription and green fluorescent protein expression from the CP subgenomic RNA promoter.

    PubMed

    Man, Michal; Epel, Bernard L

    2004-06-01

    A replicon based on Tobacco mosaic virus that was engineered to express the open reading frame (ORF) of the green fluorescent protein (GFP) gene in place of the native coat protein (CP) gene from a minimal CP subgenomic (sg) RNA promoter was found to accumulate very low levels of GFP. Regulatory regions within the CP ORF were identified that, when presented as untranslated regions flanking the GFP ORF, enhanced or inhibited sg transcription and GFP expression. Full GFP expression from the CP sgRNA promoter required more than the first 20 nt of the CP ORF but not beyond the first 56 nt. Further analysis indicated the presence of an enhancer element between nt +25 and +55 with respect to the CP translation start site. The inclusion of this enhancer sequence upstream of the GFP ORF led to elevated sg transcription and to a 50-fold increase in GFP accumulation in comparison with a minimal CP promoter in which the entire CP ORF was displaced by the GFP ORF. Inclusion of the 3'-terminal 22 nt had a minor positive effect on GFP accumulation, but the addition of extended untranslated sequences from the 3' terminus of the CP ORF downstream of the GFP ORF was basically found to inhibit sg transcription. Secondary structure analysis programs predicted the CP sgRNA promoter to reside within two stable stem-loop structures, which are followed by an enhancer region.

  8. The Homeodomain of PDX-1 Mediates Multiple Protein-Protein Interactions in the Formation of a Transcriptional Activation Complex on the Insulin Promoter

    PubMed Central

    Ohneda, Kinuko; Mirmira, Raghavendra G.; Wang, Juehu; Johnson, Jeffrey D.; German, Michael S.

    2000-01-01

    Activation of insulin gene transcription specifically in the pancreatic β cells depends on multiple nuclear proteins that interact with each other and with sequences on the insulin gene promoter to build a transcriptional activation complex. The homeodomain protein PDX-1 exemplifies such interactions by binding to the A3/4 region of the rat insulin I promoter and activating insulin gene transcription by cooperating with the basic-helix-loop-helix (bHLH) protein E47/Pan1, which binds to the adjacent E2 site. The present study provides evidence that the homeodomain of PDX-1 acts as a protein-protein interaction domain to recruit multiple proteins, including E47/Pan1, BETA2/NeuroD1, and high-mobility group protein I(Y), to an activation complex on the E2A3/4 minienhancer. The transcriptional activity of this complex results from the clustering of multiple activation domains capable of interacting with coactivators and the basal transcriptional machinery. These interactions are not common to all homeodomain proteins: the LIM homeodomain protein Lmx1.1 can also activate the E2A3/4 minienhancer in cooperation with E47/Pan1 but does so through different interactions. Cooperation between Lmx1.1 and E47/Pan1 results not only in the aggregation of multiple activation domains but also in the unmasking of a potent activation domain on E47/Pan1 that is normally silent in non-β cells. While more than one activation complex may be capable of activating insulin gene transcription through the E2A3/4 minienhancer, each is dependent on multiple specific interactions among a unique set of nuclear proteins. PMID:10629047

  9. Protein sequencing via nanopore based devices: a nanofluidics perspective

    NASA Astrophysics Data System (ADS)

    Chinappi, Mauro; Cecconi, Fabio

    2018-05-01

    Proteins perform a huge number of central functions in living organisms; thus, new techniques allowing their precise, fast and accurate characterization at the single-molecule level represent a significant advance in proteomics with important biomedical impact. In this review, we describe recent progress in the development of nanopore based devices for protein sequencing. We start with a critical analysis of the main technical requirements for nanopore protein sequencing, summarizing some ideas and methodologies that have recently appeared in the literature. In the last sections, we focus on the physical modelling of the transport phenomena occurring in nanopore based devices. The multiscale nature of the problem is discussed and, in this respect, some of the main possible computational approaches are illustrated.

  10. Investigation of the protein osteocalcin of Camelops hesternus: Sequence, structure and phylogenetic implications

    NASA Astrophysics Data System (ADS)

    Humpula, James F.; Ostrom, Peggy H.; Gandhi, Hasand; Strahler, John R.; Walker, Angela K.; Stafford, Thomas W.; Smith, James J.; Voorhies, Michael R.; George Corner, R.; Andrews, Phillip C.

    2007-12-01

    Ancient DNA sequences offer an extraordinary opportunity to unravel the evolutionary history of ancient organisms. Protein sequences offer another reservoir of genetic information that has recently become tractable through the application of mass spectrometric techniques. The extent to which ancient protein sequences resolve phylogenetic relationships, however, has not been explored. We determined the osteocalcin amino acid sequence from the bone of an extinct Camelid (21 ka, Camelops hesternus) excavated from Isleta Cave, New Mexico and three bones of extant camelids: bactrian camel (Camelus bactrianus); dromedary camel (Camelus dromedarius) and guanaco (Llama guanacoe) for a diagenetic and phylogenetic assessment. There was no difference in sequence among the four taxa. Structural attributes observed in both modern and ancient osteocalcin include a post-translation modification, Hyp 9, deamidation of Gln 35 and Gln 39, and oxidation of Met 36. Carbamylation of the N-terminus in ancient osteocalcin may result in blockage and explain previous difficulties in sequencing ancient proteins via Edman degradation. A phylogenetic analysis using osteocalcin sequences of 25 vertebrate taxa was conducted to explore osteocalcin protein evolution and the utility of osteocalcin sequences for delineating phylogenetic relationships. The maximum likelihood tree closely reflected generally recognized taxonomic relationships. For example, maximum likelihood analysis recovered rodents, birds and, within hominins, the Homo-Pan-Gorilla trichotomy. Within Artiodactyla, character state analysis showed that a substitution of Pro 4 for His 4 defines the Capra-Ovis clade within Artiodactyla. Homoplasy in our analysis indicated that osteocalcin evolution is not a perfect indicator of species evolution. Limited sequence availability prevented assigning functional significance to sequence changes. Our preliminary analysis of osteocalcin evolution represents an initial step towards a

  11. Optimization of Cry3A yields in Bacillus thuringiensis by use of sporulation-dependent promoters in combination with the STAB-SD mRNA sequence

    Treesearch

    Hyun-woo Park; Baoxue Ge; Leah S. Bauer; Brian A. Federici

    1998-01-01

    The insecticidal activity of Bacillus thuringiensis strains toxic to coleopterous insects is due to Cry3 proteins assembled into small rectangular crystals. Toxin synthesis in these strains is dependent primarily upon a promoter that is active in the stationary phase and a STAB-SD sequence that stabilizes the cry3 transcript-ribosome complex. Here we show that...

  12. UET: a database of evolutionarily-predicted functional determinants of protein sequences that cluster as functional sites in protein structures.

    PubMed

    Lua, Rhonald C; Wilson, Stephen J; Konecki, Daniel M; Wilkins, Angela D; Venner, Eric; Morgan, Daniel H; Lichtarge, Olivier

    2016-01-04

    The structure and function of proteins underlie most aspects of biology and their mutational perturbations often cause disease. To identify the molecular determinants of function as well as targets for drugs, it is central to characterize the important residues and how they cluster to form functional sites. The Evolutionary Trace (ET) achieves this by ranking the functional and structural importance of the protein sequence positions. ET uses evolutionary distances to estimate functional distances and correlates genotype variations with those in the fitness phenotype. Thus, ET ranks are worse for sequence positions that vary among evolutionarily closer homologs but better for positions that vary mostly among distant homologs. This approach identifies functional determinants, predicts function, guides the mutational redesign of functional and allosteric specificity, and interprets the action of coding sequence variations in proteins, people and populations. Now, the UET database offers pre-computed ET analyses for the protein structure databank, and on-the-fly analysis of any protein sequence. A web interface retrieves ET rankings of sequence positions and maps results to a structure to identify functionally important regions. This UET database integrates several ways of viewing the results on the protein sequence or structure and can be found at http://mammoth.bcm.tmc.edu/uet/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  13. How the Sequence of a Gene Specifies Structural Symmetry in Proteins

    PubMed Central

    Shen, Xiaojuan; Huang, Tongcheng; Wang, Guanyu; Li, Guanglin

    2015-01-01

    Internal symmetry is commonly observed in the majority of fundamental protein folds. Meanwhile, sufficient evidence suggests that nascent polypeptide chains of proteins have the potential to start the co-translational folding process and this process allows mRNA to contain additional information on protein structure. In this paper, we study the relationship between gene sequences and protein structures from the viewpoint of symmetry to explore how gene sequences code for structural symmetry in proteins. We found that, for a set of two-fold symmetric proteins from left-handed beta-helix fold, intragenic symmetry always exists in their corresponding gene sequences. Meanwhile, codon usage bias and local mRNA structure might be involved in modulating translation speed for the formation of structural symmetry: a major decrease of local codon usage bias in the middle of the codon sequence can be identified as a common feature; and major or consecutive decreases in local mRNA folding energy near the boundaries of the symmetric substructures can also be observed. The results suggest that gene duplication and fusion may be an evolutionarily conserved process for this protein fold. In addition, the usage of rare codons and the formation of higher order of secondary structure near the boundaries of symmetric substructures might have coevolved as conserved mechanisms to slow down translation elongation and to facilitate effective folding of symmetric substructures. These findings provide valuable insights into our understanding of the mechanisms of translation and its evolution, as well as the design of proteins via symmetric modules. PMID:26641668

  14. Adhesive proteins of stalked and acorn barnacles display homology with low sequence similarities.

    PubMed

    Jonker, Jaimie-Leigh; Abram, Florence; Pires, Elisabete; Varela Coelho, Ana; Grunwald, Ingo; Power, Anne Marie

    2014-01-01

    Barnacle adhesion underwater is an important phenomenon to understand for the prevention of biofouling and potential biotechnological innovations, yet so far, identifying what makes barnacle glue proteins 'sticky' has proved elusive. Examination of a broad range of species within the barnacles may be instructive to identify conserved adhesive domains. We add to extensive information from the acorn barnacles (order Sessilia) by providing the first protein analysis of a stalked barnacle adhesive, Lepas anatifera (order Lepadiformes). It was possible to separate the L. anatifera adhesive into at least 10 protein bands using SDS-PAGE. Intense bands were present at approximately 30, 70, 90 and 110 kilodaltons (kDa). Mass spectrometry for protein identification was followed by de novo sequencing which detected 52 peptides of 7-16 amino acids in length. None of the peptides matched published or unpublished transcriptome sequences, but some amino acid sequence similarity was apparent between L. anatifera and closely-related Dosima fascicularis. Antibodies against two acorn barnacle proteins (ab-cp-52k and ab-cp-68k) showed cross-reactivity in the adhesive glands of L. anatifera. We also analysed the similarity of adhesive proteins across several barnacle taxa, including Pollicipes pollicipes (a stalked barnacle in the order Scalpelliformes). Sequence alignment of published expressed sequence tags clearly indicated that P. pollicipes possesses homologues for the 19 kDa and 100 kDa proteins in acorn barnacles. Homology aside, sequence similarity in amino acid and gene sequences tended to decline as taxonomic distance increased, with minimum similarities of 18-26%, depending on the gene. The results indicate that some adhesive proteins (e.g. 100 kDa) are more conserved within barnacles than others (20 kDa).

  15. β-Glucosidase coding sequences and protein from Orpinomyces PC-2

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong; Ximenes, Eduardo A.

    2001-02-06

    Provided is a novel β-glucosidase from Orpinomyces sp. PC2, nucleotide sequences encoding the mature protein and the precursor protein, and methods for recombinant production of this β-glucosidase.

  16. A synthetic promoter library for constitutive gene expression in Lactobacillus plantarum.

    PubMed

    Rud, Ida; Jensen, Peter Ruhdal; Naterstad, Kristine; Axelsson, Lars

    2006-04-01

    A synthetic promoter library (SPL) for Lactobacillus plantarum has been developed, which generalizes the approach for obtaining synthetic promoters. The consensus sequence, derived from rRNA promoters extracted from the L. plantarum WCFS1 genome, was kept constant, and the non-consensus sequences were randomized. Construction of the SPL was performed in a vector (pSIP409) previously developed for high-level, inducible gene expression in L. plantarum and Lactobacillus sakei. A wide range of promoter strengths was obtained with the approach, covering 3-4 logs of expression levels in small increments of activity. The SPL was evaluated for the ability to drive beta-glucuronidase (GusA) and aminopeptidase N (PepN) expression. Protein production from the synthetic promoters was constitutive, and the most potent promoters gave high protein production with levels comparable to those of native rRNA promoters, and production of PepN protein corresponding to approximately 10-15 % of the total cellular protein. High correlation was obtained between the activities of promoters when tested in L. sakei and L. plantarum, which indicates the potential of the SPL for other Lactobacillus species. The SPL enables fine-tuning of stable gene expression for various applications in L. plantarum.
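
    The library design described above (consensus positions held fixed, all other positions randomized) can be sketched in a few lines. This is only an illustration under stated assumptions: the template layout, the box sequences and the spacer lengths below are hypothetical placeholders, not the L. plantarum rRNA-derived consensus used by the authors.

```python
import random

# Hypothetical placeholder template: fixed "boxes" separated by N positions.
# These boxes and spacer lengths are NOT the consensus from the paper.
TEMPLATE = "N" * 5 + "TTGACA" + "N" * 17 + "TATAAT" + "N" * 6


def random_promoter(template: str, rng: random.Random) -> str:
    """Keep fixed bases as-is and draw a random base for every N position."""
    return "".join(rng.choice("ACGT") if ch == "N" else ch for ch in template)


def build_library(template: str, size: int, seed: int = 0) -> list[str]:
    """Generate `size` promoter variants reproducibly from one template."""
    rng = random.Random(seed)
    return [random_promoter(template, rng) for _ in range(size)]


library = build_library(TEMPLATE, size=5)
for promoter in library:
    print(promoter)
```

    Every variant keeps the fixed boxes in place and differs only at the N positions, which is the property that lets such a library span a range of promoter strengths while remaining recognizable as a promoter.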

  17. Identification of a factor in HeLa cells specific for an upstream transcriptional control sequence of an EIA-inducible adenovirus promoter and its relative abundance in infected and uninfected cells.

    PubMed Central

    SivaRaman, L; Subramanian, S; Thimmappaya, B

    1986-01-01

    Utilizing the gel electrophoresis/DNA binding assay, a factor specific for the upstream transcriptional control sequence of the EIA-inducible adenovirus EIIA-early promoter has been detected in HeLa cell nuclear extract. Analysis of linker-scanning mutants of the promoter by DNA binding assays and methylation-interference experiments shows that the factor binds to the 17-nucleotide sequence 5' TGGAGATGACGTAGTTT 3' located between positions -66 and -82 upstream from the cap site. This sequence has been shown to be essential for transcription of this promoter. The EIIA-early-promoter specific factor was found to be present at comparable levels in uninfected HeLa cells and in cells infected with either wild-type adenovirus or the EIA-deletion mutant dl312 under conditions in which the EIA proteins are induced to high levels [7 or 20 hr after infection in the presence of arabinonucleoside (cytosine arabinoside)]. Based on the quantitation in DNA binding assays, it appears that the mechanism of EIA-activated transcription of the EIIA-early promoter does not involve a net change in the amounts of this factor. PMID:2942943
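
    As a small consistency check on the binding site quoted above, the sketch below confirms that the sequence is 17 nucleotides long, that positions -66 through -82 span 17 positions, and shows one way to scan both strands of a sequence for the site. The helper names and the flanking bases in the example are hypothetical; only the 17-nucleotide site itself comes from the abstract.

```python
SITE = "TGGAGATGACGTAGTTT"  # 17-nt site quoted in the abstract

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, site: str = SITE) -> list[tuple[int, str]]:
    """Return (0-based start, strand) for every exact match on either strand."""
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", site), ("-", reverse_complement(site))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)


# Positions -82 .. -66 inclusive cover 82 - 66 + 1 = 17 nucleotides.
span = 82 - 66 + 1
print(len(SITE), span)

# Hypothetical flanking bases, used only to exercise the search.
example = "CC" + SITE + "GG"
print(find_sites(example))
print(find_sites(reverse_complement(example)))
```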

  18. Protein sequence comparison based on K-string dictionary.

    PubMed

    Yu, Chenglong; He, Rong L; Yau, Stephen S-T

    2013-10-25

    The current K-string-based protein sequence comparisons require large amounts of computer memory because the dimension of the protein vector representation grows exponentially with K. In this paper, we propose a novel concept, the "K-string dictionary", to solve this high-dimensional problem. It allows us to use a much lower dimensional K-string-based frequency or probability vector to represent a protein, and thus significantly reduce the computer memory requirements for their implementation. Furthermore, based on this new concept, we use Singular Value Decomposition to analyze real protein datasets, and the improved protein vector representation allows us to obtain accurate gene trees. © 2013.
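
    To make the memory argument concrete: a full K-string vector over 20 amino acids has 20^K entries, whereas a vector indexed only by K-strings that actually occur is far shorter. The sketch below shows one simple reading of that idea, building the index from observed K-strings and comparing frequency vectors by Euclidean distance. It is an assumption-laden illustration, not the authors' construction; the function names, the distance measure and the toy sequences are all hypothetical.

```python
import math
from collections import Counter


def kstrings(seq: str, k: int) -> list[str]:
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_dictionary(seqs: list[str], k: int) -> list[str]:
    """Index only the K-strings that occur in the data set."""
    return sorted({s for seq in seqs for s in kstrings(seq, k)})


def frequency_vector(seq: str, k: int, dictionary: list[str]) -> list[float]:
    """Relative frequency of each dictionary K-string in one sequence."""
    counts = Counter(kstrings(seq, k))
    total = sum(counts.values())
    return [counts[s] / total if total else 0.0 for s in dictionary]


def euclidean(u: list[float], v: list[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical toy peptides, chosen only to exercise the functions.
seqs = ["MKVLAAGI", "MKVLSAGI", "GGSPWWTT"]
K = 2
dictionary = build_dictionary(seqs, K)
vectors = [frequency_vector(s, K, dictionary) for s in seqs]

print(len(dictionary), "dimensions instead of", 20 ** K)
print(euclidean(vectors[0], vectors[1]), euclidean(vectors[0], vectors[2]))
```

    In this toy case the two peptides that differ by a single residue end up closer to each other than to the unrelated third peptide, while the vectors use 16 dimensions rather than 400.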

  19. Characterization of DNA-protein interactions using high-throughput sequencing data from pulldown experiments

    NASA Astrophysics Data System (ADS)

    Moreland, Blythe; Oman, Kenji; Curfman, John; Yan, Pearlly; Bundschuh, Ralf

    Methyl-binding domain (MBD) protein pulldown experiments have been a valuable tool in measuring the levels of methylated CpG dinucleotides. Due to the frequent use of this technique, high-throughput sequencing data sets are available that allow a detailed quantitative characterization of the underlying interaction between methylated DNA and MBD proteins. Analyzing such data sets, we first found that two such proteins cannot bind closer to each other than 2 bp, consistent with structural models of the DNA-protein interaction. Second, the large amount of sequencing data allowed us to find rather weak but nevertheless clearly statistically significant sequence preferences for several bases around the required CpG. These results demonstrate that pulldown sequencing is a high-precision tool in characterizing DNA-protein interactions. This material is based upon work supported by the National Science Foundation under Grant No. DMR-1410172.

  20. The promoter of the pepper pathogen-induced membrane protein gene CaPIMP1 mediates environmental stress responses in plants.

    PubMed

    Hong, Jeum Kyu; Hwang, Byung Kook

    2009-01-01

    The promoter of the pepper pathogen-induced membrane protein gene CaPIMP1 was analyzed by an Agrobacterium-mediated transient expression assay in tobacco leaves. Several stress-related cis-acting elements (GT-1, W-box and ABRE) are located within the CaPIMP1 promoter. In tobacco leaf tissues transiently transformed with a CaPIMP1 promoter-beta-glucuronidase (GUS) gene fusion, serially 5'-deleted CaPIMP1 promoters were differentially activated by Pseudomonas syringae pv. tabaci, ethylene, methyl jasmonate, abscisic acid, and nitric oxide. The -1,193 bp region of the CaPIMP1 gene promoter sequence exhibited full promoter activity. The -417- and -593 bp promoter regions were sufficient for GUS gene activation by ethylene and methyl jasmonate treatments, respectively. However, CaPIMP1 promoter sequences longer than -793 bp were required for promoter activation by abscisic acid and sodium nitroprusside treatments. CaPIMP1 expression was activated in pepper leaves by treatment with ethylene, methyl jasmonate, abscisic acid, beta-amino-n-butyric acid, NaCl, mechanical wounding, and low temperature, but not with salicylic acid. Overexpression of CaPIMP1 in Arabidopsis conferred hypersensitivity to mannitol, NaCl, and ABA during seed germination but not during seedling development. In contrast, transgenic plants overexpressing CaPIMP1 exhibited enhanced tolerance to oxidative stress induced by methyl viologen during germination and early seedling stages. These results suggest that CaPIMP1 expression may alter responsiveness to environmental stress, as well as to pathogen infection.

  1. An ensemble approach for large-scale identification of protein-protein interactions using the alignments of multiple sequences

    PubMed Central

    Wang, Lei; You, Zhu-Hong; Chen, Xing; Li, Jian-Qiang; Yan, Xin; Zhang, Wei; Huang, Yu-An

    2017-01-01

    Protein–protein interactions (PPIs) are not only a critical component of various biological processes in cells, but also the key to understanding the mechanisms leading to healthy and diseased states in organisms. However, it is time-consuming and cost-intensive to identify the interactions among proteins using biological experiments. Hence, developing more efficient computational methods has rapidly become an attractive topic in the post-genomic era. In this paper, we propose a novel method for inferring protein-protein interactions from protein amino acid sequences only. Specifically, the protein amino acid sequence is first transformed into a Position-Specific Scoring Matrix (PSSM) generated by multiple sequence alignment; then the Pseudo PSSM is used to extract feature descriptors. Finally, an ensemble Rotation Forest (RF) learning system is trained to predict and recognize PPIs based solely on protein sequence features. When applied to three benchmark data sets (Yeast, H. pylori, and an independent dataset) for predicting PPIs, our method achieves good average accuracies of 98.38%, 89.75%, and 96.25%, respectively. To further evaluate the prediction performance, we also compare the proposed method with other methods using the same benchmark data sets. The experimental results demonstrate that the proposed method consistently outperforms other state-of-the-art methods. Therefore, our method is effective and robust and can be taken as a useful tool in exploring and discovering new relationships between proteins. A web server is made publicly available at the URL http://202.119.201.126:8888/PsePSSM/ for academic use. PMID:28029645

  2. Utilization of RNA polymerase I promoter and terminator sequences to develop a DNA transfection system for the study of hepatitis C virus internal ribosomal entry site-dependent translation.

    PubMed

    Oem, Jae-Ku; Xiang, Zhonghua; Zhou, Yan; Babiuk, Lorne A; Liu, Qiang

    2007-09-01

    Hepatitis C virus (HCV) causes severe liver diseases in a large population worldwide. HCV protein translation is controlled by an internal ribosomal entry site (IRES) within the 5'-untranslated region (UTR). HCV IRES-dependent translation is critical for HCV-associated pathogenesis. The aim of this study was to develop a plasmid DNA transfection system using RNA polymerase I promoter and terminator sequences for studying HCV IRES-dependent translation. A gene cassette containing HCV 5'-UTR, Renilla luciferase reporter gene, and HCV 3'-UTR was inserted between RNA polymerase I promoter and terminator sequences. HCV IRES-directed translation was determined by luciferase assay after transfection. Transfection of the RNA polymerase I-HCV IRES plasmid into human hepatoma Huh-7 and HepG2 cells resulted in luciferase gene expression. Deletion of the IIIf domain in HCV IRES dramatically reduced luciferase activity. Our results indicated that the plasmid vector system based on RNA polymerase I promoter and terminator sequences represents an effective approach for the study of HCV IRES-dependent translation.

  3. Improving pairwise comparison of protein sequences with domain co-occurrence

    PubMed Central

    Gascuel, Olivier

    2018-01-01

    Comparing and aligning protein sequences is an essential task in bioinformatics. More specifically, local alignment tools like BLAST are widely used for identifying conserved protein sub-sequences, which likely correspond to protein domains or functional motifs. However, to limit the number of false positives, these tools are used with stringent sequence-similarity thresholds and hence can miss several hits, especially for species that are phylogenetically distant from reference organisms. A solution to this problem is to integrate additional contextual information into the procedure. Here, we propose to use domain co-occurrence to increase the sensitivity of pairwise sequence comparisons. Domain co-occurrence is a strong feature of proteins, since most protein domains tend to appear with a limited number of other domains on the same protein. We propose a method to take this information into account in a typical BLAST analysis and to construct new domain families on the basis of these results. We used Plasmodium falciparum as a case study to evaluate our method. The experimental findings showed an increase of 14% in the number of significant BLAST hits and an increase of 25% in the proteome area that can be covered with a domain. Our method identified 2240 new domains for which, in most cases, no model of the Pfam database could be linked. Moreover, our study of the quality of the new domains in terms of alignment and physicochemical properties shows that they are close to standard Pfam domains. Source code of the proposed approach and supplementary data are available at: https://gite.lirmm.fr/menichelli/pairwise-comparison-with-cooccurrence PMID:29293498

  4. Sequence co-evolution gives 3D contacts and structures of protein complexes

    PubMed Central

    Hopf, Thomas A; Schärfe, Charlotta P I; Rodrigues, João P G L M; Green, Anna G; Kohlbacher, Oliver; Sander, Chris; Bonvin, Alexandre M J J; Marks, Debora S

    2014-01-01

    Protein–protein interactions are fundamental to many biological processes. Experimental screens have identified tens of thousands of interactions, and structural biology has provided detailed functional insight for select 3D protein complexes. An alternative rich source of information about protein interactions is the evolutionary sequence record. Building on earlier work, we show that analysis of correlated evolutionary sequence changes across proteins identifies residues that are close in space with sufficient accuracy to determine the three-dimensional structure of the protein complexes. We evaluate prediction performance in blinded tests on 76 complexes of known 3D structure, predict protein–protein contacts in 32 complexes of unknown structure, and demonstrate how evolutionary couplings can be used to distinguish between interacting and non-interacting protein pairs in a large complex. With the current growth of sequences, we expect that the method can be generalized to genome-wide elucidation of protein–protein interaction networks and used for interaction predictions at residue resolution. DOI: http://dx.doi.org/10.7554/eLife.03430.001 PMID:25255213

  5. Top-down analysis of protein samples by de novo sequencing techniques

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vyatkina, Kira; Wu, Si; Dekker, Lennard J. M.

    MOTIVATION: Recent technological advances have made high-resolution mass spectrometers affordable to many laboratories, thus boosting rapid development of top-down mass spectrometry, and implying a need for efficient methods for analyzing this kind of data. RESULTS: We describe a method for analysis of protein samples from top-down tandem mass spectrometry data, which capitalizes on de novo sequencing of fragments of the proteins present in the sample. Our algorithm takes as input a set of de novo amino acid strings derived from the given mass spectra using the recently proposed Twister approach, and combines them into aggregated strings endowed with offsets. The former typically constitute accurate sequence fragments of sufficiently well-represented proteins from the sample being analyzed, while the latter indicate their location in the protein sequence, and also bear information on post-translational modifications and fragmentation patterns.

  6. Draft versus finished sequence data for DNA and protein diagnostic signature development

    PubMed Central

    Gardner, Shea N.; Lam, Marisa W.; Smith, Jason R.; Torres, Clinton L.; Slezak, Tom R.

    2005-01-01

    Sequencing pathogen genomes is costly, demanding careful allocation of limited sequencing resources. We built a computational Sequencing Analysis Pipeline (SAP) to guide decisions regarding the amount of genomic sequencing necessary to develop high-quality diagnostic DNA and protein signatures. SAP uses simulations to estimate the number of target genomes and close phylogenetic relatives (near neighbors or NNs) to sequence. We use SAP to assess whether draft data are sufficient or finished sequencing is required using Marburg and variola virus sequences. Simulations indicate that intermediate to high-quality draft with error rates of 10^-3 to 10^-5 (∼8× coverage) of target organisms is suitable for DNA signature prediction. Low-quality draft with error rates of ∼1% (3× to 6× coverage) of target isolates is inadequate for DNA signature prediction, although low-quality draft of NNs is sufficient, as long as the target genomes are of high quality. For protein signature prediction, sequencing errors in target genomes substantially reduce the detection of amino acid sequence conservation, even if the draft is of high quality. In summary, high-quality draft of target and low-quality draft of NNs appears to be a cost-effective investment for DNA signature prediction, but may lead to underestimation of predicted protein signatures. PMID:16243783

  7. Adhesive Proteins of Stalked and Acorn Barnacles Display Homology with Low Sequence Similarities

    PubMed Central

    Jonker, Jaimie-Leigh; Abram, Florence; Pires, Elisabete; Varela Coelho, Ana; Grunwald, Ingo; Power, Anne Marie

    2014-01-01

    Barnacle adhesion underwater is an important phenomenon to understand for the prevention of biofouling and potential biotechnological innovations, yet so far, identifying what makes barnacle glue proteins ‘sticky’ has proved elusive. Examination of a broad range of species within the barnacles may be instructive to identify conserved adhesive domains. We add to extensive information from the acorn barnacles (order Sessilia) by providing the first protein analysis of a stalked barnacle adhesive, Lepas anatifera (order Lepadiformes). It was possible to separate the L. anatifera adhesive into at least 10 protein bands using SDS-PAGE. Intense bands were present at approximately 30, 70, 90 and 110 kilodaltons (kDa). Mass spectrometry for protein identification was followed by de novo sequencing which detected 52 peptides of 7–16 amino acids in length. None of the peptides matched published or unpublished transcriptome sequences, but some amino acid sequence similarity was apparent between L. anatifera and closely-related Dosima fascicularis. Antibodies against two acorn barnacle proteins (ab-cp-52k and ab-cp-68k) showed cross-reactivity in the adhesive glands of L. anatifera. We also analysed the similarity of adhesive proteins across several barnacle taxa, including Pollicipes pollicipes (a stalked barnacle in the order Scalpelliformes). Sequence alignment of published expressed sequence tags clearly indicated that P. pollicipes possesses homologues for the 19 kDa and 100 kDa proteins in acorn barnacles. Homology aside, sequence similarity in amino acid and gene sequences tended to decline as taxonomic distance increased, with minimum similarities of 18–26%, depending on the gene. The results indicate that some adhesive proteins (e.g. 100 kDa) are more conserved within barnacles than others (20 kDa). PMID:25295513

  8. Role for cis-acting RNA sequences in the temperature-dependent expression of the multiadhesive lig proteins in Leptospira interrogans.

    PubMed

    Matsunaga, James; Schlax, Paula J; Haake, David A

    2013-11-01

    The spirochete Leptospira interrogans causes a systemic infection that provokes a febrile illness. The putative lipoproteins LigA and LigB promote adhesion of Leptospira to host proteins, interfere with coagulation, and capture complement regulators. In this study, we demonstrate that the expression level of the LigA and LigB proteins was substantially higher when L. interrogans proliferated at 37°C instead of the standard culture temperature of 30°C. The RNA comprising the 175-nucleotide 5' untranslated region (UTR) and first six lig codons, whose sequence is identical in ligA and ligB, is predicted to fold into two distinct stem-loop structures separated by a single-stranded region. The ribosome-binding site is partially sequestered in double-stranded RNA within the second structure. Toeprint analysis revealed that in vitro formation of a 30S-tRNA(fMet)-mRNA ternary complex was inhibited unless a 5' deletion mutation disrupted the second stem-loop structure. To determine whether the lig sequence could mediate temperature-regulated gene expression in vivo, the 5' UTR and the first six codons were inserted between the Escherichia coli l-arabinose promoter and bgaB (β-galactosidase from Bacillus stearothermophilus) to create a translational fusion. The lig fragment successfully conferred thermoregulation upon the β-galactosidase reporter in E. coli. The second stem-loop structure was sufficient to confer thermoregulation on the reporter, while sequences further upstream in the 5' UTR slightly diminished expression at each temperature tested. Finally, the expression level of β-galactosidase was significantly higher when point mutations predicted to disrupt base pairs in the second structure were introduced into the stem. Compensatory mutations that maintained base pairing of the stem without restoring the wild-type sequence reinstated the inhibitory effect of the 5' UTR on expression. These results indicate that ligA and ligB expression is limited by double

  9. Relationships between residue Voronoi volume and sequence conservation in proteins.

    PubMed

    Liu, Jen-Wei; Cheng, Chih-Wen; Lin, Yu-Feng; Chen, Shao-Yu; Hwang, Jenn-Kang; Yen, Shih-Chung

    2018-02-01

    Functional and biophysical constraints can cause different levels of sequence conservation in proteins. Previously, structural properties, e.g., relative solvent accessibility (RSA) and packing density of the weighted contact number (WCN), have been found to be related to protein sequence conservation (CS). The Voronoi volume has recently been recognized as a new structural property of the local protein structural environment reflecting CS. However, for surface residues, it is sensitive to water molecules surrounding the protein structure. Herein, we present a simple structural determinant termed the relative space of Voronoi volume (RSV); it uses the Voronoi volume and the van der Waals volume of particular residues to quantify the local structural environment. RSV (range, 0-1) is defined as (Voronoi volume-van der Waals volume)/Voronoi volume of the target residue. The concept of RSV describes the extent of available space for every protein residue. RSV and Voronoi profiles with and without water molecules (RSVw, RSV, VOw, and VO) were compared for 554 non-homologous proteins. RSV (without water) showed better Pearson's correlations with CS than did RSVw, VO, or VOw values. The mean correlation coefficient between RSV and CS was 0.51, which is comparable to the correlation between RSA and CS (0.49) and that between WCN and CS (0.56). RSV is a robust structural descriptor with and without water molecules and can quantitatively reflect evolutionary information in a single protein structure. Therefore, it may represent a practical structural determinant to study protein sequence, structure, and function relationships. Copyright © 2017 Elsevier B.V. All rights reserved.
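    The RSV definition quoted in the abstract above can be written as a one-line function. This is a minimal sketch only: the function name and the example volumes are hypothetical, and it assumes the Voronoi and van der Waals volumes have already been computed by some external tool in the same units.

```python
def relative_space(voronoi_volume: float, vdw_volume: float) -> float:
    """RSV = (Voronoi volume - van der Waals volume) / Voronoi volume."""
    if voronoi_volume <= 0:
        raise ValueError("Voronoi volume must be positive")
    return (voronoi_volume - vdw_volume) / voronoi_volume


# Hypothetical volumes (same units for both), chosen only for illustration.
print(relative_space(voronoi_volume=200.0, vdw_volume=150.0))  # 0.25
```

    A residue whose Voronoi cell is barely larger than its van der Waals volume gets an RSV near 0 (tightly packed), while a residue with a lot of free space around it approaches 1.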

  10. Extracting features from protein sequences to improve deep extreme learning machine for protein fold recognition.

    PubMed

    Ibrahim, Wisam; Abadeh, Mohammad Saniee

    2017-05-21

    Protein fold recognition is an important problem in bioinformatics for predicting the three-dimensional structure of a protein. One of the most challenging tasks in the protein fold recognition problem is the extraction of efficient features from the amino-acid sequences to obtain better classifiers. In this paper, we have proposed six descriptors to extract features from protein sequences. These descriptors are applied in the first stage of a three-stage framework, PCA-DELM-LDA, to extract feature vectors from the amino-acid sequences. Principal Component Analysis (PCA) has been implemented to reduce the number of extracted features. The extracted feature vectors have been used with original features to improve the performance of the Deep Extreme Learning Machine (DELM) in the second stage. Four new features have been extracted from the second stage and used in the third stage by Linear Discriminant Analysis (LDA) to classify the instances into 27 folds. The proposed framework is implemented on the independent and combined feature sets in SCOP datasets. The experimental results show that extracted feature vectors in the first stage could improve the performance of DELM in extracting new useful features in the second stage. Copyright © 2017 Elsevier Ltd. All rights reserved.

  11. Use of signal sequences as an in situ removable sequence element to stimulate protein synthesis in cell-free extracts

    PubMed Central

    Ahn, Jin-Ho; Hwang, Mi-Yeon; Lee, Kyung-Ho; Choi, Cha-Yong; Kim, Dong-Myung

    2007-01-01

    This study developed a method to boost the expression of recombinant proteins in a cell-free protein synthesis system without leaving additional amino acid residues. It was found that the nucleotide sequences of the signal peptides serve as an efficient downstream box to stimulate protein synthesis when they were fused upstream of the target genes. The extent of stimulation was critically affected by the identity of the second codons of the signal sequences. Moreover, the yield of the synthesized protein was enhanced by as much as 10 times in the presence of an optimal second codon. The signal peptides were in situ cleaved and the target proteins were produced in their native sizes by carrying out the cell-free synthesis reactions in the presence of Triton X-100, most likely through the activation of signal peptidase in the S30 extract. The amplification of the template DNA and the addition of the signal sequences were accomplished by PCR. Hence, elevated levels of recombinant proteins were generated within several hours. PMID:17185295

  12. Cube - an online tool for comparison and contrasting of protein sequences.

    PubMed

    Zhang, Zong Hong; Khoo, Aik Aun; Mihalek, Ivana

    2013-01-01

    When comparing sequences of similar proteins, two kinds of questions can be asked, and the related two kinds of inference made. First, one may ask to what degree they are similar, and then, how they differ. In the first case one may tentatively conclude that the conserved elements common to all sequences are of central and common importance to the protein's function. In the latter case the regions of specialization may be discriminative of the function or binding partners across subfamilies of related proteins. Experimental efforts - mutagenesis or pharmacological intervention - can then be pointed in either direction, depending on the context of the study. Cube simplifies this process for users that already have their favorite sets of sequences, and helps them collate the information by visualization of the conservation and specialization scores on the sequence and on the structure, and by spreadsheet tabulation. All information can be visualized on the spot, or downloaded for reference and later inspection. http://eopsf.org/cube.

  13. Efficient Feature Selection and Classification of Protein Sequence Data in Bioinformatics

    PubMed Central

    Faye, Ibrahima; Samir, Brahim Belhaouari; Md Said, Abas

    2014-01-01

    Bioinformatics has been an emerging area of research for the last three decades. The ultimate aims of bioinformatics are to store and manage biological data, and to develop and analyze computational tools that enhance its understanding. The size of data accumulated under various sequencing projects is increasing exponentially, which presents difficulties for the experimental methods. To reduce the gap between newly sequenced proteins and proteins with known functions, many computational techniques involving classification and clustering algorithms have been proposed. The classification of protein sequences into existing superfamilies is helpful in predicting the structure and function of the large number of newly discovered proteins. The existing classification results are unsatisfactory due to the huge number of features obtained through various feature encoding methods. In this work, a statistical metric-based feature selection technique is proposed in order to reduce the size of the extracted feature vector. The proposed method of protein classification shows significant improvement in terms of performance metrics: accuracy, sensitivity, specificity, recall, F-measure, and so forth. PMID:25045727

  14. Modular prediction of protein structural classes from sequences of twilight-zone identity with predicting sequences.

    PubMed

    Mizianty, Marcin J; Kurgan, Lukasz

    2009-12-13

    Knowledge of structural class is used by numerous methods for identification of structural/functional characteristics of proteins and could be used for the detection of remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and which are designed for high identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences. The proposed MODular Approach to Structural class prediction (MODAS) method is unique as it allows for selection of any subset of the classes. MODAS is also the first to utilize a novel, custom-built feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes including conservation of residues and arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers that are tailored for each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets and comparison with over two dozen modern competing predictors show that MODAS provides the best overall accuracy that ranges between 80% and 96.7% (83.5% for the twilight-zone datasets), depending on the dataset. This translates into 19% and 8% error rate reduction when compared against the best performing competing method on the two largest datasets. The proposed predictor provides accurate predictions at 58% accuracy for the membrane proteins class, which is not considered by the majority of existing methods, even though this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes. The improved predictions stem from the novel features that express collocation

  15. Modular prediction of protein structural classes from sequences of twilight-zone identity with predicting sequences

    PubMed Central

    2009-01-01

    Background: Knowledge of structural class is used by numerous methods for identification of structural/functional characteristics of proteins and could be used for the detection of remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and which are designed for high identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences. Results: The proposed MODular Approach to Structural class prediction (MODAS) method is unique as it allows for selection of any subset of the classes. MODAS is also the first to utilize a novel, custom-built feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes including conservation of residues and arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers that are tailored for each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets and comparison with over two dozen modern competing predictors show that MODAS provides the best overall accuracy that ranges between 80% and 96.7% (83.5% for the twilight-zone datasets), depending on the dataset. This translates into 19% and 8% error rate reduction when compared against the best performing competing method on the two largest datasets. The proposed predictor provides accurate predictions at 58% accuracy for the membrane proteins class, which is not considered by the majority of existing methods, even though this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes. Conclusions: The improved predictions stem from the novel

  16. N-Terminal Amino Acid Sequence Determination of Proteins by N-Terminal Dimethyl Labeling: Pitfalls and Advantages When Compared with Edman Degradation Sequence Analysis.

    PubMed

    Chang, Elizabeth; Pourmal, Sergei; Zhou, Chun; Kumar, Rupesh; Teplova, Marianna; Pavletich, Nikola P; Marians, Kenneth J; Erdjument-Bromage, Hediye

    2016-07-01

    In recent history, alternative approaches to Edman sequencing have been investigated, and to this end, the Association of Biomolecular Resource Facilities (ABRF) Protein Sequencing Research Group (PSRG) initiated studies in 2014 and 2015, looking into bottom-up and top-down N-terminal (Nt) dimethyl derivatization of standard quantities of intact proteins with the aim to determine Nt sequence information. We have expanded this initiative and used low picomole amounts of myoglobin to determine the efficiency of Nt-dimethylation. Application of this approach on protein domains, generated by limited proteolysis of overexpressed proteins, confirms that it is a universal labeling technique and is very sensitive when compared with Edman sequencing. Finally, we compared Edman sequencing and Nt-dimethylation of the same polypeptide fragments; results confirm that there is agreement in the identity of the Nt amino acid sequence between these 2 methods.

  17. Learning cellular sorting pathways using protein interactions and sequence motifs.

    PubMed

    Lin, Tien-Ho; Bar-Joseph, Ziv; Murphy, Robert F

    2011-11-01

    Proper subcellular localization is critical for proteins to perform their roles in cellular functions. Proteins are transported by different cellular sorting pathways, some of which take a protein through several intermediate locations until reaching its final destination. The pathway a protein is transported through is determined by carrier proteins that bind to specific sequence motifs. In this article, we present a new method that integrates protein interaction and sequence motif data to model how proteins are sorted through these sorting pathways. We use a hidden Markov model (HMM) to represent protein sorting pathways. The model is able to determine intermediate sorting states and to assign carrier proteins and motifs to the sorting pathways. In simulation studies, we show that the method can accurately recover an underlying sorting model. Using data for yeast, we show that our model leads to accurate prediction of subcellular localization. We also show that the pathways learned by our model recover many known sorting pathways and correctly assign proteins to the path they utilize. The learned model identified new pathways and their putative carriers and motifs and these may represent novel protein sorting mechanisms. Supplementary results and software implementation are available from http://murphylab.web.cmu.edu/software/2010_RECOMB_pathways/.

  18. Learning Cellular Sorting Pathways Using Protein Interactions and Sequence Motifs

    PubMed Central

    Lin, Tien-Ho; Bar-Joseph, Ziv

    2011-01-01

    Proper subcellular localization is critical for proteins to perform their roles in cellular functions. Proteins are transported by different cellular sorting pathways, some of which take a protein through several intermediate locations until reaching its final destination. The pathway a protein is transported through is determined by carrier proteins that bind to specific sequence motifs. In this article, we present a new method that integrates protein interaction and sequence motif data to model how proteins are sorted through these sorting pathways. We use a hidden Markov model (HMM) to represent protein sorting pathways. The model is able to determine intermediate sorting states and to assign carrier proteins and motifs to the sorting pathways. In simulation studies, we show that the method can accurately recover an underlying sorting model. Using data for yeast, we show that our model leads to accurate prediction of subcellular localization. We also show that the pathways learned by our model recover many known sorting pathways and correctly assign proteins to the path they utilize. The learned model identified new pathways and their putative carriers and motifs and these may represent novel protein sorting mechanisms. Supplementary results and software implementation are available from http://murphylab.web.cmu.edu/software/2010_RECOMB_pathways/. PMID:21999284

  19. Enhancer activity of Helitron in sericin-1 gene promoter from Bombyx mori.

    PubMed

    Huang, Ke; Li, Chun-Feng; Wu, Jie; Wei, Jun-Hong; Zou, Yong; Han, Min-Jin; Zhou, Ze-Yang

    2016-06-01

    Sericin is a kind of water-soluble protein expressed specifically in the middle silk gland of Bombyx mori. When the sericin-1 gene promoter was cloned and a transgenic vector was constructed to express a foreign protein, a specific Helitron, Bmhel-8, was identified in the sericin-1 gene promoter sequence in some genotypes of Bombyx mori and Bombyx mandarina. Given that the Bmhel-8 Helitron transposon was present only in some genotypes, it could be the source of allelic variation in the sericin-1 promoter. The length of the sericin-1 promoter sequence is approximately 1063 or 643 bp. The larger size of the sequence or allele is ascribed to the presence of Bmhel-8. Silkworm genotypes can be homozygous for either the shorter or larger promoter sequence or heterozygous, containing both alleles. Bmhel-8 in the sericin-1 promoter exhibits enhancer activity, as demonstrated by a dual-luciferase reporter system in BmE cell lines. Furthermore, Bmhel-8 displays enhancer activity in a sericin-1 promoter-driven gene expression system but does not regulate the tissue-specific expression of sericin-1. © 2016 Institute of Zoology, Chinese Academy of Sciences.

  20. Usefulness of heterologous promoters in the Pseudozyma flocculosa gene expression system.

    PubMed

    Avis, Tyler J; Anguenot, Raphaël; Neveu, Bertrand; Bolduc, Sébastien; Zhao, Yingyi; Cheng, Yali; Labbé, Caroline; Belzile, François; Bélanger, Richard R

    2008-02-01

    The basidiomycetous fungus Pseudozyma flocculosa represents a promising new host for the expression of complex recombinant proteins. Two novel heterologous promoter sequences, the Ustilago maydis glyceraldehyde-3-phosphate dehydrogenase (GPD) and Pseudozyma tsukubaensis alpha-glucosidase promoters, were tested for their ability to provide expression in P. flocculosa. In liquid medium, these two promoters produced lower levels of intracellular green fluorescent protein (GFP) as compared to the U. maydis hsp70 promoter. However, GPD and alpha-glucosidase sequences behaved as constitutive promoters whereas the hsp70 promoter appeared to be morphology-dependent. When using the hsp70 promoter, the expression of GFP increased proportionally to the concentration of hygromycin in the culture medium, indicating possible induction of the promoter by the antibiotic. Optimal solid-state culture conditions were designed for high throughput screening of hygromycin-resistant transformants with the hsp70 promoter in P. flocculosa.

  1. Integrating mRNA and protein sequencing enables the detection and quantitative profiling of natural protein sequence variants of Populus trichocarpa

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Abraham, Paul E.; Wang, Xiaojing; Ranjan, Priya

    The availability of next-generation sequencing technologies has rapidly transformed our ability to link genotypes to phenotypes, and as such, promises to facilitate the dissection of genetic contribution to complex traits. Although discoveries of genetic associations will further our understanding of biology, once candidate variants have been identified, investigators are faced with the challenge of characterizing the functional effects on proteins encoded by such genes. Here we show how next-generation RNA sequencing data can be exploited to construct genotype-specific protein sequence databases, which provide a clearer picture of the molecular toolbox underlying cellular and organismal processes and their variation in a natural population. For this study, we used two individual genotypes (DENA-17-3 and VNDL-27-4) from a recent genome wide association (GWA) study of Populus trichocarpa, an obligate outcrosser that exhibits tremendous phenotypic variation across the natural population. This strategy allowed us to comprehensively catalogue proteins containing single amino acid polymorphisms (SAAPs) and insertions and deletions (INDELS). Based on large-scale identification of SAAPs, we profiled the frequency of 128 types of naturally occurring amino acid substitutions, with a subset of SAAPs occurring in regions of the genome having strong polymorphism patterns consistent with recent positive and/or divergent selection. In addition, we were able to explore the diploid landscape of Populus at the proteome-level, allowing the characterization of heterozygous variants.

  2. Integrating mRNA and protein sequencing enables the detection and quantitative profiling of natural protein sequence variants of Populus trichocarpa

    DOE PAGES

    Abraham, Paul E.; Wang, Xiaojing; Ranjan, Priya; ...

    2015-10-20

    The availability of next-generation sequencing technologies has rapidly transformed our ability to link genotypes to phenotypes, and as such, promises to facilitate the dissection of genetic contribution to complex traits. Although discoveries of genetic associations will further our understanding of biology, once candidate variants have been identified, investigators are faced with the challenge of characterizing the functional effects on proteins encoded by such genes. Here we show how next-generation RNA sequencing data can be exploited to construct genotype-specific protein sequence databases, which provide a clearer picture of the molecular toolbox underlying cellular and organismal processes and their variation in a natural population. For this study, we used two individual genotypes (DENA-17-3 and VNDL-27-4) from a recent genome wide association (GWA) study of Populus trichocarpa, an obligate outcrosser that exhibits tremendous phenotypic variation across the natural population. This strategy allowed us to comprehensively catalogue proteins containing single amino acid polymorphisms (SAAPs) and insertions and deletions (INDELS). Based on large-scale identification of SAAPs, we profiled the frequency of 128 types of naturally occurring amino acid substitutions, with a subset of SAAPs occurring in regions of the genome having strong polymorphism patterns consistent with recent positive and/or divergent selection. In addition, we were able to explore the diploid landscape of Populus at the proteome-level, allowing the characterization of heterozygous variants.

  3. A protein block based fold recognition method for the annotation of twilight zone sequences.

    PubMed

    Suresh, V; Ganesan, K; Parthasarathy, S

    2013-03-01

    The description of the protein backbone was recently improved with a group of structural fragments called Structural Alphabets instead of the regular three-state (Helix, Sheet and Coil) secondary structure description. Protein Blocks is one of the Structural Alphabets used to describe each and every region of the protein backbone, including the coil. According to de Brevern (2000), the Protein Blocks alphabet has 16 structural fragments, each 5 residues in length. Protein Blocks fragments are highly informative among the available Structural Alphabets and have been used for many applications. Here, we present a protein fold recognition method based on Protein Blocks for the annotation of twilight zone sequences. In our method, we align the predicted Protein Blocks of a query amino acid sequence with a library of assigned Protein Blocks of 953 known folds using local pair-wise alignment. The alignment results with z-value ≥ 2.5 and P-value ≤ 0.08 are predicted as possible folds. Our method is able to recognize the possible folds for nearly 35.5% of the twilight zone sequences with their predicted Protein Block sequence obtained by pb_prediction, which is available at the Protein Block Export server.
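    The acceptance rule stated in the abstract above (z-value ≥ 2.5 and P-value ≤ 0.08) can be illustrated with a small filter. This is a sketch under assumptions: the `Hit` record, the fold names, and the scores are hypothetical, and the Protein Blocks prediction and local alignment steps that would produce such scores are not reproduced here.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One alignment of a query against a library fold (hypothetical record)."""
    fold_id: str
    z_value: float
    p_value: float


def candidate_folds(hits, z_min=2.5, p_max=0.08):
    """Keep folds whose alignment passes both thresholds from the abstract."""
    return [h.fold_id for h in hits if h.z_value >= z_min and h.p_value <= p_max]


# Made-up scores, for illustration only.
hits = [
    Hit("fold_A", 3.1, 0.02),  # passes both thresholds
    Hit("fold_B", 2.4, 0.01),  # z-value too low
    Hit("fold_C", 2.5, 0.08),  # exactly on both thresholds, still accepted
    Hit("fold_D", 4.0, 0.20),  # P-value too high
]
print(candidate_folds(hits))  # ['fold_A', 'fold_C']
```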

  4. GOLabeler: Improving Sequence-based Large-scale Protein Function Prediction by Learning to Rank.

    PubMed

    You, Ronghui; Zhang, Zihan; Xiong, Yi; Sun, Fengzhu; Mamitsuka, Hiroshi; Zhu, Shanfeng

    2018-03-07

    Gene Ontology (GO) has been widely used to annotate functions of proteins and understand their biological roles. Currently only <1% of more than 70 million proteins in UniProtKB have experimental GO annotations, implying the strong necessity of automated function prediction (AFP) of proteins, where AFP is a hard multilabel classification problem due to one protein with a diverse number of GO terms. Most of these proteins have only sequences as input information, indicating the importance of sequence-based AFP (SAFP: sequences are the only input). Furthermore, homology-based SAFP tools are competitive in AFP competitions, while they do not necessarily work well for so-called difficult proteins, which have <60% sequence identity to proteins with annotations already. Thus the vital and challenging problem now is how to develop a method for SAFP, particularly for difficult proteins. The key of this method is to extract not only homology information but also diverse, deep-rooted information/evidence from sequence inputs and integrate them into a predictor in an effective and efficient manner. We propose GOLabeler, which integrates five component classifiers, trained from different features, including GO term frequency, sequence alignment, amino acid trigram, domains and motifs, and biophysical properties, etc., in the framework of learning to rank (LTR), a paradigm of machine learning, especially powerful for multilabel classification. The empirical results obtained by examining GOLabeler extensively and thoroughly by using large-scale datasets revealed numerous favorable aspects of GOLabeler, including significant performance advantage over state-of-the-art AFP methods. Availability: http://datamining-iip.fudan.edu.cn/golabeler. Contact: zhusf@fudan.edu.cn. Supplementary data are available at Bioinformatics online.

  5. Investigating Correlation between Protein Sequence Similarity and Semantic Similarity Using Gene Ontology Annotations.

    PubMed

    Ikram, Najmul; Qadir, Muhammad Abdul; Afzal, Muhammad Tanvir

    2018-01-01

    Sequence similarity is a commonly used measure to compare proteins. With the increasing use of ontologies, semantic (function) similarity is gaining importance. The correlation between these measures has been applied in the evaluation of new semantic similarity methods, and in protein function prediction. In this research, we investigate the relationship between the two similarity methods. The results suggest the absence of a strong correlation between sequence and semantic similarities. There is a large number of proteins with low sequence similarity and high semantic similarity. We observe that Pearson's correlation coefficient is not sufficient to explain the nature of this relationship. Interestingly, the term semantic similarity values above 0 and below 1 do not seem to play a role in improving the correlation. That is, the correlation coefficient depends only on the number of common GO terms in proteins under comparison, and the semantic similarity measurement method does not influence it. Semantic similarity and sequence similarity behave distinctly. These findings have significant implications for future work on protein comparison, and will help in better understanding the semantic similarity between proteins.

  6. Determination of the sequences of protein-derived peptides and peptide mixtures by mass spectrometry

    PubMed Central

    Morris, Howard R.; Williams, Dudley H.; Ambler, Richard P.

    1971-01-01

    Micro-quantities of protein-derived peptides have been converted into N-acetylated permethyl derivatives, and their sequences determined by low-resolution mass spectrometry without prior knowledge of their amino acid compositions or lengths. A new strategy is suggested for the mass spectrometric sequencing of oligopeptides or proteins, involving gel filtration of protein hydrolysates and subsequent sequence analysis of peptide mixtures. Finally, results are given that demonstrate for the first time the use of mass spectrometry for the analysis of a protein-derived peptide mixture, again without prior knowledge of the protein or components within the mixture. PMID:5158904

  7. Protein sequences clustering of herpes virus by using Tribe Markov clustering (Tribe-MCL)

    NASA Astrophysics Data System (ADS)

    Bustamam, A.; Siswantining, T.; Febriyani, N. L.; Novitasari, I. D.; Cahyaningrum, R. D.

    2017-07-01

    The herpes virus is found worldwide, and one of its important characteristics is its ability to cause acute and chronic infection at certain times, so that infection can lead to severe complications. The herpes virus is composed of DNA containing protein and wrapped by glycoproteins. In this work, the herpes virus family is classified and analyzed by clustering its protein sequences using the Tribe Markov Clustering (Tribe-MCL) algorithm. Tribe-MCL is an efficient clustering method, based on the theory of Markov chains, that classifies protein families from protein sequences using pre-computed sequence similarity information. We implement the Tribe-MCL algorithm in R, an open-source program. We select 24 herpes virus protein sequences obtained from the NCBI database. The dataset consists of three types of glycoprotein: B, F, and H. Each type includes eight herpes viruses that infect humans. In our simulations with inflation factors r = 1.5, 2, and 3, we obtain different numbers of clusters: the greater the inflation factor, the greater the number of clusters. Each protein is grouped together with proteins of the same type.
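    The core Markov clustering loop that Tribe-MCL applies to a sequence similarity matrix alternates expansion (matrix squaring) and inflation (element-wise power followed by column normalization). The sketch below is a plain-Python illustration of that loop, not the authors' R implementation: the function names, the self-loop weight of 1.0, the convergence tolerance, and the toy similarity matrix are all assumptions, and the step that computes pairwise similarities from real sequences is omitted.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def expand(m):
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation step: raise entries to the power r, then renormalize columns."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(similarity, inflation=2.0, max_iter=50, tol=1e-9):
    """Cluster nodes of a symmetric, non-negative similarity matrix."""
    n = len(similarity)
    # Assumption: add self-loops of weight 1.0 before normalizing.
    m = normalize_columns([[similarity[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        new = inflate(expand(m), inflation)
        delta = max(abs(new[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = new
        if delta < tol:
            break
    # Each non-empty row lists the nodes attracted to that row's node.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6) for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)


# Toy matrix: two groups of three mutually similar sequences, no cross-similarity.
toy = [[0, 1, 1, 0, 0, 0],
       [1, 0, 1, 0, 0, 0],
       [1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 1],
       [0, 0, 0, 1, 0, 1],
       [0, 0, 0, 1, 1, 0]]
print(mcl(toy, inflation=2.0))  # [[0, 1, 2], [3, 4, 5]]
```

    Larger inflation values sharpen the columns faster and tend to split the graph into more, smaller clusters, which matches the trend reported in the abstract for r = 1.5, 2, and 3.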

  8. Ubiquitin promoter-terminator cassette promotes genetically stable expression of the taste-modifying protein miraculin in transgenic lettuce.

    PubMed

    Hirai, Tadayoshi; Shohael, Abdullah Mohammad; Kim, You-Wang; Yano, Megumu; Ezura, Hiroshi

    2011-12-01

    Lettuce is a commercially important leafy vegetable that is cultivated worldwide, and it is also a target crop for plant factories. In this study, lettuce was selected as an alternative platform for recombinant miraculin production because of its fast growth, agronomic value, and wide availability. The taste-modifying protein miraculin is a glycoprotein extracted from the red berries of the West African native shrub Richadella dulcifica. Because of its limited natural availability, many attempts have been made to produce this protein in suitable alternative hosts. We produced transgenic lettuce with miraculin gene driven either by the ubiquitin promoter/terminator cassette from lettuce or a 35S promoter/nos terminator cassette. Miraculin gene expression and miraculin accumulation in both cassettes were compared by quantitative real-time PCR analysis, Western blotting, and enzyme-linked immunosorbent assay. The expression level of the miraculin gene and protein in transgenic lettuce was higher and more genetically stable in the ubiquitin promoter/terminator cassette than in the 35S promoter/nos terminator cassette. These results demonstrated that the ubiquitin promoter/terminator cassette is an efficient platform for the genetically stable expression of the miraculin protein in lettuce and hence this platform is of benefit for recombinant miraculin production on a commercial scale.

  9. Prediction of the aggregation propensity of proteins from the primary sequence: aggregation properties of proteomes.

    PubMed

    Castillo, Virginia; Graña-Montes, Ricardo; Sabate, Raimon; Ventura, Salvador

    2011-06-01

    In the cell, protein folding into stable globular conformations is in competition with aggregation into non-functional and usually toxic structures, since the biophysical properties that promote folding also tend to favor intermolecular contacts, leading to the formation of β-sheet-enriched insoluble assemblies. The formation of protein deposits is linked to at least 20 different human disorders, ranging from dementia to diabetes. Furthermore, protein deposition inside cells represents a major obstacle for the biotechnological production of polypeptides. Importantly, the aggregation behavior of polypeptides appears to be strongly influenced by the intrinsic properties encoded in their sequences and specifically by the presence of selective short regions with high aggregation propensity. This allows computational methods to be used to analyze the aggregation properties of proteins without the previous requirement for structural information. Applications range from the identification of individual amyloidogenic regions in disease-linked polypeptides to the analysis of the aggregation properties of complete proteomes. Herein, we review these theoretical approaches and illustrate how they have become important and useful tools in understanding the molecular mechanisms underlying protein aggregation. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  10. Application of 2D graphic representation of protein sequence based on Huffman tree method.

    PubMed

    Qi, Zhao-Hui; Feng, Jun; Qi, Xiao-Qin; Li, Ling

    2012-05-01

    Based on the Huffman tree method, we propose a new 2D graphic representation of protein sequences. This representation completely avoids loss of information in the transfer of data from a protein sequence to its graphic representation. The method consists of two parts. The first assigns 0-1 codes to the 20 amino acids using a Huffman tree built from amino acid frequencies, where the frequency of an amino acid is defined as its count in the analyzed protein sequences. The second constructs the 2D graphic representation of a protein sequence from these 0-1 codes. Applications of the method to ten ND5 genes and seven Escherichia coli strains are then presented in detail. The results show that the proposed model may provide some new insights into the evolution patterns determined from protein sequences and complete genomes. Copyright © 2012 Elsevier Ltd. All rights reserved.
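
    The first part of the method, deriving 0-1 codes from amino acid counts, can be illustrated with a standard Huffman construction. The sketch below is only an illustration under stated assumptions: the function names and the tie-breaking rule are hypothetical, and the mapping from codes to 2D coordinates used in the paper is not reproduced because the abstract does not describe it.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(sequences):
    """Build prefix-free 0-1 codes from residue counts in the given sequences."""
    freq = Counter("".join(sequences))
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(n, next(tie), {aa: ""}) for aa, n in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: only one residue type present
        return {aa: "0" for aa in heap[0][2]}
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        # Prepend one bit to every code in each merged subtree.
        merged = {aa: "0" + code for aa, code in left.items()}
        merged.update({aa: "1" + code for aa, code in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]


def encode(seq, codes):
    return "".join(codes[aa] for aa in seq)


def decode(bits, codes):
    """Invert encode(); works because Huffman codes are prefix-free."""
    inverse = {code: aa for aa, code in codes.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return "".join(out)
```

    Because the code is prefix-free, the bit string decodes back to the original sequence, which is the sense in which a representation built on such codes loses no information.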

  11. Knowledge-based computational intelligence development for predicting protein secondary structures from sequences.

    PubMed

    Shen, Hong-Bin; Yi, Dong-Liang; Yao, Li-Xiu; Yang, Jie; Chou, Kuo-Chen

    2008-10-01

    In the postgenomic age, with the avalanche of protein sequences generated and relatively slow progress in determining their structures by experiments, it is important to develop automated methods to predict the structure of a protein from its sequence. The membrane proteins are a special group in the protein family that accounts for approximately 30% of all proteins; however, solved membrane protein structures only represent less than 1% of known protein structures to date. Although a great success has been achieved for developing computational intelligence techniques to predict secondary structures in both globular and membrane proteins, there is still much challenging work in this regard. In this review article, we firstly summarize the recent progress of automation methodology development in predicting protein secondary structures, especially in membrane proteins; we will then give some future directions in this research field.

  12. Molecular cloning and functional characterization of the promoter region of the human uncoupling protein-2 gene.

    PubMed

    Tu, N; Chen, H; Winnikes, U; Reinert, I; Marmann, G; Pirke, K M; Lentes, K U

    1999-11-19

    As a member of the uncoupling protein family, UCP2 is ubiquitously expressed in rodents and humans, implicating a major role in thermogenesis. To analyze promoter function and regulatory motifs involved in the transcriptional regulation of UCP2 gene expression, 3.3 kb of the 5'-flanking region of the human UCP2 (hUCP2) gene was cloned. Sequence analysis showed that the promoter region of hUCP2 lacks a classical TATA or CAAT box but is GC-rich, resulting in the presence of several Sp-1 motifs and Ap-1/-2 binding sites near the transcription initiation site. Functional characterization of human UCP2 promoter-CAT fusion constructs in transient expression assays showed that minimal promoter activity was observed within 65 bp upstream of the transcriptional start site (+1). A strong cis-acting regulatory element (or enhancer), which significantly enhanced basal promoter activity, was identified 75 bp further upstream (from nt -141 to -66). The regulation of human UCP2 gene expression involves complex interactions among positive and negative regulatory elements distributed over a minimum of 3.3 kb of the promoter region. Copyright 1999 Academic Press.
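
    The kind of sequence inspection described here (GC-richness and motif positions relative to the +1 start site) can be sketched in a few lines. This is a hedged illustration, not the authors' analysis: the GGGCGG core commonly cited for GC boxes is an assumption (the abstract gives no motif), and the function names and the toy sequence in the test are hypothetical.

```python
import re

# Assumed GC-box core (GGGCGG) and its reverse complement; lookahead allows overlaps.
GC_BOX = re.compile(r"(?=(GGGCGG|CCGCCC))")


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def to_tss_coordinate(index, tss_index):
    """Convert a 0-based index to promoter numbering, where the start site
    is +1 and the base just upstream is -1 (there is no position 0)."""
    offset = index - tss_index
    return offset + 1 if offset >= 0 else offset


def find_gc_boxes(seq, tss_index):
    """List (position relative to +1, matched motif) for each GC-box hit."""
    seq = seq.upper()
    return [(to_tss_coordinate(m.start(), tss_index), m.group(1))
            for m in GC_BOX.finditer(seq)]
```

    The coordinate helper follows the convention used in the abstract, where positions such as -141 and -66 count upstream from the transcriptional start site at +1.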

  13. The plant G box promoter sequence activates transcription in Saccharomyces cerevisiae and is bound in vitro by a yeast activity similar to GBF, the plant G box binding factor.

    PubMed Central

    Donald, R G; Schindler, U; Batschauer, A; Cashmore, A R

    1990-01-01

    G box and I box sequences of the Arabidopsis thaliana ribulose-bisphosphate-1,5-carboxylase small subunit (RBCS) promoter are required for expression mediated by the Arabidopsis rbcS-1A promoter in transgenic tobacco plants and are bound in vitro by factors from plant nuclear extracts termed GBF and GA-1, respectively. We show here that a -390 to -60 rbcS-1A promoter fragment containing the G box and two I boxes activates transcription from a truncated iso-1-cytochrome c (CYC1) gene promoter in Saccharomyces cerevisiae. Mutagenesis of either the rbcS-1A G box or both I box sequences eliminated the expression mediated by this fragment. When polymerized, I box oligonucleotides were also capable of enhancing expression from the truncated CYC1 promoter. Single-copy G box sequences from the Arabidopsis rbcS-1A, Arabidopsis Adh and tomato rbcS-3A promoters were more potent activators and were used in mobility shift assays to identify a DNA binding activity in yeast functionally similar to GBF. In methylation interference experiments, the binding specificity of the yeast protein was indistinguishable from that obtained with plant nuclear extracts. PMID:2161333

  14. Exploring the sequence-structure protein landscape in the glycosyltransferase family

    PubMed Central

    Zhang, Ziding; Kochhar, Sunil; Grigorov, Martin

    2003-01-01

    To understand the molecular basis of glycosyltransferases’ (GTFs) catalytic mechanism, extensive structural information is required. Here, fold recognition methods were employed to assign 3D protein shapes (folds) to the currently known GTF sequences, available in public databases such as GenBank and Swissprot. First, GTF sequences were retrieved and classified into clusters, based on sequence similarity only. Intracluster sequence similarity was chosen sufficiently high to ensure that the same fold is found within a given cluster. Then, a representative sequence from each cluster was selected to compose a subset of GTF sequences. The members of this reduced set were processed by three different fold recognition methods: 3D-PSSM, FUGUE, and GeneFold. Finally, the results from different fold recognition methods were analyzed and compared to sequence-similarity search methods (i.e., BLAST and PSI-BLAST). It was established that the folds of about 70% of all currently known GTF sequences can be confidently assigned by fold recognition methods, a value which is higher than the fold identification rate based on sequence comparison alone (48% for BLAST and 64% for PSI-BLAST). The identified folds were submitted to 3D clustering, and we found that most of the GTF sequences adopt the typical GTF A or GTF B folds. Our results indicate a lack of evidence that new GTF folds (i.e., folds other than GTF A and B) exist. Based on cases where fold identification was not possible, we suggest several sequences as the most promising targets for a structural genomics initiative focused on the GTF protein family. PMID:14500887
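
    The cluster-then-pick-a-representative step described above can be sketched with a simple greedy scheme. This is a minimal sketch under stated assumptions, not the authors' procedure: the similarity measure (the ratio from Python's difflib.SequenceMatcher) and the 0.8 threshold are stand-ins for a real alignment-based similarity, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Stand-in similarity in [0, 1]; a real pipeline would use alignment identity.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_clusters(sequences, threshold=0.8):
    """Assign each sequence to the first representative it resembles;
    a sequence that resembles no representative starts a new cluster."""
    clusters = {}  # representative sequence -> list of cluster members
    for seq in sequences:
        for rep in clusters:
            if similarity(rep, seq) >= threshold:
                clusters[rep].append(seq)
                break
        else:
            clusters[seq] = [seq]
    return clusters
```

    The keys of the returned dictionary form the reduced set that would then be passed to the more expensive fold recognition step.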

  15. Sequence composition and environment effects on residue fluctuations in protein structures

    NASA Astrophysics Data System (ADS)

    Ruvinsky, Anatoly M.; Vakser, Ilya A.

    2010-10-01

    Structure fluctuations in proteins affect a broad range of cell phenomena, including stability of proteins and their fragments, allosteric transitions, and energy transfer. This study presents a statistical-thermodynamic analysis of relationship between the sequence composition and the distribution of residue fluctuations in protein-protein complexes. A one-node-per-residue elastic network model accounting for the nonhomogeneous protein mass distribution and the interatomic interactions through the renormalized inter-residue potential is developed. Two factors, a protein mass distribution and a residue environment, were found to determine the scale of residue fluctuations. Surface residues undergo larger fluctuations than core residues in agreement with experimental observations. Ranking residues over the normalized scale of fluctuations yields a distinct classification of amino acids into three groups: (i) highly fluctuating-Gly, Ala, Ser, Pro, and Asp, (ii) moderately fluctuating-Thr, Asn, Gln, Lys, Glu, Arg, Val, and Cys, and (iii) weakly fluctuating-Ile, Leu, Met, Phe, Tyr, Trp, and His. The structural instability in proteins possibly relates to the high content of the highly fluctuating residues and a deficiency of the weakly fluctuating residues in irregular secondary structure elements (loops), chameleon sequences, and disordered proteins. Strong correlation between residue fluctuations and the sequence composition of protein loops supports this hypothesis. Comparing fluctuations of binding site residues (interface residues) with other surface residues shows that, on average, the interface is more rigid than the rest of the protein surface and Gly, Ala, Ser, Cys, Leu, and Trp have a propensity to form more stable docking patches on the interface. The findings have broad implications for understanding mechanisms of protein association and stability of protein structures.
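
    The three-group classification reported above lends itself to a simple composition check. The sketch below encodes the groups exactly as listed in the abstract and reports what fraction of a sequence falls in each; the function name and the use of plain fractions as a summary are assumptions made for illustration, not part of the published analysis.

```python
# Residue groups as listed in the abstract (one-letter codes).
GROUPS = {
    "high": "GASPD",         # Gly, Ala, Ser, Pro, Asp
    "moderate": "TNQKERVC",  # Thr, Asn, Gln, Lys, Glu, Arg, Val, Cys
    "weak": "ILMFYWH",       # Ile, Leu, Met, Phe, Tyr, Trp, His
}
FLUCTUATION_GROUP = {aa: name for name, residues in GROUPS.items() for aa in residues}


def group_fractions(seq):
    """Fraction of recognized residues in each fluctuation group."""
    counts = {name: 0 for name in GROUPS}
    for aa in seq.upper():
        group = FLUCTUATION_GROUP.get(aa)
        if group is not None:  # skip gaps and non-standard symbols
            counts[group] += 1
    total = sum(counts.values())
    return {name: (n / total if total else 0.0) for name, n in counts.items()}
```

    A segment with a high "high" fraction and a low "weak" fraction would match the composition that the abstract associates with loops and disordered regions.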

  16. Sequence Determinants of Compaction in Intrinsically Disordered Proteins

    PubMed Central

    Marsh, Joseph A.; Forman-Kay, Julie D.

    2010-01-01

    Intrinsically disordered proteins (IDPs), which lack folded structure and are disordered under nondenaturing conditions, have been shown to perform important functions in a large number of cellular processes. These proteins have interesting structural properties that deviate from the random-coil-like behavior exhibited by chemically denatured proteins. In particular, IDPs are often observed to exhibit significant compaction. In this study, we have analyzed the hydrodynamic radii of a number of IDPs to investigate the sequence determinants of this compaction. Net charge and proline content are observed to be strongly correlated with increased hydrodynamic radii, suggesting that these are the dominant contributors to compaction. Hydrophobicity and secondary structure, on the other hand, appear to have negligible effects on compaction, which implies that the determinants of structure in folded and intrinsically disordered proteins are profoundly different. Finally, we observe that polyhistidine tags seem to increase IDP compaction, which suggests that these tags have significant perturbing effects and thus should be removed before any structural characterizations of IDPs. Using the relationships observed in this analysis, we have developed a sequence-based predictor of hydrodynamic radius for IDPs that shows substantial improvement over a simple model based upon chain length alone. PMID:20483348
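
    The sequence features that the study relates to hydrodynamic radius (chain length, net charge, proline content) are straightforward to compute. The sketch below extracts them only; it does not reproduce the published predictor, whose functional form and coefficients are not given in the abstract. The charge assignment (Lys and Arg as +1, Asp and Glu as -1, His treated as neutral) and the function name are assumptions.

```python
def idp_features(seq):
    """Simple sequence features for an intrinsically disordered protein."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    positive = seq.count("K") + seq.count("R")  # assumed +1 each
    negative = seq.count("D") + seq.count("E")  # assumed -1 each
    net = positive - negative
    return {
        "length": n,
        "net_charge": net,
        "abs_net_charge_per_residue": abs(net) / n,
        "proline_fraction": seq.count("P") / n,
    }
```

    Any polyhistidine tag would need to be stripped from the sequence before computing these features, in line with the abstract's remark that such tags perturb compaction.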

  17. PROFESS: a PROtein Function, Evolution, Structure and Sequence database

    PubMed Central

    Triplet, Thomas; Shortridge, Matthew D.; Griep, Mark A.; Stark, Jaime L.; Powers, Robert; Revesz, Peter

    2010-01-01

    The proliferation of biological databases and the easy access enabled by the Internet is having a beneficial impact on biological sciences and transforming the way research is conducted. There are ∼1100 molecular biology databases dispersed throughout the Internet. To assist in the functional, structural and evolutionary analysis of the abundant number of novel proteins continually identified from whole-genome sequencing, we introduce the PROFESS (PROtein Function, Evolution, Structure and Sequence) database. Our database is designed to be versatile and expandable and will not confine analysis to a pre-existing set of data relationships. A fundamental component of this approach is the development of an intuitive query system that incorporates a variety of similarity functions capable of generating data relationships not conceived during the creation of the database. The utility of PROFESS is demonstrated by the analysis of the structural drift of homologous proteins and the identification of potential pancreatic cancer therapeutic targets based on the observation of protein–protein interaction networks. Database URL: http://cse.unl.edu/~profess/ PMID:20624718

  18. Hsp90 shapes protein and RNA evolution to balance trade-offs between protein stability and aggregation.

    PubMed

    Geller, Ron; Pechmann, Sebastian; Acevedo, Ashley; Andino, Raul; Frydman, Judith

    2018-05-03

    Acquisition of mutations is central to evolution; however, the detrimental effects of most mutations on protein folding and stability limit protein evolvability. Molecular chaperones, which suppress aggregation and facilitate polypeptide folding, may alleviate the effects of destabilizing mutations thus promoting sequence diversification. To illuminate how chaperones can influence protein evolution, we examined the effect of reduced activity of the chaperone Hsp90 on poliovirus evolution. We find that Hsp90 offsets evolutionary trade-offs between protein stability and aggregation. Lower chaperone levels favor variants of reduced hydrophobicity and protein aggregation propensity but at a cost to protein stability. Notably, reducing Hsp90 activity also promotes clusters of codon-deoptimized synonymous mutations at inter-domain boundaries, likely to facilitate cotranslational domain folding. Our results reveal how a chaperone can shape the sequence landscape at both the protein and RNA levels to harmonize competing constraints posed by protein stability, aggregation propensity, and translation rate on successful protein biogenesis.

  19. Regulation of the yeast RAD2 gene: DNA damage-dependent induction correlates with protein binding to regulatory sequences and their deletion influences survival.

    PubMed

    Siede, W; Friedberg, E C

    1992-03-01

    In the yeast Saccharomyces cerevisiae the RAD2 gene is absolutely required for damage-specific incision of DNA during nucleotide excision repair and is inducible by DNA-damaging agents. In the present study we correlated sensitivity to killing by DNA-damaging agents with the deletion of previously defined specific promoter elements. Deletion of the element DRE2 increased the UV sensitivity of cells in both the G1/early S and S/G2 phases of the cell cycle as well as in stationary phase. On the other hand, increased UV sensitivity associated with deletion of the sequence-related element DRE1 was restricted to cells irradiated in G1/S. Specific binding of protein(s) to the promoter elements DRE1 and DRE2 was observed under non-inducing conditions using gel retardation assays. Exposure of cells to DNA-damaging agents resulted in increased protein binding that was dependent on de novo protein synthesis.

  20. Genomic sequencing and in vivo footprinting of an expression-specific DNase I-hypersensitive site of avian vitellogenin II promoter reveal a demethylation of a mCpG and a change in specific interactions of proteins with DNA.

    PubMed Central

    Saluz, H P; Feavers, I M; Jiricny, J; Jost, J P

    1988-01-01

    Genomic sequencing was used to study the in vivo methylation pattern of two CpG sites in the promoter region of the avian vitellogenin gene. The CpG at position +10 was fully methylated in DNA isolated from tissues that do not express the gene but was unmethylated in the liver of mature hens and estradiol-treated roosters. In the latter tissue, this site became demethylated and DNase I hypersensitive after estradiol treatment. A second CpG (position -52) was unmethylated in all tissues examined. In vivo genomic footprinting with dimethyl sulfate revealed different patterns of DNA protection in silent and expressed genes. In rooster liver cells, at least 10 base pairs of DNA, including the methylated CpG, were protected by protein(s). Gel-shift assays indicated that a protein factor, present in rooster liver nuclear extract, bound at this site only when it was methylated. In hen liver cells, the same unmethylated CpG lies within a protected region of approximately 20 base pairs. In vitro DNase I protection and gel-shift assays indicate that this sequence is bound by a protein, which binds both double- and single-stranded DNA. For the latter substrate, this factor was shown to bind solely the noncoding (i.e., mRNA-like) strand. PMID:3413118

  1. Human T-cell leukemia virus type 1 Tax requires direct access to DNA for recruitment of CREB binding protein to the viral promoter.

    PubMed

    Lenzmeier, B A; Giebler, H A; Nyborg, J K

    1998-02-01

    Efficient human T-cell leukemia virus type 1 (HTLV-1) replication and viral gene expression are dependent upon the virally encoded oncoprotein Tax. To activate HTLV-1 transcription, Tax interacts with the cellular DNA binding protein cyclic AMP-responsive element binding protein (CREB) and recruits the coactivator CREB binding protein (CBP), forming a nucleoprotein complex on the three viral cyclic AMP-responsive elements (CREs) in the HTLV-1 promoter. Short stretches of dG-dC-rich (GC-rich) DNA, immediately flanking each of the viral CREs, are essential for Tax recruitment of CBP in vitro and Tax transactivation in vivo. Although the importance of the viral CRE-flanking sequences is well established, several studies have failed to identify an interaction between Tax and the DNA. The mechanistic role of the viral CRE-flanking sequences has therefore remained enigmatic. In this study, we used high resolution methidiumpropyl-EDTA iron(II) footprinting to show that Tax extended the CREB footprint into the GC-rich DNA flanking sequences of the viral CRE. The Tax-CREB footprint was enhanced but not extended by the KIX domain of CBP, suggesting that the coactivator increased the stability of the nucleoprotein complex. Conversely, the footprint pattern of CREB on a cellular CRE lacking GC-rich flanking sequences did not change in the presence of Tax or Tax plus KIX. The minor-groove DNA binding drug chromomycin A3 bound to the GC-rich flanking sequences and inhibited the association of Tax and the Tax-CBP complex without affecting CREB binding. Tax specifically cross-linked to the viral CRE in the 5'-flanking sequence, and this cross-link was blocked by chromomycin A3. Together, these data support a model where Tax interacts directly with both CREB and the minor-groove viral CRE-flanking sequences to form a high-affinity binding site for the recruitment of CBP to the HTLV-1 promoter.

  2. Integrated analysis of RNA-binding protein complexes using in vitro selection and high-throughput sequencing and sequence specificity landscapes (SEQRS).

    PubMed

    Lou, Tzu-Fang; Weidmann, Chase A; Killingsworth, Jordan; Tanaka Hall, Traci M; Goldstrohm, Aaron C; Campbell, Zachary T

    2017-04-15

    RNA-binding proteins (RBPs) collaborate to control virtually every aspect of RNA function. Tremendous progress has been made in the area of global assessment of RBP specificity using next-generation sequencing approaches both in vivo and in vitro. Understanding how protein-protein interactions enable precise combinatorial regulation of RNA remains a significant problem. Addressing this challenge requires tools that can quantitatively determine the specificities of both individual proteins and multimeric complexes in an unbiased and comprehensive way. One approach utilizes in vitro selection, high-throughput sequencing, and sequence-specificity landscapes (SEQRS). We outline a SEQRS experiment focused on obtaining the specificity of a multi-protein complex between Drosophila RBPs Pumilio (Pum) and Nanos (Nos). We discuss the necessary controls in this type of experiment and examine how the resulting data can be complemented with structural and cell-based reporter assays. Additionally, SEQRS data can be integrated with functional genomics data to uncover biological function. Finally, we propose extensions of the technique that will enhance our understanding of multi-protein regulatory complexes assembled onto RNA. Copyright © 2016 Elsevier Inc. All rights reserved.

  3. Distinct Sequence Elements of Cyclin B1 Promote Localization to Chromatin, Centrosomes, and Kinetochores during Mitosis

    PubMed Central

    Bentley, Anna M.; Normand, Guillaume; Hoyt, Jonathan

    2007-01-01

    The mitotic cyclins promote cell division by binding and activating cyclin-dependent kinases (CDKs). Each cyclin has a unique pattern of subcellular localization that plays a vital role in regulating cell division. During mitosis, cyclin B1 is known to localize to centrosomes, microtubules, and chromatin. To determine the mechanisms of cyclin B1 localization in M phase, we imaged full-length and mutant versions of human cyclin B1-enhanced green fluorescent protein in live cells by using spinning disk confocal microscopy. In addition to centrosome, microtubule, and chromatin localization, we found that cyclin B1 also localizes to unattached kinetochores after nuclear envelope breakdown. Kinetochore recruitment of cyclin B1 required the kinetochore proteins Hec1 and Mad2, and it was stimulated by microtubule destabilization. Mutagenesis studies revealed that cyclin B1 is recruited to kinetochores through both CDK1-dependent and -independent mechanisms. In contrast, localization of cyclin B1 to chromatin and centrosomes is independent of CDK1 binding. The N-terminal domain of cyclin B1 is necessary and sufficient for chromatin association, whereas centrosome recruitment relies on sequences within the cyclin box. Our data support a role for cyclin B1 function at unattached kinetochores, and they demonstrate that separable and distinct sequence elements target cyclin B1 to kinetochores, chromatin, and centrosomes during mitosis. PMID:17881737

  4. Algorithm to find distant repeats in a single protein sequence

    PubMed Central

    Banerjee, Nirjhar; Sarani, Rangarajan; Ranjani, Chellamuthu Vasuki; Sowmiya, Govindaraj; Michael, Daliah; Balakrishnan, Narayanasamy; Sekar, Kanagaraj

    2008-01-01

    Distant repeats in protein sequences play an important role in various aspects of protein analysis. A careful analysis of distant repeats would help establish a firm relation between the repeats and their function and three-dimensional structure during the evolutionary process. Further, it sheds light on the diversity of duplication during evolution. To this end, an algorithm has been developed to find all distant repeats in a protein sequence. Scores from the Point Accepted Mutation (PAM) matrix are used to identify amino acid substitutions while detecting the distant repeats. Due to the biological importance of distant repeats, the proposed algorithm will be of importance to structural biologists, molecular biologists, biochemists and researchers involved in phylogenetic and evolutionary studies. PMID:19052663
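
    The general idea of scoring substitutions while searching for separated repeats can be sketched with a brute-force window comparison. This is explicitly not the published algorithm, which the abstract does not describe: the toy scoring function below (identity +2, same similarity group +1, otherwise -1) merely stands in for real PAM matrix scores, and the residue groups, window length, threshold, and function names are all assumptions.

```python
from itertools import combinations

# Toy similarity groups standing in for a real PAM matrix (assumption).
GROUPS = ["ILVM", "FYW", "KRH", "DE", "STNQ", "AG", "C", "P"]
GROUP_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}


def score(a, b):
    """Toy substitution score: identical +2, same group +1, otherwise -1."""
    if a == b:
        return 2
    ga, gb = GROUP_OF.get(a), GROUP_OF.get(b)
    if ga is not None and ga == gb:
        return 1
    return -1


def find_repeats(seq, window=5, min_score=8, min_gap=0):
    """Return (start1, start2, score) for every pair of non-overlapping
    windows, at least min_gap residues apart, scoring >= min_score."""
    hits = []
    starts = range(len(seq) - window + 1)
    for i, j in combinations(starts, 2):
        if j - i < window + min_gap:
            continue  # windows overlap or are closer than the required gap
        s = sum(score(seq[i + k], seq[j + k]) for k in range(window))
        if s >= min_score:
            hits.append((i, j, s))
    return hits
```

    Allowing conservative substitutions is what lets the search report LKDEF and IKDEF as copies of one repeat even though they are not identical.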

  5. Sequence and structural implications of a bovine corneal keratan sulfate proteoglycan core protein. Protein 37B represents bovine lumican and proteins 37A and 25 are unique

    NASA Technical Reports Server (NTRS)

    Funderburgh, J. L.; Funderburgh, M. L.; Brown, S. J.; Vergnes, J. P.; Hassell, J. R.; Mann, M. M.; Conrad, G. W.; Spooner, B. S. (Principal Investigator)

    1993-01-01

    Amino acid sequence from tryptic peptides of three different bovine corneal keratan sulfate proteoglycan (KSPG) core proteins (designated 37A, 37B, and 25) showed similarities to the sequence of a chicken KSPG core protein lumican. Bovine lumican cDNA was isolated from a bovine corneal expression library by screening with chicken lumican cDNA. The bovine cDNA codes for a 342-amino acid protein, M(r) 38,712, containing amino acid sequences identified in the 37B KSPG core protein. The bovine lumican is 68% identical to chicken lumican, with an 83% identity excluding the N-terminal 40 amino acids. The locations of 6 cysteines and 4 consensus N-glycosylation sites in the bovine sequence were identical to those in chicken lumican. Bovine lumican had about 50% identity to bovine fibromodulin and 20% identity to bovine decorin and biglycan. About two-thirds of the lumican protein consists of a series of 10 amino acid leucine-rich repeats that occur in regions of calculated high beta-hydrophobic moment, suggesting that the leucine-rich repeats contribute to beta-sheet formation in these proteins. Sequences obtained from 37A and 25 core proteins were absent in bovine lumican, thus predicting a unique primary structure and separate mRNA for each of the three bovine KSPG core proteins.

  6. Comparative characterization of random-sequence proteins consisting of 5, 12, and 20 kinds of amino acids

    PubMed Central

    Tanaka, Junko; Doi, Nobuhide; Takashima, Hideaki; Yanagawa, Hiroshi

    2010-01-01

    Screening of functional proteins from a random-sequence library has been used to evolve novel proteins in the field of evolutionary protein engineering. However, random-sequence proteins consisting of the 20 natural amino acids tend to aggregate, and the occurrence rate of functional proteins in a random-sequence library is low. From the viewpoint of the origin of life, it has been proposed that primordial proteins consisted of a limited set of amino acids that could have been abundantly formed early during chemical evolution. We have previously found that members of a random-sequence protein library constructed with five primitive amino acids show high solubility (Doi et al., Protein Eng Des Sel 2005;18:279–284). Although such a library is expected to be appropriate for finding functional proteins, the functionality may be limited, because they have no positively charged amino acid. Here, we constructed three libraries of 120-amino acid, random-sequence proteins using alphabets of 5, 12, and 20 amino acids by preselection using mRNA display (to eliminate sequences containing stop codons and frameshifts) and characterized and compared the structural properties of random-sequence proteins arbitrarily chosen from these libraries. We found that random-sequence proteins constructed with the 12-member alphabet (including five primitive amino acids and positively charged amino acids) have higher solubility than those constructed with the 20-member alphabet, though other biophysical properties are very similar in the two libraries. Thus, a library of moderate complexity constructed from 12 amino acids may be a more appropriate resource for functional screening than one constructed from 20 amino acids. PMID:20162614

  7. Collagen peptide-based biomaterials for protein delivery and peptide-promoted self-assembly of gold nanoparticles

    NASA Astrophysics Data System (ADS)

    Ernenwein, Dawn M.

    2011-12-01

    Bottom-up self-assembly of peptides has driven the research progress for the following two projects: protein delivery vehicles of collagen microflorettes and the assembly of gold nanoparticles with coiled-coil peptides. Collagen is the most abundant protein in mammals, yet because of immunogenic responses, batch-to-batch variability and lack of sequence modifications, synthetic collagen has been designed to self-assemble into native collagen-like structures. In this research in particular, metal binding ligands were incorporated on the termini of collagen-like peptides to generate micron-sized particles, microflorettes. The over-arching goal of the first research project is to engineer MRI-active microflorettes, loaded with His-tagged growth factors with differential release rates while bound to stem cells, that can be implemented toward regenerative cell-based therapies. His-tagged proteins, such as green fluorescent protein, have successfully been incorporated on the surface and throughout the microflorettes. Protein release was monitored under physiological conditions and was related to particle degradation. In human plasma, full release was obtained within six days. Stability of the microflorettes under physiological conditions was also examined for the development of a therapeutically relevant delivery agent. Additionally, MRI-active microflorettes have been generated through the incorporation of a gadolinium binding ligand, DOTA, within the collagen-based peptide sequence. To probe peptide-promoted self-assemblies of gold nanoparticles (GNPs) by non-covalent, charge complementary interactions, a highly anionic coiled-coil peptide was designed and synthesized. Upon formation of peptide-GNP interactions, the hydrophobic domain of the coiled-coil was shown to promote the self-assembly of peptide-GNP clusters. Hydrophobic forces were found to play an important role in the assembly process, as a peptide with an equally overall negative charge, but lacking an

  8. Prediction of the translocon-mediated membrane insertion free energies of protein sequences.

    PubMed

    Park, Yungki; Helms, Volkhard

    2008-05-15

    Helical membrane proteins (HMPs) play crucial roles in a variety of cellular processes. Unlike water-soluble proteins, HMPs need not only to fold but also get inserted into the membrane to be fully functional. This process of membrane insertion is mediated by the translocon complex. Thus, it is of great interest to develop computational methods for predicting the translocon-mediated membrane insertion free energies of protein sequences. We have developed Membrane Insertion (MINS), a novel sequence-based computational method for predicting the membrane insertion free energies of protein sequences. A benchmark test gives a correlation coefficient of 0.74 between predicted and observed free energies for 357 known cases, which corresponds to a mean unsigned error of 0.41 kcal/mol. These results are significantly better than those obtained by traditional hydropathy analysis. Moreover, the ability of MINS to reasonably predict membrane insertion free energies of protein sequences allows for effective identification of transmembrane (TM) segments. Subsequently, MINS was applied to predict the membrane insertion free energies of 316 TM segments found in known structures. An in-depth analysis of the predicted free energies reveals a number of interesting findings about the biogenesis and structural stability of HMPs. A web server for MINS is available at http://service.bioinformatik.uni-saarland.de/mins
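
    For context, the baseline that MINS is compared against and the two reported metrics can both be written down briefly. The sketch below is not MINS itself. It shows a sliding-window hydropathy profile using the Kyte-Doolittle scale, which is one common form of traditional hydropathy analysis (the abstract does not say which scale was used, so this choice and the 19-residue default window are assumptions), together with the correlation coefficient and mean unsigned error used to compare predicted and observed free energies.

```python
import math

# Kyte-Doolittle hydropathy scale (higher values are more hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    return [sum(KD[aa] for aa in seq[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mean_unsigned_error(predicted, observed):
    """Average absolute difference, in the units of the inputs (e.g. kcal/mol)."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)
```

    Windows with a high mean hydropathy are candidate transmembrane segments under the traditional approach; the abstract reports that MINS outperforms this kind of analysis.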

  9. A 3D sequence-independent representation of the protein data bank.

    PubMed

    Fischer, D; Tsai, C J; Nussinov, R; Wolfson, H

    1995-10-01

    Here we address the following questions. How many structurally different entries are there in the Protein Data Bank (PDB)? How do the proteins populate the structural universe? To investigate these questions a structurally non-redundant set of representative entries was selected from the PDB. Construction of such a dataset is not trivial: (i) the considerable size of the PDB requires a large number of comparisons (there were more than 3250 structures of protein chains available in May 1994); (ii) the PDB is highly redundant, containing many structurally similar entries, not necessarily with significant sequence homology, and (iii) there is no clear-cut definition of structural similarity. The latter depends on the criteria and methods used. Here, we analyze structural similarity ignoring protein topology. To date, representative sets have been selected either by hand, by sequence comparison techniques which ignore the three-dimensional (3D) structures of the proteins or by using sequence comparisons followed by linear structural comparison (i.e. the topology, or the sequential order of the chains, is enforced in the structural comparison). Here we describe a 3D sequence-independent automated and efficient method to obtain a representative set of protein molecules from the PDB which contains all unique structures and which is structurally non-redundant. The method has two novel features. The first is the use of strictly structural criteria in the selection process without taking into account the sequence information. To this end we employ a fast structural comparison algorithm which requires on average approximately 2 s per pairwise comparison on a workstation. The second novel feature is the iterative application of a heuristic clustering algorithm that greatly reduces the number of comparisons required. We obtain a representative set of 220 chains with resolution better than 3.0 Å, or 268 chains including lower resolution entries, NMR entries and models. The

  10. Protein evolution analysis of S-hydroxynitrile lyase by complete sequence design utilizing the INTMSAlign software.

    PubMed

    Nakano, Shogo; Asano, Yasuhisa

    2015-02-03

    Development of software and methods for design of complete sequences of functional proteins could contribute to studies of protein engineering and protein evolution. To this end, we developed the INTMSAlign software, and used it to design functional proteins and evaluate their usefulness. The software could assign both consensus and correlation residues of target proteins. We generated three protein sequences with S-selective hydroxynitrile lyase (S-HNL) activity, which we call designed S-HNLs; these proteins folded as efficiently as the native S-HNL. Sequence and biochemical analysis of the designed S-HNLs suggested that accumulation of neutral mutations occurs during the process of S-HNLs evolution from a low-activity form to a high-activity (native) form. Taken together, our results demonstrate that our software and the associated methods could be applied not only to design of complete sequences, but also to predictions of protein evolution, especially within families such as esterases and S-HNLs.
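
    As a rough illustration of what assigning consensus residues can mean, the sketch below takes the most frequent residue in each column of a toy alignment. It is not INTMSAlign's algorithm: the function name `consensus_sequence`, the gap handling, and the example alignment are all assumptions made for illustration.

```python
from collections import Counter


def consensus_sequence(alignment, gap="-"):
    """Return the most frequent non-gap residue in each alignment column."""
    if not alignment:
        return ""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    consensus = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != gap)
        # A column made only of gaps stays a gap in the consensus.
        consensus.append(counts.most_common(1)[0][0] if counts else gap)
    return "".join(consensus)


# Invented toy alignment (hypothetical, not S-HNL sequences).
toy_alignment = ["MKTAY", "MKSAY", "MRTA-", "MKTGY"]
print(consensus_sequence(toy_alignment))
```

    Ties between equally frequent residues are broken arbitrarily here; a real design workflow would need an explicit rule for them.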

  11. Protein evolution analysis of S-hydroxynitrile lyase by complete sequence design utilizing the INTMSAlign software

    NASA Astrophysics Data System (ADS)

    Nakano, Shogo; Asano, Yasuhisa

    2015-02-01

    Development of software and methods for design of complete sequences of functional proteins could contribute to studies of protein engineering and protein evolution. To this end, we developed the INTMSAlign software, and used it to design functional proteins and evaluate their usefulness. The software could assign both consensus and correlation residues of target proteins. We generated three protein sequences with S-selective hydroxynitrile lyase (S-HNL) activity, which we call designed S-HNLs; these proteins folded as efficiently as the native S-HNL. Sequence and biochemical analysis of the designed S-HNLs suggested that accumulation of neutral mutations occurs during the process of S-HNLs evolution from a low-activity form to a high-activity (native) form. Taken together, our results demonstrate that our software and the associated methods could be applied not only to design of complete sequences, but also to predictions of protein evolution, especially within families such as esterases and S-HNLs.

  12. C-Terminal DxD-Containing Sequences within Paramyxovirus Nucleocapsid Proteins Determine Matrix Protein Compatibility and Can Direct Foreign Proteins into Budding Particles

    PubMed Central

    Ray, Greeshma; Schmitt, Phuong Tieu

    2016-01-01

    Paramyxovirus particles are formed by a budding process coordinated by viral matrix (M) proteins. M proteins coalesce at sites underlying infected cell membranes and induce other viral components, including viral glycoproteins and viral ribonucleoprotein complexes (vRNPs), to assemble at these locations from which particles bud. M proteins interact with the nucleocapsid (NP or N) components of vRNPs, and these interactions enable production of infectious, genome-containing virions. For the paramyxoviruses parainfluenza virus 5 (PIV5) and mumps virus, M-NP interaction also contributes to efficient production of virus-like particles (VLPs) in transfected cells. A DLD sequence near the C-terminal end of PIV5 NP protein was previously found to be necessary for M-NP interaction and efficient VLP production. Here, we demonstrate that 15-residue-long, DLD-containing sequences derived from either the PIV5 or Nipah virus nucleocapsid protein C-terminal ends are sufficient to direct packaging of a foreign protein, Renilla luciferase, into budding VLPs. Mumps virus NP protein harbors DWD in place of the DLD sequence found in PIV5 NP protein, and consequently, PIV5 NP protein is incompatible with mumps virus M protein. A single amino acid change converting DLD to DWD within PIV5 NP protein induced compatibility between these proteins and allowed efficient production of mumps VLPs. Our data suggest a model in which paramyxoviruses share an overall common strategy for directing M-NP interactions but with important variations contained within DLD-like sequences that play key roles in defining M/NP protein compatibilities. IMPORTANCE: Paramyxoviruses are responsible for a wide range of diseases that affect both humans and animals. Paramyxovirus pathogens include measles virus, mumps virus, human respiratory syncytial virus, and the zoonotic paramyxoviruses Nipah virus and Hendra virus. Infectivity of paramyxovirus particles depends on matrix-nucleocapsid protein

  13. C-Terminal DxD-Containing Sequences within Paramyxovirus Nucleocapsid Proteins Determine Matrix Protein Compatibility and Can Direct Foreign Proteins into Budding Particles.

    PubMed

    Ray, Greeshma; Schmitt, Phuong Tieu; Schmitt, Anthony P

    2016-01-20

    Paramyxovirus particles are formed by a budding process coordinated by viral matrix (M) proteins. M proteins coalesce at sites underlying infected cell membranes and induce other viral components, including viral glycoproteins and viral ribonucleoprotein complexes (vRNPs), to assemble at these locations from which particles bud. M proteins interact with the nucleocapsid (NP or N) components of vRNPs, and these interactions enable production of infectious, genome-containing virions. For the paramyxoviruses parainfluenza virus 5 (PIV5) and mumps virus, M-NP interaction also contributes to efficient production of virus-like particles (VLPs) in transfected cells. A DLD sequence near the C-terminal end of PIV5 NP protein was previously found to be necessary for M-NP interaction and efficient VLP production. Here, we demonstrate that 15-residue-long, DLD-containing sequences derived from either the PIV5 or Nipah virus nucleocapsid protein C-terminal ends are sufficient to direct packaging of a foreign protein, Renilla luciferase, into budding VLPs. Mumps virus NP protein harbors DWD in place of the DLD sequence found in PIV5 NP protein, and consequently, PIV5 NP protein is incompatible with mumps virus M protein. A single amino acid change converting DLD to DWD within PIV5 NP protein induced compatibility between these proteins and allowed efficient production of mumps VLPs. Our data suggest a model in which paramyxoviruses share an overall common strategy for directing M-NP interactions but with important variations contained within DLD-like sequences that play key roles in defining M/NP protein compatibilities. Paramyxoviruses are responsible for a wide range of diseases that affect both humans and animals. Paramyxovirus pathogens include measles virus, mumps virus, human respiratory syncytial virus, and the zoonotic paramyxoviruses Nipah virus and Hendra virus. Infectivity of paramyxovirus particles depends on matrix-nucleocapsid protein interactions which enable

  14. Sequence charge decoration dictates coil-globule transition in intrinsically disordered proteins.

    PubMed

    Firman, Taylor; Ghosh, Kingshuk

    2018-03-28

    We present an analytical theory to compute conformations of heteropolymers, applicable to describe disordered proteins, as a function of temperature and charge sequence. The theory describes coil-globule transition for a given protein sequence when temperature is varied and has been benchmarked against the all-atom Monte Carlo simulation (using CAMPARI) of intrinsically disordered proteins (IDPs). In addition, the model quantitatively shows how subtle alterations of charge placement in the primary sequence, while maintaining the same charge composition, can lead to significant changes in conformation, even as drastic as a coil (swelled above a purely random coil) to globule (collapsed below a random coil) and vice versa. The theory provides insights on how to control (enhance or suppress) these changes by tuning the temperature (or solution condition) and charge decoration. As an application, we predict the distribution of conformations (at room temperature) of all naturally occurring IDPs in the DisProt database and notice significant size variation even among IDPs with a similar composition of positive and negative charges. Based on this, we provide a new diagram-of-states delineating the sequence-conformation relation for proteins in the DisProt database. Next, we study the effect of post-translational modification, e.g., phosphorylation, on IDP conformations. Modifications as little as two-site phosphorylation can significantly alter the size of an IDP with everything else being constant (temperature, salt concentration, etc.). However, not all possible modification sites have the same effect on protein conformations; there are certain "hot spots" that can cause maximal change in conformation. The location of these "hot spots" in the parent sequence can readily be identified by using a sequence charge decoration metric originally introduced by Sawle and Ghosh. The ability of our model to predict conformations (both expanded and collapsed states) of IDPs at a high

  15. Promoter analysis of the membrane protein gp64 gene of the cellular slime mold Polysphondylium pallidum.

    PubMed

    Takaoka, N; Fukuzawa, M; Saito, T; Sakaitani, T; Ochiai, H

    1999-10-28

    We cloned a genomic fragment of the membrane protein gp64 gene of the cellular slime mold Polysphondylium pallidum by inverse PCR. Primer extension analysis identified a major transcription start site 65 bp upstream of the translation start codon. The promoter region of the gp64 gene contains sequences homologous to a TATA box at position -47 to -37 and to an initiator (Inr, PyPyCAPyPyPyPy) at position -3 to +5 from the transcription start site. Successively truncated segments of the promoter were tested for their ability to drive expression of the beta-galactosidase reporter gene in transformed cells; also the difference in activity between growth conditions was compared. The results indicated that there are two positive vegetative regulatory elements extending between -187 and -62 bp from the transcription start site of the gp64 promoter; also their activity was two to three times higher in the cells grown with bacteria in shaken suspension than in the cells grown in an axenic medium.
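
    The initiator consensus quoted above (PyPyCAPyPyPyPy, where Py stands for a pyrimidine, C or T) maps directly onto a regular expression; it spans eight bases, matching -3 to +5 under the usual convention that there is no position 0. A minimal sketch, assuming a hypothetical helper `find_inr` and an invented test sequence rather than the actual gp64 promoter:

```python
import re

# Py = pyrimidine (C or T); consensus as quoted in the abstract.
# The lookahead lets overlapping candidate sites be reported too.
INR_PATTERN = re.compile(r"(?=([CT]{2}CA[CT]{4}))")


def find_inr(sequence):
    """Return (0-based start, matched motif) for every Inr-like site."""
    sequence = sequence.upper()
    return [(m.start(), m.group(1)) for m in INR_PATTERN.finditer(sequence)]


# Invented sequence for illustration only.
example = "GGATATAAAGGCTCATTCCAG"
print(find_inr(example))
```

    A sequence match of this kind only flags candidates; the abstract's transcription start site was established experimentally by primer extension.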

  16. Prediction of protein tertiary structure from sequences using a very large back-propagation neural network

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, X.; Wilcox, G.L.

    1993-12-31

    We have implemented large scale back-propagation neural networks on a 544 node Connection Machine, CM-5, using the C language in MIMD mode. The program running on 512 processors performs backpropagation learning at 0.53 Gflops, which provides 76 million connection updates per second. We have applied the network to the prediction of protein tertiary structure from sequence information alone. A neural network with one hidden layer and 40 million connections is trained to learn the relationship between sequence and tertiary structure. The trained network yields predicted structures of some proteins on which it has not been trained given only their sequences. Presentation of the Fourier transform of the sequences accentuates periodicity in the sequence and yields good generalization with greatly increased training efficiency. Training simulations with a large, heterologous set of protein structures (111 proteins from CM-5 time) to solutions with under 2% RMS residual error within the training set (random responses give an RMS error of about 20%). Presentation of 15 sequences of related proteins in a testing set of 24 proteins yields predicted structures with less than 8% RMS residual error, indicating good apparent generalization.

  17. Sequence heuristics to encode phase behaviour in intrinsically disordered protein polymers

    PubMed Central

    Quiroz, Felipe García; Chilkoti, Ashutosh

    2015-01-01

    Proteins and synthetic polymers that undergo aqueous phase transitions mediate self-assembly in nature and in man-made material systems. Yet little is known about how the phase behaviour of a protein is encoded in its amino acid sequence. Here, by synthesizing intrinsically disordered, repeat proteins to test motifs that we hypothesized would encode phase behaviour, we show that the proteins can be designed to exhibit tunable lower or upper critical solution temperature (LCST and UCST, respectively) transitions in physiological solutions. We also show that mutation of key residues at the repeat level abolishes phase behaviour or encodes an orthogonal transition. Furthermore, we provide heuristics to identify, at the proteome level, proteins that might exhibit phase behaviour and to design novel protein polymers consisting of biologically active peptide repeats that exhibit LCST or UCST transitions. These findings set the foundation for the prediction and encoding of phase behaviour at the sequence level. PMID:26390327

  18. Coevolutionary modeling of protein sequences: Predicting structure, function, and mutational landscapes

    NASA Astrophysics Data System (ADS)

    Weigt, Martin

    Over recent years, biological research has been revolutionized by experimental high-throughput techniques, in particular by next-generation sequencing technology. Unprecedented amounts of data are accumulating, and there is a growing demand for computational methods unveiling the information hidden in raw data, thereby increasing our understanding of complex biological systems. Statistical-physics models based on the maximum-entropy principle have, in the last few years, played an important role in this context. To give a specific example, proteins and many non-coding RNA show a remarkable degree of structural and functional conservation in the course of evolution, despite a large variability in amino acid sequences. We have developed a statistical-mechanics inspired inference approach - called Direct-Coupling Analysis - to link this sequence variability (easy to observe in sequence alignments, which are available in public sequence databases) to bio-molecular structure and function. In my presentation I will show how this methodology can be used (i) to infer contacts between residues and thus to guide tertiary and quaternary protein structure prediction and RNA structure prediction, (ii) to discriminate interacting from non-interacting protein families, and thus to infer conserved protein-protein interaction networks, and (iii) to reconstruct mutational landscapes and thus to predict the phenotypic effect of mutations. References [1] M. Figliuzzi, H. Jacquier, A. Schug, O. Tenaillon and M. Weigt ''Coevolutionary landscape inference and the context-dependence of mutations in beta-lactamase TEM-1'', Mol. Biol. Evol. (2015), doi: 10.1093/molbev/msv211 [2] E. De Leonardis, B. Lutz, S. Ratz, S. Cocco, R. Monasson, A. Schug, M. Weigt ''Direct-Coupling Analysis of nucleotide coevolution facilitates RNA secondary and tertiary structure prediction'', Nucleic Acids Research (2015), doi: 10.1093/nar/gkv932 [3] F. Morcos, A. Pagnani, B. Lunt, A. Bertolino, D. Marks, C

  19. Promoter Recognition by Extracytoplasmic Function σ Factors: Analyzing DNA and Protein Interaction Motifs

    PubMed Central

    Guzina, Jelena

    2016-01-01

    Extracytoplasmic function (ECF) σ factors are the largest and the most diverse group of alternative σ factors, but their mechanisms of transcription are poorly studied. This subfamily is considered to exhibit a rigid promoter structure and an absence of mixing and matching; both −35 and −10 elements are considered necessary for initiating transcription. This paradigm, however, is based on very limited data, which bias the analysis of diverse ECF σ subgroups. Here we investigate DNA and protein recognition motifs involved in ECF σ factor transcription by a computational analysis of canonical ECF subfamily members, much less studied ECF σ subgroups, and the group outliers, obtained from recently sequenced bacteriophages. The analysis identifies an extended −10 element in promoters for phage ECF σ factors; a comparison with bacterial σ factors points to a putative 6-amino-acid motif just C-terminal of domain σ2, which is responsible for the interaction with the identified extension of the −10 element. Interestingly, a similar protein motif is found C-terminal of domain σ2 in canonical ECF σ factors, at a position where it is expected to interact with a conserved motif further upstream of the −10 element. Moreover, the phiEco32 ECF σ factor lacks a recognizable −35 element and σ4 domain, which we identify in a homologous phage, 7-11, indicating that the extended −10 element can compensate for the lack of −35 element interactions. Overall, the results reveal greater flexibility in promoter recognition by ECF σ factors than previously recognized and raise the possibility that mixing and matching also apply to this group, a notion that remains to be biochemically tested. IMPORTANCE: ECF σ factors are the most numerous group of alternative σ factors but have been little studied. Their promoter recognition mechanisms are obscured by the large diversity within the ECF σ factor group and the limited similarity with the well

  20. Different evolutionary patterns of SNPs between domains and unassigned regions in human protein-coding sequences.

    PubMed

    Pang, Erli; Wu, Xiaomei; Lin, Kui

    2016-06-01

    Protein evolution plays an important role in the evolution of each genome. Because of their functional nature, in general, most of their parts or sites are differently constrained selectively, particularly by purifying selection. Most previous studies on protein evolution considered individual proteins in their entirety or compared protein-coding sequences with non-coding sequences. Less attention has been paid to the evolution of different parts within each protein of a given genome. To this end, based on PfamA annotation of all human proteins, each protein sequence can be split into two parts: domains or unassigned regions. Using this rationale, single nucleotide polymorphisms (SNPs) in protein-coding sequences from the 1000 Genomes Project were mapped according to two classifications: SNPs occurring within protein domains and those within unassigned regions. With these classifications, we found: the density of synonymous SNPs within domains is significantly greater than that of synonymous SNPs within unassigned regions; however, the density of non-synonymous SNPs shows the opposite pattern. We also found there are signatures of purifying selection on both the domain and unassigned regions. Furthermore, the selective strength on domains is significantly greater than that on unassigned regions. In addition, among all of the human protein sequences, there are 117 PfamA domains in which no SNPs are found. Our results highlight an important aspect of protein domains and may contribute to our understanding of protein evolution.
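
    The comparison described above rests on SNP density, that is, the number of SNPs divided by the length of the region class they fall in. The sketch below shows that calculation on invented coordinates; the function name `snp_density`, the half-open interval convention, and every number are assumptions, not data from the study.

```python
def snp_density(snp_positions, regions):
    """SNPs per position across a list of half-open (start, end) regions."""
    total_length = sum(end - start for start, end in regions)
    if total_length == 0:
        raise ValueError("regions cover no positions")
    hits = sum(
        1
        for pos in snp_positions
        if any(start <= pos < end for start, end in regions)
    )
    return hits / total_length


# Toy coding sequence of 300 positions (all values invented).
domains = [(30, 130)]               # stand-in for one PfamA domain
unassigned = [(0, 30), (130, 300)]  # everything outside the domain
nonsyn_snps = [5, 12, 44, 150, 171, 220, 289]

print("domain density:    ", snp_density(nonsyn_snps, domains))
print("unassigned density:", snp_density(nonsyn_snps, unassigned))
```

    Normalising by length matters because domains and unassigned regions cover very different fractions of a protein; raw counts alone would not be comparable.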

  1. Prediction of glutathionylation sites in proteins using minimal sequence information and their experimental validation.

    PubMed

    Pal, Debojyoti; Sharma, Deepak; Kumar, Mukesh; Sandur, Santosh K

    2016-09-01

    S-glutathionylation of proteins plays an important role in various biological processes and is known to be a protective modification during oxidative stress. Since experimental detection of S-glutathionylation is labor intensive and time consuming, a bioinformatics-based approach is a viable alternative. Available methods require relatively longer sequence information, which may prevent prediction if sequence information is incomplete. Here, we present a model to predict glutathionylation sites from pentapeptide sequences. It is based upon differential association of amino acids with glutathionylated and non-glutathionylated cysteines from a database of experimentally verified sequences. These data were used to calculate position-dependent F-scores, which measure how a particular amino acid at a particular position may affect the likelihood of a glutathionylation event. A glutathionylation score (G-score), indicating the propensity of a sequence to undergo glutathionylation, was calculated using position-dependent F-scores for each amino acid. Cut-off values were used for prediction. Our model returned an accuracy of 58% with a Matthews correlation coefficient (MCC) value of 0.165. On an independent dataset, our model outperformed the currently available model, in spite of needing much less sequence information. Pentapeptide motifs having high abundance among glutathionylated proteins were identified. A list of potential glutathionylation hotspot sequences was obtained by assigning G-scores, and subsequent Protein-BLAST analysis revealed a total of 254 putative glutathionable proteins, a number of which were already known to be glutathionylated. Our model predicted glutathionylation sites in 93.93% of experimentally verified glutathionylated proteins. The outcome of this study may assist in discovering novel glutathionylation sites and finding candidate proteins for glutathionylation.
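
    For readers unfamiliar with the two performance figures quoted above, the following sketch computes accuracy and the Matthews correlation coefficient from a binary confusion matrix. The confusion-matrix counts are invented for illustration and are not the study's data; the zero-denominator convention is likewise an assumption.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a row or column is empty
    return (tp * tn - fp * fn) / denom


# Invented counts: true/false positives and negatives for site prediction.
tp, tn, fp, fn = 60, 56, 44, 40
print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
print(f"MCC      = {matthews_corrcoef(tp, tn, fp, fn):.3f}")
```

    MCC ranges from -1 to 1, with 0 corresponding to chance-level prediction, so it complements accuracy when the positive and negative classes are unevenly sized.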

  2. Biophysical models of protein evolution: Understanding the patterns of evolutionary sequence divergence

    PubMed Central

    Echave, Julian; Wilke, Claus O.

    2018-01-01

    For decades, rates of protein evolution have been interpreted in terms of the vague concept of “functional importance”. Slowly evolving proteins or sites within proteins were assumed to be more functionally important and thus subject to stronger selection pressure. More recently, biophysical models of protein evolution, which combine evolutionary theory with protein biophysics, have completely revolutionized our view of the forces that shape sequence divergence. Slowly evolving proteins have been found to evolve slowly because of selection against toxic misfolding and misinteractions, linking their rate of evolution primarily to their abundance. Similarly, most slowly evolving sites in proteins are not directly involved in function, but mutating them has large impacts on protein structure and stability. Here, we review the studies of the emergent field of biophysical protein evolution that have shaped our current understanding of sequence divergence patterns. We also propose future research directions to develop this nascent field. PMID:28301766

  3. Sequence specificity of single-stranded DNA-binding proteins: a novel DNA microarray approach

    PubMed Central

    Morgan, Hugh P.; Estibeiro, Peter; Wear, Martin A.; Max, Klaas E.A.; Heinemann, Udo; Cubeddu, Liza; Gallagher, Maurice P.; Sadler, Peter J.; Walkinshaw, Malcolm D.

    2007-01-01

    We have developed a novel DNA microarray-based approach for identification of the sequence-specificity of single-stranded nucleic-acid-binding proteins (SNABPs). For verification, we have shown that the major cold shock protein (CspB) from Bacillus subtilis binds with high affinity to pyrimidine-rich sequences, with a binding preference for the consensus sequence, 5′-GTCTTTG/T-3′. The sequence was modelled onto the known structure of CspB and a cytosine-binding pocket was identified, which explains the strong preference for a cytosine base at position 3. This microarray method offers a rapid high-throughput approach for determining the specificity and strength of ss DNA–protein interactions. Further screening of this newly emerging family of transcription factors will help provide an insight into their cellular function. PMID:17488853
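
    One simple way to relate a probe sequence to the reported consensus is to count mismatches in every window of the same length. The sketch below does this for 5′-GTCTTTG/T-3′, written with the IUPAC code K (G or T) in the last position. The helper names, the mismatch-count scoring, and the probe sequences are assumptions for illustration; the study itself measured binding on microarrays.

```python
# IUPAC codes needed for this consensus; K means G or T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "K": "GT"}
CONSENSUS = "GTCTTTK"  # 5'-GTCTTTG/T-3'


def mismatches(window, consensus=CONSENSUS):
    """Count positions where the window disagrees with the consensus."""
    return sum(base not in IUPAC[code] for base, code in zip(window, consensus))


def best_window(ssdna, consensus=CONSENSUS):
    """Return (start, window, mismatch count) for the closest match."""
    ssdna = ssdna.upper()
    k = len(consensus)
    if len(ssdna) < k:
        raise ValueError("sequence is shorter than the consensus")
    candidates = [
        (mismatches(ssdna[i:i + k], consensus), i)
        for i in range(len(ssdna) - k + 1)
    ]
    n_mis, start = min(candidates)  # fewest mismatches, then leftmost
    return start, ssdna[start:start + k], n_mis


# Invented probe sequences.
print(best_window("AAGTCTTTGCC"))
print(best_window("CCGTCATTTAA"))
```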

  4. Identification of the protein sequence of the type III effector XopD from the B100 strain of Xanthomonas campestris pv campestris

    PubMed Central

    Canonne, Joanne; Pichereaux, Carole; Marino, Daniel; Roby, Dominique; Rossignol, Michel; Rivas, Susana

    2012-01-01

    During evolution, pathogens have developed sophisticated strategies to suppress plant defense responses and promote successful colonization of their hosts. In their attempt to quell host resistance, Gram-negative phytopathogenic bacteria inject type III effectors (T3Es) into plant cells, where they typically target plant components essential for the establishment of defense responses. We have recently shown that the XopD T3E from the strain B100 of Xanthomonas campestris pathovar campestris (XopDXccB100) is able to target AtMYB30, a positive regulator of Arabidopsis defense responses. This protein interaction leads to inhibition of AtMYB30 transcriptional activity and promotion of bacterial virulence. Here, we describe the identification of the complete protein sequence of XopDXccB100, which presents an N-terminal extension of 40 amino acids with respect to the protein annotated in public databases. The implications of this finding are discussed. PMID:22353870

  5. Dr. Sanger's Apprentice: A Computer-Aided Instruction to Protein Sequencing.

    ERIC Educational Resources Information Center

    Schmidt, Thomas G.; Place, Allen R.

    1985-01-01

    Modeled after the program "Mastermind," this program teaches students the art of protein sequencing. The program (written in Turbo Pascal for the IBM PC, requiring 128K, a graphics adapter, and an 8087 mathematics coprocessor) generates a polypeptide whose sequence and length can be user-defined (for practice) or computer-generated (for…

  6. Protein sequence annotation in the genome era: the annotation concept of SWISS-PROT+TREMBL.

    PubMed

    Apweiler, R; Gateau, A; Contrino, S; Martin, M J; Junker, V; O'Donovan, C; Lang, F; Mitaritonna, N; Kappus, S; Bairoch, A

    1997-01-01

    SWISS-PROT is a curated protein sequence database which strives to provide a high level of annotation, a minimal level of redundancy and a high level of integration with other databases. Ongoing genome sequencing projects have dramatically increased the number of protein sequences to be incorporated into SWISS-PROT. Since we do not want to dilute the quality standards of SWISS-PROT by incorporating sequences without proper sequence analysis and annotation, we cannot speed up the incorporation of new incoming data indefinitely. However, as we also want to make the sequences available as fast as possible, we introduced TREMBL (TRanslation of EMBL nucleotide sequence database), a supplement to SWISS-PROT. TREMBL consists of computer-annotated entries in SWISS-PROT format derived from the translation of all coding sequences (CDS) in the EMBL nucleotide sequence database, except for CDS already included in SWISS-PROT. While TREMBL is already of immense value, its computer-generated annotation does not match the quality of SWISS-PROT's. The main difference is in the protein functional information attached to sequences. With this in mind, we are dedicating substantial effort to develop and apply computer methods to enhance the functional information attached to TREMBL entries.

  7. Non-B-DNA structures on the interferon-beta promoter?

    PubMed

    Robbe, K; Bonnefoy, E

    1998-01-01

    The high mobility group (HMG) I protein intervenes as an essential factor during the virus-induced expression of the interferon-beta (IFN-beta) gene. It is a non-histone chromatin-associated protein that has the dual capacity of binding to a non-B-DNA structure such as cruciform-DNA as well as to AT-rich B-DNA sequences. In this work we compare the binding affinity of HMGI for a synthetic cruciform-DNA to its binding affinity for the HMGI-binding-site present in the positive regulatory domain II (PRDII) of the IFN-beta promoter. Using gel retardation experiments, we show that HMGI protein binds with at least ten times more affinity to the synthetic cruciform-DNA structure than to the PRDII B-DNA sequence. DNA hairpin sequences are present in both the human and the murine PRDII-DNAs. We discuss in this work the presence of as yet putative non-B-DNA structures in the IFN-beta promoter.

  8. Full trans-activation mediated by the immediate-early protein of equine herpesvirus 1 requires a consensus TATA box, but not its cognate binding sequence.

    PubMed

    Kim, Seong K; Shakya, Akhalesh K; O'Callaghan, Dennis J

    2016-01-04

    The immediate-early protein (IEP) of equine herpesvirus 1 (EHV-1) has extensive homology to the IEP of alphaherpesviruses and possesses domains essential for trans-activation, including an acidic trans-activation domain (TAD) and binding domains for DNA, TFIIB, and TBP. Our data showed that the IEP directly interacted with transcription factor TFIIA, which is known to stabilize the binding of TBP and TFIID to the TATA box of core promoters. When the TATA box of the EICP0 promoter was mutated to a nonfunctional TATA box, IEP-mediated trans-activation was reduced from 22-fold to 7-fold. The IEP trans-activated the viral promoters in a TATA motif-dependent manner. Our previous data showed that the IEP is able to repress its own promoter when the IEP-binding sequence (IEBS) is located within 26-bp from the TATA box. When the IEBS was located at 100 bp upstream of the TATA box, IEP-mediated trans-activation was very similar to that of the minimal IE(nt -89 to +73) promoter lacking the IEBS. As the distance from the IEBS to the TATA box decreased, IEP-mediated trans-activation progressively decreased, indicating that the IEBS located within 100 bp from the TATA box sequence functions as a distance-dependent repressive element. These results indicated that IEP-mediated full trans-activation requires a consensus TATA box of core promoters, but not its binding to the cognate sequence (IEBS).

  9. Full trans–activation mediated by the immediate–early protein of equine herpesvirus 1 requires a consensus TATA box, but not its cognate binding sequence

    PubMed Central

    Kim, Seong K.; Shakya, Akhalesh K.; O'Callaghan, Dennis J.

    2015-01-01

    The immediate-early protein (IEP) of equine herpesvirus 1 (EHV-1) has extensive homology to the IEP of alphaherpesviruses and possesses domains essential for trans-activation, including an acidic trans-activation domain (TAD) and binding domains for DNA, TFIIB, and TBP. Our data showed that the IEP directly interacted with transcription factor TFIIA, which is known to stabilize the binding of TBP and TFIID to the TATA box of core promoters. When the TATA box of the EICP0 promoter was mutated to a nonfunctional TATA box, IEP-mediated trans-activation was reduced from 22-fold to 7-fold. The IEP trans-activated the viral promoters in a TATA motif-dependent manner. Our previous data showed that the IEP is able to repress its own promoter when the IEP-binding sequence (IEBS) is located within 26-bp from the TATA box. When the IEBS was located at 100 bp upstream of the TATA box, IEP-mediated trans-activation was very similar to that of the minimal IE(nt −89 to +73) promoter lacking the IEBS. As the distance from the IEBS to the TATA box decreased, IEP-mediated trans-activation progressively decreased, indicating that the IEBS located within 100 bp from the TATA box sequence functions as a distance-dependent repressive element. These results indicated that IEP-mediated full trans-activation requires a consensus TATA box of core promoters, but not its binding to the cognate sequence (IEBS). PMID:26541315

  10. Pattern similarity study of functional sites in protein sequences: lysozymes and cystatins

    PubMed Central

    Nakai, Shuryo; Li-Chan, Eunice CY; Dou, Jinglie

    2005-01-01

    Background: Although it is generally agreed that topography is more conserved than sequences, proteins sharing the same fold can have different functions, while there are protein families with low sequence similarity. An alternative method for profile analysis of characteristic conserved positions of the motifs within the 3D structures may be needed for functional annotation of protein sequences. Using the approach of quantitative structure-activity relationships (QSAR), we have proposed a new algorithm for postulating functional mechanisms on the basis of pattern similarity and average of property values of side-chains in segments within sequences. This approach was used to search for functional sites of proteins belonging to the lysozyme and cystatin families. Results: Hydrophobicity and β-turn propensity of reference segments with 3–7 residues were used for the homology similarity search (HSS) for active sites. Hydrogen bonding was used as the side-chain property for searching the binding sites of lysozymes. The profiles of similarity constants and average values of these parameters as functions of their positions in the sequences could identify both active and substrate binding sites of the lysozyme of Streptomyces coelicolor, which has been reported as a new fold enzyme (Cellosyl). The same approach was successfully applied to cystatins, especially for postulating the mechanisms of amyloidosis of human cystatin C as well as human lysozyme. Conclusion: Pattern similarity and average index values of structure-related properties of side chains in short segments of three residues or longer were, for the first time, successfully applied for predicting functional sites in sequences. This new approach may be applicable to studying functional sites in un-annotated proteins, for which complete 3D structures are not yet available. PMID:15904486
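
    The "average of property values of side-chains in segments" can be pictured as a sliding-window profile along the sequence. The sketch below computes such a profile for hydrophobicity only. The Kyte-Doolittle hydropathy scale is an assumed stand-in (the abstract does not say which scale was used), and the function name `window_averages` and the example sequence are hypothetical; the pattern-similarity part of the method is not reproduced.

```python
# Kyte-Doolittle hydropathy values (assumed choice of hydrophobicity scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_averages(sequence, window=5):
    """Average hydropathy of every contiguous segment of `window` residues."""
    sequence = sequence.upper()
    if not 1 <= window <= len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KD[res] for res in sequence]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Toy sequence; window sizes of 3 to 7 residues mirror the segment
# lengths mentioned in the abstract.
profile = window_averages("KVFGRCELAA", window=3)
for start, value in enumerate(profile):
    print(start, round(value, 2))
```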

  11. Understanding sequence similarity and framework analysis between centromere proteins using computational biology.

    PubMed

    Doss, C George Priya; Chakrabarty, Chiranjib; Debajyoti, C; Debottam, S

    2014-11-01

    Unresolved questions about the recruitment pathways, cell cycle regulation mechanisms, spindle checkpoint assembly, and chromosome segregation roles of centromere proteins are a focus of attention in cancer research. With established databases now available, a range of computational platforms makes it possible to examine much of the physiological and biochemical evidence in disease-associated phenotypes. Using existing computational methods, we have utilized the amino acid residues to understand the similarity within the evolutionary variance of different associated centromere proteins. This study of sequence similarity, protein-protein networking, co-expression analysis, and evolutionary trajectory of centromere proteins will speed up the understanding of centromere biology and will create a road map for upcoming researchers who are initiating their work on clinical sequencing using centromere proteins.

  12. A method for partitioning the information contained in a protein sequence between its structure and function.

    PubMed

    Possenti, Andrea; Vendruscolo, Michele; Camilloni, Carlo; Tiana, Guido

    2018-05-23

    Proteins employ the information stored in the genetic code and translated into their sequences to carry out well-defined functions in the cellular environment. The possibility to encode for such functions is controlled by the balance between the amount of information supplied by the sequence and that left after the protein has folded into its structure. We study the amount of information necessary to specify the protein structure, providing an estimate that takes into account the thermodynamic properties of protein folding. We thus show that the information remaining in the protein sequence after encoding for its structure (the 'information gap') is very close to what is needed to encode for its function and interactions. Then, by predicting the information gap directly from the protein sequence, we show that it may be possible to use these insights from information theory to discriminate between ordered and disordered proteins, to identify unknown functions, and to optimize artificially-designed protein sequences. © 2018 Wiley Periodicals, Inc.

  13. Synthetic signal sequences that enable efficient secretory protein production in the yeast Kluyveromyces marxianus.

    PubMed

    Yarimizu, Tohru; Nakamura, Mikiko; Hoshida, Hisashi; Akada, Rinji

    2015-02-14

    Targeting of cellular proteins to the extracellular environment is directed by a secretory signal sequence located at the N-terminus of a secretory protein. These signal sequences usually contain an N-terminal basic amino acid followed by a stretch containing hydrophobic residues, although no consensus signal sequence has been identified. In this study, simple modeling of signal sequences was attempted using Gaussia princeps secretory luciferase (GLuc) in the yeast Kluyveromyces marxianus, which allowed comprehensive recombinant gene construction to substitute synthetic signal sequences. Mutational analysis of the GLuc signal sequence revealed that the GLuc hydrophobic peptide length was the lower limit for effective secretion and that the N-terminal basic residue was indispensable. Deletion of the 16th Glu caused enhanced levels of secreted protein, suggesting that this hydrophilic residue defined the boundary of a hydrophobic peptide stretch. Consequently, we redesigned this domain as a repeat of a single hydrophobic amino acid between the N-terminal Lys and C-terminal Glu. Stretches consisting of Phe, Leu, Ile, or Met were effective for secretion but the number of residues affected secretory activity. A stretch containing sixteen consecutive methionine residues (M16) showed the highest activity; the M16 sequence was therefore utilized for the secretory production of human leukemia inhibitory factor protein in yeast, resulting in enhanced secreted protein yield. We present a new concept for the provision of secretory signal sequence ability in the yeast K. marxianus, determined by the number of residues of a single hydrophobic residue located between N-terminal basic and C-terminal acidic amino acid boundaries.

  14. Ribosomal binding site sequences and promoters for expressing glutamate decarboxylase and producing γ-aminobutyrate in Corynebacterium glutamicum.

    PubMed

    Shi, Feng; Luan, Mingyue; Li, Yongfu

    2018-04-18

    Glutamate decarboxylase (GAD) converts L-glutamate (Glu) into γ-aminobutyric acid (GABA). Corynebacterium glutamicum that expresses an exogenous GAD gene, gadB2 or gadB1, can synthesize GABA from its own produced Glu. To enhance GABA production in C. glutamicum, ribosomal binding site (RBS) sequences and promoters were searched and optimized for increasing the expression efficiency of gadB2. R4 exhibited the highest strength among the RBS sequences tested, with 6 nt the optimal aligned spacing (AS) between RBS and start codon. This combination of RBS sequence and AS contributed to gadB2 expression, increasing GAD activity by 156% and GABA production by 82% compared to the normal strong RBS and AS combination. Then, a series of native promoters were selected for transcribing gadB2 under the optimal RBS and AS combination. PdnaK, PdtsR, PodhI and PclgR expressed gadB2 and produced GABA as effectively as the widely applied Ptuf and PcspB promoters and more effectively than the Psod promoter. However, no native promoter worked as well as the synthetic strong promoter PtacM, which produced 20.2 ± 0.3 g/L GABA. Even with prolonged length and bicistronic architecture, the strength of PdnaK did not increase. Finally, gadB2 and mutant gadB1 were co-expressed under the optimal promoter and RBS combination, which converted Glu into GABA completely and improved GABA production to more than 25 g/L. This study provides useful promoters and RBS sequences for gene expression in C. glutamicum.

  15. Identification and application of self-binding zipper-like sequences in SARS-CoV spike protein.

    PubMed

    Zhang, Si Min; Liao, Ying; Neo, Tuan Ling; Lu, Yanning; Liu, Ding Xiang; Vahlne, Anders; Tam, James P

    2018-05-22

    Self-binding peptides containing zipper-like sequences, such as the Leu/Ile zipper sequence within the coiled coil regions of proteins and the cross-β spine steric zippers within the amyloid-like fibrils, could bind to the protein-of-origin through homophilic sequence-specific zipper motifs. These self-binding sequences represent opportunities for the development of biochemical tools and/or therapeutics. Here, we report on the identification of a putative self-binding β-zipper-forming peptide within the severe acute respiratory syndrome-associated coronavirus spike (S) protein and its application in viral detection. Peptide array scanning of overlapping peptides covering the entire length of S protein identified 34 putative self-binding peptides of six clusters, five of which contained octapeptide core consensus sequences. The Cluster I consensus octapeptide sequence GINITNFR was predicted by Eisenberg's 3D profile method to have high amyloid-like fibrillation potential through steric β-zipper formation. Peptide C6 containing the Cluster I consensus sequence was shown to oligomerize and form amyloid-like fibrils. Taking advantage of this, C6 was further applied to detect the S protein expression in vitro by fluorescence staining. Meanwhile, the coiled-coil-forming Leu/Ile heptad repeat sequences within the S protein were under-represented during peptide array scanning, consistent with the requirement for long peptides to attain high helix-mediated interaction avidity. The data suggest that short β-zipper-like self-binding peptides within the S protein could be identified through combining the peptide scanning and predictive methods, and could be exploited as biochemical detection reagents for viral infection. Copyright © 2018. Published by Elsevier Ltd.
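
    The peptide array scan in this abstract relies on overlapping peptides that tile the protein. A minimal sketch of how such a tiling could be generated is given below. The peptide length and step are assumptions, because the abstract does not state the values used, and the function names are hypothetical. GINITNFR is the Cluster I consensus quoted in the abstract.

```python
def overlapping_peptides(protein, length=10, step=2):
    """Return (start, peptide) pairs tiling `protein` with overlapping windows.

    `length` and `step` are illustrative defaults, not values from the study.
    Windows start every `step` residues, so the last few residues are left
    uncovered unless (len(protein) - length) is a multiple of `step`.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    return [
        (start, protein[start:start + length])
        for start in range(0, len(protein) - length + 1, step)
    ]


def peptides_with_motif(peptides, motif="GINITNFR"):
    """Keep the peptides that contain `motif` as an exact substring."""
    return [(start, pep) for start, pep in peptides if motif in pep]
```

    Grouping the array hits by a shared core, as `peptides_with_motif` does for one motif, is one simple way to see why neighbouring overlapping peptides fall into the same cluster.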

  16. Combining protein sequence, structure, and dynamics: A novel approach for functional evolution analysis of PAS domain superfamily.

    PubMed

    Dong, Zheng; Zhou, Hongyu; Tao, Peng

    2018-02-01

    PAS domains are widespread in archaea, bacteria, and eukaryota, and play important roles in various functions. In this study, we aim to explore the functional evolutionary relationship among proteins in the PAS domain superfamily in view of the sequence-structure-dynamics-function relationship. We collected protein sequences and crystal structure data from the RCSB Protein Data Bank for PAS domain superfamily members belonging to three biological functions (nucleotide binding, photoreceptor activity, and transferase activity). Protein sequences were aligned and then used to select sequence-conserved residues and build a phylogenetic tree. Three-dimensional structure alignment was also applied to obtain structure-conserved residues. The protein dynamics were analyzed using an elastic network model (ENM) and validated by molecular dynamics (MD) simulation. The results showed that proteins with the same function could be grouped by sequence similarity, and proteins in different functional groups displayed statistically significant differences in their vibrational patterns. Interestingly, in all three functional groups, conserved amino acid residues identified by sequence and structure conservation analysis generally have a lower fluctuation than other residues. In addition, the fluctuation of conserved residues in each biological function group was strongly correlated with the corresponding biological function. This research suggested a direct connection in which the protein sequences were related to various functions through structural dynamics. This is a new attempt to delineate functional evolution of proteins using the integrated information of sequence, structure, and dynamics. © 2017 The Protein Society.

  17. Nipah Virus C Protein Recruits Tsg101 to Promote the Efficient Release of Virus in an ESCRT-Dependent Pathway.

    PubMed

    Park, Arnold; Yun, Tatyana; Vigant, Frederic; Pernet, Olivier; Won, Sohui T; Dawes, Brian E; Bartkowski, Wojciech; Freiberg, Alexander N; Lee, Benhur

    2016-05-01

    The budding of Nipah virus, a deadly member of the Henipavirus genus within the Paramyxoviridae, has been thought to be independent of the host ESCRT pathway, which is critical for the budding of many enveloped viruses. This conclusion was based on the budding properties of the virus matrix protein in the absence of other virus components. Here, we find that the virus C protein, which was previously investigated for its role in antagonism of innate immunity, recruits the ESCRT pathway to promote efficient virus release. Inhibition of ESCRT or depletion of the ESCRT factor Tsg101 abrogates the C enhancement of matrix budding and impairs live Nipah virus release. Further, despite the low sequence homology of the C proteins of known henipaviruses, they all enhance the budding of their cognate matrix proteins, suggesting a conserved and previously unknown function for the henipavirus C proteins.

  18. Top-down analysis of protein samples by de novo sequencing techniques.

    PubMed

    Vyatkina, Kira; Wu, Si; Dekker, Lennard J M; VanDuijn, Martijn M; Liu, Xiaowen; Tolić, Nikola; Luider, Theo M; Paša-Tolić, Ljiljana; Pevzner, Pavel A

    2016-09-15

    Recent technological advances have made high-resolution mass spectrometers affordable to many laboratories, thus boosting rapid development of top-down mass spectrometry, and implying a need for efficient methods for analyzing this kind of data. We describe a method for analysis of protein samples from top-down tandem mass spectrometry data, which capitalizes on de novo sequencing of fragments of the proteins present in the sample. Our algorithm takes as input a set of de novo amino acid strings derived from the given mass spectra using the recently proposed Twister approach, and combines them into aggregated strings endowed with offsets. The former typically constitute accurate sequence fragments of sufficiently well-represented proteins from the sample being analyzed, while the latter indicate their location in the protein sequence, and also bear information on post-translational modifications and fragmentation patterns. Availability: freely available on the web at http://bioinf.spbau.ru/en/twister. Contact: vyatkina@spbau.ru or ppevzner@ucsd.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  19. Sequence diagrams and the presentation of structural and evolutionary relationships among proteins.

    PubMed

    Thomas, B R

    1975-01-01

    Protein sequences mapped on two-dimensional diagrams show characteristic patterns that should be of value in visualising sequence information and in distinguishing simpler structures. A convenient map form for comparative purposes is the alpha-helix diagram with amino acid distribution analogous to the surface of an alpha-helix oriented so that an alpha-helix structure corresponds on the diagram to a vertical band 3.6 residues wide. The sequence diagram for an alpha-keratin, high-sulphur protein suggests a new form of polypeptide helix based on a repeating unit of five which may be an important component of alpha-keratin fibres.
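
    The geometry behind a diagram 3.6 residues wide can be sketched as follows. With 3.6 residues per turn, each residue advances 100 degrees around the helix axis, so unrolling the helix surface places residue i at a horizontal position of ((100 * i) mod 360) / 100 residue widths. This is a generic helical-net calculation and not necessarily the exact layout used in the paper. The 1.5 Å rise per residue is the standard alpha-helix value, and the function name is hypothetical.

```python
DEGREES_PER_RESIDUE = 100   # 360 degrees / 3.6 residues per turn
RISE_PER_RESIDUE = 1.5      # angstroms along the helix axis (standard alpha-helix)


def helical_net(sequence):
    """Map each residue to (residue, x, y) on an unrolled alpha-helix surface.

    x is the position across the band in residue widths (0 <= x < 3.6);
    y is the distance along the helix axis in angstroms.
    Integer degrees are used for the modulo to avoid floating-point drift.
    """
    coords = []
    for i, residue in enumerate(sequence):
        angle = (i * DEGREES_PER_RESIDUE) % 360
        coords.append((residue, angle / 100, i * RISE_PER_RESIDUE))
    return coords
```

    Residues 18 positions apart land on the same x coordinate, because 18 residues make exactly five turns.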

  20. SIMAP—the database of all-against-all protein sequence similarities and annotations with new interfaces and increased coverage

    PubMed Central

    Arnold, Roland; Goldenberg, Florian; Mewes, Hans-Werner; Rattei, Thomas

    2014-01-01

    The Similarity Matrix of Proteins (SIMAP, http://mips.gsf.de/simap/) database has been designed to massively accelerate computationally expensive protein sequence analysis tasks in bioinformatics. It provides pre-calculated sequence similarities interconnecting the entire known protein sequence universe, complemented by pre-calculated protein features and domains, similarity clusters and functional annotations. SIMAP covers all major public protein databases as well as many consistently re-annotated metagenomes from different repositories. As of September 2013, SIMAP contains >163 million proteins corresponding to ∼70 million non-redundant sequences. SIMAP uses the sensitive FASTA search heuristics, the Smith–Waterman alignment algorithm, the InterPro database of protein domain models and the BLAST2GO functional annotation algorithm. SIMAP assists biologists by facilitating the interactive exploration of the protein sequence universe. Web-Service and DAS interfaces allow connecting SIMAP with any other bioinformatic tool and resource. All-against-all protein sequence similarity matrices of project-specific protein collections are generated on request. Recent improvements allow SIMAP to cover the rapidly growing sequenced protein sequence universe. New Web-Service interfaces enhance the connectivity of SIMAP. Novel tools for interactive extraction of protein similarity networks have been added. Open access to SIMAP is provided through the web portal; the portal also contains instructions and links for software access and flat file downloads. PMID:24165881
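
    For readers unfamiliar with the Smith–Waterman algorithm named in this abstract, a minimal score-only sketch follows. It uses a flat match/mismatch score and a linear gap penalty, which are simplifying assumptions. Protein searches such as those in SIMAP would normally use a substitution matrix and affine gap costs, and nothing here reflects SIMAP's actual implementation.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings `a` and `b`.

    Standard dynamic programming recurrence with a floor of zero, kept in
    two rows to use O(len(b)) memory. Scoring parameters are illustrative.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(
                0,                  # start a new local alignment
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                cur[j - 1] + gap,   # gap in a
            )
            best = max(best, cur[j])
        prev = cur
    return best
```

    The quadratic cost of this recurrence per sequence pair is what makes all-against-all comparison expensive, and it is the reason a pre-calculated similarity matrix is useful.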

  1. The LINE-1 DNA sequences in four mammalian orders predict proteins that conserve homologies to retrovirus proteins.

    PubMed Central

    Fanning, T; Singer, M

    1987-01-01

    Recent work suggests that one or more members of the highly repeated LINE-1 (L1) DNA family found in all mammals may encode one or more proteins. Here we report the sequence of a portion of an L1 cloned from the domestic cat (Felis catus). These data permit comparison of the L1 sequences in four mammalian orders (Carnivore, Lagomorph, Rodent and Primate) and the comparison supports the suggested coding potential. In two separate, noncontiguous regions in the carboxy terminal half of the proteins predicted from the DNA sequences, there are several strongly conserved segments. In one region, these share homology with known or suspected reverse transcriptases, as described by others in rodents and primates. In the second region, closer to the carboxy terminus, the strongly conserved segments are over 90% homologous among the four orders. One of the latter segments is cysteine rich and resembles the putative metal binding domains of nucleic acid binding proteins, including those of TFIIIA and retroviruses. PMID:3562227

  2. A plasma membrane sucrose-binding protein that mediates sucrose uptake shares structural and sequence similarity with seed storage proteins but remains functionally distinct.

    PubMed

    Overvoorde, P J; Chao, W S; Grimes, H D

    1997-06-20

    Photoaffinity labeling of a soybean cotyledon membrane fraction identified a sucrose-binding protein (SBP). Subsequent studies have shown that the SBP is a unique plasma membrane protein that mediates the linear uptake of sucrose in the presence of up to 30 mM external sucrose when ectopically expressed in yeast. Analysis of the SBP-deduced amino acid sequence indicates it lacks sequence similarity with other known transport proteins. Data presented here, however, indicate that the SBP shares significant sequence and structural homology with the vicilin-like seed storage proteins that organize into homotrimers. These similarities include a repeated sequence that forms the basis of the reiterated domain structure characteristic of the vicilin-like protein family. In addition, analytical ultracentrifugation and nonreducing SDS-polyacrylamide gel electrophoresis demonstrate that the SBP appears to be organized into oligomeric complexes with a Mr indicative of the existence of SBP homotrimers and homodimers. The structural similarity shared by the SBP and vicilin-like proteins provides a novel framework to explore the mechanistic basis of SBP-mediated sucrose uptake. Expression of the maize Glb protein (a vicilin-like protein closely related to the SBP) in yeast demonstrates that a closely related vicilin-like protein is unable to mediate sucrose uptake. Thus, despite sequence and structural similarities shared by the SBP and the vicilin-like protein family, the SBP is functionally divergent from other members of this group.

  3. A scalable double-barcode sequencing platform for characterization of dynamic protein-protein interactions.

    PubMed

    Schlecht, Ulrich; Liu, Zhimin; Blundell, Jamie R; St Onge, Robert P; Levy, Sasha F

    2017-05-25

    Several large-scale efforts have systematically catalogued protein-protein interactions (PPIs) of a cell in a single environment. However, little is known about how the protein interactome changes across environmental perturbations. Current technologies, which assay one PPI at a time, are too low throughput to make it practical to study protein interactome dynamics. Here, we develop a highly parallel protein-protein interaction sequencing (PPiSeq) platform that uses a novel double barcoding system in conjunction with the dihydrofolate reductase protein-fragment complementation assay in Saccharomyces cerevisiae. PPiSeq detects PPIs at a rate that is on par with current assays and, in contrast with current methods, quantitatively scores PPIs with enough accuracy and sensitivity to detect changes across environments. Both PPI scoring and the bulk of strain construction can be performed with cell pools, making the assay scalable and easily reproduced across environments. PPiSeq is therefore a powerful new tool for large-scale investigations of dynamic PPIs.

  4. Heterologous mitochondrial targeting sequences can deliver functional proteins into mitochondria.

    PubMed

    Marcus, Dana; Lichtenstein, Michal; Cohen, Natali; Hadad, Rita; Erlich-Hadad, Tal; Greif, Hagar; Lorberboum-Galski, Haya

    2016-12-01

    Mitochondrial Targeting Sequences (MTSs) are responsible for trafficking nuclear-encoded proteins into mitochondria. Once entering the mitochondria, the MTS is recognized and cleaved off. Some MTSs are long and undergo two-step processing, as in the case of the human frataxin (FXN) protein (80aa), implicated in Friedreich's ataxia (FA). Therefore, we chose the FXN protein to examine whether nuclear-encoded mitochondrial proteins can efficiently be targeted via a heterologous MTS (hMTS) and deliver a functional protein into mitochondria. We examined three hMTSs, those of citrate synthase (cs), lipoamide dehydrogenase (LAD) and C6ORF66 (ORF), as classical MTSs known to be removed by one-step processing, to deliver FXN into mitochondria in the form of fusion proteins. We demonstrate that using hMTSs for delivering FXN results in the production of 4-5-fold larger amounts of the fusion proteins, and at 4-5-fold higher concentrations. Moreover, hMTSs delivered a functional FXN protein into the mitochondria even more efficiently than the native MTSfxn, as evidenced by the rescue of FA patients' cells from oxidative stress, demonstrating an 18%-54% increase in cell survival and a 13%-33% increase in ATP levels, as compared to the fusion protein carrying the native MTS. One fusion protein with MTScs increased aconitase activity within patients' cells by 400-fold. The implications from our studies are of vast importance for both basic and translational research of mitochondrial proteins, as any mitochondrial protein can be delivered efficiently by an hMTS. Moreover, effective targeting of functional proteins is important for restoration of mitochondrial function and treatment of related disorders. Copyright © 2016 Elsevier Ltd. All rights reserved.

  5. Combining phage display with de novo protein sequencing for reverse engineering of monoclonal antibodies

    PubMed Central

    Rickert, Keith W.; Grinberg, Luba; Woods, Robert M.; Wilson, Susan; Bowen, Michael A.; Baca, Manuel

    2016-01-01

    The enormous diversity created by gene recombination and somatic hypermutation makes de novo protein sequencing of monoclonal antibodies a uniquely challenging problem. Modern mass spectrometry-based sequencing will rarely, if ever, provide a single unambiguous sequence for the variable domains. A more likely outcome is computation of an ensemble of highly similar sequences that can satisfy the experimental data. This outcome can result in the need for empirical testing of many candidate sequences, sometimes iteratively, to identify one which can replicate the activity of the parental antibody. Here we describe an improved approach to antibody protein sequencing by using phage display technology to generate a combinatorial library of sequences that satisfy the mass spectrometry data, and selecting for functional candidates that bind antigen. This approach was used to reverse engineer 2 commercially-obtained monoclonal antibodies against murine CD137. Proteomic data enabled us to assign the majority of the variable domain sequences, with the exception of 3–5% of the sequence located within or adjacent to complementarity-determining regions. To efficiently resolve the sequence in these regions, small phage-displayed libraries were generated and subjected to antigen binding selection. Following enrichment of antigen-binding clones, 2 clones were selected for each antibody and recombinantly expressed as antigen-binding fragments (Fabs). In both cases, the reverse-engineered Fabs exhibited identical antigen binding affinity, within error, as Fabs produced from the commercial IgGs. This combination of proteomic and protein engineering techniques provides a useful approach to simplifying the technically challenging process of reverse engineering monoclonal antibodies from protein material. PMID:26852694

  6. Combining phage display with de novo protein sequencing for reverse engineering of monoclonal antibodies.

    PubMed

    Rickert, Keith W; Grinberg, Luba; Woods, Robert M; Wilson, Susan; Bowen, Michael A; Baca, Manuel

    2016-01-01

    The enormous diversity created by gene recombination and somatic hypermutation makes de novo protein sequencing of monoclonal antibodies a uniquely challenging problem. Modern mass spectrometry-based sequencing will rarely, if ever, provide a single unambiguous sequence for the variable domains. A more likely outcome is computation of an ensemble of highly similar sequences that can satisfy the experimental data. This outcome can result in the need for empirical testing of many candidate sequences, sometimes iteratively, to identify one which can replicate the activity of the parental antibody. Here we describe an improved approach to antibody protein sequencing by using phage display technology to generate a combinatorial library of sequences that satisfy the mass spectrometry data, and selecting for functional candidates that bind antigen. This approach was used to reverse engineer 2 commercially-obtained monoclonal antibodies against murine CD137. Proteomic data enabled us to assign the majority of the variable domain sequences, with the exception of 3-5% of the sequence located within or adjacent to complementarity-determining regions. To efficiently resolve the sequence in these regions, small phage-displayed libraries were generated and subjected to antigen binding selection. Following enrichment of antigen-binding clones, 2 clones were selected for each antibody and recombinantly expressed as antigen-binding fragments (Fabs). In both cases, the reverse-engineered Fabs exhibited identical antigen binding affinity, within error, as Fabs produced from the commercial IgGs. This combination of proteomic and protein engineering techniques provides a useful approach to simplifying the technically challenging process of reverse engineering monoclonal antibodies from protein material.

  7. OPAL: prediction of MoRF regions in intrinsically disordered protein sequences.

    PubMed

    Sharma, Ronesh; Raicar, Gaurav; Tsunoda, Tatsuhiko; Patil, Ashwini; Sharma, Alok

    2018-06-01

    Intrinsically disordered proteins lack stable 3-dimensional structure and play a crucial role in performing various biological functions. Key to their biological function are the molecular recognition features (MoRFs) located within long disordered regions. Computationally identifying these MoRFs from disordered protein sequences is a challenging task. In this study, we present a new MoRF predictor, OPAL, to identify MoRFs in disordered protein sequences. OPAL utilizes two independent sources of information computed using different component predictors. The scores are processed and combined using a common averaging method. The first score is computed using a component MoRF predictor which utilizes composition and sequence similarity of MoRF and non-MoRF regions to detect MoRFs. The second score is calculated using half-sphere exposure (HSE), solvent accessible surface area (ASA) and backbone angle information of the disordered protein sequence, using information from the amino acid properties of flanks surrounding the MoRFs to distinguish MoRF and non-MoRF residues. OPAL is evaluated using test sets that were previously used to evaluate the MoRF predictors MoRFpred, MoRFchibi and MoRFchibi-web. The results demonstrate that OPAL outperforms all the available MoRF predictors and is the most accurate predictor available for MoRF prediction. It is available at http://www.alok-ai-lab.com/tools/opal/. Contact: ashwini@hgc.jp or alok.sharma@griffith.edu.au. Supplementary data are available at Bioinformatics online.
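
    The score-combination step mentioned in this abstract can be illustrated with a small sketch that averages two per-residue score lists and applies a cutoff. The abstract does not describe how the scores are processed before averaging, so this shows only a plain arithmetic mean. The function names and the 0.5 threshold are hypothetical and are not taken from OPAL.

```python
def combine_scores(scores_a, scores_b):
    """Average two per-residue score lists position by position."""
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must cover the same residues")
    return [(a + b) / 2 for a, b in zip(scores_a, scores_b)]


def predicted_morf_positions(scores, threshold=0.5):
    """Return 0-based residue indices whose combined score reaches `threshold`.

    The threshold is an illustrative choice, not a published cutoff.
    """
    return [i for i, score in enumerate(scores) if score >= threshold]
```

    Averaging only helps when the two component predictors make partly independent errors, which is the motivation the abstract gives for using two different information sources.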

  8. EST-PAC a web package for EST annotation and protein sequence prediction

    PubMed Central

    Strahm, Yvan; Powell, David; Lefèvre, Christophe

    2006-01-01

    With the decreasing cost of DNA sequencing technology and the vast diversity of biological resources, researchers increasingly face the basic challenge of annotating a larger number of expressed sequence tags (ESTs) from a variety of species. This typically consists of a series of repetitive tasks, which should be automated and easy to use. The results of these annotation tasks need to be stored and organized in a consistent way. All these operations should be self-installing, platform independent, easy to customize and amenable to using distributed bioinformatics resources available on the Internet. In order to address these issues, we present EST-PAC, a web-oriented multi-platform software package for expressed sequence tag (EST) annotation. EST-PAC provides a solution for the administration of EST and protein sequence annotations accessible through a web interface. Three aspects of EST annotation are automated: 1) searching local or remote biological databases for sequence similarities using Blast services, 2) predicting protein coding sequence from EST data, and 3) annotating predicted protein sequences with functional domain predictions. In practice, EST-PAC integrates the BLASTALL suite, EST-Scan2 and HMMER in a relational database system accessible through a simple web interface. EST-PAC also takes advantage of the relational database to allow consistent storage, powerful queries of results, and management of the annotation process. The system allows users to customize annotation strategies and provides an open-source data-management environment for research and education in bioinformatics. PMID:17147782

  9. Regulation of iron assimilation: nucleotide sequence analysis of an iron-regulated promoter from a fluorescent pseudomonad.

    PubMed

    O'Sullivan, D J; O'Gara, F

    1991-08-01

    An iron-regulated promoter was cloned on a 2.1 kb BglII fragment from Pseudomonas sp. strain M114 and fused to the lacZ reporter gene. Iron-regulated lacZ expression from the resulting construct (pSP1) in strain M114 was mediated via the Fur-like repressor which also regulates siderophore production in this strain. A 390 bp StuI-PstI internal fragment contained the necessary information for iron-regulated promoter expression. This fragment was sequenced and the initiation point for transcription was determined by primer extension analysis. The region directly upstream of the transcription start point contained no significant homology to known promoter consensus sequences. However, the -16 to -25 bp region contained homology to four other iron-regulated pseudomonad promoters. Deletion of bases downstream from the transcriptional start did not affect the iron-regulated expression of the promoter. The -37 and -43 bp regions exhibited some homology to the 19 bp Escherichia coli Fur-binding consensus sequence. When expressed in E. coli (via a cloned trans-acting factor from strain M114) lacZ expression from pSP1 was found to be regulated by iron. A region of greater than 77 bases but less than 131 upstream from the transcriptional start was found to be necessary for promoter activity, further suggesting that a transcriptional activator may be required for expression.
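
    A comparison against a 19 bp consensus, as described in this abstract, can be sketched as an ungapped scan that reports the window with the most identical bases. The consensus string below is the commonly cited E. coli Fur box and is supplied here as an assumption, since the abstract does not print the sequence. The function name is hypothetical, and only the given strand is scanned.

```python
# Commonly cited 19 bp E. coli Fur-binding consensus (assumption: the
# abstract refers to the consensus but does not spell it out).
FUR_BOX = "GATAATGATAATCATTATC"


def best_consensus_match(dna, consensus=FUR_BOX):
    """Return (start, identities) for the window of `dna` closest to `consensus`.

    Ungapped comparison on the given strand only; ties keep the leftmost
    window. Returns (-1, -1) if `dna` is shorter than the consensus.
    """
    dna = dna.upper()
    width = len(consensus)
    best = (-1, -1)
    for start in range(len(dna) - width + 1):
        window = dna[start:start + width]
        identities = sum(1 for x, y in zip(consensus, window) if x == y)
        if identities > best[1]:
            best = (start, identities)
    return best
```

    Dividing the identity count by 19 gives a percent identity that can be compared across candidate promoter regions.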

  10. CDSbank: taxonomy-aware extraction, selection, renaming and formatting of protein-coding DNA or amino acid sequences.

    PubMed

    Hazes, Bart

    2014-02-28

    Protein-coding DNA sequences and their corresponding amino acid sequences are routinely used to study relationships between sequence, structure, function, and evolution. The rapidly growing size of sequence databases increases the power of such comparative analyses but it makes it more challenging to prepare high quality sequence data sets with control over redundancy, quality, completeness, formatting, and labeling. Software tools for some individual steps in this process exist but manual intervention remains a common and time consuming necessity. CDSbank is a database that stores both the protein-coding DNA sequence (CDS) and amino acid sequence for each protein annotated in Genbank. CDSbank also stores Genbank feature annotation, a flag to indicate incomplete 5' and 3' ends, full taxonomic data, and a heuristic to rank the scientific interest of each species. This rich information allows fully automated data set preparation with a level of sophistication that aims to meet or exceed manual processing. Defaults ensure ease of use for typical scenarios while allowing great flexibility when needed. Access is via a free web server at http://hazeslab.med.ualberta.ca/CDSbank/. CDSbank presents a user-friendly web server to download, filter, format, and name large sequence data sets. Common usage scenarios can be accessed via pre-programmed default choices, while optional sections give full control over the processing pipeline. Particular strengths are: extract protein-coding DNA sequences just as easily as amino acid sequences, full access to taxonomy for labeling and filtering, awareness of incomplete sequences, and the ability to take one protein sequence and extract all synonymous CDS or identical protein sequences in other species. Finally, CDSbank can also create labeled property files to, for instance, annotate or re-label phylogenetic trees.

  11. Mass spectrometry-based protein identification by integrating de novo sequencing with database searching.

    PubMed

    Wang, Penghao; Wilson, Susan R

    2013-01-01

    Mass spectrometry-based protein identification is a very challenging task. The main identification approaches include de novo sequencing and database searching. Both approaches have shortcomings, so an integrative approach has been developed. The integrative approach firstly infers partial peptide sequences, known as tags, directly from tandem spectra through de novo sequencing, and then puts these sequences into a database search to see if a close peptide match can be found. However the current implementation of this integrative approach has several limitations. Firstly, simplistic de novo sequencing is applied and only very short sequence tags are used. Secondly, most integrative methods apply an algorithm similar to BLAST to search for exact sequence matches and do not accommodate sequence errors well. Thirdly, by applying these methods the integrated de novo sequencing makes a limited contribution to the scoring model which is still largely based on database searching. We have developed a new integrative protein identification method which can integrate de novo sequencing more efficiently into database searching. Evaluated on large real datasets, our method outperforms popular identification methods.
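    The tag-then-search idea described in this abstract can be illustrated with a minimal sketch. This is not the authors' method: the function names, the toy database, and the one-mismatch tolerance are assumptions chosen only to show how short de novo tags could be used to shortlist database peptides while tolerating a sequencing error.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def tag_hits(tag, peptide, max_mismatches=1):
    """Start offsets where `tag` aligns to `peptide` with few mismatches."""
    k = len(tag)
    return [
        i
        for i in range(len(peptide) - k + 1)
        if hamming(tag, peptide[i:i + k]) <= max_mismatches
    ]


def candidates(tags, database, max_mismatches=1):
    """Map peptide id -> number of tags that hit it at least once.

    `database` is a hypothetical dict of peptide id -> sequence.
    """
    scores = {}
    for pid, peptide in database.items():
        n = sum(1 for t in tags if tag_hits(t, peptide, max_mismatches))
        if n:
            scores[pid] = n
    return scores


# Toy data (hypothetical): the second tag carries one sequencing error.
database = {"p1": "LSEQVTAGK", "p2": "MNNPQRSTK"}
shortlist = candidates(["SEQV", "TAGR"], database)
```

    A real scoring model would weigh the tag evidence together with the spectrum match; the count of hitting tags used here is only a placeholder for that step.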

  12. Promoter architecture and transcriptional regulation of Abf1-dependent ribosomal protein genes in Saccharomyces cerevisiae

    PubMed Central

    Fermi, Beatrice; Bosio, Maria Cristina; Dieci, Giorgio

    2016-01-01

    In Saccharomyces cerevisiae, ribosomal protein gene (RPG) promoters display binding sites for either Rap1 or Abf1 transcription factors. Unlike Rap1-associated promoters, the small cohort of Abf1-dependent RPGs (Abf1-RPGs) has not been extensively investigated. We show that RPL3, RPL4B, RPP1A, RPS22B and RPS28A/B share a common promoter architecture, with an Abf1 site upstream of a conserved element matching the sequence recognized by Fhl1, a transcription factor which together with Ifh1 orchestrates Rap1-associated RPG regulation. Abf1 and Fhl1 promoter association was confirmed by ChIP and/or gel retardation assays. Mutational analysis revealed a more severe requirement of Abf1 than Fhl1 binding sites for RPG transcription. In the case of RPS22B an unusual Tbf1 binding site promoted both RPS22B and intron-hosted SNR44 expression. Abf1-RPG down-regulation upon TOR pathway inhibition was much attenuated at defective mutant promoters unable to bind Abf1. TORC1 inactivation caused the expected reduction of Ifh1 occupancy at RPS22B and RPL3 promoters, but unexpectedly it entailed largely increased Abf1 association with Abf1-RPG promoters. We present evidence that Abf1 recruitment upon nutritional stress, also observed for representative ribosome biogenesis genes, favours RPG transcriptional rescue upon nutrient replenishment, thus pointing to nutrient-regulated Abf1 dynamics at promoters as a novel mechanism in ribosome biogenesis control. PMID:27016735

  13. HNRNPLL stabilizes mRNAs for DNA replication proteins and promotes cell cycle progression in colorectal cancer cells.

    PubMed

    Sakuma, Keiichiro; Sasaki, Eiichi; Kimura, Kenya; Komori, Koji; Shimizu, Yasuhiro; Yatabe, Yasushi; Aoki, Masahiro

    2018-06-05

    HNRNPLL (heterogeneous nuclear ribonucleoprotein L-like), an RNA-binding protein that regulates alternative splicing of pre-mRNAs, has been shown to regulate differentiation of lymphocytes, as well as metastasis of colorectal cancer cells. Here we show that HNRNPLL promotes cell cycle progression and hence proliferation of colorectal cancer cells. Functional annotation analysis of those genes whose expression levels were changed by three-fold or more in RNA sequencing analysis between SW480 cells overexpressing HNRNPLL and those knocked down for HNRNPLL revealed enrichment of DNA replication-related genes by HNRNPLL overexpression. Among 13 genes detected in the DNA replication pathway, PCNA, RFC3, and FEN1 showed reproducible upregulation by HNRNPLL overexpression both at mRNA and protein levels in SW480 and HT29 cells. Importantly, knockdown of any of these genes alone suppressed the proliferation promoting effect induced by HNRNPLL overexpression. RNA-immunoprecipitation assay presented a binding of FLAG-tagged HNRNPLL to mRNA of these genes, and HNRNPLL overexpression significantly suppressed the downregulation of these genes during 12 hours of actinomycin D treatment, suggesting a role of HNRNPLL in mRNA stability. Finally, analysis of a public RNA sequencing dataset of clinical samples suggested a link between overexpression of HNRNPLL and that of PCNA, RFC3, and FEN1. This link was further supported by immunohistochemistry of colorectal cancer clinical samples, whereas expression of CDKN1A, which is known to inhibit the cooperative function of PCNA, RFC3, and FEN1, was negatively associated with HNRNPLL expression. These results indicate that HNRNPLL stabilizes mRNAs encoding regulators of DNA replication and promotes colorectal cancer cell proliferation. This article is protected by copyright. All rights reserved.

  14. Amino acid sequences of ribosomal proteins S11 from Bacillus stearothermophilus and S19 from Halobacterium marismortui. Comparison of the ribosomal protein S11 family.

    PubMed

    Kimura, M; Kimura, J; Hatakeyama, T

    1988-11-21

    The complete amino acid sequences of ribosomal proteins S11 from the Gram-positive eubacterium Bacillus stearothermophilus and of S19 from the archaebacterium Halobacterium marismortui have been determined. A search for homologous sequences of these proteins revealed that they belong to the ribosomal protein S11 family. Homologous proteins have previously been sequenced from Escherichia coli as well as from chloroplast, yeast and mammalian ribosomes. A pairwise comparison of the amino acid sequences showed that Bacillus protein S11 shares 68% identical residues with S11 from Escherichia coli and a slightly lower homology (52%) with the homologous chloroplast protein. The halophilic protein S19 is more related to the eukaryotic (45-49%) than to the eubacterial counterparts (35%).
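    The percent-identity figures quoted above come from pairwise comparison of aligned sequences. As a minimal sketch, identity can be computed over the aligned columns where neither sequence has a gap; the abstract does not state which denominator the authors used, so this convention, the function name, and the toy sequences are assumptions.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns with no gap in either row.

    Both inputs must be rows of the same pairwise alignment (equal length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip gapped columns (assumed convention)
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy alignment (hypothetical sequences, not the S11 proteins).
identity = percent_identity("MKT-AY", "MRTLAY")
```

    Other conventions divide by the full alignment length or by the length of the shorter sequence, which gives lower values when gaps are present.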

  15. Efficient use of unlabeled data for protein sequence classification: a comparative study

    PubMed Central

    Kuksa, Pavel; Huang, Pai-Hsi; Pavlovic, Vladimir

    2009-01-01

    Background: Recent studies in computational primary protein sequence analysis have leveraged the power of unlabeled data. For example, predictive models based on string kernels trained on sequences known to belong to particular folds or superfamilies, the so-called labeled data set, can attain significantly improved accuracy if this data is supplemented with protein sequences that lack any class tags (the unlabeled data). In this study, we present a principled and biologically motivated computational framework that more effectively exploits the unlabeled data by only using the sequence regions that are more likely to be biologically relevant for better prediction accuracy. As overly-represented sequences in large uncurated databases may bias the estimation of computational models that rely on unlabeled data, we also propose a method to remove this bias and improve performance of the resulting classifiers. Results: Combined with state-of-the-art string kernels, our proposed computational framework achieves very accurate semi-supervised protein remote fold and homology detection on three large unlabeled databases. It outperforms current state-of-the-art methods and exhibits significant reduction in running time. Conclusion: The unlabeled sequences used under the semi-supervised setting resemble the unpolished gemstones; when used as-is, they may carry unnecessary features and hence compromise the classification accuracy but once cut and polished, they improve the accuracy of the classifiers considerably. PMID:19426450

  16. Efficient use of unlabeled data for protein sequence classification: a comparative study.

    PubMed

    Kuksa, Pavel; Huang, Pai-Hsi; Pavlovic, Vladimir

    2009-04-29

    Recent studies in computational primary protein sequence analysis have leveraged the power of unlabeled data. For example, predictive models based on string kernels trained on sequences known to belong to particular folds or superfamilies, the so-called labeled data set, can attain significantly improved accuracy if this data is supplemented with protein sequences that lack any class tags-the unlabeled data. In this study, we present a principled and biologically motivated computational framework that more effectively exploits the unlabeled data by only using the sequence regions that are more likely to be biologically relevant for better prediction accuracy. As overly-represented sequences in large uncurated databases may bias the estimation of computational models that rely on unlabeled data, we also propose a method to remove this bias and improve performance of the resulting classifiers. Combined with state-of-the-art string kernels, our proposed computational framework achieves very accurate semi-supervised protein remote fold and homology detection on three large unlabeled databases. It outperforms current state-of-the-art methods and exhibits significant reduction in running time. The unlabeled sequences used under the semi-supervised setting resemble the unpolished gemstones; when used as-is, they may carry unnecessary features and hence compromise the classification accuracy but once cut and polished, they improve the accuracy of the classifiers considerably.

  17. TRDistiller: a rapid filter for enrichment of sequence datasets with proteins containing tandem repeats.

    PubMed

    Richard, François D; Kajava, Andrey V

    2014-06-01

    The dramatic growth of sequencing data evokes an urgent need to improve bioinformatics tools for large-scale proteome analysis. Over the last two decades, the foremost efforts of computer scientists were devoted to proteins with aperiodic sequences having globular 3D structures. However, a large portion of proteins contain periodic sequences representing arrays of repeats that are directly adjacent to each other (so called tandem repeats or TRs). These proteins frequently fold into elongated fibrous structures carrying different fundamental functions. Algorithms specific to the analysis of these regions are urgently required since the conventional approaches developed for globular domains have had limited success when applied to the TR regions. The protein TRs are frequently not perfect, containing a number of mutations, and some of them cannot be easily identified. To detect such "hidden" repeats several algorithms have been developed. However, the most sensitive among them are time-consuming and, therefore, inappropriate for large scale proteome analysis. To speed up the TR detection we developed a rapid filter that is based on the comparison of composition and order of short strings in the adjacent sequence motifs. Tests show that our filter discards up to 22.5% of proteins which are known to be without TRs while keeping almost all (99.2%) TR-containing sequences. Thus, we are able to decrease the size of the initial sequence dataset enriching it with TR-containing proteins which allows a faster subsequent TR detection by other methods. The program is available upon request. Copyright © 2014 Elsevier Inc. All rights reserved.

  18. Characterizing informative sequence descriptors and predicting binding affinities of heterodimeric protein complexes.

    PubMed

    Srinivasulu, Yerukala Sathipati; Wang, Jyun-Rong; Hsu, Kai-Ti; Tsai, Ming-Ju; Charoenkwan, Phasit; Huang, Wen-Lin; Huang, Hui-Ling; Ho, Shinn-Ying

    2015-01-01

    Protein-protein interactions (PPIs) are involved in various biological processes, and underlying mechanism of the interactions plays a crucial role in therapeutics and protein engineering. Most machine learning approaches have been developed for predicting the binding affinity of protein-protein complexes based on structure and functional information. This work aims to predict the binding affinity of heterodimeric protein complexes from sequences only. This work proposes a support vector machine (SVM) based binding affinity classifier, called SVM-BAC, to classify heterodimeric protein complexes based on the prediction of their binding affinity. SVM-BAC identified 14 of 580 sequence descriptors (physicochemical, energetic and conformational properties of the 20 amino acids) to classify 216 heterodimeric protein complexes into low and high binding affinity. SVM-BAC yielded the training accuracy, sensitivity, specificity, AUC and test accuracy of 85.80%, 0.89, 0.83, 0.86 and 83.33%, respectively, better than existing machine learning algorithms. The 14 features and support vector regression were further used to estimate the binding affinities (Pkd) of 200 heterodimeric protein complexes. Prediction performance of a Jackknife test was the correlation coefficient of 0.34 and mean absolute error of 1.4. We further analyze three informative physicochemical properties according to their contribution to prediction performance. Results reveal that the following properties are effective in predicting the binding affinity of heterodimeric protein complexes: apparent partition energy based on buried molar fractions, relations between chemical structure and biological activity in principal component analysis IV, and normalized frequency of beta turn. The proposed sequence-based prediction method SVM-BAC uses an optimal feature selection method to identify 14 informative features to classify and predict binding affinity of heterodimeric protein complexes. The characterization

  19. Isolation and characterization of target sequences of the chicken CdxA homeobox gene.

    PubMed Central

    Margalit, Y; Yarus, S; Shapira, E; Gruenbaum, Y; Fainsod, A

    1993-01-01

    The DNA binding specificity of the chicken homeodomain protein CDXA was studied. Using a CDXA-glutathione-S-transferase fusion protein, DNA fragments containing the binding site for this protein were isolated. The sources of DNA were oligonucleotides with random sequence and chicken genomic DNA. The DNA fragments isolated were sequenced and tested in DNA binding assays. Sequencing revealed that most DNA fragments are AT rich which is a common feature of homeodomain binding sites. By electrophoretic mobility shift assays it was shown that the different target sequences isolated bind to the CDXA protein with different affinities. The specific sequences bound by the CDXA protein in the genomic fragments isolated, were determined by DNase I footprinting. From the footprinted sequences, the CDXA consensus binding site was determined. The CDXA protein binds the consensus sequence A, A/T, T, A/T, A, T, A/G. The CAUDAL binding site in the ftz promoter is also included in this consensus sequence. When tested, some of the genomic target sequences were capable of enhancing the transcriptional activity of reporter plasmids when introduced into CDXA expressing cells. This study determined the DNA sequence specificity of the CDXA protein and it also shows that this protein can further activate transcription in cells in culture. PMID:7909943
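    The consensus reported above (A, A/T, T, A/T, A, T, A/G) maps directly onto a regular expression, which gives a quick way to scan a DNA string for candidate sites. The sketch below is illustrative only: the function name and the example sequence are hypothetical, only the given strand is scanned, and a consensus match says nothing about the binding affinity measured in the study.

```python
import re

# Consensus from the abstract: A, A/T, T, A/T, A, T, A/G.
# The lookahead lets overlapping matches be reported as well.
CDXA_CONSENSUS = re.compile(r"(?=(A[AT]T[AT]AT[AG]))")


def find_cdxa_sites(seq):
    """Return (0-based start, matched 7-mer) for each consensus match."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in CDXA_CONSENSUS.finditer(seq)]


# Hypothetical test sequence containing two consensus matches.
sites = find_cdxa_sites("GGCATTTATGCCAATAATACC")
```

    Scanning the reverse complement as well would be needed to cover both strands.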

  20. Sequence-Based Prediction of RNA-Binding Residues in Proteins.

    PubMed

    Walia, Rasna R; El-Manzalawy, Yasser; Honavar, Vasant G; Dobbs, Drena

    2017-01-01

    Identifying individual residues in the interfaces of protein-RNA complexes is important for understanding the molecular determinants of protein-RNA recognition and has many potential applications. Recent technical advances have led to several high-throughput experimental methods for identifying partners in protein-RNA complexes, but determining RNA-binding residues in proteins is still expensive and time-consuming. This chapter focuses on available computational methods for identifying which amino acids in an RNA-binding protein participate directly in contacting RNA. Step-by-step protocols for using three different web-based servers to predict RNA-binding residues are described. In addition, currently available web servers and software tools for predicting RNA-binding sites, as well as databases that contain valuable information about known protein-RNA complexes, RNA-binding motifs in proteins, and protein-binding recognition sites in RNA are provided. We emphasize sequence-based methods that can reliably identify interfacial residues without the requirement for structural information regarding either the RNA-binding protein or its RNA partner.

  1. Promoter/repressor system of Lactobacillus plantarum phage φg1e: characterization of the promoters pR49-pR-pL and overproduction of the cro-like protein cng in Escherichia coli.

    PubMed

    Kakikawa, M; Watanabe, N; Funawatashi, T; Oki, M; Yasukawa, H; Taketo, A; Kodaira, K I

    1998-07-30

    The Lactobacillus plantarum phage φg1e (42259 bp) has two repressor-like genes cng and cpg oriented oppositely, accompanied by three potential promoters pR, pL and pR49, and seven operator-like sequences (GATAC-boxes) (Kodaira et al., 1997). In this study, the φg1e putative promoters were introduced into the Escherichia coli promoter-detecting plasmid pKK232-8. In E. coli CK111, pR (pKPR1), pL (pKPL1) and pR49 (pKPR49) exhibited distinct CAT activities. When pKPR1 or pKPL1 was coexistent with a compatible plasmid pACYC184 carrying pR-cng (pA4PRCN1), the CAT activity was decreased significantly. On the other hand, cng directed a protein (Cng) of 10.1 kDa in E. coli under the control of T7 promoter. Gel mobility-shift assays demonstrated that Cng binds specifically to a DNA region containing the GATAC-boxes. In addition, primer extension analyses demonstrated that the two sequences pR and pL act as a promoter in L. plantarum as well as in E. coli. These results suggested that the potential promoters pR and pL probably function for the lytic and lysogenic pathways, respectively, and Cng may act as a repressor presumably through the GATAC-boxes as operators.

  2. STAT1:DNA sequence-dependent binding modulation by phosphorylation, protein:protein interactions and small-molecule inhibition

    PubMed Central

    Bonham, Andrew J.; Wenta, Nikola; Osslund, Leah M.; Prussin, Aaron J.; Vinkemeier, Uwe; Reich, Norbert O.

    2013-01-01

    The DNA-binding specificity and affinity of the dimeric human transcription factor (TF) STAT1, were assessed by total internal reflectance fluorescence protein-binding microarrays (TIRF-PBM) to evaluate the effects of protein phosphorylation, higher-order polymerization and small-molecule inhibition. Active, phosphorylated STAT1 showed binding preferences consistent with prior characterization, whereas unphosphorylated STAT1 showed a weak-binding preference for one-half of the GAS consensus site, consistent with recent models of STAT1 structure and function in response to phosphorylation. This altered-binding preference was further tested by use of the inhibitor LLL3, which we show to disrupt STAT1 binding in a sequence-dependent fashion. To determine if this sequence-dependence is specific to STAT1 and not a general feature of human TF biology, the TF Myc/Max was analysed and tested with the inhibitor Mycro3. Myc/Max inhibition by Mycro3 is sequence independent, suggesting that the sequence-dependent inhibition of STAT1 may be specific to this system and a useful target for future inhibitor design. PMID:23180800

  3. Characterization of tannase protein sequences of bacteria and fungi: an in silico study.

    PubMed

    Banerjee, Amrita; Jana, Arijit; Pati, Bikash R; Mondal, Keshab C; Das Mohapatra, Pradeep K

    2012-04-01

    The tannase protein sequences of 149 bacteria and 36 fungi were retrieved from the NCBI database. Of these, only the 77 bacterial and 31 fungal tannase sequences with distinct amino acid compositions were retained. These sequences were analysed for different physical and chemical properties, superfamily search, multiple sequence alignment, phylogenetic tree construction and motif finding to identify the functional motif and the evolutionary relationship among them. The superfamily search for these tannases revealed the occurrence of the proline iminopeptidase-like, biotin biosynthesis protein BioH, O-acetyltransferase, carboxylesterase/thioesterase 1, carbon-carbon bond hydrolase, haloperoxidase, prolyl oligopeptidase, C-terminal domain and mycobacterial antigens families and the alpha/beta hydrolase superfamily. Some bacterial and fungal sequences showed similarity with different families individually. The multiple sequence alignment of these tannase protein sequences showed conserved regions at different stretches, with maximum homology from amino acid residues 389-469 and 482-523, which could be used for designing degenerate primers or probes specific for tannase-producing bacterial and fungal species. The phylogenetic tree showed two different clusters: one contains only bacteria and the other contains both fungi and bacteria, showing some relationship between these different genera. In the second cluster, however, nearly all fungal species were found together in one corner, which indicates sequence-level similarity among fungal genera. Analysis of the distribution of fourteen motifs revealed that Motif 1, with a signature sequence of 29 amino acids (GCSTGGREALKQAQRWPHDYDGIIANNPA), was uniformly observed in 83.3% of the studied tannase sequences, suggesting its involvement in the structure and enzymatic function.

  4. Large-scale chromatin immunoprecipitation with promoter sequence microarray analysis of the interaction of the NSs protein of Rift Valley fever virus with regulatory DNA regions of the host genome.

    PubMed

    Benferhat, Rima; Josse, Thibaut; Albaud, Benoit; Gentien, David; Mansuroglu, Zeyni; Marcato, Vasco; Souès, Sylvie; Le Bonniec, Bernard; Bouloy, Michèle; Bonnefoy, Eliette

    2012-10-01

    Rift Valley fever virus (RVFV) is a highly pathogenic Phlebovirus that infects humans and ruminants. Initially confined to Africa, RVFV has spread outside Africa and presently represents a high risk to other geographic regions. It is responsible for high fatality rates in sheep and cattle. In humans, RVFV can induce hepatitis, encephalitis, retinitis, or fatal hemorrhagic fever. The nonstructural NSs protein that is the major virulence factor is found in the nuclei of infected cells where it associates with cellular transcription factors and cofactors. In previous work, we have shown that NSs interacts with the promoter region of the beta interferon gene abnormally maintaining the promoter in a repressed state. In this work, we performed a genome-wide analysis of the interactions between NSs and the host genome using a genome-wide chromatin immunoprecipitation combined with promoter sequence microarray, the ChIP-on-chip technique. Several cellular promoter regions were identified as significantly interacting with NSs, and the establishment of NSs interactions with these regions was often found linked to deregulation of expression of the corresponding genes. Among annotated NSs-interacting genes were present not only genes regulating innate immunity and inflammation but also genes regulating cellular pathways that have not yet been identified as targeted by RVFV. Several of these pathways, such as cell adhesion, axonal guidance, development, and coagulation were closely related to RVFV-induced disorders. In particular, we show in this work that NSs targeted and modified the expression of genes coding for coagulation factors, demonstrating for the first time that this hemorrhagic virus impairs the host coagulation cascade at the transcriptional level.

  5. Large-Scale Chromatin Immunoprecipitation with Promoter Sequence Microarray Analysis of the Interaction of the NSs Protein of Rift Valley Fever Virus with Regulatory DNA Regions of the Host Genome

    PubMed Central

    Benferhat, Rima; Josse, Thibaut; Albaud, Benoit; Gentien, David; Mansuroglu, Zeyni; Marcato, Vasco; Souès, Sylvie; Le Bonniec, Bernard

    2012-01-01

    Rift Valley fever virus (RVFV) is a highly pathogenic Phlebovirus that infects humans and ruminants. Initially confined to Africa, RVFV has spread outside Africa and presently represents a high risk to other geographic regions. It is responsible for high fatality rates in sheep and cattle. In humans, RVFV can induce hepatitis, encephalitis, retinitis, or fatal hemorrhagic fever. The nonstructural NSs protein that is the major virulence factor is found in the nuclei of infected cells where it associates with cellular transcription factors and cofactors. In previous work, we have shown that NSs interacts with the promoter region of the beta interferon gene abnormally maintaining the promoter in a repressed state. In this work, we performed a genome-wide analysis of the interactions between NSs and the host genome using a genome-wide chromatin immunoprecipitation combined with promoter sequence microarray, the ChIP-on-chip technique. Several cellular promoter regions were identified as significantly interacting with NSs, and the establishment of NSs interactions with these regions was often found linked to deregulation of expression of the corresponding genes. Among annotated NSs-interacting genes were present not only genes regulating innate immunity and inflammation but also genes regulating cellular pathways that have not yet been identified as targeted by RVFV. Several of these pathways, such as cell adhesion, axonal guidance, development, and coagulation were closely related to RVFV-induced disorders. In particular, we show in this work that NSs targeted and modified the expression of genes coding for coagulation factors, demonstrating for the first time that this hemorrhagic virus impairs the host coagulation cascade at the transcriptional level. PMID:22896612

  6. Seminal Plasma Proteins as Androgen Receptor Corregulators Promote Prostate Cancer Growth

    DTIC Science & Technology

    2016-12-01

    lines as well as the peptides described above, we will assess the efficacy of SgI peptides on tumor growth in a mouse xenograft model. Opportunities... Award Number: W81XWH-13-1-0412. Title: Seminal Plasma Proteins as Androgen Receptor Corregulators Promote Prostate Cancer Growth. Principal...

  7. Flow-induced protein kinase A–CREB pathway acts via BMP signaling to promote HSC emergence

    PubMed Central

    Kim, Peter Geon; Nakano, Haruko; Das, Partha P.; Chen, Michael J.; Rowe, R. Grant; Chou, Stephanie S.; Ross, Samantha J.; Sakamoto, Kathleen M.; Zon, Leonard I.; Schlaeger, Thorsten M.; Orkin, Stuart H.; Nakano, Atsushi

    2015-01-01

    Fluid shear stress promotes the emergence of hematopoietic stem cells (HSCs) in the aorta–gonad–mesonephros (AGM) of the developing mouse embryo. We determined that the AGM is enriched for expression of targets of protein kinase A (PKA)–cAMP response element-binding protein (CREB), a pathway activated by fluid shear stress. By analyzing CREB genomic occupancy from chromatin-immunoprecipitation sequencing (ChIP-seq) data, we identified the bone morphogenetic protein (BMP) pathway as a potential regulator of CREB. By chemical modulation of the PKA–CREB and BMP pathways in isolated AGM VE-cadherin+ cells from mid-gestation embryos, we demonstrate that PKA–CREB regulates hematopoietic engraftment and clonogenicity of hematopoietic progenitors, and is dependent on secreted BMP ligands through the type I BMP receptor. Finally, we observed blunting of this signaling axis using Ncx1-null embryos, which lack a heartbeat and intravascular flow. Collectively, we have identified a novel PKA–CREB–BMP signaling pathway downstream of shear stress that regulates HSC emergence in the AGM via the endothelial-to-hematopoietic transition. PMID:25870201

  8. DNA Multiple Sequence Alignment Guided by Protein Domains: The MSA-PAD 2.0 Method.

    PubMed

    Balech, Bachir; Monaco, Alfonso; Perniola, Michele; Santamaria, Monica; Donvito, Giacinto; Vicario, Saverio; Maggi, Giorgio; Pesole, Graziano

    2018-01-01

    Multiple sequence alignment (MSA) is a fundamental component in many DNA sequence analyses including metagenomics studies and phylogeny inference. When guided by protein profiles, DNA multiple alignments assume a higher precision and robustness. Here we present details of the use of the upgraded version of MSA-PAD (2.0), which is a DNA multiple sequence alignment framework able to align DNA sequences coding for single/multiple protein domains guided by PFAM or user-defined annotations. MSA-PAD has two alignment strategies, called "Gene" and "Genome," accounting for coding domains order and genomic rearrangements, respectively. Novel options were added to the present version, where the MSA can be guided by protein profiles provided by the user. This allows MSA-PAD 2.0 to run faster and to add custom protein profiles sometimes not present in PFAM database according to the user's interest. MSA-PAD 2.0 is currently freely available as a Web application at https://recasgateway.cloud.ba.infn.it/ .

  9. Automated sequence-specific protein NMR assignment using the memetic algorithm MATCH.

    PubMed

    Volk, Jochen; Herrmann, Torsten; Wüthrich, Kurt

    2008-07-01

    MATCH (Memetic Algorithm and Combinatorial Optimization Heuristics) is a new memetic algorithm for automated sequence-specific polypeptide backbone NMR assignment of proteins. MATCH employs local optimization for tracing partial sequence-specific assignments within a global, population-based search environment, where the simultaneous application of local and global optimization heuristics guarantees high efficiency and robustness. MATCH thus makes combined use of the two predominant concepts in use for automated NMR assignment of proteins. Dynamic transition and inherent mutation are new techniques that enable automatic adaptation to variable quality of the experimental input data. The concept of dynamic transition is incorporated in all major building blocks of the algorithm, where it enables switching between local and global optimization heuristics at any time during the assignment process. Inherent mutation restricts the intrinsically required randomness of the evolutionary algorithm to those regions of the conformation space that are compatible with the experimental input data. Using intact and artificially deteriorated APSY-NMR input data of proteins, MATCH performed sequence-specific resonance assignment with high efficiency and robustness.

  10. An Internal Signal Sequence Directs Intramembrane Proteolysis of a Cellular Immunoglobulin Domain Protein

    PubMed Central

    Robakis, Thalia; Bak, Beata; Lin, Shu-huei; Bernard, Daniel J.; Scheiffele, Peter

    2008-01-01

    Precursor proteolysis is a crucial mechanism for regulating protein structure and function. Signal peptidase (SP) is an enzyme with a well defined role in cleaving N-terminal signal sequences but no demonstrated function in the proteolysis of cellular precursor proteins. We provide evidence that SP mediates intraprotein cleavage of IgSF1, a large cellular Ig domain protein that is processed into two separate Ig domain proteins. In addition, our results suggest the involvement of signal peptide peptidase (SPP), an intramembrane protease, which acts on substrates that have been previously cleaved by SP. We show that IgSF1 is processed through sequential proteolysis by SP and SPP. Cleavage is directed by an internal signal sequence and generates two separate Ig domain proteins from a polytopic precursor. Our findings suggest that SP and SPP function are not restricted to N-terminal signal sequence cleavage but also contribute to the processing of cellular transmembrane proteins. PMID:18981173

  11. Progressive structure-based alignment of homologous proteins: Adopting sequence comparison strategies.

    PubMed

    Joseph, Agnel Praveen; Srinivasan, Narayanaswamy; de Brevern, Alexandre G

    2012-09-01

    Comparison of multiple protein structures has a broad range of applications in the analysis of protein structure, function and evolution. Multiple structure alignment tools (MSTAs) are necessary to obtain a simultaneous comparison of a family of related folds. In this study, we have developed a method for multiple structure comparison largely based on sequence alignment techniques. A widely used Structural Alphabet named Protein Blocks (PBs) was used to transform the information on 3D protein backbone conformation into a 1D sequence string. A progressive alignment strategy similar to CLUSTALW was adopted for multiple PB sequence alignment (mulPBA). Highly similar stretches identified by the pairwise alignments are given higher weights during the alignment. The residue equivalences from PB based alignments are used to obtain a three dimensional fit of the structures followed by an iterative refinement of the structural superposition. Systematic comparisons using benchmark datasets of MSTAs underline that the alignment quality is better than MULTIPROT, MUSTANG and the alignments in HOMSTRAD, in more than 85% of the cases. Comparison with other rigid-body and flexible MSTAs also indicates that mulPBA alignments are superior to most of the rigid-body MSTAs and highly comparable to the flexible alignment methods. Copyright © 2012 Elsevier Masson SAS. All rights reserved.

  12. Novel numerical and graphical representation of DNA sequences and proteins.

    PubMed

    Randić, M; Novic, M; Vikić-Topić, D; Plavsić, D

    2006-12-01

    We have introduced novel numerical and graphical representations of DNA, which offer a simple and unique characterization of DNA sequences. The numerical representation of a DNA sequence is given as a sequence of real numbers derived from a unique graphical representation of the standard genetic code. There is no loss of information on the primary structure of a DNA sequence associated with this numerical representation. The novel representations are illustrated with the coding sequences of the first exon of beta-globin gene of half a dozen species in addition to human. The method can be extended to proteins as is exemplified by humanin, a 24-aa peptide that has recently been identified as a specific inhibitor of neuronal cell death induced by familial Alzheimer's disease mutant genes.

  13. The Protein Information Resource: an integrated public resource of functional annotation of proteins

    PubMed Central

    Wu, Cathy H.; Huang, Hongzhan; Arminski, Leslie; Castro-Alvear, Jorge; Chen, Yongxing; Hu, Zhang-Zhi; Ledley, Robert S.; Lewis, Kali C.; Mewes, Hans-Werner; Orcutt, Bruce C.; Suzek, Baris E.; Tsugita, Akira; Vinayaka, C. R.; Yeh, Lai-Su L.; Zhang, Jian; Barker, Winona C.

    2002-01-01

    The Protein Information Resource (PIR) serves as an integrated public resource of functional annotation of protein data to support genomic/proteomic research and scientific discovery. The PIR, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the PIR-International Protein Sequence Database (PSD), the major annotated protein sequence database in the public domain, containing about 250 000 proteins. To improve protein annotation and the coverage of experimentally validated data, a bibliography submission system is developed for scientists to submit, categorize and retrieve literature information. Comprehensive protein information is available from iProClass, which includes family classification at the superfamily, domain and motif levels, structural and functional features of proteins, as well as cross-references to over 40 biological databases. To provide timely and comprehensive protein data with source attribution, we have introduced a non-redundant reference protein database, PIR-NREF. The database consists of about 800 000 proteins collected from PIR-PSD, SWISS-PROT, TrEMBL, GenPept, RefSeq and PDB, with composite protein names and literature data. To promote database interoperability, we provide XML data distribution and open database schema, and adopt common ontologies. The PIR web site (http://pir.georgetown.edu/) features data mining and sequence analysis tools for information retrieval and functional identification of proteins based on both sequence and annotation information. The PIR databases and other files are also available by FTP (ftp://nbrfa.georgetown.edu/pir_databases). PMID:11752247

  14. DNA-binding proteins from marine bacteria expand the known sequence diversity of TALE-like repeats

    PubMed Central

    de Lange, Orlando; Wolf, Christina; Thiel, Philipp; Krüger, Jens; Kleusch, Christian; Kohlbacher, Oliver; Lahaye, Thomas

    2015-01-01

    Transcription Activator-Like Effectors (TALEs) of Xanthomonas bacteria are programmable DNA binding proteins with unprecedented target specificity. Comparative studies into TALE repeat structure and function are hindered by the limited sequence variation among TALE repeats. More sequence-diverse TALE-like proteins are known from Ralstonia solanacearum (RipTALs) and Burkholderia rhizoxinica (Bats), but RipTAL and Bat repeats are conserved with those of TALEs around the DNA-binding residue. We study two novel marine-organism TALE-like proteins (MOrTL1 and MOrTL2), the first to date of non-terrestrial origin. We have assessed their DNA-binding properties and modelled repeat structures. We found that repeats from these proteins mediate sequence specific DNA binding conforming to the TALE code, despite low sequence similarity to TALE repeats, and with novel residues around the BSR. However, MOrTL1 repeats show greater sequence discriminating power than MOrTL2 repeats. Sequence alignments show that there are only three residues conserved between repeats of all TALE-like proteins including the two new additions. This conserved motif could prove useful as an identifier for future TALE-likes. Additionally, comparing MOrTL repeats with those of other TALE-likes suggests a common evolutionary origin for the TALEs, RipTALs and Bats. PMID:26481363

  15. The N-terminal sequence of ribosomal protein L10 from the archaebacterium Halobacterium marismortui and its relationship to eubacterial protein L6 and other ribosomal proteins.

    PubMed

    Dijk, J; van den Broek, R; Nasiulas, G; Beck, A; Reinhardt, R; Wittmann-Liebold, B

    1987-08-01

    The amino-terminal sequence of ribosomal protein L10 from Halobacterium marismortui has been determined up to residue 54, using both a liquid- and a gas-phase sequenator. The two sequences are in good agreement. The protein is clearly homologous to protein HcuL10 from the related strain Halobacterium cutirubrum. Furthermore, a weaker but distinct homology to ribosomal protein L6 from Escherichia coli and Bacillus stearothermophilus can be detected. In addition to 7 identical amino acids in the first 36 residues in all four sequences a number of conservative replacements occurs, of mainly hydrophobic amino acids. In this common region the pattern of conserved amino acids suggests the presence of a beta-alpha fold as it occurs in ribosomal proteins L12 and L30. Furthermore, several potential cases of homology to other ribosomal components of the three ur-kingdoms have been found.

  16. The bioinformatics of nucleotide sequence coding for proteins requiring metal coenzymes and proteins embedded with metals

    NASA Astrophysics Data System (ADS)

    Tremberger, G.; Dehipawala, Sunil; Cheung, E.; Holden, T.; Sullivan, R.; Nguyen, A.; Lieberman, D.; Cheung, T.

    2015-09-01

    All metallo-proteins need post-translation metal incorporation. In fact, the isotope ratio of Fe, Cu, and Zn in physiology and oncology have emerged as an important tool. The nickel containing F430 is the prosthetic group of the enzyme methyl coenzyme M reductase which catalyzes the release of methane in the final step of methano-genesis, a prime energy metabolism candidate for life exploration space mission in the solar system. The 3.5 Gyr early life sulfite reductase as a life switch energy metabolism had Fe-Mo clusters. The nitrogenase for nitrogen fixation 3 billion years ago had Mo. The early life arsenite oxidase needed for anoxygenic photosynthesis energy metabolism 2.8 billion years ago had Mo and Fe. The selection pressure in metal incorporation inside a protein would be quantifiable in terms of the related nucleotide sequence complexity with fractal dimension and entropy values. Simulation model showed that the studied metal-required energy metabolism sequences had at least ten times more selection pressure relatively in comparison to the horizontal transferred sequences in Mealybug, guided by the outcome histogram of the correlation R-sq values. The metal energy metabolism sequence group was compared to the circadian clock KaiC sequence group using magnesium atomic level bond shifting mechanism in the protein, and the simulation model would suggest a much higher selection pressure for the energy life switch sequence group. The possibility of using Kepler 444 as an example of ancient life in Galaxy with the associated exoplanets has been proposed and is further discussed in this report. Examples of arsenic metal bonding shift probed by Synchrotron-based X-ray spectroscopy data and Zn controlled FOXP2 regulated pathways in human and chimp brain studied tissue samples are studied in relationship to the sequence bioinformatics. The analysis results suggest that relatively large metal bonding shift amount is associated with low probability correlation R

  17. A theoretical method to compute sequence dependent configurational properties in charged polymers and proteins.

    PubMed

    Sawle, Lucas; Ghosh, Kingshuk

    2015-08-28

    A general formalism to compute configurational properties of proteins and other heteropolymers with an arbitrary sequence of charges and non-uniform excluded volume interaction is presented. A variational approach is utilized to predict average distance between any two monomers in the chain. The presented analytical model, for the first time, explicitly incorporates the role of sequence charge distribution to determine relative sizes between two sequences that vary not only in total charge composition but also in charge decoration (even when charge composition is fixed). Furthermore, the formalism is general enough to allow variation in excluded volume interactions between two monomers. Model predictions are benchmarked against the all-atom Monte Carlo studies of Das and Pappu [Proc. Natl. Acad. Sci. U. S. A. 110, 13392 (2013)] for 30 different synthetic sequences of polyampholytes. These sequences possess an equal number of glutamic acid (E) and lysine (K) residues but differ in the patterning within the sequence. Without any fit parameter, the model captures the strong sequence dependence of the simulated values of the radius of gyration with a correlation coefficient of R² = 0.9. The model is then applied to real proteins to compare the unfolded state dimensions of 540 orthologous pairs of thermophilic and mesophilic proteins. The excluded volume parameters are assumed similar under denatured conditions, and only electrostatic effects encoded in the sequence are accounted for. With these assumptions, thermophilic proteins are found, with high statistical significance, to have a more compact disordered ensemble compared to their mesophilic counterparts. The method presented here, due to its analytical nature, is capable of making such high throughput analysis of multiple proteins and will have broad applications in proteomic studies as well as in other heteropolymeric systems.

  18. Promoter mapping of the mouse Tcp-10bt gene in transgenic mice identifies essential male germ cell regulatory sequences.

    PubMed

    Ewulonu, U K; Snyder, L; Silver, L M; Schimenti, J C

    1996-03-01

    Transgenic mice were generated to localize essential promoter elements in the mouse testis-expressed Tcp-10 genes. These genes are expressed exclusively in male germ cells, and exhibit a diffuse range of transcriptional start sites, possibly due to the absence of a TATA box. A series of transgene constructs containing different amounts of 5' flanking DNA revealed that all sequences necessary for appropriate temporal and tissue-specific transcription of Tcp-10 reside between positions -1 to -973. All transgenic animals containing these sequences expressed a chimeric transgene at high levels, in a pattern that paralleled the endogenous genes. These experiments further defined a 227 bp fragment from -746 to -973 that was absolutely essential for expression. In a gel-shift assay, this 227-bp fragment bound nuclear protein from testis, but not other tissues, to yield two retarded bands. Sequence analysis of this fragment revealed a half-site for the AP-2 transcription factor recognition sequence. Gel shift assays using native or mutant oligonucleotides demonstrated that the putative AP-2 recognition sequence was essential for generating the retarded bands. Since the binding activity is testis-specific, but AP-2 expression is not exclusive to male germ cells, it is possible that transcription of Tcp-10 requires interaction between AP-2 and a germ cell-specific transcription factor.

  19. Prediction of protein secondary structure content for the twilight zone sequences.

    PubMed

    Homaeian, Leila; Kurgan, Lukasz A; Ruan, Jishou; Cios, Krzysztof J; Chen, Ke

    2007-11-15

    Secondary protein structure carries information about local structural arrangements, which include three major conformations: alpha-helices, beta-strands, and coils. A significant majority of successful methods for prediction of the secondary structure are based on multiple sequence alignment. However, multiple alignment fails to provide accurate results when a sequence comes from the twilight zone, that is, it is characterized by low (<30%) homology. To this end, we propose a novel method for prediction of secondary structure content through comprehensive sequence representation, called PSSC-core. The method uses a multiple linear regression model and introduces a comprehensive feature-based sequence representation to predict the amount of helices and strands for sequences from the twilight zone. The PSSC-core method was tested and compared with two other state-of-the-art prediction methods on a set of 2187 twilight zone sequences. The results indicate that our method provides better predictions for both helix and strand content. The PSSC-core is shown to provide statistically significantly better results when compared with the competing methods, reducing the prediction error by 5-7% for helix and 7-9% for strand content predictions. The proposed feature-based sequence representation uses a comprehensive set of physicochemical properties that are custom-designed for each of the helix and strand content predictions. It includes composition and composition moment vectors, frequency of tetra-peptides associated with helical and strand conformations, various property-based groups like exchange groups, chemical groups of the side chains and hydrophobic group, auto-correlations based on hydrophobicity, side-chain masses, hydropathy, and conformational patterns for beta-sheets. The PSSC-core method provides an alternative for predicting the secondary structure content that can be used to validate and constrain results of other structure prediction methods. At the same time, it

  20. Detection and sequence/structure mapping of biophysical constraints to protein variation in saturated mutational libraries and protein sequence alignments with a dedicated server.

    PubMed

    Abriata, Luciano A; Bovigny, Christophe; Dal Peraro, Matteo

    2016-06-17

    Protein variability can now be studied by measuring high-resolution tolerance-to-substitution maps and fitness landscapes in saturated mutational libraries. But these rich and expensive datasets are typically interpreted coarsely, restricting detailed analyses to positions of extremely high or low variability or dubbed important beforehand based on existing knowledge about active sites, interaction surfaces, (de)stabilizing mutations, etc. Our new webserver PsychoProt (freely available without registration at http://psychoprot.epfl.ch or at http://lucianoabriata.altervista.org/psychoprot/index.html ) helps to detect, quantify, and sequence/structure map the biophysical and biochemical traits that shape amino acid preferences throughout a protein as determined by deep-sequencing of saturated mutational libraries or from large alignments of naturally occurring variants. We exemplify how PsychoProt helps to (i) unveil protein structure-function relationships from experiments and from alignments that are consistent with structures according to coevolution analysis, (ii) recall global information about structural and functional features and identify hitherto unknown constraints to variation in alignments, and (iii) point at different sources of variation among related experimental datasets or between experimental and alignment-based data. Remarkably, metabolic costs of the amino acids pose strong constraints to variability at protein surfaces in nature but not in the laboratory. This and other differences call for caution when extrapolating results from in vitro experiments to natural scenarios in, for example, studies of protein evolution. We show through examples how PsychoProt can be a useful tool for the broad communities of structural biology and molecular evolution, particularly for studies about protein modeling, evolution and design.

  1. Nucleotide sequence of the gene for the Mr 32,000 thylakoid membrane protein from Spinacia oleracea and Nicotiana debneyi predicts a totally conserved primary translation product of Mr 38,950

    PubMed Central

    Zurawski, Gerard; Bohnert, Hans J.; Whitfeld, Paul R.; Bottomley, Warwick

    1982-01-01

    The gene for the so-called Mr 32,000 rapidly labeled photosystem II thylakoid membrane protein (here designated psbA) of spinach (Spinacia oleracea) chloroplasts is located on the chloroplast DNA in the large single-copy region immediately adjacent to one of the inverted repeat sequences. In this paper we show that the size of the mRNA for this protein is ≈ 1.25 kilobases and that the direction of transcription is towards the inverted repeat unit. The nucleotide sequence of the gene and its flanking regions is presented. The only large open reading frame in the sequence codes for a protein of Mr 38,950. The nucleotide sequence of psbA from Nicotiana debneyi also has been determined, and comparison of the sequences from the two species shows them to be highly conserved (>95% homology) throughout the entire reading frame. Conservation of the amino acid sequence is absolute, there being no changes in a total of 353 residues. This leads us to conclude that the primary translation product of psbA must be a protein of Mr 38,950. The protein is characterized by the complete absence of lysine residues and is relatively rich in hydrophobic amino acids, which tend to be clustered. Transcription of spinach psbA starts about 86 base pairs before the first ATG codon. Immediately upstream from this point there is a sequence typical of that found in E. coli promoters. An almost identical sequence occurs in the equivalent region of N. debneyi DNA. PMID:16593262

  2. Global investigation of protein-protein interactions in yeast Saccharomyces cerevisiae using re-occurring short polypeptide sequences.

    PubMed

    Pitre, S; North, C; Alamgir, M; Jessulat, M; Chan, A; Luo, X; Green, J R; Dumontier, M; Dehne, F; Golshani, A

    2008-08-01

    Protein-protein interaction (PPI) maps provide insight into cellular biology and have received considerable attention in the post-genomic era. While large-scale experimental approaches have generated large collections of experimentally determined PPIs, technical limitations preclude certain PPIs from detection. Recently, we demonstrated that yeast PPIs can be computationally predicted using re-occurring short polypeptide sequences between known interacting protein pairs. However, the computational requirements and low specificity made this method unsuitable for large-scale investigations. Here, we report an improved approach, which exhibits a specificity of approximately 99.95% and executes 16,000 times faster. Importantly, we report the first all-to-all sequence-based computational screen of PPIs in yeast, Saccharomyces cerevisiae, in which we identify 29,589 high confidence interactions of approximately 2 × 10^7 possible pairs. Of these, 14,438 PPIs have not been previously reported and may represent novel interactions. In particular, these results reveal a richer set of membrane protein interactions, not readily amenable to experimental investigations. From the novel PPIs, a novel putative protein complex comprised largely of membrane proteins was revealed. In addition, two novel gene functions were predicted and experimentally confirmed to affect the efficiency of non-homologous end-joining, providing further support for the usefulness of the identified PPIs in biological investigations.
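
    The idea of scoring a candidate pair by short polypeptide sequences that re-occur among known interacting pairs can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes exact k-mer matches (the published method may use a similarity measure instead), and the function names, the window length k=4, and the toy sequences are all hypothetical.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def score_pair(query_a, query_b, proteins, known_pairs, k=4):
    """Count known interacting pairs (x, y) such that query_a shares a k-mer
    with one partner and query_b shares a k-mer with the other partner.

    Assumption: exact k-mer identity stands in for sequence similarity.
    """
    ka, kb = kmers(query_a, k), kmers(query_b, k)
    score = 0
    for x, y in known_pairs:
        kx, ky = kmers(proteins[x], k), kmers(proteins[y], k)
        # Check both orientations of the known pair.
        if (ka & kx and kb & ky) or (ka & ky and kb & kx):
            score += 1
    return score


# Toy data: hypothetical sequences, not real yeast proteins.
proteins = {
    "P1": "MKTAYIAKQR",
    "P2": "GSHMLEDPVV",
    "P3": "AAAAAAAAAA",
}
known_pairs = [("P1", "P2")]

# The first query shares "KTAY" with P1, the second shares "LEDP" with P2.
print(score_pair("TTKTAYTT", "QQLEDPQQ", proteins, known_pairs))
# No shared window with either partner for the second query.
print(score_pair("TTKTAYTT", "CCCCCCCC", proteins, known_pairs))
```

    A higher count means more supporting evidence from the known interaction set; a real screen would need a threshold calibrated against a target specificity.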

  3. Role of activator protein-1 on the effect of arginine-glycine-aspartic acid containing peptides on transforming growth factor-beta1 promoter activity.

    PubMed

    Ruiz-Torres, M P; Perez-Rivero, G; Diez-Marques, M L; Griera, M; Ortega, R; Rodriguez-Puyol, M; Rodríguez-Puyol, D

    2007-01-01

    While arginine-glycine-aspartic acid-based peptidomimetics have been employed for the treatment of cardiovascular disorders and cancer, their use in other contexts remains to be explored. Arginine-glycine-aspartic acid-serine induces Transforming growth factor-beta1 transcription in human mesangial cells, but the molecular mechanisms involved have not been studied extensively. We explored whether this effect could be due to Activator protein-1 activation and studied the potential pathways involved. Addition of arginine-glycine-aspartic acid-serine promoted Activator protein-1 binding to its cognate sequence within the Transforming growth factor-beta1 promoter as well as c-jun and c-fos protein abundance. Moreover, this effect was suppressed by curcumin, a c-Jun N terminal kinase inhibitor, and was absent when the Activator protein-1 cis-regulatory element was deleted. Activator protein-1 binding was dependent on the activity of integrin linked kinase, as transfection with a dominant negative mutant suppressed both Activator protein-1 binding and c-jun and c-fos protein increment. Integrin linked kinase was, in turn, dependent on Phosphoinositol-3 kinase activity. Arginine-glycine-aspartic acid-serine stimulated Phosphoinositol-3 kinase activity, and Transforming growth factor-beta1 promoter activation was abrogated by the use of Phosphoinositol-3 kinase specific inhibitors. In summary, we propose that arginine-glycine-aspartic acid-serine activates Integrin linked kinase via the Phosphoinositol-3 kinase pathway and this leads to activation of c-jun and c-fos and increased Activator protein-1 binding and Transforming growth factor-beta1 promoter activity. These data may contribute to understand the molecular mechanisms involved in the cellular actions of arginine-glycine-aspartic acid-related peptides and enhance their relevance as these products evolve into clinical therapeutic use.

  4. Tardigrade workbench: comparing stress-related proteins, sequence-similar and functional protein clusters as well as RNA elements in tardigrades.

    PubMed

    Förster, Frank; Liang, Chunguang; Shkumatov, Alexander; Beisser, Daniela; Engelmann, Julia C; Schnölzer, Martina; Frohme, Marcus; Müller, Tobias; Schill, Ralph O; Dandekar, Thomas

    2009-10-12

    Tardigrades represent an animal phylum with extraordinary resistance to environmental stress. To gain insights into their stress-specific adaptation potential, major clusters of related and similar proteins are identified, as well as specific functional clusters delineated comparing all tardigrades and individual species (Milnesium tardigradum, Hypsibius dujardini, Echiniscus testudo, Tulinus stephaniae, Richtersius coronifer) and functional elements in tardigrade mRNAs are analysed. We find that 39.3% of the total sequences clustered in 58 clusters of more than 20 proteins. Among these are ten tardigrade specific as well as a number of stress-specific protein clusters. Tardigrade-specific functional adaptations include strong protein, DNA- and redox protection, maintenance and protein recycling. Specific regulatory elements regulate tardigrade mRNA stability such as lox P DICE elements whereas 14 other RNA elements of higher eukaryotes are not found. Further features of tardigrade specific adaptation are rapidly identified by sequence and/or pattern search on the web-tool tardigrade analyzer http://waterbear.bioapps.biozentrum.uni-wuerzburg.de. The work-bench offers nucleotide pattern analysis for promoter and regulatory element detection (tardigrade specific; nrdb) as well as rapid COG search for function assignments including species-specific repositories of all analysed data. Different protein clusters and regulatory elements implicated in tardigrade stress adaptations are analysed including unpublished tardigrade sequences.

  5. Exploring the Sequence-based Prediction of Folding Initiation Sites in Proteins.

    PubMed

    Raimondi, Daniele; Orlando, Gabriele; Pancsa, Rita; Khan, Taushif; Vranken, Wim F

    2017-08-18

    Protein folding is a complex process that can lead to disease when it fails. Especially poorly understood are the very early stages of protein folding, which are likely defined by intrinsic local interactions between amino acids close to each other in the protein sequence. We here present EFoldMine, a method that predicts, from the primary amino acid sequence of a protein, which amino acids are likely involved in early folding events. The method is based on early folding data from hydrogen deuterium exchange (HDX) data from NMR pulsed labelling experiments, and uses backbone and sidechain dynamics as well as secondary structure propensities as features. The EFoldMine predictions give insights into the folding process, as illustrated by a qualitative comparison with independent experimental observations. Furthermore, on a quantitative proteome scale, the predicted early folding residues tend to become the residues that interact the most in the folded structure, and they are often residues that display evolutionary covariation. The connection of the EFoldMine predictions with both folding pathway data and the folded protein structure suggests that the initial statistical behavior of the protein chain with respect to local structure formation has a lasting effect on its subsequent states.

  6. Preparation of protein samples for mass spectrometry and N-terminal sequencing.

    PubMed

    Glenn, Gary

    2014-01-01

    The preparation of protein samples for mass spectrometry and N-terminal sequencing is a key step in successfully identifying proteins. Mass spectrometry is a very sensitive technique, and as such, samples must be prepared carefully since they can be subject to contamination of the sample (e.g., due to incomplete subcellular fractionation or purification of a multiprotein complex), overwhelming of the sample by highly abundant proteins, and contamination from skin or hair (keratin can be a very common hit). One goal of sample preparation for mass spec is to reduce the complexity of the sample - in the example presented here, mitochondria are purified, solubilized, and fractionated by sucrose density gradient sedimentation prior to preparative 1D SDS-PAGE. It is important to verify the purity and integrity of the sample so that you can have confidence in the hits obtained. More protein is needed for N-terminal sequencing and ideally it should be purified to a single band when run on an SDS-polyacrylamide gel. The example presented here involves stably expressing a tagged protein in HEK293 cells and then isolating the protein by affinity purification and SDS-PAGE. © 2014 Elsevier Inc. All rights reserved.

  7. Protein location prediction using atomic composition and global features of the amino acid sequence

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cherian, Betsy Sheena, E-mail: betsy.skb@gmail.com; Nair, Achuthsankar S.

    2010-01-22

    Subcellular location of protein is constructive information in determining its function, screening for drug candidates, vaccine design, annotation of gene products and in selecting relevant proteins for further studies. Computational prediction of subcellular localization deals with predicting the location of a protein from its amino acid sequence. For a computational localization prediction method to be more accurate, it should exploit all possible relevant biological features that contribute to the subcellular localization. In this work, we extracted the biological features from the full length protein sequence to incorporate more biological information. A new biological feature, distribution of atomic composition, is effectively used with multiple physiochemical properties, amino acid composition, three part amino acid composition, and sequence similarity for predicting the subcellular location of the protein. Support Vector Machines are designed for four modules and prediction is made by a weighted voting system. Our system makes prediction with an accuracy of 100, 82.47, 88.81 for self-consistency test, jackknife test and independent data test respectively. Our results provide evidence that the prediction based on the biological features derived from the full length amino acid sequence gives better accuracy than those derived from N-terminal alone. Considering the features as a distribution within the entire sequence will bring out underlying property distribution to a greater detail to enhance the prediction accuracy.
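
    Two of the sequence features named above, amino acid composition and three part amino acid composition, can be sketched as follows. This is an illustration under assumptions, not the authors' feature extractor: the split into three equal-length segments, the function names, and the example sequence are hypothetical, and the atomic-composition feature and the SVM modules are not reproduced.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Fraction of each standard amino acid in seq (20 values)."""
    n = len(seq)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [seq.count(a) / n for a in AMINO_ACIDS]


def three_part_composition(seq):
    """Composition of the N-terminal, middle and C-terminal thirds (60 values).

    Assumption: the sequence is cut into three roughly equal segments;
    any remainder residues go to the last segment.
    """
    third = len(seq) // 3
    parts = [seq[:third], seq[third:2 * third], seq[2 * third:]]
    features = []
    for part in parts:
        features.extend(composition(part))
    return features


example = "MKKLAVDEEGHWYACD"  # hypothetical sequence
whole = composition(example)
parts = three_part_composition(example)
print(len(whole), len(parts))
```

    Splitting the sequence lets a classifier see where along the chain a residue type is enriched, which a single whole-sequence composition vector cannot express.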

  8. SSMART: Sequence-structure motif identification for RNA-binding proteins.

    PubMed

    Munteanu, Alina; Mukherjee, Neelanjan; Ohler, Uwe

    2018-06-11

    RNA-binding proteins (RBPs) regulate every aspect of RNA metabolism and function. There are hundreds of RBPs encoded in the eukaryotic genomes, and each recognizes its RNA targets through a specific mixture of RNA sequence and structure properties. For most RBPs, however, only a primary sequence motif has been determined, while the structure of the binding sites is uncharacterized. We developed SSMART, an RNA motif finder that simultaneously models the primary sequence and the structural properties of the RNA target sites. The sequence-structure motifs are represented as consensus strings over a degenerate alphabet, extending the IUPAC codes for nucleotides to account for secondary structure preferences. Evaluation on synthetic data showed that SSMART is able to recover both sequence and structure motifs implanted into 3'UTR-like sequences, for various degrees of structured/unstructured binding sites. In addition, we successfully used SSMART on high-throughput in vivo and in vitro data, showing that we not only recover the known sequence motif, but also gain insight into the structural preferences of the RBP. Availability: SSMART is freely available at https://ohlerlab.mdc-berlin.de/software/SSMART_137/. Supplementary data are available at Bioinformatics online.
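
    Matching a consensus string over a degenerate alphabet can be sketched with the standard IUPAC nucleotide codes alone. The sketch below is not SSMART: the structure-aware extension of the alphabet described in the abstract is not reproduced, and the function name, motif, and sequence are hypothetical.

```python
# Standard IUPAC nucleotide codes, written for RNA (U instead of T).
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU",
    "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG",
    "N": "ACGU",
}


def find_matches(consensus, rna):
    """Return 0-based start positions where the degenerate consensus
    matches the RNA sequence exactly, position by position."""
    m = len(consensus)
    hits = []
    for i in range(len(rna) - m + 1):
        window = rna[i:i + m]
        if all(base in IUPAC_RNA[code] for code, base in zip(consensus, window)):
            hits.append(i)
    return hits


# "Y" accepts C or U, so both UGUA and UGCA match.
print(find_matches("UGYA", "AAUGUAUGCACC"))  # → [2, 6]
```

    A sequence-structure motif would use a larger alphabet in which each symbol also encodes a paired or unpaired preference; the matching loop itself would stay the same.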

  9. Diverse nucleotide compositions and sequence fluctuation in Rubisco protein genes

    NASA Astrophysics Data System (ADS)

    Holden, Todd; Dehipawala, S.; Cheung, E.; Bienaime, R.; Ye, J.; Tremberger, G., Jr.; Schneider, P.; Lieberman, D.; Cheung, T.

    2011-10-01

    The Rubisco protein-enzyme is arguably the most abundant protein on Earth. The biology dogma of transcription and translation necessitates the study of the Rubisco genes and Rubisco-like genes in various species. Stronger correlation of fractal dimension of the atomic number fluctuation along a DNA sequence with Shannon entropy has been observed in the studied Rubisco-like gene sequences, suggesting a more diverse evolutionary pressure and constraints in the Rubisco sequences. The strategy of using metal for structural stabilization appears to be an ancient mechanism, with data from the porphobilinogen deaminase gene in Capsaspora owczarzaki and Monosiga brevicollis. Using the chi-square distance probability, our analysis supports the conjecture that the more ancient Rubisco-like sequence in Microcystis aeruginosa would have experienced very different evolutionary pressure and bio-chemical constraint as compared to Bordetella bronchiseptica, the two microbes occupying either end of the correlation graph. Our exploratory study would indicate that high fractal dimension Rubisco sequence would support high carbon dioxide rate via the Michaelis-Menten coefficient; with implication for the control of the whooping cough pathogen Bordetella bronchiseptica, a microbe containing a high fractal dimension Rubisco-like sequence (2.07). Using the internal comparison of chi-square distance probability for 16S rRNA (~ E-22) versus radiation repair Rec-A gene (~ E-05) in high GC content Deinococcus radiodurans, our analysis supports the conjecture that high GC content microbes containing Rubisco-like sequence are likely to include an extra-terrestrial origin, relative to Deinococcus radiodurans. Similar photosynthesis process that could utilize host star radiation would not compete with radiation resistant process from the biology dogma perspective in environments such as Mars and exoplanets.

  10. Promoter architecture and transcriptional regulation of Abf1-dependent ribosomal protein genes in Saccharomyces cerevisiae.

    PubMed

    Fermi, Beatrice; Bosio, Maria Cristina; Dieci, Giorgio

    2016-07-27

    In Saccharomyces cerevisiae, ribosomal protein gene (RPG) promoters display binding sites for either Rap1 or Abf1 transcription factors. Unlike Rap1-associated promoters, the small cohort of Abf1-dependent RPGs (Abf1-RPGs) has not been extensively investigated. We show that RPL3, RPL4B, RPP1A, RPS22B and RPS28A/B share a common promoter architecture, with an Abf1 site upstream of a conserved element matching the sequence recognized by Fhl1, a transcription factor which together with Ifh1 orchestrates Rap1-associated RPG regulation. Abf1 and Fhl1 promoter association was confirmed by ChIP and/or gel retardation assays. Mutational analysis revealed a more severe requirement of Abf1 than Fhl1 binding sites for RPG transcription. In the case of RPS22B an unusual Tbf1 binding site promoted both RPS22B and intron-hosted SNR44 expression. Abf1-RPG down-regulation upon TOR pathway inhibition was much attenuated at defective mutant promoters unable to bind Abf1. TORC1 inactivation caused the expected reduction of Ifh1 occupancy at RPS22B and RPL3 promoters, but unexpectedly it entailed largely increased Abf1 association with Abf1-RPG promoters. We present evidence that Abf1 recruitment upon nutritional stress, also observed for representative ribosome biogenesis genes, favours RPG transcriptional rescue upon nutrient replenishment, thus pointing to nutrient-regulated Abf1 dynamics at promoters as a novel mechanism in ribosome biogenesis control. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  11. Identification of a DNA sequence motif required for expression of iron-regulated genes in pseudomonads.

    PubMed

    Rombel, I T; McMorran, B J; Lamont, I L

    1995-02-20

    Many bacteria respond to a lack of iron in the environment by synthesizing siderophores, which act as iron-scavenging compounds. Fluorescent pseudomonads synthesize strain-specific but chemically related siderophores called pyoverdines or pseudobactins. We have investigated the mechanisms by which iron controls expression of genes involved in pyoverdine metabolism in Pseudomonas aeruginosa. Transcription of these genes is repressed by the presence of iron in the growth medium. Three promoters from these genes were cloned and the activities of the promoters were dependent on the amounts of iron in the growth media. Two of the promoters were sequenced and the transcriptional start sites were identified by S1 nuclease analysis. Sequences similar to the consensus binding site for the Fur repressor protein, which controls expression of iron-repressible genes in several gram-negative species, were not present in the promoters, suggesting that they are unlikely to have a high affinity for Fur. However, comparison of the promoter sequences with those of iron-regulated genes from other Pseudomonas species and also the iron-regulated exotoxin gene of P. aeruginosa allowed identification of a shared sequence element, with the consensus sequence (G/C)CTAAAT-CCC, which is likely to act as a binding site for a transcriptional activator protein. Mutations in this sequence greatly reduced the activities of the promoters characterized here as well as those of other iron-regulated promoters. The requirement for this motif in the promoters of iron-regulated genes of different Pseudomonas species indicates that similar mechanisms are likely to be involved in controlling expression of a range of iron-regulated genes in pseudomonads.

  12. Synthetic biology for the directed evolution of protein biocatalysts: navigating sequence space intelligently

    PubMed Central

    Currin, Andrew; Swainston, Neil; Day, Philip J.

    2015-01-01

    The amino acid sequence of a protein affects both its structure and its function. Thus, the ability to modify the sequence, and hence the structure and activity, of individual proteins in a systematic way, opens up many opportunities, both scientifically and (as we focus on here) for exploitation in biocatalysis. Modern methods of synthetic biology, whereby increasingly large sequences of DNA can be synthesised de novo, allow an unprecedented ability to engineer proteins with novel functions. However, the number of possible proteins is far too large to test individually, so we need means for navigating the ‘search space’ of possible protein sequences efficiently and reliably in order to find desirable activities and other properties. Enzymologists distinguish binding (Kd) and catalytic (kcat) steps. In a similar way, judicious strategies have blended design (for binding, specificity and active site modelling) with the more empirical methods of classical directed evolution (DE) for improving kcat (where natural evolution rarely seeks the highest values), especially with regard to residues distant from the active site and where the functional linkages underpinning enzyme dynamics are both unknown and hard to predict. Epistasis (where the ‘best’ amino acid at one site depends on that or those at others) is a notable feature of directed evolution. The aim of this review is to highlight some of the approaches that are being developed to allow us to use directed evolution to improve enzyme properties, often dramatically. We note that directed evolution differs in a number of ways from natural evolution, including in particular the available mechanisms and the likely selection pressures. Thus, we stress the opportunities afforded by techniques that enable one to map sequence to (structure and) activity in silico, as an effective means of modelling and exploring protein landscapes. Because known landscapes may be assessed and reasoned about as a whole

  13. Identification of hamster inducible nitric oxide synthase (iNOS) promoter sequences that influence basal and inducible iNOS expression

    PubMed Central

    Saldarriaga, Omar A.; Travi, Bruno L.; Choudhury, Goutam Ghosh; Melby, Peter C.

    2012-01-01

    IFN-γ/LPS-activated hamster (Mesocricetus auratus) macrophages express significantly less iNOS (NOS2) than activated mouse macrophages, which contributes to the hamster's susceptibility to intracellular pathogens. We determined a mechanism responsible for differences in iNOS promoter activity in hamsters and mice. The HtPP (1.2 kb) showed low basal and inducible promoter activity when compared with the mouse, and sequences within a 100-bp region (−233 to −133) of the mouse and hamster promoters influenced this activity. Moreover, within this 100 bp, we identified a smaller region (44 bp) in the mouse promoter, which recovered basal promoter activity when swapped into the hamster promoter. The mouse homolog (100-bp region) contained a cis-element for NF-IL-6 (−153/−142), which was absent in the hamster counterpart. EMSA and supershift assays revealed that the hamster sequence did not support the binding of NF-IL-6. Introduction of a functional NF-IL-6 binding sequence into the hamster promoter or its alteration in the mouse promoter revealed the critical importance of this transcription factor for full iNOS promoter activity. Furthermore, the binding of NF-IL-6 to the iNOS promoter (−153/−142) in vivo was increased in mouse cells but was reduced in hamster cells after IFN-γ/LPS stimulation. Differences in the activity of the iNOS promoters were evident in mouse and hamster cells, so they were not merely a result of species-specific differences in transcription factors. Thus, we have identified unique DNA sequences and a critical transcription factor, NF-IL-6, which contribute to the overall basal and inducible expression of hamster iNOS. PMID:22517919

  14. DLocalMotif: a discriminative approach for discovering local motifs in protein sequences.

    PubMed

    Mehdi, Ahmed M; Sehgal, Muhammad Shoaib B; Kobe, Bostjan; Bailey, Timothy L; Bodén, Mikael

    2013-01-01

    Local motifs are patterns of DNA or protein sequences that occur within a sequence interval relative to a biologically defined anchor or landmark. Current protein motif discovery methods do not adequately consider such constraints to identify biologically significant motifs that are only weakly over-represented but spatially confined. Using negatives, i.e. sequences known to not contain a local motif, can further increase the specificity of their discovery. This article introduces the method DLocalMotif that makes use of positional information and negative data for local motif discovery in protein sequences. DLocalMotif combines three scoring functions, measuring degrees of motif over-representation, entropy and spatial confinement, specifically designed to discriminatively exploit the availability of negative data. The method is shown to outperform current methods that use only a subset of these motif characteristics. We apply the method to several biological datasets. The analysis of peroxisomal targeting signals uncovers several novel motifs that occur immediately upstream of the dominant peroxisomal targeting signal-1 signal. The analysis of proline-tyrosine nuclear localization signals uncovers multiple novel motifs that overlap with C2H2 zinc finger domains. We also evaluate the method on classical nuclear localization signals and endoplasmic reticulum retention signals and find that DLocalMotif successfully recovers biologically relevant sequence properties. http://bioinf.scmb.uq.edu.au/dlocalmotif/

  15. DNA-binding proteins from marine bacteria expand the known sequence diversity of TALE-like repeats.

    PubMed

    de Lange, Orlando; Wolf, Christina; Thiel, Philipp; Krüger, Jens; Kleusch, Christian; Kohlbacher, Oliver; Lahaye, Thomas

    2015-11-16

    Transcription Activator-Like Effectors (TALEs) of Xanthomonas bacteria are programmable DNA binding proteins with unprecedented target specificity. Comparative studies into TALE repeat structure and function are hindered by the limited sequence variation among TALE repeats. More sequence-diverse TALE-like proteins are known from Ralstonia solanacearum (RipTALs) and Burkholderia rhizoxinica (Bats), but RipTAL and Bat repeats are conserved with those of TALEs around the DNA-binding residue. We study two novel marine-organism TALE-like proteins (MOrTL1 and MOrTL2), the first to date of non-terrestrial origin. We have assessed their DNA-binding properties and modelled repeat structures. We found that repeats from these proteins mediate sequence specific DNA binding conforming to the TALE code, despite low sequence similarity to TALE repeats, and with novel residues around the BSR. However, MOrTL1 repeats show greater sequence discriminating power than MOrTL2 repeats. Sequence alignments show that there are only three residues conserved between repeats of all TALE-like proteins including the two new additions. This conserved motif could prove useful as an identifier for future TALE-likes. Additionally, comparing MOrTL repeats with those of other TALE-likes suggests a common evolutionary origin for the TALEs, RipTALs and Bats. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. Characterizing informative sequence descriptors and predicting binding affinities of heterodimeric protein complexes

    PubMed Central

    2015-01-01

    Background: Protein-protein interactions (PPIs) are involved in various biological processes, and the underlying mechanism of the interactions plays a crucial role in therapeutics and protein engineering. Most machine learning approaches have been developed for predicting the binding affinity of protein-protein complexes based on structure and functional information. This work aims to predict the binding affinity of heterodimeric protein complexes from sequences only. Results: This work proposes a support vector machine (SVM) based binding affinity classifier, called SVM-BAC, to classify heterodimeric protein complexes based on the prediction of their binding affinity. SVM-BAC identified 14 of 580 sequence descriptors (physicochemical, energetic and conformational properties of the 20 amino acids) to classify 216 heterodimeric protein complexes into low and high binding affinity. SVM-BAC yielded the training accuracy, sensitivity, specificity, AUC and test accuracy of 85.80%, 0.89, 0.83, 0.86 and 83.33%, respectively, better than existing machine learning algorithms. The 14 features and support vector regression were further used to estimate the binding affinities (Pkd) of 200 heterodimeric protein complexes. Prediction performance of a Jackknife test was the correlation coefficient of 0.34 and mean absolute error of 1.4. We further analyze three informative physicochemical properties according to their contribution to prediction performance. Results reveal that the following properties are effective in predicting the binding affinity of heterodimeric protein complexes: apparent partition energy based on buried molar fractions, relations between chemical structure and biological activity in principal component analysis IV, and normalized frequency of beta turn. Conclusions: The proposed sequence-based prediction method SVM-BAC uses an optimal feature selection method to identify 14 informative features to classify and predict binding affinity of heterodimeric protein

  17. Cloning and sequence analysis of a cDNA clone coding for the mouse GM2 activator protein.

    PubMed Central

    Bellachioma, G; Stirling, J L; Orlacchio, A; Beccari, T

    1993-01-01

    A cDNA (1.1 kb) containing the complete coding sequence for the mouse GM2 activator protein was isolated from a mouse macrophage library using a cDNA for the human protein as a probe. There was a single ATG located 12 bp from the 5' end of the cDNA clone followed by an open reading frame of 579 bp. Northern blot analysis of mouse macrophage RNA showed that there was a single band with a mobility corresponding to a size of 2.3 kb. We deduce from this that the mouse mRNA, in common with the mRNA for the human GM2 activator protein, has a long 3' untranslated sequence of approx. 1.7 kb. Alignment of the mouse and human deduced amino acid sequences showed 68% identity overall and 75% identity for the sequence on the C-terminal side of the first 31 residues, which in the human GM2 activator protein contains the signal peptide. Hydropathicity plots showed great similarity between the mouse and human sequences even in regions of low sequence similarity. There is a single N-glycosylation site in the mouse GM2 activator protein sequence (Asn151-Phe-Thr) which differs in its location from the single site reported in the human GM2 activator protein sequence (Asn63-Val-Thr). PMID:7689829
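
    The size deduction in this abstract can be checked with a short calculation. The sketch below is only a worked illustration: it assumes the 12 bp before the ATG approximate the 5' untranslated length, and it ignores the poly(A) tail and the limited precision of a Northern blot size estimate. The variable and function names are hypothetical.

```python
def utr3_estimate(mrna_bp, utr5_bp, orf_bp):
    """Estimate the 3' untranslated length as what remains of the mRNA
    after subtracting the 5' leader and the open reading frame."""
    return mrna_bp - utr5_bp - orf_bp

mrna_bp = 2300   # Northern blot band of about 2.3 kb
utr5_bp = 12     # assumption: the 12 bp before the ATG stand in for the 5' leader
orf_bp = 579     # open reading frame reported in the abstract

codons = orf_bp // 3   # whether this count includes the stop codon is not stated
utr3_bp = utr3_estimate(mrna_bp, utr5_bp, orf_bp)

print(codons)    # number of codons in the open reading frame
print(utr3_bp)   # estimated 3' untranslated length in bp
```

    The result, about 1.7 kb, agrees with the value the authors deduce.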

  18. Identification of distal silencing elements in the murine interferon-A11 gene promoter.

    PubMed

    Roffet, P; Lopez, S; Navarro, S; Bandu, M T; Coulombel, C; Vignal, M; Doly, J; Vodjdani, G

    1996-08-01

    The murine interferon-A11 (Mu IFN-A11) gene is a member of the IFN-A multigenic family. In mouse L929 cells, the weak response of the gene's promoter to viral induction is due to a combination of both a point mutation in the virus responsive element (VRE) and the presence of negatively regulating sequences surrounding the VRE. In the distal part of the promoter, the negatively acting E1E2 sequence was delimited. This sequence displays an inhibitory effect in either orientation or position on the inducibility of a virus-responsive heterologous promoter. It selectively represses VRE-dependent transcription but is not able to reduce the transcriptional activity of a VRE-lacking promoter. In a transient transfection assay, an E1E2-containing DNA competitor was able to derepress the native Mu IFN-A11 promoter. Specific nuclear factors bind to this sequence; thus the binding of trans-regulators participates in the repression of the Mu IFN-A11 gene. The E1E2 sequence contains an IFN regulatory factor (IRF)-binding site. Recombinant IRF2 binds this sequence and anti-IRF2 antibodies supershift a major complex formed with nuclear extracts. The protein composing the complex is 50 kDa in size, indicating the presence of IRF2 or antigenically related proteins in the complex. The Mu IFN-A11 gene is the first example within the murine IFN-A family, in which a distal promoter element has been identified that can negatively modulate the transcriptional response to viral induction.

  19. Promoters, toll like receptors and microRNAs: a strange association.

    PubMed

    Korla, Kalyani; Arrigo, Patrizio; Mitra, Chanchal K

    2013-06-01

    Toll-like receptors (TLRs) are proteins that play a key role in the innate immune system. In the present study, the 1000 base pairs upstream of the transcription start site were taken for each of the 10 known human TLR genes. About 40 microRNAs were identified that share 12-19 nucleotides of sequence similarity with the promoter regions of the 10 TLRs. It is proposed that microRNAs may play a role in the identification of promoter sequences and the initiation of transcription.

  20. Bacterial-like PPP protein phosphatases: novel sequence alterations in pathogenic eukaryotes and peculiar features of bacterial sequence similarity.

    PubMed

    Kerk, David; Uhrig, R Glen; Moorhead, Greg B

    2013-01-01

    Reversible phosphorylation is a widespread modification affecting the great majority of eukaryotic cellular proteins, and whose effects influence nearly every cellular function. Protein phosphatases are increasingly recognized as exquisitely regulated contributors to these changes. The PPP (phosphoprotein phosphatase) family comprises enzymes, which catalyze dephosphorylation at serine and threonine residues. Nearly a decade ago, "bacterial-like" enzymes were recognized with similarity to proteins from various bacterial sources: SLPs (Shewanella-like phosphatases), RLPHs (Rhizobiales-like phosphatases), and ALPHs (ApaH-like phosphatases). A recent article from our laboratory appearing in Plant Physiology characterizes their extensive organismal distribution, abundance in plant species, predicted subcellular localization, motif organization, and sequence evolution. One salient observation is the distinct evolutionary trajectory followed by SLP genes and proteins in photosynthetic eukaryotes vs. animal and plant pathogens derived from photosynthetic ancestors. We present here a closer look at sequence data that emphasizes the distinctiveness of pathogen SLP proteins and that suggests that they might represent novel drug targets. A second observation in our original report was the high degree of similarity between the bacterial-like PPPs of eukaryotes and closely related proteins of the "eukaryotic-like" phyla Myxococcales and Planctomycetes. We here reflect on the possible implications of these observations and their importance for future research.

  1. G protein-coupled odorant receptors: From sequence to structure.

    PubMed

    de March, Claire A; Kim, Soo-Kyung; Antonczak, Serge; Goddard, William A; Golebiowski, Jérôme

    2015-09-01

    Odorant receptors (ORs) are the largest subfamily within class A G protein-coupled receptors (GPCRs). No experimental structural data of any OR is available to date and atomic-level insights are likely to be obtained by means of molecular modeling. In this article, we critically align sequences of ORs with those GPCRs for which a structure is available. Here, an alignment consistent with available site-directed mutagenesis data on various ORs is proposed. Using this alignment, the choice of the template is deemed rather minor for identifying residues that constitute the wall of the binding cavity or those involved in G protein recognition. © 2015 The Protein Society.

  2. Soy protein isolate inhibits hepatic tumor promotion in mice fed a high-fat liquid diet.

    PubMed

    Mercer, Kelly E; Pulliam, Casey F; Pedersen, Kim B; Hennings, Leah; Ronis, Martin Jj

    2017-03-01

    Alcoholic and nonalcoholic fatty liver diseases are risk factors for development of hepatocellular carcinoma, but the underlying mechanisms are poorly understood. On the other hand, ingestion of soy-containing diets may oppose the development of certain cancers. We previously reported that replacing casein with a soy protein isolate reduced tumor promotion in the livers of mice with alcoholic liver disease after feeding a high fat ethanol liquid diet following initiation with diethylnitrosamine. Feeding soy protein isolate inhibited processes that may contribute to tumor promotion including inflammation, sphingolipid signaling, and Wnt/β-catenin signaling. We have extended these studies to characterize liver tumor promotion in a model of nonalcoholic fatty liver disease produced by chronic feeding of high-fat liquid diets in the absence of ethanol. Mice treated with diethylnitrosamine on postnatal day 14 were fed a high-fat liquid diet made with casein or SPI as the sole protein source for 16 weeks in adulthood. Relative to mice fed normal chow, a high fat/casein diet led to increased tumor promotion, hepatocyte proliferation, steatosis, and inflammation. Replacing casein with soy protein isolate counteracted these effects. The high fat diets also resulted in a general increase in transcripts for Wnt/β-catenin pathway components, which may be an important mechanism, whereby hepatic tumorigenesis is promoted. However, soy protein isolate did not block Wnt signaling in this nonalcoholic fatty liver disease model. We conclude that replacing casein with soy protein isolate blocks development of steatosis, inflammation, and tumor promotion in diethylnitrosamine-treated mice fed high fat diets. Impact statement The impact of dietary components on cancer is a topic of great interest for both the general public and the scientific community. Liver cancer is currently the second leading form of cancer deaths worldwide. Our study has addressed the effect of the protein

  3. The human haptoglobin gene promoter: interleukin-6-responsive elements interact with a DNA-binding protein induced by interleukin-6.

    PubMed Central

    Oliviero, S; Cortese, R

    1989-01-01

    Transcription of the human haptoglobin (Hp) gene is induced by interleukin-6 (IL-6) in the human hepatoma cell line Hep3B. Cis-acting elements responsible for this response are localized within the first 186 bp of the 5'-flanking region. Site-specific mutants of the Hp promoter fused to the chloramphenicol acetyl transferase (CAT) gene were analysed by transient transfection into uninduced and IL-6-treated Hep3B cells. We identified three regions, A, B and C, defined by mutation, which are important for the IL-6 response. Band shift experiments using nuclear extracts from untreated or IL-6-treated cells revealed the presence of IL-6-inducible DNA binding activities when DNA fragments containing the A or the C sequences were used. Competition experiments showed that both sequences bind to the same nuclear factors. Polymers of oligonucleotides containing either the A or the C regions confer IL-6 responsiveness to a truncated SV40 promoter. The B region forms several complexes with specific DNA-binding proteins different from those which bind to the A and C region. The B region complexes are identical in nuclear extracts from IL-6-treated and untreated cells. While important for IL-6 induction in the context of the haptoglobin promoter, the B site does not confer IL-6 inducibility to the SV40 promoter. Our results indicate that the IL-6 response of the haptoglobin promoter is dependent on the presence of multiple, partly redundant, cis-acting elements. PMID:2787245

  4. Fast computational methods for predicting protein structure from primary amino acid sequence

    DOEpatents

    Agarwal, Pratul Kumar [Knoxville, TN]

    2011-07-19

    The present invention provides a method utilizing primary amino acid sequence of a protein, energy minimization, molecular dynamics and protein vibrational modes to predict three-dimensional structure of a protein. The present invention also determines possible intermediates in the protein folding pathway. The present invention has important applications to the design of novel drugs as well as protein engineering. The present invention predicts the three-dimensional structure of a protein independent of size of the protein, overcoming a significant limitation in the prior art.

  5. The C-Terminal Sequence of RhoB Directs Protein Degradation through an Endo-Lysosomal Pathway

    PubMed Central

    Ramos, Irene; Herrera, Mónica; Stamatakis, Konstantinos

    2009-01-01

    Background: Protein degradation is essential for cell homeostasis. Targeting of proteins for degradation is often achieved by specific protein sequences or posttranslational modifications such as ubiquitination. Methodology/Principal Findings: By using biochemical and genetic tools we have monitored the localization and degradation of endogenous and chimeric proteins in live primary cells by confocal microscopy and ultra-structural analysis. Here we identify an eight amino acid sequence from the C-terminus of the short-lived GTPase RhoB that directs the rapid degradation of both RhoB and chimeric proteins bearing this sequence through a lysosomal pathway. Elucidation of the RhoB degradation pathway unveils a mechanism dependent on protein isoprenylation and palmitoylation that involves sorting of the protein into multivesicular bodies, mediated by the ESCRT machinery. Moreover, RhoB sorting is regulated by late endosome specific lipid dynamics and is altered in human genetic lipid traffic disease. Conclusions/Significance: Our findings characterize a short-lived cytosolic protein that is degraded through a lysosomal pathway. In addition, we define a novel motif for protein sorting and rapid degradation, which allows controlling protein levels by means of clinically used drugs. PMID:19956591

  6. Single-Nucleotide-Specific Targeting of the Tf1 Retrotransposon Promoted by the DNA-Binding Protein Sap1 of Schizosaccharomyces pombe.

    PubMed

    Hickey, Anthony; Esnault, Caroline; Majumdar, Anasuya; Chatterjee, Atreyi Ghatak; Iben, James R; McQueen, Philip G; Yang, Andrew X; Mizuguchi, Takeshi; Grewal, Shiv I S; Levin, Henry L

    2015-11-01

    Transposable elements (TEs) constitute a substantial fraction of the eukaryotic genome and, as a result, have a complex relationship with their host that is both adversarial and dependent. To minimize damage to cellular genes, TEs possess mechanisms that target integration to sequences of low importance. However, the retrotransposon Tf1 of Schizosaccharomyces pombe integrates with a surprising bias for promoter sequences of stress-response genes. The clustering of integration in specific promoters suggests that Tf1 possesses a targeting mechanism that is important for evolutionary adaptation to changes in environment. We report here that Sap1, an essential DNA-binding protein, plays an important role in Tf1 integration. A mutation in Sap1 resulted in a 10-fold drop in Tf1 transposition, and measures of transposon intermediates support the argument that the defect occurred in the process of integration. Published ChIP-Seq data on Sap1 binding combined with high-density maps of Tf1 integration that measure independent insertions at single-nucleotide positions show that 73.4% of all integration occurs at genomic sequences bound by Sap1. This represents high selectivity because Sap1 binds just 6.8% of the genome. A genome-wide analysis of promoter sequences revealed that Sap1 binding and amounts of integration correlate strongly. More important, an alignment of the DNA-binding motif of Sap1 revealed integration clustered on both sides of the motif and showed high levels specifically at positions +19 and -9. These data indicate that Sap1 contributes to the efficiency and position of Tf1 integration. Copyright © 2015 by the Genetics Society of America.
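
    The selectivity claim in this abstract rests on comparing the share of integration events at Sap1-bound sequences with the share of the genome that Sap1 binds. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the ratio is a simple illustration rather than the statistical analysis used in the paper.

```python
def fold_enrichment(observed_pct, expected_pct):
    """Ratio of the observed share of events to the share expected
    if events were spread uniformly over the genome."""
    if expected_pct <= 0:
        raise ValueError("expected_pct must be positive")
    return observed_pct / expected_pct

# 73.4% of integration falls in the 6.8% of the genome bound by Sap1.
enrichment = fold_enrichment(73.4, 6.8)
print(round(enrichment, 1))
```

    Under the uniform-integration assumption, Sap1-bound sequences receive roughly eleven times more integration than their genome share alone would predict.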

  7. Single-Nucleotide-Specific Targeting of the Tf1 Retrotransposon Promoted by the DNA-Binding Protein Sap1 of Schizosaccharomyces pombe

    PubMed Central

    Hickey, Anthony; Esnault, Caroline; Majumdar, Anasuya; Chatterjee, Atreyi Ghatak; Iben, James R.; McQueen, Philip G.; Yang, Andrew X.; Mizuguchi, Takeshi; Grewal, Shiv I. S.; Levin, Henry L.

    2015-01-01

    Transposable elements (TEs) constitute a substantial fraction of the eukaryotic genome and, as a result, have a complex relationship with their host that is both adversarial and dependent. To minimize damage to cellular genes, TEs possess mechanisms that target integration to sequences of low importance. However, the retrotransposon Tf1 of Schizosaccharomyces pombe integrates with a surprising bias for promoter sequences of stress-response genes. The clustering of integration in specific promoters suggests that Tf1 possesses a targeting mechanism that is important for evolutionary adaptation to changes in environment. We report here that Sap1, an essential DNA-binding protein, plays an important role in Tf1 integration. A mutation in Sap1 resulted in a 10-fold drop in Tf1 transposition, and measures of transposon intermediates support the argument that the defect occurred in the process of integration. Published ChIP-Seq data on Sap1 binding combined with high-density maps of Tf1 integration that measure independent insertions at single-nucleotide positions show that 73.4% of all integration occurs at genomic sequences bound by Sap1. This represents high selectivity because Sap1 binds just 6.8% of the genome. A genome-wide analysis of promoter sequences revealed that Sap1 binding and amounts of integration correlate strongly. More important, an alignment of the DNA-binding motif of Sap1 revealed integration clustered on both sides of the motif and showed high levels specifically at positions +19 and −9. These data indicate that Sap1 contributes to the efficiency and position of Tf1 integration. PMID:26358720

  8. ssHMM: extracting intuitive sequence-structure motifs from high-throughput RNA-binding protein data

    PubMed Central

    Krestel, Ralf; Ohler, Uwe; Vingron, Martin; Marsico, Annalisa

    2017-01-01

    RNA-binding proteins (RBPs) play an important role in RNA post-transcriptional regulation and recognize target RNAs via sequence-structure motifs. The extent to which RNA structure influences protein binding in the presence or absence of a sequence motif is still poorly understood. Existing RNA motif finders either take the structure of the RNA only partially into account, or employ models which are not directly interpretable as sequence-structure motifs. We developed ssHMM, an RNA motif finder based on a hidden Markov model (HMM) and Gibbs sampling which fully captures the relationship between RNA sequence and secondary structure preference of a given RBP. Compared to previous methods which output separate logos for sequence and structure, it directly produces a combined sequence-structure motif when trained on a large set of sequences. ssHMM’s model is visualized intuitively as a graph and facilitates biological interpretation. ssHMM can be used to find novel bona fide sequence-structure motifs of uncharacterized RBPs, such as the one presented here for the YY1 protein. ssHMM reaches a high motif recovery rate on synthetic data, it recovers known RBP motifs from CLIP-Seq data, and scales linearly on the input size, being considerably faster than MEMERIS and RNAcontext on large datasets while being on par with GraphProt. It is freely available on GitHub and as a Docker image. PMID:28977546

  9. Myelin protein zero gene sequencing diagnoses Charcot-Marie-Tooth Type 1B disease

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Su, Y.; Zhang, H.; Madrid, R.

    1994-09-01

    Charcot-Marie-Tooth disease (CMT), the most common genetic neuropathy, affects about 1 in 2600 people in Norway and is found worldwide. CMT Type 1 (CMT1) has slow nerve conduction with demyelinated Schwann cells. Autosomal dominant CMT Type 1B (CMT1B) results from mutations in the myelin protein zero gene which directs the synthesis of more than half of all Schwann cell protein. This gene was mapped to the chromosome 1q22-1q23.1 borderline by fluorescence in situ hybridization. The first 7 of 7 reported CMT1B mutations are unique. Thus the most effective means to identify CMT1B mutations in at-risk family members and fetuses is to sequence the entire coding sequence in dominant or sporadic CMT patients without the CMT1A duplication. Of the 19 primers used in 16 pairs to uniquely amplify the entire MPZ coding sequence, 6 primer pairs were used to amplify and sequence the 6 exons. The DyeDeoxy Terminator cycle sequencing method used with four different color fluorescent labels was superior to manual sequencing because it sequences more bases unambiguously from extracted genomic DNA samples within 24 hours. This protocol was used to test 28 CMT and Dejerine-Sottas patients without CMT1A gene duplication. Sequencing MPZ gene-specific amplified fragments identified 9 polymorphic sites within the 6 exons that encode the 248 amino acid MPZ protein. The large number of major CMT1B mutations identified by single strand sequencing are being verified by reverse strand sequencing and when possible, by restriction enzyme analysis. This protocol can be used to distinguish CMT1B patients from other CMT phenotypes and to determine the CMT1B status of relatives both presymptomatically and prenatally.

  10. DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier.

    PubMed

    Kulmanov, Maxat; Khan, Mohammed Asif; Hoehndorf, Robert; Wren, Jonathan

    2018-02-15

    A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often only done rigorously for a few selected model organisms. Computational function prediction approaches have been suggested to fill this gap. The functions of proteins are classified using the Gene Ontology (GO), which contains over 40 000 classes. Additionally, proteins have multiple functions, making function prediction a large-scale, multi-class, multi-label problem. We have developed a novel method to predict protein function from sequence. We use deep learning to learn features from protein sequences as well as a cross-species protein-protein interaction network. Our approach specifically outputs information in the structure of the GO and utilizes the dependencies between GO classes as background information to construct a deep learning model. We evaluate our method using the standards established by the Computational Assessment of Function Annotation (CAFA) and demonstrate a significant improvement over baseline methods such as BLAST, in particular for predicting cellular locations. Web server: http://deepgo.bio2vec.net. Source code: https://github.com/bio-ontology-research-group/deepgo. Contact: robert.hoehndorf@kaust.edu.sa. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  11. muBLASTP: database-indexed protein sequence search on multicore CPUs.

    PubMed

    Zhang, Jing; Misra, Sanchit; Wang, Hao; Feng, Wu-Chun

    2016-11-04

    The Basic Local Alignment Search Tool (BLAST) is a fundamental program in the life sciences that searches databases for sequences that are most similar to a query sequence. Currently, the BLAST algorithm utilizes a query-indexed approach. Although many approaches suggest that sequence search with a database index can achieve much higher throughput (e.g., BLAT, SSAHA, and CAFE), they cannot deliver the same level of sensitivity as the query-indexed BLAST, i.e., NCBI BLAST, or they can only support nucleotide sequence search, e.g., MegaBLAST. Due to different challenges and characteristics between query indexing and database indexing, the existing techniques for query-indexed search cannot be used in database-indexed search. muBLASTP, a novel database-indexed BLAST for protein sequence search, delivers hits identical to those returned by NCBI BLAST. On Intel Haswell multicore CPUs, for a single query, the single-threaded muBLASTP achieves up to a 4.41-fold speedup for alignment stages, and up to a 1.75-fold end-to-end speedup over single-threaded NCBI BLAST. For a batch of queries, the multithreaded muBLASTP achieves up to a 5.7-fold speedup for alignment stages, and up to a 4.56-fold end-to-end speedup over multithreaded NCBI BLAST. With a newly designed index structure for protein database and associated optimizations in BLASTP algorithm, we re-factored BLASTP algorithm for modern multicore processors that achieves much higher throughput with acceptable memory footprint for the database index.
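
    The difference between query indexing and database indexing may be easier to see in code. The sketch below is a toy illustration only, not muBLASTP's index structure: it builds the k-mer index once over the database and then looks up each query word in it. The names `build_db_index` and `seed_hits` are hypothetical, and exact word matches stand in for BLASTP's scored neighbourhood words.

```python
from collections import defaultdict


def build_db_index(db: dict, k: int = 3) -> dict:
    """Map every k-mer in the database to its (sequence id, offset) positions.
    Built once and reused for every query (the database-indexed idea)."""
    index = defaultdict(list)
    for seq_id, seq in db.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((seq_id, pos))
    return index


def seed_hits(query: str, index: dict, k: int = 3) -> list:
    """Return (query offset, sequence id, database offset) for each exact
    k-mer match. Real BLASTP also accepts similar words above a score
    threshold; that step is omitted in this sketch."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for seq_id, dpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, seq_id, dpos))
    return hits


# Toy database with made-up sequences.
db = {"p1": "MKVLAA", "p2": "GGMKVG"}
index = build_db_index(db)
print(seed_hits("AMKV", index))
```

    A query-indexed search would instead index the words of the query and stream the whole database past it for every search, which is why the index location changes the throughput and memory trade-off the abstract describes.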

  12. Tardigrade workbench: comparing stress-related proteins, sequence-similar and functional protein clusters as well as RNA elements in tardigrades

    PubMed Central

    2009-01-01

    Background: Tardigrades represent an animal phylum with extraordinary resistance to environmental stress. Results: To gain insights into their stress-specific adaptation potential, major clusters of related and similar proteins are identified, as well as specific functional clusters delineated comparing all tardigrades and individual species (Milnesium tardigradum, Hypsibius dujardini, Echiniscus testudo, Tulinus stephaniae, Richtersius coronifer) and functional elements in tardigrade mRNAs are analysed. We find that 39.3% of the total sequences clustered in 58 clusters of more than 20 proteins. Among these are ten tardigrade specific as well as a number of stress-specific protein clusters. Tardigrade-specific functional adaptations include strong protein, DNA- and redox protection, maintenance and protein recycling. Specific regulatory elements regulate tardigrade mRNA stability such as lox P DICE elements whereas 14 other RNA elements of higher eukaryotes are not found. Further features of tardigrade specific adaptation are rapidly identified by sequence and/or pattern search on the web-tool tardigrade analyzer http://waterbear.bioapps.biozentrum.uni-wuerzburg.de. The work-bench offers nucleotide pattern analysis for promoter and regulatory element detection (tardigrade specific; nrdb) as well as rapid COG search for function assignments including species-specific repositories of all analysed data. Conclusion: Different protein clusters and regulatory elements implicated in tardigrade stress adaptations are analysed including unpublished tardigrade sequences. PMID:19821996

  13. Streptomyces griseus streptomycin phosphotransferase: expression of its gene in Escherichia coli and sequence homology with other antibiotic phosphotransferases and with eukaryotic protein kinases.

    PubMed

    Lim, C K; Smith, M C; Petty, J; Baumberg, S; Wootton, J C

    1989-12-01

    The aphD gene of Streptomyces griseus, encoding a streptomycin 6-phosphotransferase (SPH), was sub-cloned in the pBR322-based expression vector pRK9 (which contains the Serratia marcescens trp promoter) with selection for expression of streptomycin resistance in Escherichia coli. Two hybrid plasmids, pCKL631 and pCKL711, were isolated which conferred resistance. Both contained an approximately 2 kbp fragment already suspected to include aphD. The properties of in vitro deletion derivatives of these plasmids were consistent with the presumed location of aphD. In vitro deletion of a sequence including most of the trp promoter largely, but not quite completely, abolished the ability of the plasmid to confer streptomycin resistance, confirming that expression was indeed principally from the trp promoter. A polypeptide of approximately 34.5 kDa was present in minicells containing plasmids that conferred streptomycin resistance, but was absent when the plasmids contained in vitro deletions removing streptomycin resistance. Part of the fragment was sequenced and an open reading frame corresponding to aphD identified. A computer-assisted comparison of the deduced SPH sequence with those of other antibiotic phosphotransferases suggested a common structure A-B-C-D-E, where B and D were conserved between all sequences compared while A, C and E divided between the streptomycin and hygromycin B phosphotransferases on one hand and kanamycin/neomycin ones on the other. A composite sequence data base was searched for homologues to consensus matrices constructed from five approximately 12-residue subsequences within blocks B and D. For one subsequence, corresponding to the N-terminal portion of block D, those sequences from the database that yielded the highest homology scores comprised almost entirely either antibiotic phosphotransferases or eukaryotic protein kinases. Possible evolutionary implications of this homology, previously described by other groups, are discussed.

  14. Selection of a platinum-binding sequence in a loop of a four-helix bundle protein.

    PubMed

    Yagi, Sota; Akanuma, Satoshi; Kaji, Asumi; Niiro, Hiroya; Akiyama, Hayato; Uchida, Tatsuya; Yamagishi, Akihiko

    2018-02-01

    Protein-metal hybrids are functional materials with various industrial applications. For example, a redox enzyme immobilized on a platinum electrode is a key component of some biofuel cells and biosensors. To create these hybrid materials, protein molecules are bound to metal surfaces. Here, we report the selection of a novel platinum-binding sequence in a loop of a four-helix bundle protein, the Lac repressor four-helix protein (LARFH), an artificial protein in which four identical α-helices are connected via three identical loops. We created a genetic library in which the Ser-Gly-Gln-Gly-Gly-Ser sequence within the first inter-helical loop of LARFH was semi-randomly mutated. The library was then subjected to selection for platinum-binding affinity by using the T7 phage display method. The majority of the selected variants contained the Tyr-Lys-Arg-Gly-Tyr-Lys (YKRGYK) sequence in their randomized segment. We characterized the platinum-binding properties of mutant LARFH by using quartz crystal microbalance analysis. Mutant LARFH seemed to interact with platinum through its loop containing the YKRGYK sequence, as judged by the estimated exclusive area occupied by a single molecule. Furthermore, a 10-residue peptide containing the YKRGYK sequence bound to platinum with reasonably high affinity and basic side chains in the peptide were crucial in mediating this interaction. In conclusion, we have identified an amino acid sequence, YKRGYK, in the loop of a helix-loop-helix motif that shows high platinum-binding affinity. This sequence could be grafted into loops of other polypeptides as an approach to immobilize proteins on platinum electrodes for use as biosensors among other applications. Copyright © 2017 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.

  15. Isolation and N-terminal sequencing of a novel cadmium-binding protein from Boletus edulis

    NASA Astrophysics Data System (ADS)

    Collin-Hansen, C.; Andersen, R. A.; Steinnes, E.

    2003-05-01

    A Cd-binding protein was isolated from the popular edible mushroom Boletus edulis, which is a hyperaccumulator of both Cd and Hg. Wild-growing samples of B. edulis were collected from soils rich in Cd. Cd radiotracer was added to the crude protein preparation obtained from ethanol precipitation of heat-treated cytosol. Proteins were then further separated in two consecutive steps: gel filtration and anion exchange chromatography. In both steps the Cd radiotracer profile showed only one distinct peak, which corresponded well with the profiles of endogenous Cd obtained by atomic absorption spectrophotometry (AAS). Concentrations of the essential elements Cu and Zn were low in the protein fractions high in Cd. N-terminal sequencing performed on the Cd-binding protein fractions revealed a protein with a novel amino acid sequence, which contained aromatic amino acids as well as proline. Both the N-terminal sequencing and spectrofluorimetric analysis with EDTA and ABD-F (4-aminosulfonyl-7-fluoro-2,1,3-benzoxadiazole) failed to detect cysteine in the Cd-binding fractions. These findings indicate that the novel protein does not belong to the metallothionein family. The results suggest a role for the protein in Cd transport and storage, and they are of importance in view of toxicology and food chemistry, but also for environmental protection.

  16. Sequence charge decoration dictates coil-globule transition in intrinsically disordered proteins

    NASA Astrophysics Data System (ADS)

    Firman, Taylor; Ghosh, Kingshuk

    2018-03-01

    We present an analytical theory to compute conformations of heteropolymers—applicable to describe disordered proteins—as a function of temperature and charge sequence. The theory describes coil-globule transition for a given protein sequence when temperature is varied and has been benchmarked against the all-atom Monte Carlo simulation (using CAMPARI) of intrinsically disordered proteins (IDPs). In addition, the model quantitatively shows how subtle alterations of charge placement in the primary sequence—while maintaining the same charge composition—can lead to significant changes in conformation, even as drastic as a coil (swelled above a purely random coil) to globule (collapsed below a random coil) and vice versa. The theory provides insights on how to control (enhance or suppress) these changes by tuning the temperature (or solution condition) and charge decoration. As an application, we predict the distribution of conformations (at room temperature) of all naturally occurring IDPs in the DisProt database and notice significant size variation even among IDPs with a similar composition of positive and negative charges. Based on this, we provide a new diagram-of-states delineating the sequence-conformation relation for proteins in the DisProt database. Next, we study the effect of post-translational modification, e.g., phosphorylation, on IDP conformations. Modifications as little as two-site phosphorylation can significantly alter the size of an IDP with everything else being constant (temperature, salt concentration, etc.). However, not all possible modification sites have the same effect on protein conformations; there are certain "hot spots" that can cause maximal change in conformation. The location of these "hot spots" in the parent sequence can readily be identified by using a sequence charge decoration metric originally introduced by Sawle and Ghosh. The ability of our model to predict conformations (both expanded and collapsed states) of IDPs at
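
    The sequence charge decoration (SCD) metric mentioned in this abstract can be sketched in a few lines. The form used here, SCD = (1/N) Σ_{m>n} q_m q_n |m − n|^{1/2}, is an assumption based on how the Sawle and Ghosh metric is commonly stated, and the charge assignments (D and E as −1, K and R as +1, everything else neutral, no terminal or phosphate charges) are simplifications chosen for illustration.

```python
import math

# Assumed per-residue charges; histidine and chain termini are treated as neutral.
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def sequence_charge_decoration(seq: str) -> float:
    """SCD = (1/N) * sum over residue pairs m > n of q_m * q_n * sqrt(m - n)."""
    q = [CHARGE.get(res, 0) for res in seq.upper()]
    n = len(q)
    if n == 0:
        raise ValueError("sequence must not be empty")
    total = 0.0
    for m in range(1, n):
        for k in range(m):
            total += q[m] * q[k] * math.sqrt(m - k)
    return total / n


# Same composition (two E, two K), different charge placement.
for seq in ("EKEK", "EEKK"):
    print(seq, round(sequence_charge_decoration(seq), 3))
```

    With these assumptions the two toy sequences share the same charge composition but give different SCD values, the blocky arrangement being the more negative, which is the kind of placement effect the abstract refers to.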

  17. High-Efficiency Promoter-Dependent Transduction by Adeno-Associated Virus Type 6 Vectors in Mouse Lung

    PubMed Central

    HALBERT, CHRISTINE L.; LAM, SIU-LING; MILLER, A. DUSTY

    2014-01-01

    The transduction efficiency of adeno-associated virus (AAV) vectors in various somatic tissues has been shown to depend heavily on the AAV type from which the vector capsid proteins are derived. Among the AAV types studied, AAV6 efficiently transduces cells of the airway epithelium, making it a good candidate for the treatment of lung diseases such as cystic fibrosis. Here we have evaluated the effects of various promoter sequences on transduction rates and gene expression levels in the lung. Of the strong viral promoters examined, the Rous sarcoma virus (RSV) promoter performed significantly better than a human cytomegalovirus (CMV) promoter in the airway epithelium. However, a hybrid promoter consisting of a CMV enhancer, β-actin promoter and splice donor, and a β-globin splice acceptor (CAG promoter) exhibited even higher expression than either of the strong viral promoters alone, showing a 38-fold increase in protein expression over the RSV promoter. In addition, we show that vectors containing either the RSV or CAG promoter expressed well in the nasal and tracheal epithelium. Transduction rates in the 90% range were achieved in many airways with the CAG promoter, showing that with the proper AAV capsid proteins and promoter sequences, highly efficient transduction can be achieved. PMID:17430088

  18. Confirmation of the "protein-traffic-hypothesis" and the "protein-localization-hypothesis" using the diabetes-mellitus-type-1-knock-in and transgenic-murine-models and the trepitope sequences.

    PubMed

    Arneth, Borros

    2012-10-01

    As possible mechanisms to explain the emergence of autoimmune diseases, the current author has suggested in earlier papers two new pathways: the "protein localization hypothesis" and the "protein traffic hypothesis". The "protein localization hypothesis" states that an autoimmune disease develops if a protein accumulates in a compartment that did not previously contain that protein. Similarly, the "protein traffic hypothesis" states that a sudden error within the transport of a certain protein leads to the emergence of an autoimmune disease. The current article discusses the usefulness of the different commercially available transgenic murine models of diabetes mellitus type 1 to confirm the aforementioned hypotheses. This discussion shows that several transgenic murine models of diabetes mellitus type 1 are in line with and confirm the aforementioned hypotheses. Furthermore, these hypotheses are additionally in line with the occurrence of several newly discovered protein sequences, the so-called trepitope sequences. These sequences modulate the immune response to certain proteins. The current study analyzed to what extent the hypotheses are supported by the occurrence of these new sequences. Thereby the occurrence of the trepitope sequences provides additional evidence supporting the aforementioned hypotheses. Both the "protein localization hypothesis" and the "protein traffic hypothesis" have the potential to lead to new causal therapy concepts. The "protein localization hypothesis" and the "protein traffic hypothesis" provide conceptual explanations for the diabetes mouse models as well as for the newly discovered trepitope sequences. Copyright © 2012 Elsevier Ltd. All rights reserved.

  19. The DynaMine webserver: predicting protein dynamics from sequence.

    PubMed

    Cilia, Elisa; Pancsa, Rita; Tompa, Peter; Lenaerts, Tom; Vranken, Wim F

    2014-07-01

    Protein dynamics are important for understanding protein function. Unfortunately, accurate protein dynamics information is difficult to obtain: here we present the DynaMine webserver, which provides predictions for the fast backbone movements of proteins directly from their amino-acid sequence. DynaMine rapidly produces a profile describing the statistical potential for such movements at residue-level resolution. The predicted values have meaning on an absolute scale and go beyond the traditional binary classification of residues as ordered or disordered, thus allowing for direct dynamics comparisons between protein regions. Through this webserver, we provide molecular biologists with an efficient and easy to use tool for predicting the dynamical characteristics of any protein of interest, even in the absence of experimental observations. The prediction results are visualized and can be directly downloaded. The DynaMine webserver, including instructive examples describing the meaning of the profiles, is available at http://dynamine.ibsquare.be. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  20. Adenovirus EIIA early promoter: transcriptional control elements and induction by the viral pre-early EIA gene, which appears to be sequence independent.

    PubMed Central

    Murthy, S C; Bhat, G P; Thimmappaya, B

    1985-01-01

    A molecular dissection of the adenovirus EIIA early (E) promoter was undertaken to study the sequence elements required for transcription and to examine the nucleotide sequences, if any, specific for its trans-activation by the viral pre-early EIA gene product. A chimeric gene in which the EIIA-E promoter region was fused to the coding sequences of the bacterial chloramphenicol acetyltransferase (CAT) gene was used in transient assays to identify the transcriptional control regions. Deletion mapping studies revealed that the upstream DNA sequences up to -86 were sufficient for the optimal basal level transcription in HeLa cells and also for the EIA-induced transcription. A series of linker-scanning (LS) mutants were constructed to precisely identify the nucleotide sequences that control transcription. Analysis of these LS mutants allowed us to identify two regions of the promoter that are critical for the EIIA-E transcription. These regions are located between -29 and -21 (region I) and between -82 and -66 (region II). Mutations in region I affected initiation and appeared functionally similar to the "TATA" sequence of the commonly studied promoters. To examine whether or not the EIIA-E promoter contained DNA sequences specific for the trans-activation by the EIA, the LS mutants were analyzed in a cotransfection assay containing a plasmid carrying the EIA gene. CAT activity of all of the LS mutants was induced by the EIA gene in this assay, suggesting that the induction of transcription of the EIIA-E promoter by the EIA gene is not sequence-specific. PMID:3857577

  1. A statistical physics perspective on alignment-independent protein sequence comparison.

    PubMed

    Chattopadhyay, Amit K; Nasiev, Diar; Flower, Darren R

    2015-08-01

    Within bioinformatics, the textual alignment of amino acid sequences has long dominated the determination of similarity between proteins, with all that implies for shared structure, function and evolutionary descent. Despite the relative success of modern-day sequence alignment algorithms, so-called alignment-free approaches offer a complementary means of determining and expressing similarity, with potential benefits in certain key applications, such as regression analysis of protein structure-function studies, where alignment-based similarity has performed poorly. Here, we offer a fresh, statistical physics-based perspective focusing on the question of alignment-free comparison, in the process adapting results from 'first passage probability distribution' to summarize statistics of ensemble averaged amino acid propensity values. In this article, we introduce and elaborate this approach. © The Author 2015. Published by Oxford University Press.

  2. Ligand-mediated protein degradation reveals functional conservation among sequence variants of the CUL4-type E3 ligase substrate receptor cereblon.

    PubMed

    Akuffo, Afua A; Alontaga, Aileen Y; Metcalf, Rainer; Beatty, Matthew S; Becker, Andreas; McDaniel, Jessica M; Hesterberg, Rebecca S; Goodheart, William E; Gunawan, Steven; Ayaz, Muhammad; Yang, Yan; Karim, Md Rezaul; Orobello, Morgan E; Daniel, Kenyon; Guida, Wayne; Yoder, Jeffrey A; Rajadhyaksha, Anjali M; Schönbrunn, Ernst; Lawrence, Harshani R; Lawrence, Nicholas J; Epling-Burnette, Pearlie K

    2018-04-20

    Upon binding to thalidomide and other immunomodulatory drugs, the E3 ligase substrate receptor cereblon (CRBN) promotes proteasomal destruction by engaging the DDB1-CUL4A-Roc1-RBX1 E3 ubiquitin ligase in human cells but not in mouse cells, suggesting that sequence variations in CRBN may cause its inactivation. Therapeutically, CRBN engagers have the potential for broad applications in cancer and immune therapy by specifically reducing protein expression through targeted ubiquitin-mediated degradation. To examine the effects of defined sequence changes on CRBN's activity, we performed a comprehensive study using complementary theoretical, biophysical, and biological assays aimed at understanding CRBN's nonprimate sequence variations. With a series of recombinant thalidomide-binding domain (TBD) proteins, we show that CRBN sequence variants retain their drug-binding properties to both classical immunomodulatory drugs and dBET1, a chemical compound and targeting ligand designed to degrade bromodomain-containing 4 (BRD4) via a CRBN-dependent mechanism. We further show that dBET1 stimulates CRBN's E3 ubiquitin-conjugating function and degrades BRD4 in both mouse and human cells. This insight paves the way for studies of CRBN-dependent proteasome-targeting molecules in nonprimate models and provides a new understanding of CRBN's substrate-recruiting function. © 2018 by The American Society for Biochemistry and Molecular Biology, Inc.

  3. Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing.

    PubMed

    Kanda, Kojun; Pflug, James M; Sproul, John S; Dasenko, Mark A; Maddison, David R

    2015-01-01

    In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles

  4. Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing

    PubMed Central

    Dasenko, Mark A.

    2015-01-01

    In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles

  5. Toward rules relating zinc finger protein sequences and DNA binding site preferences.

    PubMed

    Desjarlais, J R; Berg, J M

    1992-08-15

    Zinc finger proteins of the Cys2-His2 type consist of tandem arrays of domains, where each domain appears to contact three adjacent base pairs of DNA through three key residues. We have designed and prepared a series of variants of the central zinc finger within the DNA binding domain of Sp1 by using information from an analysis of a large data base of zinc finger protein sequences. Through systematic variations at two of the three contact positions (underlined), relatively specific recognition of sequences of the form 5'-GGGGN(G or T)GGG-3' has been achieved. These results provide the basis for rules that may develop into a code that will allow the design of zinc finger proteins with preselected DNA site specificity.
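
    The recognition pattern quoted in this abstract, 5'-GGGGN(G or T)GGG-3', maps directly onto a regular expression. The sketch below scans one DNA strand for such sites; the function name `find_sites` and the example sequence are made up for illustration, and the reverse strand is not searched.

```python
import re

# GGGG, then any base (N), then G or T, then GGG; the lookahead lets
# overlapping sites be reported as well.
SITE = re.compile(r"(?=(GGGG[ACGT][GT]GGG))")


def find_sites(dna: str) -> list:
    """Return (0-based start, matched 9-mer) for each site on the given strand."""
    return [(m.start(), m.group(1)) for m in SITE.finditer(dna.upper())]


# Made-up sequence containing one matching site.
print(find_sites("AAGGGGCGGGGTT"))
```

    The 9-base pattern reflects the three-fingers-times-three-base-pairs arrangement described in the abstract, with the variable middle triplet corresponding to the central finger.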

  6. Protein model discrimination using mutational sensitivity derived from deep sequencing.

    PubMed

    Adkar, Bharat V; Tripathi, Arti; Sahoo, Anusmita; Bajaj, Kanika; Goswami, Devrishi; Chakrabarti, Purbani; Swarnkar, Mohit K; Gokhale, Rajesh S; Varadarajan, Raghavan

    2012-02-08

    A major bottleneck in protein structure prediction is the selection of correct models from a pool of decoys. Relative activities of ∼1,200 individual single-site mutants in a saturation library of the bacterial toxin CcdB were estimated by determining their relative populations using deep sequencing. This phenotypic information was used to define an empirical score for each residue (RankScore), which correlated with the residue depth, and identify active-site residues. Using these correlations, ∼98% of correct models of CcdB (RMSD ≤ 4Å) were identified from a large set of decoys. The model-discrimination methodology was further validated on eleven different monomeric proteins using simulated RankScore values. The methodology is also a rapid, accurate way to obtain relative activities of each mutant in a large pool and derive sequence-structure-function relationships without protein isolation or characterization. It can be applied to any system in which mutational effects can be monitored by a phenotypic readout. Copyright © 2012 Elsevier Ltd. All rights reserved.

  7. Indel PDB: a database of structural insertions and deletions derived from sequence alignments of closely related proteins.

    PubMed

    Hsing, Michael; Cherkasov, Artem

    2008-06-25

    Insertions and deletions (indels) represent a common type of sequence variations, which are less studied and pose many important biological questions. Recent research has shown that the presence of sizable indels in protein sequences may be indicative of protein essentiality and their role in protein interaction networks. Examples of utilization of indels for structure-based drug design have also been recently demonstrated. Nonetheless many structural and functional characteristics of indels remain less researched or unknown. We have created a web-based resource, Indel PDB, representing a structural database of insertions/deletions identified from the sequence alignments of highly similar proteins found in the Protein Data Bank (PDB). Indel PDB utilized large amounts of available structural information to characterize 1-, 2- and 3-dimensional features of indel sites. Indel PDB contains 117,266 non-redundant indel sites extracted from 11,294 indel-containing proteins. Unlike loop databases, Indel PDB features more indel sequences with secondary structures including alpha-helices and beta-sheets in addition to loops. The insertion fragments have been characterized by their sequences, lengths, locations, secondary structure composition, solvent accessibility, protein domain association and three dimensional structures. By utilizing the data available in Indel PDB, we have studied and presented here several sequence and structural features of indels. We anticipate that Indel PDB will not only enable future functional studies of indels, but will also assist protein modeling efforts and identification of indel-directed drug binding sites.

  8. Mapping a nucleolar targeting sequence of an RNA binding nucleolar protein, Nop25

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fujiwara, Takashi; Suzuki, Shunji; Kanno, Motoko

    2006-06-10

    Nop25 is a putative RNA binding nucleolar protein associated with rRNA transcription. The present study was undertaken to determine the mechanism of Nop25 localization in the nucleolus. Deletion experiments of Nop25 amino acid sequence showed Nop25 to contain a nuclear targeting sequence in the N-terminal and a nucleolar targeting sequence in the C-terminal. By expressing derivative peptides from the C-terminal as GFP-fusion proteins in the cells, a lysine and arginine residue-enriched peptide (KRKHPRRAQDSTKKPPSATRTSKTQRRRR) allowed a GFP-fusion protein to be transported and fully retained in the nucleolus. When the peptide was fused with cMyc epitope and expressed in the cells, a cMyc epitope was then detected in the nucleolus. Nop25 did not localize in the nucleolus by deletion of the peptide from Nop25. Furthermore, deletion of a subdomain (KRKHPRRAQ) in the peptide or amino acid substitution of lysine and arginine residues in the subdomain resulted in the loss of Nop25 nucleolar localization. These results suggest that the lysine and arginine residue-enriched peptide is the most prominent nucleolar targeting sequence of Nop25 and that the long stretch of basic residues might play an important role in the nucleolar localization of Nop25. Although Nop25 contained putative SUMOylation, phosphorylation and glycosylation sites, the amino acid substitution in these sites had no effect on the nucleolar localization, thus suggesting that these post-translational modifications did not contribute to the localization of Nop25 in the nucleolus. The treatment of the cells, which expressed a GFP-fusion protein with a nucleolar targeting sequence of Nop25, with RNase A resulted in a complete dislocation of the protein from the nucleolus. These data suggested that the nucleolar targeting sequence might therefore play an important role in the binding of Nop25 to RNA molecules and that the RNA binding of Nop25 might be essential for the nucleolar localization of Nop25.
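
    The "lysine and arginine residue-enriched" description can be checked by counting K and R in the two sequences quoted in this abstract. This is a minimal sketch; the helper name basic_fraction is hypothetical and the calculation is not part of the reported study.

```python
def basic_fraction(peptide):
    """Return (number of K+R residues, length, fraction) for a one-letter peptide."""
    peptide = peptide.upper()
    basic = sum(peptide.count(aa) for aa in "KR")
    return basic, len(peptide), basic / len(peptide)

# Sequences copied from the abstract above.
NOS_PEPTIDE = "KRKHPRRAQDSTKKPPSATRTSKTQRRRR"
SUBDOMAIN = "KRKHPRRAQ"

if __name__ == "__main__":
    for name, seq in [("peptide", NOS_PEPTIDE), ("subdomain", SUBDOMAIN)]:
        n_basic, length, frac = basic_fraction(seq)
        print(f"{name}: {n_basic}/{length} basic residues ({frac:.0%})")
```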

  9. Complete genome sequence of Serratia plymuthica strain AS12

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Neupane, Saraswoti; Finlay, Roger D.; Alstrom, Sadhna

    2012-01-01

    A plant associated member of the family Enterobacteriaceae, Serratia plymuthica strain AS12 was isolated from rapeseed roots. It is of scientific interest due to its plant growth promoting and plant pathogen inhibiting ability. The genome of S. plymuthica AS12 comprises a 5,443,009 bp long circular chromosome, which consists of 4,952 protein-coding genes, 87 tRNA genes and 7 rRNA operons. This genome was sequenced within the 2010 DOE-JGI Community Sequencing Program (CSP2010) as part of the project entitled 'Genomics of four rapeseed plant growth promoting bacteria with antagonistic effect on plant pathogens'.

  10. Trichomonas vaginalis ribosomal RNA: identification and characterisation of the transcription promoter and terminator sequences.

    PubMed

    Franco, Bernardo; Hernández, Roberto; López-Villaseñor, Imelda

    2012-09-01

    Trichomonas vaginalis is a parasitic protozoan of both medical and biological relevance. Transcriptional studies in this organism have focused mainly on type II pol promoters, whereas the elements necessary for transcription by polI or polIII have not been investigated. Here, with the aid of a transient transcription system, we characterised the rDNA intergenic region, defining both the promoter and the terminator sequences required for transcription. We defined the promoter as a compact region of approximately 180 bp. We also identified a potential upstream control element (UCE) that was located 80 bp upstream of the transcription start point (TSP). A transcription termination element was identified within a 34 bp region that was located immediately downstream of the 28S coding sequence. The function of this element depends upon polarity and the presence of both a stretch of uridine residues (U's) and a hairpin structure in the transcript. Our observations provide a strong basis for the study of DNA recognition by the polI transcriptional machinery in this early divergent organism. Copyright © 2012 Elsevier B.V. All rights reserved.

  11. Identification of a nuclear localization sequence in the polyomavirus capsid protein VP2

    NASA Technical Reports Server (NTRS)

    Chang, D.; Haynes, J. I. 2nd; Brady, J. N.; Consigli, R. A.; Spooner, B. S. (Principal Investigator)

    1992-01-01

    A nuclear localization signal (NLS) has been identified in the C-terminal (Glu307-Glu-Asp-Gly-Pro-Gln-Lys-Lys-Lys-Arg-Arg-Leu318) amino acid sequence of the polyomavirus minor capsid protein VP2. The importance of this amino acid sequence for nuclear transport of newly synthesized VP2 was demonstrated by a genetic "subtractive" study using the constructs pSG5VP2 (expressing full-length VP2) and pSG5 delta 3VP2 (expressing truncated VP2, lacking amino acids Glu307-Leu318). These constructs were transfected into COS-7 cells, and the intracellular localization of the VP2 protein was determined by indirect immunofluorescence. These studies revealed that the full-length VP2 was localized in the nucleus, while the truncated VP2 protein was localized in the cytoplasm and not transported to the nucleus. A biochemical "additive" approach was also used to determine whether this sequence could target nonnuclear proteins to the nucleus. A synthetic peptide identical to VP2 amino acids Glu307-Leu318 was cross-linked to the nonnuclear proteins bovine serum albumin (BSA) or immunoglobulin G (IgG). The conjugates were then labeled with fluorescein isothiocyanate and microinjected into the cytoplasm of NIH 3T6 cells. Both conjugates localized in the nucleus of the microinjected cells, whereas unconjugated BSA and IgG remained in the cytoplasm. Taken together, these genetic subtractive and biochemical additive approaches have identified the C-terminal sequence of polyoma-virus VP2 (containing amino acids Glu307-Leu318) as the NLS of this protein.

  12. Computational Framework for Prediction of Peptide Sequences That May Mediate Multiple Protein Interactions in Cancer-Associated Hub Proteins.

    PubMed

    Sarkar, Debasree; Patra, Piya; Ghosh, Abhirupa; Saha, Sudipto

    2016-01-01

    A considerable proportion of protein-protein interactions (PPIs) in the cell are estimated to be mediated by very short peptide segments that approximately conform to specific sequence patterns known as linear motifs (LMs), often present in the disordered regions in the eukaryotic proteins. These peptides have been found to interact with low affinity and are able to bind to multiple interactors, thus playing an important role in the PPI networks involving date hubs. In this work, PPI data and a de novo motif identification method (MEME) were used to identify such peptides in three cancer-associated hub proteins: MYC, APC and MDM2. The peptides corresponding to the significant LMs identified for each hub protein were aligned, the overlapping regions across these peptides being termed overlapping linear peptides (OLPs). These OLPs were thus predicted to be responsible for multiple PPIs of the corresponding hub proteins and a scoring system was developed to rank them. We predicted six OLPs in MYC and five OLPs in MDM2 that scored higher than OLP predictions from randomly generated protein sets. Two OLP sequences from the C-terminal of MYC were predicted to bind with FBXW7, a component of an E3 ubiquitin-protein ligase complex involved in proteasomal degradation of MYC. Similarly, we identified peptides in the C-terminal of MDM2 interacting with FKBP3, which has a specific role in auto-ubiquitinylation of MDM2. The peptide sequences predicted in MYC and MDM2 look promising for designing orthosteric inhibitors against possible disease-associated PPIs. Since these OLPs can interact with other proteins as well, these inhibitors should be specific to the targeted interactor to prevent undesired side-effects. This computational framework has been designed to predict and rank the peptide regions that may mediate multiple PPIs and can be applied to other disease-associated date hub proteins for prediction of novel therapeutic targets of small molecule PPI modulators.

  13. Surface Density of the Hendra G Protein Modulates Hendra F Protein-Promoted Membrane Fusion: Role for Hendra G Protein Trafficking and Degradation

    PubMed Central

    Whitman, Shannon D.; Dutch, Rebecca Ellis

    2007-01-01

    Hendra virus, like most paramyxoviruses, requires both a fusion (F) and attachment (G) protein for promotion of cell-cell fusion. Recent studies determined that Hendra F is proteolytically processed by the cellular protease cathepsin L after endocytosis. This unique cathepsin L processing results in a small percentage of Hendra F on the cell surface. To determine how the surface densities of the two Hendra glycoproteins affect fusion promotion, we performed experiments that varied the levels of glycoproteins expressed in transfected cells. Using two different fusion assays, we found a marked increase in fusion when expression of the Hendra G protein was increased, with a 1:1 molar ratio of Hendra F:G on the cell surface resulting in optimal membrane fusion. Our results also showed that Hendra G protein levels are modulated by both more rapid protein turnover and slower protein trafficking than is seen for Hendra F. PMID:17328935

  14. Amino acid sequence of the smaller basic protein from rat brain myelin

    PubMed Central

    Dunkley, Peter R.; Carnegie, Patrick R.

    1974-01-01

    1. The complete amino acid sequence of the smaller basic protein from rat brain myelin was determined. This protein differs from myelin basic proteins of other species in having a deletion of a polypeptide of 40 amino acid residues from the centre of the molecule. 2. A detailed comparison is made of the constant and variable regions in a group of myelin basic proteins from six species. 3. An arginine residue in the rat protein was found to be partially methylated. The ratio of methylated to unmethylated arginine at this position differed from that found for the human basic protein. 4. Three tryptic peptides were isolated in more than one form. The differences between the two forms of each peptide are discussed in relation to the electrophoretic heterogeneity of myelin basic proteins, which is known to occur at alkaline pH values. 5. Detailed evidence for the amino acid sequence of the protein has been deposited as Supplementary Publication SUP 50029 at the British Library (Lending Division) (formerly the National Lending Library for Science and Technology), Boston Spa, Yorks. LS23 7BQ, U.K., from whom copies may be obtained on the terms given in Biochem. J. (1973) 131, 5. PMID:4141893

  15. Translational repression by an RNA-binding protein promotes differentiation to infective forms in Trypanosoma cruzi.

    PubMed

    Romaniuk, Maria Albertina; Frasch, Alberto Carlos; Cassola, Alejandro

    2018-06-01

    Trypanosomes, protozoan parasites of medical importance, essentially rely on post-transcriptional mechanisms to regulate gene expression in insect vectors and vertebrate hosts. RNA binding proteins (RBPs) that associate to the 3'-UTR of mature mRNAs are thought to orchestrate master developmental programs for these processes to happen. Yet, the molecular mechanisms by which differentiation occurs remain largely unexplored in these human pathogens. Here, we show that ectopic inducible expression of the RBP TcUBP1 promotes the beginning of the differentiation process from non-infective epimastigotes to infective metacyclic trypomastigotes in Trypanosoma cruzi. In early-log epimastigotes TcUBP1 promoted a drop-like phenotype, which is characterized by the presence of metacyclogenesis hallmarks, namely repositioning of the kinetoplast, the expression of an infective-stage virulence factor such as trans-sialidase, increased resistance to lysis by human complement and growth arrest. Furthermore, TcUBP1-ectopic expression in non-infective late-log epimastigotes promoted full development into metacyclic trypomastigotes. TcUBP1-derived metacyclic trypomastigotes were infective in cultured cells, and developed normally into amastigotes in the cytoplasm. By artificial in vivo tethering of TcUBP1 to the 3' untranslated region of a reporter mRNA we were able to determine that translation of the reporter was reduced by 8-fold, while its mRNA abundance was not significantly compromised. Inducible ectopic expression of TcUBP1 confirmed its role as a translational repressor, revealing significant reduction in the translation rate of multiple proteins, a reduction of polysomes, and promoting the formation of mRNA granules. Expression of TcUBP1 truncated forms revealed the requirement of both N and C-terminal glutamine-rich low complexity sequences for the development of the drop-like phenotype in early-log epimastigotes. We propose that a rise in TcUBP1 levels, in synchrony with

  16. "De-novo" amino acid sequence elucidation of protein G'e by combined "top-down" and "bottom-up" mass spectrometry.

    PubMed

    Yefremova, Yelena; Al-Majdoub, Mahmoud; Opuni, Kwabena F M; Koy, Cornelia; Cui, Weidong; Yan, Yuetian; Gross, Michael L; Glocker, Michael O

    2015-03-01

    Mass spectrometric de-novo sequencing was applied to review the amino acid sequence of a commercially available recombinant protein G´ with great scientific and economic importance. Substantial deviations to the published amino acid sequence (Uniprot Q54181) were found by the presence of 46 additional amino acids at the N-terminus, including a so-called "His-tag" as well as an N-terminal partial α-N-gluconoylation and α-N-phosphogluconoylation, respectively. The unexpected amino acid sequence of the commercial protein G' comprised 241 amino acids and resulted in a molecular mass of 25,998.9 ± 0.2 Da for the unmodified protein. Due to the higher mass that is caused by its extended amino acid sequence compared with the original protein G' (185 amino acids), we named this protein "protein G'e." By means of mass spectrometric peptide mapping, the suggested amino acid sequence, as well as the N-terminal partial α-N-gluconoylations, was confirmed with 100% sequence coverage. After the protein G'e sequence was determined, we were able to determine the expression vector pET-28b from Novagen with the Xho I restriction enzyme cleavage site as the best option that was used for cloning and expressing the recombinant protein G'e in E. coli. A dissociation constant (K(d)) value of 9.4 nM for protein G'e was determined thermophoretically, showing that the N-terminal flanking sequence extension did not cause significant changes in the binding affinity to immunoglobulins.

  17. Centriolar satellites assemble centrosomal microcephaly proteins to recruit CDK2 and promote centriole duplication

    PubMed Central

    Kodani, Andrew; Yu, Timothy W; Johnson, Jeffrey R; Jayaraman, Divya; Johnson, Tasha L; Al-Gazali, Lihadh; Sztriha, Lāszló; Partlow, Jennifer N; Kim, Hanjun; Krup, Alexis L; Dammermann, Alexander; Krogan, Nevan J; Walsh, Christopher A; Reiter, Jeremy F

    2015-01-01

    Primary microcephaly (MCPH) associated proteins CDK5RAP2, CEP152, WDR62 and CEP63 colocalize at the centrosome. We found that they interact to promote centriole duplication and form a hierarchy in which each is required to localize another to the centrosome, with CDK5RAP2 at the apex, and CEP152, WDR62 and CEP63 at sequentially lower positions. MCPH proteins interact with distinct centriolar satellite proteins; CDK5RAP2 interacts with SPAG5 and CEP72, CEP152 with CEP131, WDR62 with MOONRAKER, and CEP63 with CEP90 and CCDC14. These satellite proteins localize their cognate MCPH interactors to centrosomes and also promote centriole duplication. Consistent with a role for satellites in microcephaly, homozygous mutations in one satellite gene, CEP90, may cause MCPH. The satellite proteins, with the exception of CCDC14, and MCPH proteins promote centriole duplication by recruiting CDK2 to the centrosome. Thus, centriolar satellites build a MCPH complex critical for human neurodevelopment that promotes CDK2 centrosomal localization and centriole duplication. DOI: http://dx.doi.org/10.7554/eLife.07519.001 PMID:26297806

  18. PDNAsite: Identification of DNA-binding Site from Protein Sequence by Incorporating Spatial and Sequence Context

    PubMed Central

    Zhou, Jiyun; Xu, Ruifeng; He, Yulan; Lu, Qin; Wang, Hongpeng; Kong, Bing

    2016-01-01

    Protein-DNA interactions are involved in many fundamental biological processes essential for cellular function. Most of the existing computational approaches employed only the sequence context of the target residue for its prediction. In the present study, for each target residue, we applied both the spatial context and the sequence context to construct the feature space. Subsequently, Latent Semantic Analysis (LSA) was applied to remove the redundancies in the feature space. Finally, a predictor (PDNAsite) was developed through the integration of the support vector machines (SVM) classifier and ensemble learning. Results on the PDNA-62 and the PDNA-224 datasets demonstrate that features extracted from spatial context provide more information than those from sequence context and that the combination of the two gives a further performance gain. An analysis of the number of binding sites in the spatial context of the target site indicates that the interactions between binding sites next to each other are important for protein-DNA recognition and their binding ability. The comparison between our proposed PDNAsite method and the existing methods indicates that PDNAsite outperforms most of the existing methods and is a useful tool for DNA-binding site identification. A web server for our predictor (http://hlt.hitsz.edu.cn:8080/PDNAsite/) is freely accessible to the biological research community. PMID:27282833

  19. Large-scale analysis of intrinsic disorder flavors and associated functions in the protein sequence universe.

    PubMed

    Necci, Marco; Piovesan, Damiano; Tosatto, Silvio C E

    2016-12-01

    Intrinsic disorder (ID) in proteins has been extensively described for the last decade; a large-scale classification of ID in proteins is mostly missing. Here, we provide an extensive analysis of ID in the protein universe on the UniProt database derived from sequence-based predictions in MobiDB. Almost half the sequences contain an ID region of at least five residues. About 9% of proteins have a long ID region of over 20 residues which are more abundant in Eukaryotic organisms and most frequently cover less than 20% of the sequence. A small subset of about 67,000 (out of over 80 million) proteins is fully disordered and mostly found in Viruses. Most proteins have only one ID, with short ID evenly distributed along the sequence and long ID overrepresented in the center. The charged residue composition of Das and Pappu was used to classify ID proteins by structural propensities and corresponding functional enrichment. Swollen Coils seem to be used mainly as structural components and in biosynthesis in both Prokaryotes and Eukaryotes. In Bacteria, they are confined in the nucleoid and in Viruses provide DNA binding function. Coils & Hairpins seem to be specialized in ribosome binding and methylation activities. Globules & Tadpoles bind antigens in Eukaryotes but are involved in killing other organisms and cytolysis in Bacteria. The Undefined class is used by Bacteria to bind toxic substances and mediate transport and movement between and within organisms in Viruses. Fully disordered proteins behave similarly, but are enriched for glycine residues and extracellular structures. © 2016 The Protein Society.

  20. Nucleotide sequence analysis of the gene encoding the Deinococcus radiodurans surface protein, derived amino acid sequence, and complementary protein chemical studies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Peters, J.; Peters, M.; Lottspeich, F.

    1987-11-01

    The complete nucleotide sequence of the gene encoding the surface (hexagonally packed intermediate (HPI))-layer polypeptide of Deinococcus radiodurans Sark was determined and found to encode a polypeptide of 1036 amino acids. Amino acid sequence analysis of about 30% of the residues revealed that the mature polypeptide consists of at least 978 amino acids. The N terminus was blocked to Edman degradation. The results of proteolytic modification of the HPI layer in situ and M_r estimations of the HPI polypeptide expressed in Escherichia coli indicated that there is a leader sequence. The N-terminal region contained a very high percentage (29%) of threonine and serine, including a cluster of nine consecutive serine or threonine residues, whereas a stretch near the C terminus was extremely rich in aromatic amino acids (29%). The protein contained at least two disulfide bridges, as well as tightly bound reducing sugars and fatty acids.

  1. Predicting protein-protein interactions by combing various sequence-derived features into the general form of Chou's Pseudo amino acid composition.

    PubMed

    Zhao, Xiao-Wei; Ma, Zhi-Qiang; Yin, Ming-Hao

    2012-05-01

    Knowledge of protein-protein interactions (PPIs) plays an important role in constructing protein interaction networks and understanding the general machineries of biological systems. In this study, a new method is proposed to predict PPIs using a comprehensive set of 930 features based only on sequence information; these features measure, from different aspects, the interactions between residues a certain distance apart in the protein sequences. To achieve better performance, principal component analysis (PCA) is first employed to obtain an optimized feature subset. Then, the resulting 67-dimensional feature vectors are fed to a Support Vector Machine (SVM). Experimental results on Drosophila melanogaster and Helicobacter pylori datasets show that our method is promising for predicting PPIs and may at least be a useful supplement to existing methods.

  2. RStrucFam: a web server to associate structure and cognate RNA for RNA-binding proteins from sequence information.

    PubMed

    Ghosh, Pritha; Mathew, Oommen K; Sowdhamini, Ramanathan

    2016-10-07

    RNA-binding proteins (RBPs) interact with their cognate RNA(s) to form large biomolecular assemblies. They are versatile in their functionality and are involved in a myriad of processes inside the cell. RBPs with similar structural features and common biological functions are grouped together into families and superfamilies. It will be useful to obtain an early understanding and association of RNA-binding property of sequences of gene products. Here, we report a web server, RStrucFam, to predict the structure, type of cognate RNA(s) and function(s) of proteins, where possible, from mere sequence information. The web server employs Hidden Markov Model scan (hmmscan) to enable association to a back-end database of structural and sequence families. The database (HMMRBP) comprises 437 HMMs of RBP families of known structure that have been generated using structure-based sequence alignments and 746 sequence-centric RBP family HMMs. The input protein sequence is associated with structural or sequence domain families, if structure or sequence signatures exist. In case of association of the protein with a family of known structures, output features such as a multiple structure-based sequence alignment (MSSA) of the query with all other members of that family are provided. Further, cognate RNA partner(s) for that protein, Gene Ontology (GO) annotations, if any, and a homology model of the protein can be obtained. The users can also browse through the database for details pertaining to each family, protein or RNA and their related information based on keyword search or RNA motif search. RStrucFam is a web server that exploits structurally conserved features of RBPs, derived from known family members and imprinted in mathematical profiles, to predict putative RBPs from sequence information. Proteins that fail to associate with such structure-centric families are further queried against the sequence-centric RBP family HMMs in the HMMRBP database. Further, all other essential

  3. Prefoldin Promotes Proteasomal Degradation of Cytosolic Proteins with Missense Mutations by Maintaining Substrate Solubility

    PubMed Central

    Young, Barry P.; Loewen, Christopher J.; Mayor, Thibault

    2016-01-01

    Misfolded proteins challenge the ability of cells to maintain protein homeostasis and can accumulate into toxic protein aggregates. As a consequence, cells have adopted a number of protein quality control pathways to prevent protein aggregation, promote protein folding, and target terminally misfolded proteins for degradation. In this study, we employed a thermosensitive allele of the yeast Guk1 guanylate kinase as a model misfolded protein to investigate degradative protein quality control pathways. We performed a flow cytometry based screen to identify factors that promote proteasomal degradation of proteins misfolded as the result of missense mutations. In addition to the E3 ubiquitin ligase Ubr1, we identified the prefoldin chaperone subunit Gim3 as an important quality control factor. Whereas the absence of GIM3 did not impair proteasomal function or the ubiquitination of the model substrate, it led to the accumulation of the poorly soluble model substrate in cellular inclusions that was accompanied by delayed degradation. We found that Gim3 interacted with the Guk1 mutant allele and propose that prefoldin promotes the degradation of the unstable model substrate by maintaining the solubility of the misfolded protein. We also demonstrated that in addition to the Guk1 mutant, prefoldin can stabilize other misfolded cytosolic proteins containing missense mutations. PMID:27448207

  4. Domain fusion analysis by applying relational algebra to protein sequence and domain databases.

    PubMed

    Truong, Kevin; Ikura, Mitsuhiko

    2003-05-06

    Domain fusion analysis is a useful method to predict functionally linked proteins that may be involved in direct protein-protein interactions or in the same metabolic or signaling pathway. As separate domain databases like BLOCKS, PROSITE, Pfam, SMART, PRINTS-S, ProDom, TIGRFAMs, and amalgamated domain databases like InterPro continue to grow in size and quality, a computational method to perform domain fusion analysis that leverages these efforts will become increasingly powerful. This paper proposes a computational method employing relational algebra to find domain fusions in protein sequence databases. The feasibility of this method was illustrated on the SWISS-PROT+TrEMBL sequence database using domain predictions from the Pfam HMM (hidden Markov model) database. We identified 235 and 189 putative functionally linked protein partners in H. sapiens and S. cerevisiae, respectively. From scientific literature, we were able to confirm many of these functional linkages, while the remainder offer testable experimental hypotheses. Results can be viewed at http://calcium.uhnres.utoronto.ca/pi. As the analysis can be computed quickly on any relational database that supports standard SQL (structured query language), it can be dynamically updated along with the sequence and domain databases, thereby improving the quality of predictions over time.
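
    The idea of finding domain fusions with standard SQL can be sketched with Python's built-in sqlite3 module. This is a toy illustration under stated assumptions, not the authors' schema or query: the table protein_domain, the protein and domain names, and the function find_domain_fusions are all hypothetical. A pair of proteins is reported when one carries only the first domain, the other carries only the second, and some third protein carries both.

```python
import sqlite3

# Hypothetical toy data: (protein, domain) assignments.
ROWS = [
    ("FUS1", "domA"), ("FUS1", "domB"),  # composite protein with both domains
    ("P1", "domA"),
    ("P2", "domB"),
    ("P3", "domC"),
]

# Self-joins select a composite protein (f1, f2), then proteins a and b that
# each carry one of its two domains but not the other.
QUERY = """
SELECT DISTINCT a.protein AS partner_a, b.protein AS partner_b,
                f1.protein AS composite
FROM protein_domain f1
JOIN protein_domain f2
  ON f1.protein = f2.protein AND f1.domain < f2.domain
JOIN protein_domain a
  ON a.domain = f1.domain AND a.protein <> f1.protein
JOIN protein_domain b
  ON b.domain = f2.domain AND b.protein <> f1.protein
 AND b.protein <> a.protein
WHERE NOT EXISTS (SELECT 1 FROM protein_domain x
                  WHERE x.protein = a.protein AND x.domain = f2.domain)
  AND NOT EXISTS (SELECT 1 FROM protein_domain y
                  WHERE y.protein = b.protein AND y.domain = f1.domain)
ORDER BY partner_a, partner_b, composite
"""

def find_domain_fusions(rows):
    """Return (partner_a, partner_b, composite) tuples for the given rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE protein_domain (protein TEXT, domain TEXT)")
    con.executemany("INSERT INTO protein_domain VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

if __name__ == "__main__":
    print(find_domain_fusions(ROWS))
```

    Because the logic lives entirely in the query, rerunning it after the table is refreshed is enough to update the predictions, which is the property the abstract highlights.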

  5. Domain fusion analysis by applying relational algebra to protein sequence and domain databases

    PubMed Central

    Truong, Kevin; Ikura, Mitsuhiko

    2003-01-01

    Background: Domain fusion analysis is a useful method to predict functionally linked proteins that may be involved in direct protein-protein interactions or in the same metabolic or signaling pathway. As separate domain databases like BLOCKS, PROSITE, Pfam, SMART, PRINTS-S, ProDom, TIGRFAMs, and amalgamated domain databases like InterPro continue to grow in size and quality, a computational method to perform domain fusion analysis that leverages these efforts will become increasingly powerful. Results: This paper proposes a computational method employing relational algebra to find domain fusions in protein sequence databases. The feasibility of this method was illustrated on the SWISS-PROT+TrEMBL sequence database using domain predictions from the Pfam HMM (hidden Markov model) database. We identified 235 and 189 putative functionally linked protein partners in H. sapiens and S. cerevisiae, respectively. From scientific literature, we were able to confirm many of these functional linkages, while the remainder offer testable experimental hypotheses. Results can be viewed at . Conclusion: As the analysis can be computed quickly on any relational database that supports standard SQL (structured query language), it can be dynamically updated along with the sequence and domain databases, thereby improving the quality of predictions over time. PMID:12734020

  6. [Molecular cloning and characterization of a novel Clonorchis sinensis antigenic protein containing tandem repeat sequences].

    PubMed

    Liu, Qian; Xu, Xue-Nian; Zhou, Yan; Cheng, Na; Dong, Yu-Ting; Zheng, Hua-Jun; Zhu, Yong-Qiang; Zhu, Yong-Qiang

    2013-08-01

    To find and clone new antigen genes from the lambda-ZAP cDNA expression library of adult Clonorchis sinensis, and determine the immunological characteristics of the recombinant proteins. The cDNA expression library of adult C. sinensis was screened by pooled sera of clonorchiasis patients. The sequences of the positive phage clones were compared with the sequences in EST database, and the full-length sequence of the gene (Cs22 gene) was obtained by RT-PCR. cDNA fragments containing 2 and 3 times tandem repeat sequences were generated by jumping PCR. The sequence encoding the mature peptide or the tandem repeat sequence was respectively cloned into the prokaryotic expression vector pET28a (+), and then transformed into E. coli Rosetta DE3 cells for expression. The recombinant proteins (rCs22-2r, rCs22-3r, rCs22M-2r, and rCs22M-3r) were purified by His-bind-resin (Ni-NTA) affinity chromatography. The immunogenicity of rCs22-2r and rCs22-3r was identified by ELISA. To evaluate the immunological diagnostic value of rCs22-2r and rCs22-3r, serum samples from 35 clonorchiasis patients, 31 healthy individuals, 15 schistosomiasis patients, 15 paragonimiasis westermani patients and 13 cysticercosis patients were examined by ELISA. To locate antigenic determinants, the pooled sera of clonorchiasis patients and healthy persons were analyzed for specific antibodies by ELISA with recombinant protein rCs22M-2r and rCs22M-3r containing the tandem repeat sequences. The full-length sequence of Cs22 antigen gene of C. sinensis was obtained. It contained 13 times tandem repeat sequences of EQQDGDEEGMGGDGGRGKEKGKVEGEDGAGEQKEQA. Bioinformatics analysis indicated that the protein (Cs22) belonged to GPI-anchored proteins family. The recombinant proteins rCs22-2r and rCs22-3r showed a certain level of immunogenicity. The positive rate by ELISA coated with the purified PrCs22-2r and PrCs22-3r for sera of clonorchiasis patients both were 45.7% (16/35), and 3.2% (1/31) for those of healthy

  7. The Murine Factor H-Related Protein FHR-B Promotes Complement Activation.

    PubMed

    Cserhalmi, Marcell; Csincsi, Ádám I; Mezei, Zoltán; Kopp, Anne; Hebecker, Mario; Uzonyi, Barbara; Józsi, Mihály

    2017-01-01

    Factor H-related (FHR) proteins consist of varying number of complement control protein domains that display various degrees of sequence identity to respective domains of the alternative pathway complement inhibitor factor H (FH). While such FHR proteins are described in several species, only human FHRs were functionally investigated. Their biological role is still poorly understood and in part controversial. Recent studies on some of the human FHRs strongly suggest a role for FHRs in enhancing complement activation via competing with FH for binding to certain ligands and surfaces. The aim of the current study was the functional characterization of a murine FHR, FHR-B. To this end, FHR-B was expressed in recombinant form. Recombinant FHR-B bound to human C3b and was able to compete with human FH for C3b binding. FHR-B supported the assembly of functionally active C3bBb alternative pathway C3 convertase via its interaction with C3b. This activity was confirmed by demonstrating C3 activation in murine serum. In addition, FHR-B bound to murine pentraxin 3 (PTX3), and this interaction resulted in murine C3 fragment deposition due to enhanced complement activation in mouse serum. FHR-B also induced C3 deposition on C-reactive protein, the extracellular matrix (ECM) extract Matrigel, and endothelial cell-derived ECM when exposed to mouse serum. Moreover, mouse C3 deposition was strongly enhanced on necrotic Jurkat T cells and the mouse B cell line A20 by FHR-B. FHR-B also induced lysis of sheep erythrocytes when incubated in mouse serum with FHR-B added in excess. Altogether, these data demonstrate that, similar to human FHR-1 and FHR-5, mouse FHR-B modulates complement activity by promoting complement activation via interaction with C3b and via competition with murine FH.

  8. A Sequence-Dependent DNA Condensation Induced by Prion Protein

    PubMed Central

    2018-01-01

    Different studies indicated that the prion protein induces hybridization of complementary DNA strands. Cell culture studies showed that the scrapie isoform of prion protein remained bound with the chromosome. In the present work, we used an oxazole dye, YOYO, as a reporter to quantitatively characterize DNA condensation by prion protein. We observe that the prion protein induces greater fluorescence quenching of YOYO intercalated in DNA containing only GC bases compared to the DNA containing four bases, whereas the effect on dye bound to DNA containing only AT bases is marginal. DNA-condensing biological polyamines are less effective than prion protein in quenching of DNA-bound YOYO fluorescence. The prion protein induces marginal quenching of fluorescence of the dye bound to oligonucleotides, which are resistant to condensation. Ultrastructural studies with an electron microscope also validate the biophysical data. The GC bases of the target DNA are probably responsible for increased condensation in the presence of prion protein. To our knowledge, this is the first report of a human cellular protein inducing a sequence-dependent DNA condensation. The increased condensation of GC-rich DNA by prion protein may suggest a biological function of the prion protein and a role in its pathogenesis. PMID:29657864

  9. A Sequence-Dependent DNA Condensation Induced by Prion Protein.

    PubMed

    Bera, Alakesh; Biring, Sajal

    2018-01-01

    Different studies indicated that the prion protein induces hybridization of complementary DNA strands. Cell culture studies showed that the scrapie isoform of prion protein remained bound with the chromosome. In the present work, we used an oxazole dye, YOYO, as a reporter to quantitatively characterize DNA condensation by prion protein. We observe that the prion protein induces greater fluorescence quenching of YOYO intercalated in DNA containing only GC bases compared to the DNA containing four bases, whereas the effect on dye bound to DNA containing only AT bases is marginal. DNA-condensing biological polyamines are less effective than prion protein in quenching of DNA-bound YOYO fluorescence. The prion protein induces marginal quenching of fluorescence of the dye bound to oligonucleotides, which are resistant to condensation. Ultrastructural studies with an electron microscope also validate the biophysical data. The GC bases of the target DNA are probably responsible for increased condensation in the presence of prion protein. To our knowledge, this is the first report of a human cellular protein inducing a sequence-dependent DNA condensation. The increased condensation of GC-rich DNA by prion protein may suggest a biological function of the prion protein and a role in its pathogenesis.

  10. Characterization of the rat RALDH1 promoter. A functional CCAAT and octamer motif are critical for basal promoter activity.

    PubMed

    Guimond, Julie; Devost, Dominic; Brodeur, Helene; Mader, Sylvie; Bhat, Pangala V

    2002-12-12

    Retinal dehydrogenase type 1 (RALDH1) catalyzes the oxidation of retinal to retinoic acid (RA), a metabolite of vitamin A important for embryogenesis and tissue differentiation. Rat RALDH1 is expressed at high levels in developing kidney and in stomach and intestinal epithelia. To understand the mechanisms of the transcriptional regulation of rat RALDH1, we cloned a 1360-base pair (bp) 5'-flanking region of the RALDH1 gene. Using luciferase reporter constructs transfected into HEK 293 and LLCPK (kidney-derived) cells, basal promoter activity was associated with sequences between -80 and +43. In this minimal promoter region, TATA and CCAAT cis-acting elements as well as SP1, AP1 and octamer (Oct)-binding sites were present. The CCAAT box and Oct-binding site, located between positions -72 and -68 and -56 and -49, respectively, were shown by deletion analysis and site-directed mutation to be critical for promoter activity. Nuclear extracts from kidney cells contain proteins specifically binding the Oct and CCAAT sequences, resulting in the formation of six complexes, while different patterns of complexes were observed with non-kidney cell extracts. Gel shift assays using either single or double mutations of the Oct and CCAAT sequences as well as super shift assays demonstrated single and double occupancy of these two sites by Oct-1 and CBF-A. In addition, unidentified proteins also bound the Oct motif specifically in the absence of CBF-A binding. These results demonstrate specific involvement of Oct and CCAAT-binding proteins in the regulation of the RALDH1 gene.

  11. Correlation of fitness landscapes from three orthologous TIM barrels originates from sequence and structure constraints

    PubMed Central

    Chan, Yvonne H.; Venev, Sergey V.; Zeldovich, Konstantin B.; Matthews, C. Robert

    2017-01-01

    Sequence divergence of orthologous proteins enables adaptation to environmental stresses and promotes evolution of novel functions. Limits on evolution imposed by constraints on sequence and structure were explored using a model TIM barrel protein, indole-3-glycerol phosphate synthase (IGPS). Fitness effects of point mutations in three phylogenetically divergent IGPS proteins during adaptation to temperature stress were probed by auxotrophic complementation of yeast with prokaryotic, thermophilic IGPS. Analysis of beneficial mutations pointed to an unexpected, long-range allosteric pathway towards the active site of the protein. Significant correlations between the fitness landscapes of distant orthologues implicate both sequence and structure as primary forces in defining the TIM barrel fitness landscape and suggest that fitness landscapes can be translocated in sequence space. Exploration of fitness landscapes in the context of a protein fold provides a strategy for elucidating the sequence-structure-fitness relationships in other common motifs. PMID:28262665

  12. Sequence-Based Prediction of RNA-Binding Residues in Proteins

    PubMed Central

    Walia, Rasna R.; EL-Manzalawy, Yasser; Honavar, Vasant G.; Dobbs, Drena

    2017-01-01

    Identifying individual residues in the interfaces of protein–RNA complexes is important for understanding the molecular determinants of protein–RNA recognition and has many potential applications. Recent technical advances have led to several high-throughput experimental methods for identifying partners in protein–RNA complexes, but determining RNA-binding residues in proteins is still expensive and time-consuming. This chapter focuses on available computational methods for identifying which amino acids in an RNA-binding protein participate directly in contacting RNA. Step-by-step protocols for using three different web-based servers to predict RNA-binding residues are described. In addition, currently available web servers and software tools for predicting RNA-binding sites, as well as databases that contain valuable information about known protein–RNA complexes, RNA-binding motifs in proteins, and protein-binding recognition sites in RNA are provided. We emphasize sequence-based methods that can reliably identify interfacial residues without the requirement for structural information regarding either the RNA-binding protein or its RNA partner. PMID:27787829

  13. The hypervariable region 1 protein of hepatitis C virus broadly reactive with sera of patients with chronic hepatitis C has a similar amino acid sequence with the consensus sequence.

    PubMed

    Watanabe, K; Yoshioka, K; Ito, H; Ishigami, M; Takagi, K; Utsunomiya, S; Kobayashi, M; Kishimoto, H; Yano, M; Kakumu, S

    1999-11-10

    Hypervariable region 1 (HVR1) proteins of hepatitis C virus (HCV) have been reported to react broadly with sera of patients with HCV infection. However, the variability of the broad reactivity of individual HVR1 proteins has not been elucidated. We assessed the reactivity of 25 different HVR1 proteins (genotype 1b) with sera of 81 patients with HCV infection (genotype 1b) by Western blot. HVR1 proteins reacted with 2-60 sera. The number of sera reactive with each HVR1 protein significantly correlated with the number of amino acid residues identical to the consensus sequence defined by Puntoriero et al. (G. Puntoriero, A. Lahm, S. Zucchelli, B. B. Ercole, R. Tafi, M. Penzzanera, M. U. Mondelli, R. Cortese, A. Tramontano, G. Galfre', and A. Nicosia. 1998. EMBO J. 17, 3521-3533. ) (r = 0.561, P < 0.005). The most widely reactive HVR1 protein, 12-22, had a sequence similar to the consensus sequence. The peptide with C-terminal 13-amino-acids sequence of HVR1 protein 12-22 (NH2-CSFTSLFTPGPSQK) was injected into rabbits as an immunogen. The rabbit immune sera reacted with 9 of 25 HVR1 proteins of genotype 1b including HVR1 protein 12-22 and with 3 of 12 proteins of genotype 2a. These results indicate that the HVR1 protein broadly reactive with patients' sera has a sequence similar to the consensus sequence, can induce broadly reactive sera, and could be one of the candidate immunogens in a prophylactic vaccine against HCV. Copyright 1999 Academic Press.

  14. Sequence characterization of S100A8 gene reveals structural differences of protein and transcriptional factor binding sites in water buffalo and yak.

    PubMed

    Kathiravan, P; Goyal, S; Kataria, R S; Mishra, B P; Jayakumar, S; Joshi, B K

    2011-01-01

    The present study was undertaken to characterize the structure of S100A8 gene and its promoter in water buffalo and yak. Sequence data of 2.067 kb, 2.071 kb, and 2.052 kb with respect to complete S100A8 gene including 5' flanking region was generated in river buffalo, swamp buffalo, and yak, respectively. BLAST analysis of coding DNA sequences (CDS) of S100A8 gene revealed 95% homology of buffalo sequence with cattle, 85% with pig and horse, 83% with dog, 72-73% with murines, and around 79% with primates and humans. Phylogenetic analysis of predicted CDS revealed distinct clustering of murines, primates, and domestic animals with bovines and bubalines forming a subcluster among farm animals. In silico translation of predicted CDS revealed a sequence of 89 amino acids with 7 amino acid changes between cattle and buffalo and 2 changes between cattle and yak. The search for Pfam family revealed the N-terminal calcium binding domain and the noncanonical EF hand domain in the carboxy terminus, with more variations being observed in the N-terminal domain among different species. Two amino acid changes observed in carboxy terminal EF hand domain resulted in altered secondary structure of yak S100A8 protein. Analysis of S100A8 gene promoter revealed 14 putative motifs for transcriptional factor binding sites. Two putative motifs viz. C/EBP and v-Myb were found to be absent in swamp buffalo as compared to river buffalo and cattle. Differences in the structure of S100A8 protein and the transcriptional factor binding sites identified in the present study need to be analyzed further for their functional significance in yak and swamp buffalo respectively. Copyright © Taylor & Francis Group, LLC

  15. Rapid detection, classification and accurate alignment of up to a million or more related protein sequences.

    PubMed

    Neuwald, Andrew F

    2009-08-01

    The patterns of sequence similarity and divergence present within functionally diverse, evolutionarily related proteins contain implicit information about corresponding biochemical similarities and differences. A first step toward accessing such information is to statistically analyze these patterns, which, in turn, requires that one first identify and accurately align a very large set of protein sequences. Ideally, the set should include many distantly related, functionally divergent subgroups. Because it is extremely difficult, if not impossible for fully automated methods to align such sequences correctly, researchers often resort to manual curation based on detailed structural and biochemical information. However, multiply-aligning vast numbers of sequences in this way is clearly impractical. This problem is addressed using Multiply-Aligned Profiles for Global Alignment of Protein Sequences (MAPGAPS). The MAPGAPS program uses a set of multiply-aligned profiles both as a query to detect and classify related sequences and as a template to multiply-align the sequences. It relies on Karlin-Altschul statistics for sensitivity and on PSI-BLAST (and other) heuristics for speed. Using as input a carefully curated multiple-profile alignment for P-loop GTPases, MAPGAPS correctly aligned weakly conserved sequence motifs within 33 distantly related GTPases of known structure. By comparison, the sequence- and structurally based alignment methods hmmalign and PROMALS3D misaligned at least 11 and 23 of these regions, respectively. When applied to a dataset of 65 million protein sequences, MAPGAPS identified, classified and aligned (with comparable accuracy) nearly half a million putative P-loop GTPase sequences. A C++ implementation of MAPGAPS is available at http://mapgaps.igs.umaryland.edu. Supplementary data are available at Bioinformatics online.

  16. Role of promoter DNA sequence variations on the binding of EGR1 transcription factor.

    PubMed

    Mikles, David C; Schuchardt, Brett J; Bhat, Vikas; McDonald, Caleb B; Farooq, Amjad

    2014-05-01

    In response to a wide variety of stimuli such as growth factors and hormones, EGR1 transcription factor is rapidly induced and immediately exerts downstream effects central to the maintenance of cellular homeostasis. Herein, our biophysical analysis reveals that DNA sequence variations within the target gene promoters tightly modulate the energetics of binding of EGR1 and that nucleotide substitutions at certain positions are much more detrimental to EGR1-DNA interaction than others. Importantly, the reduction in binding affinity poorly correlates with the loss of enthalpy and gain of entropy, a trend indicative of a complex interplay between underlying thermodynamic factors due to the differential role of water solvent upon nucleotide substitution. We also provide a rationale for the physical basis of the effect of nucleotide substitutions on the EGR1-DNA interaction at atomic level. Taken together, our study bears important implications on understanding the molecular determinants of a key protein-DNA interaction at the cross-roads of human health and disease. Copyright © 2014 Elsevier Inc. All rights reserved.

  17. Draft Genome Sequence of Acinetobacter calcoaceticus Strain P23, a Plant Growth-Promoting Bacterium of Duckweed

    PubMed Central

    Hosoyama, Akira; Yamazoe, Atsushi; Morikawa, Masaaki

    2015-01-01

    Acinetobacter calcoaceticus strain P23 is a plant growth-promoting bacterium, which was isolated from the surface of duckweed. We report here the draft genome sequence of strain P23. The genome data will serve as a valuable reference for understanding the molecular mechanism of plant growth promotion in aquatic plants. PMID:25720680

  18. Transcriptional regulation of human eosinophil RNases by an evolutionary- conserved sequence motif in primate genome

    PubMed Central

    Wang, Hsiu-Yu; Chang, Hao-Teng; Pai, Tun-Wen; Wu, Chung-I; Lee, Yuan-Hung; Chang, Yen-Hsin; Tai, Hsiu-Ling; Tang, Chuan-Yi; Chou, Wei-Yao; Chang, Margaret Dah-Tsyr

    2007-01-01

    Background: Human eosinophil-derived neurotoxin (edn) and eosinophil cationic protein (ecp) are members of a subfamily of primate ribonuclease (rnase) genes. Although they were generated by a gene duplication event, distinct edn and ecp expression profiles in various tissues have been reported. Results: In this study, we obtained the upstream promoter sequences of several representative primate eosinophil rnases. Bioinformatic analysis revealed the presence of a shared 34-nucleotide (nt) sequence stretch located at -81 to -48 in all edn promoters and the macaque ecp promoter. Such a unique sequence motif constituted a region essential for transactivation of human edn in hepatocellular carcinoma cells. Gel electrophoretic mobility shift assay, transient transfection and scanning mutagenesis experiments allowed us to identify binding sites for two transcription factors, Myc-associated zinc finger protein (MAZ) and SV-40 protein-1 (Sp1), within the 34-nt segment. Subsequent in vitro and in vivo binding assays demonstrated a direct molecular interaction between this 34-nt region and MAZ and Sp1. Interestingly, overexpression of MAZ and Sp1 respectively repressed and enhanced edn promoter activity. The regulatory transactivation motif was mapped to the evolutionarily conserved -74/-65 region of the edn promoter, which was guanine-rich and critical for recognition by both transcription factors. Conclusion: Our results provide the first direct evidence that MAZ and Sp1 play important roles in the transcriptional activation of the human edn promoter through specific binding to a 34-nt segment present in representative primate eosinophil rnase promoters. PMID:17927842

  19. Analysis of sequencing data for probing RNA secondary structures and protein-RNA binding in studying posttranscriptional regulations.

    PubMed

    Hu, Xihao; Wu, Yang; Lu, Zhi John; Yip, Kevin Y

    2016-11-01

    High-throughput sequencing has been used to study posttranscriptional regulations, where the identification of protein-RNA binding is a major and fast-developing sub-area, which is in turn benefited by the sequencing methods for whole-transcriptome probing of RNA secondary structures. In the study of RNA secondary structures using high-throughput sequencing, bases are modified or cleaved according to their structural features, which alter the resulting composition of sequencing reads. In the study of protein-RNA binding, methods have been proposed to immuno-precipitate (IP) protein-bound RNA transcripts in vitro or in vivo. By sequencing these transcripts, the protein-RNA interactions and the binding locations can be identified. For both types of data, read counts are affected by a combination of confounding factors, including expression levels of transcripts, sequence biases, mapping errors and the probing or IP efficiency of the experimental protocols. Careful processing of the sequencing data and proper extraction of important features are fundamentally important to a successful analysis. Here we review and compare different experimental methods for probing RNA secondary structures and binding sites of RNA-binding proteins (RBPs), and the computational methods proposed for analyzing the corresponding sequencing data. We suggest how these two types of data should be integrated to study the structural properties of RBP binding sites as a systematic way to better understand posttranscriptional regulations. © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.

  20. Sequence walkers: a graphical method to display how binding proteins interact with DNA or RNA sequences

    Cancer.gov

    A graphical method is presented for displaying how binding proteins and other macromolecules interact with individual bases of nucleotide sequences. Characters representing the sequence are either oriented normally and placed above a line indicating favorable contact, or upside-down and placed below the line indicating unfavorable contact. The positive or negative height of

  1. Understanding the molecular basis of plant growth promotional effect of Pseudomonas fluorescens on rice through protein profiling.

    PubMed

    Kandasamy, Saveetha; Loganathan, Karthiba; Muthuraj, Raveendran; Duraisamy, Saravanakumar; Seetharaman, Suresh; Thiruvengadam, Raguchander; Ponnusamy, Balasubramanian; Ramasamy, Samiyappan

    2009-12-24

    The plant growth-promoting rhizobacterium (PGPR) Pseudomonas fluorescens strain KH-1 was found to exhibit plant growth promotional activity in rice under both in vitro and in vivo conditions, but the mechanism underlying this promotional activity is not yet clearly understood. In this study, efforts were made to elucidate the molecular responses of rice plants to P. fluorescens treatment through protein profiling. A two-dimensional polyacrylamide gel electrophoresis strategy was adopted to identify the PGPR-responsive proteins, and the differentially expressed proteins were analyzed by mass spectrometry (MS). Upon priming with P. fluorescens, 23 different proteins were found to be differentially expressed in rice leaf sheaths, and MS analysis revealed the differential expression of some important proteins, namely putative p23 co-chaperone, Thioredoxin h- rice, Ribulose-bisphosphate carboxylase large chain precursor, Nucleotide diPhosphate kinase, proteasome subunit protein and putative glutathione S-transferase protein. According to functional analyses, the differentially expressed proteins have been reported to be directly or indirectly involved in growth promotion in plants. Thus, this study confirms the primary role of PGPR strain KH-1 in rice plant growth promotion.

  2. Theta oscillations promote temporal sequence learning.

    PubMed

    Crivelli-Decker, Jordan; Hsieh, Liang-Tien; Clarke, Alex; Ranganath, Charan

    2018-05-17

    Many theoretical models suggest that neural oscillations play a role in learning or retrieval of temporal sequences, but the extent to which oscillations support sequence representation remains unclear. To address this question, we used scalp electroencephalography (EEG) to examine oscillatory activity over learning of different object sequences. Participants made semantic decisions on each object as they were presented in a continuous stream. For three "Consistent" sequences, the order of the objects was always fixed. Activity during Consistent sequences was compared to "Random" sequences that consisted of the same objects presented in a different order on each repetition. Over the course of learning, participants made faster semantic decisions to objects in Consistent, as compared to objects in Random sequences. Thus, participants were able to use sequence knowledge to predict upcoming items in Consistent sequences. EEG analyses revealed decreased oscillatory power in the theta (4-7 Hz) band at frontal sites following decisions about objects in Consistent sequences, as compared with objects in Random sequences. The theta power difference between Consistent and Random only emerged in the second half of the task, as participants were more effectively able to predict items in Consistent sequences. Moreover, we found increases in parieto-occipital alpha (10-13 Hz) and beta (14-28 Hz) power during the pre-response period for objects in Consistent sequences, relative to objects in Random sequences. Linear mixed effects modeling revealed that single trial theta oscillations were related to reaction time for future objects in a sequence, whereas beta and alpha oscillations were only predictive of reaction time on the current trial. These results indicate that theta and alpha/beta activity preferentially relate to future and current events, respectively. More generally our findings highlight the importance of band-specific neural oscillations in the learning of

  3. ORFer--retrieval of protein sequences and open reading frames from GenBank and storage into relational databases or text files.

    PubMed

    Büssow, Konrad; Hoffmann, Steve; Sievert, Volker

    2002-12-19

    Functional genomics involves the parallel experimentation with large sets of proteins. This requires management of large sets of open reading frames as a prerequisite of the cloning and recombinant expression of these proteins. A Java program was developed for retrieval of protein and nucleic acid sequences and annotations from NCBI GenBank, using the XML sequence format. Annotations retrieved by ORFer include sequence name, organism and also the completeness of the sequence. The program has a graphical user interface, although it can be used in a non-interactive mode. For protein sequences, the program also extracts the open reading frame sequence, if available, and checks its correct translation. ORFer accepts user input in the form of single or lists of GenBank GI identifiers or accession numbers. It can be used to extract complete sets of open reading frames and protein sequences from any kind of GenBank sequence entry, including complete genomes or chromosomes. Sequences are either stored with their features in a relational database or can be exported as text files in Fasta or tabulator delimited format. The ORFer program is freely available at http://www.proteinstrukturfabrik.de/orfer. The ORFer program allows for fast retrieval of DNA sequences, protein sequences and their open reading frames and sequence annotations from GenBank. Furthermore, storage of sequences and features in a relational database is supported. Such a database can supplement a laboratory information system (LIMS) with appropriate sequence information.
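
    The translation check mentioned in the abstract above can be sketched in a few lines. ORFer itself is a Java program whose internals are not described here, so this is only an illustration in Python under stated assumptions: the function names `translate` and `orf_matches_protein` are hypothetical, only the standard genetic code is used, and the reading frame is assumed to start at the first base.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(cds):
    """Translate a coding sequence up to (not including) the first stop codon.

    Unknown codons (e.g. containing N) are translated as 'X'.
    """
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

def orf_matches_protein(cds, protein):
    """Check that an open reading frame encodes the annotated protein."""
    return translate(cds) == protein.upper().rstrip("*")

print(translate("ATGGCCAAATAA"))                   # prints MAK
print(orf_matches_protein("ATGGCCAAATAA", "MAK"))  # prints True
```

    A mismatch would suggest a frameshift, a non-standard genetic code, or an annotation problem, any of which would need manual inspection before cloning.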

  4. Recurring sequence-structure motifs in (βα)8-barrel proteins and experimental optimization of a chimeric protein designed based on such motifs.

    PubMed

    Wang, Jichao; Zhang, Tongchuan; Liu, Ruicun; Song, Meilin; Wang, Juncheng; Hong, Jiong; Chen, Quan; Liu, Haiyan

    2017-02-01

    An interesting way of generating novel artificial proteins is to combine sequence motifs from natural proteins, mimicking the evolutionary path suggested by natural proteins comprising recurring motifs. We analyzed the βα and αβ modules of TIM barrel proteins by structure alignment-based sequence clustering. A number of preferred motifs were identified. A chimeric TIM was designed by using recurring elements as mutually compatible interfaces. The foldability of the designed TIM protein was then significantly improved by six rounds of directed evolution. The melting temperature has been improved by more than 20°C. A variety of characteristics suggested that the resulting protein is well-folded. Our analysis provided a library of peptide motifs that is potentially useful for different protein engineering studies. The protein engineering strategy of using recurring motifs as interfaces to connect partial natural proteins may be applied to other protein folds. Copyright © 2016 Elsevier B.V. All rights reserved.

  5. Promoting protein crystallization using a plate with simple geometry.

    PubMed

    Chen, Rui-Qing; Yin, Da-Chuan; Liu, Yong-Ming; Lu, Qin-Qin; He, Jin; Liu, Yue

    2014-03-01

    Increasing the probability of obtaining protein crystals in crystallization screening is always an important goal for protein crystallography. In this paper, a new method called the cross-diffusion microbatch (CDM) method is presented, which aims to efficiently promote protein crystallization and increase the chance of obtaining protein crystals. In this method, a very simple crystallization plate was designed in which all crystallization droplets are in one sealed space, so that a variety of volatile components from one droplet can diffuse into any other droplet via vapour diffusion. Crystallization screening and reproducibility tests indicate that this method could be a potentially powerful technique in practical protein crystallization screening. It can help to obtain crystals with higher probability and at a lower cost, while using a simple and easy procedure.

  6. Whole Wiskott‑Aldrich syndrome protein gene deletion identified by high throughput sequencing.

    PubMed

    He, Xiangling; Zou, Runying; Zhang, Bing; You, Yalan; Yang, Yang; Tian, Xin

    2017-11-01

    Wiskott‑Aldrich syndrome (WAS) is a rare X‑linked recessive immunodeficiency disorder, characterized by thrombocytopenia, small platelets, eczema and recurrent infections associated with increased risk of autoimmunity and malignancy disorders. Mutations in the WAS protein (WASP) gene are responsible for WAS. To date, WASP mutations, including missense/nonsense, splicing, small deletions, small insertions, gross deletions, and gross insertions have been identified in patients with WAS. In addition, WASP‑interacting proteins are suspected in patients with clinical features of WAS, in whom the WASP gene sequence and mRNA levels are normal. The present study aimed to investigate the application of next generation sequencing in definitive diagnosis and clinical therapy for WAS. A 5 month‑old child with WAS who displayed symptoms of thrombocytopenia was examined. Whole exome sequence analysis of genomic DNA showed that the coverage and depth of WASP were extremely low. Quantitative polymerase chain reaction indicated total WASP gene deletion in the proband. In conclusion, high throughput sequencing is useful for the verification of WAS on the genetic profile, and has implications for family planning guidance and establishment of clinical programs.

  7. Neural network and SVM classifiers accurately predict lipid binding proteins, irrespective of sequence homology.

    PubMed

    Bakhtiarizadeh, Mohammad Reza; Moradi-Shahrbabak, Mohammad; Ebrahimi, Mansour; Ebrahimie, Esmaeil

    2014-09-07

    Due to the central roles of lipid binding proteins (LBPs) in many biological processes, sequence based identification of LBPs is of great interest. The major challenge is that LBPs are diverse in sequence, structure, and function, which results in low accuracy of sequence homology based methods. Therefore, there is a need for developing alternative functional prediction methods irrespective of sequence similarity. To distinguish LBPs from non-LBPs, the performances of support vector machine (SVM) and neural network were compared in this study. Comprehensive protein features and various techniques were employed to create datasets. Five-fold cross-validation (CV) and independent evaluation (IE) tests were used to assess the validity of the two methods. The results indicated that SVM outperforms neural network. SVM achieved 89.28% (CV) and 89.55% (IE) overall accuracy in identification of LBPs from non-LBPs and 92.06% (CV) and 92.90% (IE) (on average) for classification of different LBPs classes. Increasing the number and the range of extracted protein features as well as optimization of the SVM parameters significantly increased the efficiency of LBPs class prediction in comparison to the only previous report in this field. Altogether, the results showed that the SVM algorithm can be run on broad, computationally calculated protein features and offers a promising tool in detection of LBPs classes. The proposed approach has the potential to integrate and improve the common sequence alignment based methods. Copyright © 2014 Elsevier Ltd. All rights reserved.
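    The five-fold cross-validation protocol mentioned in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names (`k_fold_indices`, `cross_validated_accuracy`, `fit_nearest_mean`, `predict_nearest_mean`) and the toy one-dimensional data are hypothetical, and a trivial nearest-mean classifier stands in for the SVM and neural network, which are not reproduced here.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Striding gives k folds whose sizes differ by at most one.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, test


def fit_nearest_mean(xs, ys):
    """Hypothetical stand-in classifier: store the mean feature per class."""
    means = {}
    for label in set(ys):
        vals = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(vals) / len(vals)
    return means


def predict_nearest_mean(means, x):
    """Predict the class whose stored mean is closest to x."""
    return min(means, key=lambda label: abs(x - means[label]))


def cross_validated_accuracy(features, labels, fit, predict, k=5):
    """Overall accuracy pooled over the k held-out folds."""
    correct = 0
    for train, test in k_fold_indices(len(labels), k):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(model, features[i]) == labels[i] for i in test)
    return correct / len(labels)


# Toy, clearly separable data (made up for illustration only).
features = [i / 10 for i in range(10)] + [5 + i / 10 for i in range(10)]
labels = [0] * 10 + [1] * 10
cv_accuracy = cross_validated_accuracy(
    features, labels, fit_nearest_mean, predict_nearest_mean
)
```

    An independent evaluation, as also described in the abstract, would instead fit once on all training data and score on a separate dataset that never enters the folds.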

  8. A purified transcription factor (TIF-IB) binds to essential sequences of the mouse rDNA promoter.

    PubMed Central

    Clos, J; Buttgereit, D; Grummt, I

    1986-01-01

    A transcription factor that is specific for mouse rDNA has been partially purified from Ehrlich ascites cells. This factor [designated transcription initiation factor (TIF)-IB] is required for accurate in vitro synthesis of mouse rRNA in addition to RNA polymerase I and another regulatory factor, TIF-IA. TIF-IB activity is present in extracts both from growing and nongrowing cells in comparable amounts. Prebinding competition experiments with wild-type and mutant templates suggest that TIF-IB interacts with the core control element of the rDNA promoter, which is located immediately upstream of the initiation site. The specific binding of TIF-IB to the RNA polymerase I promoter is demonstrated by exonuclease III protection experiments. The 3' border of the sequences protected by TIF-IB is shown to be on the coding strand at position -21 and on the noncoding strand at position -7. The results suggest that direct binding of TIF-IB to sequences in the core promoter element is the mechanism by which this factor imparts promoter selectivity to RNA polymerase I. PMID:3456157

  9. Comprehensive analysis of RNA-protein interactions by high-throughput sequencing-RNA affinity profiling.

    PubMed

    Tome, Jacob M; Ozer, Abdullah; Pagano, John M; Gheba, Dan; Schroth, Gary P; Lis, John T

    2014-06-01

    RNA-protein interactions play critical roles in gene regulation, but methods to quantitatively analyze these interactions at a large scale are lacking. We have developed a high-throughput sequencing-RNA affinity profiling (HiTS-RAP) assay by adapting a high-throughput DNA sequencer to quantify the binding of fluorescently labeled protein to millions of RNAs anchored to sequenced cDNA templates. Using HiTS-RAP, we measured the affinity of mutagenized libraries of GFP-binding and NELF-E-binding aptamers to their respective targets and identified critical regions of interaction. Mutations additively affected the affinity of the NELF-E-binding aptamer, whose interaction depended mainly on a single-stranded RNA motif, but not that of the GFP aptamer, whose interaction depended primarily on secondary structure.

  10. Evol and ProDy for bridging protein sequence evolution and structural dynamics

    PubMed Central

    Mao, Wenzhi; Liu, Ying; Chennubhotla, Chakra; Lezon, Timothy R.; Bahar, Ivet

    2014-01-01

    Correlations between sequence evolution and structural dynamics are of utmost importance in understanding the molecular mechanisms of function and their evolution. We have integrated Evol, a new package for fast and efficient comparative analysis of evolutionary patterns and conformational dynamics, into ProDy, a computational toolbox designed for inferring protein dynamics from experimental and theoretical data. Using information-theoretic approaches, Evol coanalyzes conservation and coevolution profiles extracted from multiple sequence alignments of protein families with their inferred dynamics. Availability and implementation: ProDy and Evol are open-source and freely available under MIT License from http://prody.csb.pitt.edu/. Contact: bahar@pitt.edu PMID:24849577

  11. An Ensemble Method to Distinguish Bacteriophage Virion from Non-Virion Proteins Based on Protein Sequence Characteristics.

    PubMed

    Zhang, Lina; Zhang, Chengjin; Gao, Rui; Yang, Runtao

    2015-09-09

    Bacteriophage virion proteins and non-virion proteins have distinct functions in biological processes, such as specificity determination for host bacteria, bacteriophage replication and transcription. Accurate identification of bacteriophage virion proteins from bacteriophage protein sequences is significant to understand the complex virulence mechanism in host bacteria and the influence of bacteriophages on the development of antibacterial drugs. In this study, an ensemble method for bacteriophage virion protein prediction from bacteriophage protein sequences is put forward with hybrid feature spaces incorporating CTD (composition, transition and distribution), bi-profile Bayes, PseAAC (pseudo-amino acid composition) and PSSM (position-specific scoring matrix). In 10-fold cross-validation on the training dataset, the presented method achieves a satisfactory prediction result with a sensitivity of 0.870, a specificity of 0.830, an accuracy of 0.850 and a Matthews correlation coefficient (MCC) of 0.701. To evaluate the prediction performance objectively, an independent testing dataset is used to evaluate the proposed method. Encouragingly, our proposed method performs better than previous studies with a sensitivity of 0.853, a specificity of 0.815, an accuracy of 0.831 and MCC of 0.662 on the independent testing dataset. These results suggest that the proposed method can be a potential candidate for bacteriophage virion protein prediction, which may provide a useful tool to find novel antibacterial drugs and to understand the relationship between bacteriophage and host bacteria. For the convenience of the vast majority of experimental scientists, a user-friendly and publicly-accessible web-server for the proposed ensemble method is established. Int. J. Mol. Sci. 2015, 16, 21735
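    The four performance measures quoted in this abstract all derive from one confusion matrix. The sketch below shows the standard formulas; the counts are hypothetical (a balanced set of 100 positives and 100 negatives is assumed purely for illustration, and the abstract does not give the actual counts). Under that assumption the cross-validation figures of 0.870, 0.830, 0.850 and 0.701 are mutually consistent.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy, MCC) from confusion counts."""
    sensitivity = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, accuracy, mcc


# Hypothetical counts: 100 virion (positive) and 100 non-virion (negative) proteins.
sn, sp, acc, mcc = binary_metrics(tp=87, fn=13, tn=83, fp=17)
print(f"Sn={sn:.3f} Sp={sp:.3f} Acc={acc:.3f} MCC={mcc:.3f}")
```

    With unbalanced classes the same sensitivity and specificity would give a different accuracy and MCC, which is one reason MCC is usually reported alongside accuracy.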

  12. Porcine parvovirus: DNA sequence and genome organization.

    PubMed

    Ranz, A I; Manclús, J J; Díaz-Aroca, E; Casal, J I

    1989-10-01

    We have determined the nucleotide sequence of an almost full-length clone of porcine parvovirus (PPV). The sequence is 4973 nucleotides (nt) long. The 3' end of virion DNA shows a Y-shaped configuration homologous to rodent parvoviruses. The 5' end of virion DNA shows a repetition of 127 nt at the carboxy terminus of the capsid proteins. The overall organization of the PPV genome is similar to those of other autonomous parvoviruses. There are two large open reading frames (ORFs) that almost entirely cover the genome, both located in the same frame of the complementary strand. The left ORF encodes the non-structural protein NS1 and the right ORF encodes the capsid proteins (VP1, VP2 and VP3). Promoter analysis, location of splicing sites and putative amino acid sequences for the viral proteins show a high homology of PPV with feline panleukopenia virus and canine parvoviruses (FPV and CPV) and rodent parvovirus. Therefore we conclude that PPV is related to the Kilham rat virus (KRV) group of autonomous parvoviruses formed by KRV, minute virus of mice, Lu III, H-1, FPV and CPV.

  13. Evolution of Drosophila ribosomal protein gene core promoters.

    PubMed

    Ma, Xiaotu; Zhang, Kangyu; Li, Xiaoman

    2009-03-01

    The coordinated expression of ribosomal protein genes (RPGs) has been well documented in many species. Previous analyses of RPG promoters focus only on Fungi and mammals. Recognizing this gap and using a comparative genomics approach, we utilize a motif-finding algorithm that incorporates cross-species conservation to identify several significant motifs in Drosophila RPG promoters. As a result, significant differences of the enriched motifs in RPG promoter are found among Drosophila, Fungi, and mammals, demonstrating the evolutionary dynamics of the ribosomal gene regulatory network. We also report a motif present in similar numbers of RPGs among Drosophila species which does not appear to be conserved at the individual RPG gene level. A module-wise stabilizing selection theory is proposed to explain this observation. Overall, our results provide significant insight into the fast-evolving nature of transcriptional regulation in the RPG module.

  14. Evolution of Drosophila ribosomal protein gene core promoters

    PubMed Central

    Ma, Xiaotu; Zhang, Kangyu; Li, Xiaoman

    2011-01-01

    The coordinated expression of ribosomal protein genes (RPGs) has been well documented in many species. Previous analyses of RPG promoters focus only on Fungi and mammals. Recognizing this gap and using a comparative genomics approach, we utilize a motif-finding algorithm that incorporates cross-species conservation to identify several significant motifs in Drosophila RPG promoters. As a result, significant differences of the enriched motifs in RPG promoter are found among Drosophila, Fungi, and mammals, demonstrating the evolutionary dynamics of the ribosomal gene regulatory network. We also report a motif present in similar numbers of RPGs among Drosophila species which does not appear to be conserved at the individual RPG gene level. A module-wise stabilizing selection theory is proposed to explain this observation. Overall, our results provide significant insight into the fast-evolving nature of transcriptional regulation in the RPG module. PMID:19059316

  15. Strategies for Protein Overproduction in Escherichia coli.

    ERIC Educational Resources Information Center

    Mott, John E.

    1984-01-01

    Examines heterologous expression in Escherichia coli and the role of regulatory sequences which control gene expression at transcription resulting in abundant production of messenger RNA and regulatory sequences in mRNA which promote efficient translation. Also examines the role of E. coli cells in stabilizing mRNA and protein that is…

  16. RNA-ID, a Powerful Tool for Identifying and Characterizing Regulatory Sequences.

    PubMed

    Brule, C E; Dean, K M; Grayhack, E J

    2016-01-01

    The identification and analysis of sequences that regulate gene expression is critical because regulated gene expression underlies biology. RNA-ID is an efficient and sensitive method to discover and investigate regulatory sequences in the yeast Saccharomyces cerevisiae, using fluorescence-based assays to detect green fluorescent protein (GFP) relative to a red fluorescent protein (RFP) control in individual cells. Putative regulatory sequences can be inserted either in-frame or upstream of a superfolder GFP fusion protein whose expression, like that of RFP, is driven by the bidirectional GAL1,10 promoter. In this chapter, we describe the methodology to identify and study cis-regulatory sequences in the RNA-ID system, explaining features and variations of the RNA-ID reporter, as well as some applications of this system. We describe in detail the methods to analyze a single regulatory sequence, from construction of a single GFP variant to assay of variants by flow cytometry, as well as modifications required to screen libraries of different strains simultaneously. We also describe subsequent analyses of regulatory sequences. © 2016 Elsevier Inc. All rights reserved.

  17. RNA regulators responding to ribosomal protein S15 are frequent in sequence space

    PubMed Central

    Slinger, Betty L.; Meyer, Michelle M.

    2016-01-01

    There are several natural examples of distinct RNA structures that interact with the same ligand to regulate the expression of homologous genes in different organisms. One essential question regarding this phenomenon is whether such RNA regulators are the result of convergent or divergent evolution. Are the RNAs derived from some common ancestor and diverged to the point where we cannot identify the similarity, or have multiple solutions to the same biological problem arisen independently? A key variable in assessing these alternatives is how frequently such regulators arise within sequence space. Ribosomal protein S15 is autogenously regulated via an RNA regulator in many bacterial species; four apparently distinct regulators have been functionally validated in different bacterial phyla. Here, we explore how frequently such regulators arise within a partially randomized sequence population. We find many RNAs that interact specifically with ribosomal protein S15 from Geobacillus kaustophilus with biologically relevant dissociation constants. Furthermore, of the six sequences we characterize, four show regulatory activity in an Escherichia coli reporter assay. Subsequent footprinting and mutagenesis analysis indicates that protein binding proximal to regulatory features such as the Shine–Dalgarno sequence is sufficient to enable regulation, suggesting that regulation in response to S15 is relatively easily acquired. PMID:27580716

  18. Poliovirus replication proteins: RNA sequence encoding P3-1b and the sites of proteolytic processing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Semler, B.L.; Anderson, C.W.; Kitamura, N.

    1981-06-01

    A partial amino-terminal amino acid sequence of each of the major proteins encoded by the replicase region of the poliovirus genome has been determined. A comparison of this sequence information with the amino acid sequence predicted from the RNA sequence that has been determined for the 3' region of the poliovirus genome has allowed us to locate precisely the proteolytic cleavage sites at which the initial polyprotein is processed to create the poliovirus products P3-1b (NCVP1b), P3-2 (NCVP2), P3-4b (NCVP4b), and P3-7c (NCVP7c). For each of these products, as well as for the small genome-linked protein VPg, proteolytic cleavage occurs between a glutamine and a glycine residue to create the amino terminus of each protein. This result suggests that a single proteinase may be responsible for all of these cleavages. The sequence data also allow the precise positioning of the genome-linked protein VPg within the precursor P3-1b just proximal to the amino terminus of polypeptide P3-2.

  19. Nucleotide sequence of the L1 ribosomal protein gene of Xenopus laevis: remarkable sequence homology among introns.

    PubMed Central

    Loreni, F; Ruberti, I; Bozzoni, I; Pierandrei-Amaldi, P; Amaldi, F

    1985-01-01

    Ribosomal protein L1 is encoded by two genes in Xenopus laevis. The comparison of two cDNA sequences shows that the two L1 gene copies (L1a and L1b) have diverged in many silent sites and very few substitution sites; moreover a small duplication occurred at the very end of the coding region of the L1b gene which thus codes for a product five amino acids longer than that coded by L1a. Quantitatively the divergence between the two L1 genes confirms that a whole genome duplication took place in Xenopus laevis approximately 30 million years ago. A genomic fragment containing one of the two L1 gene copies (L1a), with its nine introns and flanking regions, has been completely sequenced. The 5' end of this gene has been mapped within a 20-pyrimidine stretch as already found for other vertebrate ribosomal protein genes. Four of the nine introns have a 60-nucleotide sequence with 80% homology; within this region some boxes, one of which is 16 nucleotides long, are 100% homologous among the four introns. This feature of L1a gene introns is interesting since we have previously shown that the activity of this gene is regulated at a post-transcriptional level and it involves the block of the normal splicing of some intron sequences. PMID:3841512

  20. Isolation and characterization of adrenoleukodystrophy protein (ALDP) related sequences in the human genome

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Geraghty, M.T.; Stetten, G.; Kearns, W.

    1994-09-01

    X-linked adrenoleukodystrophy (ALD) is a disorder of peroxisomal β-oxidation of very long chain fatty acids. It presents either as progressive dementia in childhood or as progressive paraparesis in later years. Adrenal insufficiency occurs in both phenotypes. The gene of the ALD protein has been mapped to Xq28 and has recently been cloned and characterized. The ALD protein has significant homology to the peroxisomal membrane protein, PMP70 and belongs to the ATP binding cassette superfamily of transporters. We screened a human genomic library with an ALDP cDNA and isolated 5 different but highly similar clones containing sequences corresponding to the 3' end of the ALDP gene. Comparison of the sequences over the region corresponding to exon 9 through the 3' end of the ALDP gene reveals approximately 96% nucleotide identity in both exonic and intronic regions. Splice sites and open reading frames are maintained. Using both FISH and human-rodent DNA mapping panels, we positively assign these ALDP-related sequences to chromosomes 2, 16 and 22, and provisionally to 1 and 20. Southern blot of primate DNA probed with a partial ALDP cDNA (exon 2-10) shows that expansion of ALDP-related sequences occurred in higher primates (chimp, gorilla and human). Although Northern blots show multiple ALDP-hybridizing transcripts in certain tissues, we have no evidence to date for expression of these ALDP-related sequences. In conclusion, our data show there has been an unusual and recent dispersal to multiple chromosomes of structural gene sequences related to the ALDP gene. The functional significance of these sequences remains to be determined but their existence complicates PCR and mutation analysis of the ALDP gene.

  1. Understanding the molecular basis of plant growth promotional effect of Pseudomonas fluorescens on rice through protein profiling

    PubMed Central

    2009-01-01

    Background: The plant growth promoting rhizobacterium (PGPR) Pseudomonas fluorescens strain KH-1 was found to exhibit plant growth promotional activity in rice under both in-vitro and in-vivo conditions, but the mechanism underlying this activity is not yet clearly understood. In this study, efforts were made to elucidate the molecular responses of rice plants to P. fluorescens treatment through protein profiling. A two-dimensional polyacrylamide gel electrophoresis strategy was adopted to identify the PGPR-responsive proteins, and the differentially expressed proteins were analyzed by mass spectrometry. Results: Upon priming with P. fluorescens, 23 different proteins were found to be differentially expressed in rice leaf sheaths, and MS analysis revealed the differential expression of some important proteins, namely putative p23 co-chaperone, thioredoxin h (rice), ribulose-bisphosphate carboxylase large chain precursor, nucleotide diphosphate kinase, proteasome subunit protein and putative glutathione S-transferase protein. Conclusion: Functional analysis showed that the differentially expressed proteins have been reported to be directly or indirectly involved in growth promotion in plants. Thus, this study confirms the primary role of PGPR strain KH-1 in rice plant growth promotion. PMID:20034395

  2. A combined de novo protein sequencing and cDNA library approach to the venomic analysis of Chinese spider Araneus ventricosus.

    PubMed

    Duan, Zhigui; Cao, Rui; Jiang, Liping; Liang, Songping

    2013-01-14

    In past years, spider venoms have attracted increasing attention due to their extraordinary chemical and pharmacological diversity. The recently popularized proteomic method has greatly improved our ability to analyze the proteins in the venom. However, the lack of information about isolated venom protein sequences dramatically limits the ability to confidently identify venom proteins. In the present paper, the venom from Araneus ventricosus was analyzed using two complementary approaches: 2-DE/Shotgun-LC-MS/MS coupled to MASCOT search and 2-DE/Shotgun-LC-MS/MS coupled to manual de novo sequencing followed by local venom protein database (LVPD) search. The LVPD was constructed with toxin-like protein sequences obtained from the analysis of cDNA library from A. ventricosus venom glands. Our results indicate that a total of 130 toxin-like protein sequences were unambiguously identified by manual de novo sequencing coupled to LVPD search, accounting for 86.67% of all toxin-like proteins in LVPD. Thus manual de novo sequencing coupled to LVPD search proved an extremely effective approach for the analysis of venom proteins. In addition, the approach shows a clear advantage in validating mutant positions of isoforms from the same toxin-like family. Intriguingly, methyl esterification of glutamic acid was discovered for the first time in animal venom proteins by manual de novo sequencing. Crown Copyright © 2012. Published by Elsevier B.V. All rights reserved.

  3. Mediator, TATA-binding Protein, and RNA Polymerase II Contribute to Low Histone Occupancy at Active Gene Promoters in Yeast*

    PubMed Central

    Ansari, Suraiya A.; Paul, Emily; Sommer, Sebastian; Lieleg, Corinna; He, Qiye; Daly, Alexandre Z.; Rode, Kara A.; Barber, Wesley T.; Ellis, Laura C.; LaPorta, Erika; Orzechowski, Amanda M.; Taylor, Emily; Reeb, Tanner; Wong, Jason; Korber, Philipp; Morse, Randall H.

    2014-01-01

    Transcription by RNA polymerase II (Pol II) in eukaryotes requires the Mediator complex, and often involves chromatin remodeling and histone eviction at active promoters. Here we address the role of Mediator in recruitment of the Swi/Snf chromatin remodeling complex and its role, along with components of the preinitiation complex (PIC), in histone eviction at inducible and constitutively active promoters in the budding yeast Saccharomyces cerevisiae. We show that recruitment of the Swi/Snf chromatin remodeling complex to the induced CHA1 promoter, as well as its association with several constitutively active promoters, depends on the Mediator complex but is independent of Mediator at the induced MET2 and MET6 genes. Although transcriptional activation and histone eviction at CHA1 depends on Swi/Snf, Swi/Snf recruitment is not sufficient for histone eviction at the induced CHA1 promoter. Loss of Swi/Snf activity does not affect histone occupancy of several constitutively active promoters; in contrast, higher histone occupancy is seen at these promoters in Mediator and PIC component mutants. We propose that an initial activator-dependent, nucleosome remodeling step allows PIC components to outcompete histones for occupancy of promoter sequences. We also observe reduced promoter association of Mediator and TATA-binding protein in a Pol II (rpb1-1) mutant, indicating mutually cooperative binding of these components of the transcription machinery and indicating that it is the PIC as a whole whose binding results in stable histone eviction. PMID:24727477

  4. Thermal adaptation analyzed by comparison of protein sequences from mesophilic and extremely thermophilic Methanococcus species

    NASA Technical Reports Server (NTRS)

    Haney, P. J.; Badger, J. H.; Buldak, G. L.; Reich, C. I.; Woese, C. R.; Olsen, G. J.

    1999-01-01

    The genome sequence of the extremely thermophilic archaeon Methanococcus jannaschii provides a wealth of data on proteins from a thermophile. In this paper, sequences of 115 proteins from M. jannaschii are compared with their homologs from mesophilic Methanococcus species. Although the growth temperatures of the mesophiles are about 50 degrees C below that of M. jannaschii, their genomic G+C contents are nearly identical. The properties most correlated with the proteins of the thermophile include higher residue volume, higher residue hydrophobicity, more charged amino acids (especially Glu, Arg, and Lys), and fewer uncharged polar residues (Ser, Thr, Asn, and Gln). These are recurring themes, with all trends applying to 83-92% of the proteins for which complete sequences were available. Nearly all of the amino acid replacements most significantly correlated with the temperature change are the same relatively conservative changes observed in all proteins, but in the case of the mesophile/thermophile comparison there is a directional bias. We identify 26 specific pairs of amino acids with a statistically significant (P < 0.01) preferred direction of replacement.
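    The compositional trends listed in this abstract (more Glu, Arg and Lys; fewer Ser, Thr, Asn and Gln in the thermophile) can be checked for any homolog pair with a simple residue count. The sketch below is illustrative only: the function names and both toy sequences are made up, and no statistical test from the paper is reproduced.

```python
CHARGED = set("ERK")           # Glu, Arg, Lys
UNCHARGED_POLAR = set("STNQ")  # Ser, Thr, Asn, Gln


def fraction(seq, residues):
    """Fraction of positions in seq occupied by any residue in `residues`."""
    seq = seq.upper()
    return sum(aa in residues for aa in seq) / len(seq)


def composition_shift(mesophile_seq, thermophile_seq):
    """Thermophile minus mesophile fraction for the two residue groups."""
    return {
        "charged": fraction(thermophile_seq, CHARGED)
        - fraction(mesophile_seq, CHARGED),
        "uncharged_polar": fraction(thermophile_seq, UNCHARGED_POLAR)
        - fraction(mesophile_seq, UNCHARGED_POLAR),
    }


# Hypothetical 10-residue fragments, not real Methanococcus sequences.
meso = "MSTNQALSGT"
thermo = "MEKRQALEGK"
shift = composition_shift(meso, thermo)
print(shift)
```

    A positive "charged" value together with a negative "uncharged_polar" value is the direction of change the abstract describes; real comparisons would use aligned homologs and many protein pairs.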

  5. Hidden Markov models incorporating fuzzy measures and integrals for protein sequence identification and alignment.

    PubMed

    Bidargaddi, Niranjan P; Chetty, Madhu; Kamruzzaman, Joarder

    2008-06-01

    Profile hidden Markov models (HMMs) based on classical HMMs have been widely applied for protein sequence identification. The formulation of the forward and backward variables in profile HMMs is made under the statistical independence assumption of probability theory. We propose a fuzzy profile HMM to overcome the limitations of that assumption and to achieve an improved alignment for protein sequences belonging to a given family. The proposed model fuzzifies the forward and backward variables by incorporating Sugeno fuzzy measures and Choquet integrals, thus further extending the generalized HMM. Based on the fuzzified forward and backward variables, we propose a fuzzy Baum-Welch parameter estimation algorithm for profiles. The strong correlations and the sequence preference involved in protein structures make this fuzzy-architecture-based model a suitable candidate for building profiles of a given family, since fuzzy sets can handle uncertainties better than classical methods.
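    For orientation, the classical forward variable that this paper sets out to generalize can be written down compactly. The sketch below is the ordinary probabilistic forward recursion for a plain two-state HMM, not the fuzzy (Sugeno measure / Choquet integral) version and not a full profile HMM with match, insert and delete states; the state names, the two-letter alphabet and all probabilities are assumptions chosen for illustration.

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """Classical forward algorithm: probability of `obs` under the HMM."""
    # alpha[s] = P(observations so far, current state = s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        # The plain sum over predecessor states is where the statistical
        # independence assumption of the classical model enters.
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical toy model: two hidden states emitting 'h' or 'p' symbols.
states = ("H", "P")
start_p = {"H": 0.5, "P": 0.5}
trans_p = {"H": {"H": 0.7, "P": 0.3}, "P": {"H": 0.4, "P": 0.6}}
emit_p = {"H": {"h": 0.8, "p": 0.2}, "P": {"h": 0.3, "p": 0.7}}

likelihood = forward("hhp", states, start_p, trans_p, emit_p)
```

    In practice the recursion is run in log space or with per-step scaling to avoid underflow on long protein sequences.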

  6. From protein sequence to dynamics and disorder with DynaMine.

    PubMed

    Cilia, Elisa; Pancsa, Rita; Tompa, Peter; Lenaerts, Tom; Vranken, Wim F

    2013-01-01

    Protein function and dynamics are closely related; however, accurate dynamics information is difficult to obtain. Here based on a carefully assembled data set derived from experimental data for proteins in solution, we quantify backbone dynamics properties on the amino-acid level and develop DynaMine--a fast, high-quality predictor of protein backbone dynamics. DynaMine uses only protein sequence information as input and shows great potential in distinguishing regions of different structural organization, such as folded domains, disordered linkers, molten globules and pre-structured binding motifs of different sizes. It also identifies disordered regions within proteins with an accuracy comparable to the most sophisticated existing predictors, without depending on prior disorder knowledge or three-dimensional structural information. DynaMine provides molecular biologists with an important new method that grasps the dynamical characteristics of any protein of interest, as we show here for human p53 and E1A from human adenovirus 5.

  7. Structure-Templated Predictions of Novel Protein Interactions from Sequence Information

    PubMed Central

    Betel, Doron; Breitkreuz, Kevin E; Isserlin, Ruth; Dewar-Darch, Danielle; Tyers, Mike; Hogue, Christopher W. V

    2007-01-01

    The multitude of functions performed in the cell are largely controlled by a set of carefully orchestrated protein interactions often facilitated by specific binding of conserved domains in the interacting proteins. Interacting domains commonly exhibit distinct binding specificity to short and conserved recognition peptides called binding profiles. Although many conserved domains are known in nature, only a few have well-characterized binding profiles. Here, we describe a novel predictive method known as domain–motif interactions from structural topology (D-MIST) for elucidating the binding profiles of interacting domains. A set of domains and their corresponding binding profiles were derived from extant protein structures and protein interaction data and then used to predict novel protein interactions in yeast. A number of the predicted interactions were verified experimentally, including new interactions of the mitotic exit network, RNA polymerases, nucleotide metabolism enzymes, and the chaperone complex. These results demonstrate that new protein interactions can be predicted exclusively from sequence information. PMID:17892321

  8. Chaos game representation walk model for the protein sequences

    NASA Astrophysics Data System (ADS)

    Gao, Jie; Jiang, Li-Li; Xu, Zhen-Yuan

    2009-10-01

    A new chaos game representation of protein sequences based on the detailed hydrophobic-hydrophilic (HP) model has been proposed by Yu et al (Physica A 337 (2004) 171). A CGR-walk model is proposed based on the new CGR coordinates for the protein sequences from complete genomes in the present paper. The new CGR coordinates based on the detailed HP model are converted into a time series, and a long-memory ARFIMA(p, d, q) model is introduced into the protein sequence analysis. This model is applied to simulating real CGR-walk sequence data of twelve protein sequences. Remarkably long-range correlations are uncovered in the data and the results obtained from these models are reasonably consistent with those available from the ARFIMA(p, d, q) model.
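    A chaos game representation maps a sequence to points in the unit square by repeatedly moving halfway toward the corner assigned to the current symbol. The sketch below illustrates that construction for a four-class grouping of amino acids. The grouping shown (non-polar, negative polar, uncharged polar, positive polar) is stated here as an assumption about the detailed HP model rather than taken from this abstract, and the corner assignment, function name and test peptide are likewise hypothetical. How the paper converts the coordinates into a time series is not reproduced.

```python
# Assumed four-class grouping of the 20 amino acids (one-letter codes).
CLASSES = {
    "nonpolar": "AILMFPWV",
    "negative_polar": "DE",
    "uncharged_polar": "NCQGSTY",
    "positive_polar": "RHK",
}

# Arbitrary, illustrative assignment of classes to corners of the unit square.
CORNERS = {
    "nonpolar": (0.0, 0.0),
    "negative_polar": (0.0, 1.0),
    "uncharged_polar": (1.0, 1.0),
    "positive_polar": (1.0, 0.0),
}

AA_TO_CORNER = {
    aa: CORNERS[name] for name, residues in CLASSES.items() for aa in residues
}


def cgr_points(seq):
    """Return the CGR trajectory of a protein sequence, starting at the centre."""
    x, y = 0.5, 0.5
    points = []
    for aa in seq.upper():
        cx, cy = AA_TO_CORNER[aa]
        # Move halfway from the current point toward the residue's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


points = cgr_points("AKDG")  # hypothetical four-residue peptide
```

    Either coordinate of the resulting trajectory, read in sequence order, gives a numeric series to which time-series models could then be fitted.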

  9. Sequence preservation of osteocalcin protein and mitochondrial DNA in bison bones older than 55 ka

    NASA Astrophysics Data System (ADS)

    Nielsen-Marsh, Christina M.; Ostrom, Peggy H.; Gandhi, Hasand; Shapiro, Beth; Cooper, Alan; Hauschka, Peter V.; Collins, Matthew J.

    2002-12-01

    We report the first complete sequences of the protein osteocalcin from small amounts (20 mg) of two bison bones (Bison priscus) dated to older than 55.6 ka and older than 58.9 ka. Osteocalcin was purified using new gravity columns (never exposed to protein) followed by microbore reversed-phase high-performance liquid chromatography. Sequencing of osteocalcin employed two methods of matrix-assisted laser desorption ionization mass spectrometry (MALDI-MS): peptide mass mapping (PMM) and post-source decay (PSD). The PMM shows that ancient and modern bison osteocalcin have the same mass-to-charge (m/z) distribution, indicating an identical protein sequence and absence of diagenetic products. This was confirmed by PSD of the m/z 2066 tryptic peptide (residues 1-19); the mass spectra from ancient and modern peptides were identical. The 129 mass unit difference in the molecular ion between cow (Bos taurus) and bison is caused by a single amino-acid substitution between the taxa (Trp in cow is replaced by Gly in bison at residue 5). Bison mitochondrial control region DNA sequences were obtained from the older than 55.6 ka fossil. These results suggest that DNA and protein sequences can be used to directly investigate molecular phylogenies over a considerable time period, the absolute limit of which is yet to be determined.
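
    The 129 mass unit difference quoted for the Trp to Gly substitution can be checked from standard residue masses. The short sketch below is not from the paper; the two monoisotopic residue masses are standard reference values, and the function name substitution_shift is hypothetical.

```python
# Monoisotopic masses (Da) of amino-acid residues as they occur inside a peptide chain.
RESIDUE_MASS = {
    "W": 186.07931,  # tryptophan residue, C11H10N2O
    "G": 57.02146,   # glycine residue, C2H3NO
}


def substitution_shift(old, new):
    """Mass change of a peptide when residue `old` is replaced by residue `new`."""
    return RESIDUE_MASS[new] - RESIDUE_MASS[old]


shift = substitution_shift("W", "G")
print(f"Trp -> Gly shift: {shift:.2f} Da")
```

    The magnitude of the shift rounds to 129 Da, in line with the difference reported between the cow and bison molecular ions.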

  10. Negative effect of the 5'-untranslated leader sequence on Ac transposon promoter expression.

    PubMed

    Scortecci, K C; Raina, R; Fedoroff, N V; Van Sluys, M A

    1999-08-01

    Transposable elements are used in heterologous plant hosts to clone genes by insertional mutagenesis. The Activator (Ac) transposable element has been cloned from maize and introduced into a variety of plants. However, differences in regulation and transposition frequency have been observed between different host plants. The cause of this variability is still unknown. To better understand the activity of the Ac element, we analyzed the Ac promoter region and its 5'-untranslated leader sequence (5' UTL). Transient assays in tobacco NT1 suspension cells showed that the Ac promoter is a weak promoter, and its activity was localized by deletion analyses. The data presented here indicate that the core of the Ac promoter is contained within a 153-bp fragment upstream of the transcription start sites. A strong inhibitory effect (80%) of the 5' UTL was found on expression of the LUC reporter gene. Here we demonstrate that the presence of the 5' UTL in the constructs reduces the expression driven by either strong or weak promoters.

  11. Genome Sequence of Herbaspirillum sp. Strain GW103, a Plant Growth-Promoting Bacterium

    PubMed Central

    Lee, Gun Woong; Lee, Kui-Jae

    2012-01-01

    Herbaspirillum sp. strain GW103 was isolated from rhizosphere soil of the reed Phragmites australis on reclaimed land. Here we report the 5.05-Mb draft genome sequence of the strain, providing bioinformation about the agronomic benefits of this strain, such as multiple traits relevant to plant root colonization and plant growth promotion. PMID:22815460

  12. Sequence, Structure, and Context Preferences of Human RNA Binding Proteins.

    PubMed

    Dominguez, Daniel; Freese, Peter; Alexis, Maria S; Su, Amanda; Hochman, Myles; Palden, Tsultrim; Bazile, Cassandra; Lambert, Nicole J; Van Nostrand, Eric L; Pratt, Gabriel A; Yeo, Gene W; Graveley, Brenton R; Burge, Christopher B

    2018-06-07

    RNA binding proteins (RBPs) orchestrate the production, processing, and function of mRNAs. Here, we present the affinity landscapes of 78 human RBPs using an unbiased assay that determines the sequence, structure, and context preferences of these proteins in vitro by deep sequencing of bound RNAs. These data enable construction of "RNA maps" of RBP activity without requiring crosslinking-based assays. We found an unexpectedly low diversity of RNA motifs, implying frequent convergence of binding specificity toward a relatively small set of RNA motifs, many with low compositional complexity. Offsetting this trend, however, we observed extensive preferences for contextual features distinct from short linear RNA motifs, including spaced "bipartite" motifs, biased flanking nucleotide composition, and bias away from or toward RNA structure. Our results emphasize the importance of contextual features in RNA recognition, which likely enable targeting of distinct subsets of transcripts by different RBPs that recognize the same linear motif. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.

  13. PFP: Automated prediction of gene ontology functional annotations with confidence scores using protein sequence data.

    PubMed

    Hawkins, Troy; Chitale, Meghana; Luban, Stanislav; Kihara, Daisuke

    2009-02-15

    Protein function prediction is a central problem in bioinformatics, increasing in importance recently due to the rapid accumulation of biological data awaiting interpretation. Sequence data represents the bulk of this new stock and is the obvious target for consideration as input, as newly sequenced organisms often lack any other type of biological characterization. We have previously introduced PFP (Protein Function Prediction) as our sequence-based predictor of Gene Ontology (GO) functional terms. PFP interprets the results of a PSI-BLAST search by extracting and scoring individual functional attributes, searching a wide range of E-value sequence matches, and utilizing conventional data mining techniques to fill in missing information. We have shown it to be effective in predicting both specific and low-resolution functional attributes when sufficient data is unavailable. Here we describe (1) significant improvements to the PFP infrastructure, including the addition of prediction significance and confidence scores, (2) a thorough benchmark of performance and comparisons to other related prediction methods, and (3) applications of PFP predictions to genome-scale data. We applied PFP predictions to uncharacterized protein sequences from 15 organisms. Among these sequences, 60-90% could be annotated with a GO molecular function term at high confidence (≥80%). We also applied our predictions to the protein-protein interaction network of the Malaria plasmodium (Plasmodium falciparum). High confidence GO biological process predictions (≥90%) from PFP increased the number of fully enriched interactions in this dataset from 23% of interactions to 94%. Our benchmark comparison shows significant performance improvement of PFP relative to GOtcha, InterProScan, and PSI-BLAST predictions. This is consistent with the performance of PFP as the overall best predictor in both the AFP-SIG '05 and CASP7 function (FN) assessments. PFP is available as a web service at http

  14. DNA sequence requirements for the accurate transcription of a protein-coding plastid gene in a plastid in vitro system from mustard (Sinapis alba L.)

    PubMed Central

    Link, Gerhard

    1984-01-01

    A nuclease-treated plastid extract from mustard (Sinapis alba L.) allows efficient transcription of cloned plastid DNA templates. In this in vitro system, the major runoff transcript of the truncated gene for the 32 000 mol. wt. photosystem II protein was accurately initiated from a site close to or identical with the in vivo start site. By using plasmids with deletions in the 5'-flanking region of this gene as templates, a DNA region required for efficient and selective initiation was detected ~28-35 nucleotides upstream of the transcription start site. This region contains the sequence element TTGACA, which matches the consensus sequence for prokaryotic '−35' promoter elements. In the absence of this region, a region ~13-27 nucleotides upstream of the start site still enables a basic level of specific transcription. This second region contains the sequence element TATATAA, which matches the consensus sequence for the 'TATA' box of genes transcribed by RNA polymerase II (or B). The region between the 'TATA'-like element and the transcription start site is not sufficient but may be required for specific transcription of the plastid gene. This latter region contains the sequence element TATACT, which resembles the prokaryotic '−10' (Pribnow) box. Based on the structural and transcriptional features of the 5' upstream region, a 'promoter switch' mechanism is proposed, which may account for the developmentally regulated expression of this plastid gene. PMID:16453540
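
    The three sequence elements named in this abstract (TTGACA, TATATAA, TATACT) can be located in an upstream region with a simple exact-match scan. The sketch below is illustrative only: the 40-nt upstream sequence is made up so that the elements fall roughly where the abstract places them, it is not the mustard plastid sequence, and the names MOTIFS and find_motifs are hypothetical.

```python
import re

# Elements named in the abstract; the labels are ours.
MOTIFS = {
    "-35-like": "TTGACA",
    "TATA-like": "TATATAA",
    "-10-like": "TATACT",
}


def find_motifs(upstream, motifs=MOTIFS):
    """Report exact motif matches in coordinates relative to the start site.

    `upstream` is read 5'->3' and its last base is taken to be position -1.
    Returns (name, first_position, last_position) tuples sorted by position.
    """
    seq = upstream.upper()
    n = len(seq)
    hits = []
    for name, motif in motifs.items():
        # A lookahead pattern lets overlapping occurrences be reported too.
        for match in re.finditer(f"(?={motif})", seq):
            first = match.start() - n
            last = first + len(motif) - 1
            hits.append((name, first, last))
    return sorted(hits, key=lambda hit: hit[1])


# Made-up 40-nt upstream region (NOT the real plastid promoter).
upstream = (
    "GCGCG" + "TTGACA" + "GCCGC" + "TATATAA" + "GGCCG" + "TATACT" + "GCGGCC"
)
hits = find_motifs(upstream)
for name, first, last in hits:
    print(f"{name}: {first}..{last}")
```

    Exact matching is the simplest choice here; real promoter elements usually tolerate mismatches, so a position weight matrix would be the more realistic tool.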

  15. Severe Acute Respiratory Syndrome Coronavirus Envelope Protein Ion Channel Activity Promotes Virus Fitness and Pathogenesis

    PubMed Central

    Nieto-Torres, Jose L.; DeDiego, Marta L.; Verdiá-Báguena, Carmina; Jimenez-Guardeño, Jose M.; Regla-Nava, Jose A.; Fernandez-Delgado, Raul; Castaño-Rodriguez, Carlos; Alcaraz, Antonio; Torres, Jaume; Aguilella, Vicente M.; Enjuanes, Luis

    2014-01-01

    Deletion of Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) envelope (E) gene attenuates the virus. E gene encodes a small multifunctional protein that possesses ion channel (IC) activity, an important function in virus-host interaction. To test the contribution of E protein IC activity in virus pathogenesis, two recombinant mouse-adapted SARS-CoVs, each containing one single amino acid mutation that suppressed ion conductivity, were engineered. After serial infections, mutant viruses, in general, incorporated compensatory mutations within E gene that rendered active ion channels. Furthermore, IC activity conferred better fitness in competition assays, suggesting that ion conductivity represents an advantage for the virus. Interestingly, mice infected with viruses displaying E protein IC activity, either with the wild-type E protein sequence or with the revertants that restored ion transport, rapidly lost weight and died. In contrast, mice infected with mutants lacking IC activity, which did not incorporate mutations within E gene during the experiment, recovered from disease and most survived. Knocking down E protein IC activity did not significantly affect virus growth in infected mice but decreased edema accumulation, the major determinant of acute respiratory distress syndrome (ARDS) leading to death. Reduced edema correlated with lung epithelia integrity and proper localization of Na+/K+ ATPase, which participates in edema resolution. Levels of inflammasome-activated IL-1β were reduced in the lung airways of the animals infected with viruses lacking E protein IC activity, indicating that E protein IC function is required for inflammasome activation. Reduction of IL-1β was accompanied by diminished amounts of TNF and IL-6 in the absence of E protein ion conductivity. All these key cytokines promote the progression of lung damage and ARDS pathology. In conclusion, E protein IC activity represents a new determinant for SARS-CoV virulence. PMID:24788150

  16. An E8 promoter-HSP terminator cassette promotes the high-level accumulation of recombinant protein predominantly in transgenic tomato fruits: a case study of miraculin.

    PubMed

    Kurokawa, Natsuko; Hirai, Tadayoshi; Takayama, Mariko; Hiwasa-Tanase, Kyoko; Ezura, Hiroshi

    2013-04-01

    The E8 promoter-HSP terminator expression cassette is a powerful tool for increasing the accumulation of recombinant protein in a ripening tomato fruit. Strong, tissue-specific transgene expression is a desirable feature in transgenic plants to allow the production of variable recombinant proteins. The expression vector is a key tool to control the expression level and site of transgene and recombinant protein expression in transgenic plants. The combination of the E8 promoter, a fruit-ripening specific promoter, and a heat shock protein (HSP) terminator, derived from heat shock protein 18.2 of Arabidopsis thaliana, produces strong and fruit-specific accumulation of recombinant miraculin in transgenic tomato. Miraculin gene expression was driven by an E8 promoter and HSP terminator cassette (E8-MIR-HSP) in transgenic tomato plants, and the miraculin concentration was highest in the ripening fruits, at 30-630 μg miraculin per gram fresh weight. The highest miraculin concentration among the transgenic tomato plant lines containing the E8-MIR-HSP cassette was approximately four times higher than that observed in a previous study using a constitutive 35S promoter and NOS terminator cassette (Hiwasa-Tanase et al. in Plant Cell Rep 30:113-124, 2011). These results demonstrate that the combination of the E8 promoter and HSP terminator cassette is a useful tool to markedly increase the accumulation of recombinant proteins in a ripening fruit-specific manner.

  17. Cooperative heteroassembly of the adenoviral L4-22K and IVa2 proteins onto the viral packaging sequence DNA.

    PubMed

    Yang, Teng-Chieh; Maluf, Nasib Karl

    2012-02-21

    Human adenovirus (Ad) is an icosahedral, double-stranded DNA virus. Viral DNA packaging refers to the process whereby the viral genome becomes encapsulated by the viral particle. In Ad, activation of the DNA packaging reaction requires at least three viral components: the IVa2 and L4-22K proteins and a section of DNA within the viral genome, called the packaging sequence. Previous studies have shown that the IVa2 and L4-22K proteins specifically bind to conserved elements within the packaging sequence and that these interactions are absolutely required for the observation of DNA packaging. However, the equilibrium mechanism for assembly of IVa2 and L4-22K onto the packaging sequence has not been determined. Here we characterize the assembly of the IVa2 and L4-22K proteins onto truncated packaging sequence DNA by analytical sedimentation velocity and equilibrium methods. At limiting concentrations of L4-22K, we observe a species with two IVa2 monomers and one L4-22K monomer bound to the DNA. In this species, the L4-22K monomer is promoting positive cooperative interactions between the two bound IVa2 monomers. As L4-22K levels are increased, we observe a species with one IVa2 monomer and three L4-22K monomers bound to the DNA. To explain this result, we propose a model in which L4-22K self-assembly on the DNA competes with IVa2 for positive heterocooperative interactions, destabilizing binding of the second IVa2 monomer. Thus, we propose that L4-22K levels control the extent of cooperativity observed between adjacently bound IVa2 monomers. We have also determined the hydrodynamic properties of all observed stoichiometric species; we observe that species with three L4-22K monomers bound have more extended conformations than species with a single L4-22K bound. We suggest this might reflect a molecular switch that controls insertion of the viral DNA into the capsid.

  18. All-atom 3D structure prediction of transmembrane β-barrel proteins from sequences.

    PubMed

    Hayat, Sikander; Sander, Chris; Marks, Debora S; Elofsson, Arne

    2015-04-28

    Transmembrane β-barrels (TMBs) carry out major functions in substrate transport and protein biogenesis but experimental determination of their 3D structure is challenging. Encouraged by successful de novo 3D structure prediction of globular and α-helical membrane proteins from sequence alignments alone, we developed an approach to predict the 3D structure of TMBs. The approach combines the maximum-entropy evolutionary coupling method for predicting residue contacts (EVfold) with a machine-learning approach (boctopus2) for predicting β-strands in the barrel. In a blinded test for 19 TMB proteins of known structure that have a sufficient number of diverse homologous sequences available, this combined method (EVfold_bb) predicts hydrogen-bonded residue pairs between adjacent β-strands at an accuracy of ∼70%. This accuracy is sufficient for the generation of all-atom 3D models. In the transmembrane barrel region, the average 3D structure accuracy [template-modeling (TM) score] of top-ranked models is 0.54 (ranging from 0.36 to 0.85), with a higher (44%) number of residue pairs in correct strand-strand registration than in earlier methods (18%). Although the nonbarrel regions are predicted less accurately overall, the evolutionary couplings identify some highly constrained loop residues and, for FecA protein, the barrel including the structure of a plug domain can be accurately modeled (TM score = 0.68). Lower prediction accuracy tends to be associated with insufficient sequence information and we therefore expect increasing numbers of β-barrel families to become accessible to accurate 3D structure prediction as the number of available sequences increases.

  19. Variability of the protein sequences of lcrV between epidemic and atypical rhamnose-positive strains of Yersinia pestis.

    PubMed

    Anisimov, Andrey P; Panfertsev, Evgeniy A; Svetoch, Tat'yana E; Dentovskaya, Svetlana V

    2007-01-01

    Sequencing of lcrV genes and comparison of the deduced amino acid sequences from ten Y. pestis strains belonging mostly to the group of atypical rhamnose-positive isolates (non-pestis subspecies or pestoides group) showed that the LcrV proteins analyzed could be classified into five sequence types. This classification was based on major amino acid polymorphisms among LcrV proteins in the four "hot points" of the protein sequences. Some additional minor polymorphisms were found throughout these sequence types. The "hot points" corresponded to amino acids 18 (Lys --> Asn), 72 (Lys --> Arg), 273 (Cys --> Ser), and 324-326 (Ser-Gly-Lys --> Arg) in the LcrV sequence of the reference Y. pestis strain CO92. One possible explanation for polymorphism in amino acid sequences of LcrV among different strains is that strain-specific variation resulted from adaptation of the plague pathogen to different rodent and lagomorph hosts.

  20. Evol and ProDy for bridging protein sequence evolution and structural dynamics.

    PubMed

    Bakan, Ahmet; Dutta, Anindita; Mao, Wenzhi; Liu, Ying; Chennubhotla, Chakra; Lezon, Timothy R; Bahar, Ivet

    2014-09-15

    Correlations between sequence evolution and structural dynamics are of utmost importance in understanding the molecular mechanisms of function and their evolution. We have integrated Evol, a new package for fast and efficient comparative analysis of evolutionary patterns and conformational dynamics, into ProDy, a computational toolbox designed for inferring protein dynamics from experimental and theoretical data. Using information-theoretic approaches, Evol coanalyzes conservation and coevolution profiles extracted from multiple sequence alignments of protein families with their inferred dynamics. ProDy and Evol are open-source and freely available under MIT License from http://prody.csb.pitt.edu/. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  1. Apple Macintosh programs for nucleic and protein sequence analyses.

    PubMed Central

    Bellon, B

    1988-01-01

    This paper describes a package of programs for handling and analyzing nucleic acid and protein sequences using the Apple Macintosh microcomputer. There are three important features of these programs: first, because of the now classical Macintosh interface the programs can be easily used by persons with little or no computer experience. Second, it is possible to save all the data, written in an editable scrolling text window or drawn in a graphic window, as files that can be directly used either as word processing documents or as picture documents. Third, sequences can be easily exchanged with any other computer. The package is composed of thirteen programs, written in Pascal programming language. PMID:2832832

  2. Assignment of protein sequences to existing domain and family classification systems: Pfam and the PDB.

    PubMed

    Xu, Qifang; Dunbrack, Roland L

    2012-11-01

    Automating the assignment of existing domain and protein family classifications to new sets of sequences is an important task. Current methods often miss assignments because remote relationships fail to achieve statistical significance. Some assignments are not as long as the actual domain definitions because local alignment methods often cut alignments short. Long insertions in query sequences often erroneously result in two copies of the domain assigned to the query. Divergent repeat sequences in proteins are often missed. We have developed a multilevel procedure to produce nearly complete assignments of protein families of an existing classification system to a large set of sequences. We apply this to the task of assigning Pfam domains to sequences and structures in the Protein Data Bank (PDB). We found that HHsearch alignments frequently scored more remotely related Pfams in Pfam clans higher than closely related Pfams, thus, leading to erroneous assignment at the Pfam family level. A greedy algorithm allowing for partial overlaps was, thus, applied first to sequence/HMM alignments, then HMM-HMM alignments and then structure alignments, taking care to join partial alignments split by large insertions into single-domain assignments. Additional assignment of repeat Pfams with weaker E-values was allowed after stronger assignments of the repeat HMM. Our database of assignments, presented in a database called PDBfam, contains Pfams for 99.4% of chains >50 residues. The Pfam assignment data in PDBfam are available at http://dunbrack2.fccc.edu/ProtCid/PDBfam, which can be searched by PDB codes and Pfam identifiers. They will be updated regularly.

  3. The inhibition of IGF-1 signaling promotes proteostasis by enhancing protein aggregation and deposition.

    PubMed

    Moll, Lorna; Ben-Gedalya, Tziona; Reuveni, Hadas; Cohen, Ehud

    2016-04-01

    The discovery that the alteration of aging by reducing the activity of the insulin/IGF-1 signaling (IIS) cascade protects nematodes and mice from neurodegeneration-linked, toxic protein aggregation (proteotoxicity) raises the prospect that IIS inhibitors bear therapeutic potential to counter neurodegenerative diseases. Recently, we reported that NT219, a highly efficient IGF-1 signaling inhibitor, protects model worms from the aggregation of amyloid β peptide and polyglutamine peptides that are linked to the manifestation of Alzheimer's and Huntington's diseases, respectively. Here, we employed cultured cell systems to investigate whether NT219 promotes protein homeostasis (proteostasis) in mammalian cells and to explore its underlying mechanisms. We found that NT219 enhances the aggregation of misfolded prion protein and promotes its deposition in quality control compartments known as "aggresomes." NT219 also elevates the levels of certain molecular chaperones but, surprisingly, reduces proteasome activity and impairs autophagy. Our findings show that IGF-1 signaling inhibitors in general and NT219 in particular can promote proteostasis in mammalian cells by hyperaggregating hazardous proteins, thereby bearing the potential to postpone the onset and slow the progression of neurodegenerative illnesses in the elderly.-Moll, L., Ben-Gedalya, T., Reuveni, H., Cohen, E. The inhibition of IGF-1 signaling promotes proteostasis by enhancing protein aggregation and deposition. © FASEB.

  4. Visualization of protein sequence features using JavaScript and SVG with pViz.js.

    PubMed

    Mukhyala, Kiran; Masselot, Alexandre

    2014-12-01

    pViz.js is a visualization library for displaying protein sequence features in a Web browser. By simply providing a sequence and the locations of its features, this lightweight, yet versatile, JavaScript library renders an interactive view of the protein features. Interactive exploration of protein sequence features over the Web is a common need in Bioinformatics. Although many Web sites have developed viewers to display these features, their implementations are usually focused on data from a specific source or use case. Some of these viewers can be adapted to fit other use cases but are not designed to be reusable. pViz makes it easy to display features as boxes aligned to a protein sequence with zooming functionality but also includes predefined renderings for secondary structure and post-translational modifications. The library is designed to further customize this view. We demonstrate such applications of pViz using two examples: a proteomic data visualization tool with an embedded viewer for displaying features on protein structure, and a tool to visualize the results of the variant_effect_predictor tool from Ensembl. pViz.js is a JavaScript library, available on github at https://github.com/Genentech/pviz. This site includes examples and functional applications, installation instructions and usage documentation. A Readme file, which explains how to use pViz with examples, is available as Supplementary Material A. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  5. Sequence fingerprints distinguish erroneous from correct predictions of intrinsically disordered protein regions.

    PubMed

    Saravanan, Konda Mani; Dunker, A Keith; Krishnaswamy, Sankaran

    2017-12-27

    More than 60 prediction methods for intrinsically disordered proteins (IDPs) have been developed over the years, many of which are accessible on the World Wide Web. Nearly all of these predictors give balanced accuracies in the ~65%-~80% range. Since predictors are not perfect, further studies are required to uncover the role of amino acid residues in native IDP as compared to predicted IDP regions. In the present work, we make use of sequences of 100% predicted IDP regions, false positive disorder predictions, and experimentally determined IDP regions to distinguish the characteristics of native versus predicted IDP regions. A higher occurrence of asparagine is observed in sequences of native IDP regions but not in sequences of false positive predictions of IDP regions. The occurrences of certain combinations of amino acids at the pentapeptide level provide a distinguishing feature in the IDPs with respect to globular proteins. The distinguishing features presented in this paper provide insights into the sequence fingerprints of amino acid residues in experimentally determined as compared to predicted IDP regions. These observations and additional work along these lines should enable improvements in the accuracy of disorder prediction algorithms.
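
    Balanced accuracy, the metric quoted in this abstract, is the mean of sensitivity and specificity, which keeps the score meaningful when ordered residues far outnumber disordered ones. A minimal sketch follows; the confusion-matrix counts are invented for illustration and do not come from the paper.

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity (recall on disordered) and specificity (recall on ordered)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2.0


# Hypothetical per-residue counts: 100 disordered and 200 ordered residues.
example = balanced_accuracy(tp=70, fn=30, tn=160, fp=40)
print(f"balanced accuracy = {example:.2f}")
```

    With these made-up counts the sensitivity is 0.70 and the specificity is 0.80, so the balanced accuracy falls inside the range cited for existing predictors.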

  6. Autoregulation of transcription of the hupA gene in Escherichia coli: evidence for steric hindrance of the functional promoter domains induced by HU.

    PubMed

    Kohno, K; Yasuzawa, K; Hirose, M; Kano, Y; Goshima, N; Tanaka, H; Imamoto, F

    1994-06-01

    The molecular mechanism of autoregulation of expression of the hupA gene in Escherichia coli was examined. The promoter of the gene contains a palindromic sequence with the potential to form a cruciform DNA structure in which the -35 sequence lies at the base of the stem and the -10 sequence forms a single-stranded loop. An artificial promoter lacking the palindrome, which was constructed by replacing a 10 nucleotide repeat for the predicted cruciform arm by a sequence in the opposite orientation, was not subject to HU-repression. DNA relaxation induced by deleting HU proteins and/or inhibiting DNA gyrase in cells results in increased expression from the hupA promoter. We propose that initiation of transcription of the hupA gene is negatively regulated by steric hindrance of the functional promoter domains for formation of the cruciform configuration, which is facilitated at least in part by negative supercoiling of the hupA promoter DNA region. The promoter region of the hupB gene also contains a palindromic sequence that can assume a cruciform configuration. Negative regulation of this gene by HU proteins may occur by a mechanism similar to that operating for the hupA gene.

  7. Tobacco arabinogalactan protein NtEPc can promote banana (Musa AAA) somatic embryogenesis.

    PubMed

    Shu, H; Xu, L; Li, Z; Li, J; Jin, Z; Chang, S

    2014-12-01

    Banana is an important tropical fruit worldwide. Parthenocarpy and female sterility make it impossible to improve banana varieties through conventional hybridization, so genetic transformation is imperative for banana improvement. However, the low induction rate of banana embryogenic callus means that transformation cannot be performed in many laboratories. Finding ways to promote banana somatic embryogenesis is therefore critical for banana genetic transformation. After the tobacco arabinogalactan protein gene NtEPc was transformed into Escherichia coli (DE3), the recombinant protein was purified and filter-sterilized. A series of concentrations of the sterilized protein was added to tissue culture medium. The number of banana immature male flowers developing embryogenic calli increased significantly in the presence of NtEPc protein compared with the control medium. Among the treatments, explants cultured on medium containing 10 mg/l of NtEPc protein had the highest chance of developing embryogenic calli; about 12.5% of lines developed embryogenic calli on this medium. These results demonstrate that NtEPc protein can be used to promote banana embryogenesis. This is the first report that a foreign arabinogalactan protein (AGP) can be used to improve banana somatic embryogenesis.

  8. G protein-coupled odorant receptors: From sequence to structure

    PubMed Central

    de March, Claire A; Kim, Soo-Kyung; Antonczak, Serge; Goddard, William A; Golebiowski, Jérôme

    2015-01-01

    Odorant receptors (ORs) are the largest subfamily within class A G protein-coupled receptors (GPCRs). No experimental structural data of any OR is available to date and atomic-level insights are likely to be obtained by means of molecular modeling. In this article, we critically align sequences of ORs with those GPCRs for which a structure is available. Here, an alignment consistent with available site-directed mutagenesis data on various ORs is proposed. Using this alignment, the choice of the template is deemed rather minor for identifying residues that constitute the wall of the binding cavity or those involved in G protein recognition. PMID:26044705

  9. Implication of the cause of differences in 3D structures of proteins with high sequence identity based on analyses of amino acid sequences and 3D structures.

    PubMed

    Matsuoka, Masanari; Sugita, Masatake; Kikuchi, Takeshi

    2014-09-18

    Proteins that share a high sequence homology while exhibiting drastically different 3D structures are investigated in this study. Recently, artificial proteins related to the sequences of the GA and IgG binding GB domains of human serum albumin have been designed. These artificial proteins, referred to as GA and GB, share 98% amino acid sequence identity but exhibit different 3D structures, namely, a 3α bundle versus a 4β + α structure. Discriminating between their 3D structures based on their amino acid sequences is a very difficult problem. In the present work, in addition to using bioinformatics techniques, an analysis based on inter-residue average distance statistics is used to address this problem. Ordinary analyses such as BLAST and conservation analyses were not sufficient on their own to determine which structure a given sequence would adopt. However, by combining them with the analysis based on inter-residue average distance statistics and our sequence tendency analysis, we could infer which part of the sequence plays an important role in structure formation. The results suggest possible determinants of the different 3D structures for sequences with high sequence identity. The possibility of discriminating between the 3D structures based on the given sequences is also discussed.

  10. Improving promoter prediction for the NNPP2.2 algorithm: a case study using Escherichia coli DNA sequences.

    PubMed

    Burden, S; Lin, Y-X; Zhang, R

    2005-03-01

    Although a great deal of research has been undertaken in the area of promoter prediction, prediction techniques are still not fully developed. Many algorithms tend to exhibit poor specificity, generating many false positives, or poor sensitivity. The neural network prediction program NNPP2.2 is one such example. To improve the NNPP2.2 prediction technique, the distance between the transcription start site (TSS) associated with the promoter and the translation start site (TLS) of the subsequent gene coding region has been studied for Escherichia coli K12 bacteria. An empirical probability distribution that is consistent for all E. coli promoters has been established. This information is combined with the results from NNPP2.2 to create a new technique called TLS-NNPP, which improves the specificity of promoter prediction. The technique is shown to be effective using E. coli DNA sequences; however, it is applicable to any organism for which a set of promoters has been experimentally defined. The data used in this project and the prediction results for the tested sequences can be obtained from http://www.uow.edu.au/~yanxia/E_Coli_paper/SBurden_Results.xls. Contact: alh98@uow.edu.au.
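
    As a rough illustration of the idea in this record, the sketch below weights a promoter score by an empirical probability of the TSS-TLS distance. The abstract does not say how the two pieces of information are combined, so the product rule, the 50-bp binning, and every name and number here (empirical_distance_pmf, combined_score, the toy distances and scores) are assumptions for illustration and not the published TLS-NNPP procedure.

```python
from collections import Counter

def empirical_distance_pmf(distances, bin_width=50):
    """Histogram estimate of P(distance bin) from known TSS-TLS distances (in bp)."""
    counts = Counter(d // bin_width for d in distances)
    total = len(distances)
    return {b: c / total for b, c in counts.items()}

def combined_score(nnpp_score, distance, pmf, bin_width=50):
    # Assumption: multiply the predictor's score by the empirical probability
    # of the candidate's TSS-TLS distance bin (unseen bins get probability 0).
    return nnpp_score * pmf.get(distance // bin_width, 0.0)

# Toy TSS-TLS distances for experimentally defined promoters (hypothetical values).
known = [20, 35, 40, 60, 75, 120, 30, 45, 300, 55]
pmf = empirical_distance_pmf(known)

# Candidate sites: (name, predictor score, distance to the next TLS) -- hypothetical.
candidates = [("siteA", 0.90, 420), ("siteB", 0.80, 40)]
ranked = sorted(candidates, key=lambda c: combined_score(c[1], c[2], pmf), reverse=True)
```

    In this toy setting a high-scoring candidate at an implausible distance is ranked below a slightly lower-scoring one at a typical distance, which is the kind of false-positive filtering the abstract describes.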

  11. Sequencing and functional analysis of the nifENXorf1orf2 gene cluster of Herbaspirillum seropedicae.

    PubMed

    Klassen, G; Pedrosa, F O; Souza, E M; Yates, M G; Rigo, L U

    1999-12-01

    A 5.1-kb DNA fragment from the nifHDK region of H. seropedicae was isolated and sequenced. Sequence analysis showed the presence of nifENXorf1orf2 but nifTY were not present. No nif or consensus promoter was identified. Furthermore, orf1 expression occurred only under nitrogen-fixing conditions and no promoter activity was detected between nifK and nifE, suggesting that these genes are expressed from the upstream nifH promoter and are parts of a unique nif operon. Mutagenesis studies indicate that nifN was essential for nitrogenase activity whereas nifXorf1orf2 were not. High homology between the C-terminal region of the NifX and NifB proteins from H. seropedicae was observed. Since the NifX and NifY proteins are important for FeMo cofactor (FeMoco) synthesis, we propose that alternative proteins with similar activities exist in H. seropedicae.

  12. Experimental rugged fitness landscape in protein sequence space.

    PubMed

    Hayashi, Yuuki; Aita, Takuyo; Toyota, Hitoshi; Husimi, Yuzuru; Urabe, Itaru; Yomo, Tetsuya

    2006-12-20

    The fitness landscape in sequence space determines the process of biomolecular evolution. To plot the fitness landscape of protein function, we carried out in vitro molecular evolution beginning with a defective fd phage carrying a random polypeptide of 139 amino acids in place of the g3p minor coat protein D2 domain, which is essential for phage infection. After 20 cycles of random substitution at sites 12-130 of the initial random polypeptide and selection for infectivity, the selected phage showed a 1.7×10^4-fold increase in infectivity, defined as the number of infected cells per ml of phage suspension. Fitness was defined as the logarithm of infectivity, and we analyzed (1) the dependence of stationary fitness on library size, which increased gradually, and (2) the time course of changes in fitness in transitional phases, based on an original theory regarding the evolutionary dynamics in Kauffman's n-k fitness landscape model. In the landscape model, single mutations at single sites among n sites affect the contribution of k other sites to fitness. Based on the results of these analyses, k was estimated to be 18-24. According to the estimated parameters, the landscape was plotted as a smooth surface up to a relative fitness of 0.4 of the global peak, whereas the landscape had a highly rugged surface with many local peaks above this relative fitness value. Based on the landscapes of these two different surfaces, it appears possible for adaptive walks with only random substitutions to climb with relative ease up to the middle region of the fitness landscape from any primordial or random sequence, whereas an enormous range of sequence diversity is required to climb further up the rugged surface above the middle region.

  13. Experimental Rugged Fitness Landscape in Protein Sequence Space

    PubMed Central

    Hayashi, Yuuki; Aita, Takuyo; Toyota, Hitoshi; Husimi, Yuzuru; Urabe, Itaru; Yomo, Tetsuya

    2006-01-01

    The fitness landscape in sequence space determines the process of biomolecular evolution. To plot the fitness landscape of protein function, we carried out in vitro molecular evolution beginning with a defective fd phage carrying a random polypeptide of 139 amino acids in place of the g3p minor coat protein D2 domain, which is essential for phage infection. After 20 cycles of random substitution at sites 12–130 of the initial random polypeptide and selection for infectivity, the selected phage showed a 1.7×10^4-fold increase in infectivity, defined as the number of infected cells per ml of phage suspension. Fitness was defined as the logarithm of infectivity, and we analyzed (1) the dependence of stationary fitness on library size, which increased gradually, and (2) the time course of changes in fitness in transitional phases, based on an original theory regarding the evolutionary dynamics in Kauffman's n-k fitness landscape model. In the landscape model, single mutations at single sites among n sites affect the contribution of k other sites to fitness. Based on the results of these analyses, k was estimated to be 18–24. According to the estimated parameters, the landscape was plotted as a smooth surface up to a relative fitness of 0.4 of the global peak, whereas the landscape had a highly rugged surface with many local peaks above this relative fitness value. Based on the landscapes of these two different surfaces, it appears possible for adaptive walks with only random substitutions to climb with relative ease up to the middle region of the fitness landscape from any primordial or random sequence, whereas an enormous range of sequence diversity is required to climb further up the rugged surface above the middle region. PMID:17183728
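
    The n-k landscape model mentioned in this abstract can be sketched in a few lines. The version below is a generic textbook-style NK model and not the authors' theory or fitted landscape: it assumes a binary alphabet instead of 20 amino acids, random neighbor assignments, and uniformly random site contributions. The function names (make_nk_landscape, adaptive_walk) and all parameter values are hypothetical.

```python
import random

def make_nk_landscape(n, k, seed=0):
    """Return a fitness function for a random NK landscape over binary sequences."""
    rng = random.Random(seed)
    # Each site's contribution depends on its own state and on k other sites.
    neighbors = [[i] + rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    tables = [dict() for _ in range(n)]  # contribution tables, filled lazily and cached

    def fitness(seq):
        total = 0.0
        for i in range(n):
            key = tuple(seq[j] for j in neighbors[i])
            if key not in tables[i]:
                tables[i][key] = rng.random()  # random contribution in [0, 1)
            total += tables[i][key]
        return total / n

    return fitness

def adaptive_walk(fitness, start, steps, seed=1):
    """Accept random single-site substitutions only when they increase fitness."""
    rng = random.Random(seed)
    seq, best = list(start), fitness(start)
    for _ in range(steps):
        trial = seq[:]
        site = rng.randrange(len(seq))
        trial[site] = 1 - trial[site]  # flip one site (binary alphabet assumption)
        f = fitness(trial)
        if f > best:
            seq, best = trial, f
    return seq, best

fitness = make_nk_landscape(n=12, k=3)
start = [0] * 12
f_start = fitness(start)
final_seq, f_final = adaptive_walk(fitness, start, steps=200)
```

    Raising k relative to n makes site contributions more interdependent, so walks of this kind tend to stall on local peaks sooner, which is the qualitative ruggedness the abstract refers to.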

  14. Topological frustration in βα-repeat proteins: sequence diversity modulates the conserved folding mechanisms of α/β/α sandwich proteins

    PubMed Central

    Hills, Ronald D.; Kathuria, Sagar V.; Wallace, Louise A.; Day, Iain J.; Brooks, Charles L.; Matthews, C. Robert

    2010-01-01

    The thermodynamic hypothesis of Anfinsen postulates that structures and stabilities of globular proteins are determined by their amino acid sequences. Chain topology, however, is known to influence the folding reaction, in that motifs with a preponderance of local interactions typically fold more rapidly than those with a larger fraction of non-local interactions. Together, the topology and sequence can modulate the energy landscape and influence the rate at which the protein folds to the native conformation. To explore the relationship of sequence and topology in the folding of βα–repeat proteins, which are dominated by local interactions, a combined experimental and simulation analysis was performed on two members of the flavodoxin-like, α/β/α sandwich fold. Spo0F and the N-terminal receiver domain of NtrC (NT-NtrC) have similar topologies but low sequence identity, enabling a test of the effects of sequence on folding. Experimental results demonstrated that both response-regulator proteins fold via parallel channels through highly structured sub-millisecond intermediates before accessing their cis prolyl peptide bond-containing native conformations. Global analysis of the experimental results preferentially places these intermediates off the productive folding pathway. Sequence-sensitive Gō-model simulations conclude that frustration in the folding in Spo0F, corresponding to the appearance of the off-pathway intermediate, reflects competition for intra-subdomain van der Waals contacts between its N- and C-terminal subdomains. The extent of transient, premature structure appears to correlate with the number of isoleucine, leucine and valine (ILV) side-chains that form a large sequence-local cluster involving the central β-sheet and helices α2, α3 and α4. The failure to detect the off-pathway species in the simulations of NT-NtrC may reflect the reduced number of ILV side-chains in its corresponding hydrophobic cluster. The location of the hydrophobic

  15. Programming molecular self-assembly of intrinsically disordered proteins containing sequences of low complexity

    NASA Astrophysics Data System (ADS)

    Simon, Joseph R.; Carroll, Nick J.; Rubinstein, Michael; Chilkoti, Ashutosh; López, Gabriel P.

    2017-06-01

    Dynamic protein-rich intracellular structures that contain phase-separated intrinsically disordered proteins (IDPs) composed of sequences of low complexity (SLC) have been shown to serve a variety of important cellular functions, which include signalling, compartmentalization and stabilization. However, our understanding of these structures and our ability to synthesize models of them have been limited. We present design rules for IDPs possessing SLCs that phase separate into diverse assemblies within droplet microenvironments. Using theoretical analyses, we interpret the phase behaviour of archetypal IDP sequences and demonstrate the rational design of a vast library of multicomponent protein-rich structures that ranges from uniform nano-, meso- and microscale puncta (distinct protein droplets) to multilayered orthogonally phase-separated granular structures. The ability to predict and program IDP-rich assemblies in this fashion offers new insights into (1) genetic-to-molecular-to-macroscale relationships that encode hierarchical IDP assemblies, (2) design rules of such assemblies in cell biology and (3) molecular-level engineering of self-assembled recombinant IDP-rich materials.

  16. Promoter classifier: software package for promoter database analysis.

    PubMed

    Gershenzon, Naum I; Ioshikhes, Ilya P

    2005-01-01

    Promoter Classifier is a package of seven stand-alone Windows-based C++ programs allowing the following basic manipulations with a set of promoter sequences: (i) calculation of positional distributions of nucleotides averaged over all promoters of the dataset; (ii) calculation of the averaged occurrence frequencies of the transcription factor binding sites and their combinations; (iii) division of the dataset into subsets of sequences containing or lacking certain promoter elements or combinations; (iv) extraction of the promoter subsets containing or lacking CpG islands around the transcription start site; and (v) calculation of spatial distributions of the promoter DNA stacking energy and bending stiffness. All programs have a user-friendly interface and provide the results in a convenient graphical form. The Promoter Classifier package is an effective tool for various basic manipulations with eukaryotic promoter sequences that usually are necessary for analysis of large promoter datasets. The program Promoter Divider is described in more detail as a representative component of the package.
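
    Two of the manipulations listed above, (i) positional nucleotide distributions and (iii) splitting a dataset by presence of a promoter element, are simple enough to sketch. This is not code from the Promoter Classifier package (which the abstract describes as Windows-based C++ programs); it is an independent minimal illustration that assumes equal-length, pre-aligned sequences and plain substring matching, with hypothetical function names and toy sequences.

```python
from collections import Counter

def positional_frequencies(promoters):
    """Per-position A/C/G/T frequencies averaged over equal-length sequences."""
    length = len(promoters[0])
    if any(len(p) != length for p in promoters):
        raise ValueError("all sequences must have the same length")
    freqs = []
    for pos in range(length):
        counts = Counter(p[pos] for p in promoters)
        freqs.append({nt: counts.get(nt, 0) / len(promoters) for nt in "ACGT"})
    return freqs

def split_by_motif(promoters, motif):
    """Divide the dataset into sequences containing or lacking a motif (exact match)."""
    with_motif = [p for p in promoters if motif in p]
    without_motif = [p for p in promoters if motif not in p]
    return with_motif, without_motif

# Toy aligned sequences (hypothetical, far shorter than real promoters).
seqs = ["TATAAG", "TATATC", "GCGCGC", "TATAAC"]
freqs = positional_frequencies(seqs)
has_tata, no_tata = split_by_motif(seqs, "TATA")
```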

  17. Identification of a functional element in the promoter of the silkworm (Bombyx mori) fat body-specific gene Bmlp3.

    PubMed

    Xu, Hanfu; Deng, Dangjun; Yuan, Lin; Wang, Yuancheng; Wang, Feng; Xia, Qingyou

    2014-08-01

    30K proteins are a group of structurally related proteins that play important roles in the life cycle of the silkworm Bombyx mori and are largely synthesized and regulated in a time-dependent manner in the fat body. Little is known about the upstream regulatory elements associated with the genes encoding these proteins. In the present study, the promoter of Bmlp3, a fat body-specific gene encoding a 30K protein family member, was characterized by joining sequences containing the Bmlp3 promoter with various amounts of 5' upstream sequences to a luciferase reporter gene. The results indicated that the sequences from -150 to -250bp and -597 to -675bp upstream of the Bmlp3 transcription start site were necessary for high levels of luciferase activity. Further analysis showed that a 21-bp sequence located between -230 and -250 was specifically recognized by nuclear factors from silkworm fat bodies and BmE cells, and could enhance luciferase reporter-gene expression 2.8-fold in BmE cells. This study provides new insights into the Bmlp3 promoter and contributes to the further clarification of the function and developmental regulation of Bmlp3. Copyright © 2014. Published by Elsevier B.V.

  18. Genome-wide profiling of DNA-binding proteins using barcode-based multiplex Solexa sequencing.

    PubMed

    Raghav, Sunil Kumar; Deplancke, Bart

    2012-01-01

    Chromatin immunoprecipitation (ChIP) is a commonly used technique to detect the in vivo binding of proteins to DNA. ChIP is now routinely paired to microarray analysis (ChIP-chip) or next-generation sequencing (ChIP-Seq) to profile the DNA occupancy of proteins of interest on a genome-wide level. Because ChIP-chip introduces several biases, most notably due to the use of a fixed number of probes, ChIP-Seq has quickly become the method of choice as, depending on the sequencing depth, it is more sensitive, quantitative, and provides a greater binding site location resolution. With the ever increasing number of reads that can be generated per sequencing run, it has now become possible to analyze several samples simultaneously while maintaining sufficient sequence coverage, thus significantly reducing the cost per ChIP-Seq experiment. In this chapter, we provide a step-by-step guide on how to perform multiplexed ChIP-Seq analyses. As a proof-of-concept, we focus on the genome-wide profiling of RNA Polymerase II as measuring its DNA occupancy at different stages of any biological process can provide insights into the gene regulatory mechanisms involved. However, the protocol can also be used to perform multiplexed ChIP-Seq analyses of other DNA-binding proteins such as chromatin modifiers and transcription factors.
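
    The multiplexing described here relies on assigning each read back to its sample by barcode. The sketch below shows one simple way this could be done, assuming the barcode sits at the start of the read and must match exactly; the chapter's actual barcode design and software are not given in the abstract, and the barcodes, sample names, and reads are hypothetical.

```python
from collections import defaultdict

def demultiplex(reads, barcodes):
    """Assign each read to a sample by its leading barcode, then trim the barcode."""
    bins = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                bins[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched: keep the read aside instead of guessing.
            bins["unassigned"].append(read)
    return dict(bins)

# Hypothetical barcodes for two RNA Polymerase II ChIP samples.
barcodes = {"ACGT": "polII_t0", "TGCA": "polII_t1"}
reads = ["ACGTTTGACC", "TGCAGGATCC", "NNNNACGT"]
by_sample = demultiplex(reads, barcodes)
```

    A production pipeline would normally also tolerate sequencing errors in the barcode; exact matching is used here only to keep the sketch short.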

  19. Choosing non-redundant representative subsets of protein sequence data sets using submodular optimization.

    PubMed

    Libbrecht, Maxwell W; Bilmes, Jeffrey A; Noble, William Stafford

    2018-04-01

    Selecting a non-redundant representative subset of sequences is a common step in many bioinformatics workflows, such as the creation of non-redundant training sets for sequence and structural models or selection of "operational taxonomic units" from metagenomics data. Previous methods for this task, such as CD-HIT, PISCES, and UCLUST, apply a heuristic threshold-based algorithm that has no theoretical guarantees. We propose a new approach based on submodular optimization. Submodular optimization, a discrete analogue to continuous convex optimization, has been used with great success for other representative set selection problems. We demonstrate that the submodular optimization approach results in representative protein sequence subsets with greater structural diversity than sets chosen by existing methods, using as a gold standard the SCOPe library of protein domain structures. In this setting, submodular optimization consistently yields protein sequence subsets that include more SCOPe domain families than sets of the same size selected by competing approaches. We also show how the optimization framework allows us to design a mixture objective function that performs well for both large and small representative sets. The framework we describe is the best possible in polynomial time (under some assumptions), and it is flexible and intuitive because it applies a suite of generic methods to optimize one of a variety of objective functions. © 2018 Wiley Periodicals, Inc.
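
    To make the submodular approach concrete, the sketch below greedily maximizes a facility-location objective, a common submodular function for representative selection. The abstract does not state which objective or similarity measure the authors use, so both the objective and the toy similarity (fraction of identical positions between equal-length sequences) are assumptions, and the function names are hypothetical.

```python
def identity(a, b):
    """Toy similarity: fraction of matching positions (assumes equal lengths)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_facility_location(seqs, k, sim=identity):
    """Greedily pick k indices maximizing sum_i max_{j in S} sim(seqs[i], seqs[j])."""
    selected = []
    best_cover = [0.0] * len(seqs)  # best similarity of each sequence to the chosen set
    for _ in range(min(k, len(seqs))):
        best_gain, best_j = -1.0, None
        for j in range(len(seqs)):
            if j in selected:
                continue
            # Marginal gain: how much candidate j improves coverage of every sequence.
            gain = sum(max(sim(s, seqs[j]) - c, 0.0) for s, c in zip(seqs, best_cover))
            if gain > best_gain:
                best_gain, best_j = gain, j
        selected.append(best_j)
        best_cover = [max(c, sim(s, seqs[best_j])) for s, c in zip(seqs, best_cover)]
    return selected

seqs = ["AAAA", "AAAT", "AATT", "GGGG", "GGGC"]
reps = greedy_facility_location(seqs, 2)  # → [1, 3]
```

    For monotone submodular objectives such as this one, the greedy rule carries the classic (1 - 1/e) approximation guarantee, in contrast to threshold heuristics with no such bound.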

  20. Transitive homology-guided structural studies lead to discovery of Cro proteins with 40% sequence identity but different folds

    PubMed Central

    Roessler, Christian G.; Hall, Branwen M.; Anderson, William J.; Ingram, Wendy M.; Roberts, Sue A.; Montfort, William R.; Cordes, Matthew H. J.

    2008-01-01

    Proteins that share common ancestry may differ in structure and function because of divergent evolution of their amino acid sequences. For a typical diverse protein superfamily, the properties of a few scattered members are known from experiment. A satisfying picture of functional and structural evolution in relation to sequence changes, however, may require characterization of a larger, well chosen subset. Here, we employ a “stepping-stone” method, based on transitive homology, to target sequences intermediate between two related proteins with known divergent properties. We apply the approach to the question of how new protein folds can evolve from preexisting folds and, in particular, to an evolutionary change in secondary structure and oligomeric state in the Cro family of bacteriophage transcription factors, initially identified by sequence-structure comparison of distant homologs from phages P22 and λ. We report crystal structures of two Cro proteins, Xfaso 1 and Pfl 6, with sequences intermediate between those of P22 and λ. The domains show 40% sequence identity but differ by switching of α-helix to β-sheet in a C-terminal region spanning ≈25 residues. Sedimentation analysis also suggests a correlation between helix-to-sheet conversion and strengthened dimerization. PMID:18227506

  1. A peptide sequence on carcinoembryonic antigen binds to a 80kD protein on Kupffer cells.

    PubMed

    Thomas, P; Petrick, A T; Toth, C A; Fox, E S; Elting, J J; Steele, G

    1992-10-30

    Clearance of carcinoembryonic antigen (CEA) from the circulation is by binding to Kupffer cells in the liver. We have shown that CEA binding to Kupffer cells occurs via a peptide sequence YPELPK representing amino acids 107-112 of the CEA sequence. This peptide sequence is located in the region between the N-terminal and the first immunoglobulin like loop domain. Using native CEA and peptides containing this sequence complexed with a heterobifunctional crosslinking agent and ligand blotting with biotinylated CEA and NCA we have shown binding to an 80kD protein on the Kupffer cell surface. This binding protein may be important in the development of hepatic metastases.

  2. Identifying and engineering promoters for high level and sustainable therapeutic recombinant protein production in cultured mammalian cells.

    PubMed

    Ho, Steven C L; Yang, Yuansheng

    2014-08-01

    Promoters are essential on plasmid vectors to initiate transcription of the transgenes when generating mammalian cell lines that express therapeutic recombinant proteins. High and sustained levels of gene expression are desired during therapeutic protein production while gene expression is useful for cell engineering. As many finely controlled promoters exhibit cell and product specificity, new promoters need to be identified, optimized and carefully evaluated before use. Suitable promoters can be identified using techniques ranging from simple molecular biology methods to modern high-throughput omics screenings. Promoter engineering is often required after identification to either obtain high and sustained expression or to provide a wider range of gene expression. This review discusses some of the available methods to identify and engineer promoters for therapeutic recombinant protein expression in mammalian cells.

  3. Molecular Simulations of Sequence-Specific Association of Transmembrane Proteins in Lipid Bilayers

    NASA Astrophysics Data System (ADS)

    Doxastakis, Manolis; Prakash, Anupam; Janosi, Lorant

    2011-03-01

    Association of membrane proteins is central in material and information flow across the cellular membranes. Amino-acid sequence and the membrane environment are two critical factors controlling association; however, quantitative knowledge on such contributions is limited. In this work, we study the dimerization of helices in lipid bilayers using extensive parallel Monte Carlo simulations with recently developed algorithms. The dimerization of Glycophorin A is examined employing a coarse-grain model that retains a level of amino-acid specificity, in three different phospholipid bilayers. Association is driven by a balance of protein-protein and lipid-induced interactions with the latter playing a major role at short separations. Following a different approach, the effect of amino-acid sequence is studied using the four transmembrane domains of the epidermal growth factor receptor family in identical lipid environments. Detailed characterization of dimer formation and estimates of the free energy of association reveal that these helices present significant affinity to self-associate with certain dimers forming non-specific interfaces.

  4. Population-genetic analysis of HvABCG31 promoter sequence in wild barley (Hordeum vulgare ssp. spontaneum)

    PubMed Central

    2012-01-01

    Background: The cuticle is an important adaptive structure whose origin played a crucial role in the transition of plants from aqueous to terrestrial conditions. HvABCG31/Eibi1 is an ABCG transporter gene, involved in cuticle formation that was recently identified in wild barley (Hordeum vulgare ssp. spontaneum). To study the genetic variation of HvABCG31 in different habitats, its 2 kb promoter region was sequenced from 112 wild barley accessions collected from five natural populations from southern and northern Israel. The sites included three mesic and two xeric habitats, and differed in annual rainfall, soil type, and soil water capacity. Results: Phylogenetic analysis of the aligned HvABCG31 promoter sequences clustered the majority of accessions (69 out of 71) from the three northern mesic populations into one cluster, while all 21 accessions from the Dead Sea area, a xeric southern population, and two isolated accessions (one from a xeric population at Mitzpe Ramon and one from the xeric ‘African Slope’ of “Evolution Canyon”) formed the second cluster. The southern arid populations included six haplotypes, but they differed from the consensus sequence at a large number of positions, while the northern mesic populations included 15 haplotypes that were, on average, more similar to the consensus sequence. Most of the haplotypes (20 of 22) were unique to a population. Interestingly, higher genetic variation occurred within populations (54.2%) than among populations (45.8%). Analysis of the promoter region detected a large number of transcription factor binding sites: 121–128 and 121–134 sites in the two southern arid populations, and 123–128, 125–128, and 123–125 sites in the three northern mesic populations. Three types of TFBSs were significantly enriched: those related to GA (gibberellin), Dof (DNA binding with one finger), and light. Conclusions: Drought stress and adaptive natural selection may have been important determinants in the observed

  5. The cancer-promoting gene fatty acid-binding protein 5 (FABP5) is epigenetically regulated during human prostate carcinogenesis.

    PubMed

    Kawaguchi, Koichiro; Kinameri, Ayumi; Suzuki, Shunsuke; Senga, Shogo; Ke, Youqiang; Fujii, Hiroshi

    2016-02-15

    FABPs (fatty-acid-binding proteins) are a family of low-molecular-mass intracellular lipid-binding proteins consisting of ten isoforms. FABPs are involved in binding and storing hydrophobic ligands such as long-chain fatty acids, as well as transporting these ligands to the appropriate compartments in the cell. FABP5 is overexpressed in multiple types of tumours. Furthermore, up-regulation of FABP5 is strongly associated with poor survival in triple-negative breast cancer. However, the mechanisms underlying the specific up-regulation of the FABP5 gene in these cancers remain poorly characterized. In the present study, we determined that FABP5 has a typical CpG island around its promoter region. The DNA methylation status of the CpG island in the FABP5 promoter of benign prostate cells (PNT2), prostate cancer cells (PC-3, DU-145, 22Rv1 and LNCaP) and human normal or tumour tissue was assessed by bisulfite sequencing analysis, and then confirmed by COBRA (combined bisulfite restriction analysis) and qAMP (quantitative analysis of DNA methylation using real-time PCR). These results demonstrated that overexpression of FABP5 in prostate cancer cells can be attributed to hypomethylation of the CpG island in its promoter region, along with up-regulation of the direct trans-acting factors Sp1 (specificity protein 1) and c-Myc. Together, these mechanisms result in the transcriptional activation of FABP5 expression during human prostate carcinogenesis. Importantly, silencing of Sp1, c-Myc or FABP5 expression led to a significant decrease in cell proliferation, indicating that up-regulation of FABP5 expression by Sp1 and c-Myc is critical for the proliferation of prostate cancer cells. © 2016 Authors; published by Portland Press Limited.

  6. A Protocol for Functional Assessment of Whole-Protein Saturation Mutagenesis Libraries Utilizing High-Throughput Sequencing.

    PubMed

    Stiffler, Michael A; Subramanian, Subu K; Salinas, Victor H; Ranganathan, Rama

    2016-07-03

    Site-directed mutagenesis has long been used as a method to interrogate protein structure, function and evolution. Recent advances in massively-parallel sequencing technology have opened up the possibility of assessing the functional or fitness effects of large numbers of mutations simultaneously. Here, we present a protocol for experimentally determining the effects of all possible single amino acid mutations in a protein of interest utilizing high-throughput sequencing technology, using the 263 amino acid antibiotic resistance enzyme TEM-1 β-lactamase as an example. In this approach, a whole-protein saturation mutagenesis library is constructed by site-directed mutagenic PCR, randomizing each position individually to all possible amino acids. The library is then transformed into bacteria, and selected for the ability to confer resistance to β-lactam antibiotics. The fitness effect of each mutation is then determined by deep sequencing of the library before and after selection. Importantly, this protocol introduces methods which maximize sequencing read depth and permit the simultaneous selection of the entire mutation library, by mixing adjacent positions into groups of length accommodated by high-throughput sequencing read length and utilizing orthogonal primers to barcode each group. Representative results using this protocol are provided by assessing the fitness effects of all single amino acid mutations in TEM-1 at a clinically relevant dosage of ampicillin. The method should be easily extendable to other proteins for which a high-throughput selection assay is in place.
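
    The step in which fitness effects are determined by sequencing the library before and after selection can be illustrated with a simple enrichment score. The abstract does not give the protocol's exact formula, so the log2 ratio normalized to wild type and the pseudocount are assumptions based on common practice, and the variant labels and read counts are hypothetical.

```python
import math

def relative_fitness(pre, post, wild_type="WT", pseudocount=0.5):
    """log2 enrichment of each variant after selection, relative to wild type.

    pre and post map variant names to read counts before and after selection.
    """
    def log_ratio(variant):
        after = post.get(variant, 0) + pseudocount
        before = pre.get(variant, 0) + pseudocount
        return math.log2(after / before)

    wt = log_ratio(wild_type)
    return {v: log_ratio(v) - wt for v in pre if v != wild_type}

# Hypothetical read counts for wild type and two single-residue variants.
pre = {"WT": 1000, "mutA": 100, "mutB": 100}
post = {"WT": 2000, "mutA": 200, "mutB": 25}
scores = relative_fitness(pre, post)
```

    Under this convention a score near zero means the variant behaves like wild type under selection and a strongly negative score indicates a deleterious substitution; the pseudocount only keeps variants with zero reads from producing undefined logarithms.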

  7. Tenebrio molitor antifreeze protein gene identification and regulation.

    PubMed

    Qin, Wensheng; Walker, Virginia K

    2006-02-15

    The yellow mealworm, Tenebrio molitor, is a freeze susceptible, stored product pest. Its winter survival is facilitated by the accumulation of antifreeze proteins (AFPs), encoded by a small gene family. We have now isolated 11 different AFP genomic clones from 3 genomic libraries. All the clones had a single coding sequence, with no evidence of intervening sequences. Three genomic clones were further characterized. All have putative TATA box sequences upstream of the coding regions and multiple potential poly(A) signal sequences downstream of the coding regions. A TmAFP regulatory region, B1037, conferred transcriptional activity when ligated to a luciferase reporter sequence and after transfection into an insect cell line. A 143 bp core promoter including a TATA box sequence was identified. Its promoter activity was increased 4.4 times by inserting an exotic 245 bp intron into the construct, similar to the enhancement of transgenic expression seen in several other systems. The addition of a duplication of the first 120 bp sequence from the 143 bp core promoter decreased promoter activity by half. Although putative hormonal response sequences were identified, none of the five hormones tested enhanced reporter activity. These studies on the mechanisms of AFP transcriptional control are important for the consideration of any transfer of freeze-resistance phenotypes to beneficial hosts.

  8. Automated conserved non-coding sequence (CNS) discovery reveals differences in gene content and promoter evolution among grasses

    PubMed Central

    Turco, Gina; Schnable, James C.; Pedersen, Brent; Freeling, Michael

    2013-01-01

    Conserved non-coding sequences (CNS) are islands of non-coding sequence that, like protein coding exons, show less divergence in sequence between related species than functionless DNA. Several CNSs have been demonstrated experimentally to function as cis-regulatory regions. However, the specific functions of most CNSs remain unknown. Previous searches for CNS in plants have either anchored on exons and only identified nearby sequences or required years of painstaking manual annotation. Here we present an open source tool that can accurately identify CNSs between any two related species with sequenced genomes, including both those immediately adjacent to exons and distal sequences separated by >12 kb of non-coding sequence. We have used this tool to characterize new motifs, associate CNSs with additional functions, and identify previously undetected genes encoding RNA and protein in the genomes of five grass species. We provide a list of 15,363 orthologous CNSs conserved across all grasses tested. We were also able to identify regulatory sequences present in the common ancestor of grasses that have been lost in one or more extant grass lineages. Lists of orthologous gene pairs and associated CNSs are provided for reference inbred lines of arabidopsis, Japonica rice, foxtail millet, sorghum, brachypodium, and maize. PMID:23874343

  9. Preparative Protein Production from Inclusion Bodies and Crystallization: A Seven-Week Biochemistry Sequence

    ERIC Educational Resources Information Center

    Peterson, Megan J.; Snyder, W. Kalani; Westerman, Shelley; McFarland, Benjamin J.

    2011-01-01

    We describe how to produce and purify proteins from "Escherichia coli" inclusion bodies by adapting versatile, preparative-scale techniques to the undergraduate laboratory schedule. This 7-week sequence of experiments fits into an annual cycle of research activity in biochemistry courses. Recombinant proteins are expressed as inclusion bodies,…

  10. Genome-Wide Prediction and Analysis of 3D-Domain Swapped Proteins in the Human Genome from Sequence Information.

    PubMed

    Upadhyay, Atul Kumar; Sowdhamini, Ramanathan

    2016-01-01

    3D-domain swapping is one of the mechanisms of protein oligomerization and the proteins exhibiting this phenomenon have many biological functions. These proteins, which undergo domain swapping, have acquired much attention owing to their involvement in human diseases, such as conformational diseases, amyloidosis, serpinopathies, proteinopathies etc. Early identification of proteins in the human genome that tend to domain swap could support many aspects of disease control and management. Predictive models were developed by using machine learning approaches with an average accuracy of 78% (85.6% sensitivity, 87.5% specificity and an MCC value of 0.72) to predict putative domain swapping in protein sequences. These models were applied to many complete genomes with special emphasis on the human genome. Nearly 44% of the protein sequences in the human genome were predicted positive for domain swapping. Enrichment analysis was performed on the positively predicted sequences from human genome for their domain distribution, disease association and functional importance based on Gene Ontology (GO). Enrichment analysis was also performed to infer a better understanding of the functional importance of these sequences. Finally, we developed hinge region prediction, in the given putative domain swapped sequence, by using important physicochemical properties of amino acids.
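
    For readers unfamiliar with the performance measures quoted in this abstract, the sketch below computes accuracy, sensitivity, specificity and the Matthews correlation coefficient (MCC) from a binary confusion matrix using their standard definitions. The counts are made up for illustration and are not taken from the study.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }

# Hypothetical counts: 50 domain-swapping and 50 non-swapping test proteins.
metrics = binary_metrics(tp=40, fp=10, tn=40, fn=10)
```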

  11. Origin and spread of photosynthesis based upon conserved sequence features in key bacteriochlorophyll biosynthesis proteins.

    PubMed

    Gupta, Radhey S

    2012-11-01

    The origin of photosynthesis and how this capability has spread to other bacterial phyla remain important unresolved questions. I describe here a number of conserved signature indels (CSIs) in key proteins involved in bacteriochlorophyll (Bchl) biosynthesis that provide important insights in these regards. The proteins BchL and BchX, which are essential for Bchl biosynthesis, are derived by gene duplication in a common ancestor of all phototrophs. More ancient gene duplication gave rise to the BchX-BchL proteins and the NifH protein of the nitrogenase complex. The sequence alignment of NifH-BchX-BchL proteins contains two CSIs that are uniquely shared by all NifH and BchX homologs, but not by any BchL homologs. These CSIs and phylogenetic analysis of NifH-BchX-BchL protein sequences strongly suggest that the BchX homologs are ancestral to BchL and that the Bchl-based anoxygenic photosynthesis originated prior to the chlorophyll (Chl)-based photosynthesis in cyanobacteria. Another CSI in the BchX-BchL sequence alignment that is uniquely shared by all BchX homologs and the BchL sequences from Heliobacteriaceae, but absent in all other BchL homologs, suggests that the BchL homologs from Heliobacteriaceae are primitive in comparison to all other photosynthetic lineages. Several other identified CSIs in the BchN homologs are commonly shared by all proteobacterial homologs and a clade consisting of the marine unicellular Cyanobacteria (Clade C). These CSIs, in conjunction with the results of phylogenetic analyses and pair-wise sequence similarity on the BchL, BchN, and BchB proteins, where the homologs from Clade C Cyanobacteria and Proteobacteria exhibited a close relationship, provide strong evidence that these two groups have incurred lateral gene transfers. Additionally, phylogenetic analyses and several CSIs in the BchL-N-B proteins that are uniquely shared by all Chlorobi and Chloroflexi homologs provide evidence that the genes for these proteins have also been

  12. TRIM24 protein promotes and TRIM32 protein inhibits cardiomyocyte hypertrophy via regulation of dysbindin protein levels

    PubMed Central

    Borlepawar, Ankush; Bernt, Alexander; Christen, Lynn; Sossalla, Samuel; Frank, Derk; Frey, Norbert

    2017-01-01

    We have previously shown that dysbindin is a potent inducer of cardiomyocyte hypertrophy via activation of Rho-dependent serum-response factor (SRF) signaling. We have now performed a yeast two-hybrid screen using dysbindin as bait against a cardiac cDNA library to identify the cardiac dysbindin interactome. Among several putative binding proteins, we identified tripartite motif-containing protein 24 (TRIM24) and confirmed this interaction by co-immunoprecipitation and co-immunostaining. Another tripartite motif (TRIM) family protein, TRIM32, has been reported earlier as an E3 ubiquitin ligase for dysbindin in skeletal muscle. Consistently, we found that TRIM32 also degraded dysbindin in neonatal rat ventricular cardiomyocytes. Surprisingly, however, TRIM24 did not promote dysbindin decay but rather protected dysbindin against degradation by TRIM32. Correspondingly, TRIM32 attenuated the activation of SRF signaling and hypertrophy due to dysbindin, whereas TRIM24 promoted these effects in neonatal rat ventricular cardiomyocytes. This study also implies that TRIM32 is a key regulator of cell viability and apoptosis in cardiomyocytes via simultaneous activation of p53 and caspase-3/-7 and inhibition of X-linked inhibitor of apoptosis. In conclusion, we provide here a novel mechanism of post-translational regulation of dysbindin and hypertrophy via TRIM24 and TRIM32 and show the importance of TRIM32 in cardiomyocyte apoptosis in vitro. PMID:28465353

  13. Simple chained guide trees give high-quality protein multiple sequence alignments

    PubMed Central

    Boyce, Kieran; Sievers, Fabian; Higgins, Desmond G.

    2014-01-01

    Guide trees are used to decide the order of sequence alignment in the progressive multiple sequence alignment heuristic. These guide trees are often the limiting factor in making large alignments, and considerable effort has been expended over the years in making these quickly or accurately. In this article we show that, at least for protein families with large numbers of sequences that can be benchmarked with known structures, simple chained guide trees give the most accurate alignments. These also happen to be the fastest and simplest guide trees to construct, computationally. Such guide trees have a striking effect on the accuracy of alignments produced by some of the most widely used alignment packages. There is a marked increase in accuracy and a marked decrease in computational time, once the number of sequences goes much above a few hundred. This is true, even if the order of sequences in the guide tree is random. PMID:25002495
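
    A chained guide tree is a fully imbalanced ("caterpillar") tree in which each sequence is added, one at a time, to the growing alignment. The sketch below builds such a tree as a Newick string from a list of sequence names. The function name and the `shuffle_seed` parameter are hypothetical, and the code assumes the names contain no Newick-reserved characters (parentheses, commas, colons, semicolons). It illustrates the tree shape only and is not the authors' implementation.

```python
import random


def chained_guide_tree(names, shuffle_seed=None):
    """Return a chained (caterpillar) guide tree in Newick format.

    names        -- sequence identifiers, assumed free of Newick-reserved characters
    shuffle_seed -- if given, the chaining order is randomised reproducibly
    """
    order = list(names)
    if len(order) < 2:
        raise ValueError("need at least two sequences to build a tree")
    if shuffle_seed is not None:
        random.Random(shuffle_seed).shuffle(order)
    # Start from the first sequence and wrap the growing subtree with each
    # subsequent sequence, so every internal node has one leaf child.
    tree = order[0]
    for name in order[1:]:
        tree = f"({tree},{name})"
    return tree + ";"


print(chained_guide_tree(["seqA", "seqB", "seqC", "seqD"]))
# prints (((seqA,seqB),seqC),seqD);
```

    Building the tree is linear in the number of sequences and needs no pairwise distance matrix, which is consistent with the abstract's remark that these are the fastest and simplest guide trees to construct.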

  14. Measles virus minigenomes encoding two autofluorescent proteins reveal cell-to-cell variation in reporter expression dependent on viral sequences between the transcription units.

    PubMed

    Rennick, Linda J; Duprex, W Paul; Rima, Bert K

    2007-10-01

    Transcription from morbillivirus genomes commences at a single promoter in the 3' non-coding terminus, with the six genes being transcribed sequentially. The 3' and 5' untranslated regions (UTRs) of the genes (mRNA sense), together with the intergenic trinucleotide spacer, comprise the non-coding sequences (NCS) of the virus and contain the conserved gene end and gene start signals, respectively. Bicistronic minigenomes containing transcription units (TUs) encoding autofluorescent reporter proteins separated by measles virus (MV) NCS were used to give a direct estimation of gene expression in single, living cells by assessing the relative amounts of each fluorescent protein in each cell. Initially, five minigenomes containing each of the MV NCS were generated. Assays were developed to determine the amount of each fluorescent protein in cells at both cell population and single-cell levels. This revealed significant variations in gene expression between cells expressing the same NCS-containing minigenome. The minigenome containing the M/F NCS produced significantly lower amounts of fluorescent protein from the second TU (TU2), compared with the other minigenomes. A minigenome with a truncated F 5' UTR had increased expression from TU2. This UTR is 524 nt longer than the other MV 5' UTRs. Insertions into the 5' UTR of the enhanced green fluorescent protein gene in the minigenome containing the N/P NCS showed that specific sequences, rather than just the additional length of F 5' UTR, govern this decreased expression from TU2.

  15. HIV-1 Tat protein promotes formation of more-processive elongation complexes.

    PubMed Central

    Marciniak, R A; Sharp, P A

    1991-01-01

    The Tat protein of HIV-1 trans-activates transcription in vitro in a cell-free extract of HeLa nuclei. Quantitative analysis of the efficiency of elongation revealed that a majority of the elongation complexes generated by the HIV-1 promoter were not highly processive and terminated within the first 500 nucleotides. Tat trans-activation of transcription from the HIV-1 promoter resulted from an increase in processive character of the elongation complexes. More specifically, the analysis suggests that there exist two classes of elongation complexes initiating from the HIV promoter: a less-processive form and a more-processive form. Addition of purified Tat protein was found to increase the abundance of the more-processive class of elongation complex. The purine nucleoside analog, 5,6-dichloro-1-beta-D-ribofuranosylbenzimidazole (DRB) inhibits transcription in this reaction by decreasing the efficiency of elongation. Surprisingly, stimulation of transcription elongation by Tat was preferentially inhibited by the addition of DRB. PMID:1756726

  16. Assignment of protein sequences to existing domain and family classification systems: Pfam and the PDB

    PubMed Central

    Dunbrack, Roland L.

    2012-01-01

    Motivation: Automating the assignment of existing domain and protein family classifications to new sets of sequences is an important task. Current methods often miss assignments because remote relationships fail to achieve statistical significance. Some assignments are not as long as the actual domain definitions because local alignment methods often cut alignments short. Long insertions in query sequences often erroneously result in two copies of the domain assigned to the query. Divergent repeat sequences in proteins are often missed. Results: We have developed a multilevel procedure to produce nearly complete assignments of protein families of an existing classification system to a large set of sequences. We apply this to the task of assigning Pfam domains to sequences and structures in the Protein Data Bank (PDB). We found that HHsearch alignments frequently scored more remotely related Pfams in Pfam clans higher than closely related Pfams, thus, leading to erroneous assignment at the Pfam family level. A greedy algorithm allowing for partial overlaps was, thus, applied first to sequence/HMM alignments, then HMM–HMM alignments and then structure alignments, taking care to join partial alignments split by large insertions into single-domain assignments. Additional assignment of repeat Pfams with weaker E-values was allowed after stronger assignments of the repeat HMM. Our database of assignments, presented in a database called PDBfam, contains Pfams for 99.4% of chains >50 residues. Availability: The Pfam assignment data in PDBfam are available at http://dunbrack2.fccc.edu/ProtCid/PDBfam, which can be searched by PDB codes and Pfam identifiers. They will be updated regularly. Contact: Roland.Dunbracks@fccc.edu PMID:22942020
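
    The abstract above mentions a greedy algorithm that allows partial overlaps between assigned domains. The following is a minimal sketch of that general idea, not the PDBfam procedure itself: ranking hits by E-value, the `max_overlap` threshold of 10 residues, the `Hit` type and the family names are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    family: str    # hypothetical family identifier
    start: int     # 1-based, inclusive
    end: int       # 1-based, inclusive
    evalue: float


def overlap(a: Hit, b: Hit) -> int:
    """Number of residues shared by two hits on the same sequence."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def greedy_assign(hits: List[Hit], max_overlap: int = 10) -> List[Hit]:
    """Accept hits from best to worst E-value, tolerating small overlaps."""
    accepted: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        # Keep the hit only if it overlaps every accepted hit by at most
        # max_overlap residues.
        if all(overlap(hit, kept) <= max_overlap for kept in accepted):
            accepted.append(hit)
    return sorted(accepted, key=lambda h: h.start)


hits = [
    Hit("FAM_A", 1, 100, 1e-30),
    Hit("FAM_B", 95, 200, 1e-20),   # overlaps FAM_A by 6 residues: kept
    Hit("FAM_C", 50, 150, 1e-10),   # overlaps FAM_A by 51 residues: rejected
]
for hit in greedy_assign(hits):
    print(hit.family, hit.start, hit.end)
```

    Allowing a small overlap keeps adjacent domains whose alignment ends run slightly into each other, while still rejecting a weaker hit that covers largely the same region as a stronger one.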

  17. Prevalence of transcription promoters within archaeal operons and coding sequences

    PubMed Central

    Koide, Tie; Reiss, David J; Bare, J Christopher; Pang, Wyming Lee; Facciotti, Marc T; Schmid, Amy K; Pan, Min; Marzolf, Bruz; Van, Phu T; Lo, Fang-Yin; Pratap, Abhishek; Deutsch, Eric W; Peterson, Amelia; Martin, Dan; Baliga, Nitin S

    2009-01-01

    Despite the knowledge of complex prokaryotic-transcription mechanisms, generalized rules, such as the simplified organization of genes into operons with well-defined promoters and terminators, have had a significant role in systems analysis of regulatory logic in both bacteria and archaea. Here, we have investigated the prevalence of alternate regulatory mechanisms through genome-wide characterization of transcript structures of ∼64% of all genes, including putative non-coding RNAs in Halobacterium salinarum NRC-1. Our integrative analysis of transcriptome dynamics and protein–DNA interaction data sets showed widespread environment-dependent modulation of operon architectures, transcription initiation and termination inside coding sequences, and extensive overlap in 3′ ends of transcripts for many convergently transcribed genes. A significant fraction of these alternate transcriptional events correlate to binding locations of 11 transcription factors and regulators (TFs) inside operons and annotated genes—events usually considered spurious or non-functional. Using experimental validation, we illustrate the prevalence of overlapping genomic signals in archaeal transcription, casting doubt on the general perception of rigid boundaries between coding sequences and regulatory elements. PMID:19536208

  18. Sequence analysis and expression of the M1 and M2 matrix protein genes of hirame rhabdovirus (HIRRV)

    USGS Publications Warehouse

    Nishizawa, T.; Kurath, G.; Winton, J.R.

    1997-01-01

    We have cloned and sequenced a 2318 nucleotide region of the genomic RNA of hirame rhabdovirus (HIRRV), an important viral pathogen of Japanese flounder Paralichthys olivaceus. This region comprises approximately two-thirds of the 3' end of the nucleocapsid protein (N) gene and the complete matrix protein (M1 and M2) genes with the associated intergenic regions. The partial N gene sequence was 812 nucleotides in length with an open reading frame (ORF) that encoded the carboxyl-terminal 250 amino acids of the N protein. The M1 and M2 genes were 771 and 700 nucleotides in length, respectively, with ORFs encoding proteins of 227 and 193 amino acids. The M1 gene sequence contained an additional small ORF that could encode a highly basic, arginine-rich protein of 25 amino acids. Comparisons of the N, M1, and M2 gene sequences of HIRRV with the corresponding sequences of the fish rhabdoviruses, infectious hematopoietic necrosis virus (IHNV) or viral hemorrhagic septicemia virus (VHSV) indicated that HIRRV was more closely related to IHNV than to VHSV, but was clearly distinct from either. The putative consensus gene termination sequence for IHNV and VHSV, AGAYAG(A)(7), was present in the N-M1, M1-M2, and M2-G intergenic regions of HIRRV as were the putative transcription initiation sequences YGGCAC and AACA. An Escherichia coli expression system was used to produce recombinant proteins from the M1 and M2 genes of HIRRV. These were the same size as the authentic M1 and M2 proteins and reacted with anti-HIRRV rabbit serum in western blots. These reagents can be used for further study of the fish immune response and to test novel control methods.
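
    The consensus signals quoted in the abstract above use IUPAC ambiguity codes (Y stands for C or T). A short sketch of how such motifs can be located in a sequence is given below. It assumes that "(A)(7)" denotes a run of seven A residues, and the test sequence is invented for illustration and is not HIRRV sequence.

```python
import re

# IUPAC nucleotide ambiguity codes expressed as regular-expression fragments.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Translate an IUPAC motif such as 'AGAYAG' into a regex string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(sequence, motif):
    """Return (0-based start, matched text) for every occurrence, overlaps included."""
    # The lookahead makes finditer report overlapping matches as well.
    pattern = re.compile("(?=(" + iupac_to_regex(motif) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


# Assumed reading of the termination signal: AGAYAG followed by seven A residues.
termination = "AGAYAG" + "A" * 7
example = "CCAGATAGAAAAAAAGGCACGG"   # invented sequence, for illustration only
print(find_motif(example, termination))
# prints [(2, 'AGATAGAAAAAAA')]
```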

  19. Molecular cloning of actin genes in Trichomonas vaginalis and phylogeny inferred from actin sequences.

    PubMed

    Bricheux, G; Brugerolle, G

    1997-08-01

    The parasitic protozoan Trichomonas vaginalis is known to contain the ubiquitous and highly conserved protein actin. A genomic library and a cDNA library have been screened to identify and clone the actin gene(s) of T. vaginalis. The nucleotide sequence of one gene and its flanking regions has been determined. The open reading frame encodes a protein of 376 amino acids. The sequence is not interrupted by any introns and the promoter could be represented by a 10 bp motif close to a consensus motif also found upstream of most sequenced T. vaginalis genes. The five different clones isolated from the cDNA library have similar sequences and encode three actin proteins differing only by one or two amino acids. A phylogenetic analysis of 31 actin sequences by distance matrix and parsimony methods, using centractin as outgroup, gives congruent trees with Parabasala branching above Diplomonadida.

  20. Characterization of four heat-shock protein genes from Nile tilapia (Oreochromis niloticus) and demonstration of the inducible transcriptional activity of Hsp70 promoter.

    PubMed

    Zhang, Lili; Sun, Chengfei; Ye, Xing; Zou, Shuming; Lu, Maixin; Liu, Zhigang; Tian, Yuanyuan

    2014-02-01

    Heat-shock proteins (Hsps), known as stress proteins and extrinsic chaperones, play important roles in the folding, translocation, and refolding/degradation of proteins. In this study, we identified four Hsps in Nile tilapia (Oreochromis niloticus), which display conserved Hsp characteristics in their predicted amino acid sequences. Further analyses of the structures, homology, and phylogenetics revealed that the four Hsps belong to the Hsp70 family. One of them does not contain introns and is named Hsp70, while the other three contain introns and are named Hsc70-1, Hsc70-2, and Hsc70-3. Expression of the four Hsps was observed in all examined tissues. Six hours after infection of Nile tilapia with Streptococcus agalactiae, the expression of Hsp70 was significantly increased in the liver, head kidney, spleen and gill, while expression of the Hsc70s was unchanged in all examined tissues except the head kidney, which showed significantly reduced expression of both Hsc70-2 and Hsc70-3. These results suggest that Hsp70 may participate in the defense against S. agalactiae infection. We then isolated the promoter of the Hsp70 gene and inserted it into the donor plasmid of the Tgf2 transposon system containing the green fluorescent protein (GFP) gene. The plasmid was microinjected into zebrafish embryos, where the expression of GFP was induced by heat shock and by S. agalactiae immersion challenge, indicating that the isolated Hsp70 promoter has transcriptional activity and is inducible by both heat shock and bacterial challenge. This promoter may facilitate the future construction of disease-resistant transgenic fish. The work also contributes to the further study of the immune response of tilapia after bacterial infection.