consensus protein sequence: Topics by Science.gov

Sample records for consensus protein sequence

Isolation and characterization of target sequences of the chicken CdxA homeobox gene.

PubMed Central

Margalit, Y; Yarus, S; Shapira, E; Gruenbaum, Y; Fainsod, A

1993-01-01

The DNA binding specificity of the chicken homeodomain protein CDXA was studied. Using a CDXA-glutathione-S-transferase fusion protein, DNA fragments containing the binding site for this protein were isolated. The sources of DNA were oligonucleotides with random sequence and chicken genomic DNA. The DNA fragments isolated were sequenced and tested in DNA binding assays. Sequencing revealed that most DNA fragments are AT rich, which is a common feature of homeodomain binding sites. Electrophoretic mobility shift assays showed that the different target sequences isolated bind to the CDXA protein with different affinities. The specific sequences bound by the CDXA protein in the isolated genomic fragments were determined by DNase I footprinting. From the footprinted sequences, the CDXA consensus binding site was determined. The CDXA protein binds the consensus sequence A, A/T, T, A/T, A, T, A/G. The CAUDAL binding site in the ftz promoter is also included in this consensus sequence. When tested, some of the genomic target sequences were capable of enhancing the transcriptional activity of reporter plasmids when introduced into CDXA expressing cells. This study determined the DNA sequence specificity of the CDXA protein and also shows that this protein can further activate transcription in cells in culture. PMID:7909943
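A degenerate consensus such as the one reported above (A, A/T, T, A/T, A, T, A/G) can be turned into a pattern and scanned against a sequence. The following is a minimal sketch, not the authors' procedure: the function names and the test sequence are hypothetical, only the given strand is scanned, and a match says nothing about binding affinity.

```python
import re

# Consensus positions as stated in the abstract: A, A/T, T, A/T, A, T, A/G.
CDXA_CONSENSUS = ["A", "AT", "T", "AT", "A", "T", "AG"]

def consensus_to_regex(positions):
    """Build a regex with one character class per consensus position.

    The lookahead wrapper lets overlapping matches be reported.
    """
    core = "".join(f"[{allowed}]" for allowed in positions)
    return re.compile(f"(?=({core}))")

def find_sites(dna, positions=CDXA_CONSENSUS):
    """Return (0-based start, matched 7-mer) for each match on the given strand."""
    pattern = consensus_to_regex(positions)
    return [(m.start(), m.group(1)) for m in pattern.finditer(dna.upper())]

# Hypothetical test sequence, chosen only to illustrate the scan.
sites = find_sites("GGCATTTATGCCAATAATACC")
print(sites)
```

Scanning the reverse complement as well would be a natural extension, since a binding site can lie on either strand.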
The hypervariable region 1 protein of hepatitis C virus broadly reactive with sera of patients with chronic hepatitis C has a similar amino acid sequence with the consensus sequence.

PubMed

Watanabe, K; Yoshioka, K; Ito, H; Ishigami, M; Takagi, K; Utsunomiya, S; Kobayashi, M; Kishimoto, H; Yano, M; Kakumu, S

1999-11-10

Hypervariable region 1 (HVR1) proteins of hepatitis C virus (HCV) have been reported to react broadly with sera of patients with HCV infection. However, the variability of the broad reactivity of individual HVR1 proteins has not been elucidated. We assessed the reactivity of 25 different HVR1 proteins (genotype 1b) with sera of 81 patients with HCV infection (genotype 1b) by Western blot. HVR1 proteins reacted with 2-60 sera. The number of sera reactive with each HVR1 protein significantly correlated with the number of amino acid residues identical to the consensus sequence defined by Puntoriero et al. (G. Puntoriero, A. Lahm, S. Zucchelli, B. B. Ercole, R. Tafi, M. Penzzanera, M. U. Mondelli, R. Cortese, A. Tramontano, G. Galfre', and A. Nicosia. 1998. EMBO J. 17, 3521-3533) (r = 0.561, P < 0.005). The most widely reactive HVR1 protein, 12-22, had a sequence similar to the consensus sequence. The peptide with the C-terminal 13-amino-acid sequence of HVR1 protein 12-22 (NH2-CSFTSLFTPGPSQK) was injected into rabbits as an immunogen. The rabbit immune sera reacted with 9 of 25 HVR1 proteins of genotype 1b, including HVR1 protein 12-22, and with 3 of 12 proteins of genotype 2a. These results indicate that the HVR1 protein broadly reactive with patients' sera has a sequence similar to the consensus sequence, can induce broadly reactive sera, and could be one of the candidate immunogens in a prophylactic vaccine against HCV. Copyright 1999 Academic Press.
The Functional Human C-Terminome

PubMed Central

Hedden, Michael; Lyon, Kenneth F.; Brooks, Steven B.; David, Roxanne P.; Limtong, Justin; Newsome, Jacklyn M.; Novakovic, Nemanja; Rajasekaran, Sanguthevar; Thapar, Vishal; Williams, Sean R.; Schiller, Martin R.

2016-01-01

All translated proteins end with a carboxylic acid commonly called the C-terminus. Many short functional sequences (minimotifs) are located on or immediately proximal to the C-terminus. However, information about the function of protein C-termini has not been consolidated into a single source. Here, we built a new “C-terminome” database and web system focused on human proteins. Approximately 3,600 C-termini in the human proteome have a minimotif with an established molecular function. To help evaluate the function of the remaining C-termini in the human proteome, we inferred minimotifs identified by experimentation in rodent cells, predicted minimotifs based upon consensus sequence matches, and predicted novel highly repetitive sequences in C-termini. Predictions can be ranked by enrichment scores or Gene Evolutionary Rate Profiling (GERP) scores, a measurement of evolutionary constraint. By searching for new anchored sequences on the last 10 amino acids of proteins in the human proteome with lengths between 3–10 residues and up to 5 degenerate positions in the consensus sequences, we have identified new consensus sequences that predict instances in the majority of human genes. All of this information is consolidated into a database that can be accessed through a C-terminome web system with search and browse functions for minimotifs and human proteins. A known consensus sequence-based predicted function is assigned to nearly half the proteins in the human proteome. Weblink: http://cterminome.bio-toolkit.com. PMID:27050421
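The C-terminal search described above can be sketched as a check of a short pattern against the tail of each protein. This is only an illustration under stated assumptions: "anchored" is taken here to mean that the pattern ends at the final residue, the pattern and the two sequences are hypothetical, and the enrichment and GERP ranking steps are not modeled.

```python
def is_valid_pattern(pattern, min_len=3, max_len=10, max_degenerate=5):
    """Check the limits quoted in the abstract: 3-10 residues, at most 5 degenerate positions.

    A pattern is a list with one entry per position: a set of allowed
    residues, or None for a degenerate (any residue) position.
    """
    degenerate = sum(1 for allowed in pattern if allowed is None)
    return min_len <= len(pattern) <= max_len and degenerate <= max_degenerate

def matches_c_terminus(protein, pattern, window=10):
    """True if the pattern matches the protein tail, ending at the last residue."""
    if not is_valid_pattern(pattern) or len(pattern) > min(window, len(protein)):
        return False
    tail = protein[-len(pattern):]
    return all(allowed is None or aa in allowed for aa, allowed in zip(tail, pattern))

# Hypothetical pattern and sequences, for illustration only.
HYPOTHETICAL_PATTERN = [{"S", "T"}, None, {"V", "L", "I"}]
proteome = {"protA": "MKTAYIAKQRETSV", "protB": "MKTAYIAKQRETSVA"}
hits = [name for name, seq in proteome.items()
        if matches_c_terminus(seq, HYPOTHETICAL_PATTERN)]
print(hits)
```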
Simplifying complex sequence information: a PCP-consensus protein binds antibodies against all four Dengue serotypes.

PubMed

Bowen, David M; Lewis, Jessica A; Lu, Wenzhe; Schein, Catherine H

2012-09-14

Designing proteins that reflect the natural variability of a pathogen is essential for developing novel vaccines and drugs. Flaviviruses, including Dengue (DENV) and West Nile (WNV), evolve rapidly and can "escape" neutralizing monoclonal antibodies by mutation. Designing antigens that represent many distinct strains is important for DENV, where infection with a strain from one of the four serotypes may lead to severe hemorrhagic disease on subsequent infection with a strain from another serotype. Here, a DENV physicochemical property (PCP)-consensus sequence was derived from 671 unique sequences from the Flavitrack database. PCP-consensus proteins for domain 3 of the envelope protein (EdomIII) were expressed from synthetic genes in Escherichia coli. The ability of the purified consensus proteins to bind polyclonal antibodies generated in response to infection with strains from each of the four DENV serotypes was determined. The initial consensus protein bound antibodies from DENV-1-3 in ELISA and Western blot assays. This sequence was altered in 3 steps to incorporate regions of maximum variability, identified as significant changes in the PCPs, characteristic of DENV-4 strains. The final protein was recognized by antibodies against all four serotypes. Two amino acids essential for efficient binding to all DENV antibodies are part of a discontinuous epitope previously defined for a neutralizing monoclonal antibody. The PCP-consensus method can significantly reduce the number of experiments required to define a multivalent antigen, which is particularly important when dealing with pathogens that must be tested at higher biosafety levels. Copyright © 2012 Elsevier Ltd. All rights reserved.
Defining a Conformational Consensus Motif in Cotransin-Sensitive Signal Sequences: A Proteomic and Site-Directed Mutagenesis Study

PubMed Central

Klein, Wolfgang; Westendorf, Carolin; Schmidt, Antje; Conill-Cortés, Mercè; Rutz, Claudia; Blohs, Marcus; Beyermann, Michael; Protze, Jonas; Krause, Gerd; Krause, Eberhard; Schülein, Ralf

2015-01-01

The cyclodepsipeptide cotransin was described to inhibit the biosynthesis of a small subset of proteins by a signal sequence-discriminatory mechanism at the Sec61 protein-conducting channel. However, it was not clear how selective cotransin is, i.e. how many proteins are sensitive. Moreover, a consensus motif in signal sequences mediating cotransin sensitivity has not yet been described. To address these questions, we performed a proteomic study using cotransin-treated human hepatocellular carcinoma cells and the stable isotope labelling by amino acids in cell culture technique in combination with quantitative mass spectrometry. We used a saturating concentration of cotransin (30 micromolar) to also identify less-sensitive proteins and to discriminate the latter from completely resistant proteins. We found that the biosynthesis of almost all secreted proteins was cotransin-sensitive under these conditions. In contrast, biosynthesis of the majority of the integral membrane proteins was cotransin-resistant. Cotransin sensitivity of signal sequences was neither related to their length nor to their hydrophobicity. Instead, in the case of signal anchor sequences, we identified for the first time a conformational consensus motif mediating cotransin sensitivity. PMID:25806945
Efficient and Accurate Algorithm for Cleaved Fragments Prediction (CFPA) in Protein Sequences Dataset Based on Consensus and Its Variants: A Novel Degradomics Prediction Application.

PubMed

El-Assaad, Atlal; Dawy, Zaher; Nemer, Georges; Hajj, Hazem; Kobeissy, Firas H

2017-01-01

Degradomics is a novel discipline that involves determination of the proteases/substrate fragmentation profile, called the substrate degradome, and has been recently applied in different disciplines. A major application of degradomics is its utility in the field of biomarkers where the breakdown products (BDPs) of different proteases have been investigated. Among the major proteases assessed, calpain and caspase proteases have been associated with the execution phases of the pro-apoptotic and pro-necrotic cell death, generating caspase/calpain-specific cleaved fragments. The distinction between calpain and caspase protein fragments has been applied to distinguish injury mechanisms. Advanced proteomics technology has been used to identify these BDPs experimentally. However, it has been a challenge to identify these BDPs with high precision and efficiency, especially if we are targeting a number of proteins at one time. In this chapter, we present a novel bioinformatic detection method that identifies BDPs accurately and efficiently with validation against experimental data. This method aims at predicting the consensus sequence occurrences and their variants in a large set of experimentally detected protein sequences based on state-of-the-art sequence matching and alignment algorithms. After detection, the method generates all the potential cleaved fragments by a specific protease. This space and time-efficient algorithm is flexible to handle the different orientations that the consensus sequence and the protein sequence can take before cleaving. It is O(mn) in space complexity and O(Nmn) in time complexity, with N number of protein sequences, m length of the consensus sequence, and n length of each protein sequence. Ultimately, this knowledge will subsequently feed into the development of a novel tool for researchers to detect diverse types of selected BDPs as putative disease markers, contributing to the diagnosis and treatment of related disorders.
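The general idea of locating consensus occurrences (and near variants) and then cutting the protein can be illustrated with a naive sliding-window scan. This is a sketch under stated assumptions and not the CFPA algorithm itself: variants are modeled simply as a bounded number of mismatches, the cut is placed after the last residue of the motif, and the motif DEVD and the test protein are illustrative choices. The scan compares each of the n - m + 1 windows against m consensus positions, so it takes O(mn) time per protein and O(Nmn) over N proteins.

```python
def find_cut_sites(protein, consensus, max_mismatches=0):
    """Return cut positions for windows within max_mismatches of the consensus.

    A cut position c means the protein is split between residues c-1 and c
    (assumption: cleavage directly after the motif).
    """
    m = len(consensus)
    cuts = []
    for start in range(len(protein) - m + 1):
        window = protein[start:start + m]
        mismatches = sum(1 for a, b in zip(window, consensus) if a != b)
        if mismatches <= max_mismatches:
            cuts.append(start + m)
    return cuts

def cleave(protein, cuts):
    """Split the protein at every internal cut position."""
    fragments, prev = [], 0
    for cut in sorted(set(cuts)):
        if prev < cut < len(protein):
            fragments.append(protein[prev:cut])
            prev = cut
    fragments.append(protein[prev:])
    return fragments

# Hypothetical substrate containing two exact copies of the motif.
protein = "MAKDEVDGSTLLDEVDRK"
cuts = find_cut_sites(protein, "DEVD")
fragments = cleave(protein, cuts)
print(cuts, fragments)
```

Raising `max_mismatches` widens the search to variants of the consensus at the cost of more candidate sites to validate experimentally.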
Comparative analysis of seven viral nuclear export signals (NESs) reveals the crucial role of nuclear export mediated by the third NES consensus sequence of nucleoprotein (NP) in influenza A virus replication.

PubMed

Chutiwitoonchai, Nopporn; Kakisaka, Michinori; Yamada, Kazunori; Aida, Yoko

2014-01-01

The assembly of influenza virus progeny virions requires machinery that exports viral genomic ribonucleoproteins from the cell nucleus. Currently, seven nuclear export signal (NES) consensus sequences have been identified in different viral proteins, including NS1, NS2, M1, and NP. The present study examined the roles of viral NES consensus sequences and their significance in terms of viral replication and nuclear export. Mutation of the NP-NES3 consensus sequence resulted in a failure to rescue viruses using a reverse genetics approach, whereas mutation of the NS2-NES1 and NS2-NES2 sequences led to a strong reduction in viral replication kinetics compared with the wild-type sequence. While the viral replication kinetics for other NES mutant viruses were also lower than those of the wild-type, the difference was not so marked. Immunofluorescence analysis after transient expression of NP-NES3, NS2-NES1, or NS2-NES2 proteins in host cells showed that they accumulated in the cell nucleus. These results suggest that the NP-NES3 consensus sequence is mostly required for viral replication. Therefore, each of the hydrophobic (Φ) residues within this NES consensus sequence (Φ1, Φ2, Φ3, or Φ4) was mutated, and its viral replication and nuclear export function were analyzed. No viruses harboring NP-NES3 Φ2 or Φ3 mutants could be rescued. Consistent with this, the NP-NES3 Φ2 and Φ3 mutants showed reduced binding affinity with CRM1 in a pull-down assay, and both accumulated in the cell nucleus. Indeed, a nuclear export assay revealed that these mutant proteins showed lower nuclear export activity than the wild-type protein. Moreover, the Φ2 and Φ3 residues (along with other Φ residues) within the NP-NES3 consensus were highly conserved among different influenza A viruses, including human, avian, and swine. Taken together, these results suggest that the Φ2 and Φ3 residues within the NP-NES3 protein are important for its nuclear export function during viral replication.
Rice MEL2, the RNA recognition motif (RRM) protein, binds in vitro to meiosis-expressed genes containing U-rich RNA consensus sequences in the 3'-UTR.

PubMed

Miyazaki, Saori; Sato, Yutaka; Asano, Tomoya; Nagamura, Yoshiaki; Nonomura, Ken-Ichi

2015-10-01

Post-transcriptional gene regulation by RNA recognition motif (RRM) proteins through binding to cis-elements in the 3'-untranslated region (3'-UTR) is widely used in eukaryotes to complete various biological processes. Rice MEIOSIS ARRESTED AT LEPTOTENE2 (MEL2) is the RRM protein that functions in the transition to meiosis in proper timing. The MEL2 RRM preferentially associated with the U-rich RNA consensus, UUAGUU[U/A][U/G][A/U/G]U, dependently on sequences and proportionally to MEL2 protein amounts in vitro. The consensus sequences were located in the putative looped structures of the RNA ligand. A genome-wide survey revealed a tendency of MEL2-binding consensus appearing in 3'-UTR of rice genes. Of 249 genes that conserved the consensus in their 3'-UTR, 13 genes spatiotemporally co-expressed with MEL2 in meiotic flowers, and included several genes whose function was supposed in meiosis; such as Replication protein A and OsMADS3. The proteome analysis revealed that the amounts of small ubiquitin-related modifier-like protein and eukaryotic translation initiation factor3-like protein were dramatically altered in mel2 mutant anthers. Taken together with transcriptome and gene ontology results, we propose that the rice MEL2 is involved in the translational regulation of key meiotic genes on 3'-UTRs to achieve the faithful transition of germ cells to meiosis.
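A genome-wide survey of the kind mentioned above needs the bracketed consensus in a machine-readable form. As a minimal sketch (the helper names and the test UTRs are hypothetical, and this is not the authors' survey pipeline), removing the slashes from UUAGUU[U/A][U/G][A/U/G]U yields a valid regular expression that can be run over 3'-UTR sequences.

```python
import re

# Consensus exactly as written in the abstract.
MEL2_CONSENSUS = "UUAGUU[U/A][U/G][A/U/G]U"

def bracket_consensus_to_regex(consensus):
    """Turn [U/A]-style alternatives into regex character classes such as [UA]."""
    return re.compile(consensus.replace("/", ""))

def utr_hits(utr, consensus=MEL2_CONSENSUS):
    """Return (0-based start, match) for non-overlapping hits in a 3'-UTR.

    DNA-alphabet input is accepted and converted to RNA (T -> U).
    """
    rna = utr.upper().replace("T", "U")
    pattern = bracket_consensus_to_regex(consensus)
    return [(m.start(), m.group()) for m in pattern.finditer(rna)]

# Hypothetical UTR fragments, for illustration only.
print(utr_hits("CCUUAGUUUGAUCC"))
print(utr_hits("ccttagttaggtcc"))
```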
Identification and application of self-binding zipper-like sequences in SARS-CoV spike protein.

PubMed

Zhang, Si Min; Liao, Ying; Neo, Tuan Ling; Lu, Yanning; Liu, Ding Xiang; Vahlne, Anders; Tam, James P

2018-05-22

Self-binding peptides containing zipper-like sequences, such as the Leu/Ile zipper sequence within the coiled coil regions of proteins and the cross-β spine steric zippers within the amyloid-like fibrils, could bind to the protein-of-origin through homophilic sequence-specific zipper motifs. These self-binding sequences represent opportunities for the development of biochemical tools and/or therapeutics. Here, we report on the identification of a putative self-binding β-zipper-forming peptide within the severe acute respiratory syndrome-associated coronavirus spike (S) protein and its application in viral detection. Peptide array scanning of overlapping peptides covering the entire length of S protein identified 34 putative self-binding peptides of six clusters, five of which contained octapeptide core consensus sequences. The Cluster I consensus octapeptide sequence GINITNFR was predicted by the Eisenberg's 3D profile method to have high amyloid-like fibrillation potential through steric β-zipper formation. Peptide C6 containing the Cluster I consensus sequence was shown to oligomerize and form amyloid-like fibrils. Taking advantage of this, C6 was further applied to detect the S protein expression in vitro by fluorescence staining. Meanwhile, the coiled-coil-forming Leu/Ile heptad repeat sequences within the S protein were under-represented during peptide array scanning, in agreement with that long peptide lengths were required to attain high helix-mediated interaction avidity. The data suggest that short β-zipper-like self-binding peptides within the S protein could be identified through combining the peptide scanning and predictive methods, and could be exploited as biochemical detection reagents for viral infection. Copyright © 2018. Published by Elsevier Ltd.
Improvement in Protein Domain Identification Is Reached by Breaking Consensus, with the Agreement of Many Profiles and Domain Co-occurrence

PubMed Central

Bernardes, Juliana; Zaverucha, Gerson; Vaquero, Catherine; Carbone, Alessandra

2016-01-01

Traditional protein annotation methods describe known domains with probabilistic models representing consensus among homologous domain sequences. However, when relevant signals become too weak to be identified by a global consensus, attempts for annotation fail. Here we address the fundamental question of domain identification for highly divergent proteins. By using high performance computing, we demonstrate that the limits of state-of-the-art annotation methods can be bypassed. We design a new strategy based on the observation that many structural and functional protein constraints are not globally conserved through all species but might be locally conserved in separate clades. We propose a novel exploitation of the large amount of data available: 1. for each known protein domain, several probabilistic clade-centered models are constructed from a large and differentiated panel of homologous sequences, 2. a decision-making protocol combines outcomes obtained from multiple models, 3. a multi-criteria optimization algorithm finds the most likely protein architecture. The method is evaluated for domain and architecture prediction over several datasets and statistical testing hypotheses. Its performance is compared against HMMScan and HHblits, two widely used search methods based on sequence-profile and profile-profile comparison. Due to their closeness to actual protein sequences, clade-centered models are shown to be more specific and functionally predictive than the broadly used consensus models. Based on them, we improved annotation of Plasmodium falciparum protein sequences on a scale not previously possible. We successfully predict at least one domain for 72% of P. falciparum proteins against 63% achieved previously, corresponding to 30% of improvement over the total number of Pfam domain predictions on the whole genome. The method is applicable to any genome and opens new avenues to tackle evolutionary questions such as the reconstruction of ancient domain duplications, the reconstruction of the history of protein architectures, and the estimation of protein domain age. Website and software: http://www.lcqb.upmc.fr/CLADE. PMID:27472895
Embedding strategies for effective use of information from multiple sequence alignments.

PubMed Central

Henikoff, S.; Henikoff, J. G.

1997-01-01

We describe a new strategy for utilizing multiple sequence alignment information to detect distant relationships in searches of sequence databases. A single sequence representing a protein family is enriched by replacing conserved regions with position-specific scoring matrices (PSSMs) or consensus residues derived from multiple alignments of family members. In comprehensive tests of these and other family representations, PSSM-embedded queries produced the best results overall when used with a special version of the Smith-Waterman searching algorithm. Moreover, embedding consensus residues instead of PSSMs improved performance with readily available single sequence query searching programs, such as BLAST and FASTA. Embedding PSSMs or consensus residues into a representative sequence improves searching performance by extracting multiple alignment information from motif regions while retaining single sequence information where alignment is uncertain. PMID:9070452
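The idea of embedding consensus residues into a representative sequence can be illustrated with a simplified column-wise rule. This is a sketch only, not the Henikoffs' block-based procedure: here a column counts as conserved when one residue reaches a hypothetical fraction threshold, the toy alignment is invented, and PSSM embedding is not shown.

```python
from collections import Counter

def embed_consensus(alignment, representative_index=0, min_fraction=0.75):
    """Replace the representative's residue with the column consensus where conserved.

    alignment: equal-length aligned sequences, with '-' for gaps.
    Columns below the threshold keep the representative's own residue,
    so single-sequence information survives where the alignment is uncertain.
    """
    rep = alignment[representative_index]
    out = []
    for col, rep_aa in enumerate(rep):
        if rep_aa == "-":
            continue  # column absent from the representative sequence
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        aa, count = Counter(residues).most_common(1)[0]
        out.append(aa if count / len(alignment) >= min_fraction else rep_aa)
    return "".join(out)

# Toy alignment, invented for illustration.
alignment = ["MKVLAG", "MRVLSG", "MKILAG", "MKVLTG"]
query = embed_consensus(alignment, representative_index=1)
print(query)
```

The resulting string is an ordinary sequence, which is why such a query can be passed to single-sequence search programs without modification.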
Prediction of Protein Structural Classes for Low-Similarity Sequences Based on Consensus Sequence and Segmented PSSM.

PubMed

Liang, Yunyun; Liu, Sanyang; Zhang, Shengli

2015-01-01

Prediction of protein structural classes for low-similarity sequences is useful for understanding fold patterns, regulation, functions, and interactions of proteins. It is well known that feature extraction is significant to prediction of protein structural class and it mainly uses protein primary sequence, predicted secondary structure sequence, and position-specific scoring matrix (PSSM). Currently, prediction solely based on the PSSM has played a key role in improving the prediction accuracy. In this paper, we propose a novel method called CSP-SegPseP-SegACP by fusing consensus sequence (CS), segmented PsePSSM, and segmented autocovariance transformation (ACT) based on PSSM. Three widely used low-similarity datasets (1189, 25PDB, and 640) are adopted in this paper. Then a 700-dimensional (700D) feature vector is constructed and the dimension is decreased to 224D by using principal component analysis (PCA). To verify the performance of our method, rigorous jackknife cross-validation tests are performed on the 1189, 25PDB, and 640 datasets. Comparison of our results with the existing PSSM-based methods demonstrates that our method achieves favorable and competitive performance. This will offer an important complement to other PSSM-based methods for prediction of protein structural classes for low-similarity sequences.
Modeling repetitive, non‐globular proteins

PubMed Central

Basu, Koli; Campbell, Robert L.; Guo, Shuaiqi; Sun, Tianjun

2016-01-01

While ab initio modeling of protein structures is not routine, certain types of proteins are more straightforward to model than others. Proteins with short repetitive sequences typically exhibit repetitive structures. These repetitive sequences can be more amenable to modeling if some information is known about the predominant secondary structure or other key features of the protein sequence. We have successfully built models of a number of repetitive structures with novel folds using knowledge of the consensus sequence within the sequence repeat and an understanding of the likely secondary structures that these may adopt. Our methods for achieving this success are reviewed here. PMID:26914323
Nucleotide sequence of the gene encoding the nitrogenase iron protein of Thiobacillus ferrooxidans

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pretorius, I.M.; Rawlings, D.E.; O'Neill, E.G.

1987-01-01

The DNA sequence was determined for the cloned Thiobacillus ferrooxidans nifH and part of the nifD genes. The DNA chains were radiolabeled with (α-³²P)dCTP (3000 Ci/mmol) or (α-³⁵S)dCTP (400 Ci/mmol). A putative T. ferrooxidans nifH promoter was identified whose sequences showed perfect consensus with those of the Klebsiella pneumoniae nif promoter. Two putative consensus upstream activator sequences were also identified. The amino acid sequence was deduced from the DNA sequence. In a comparison of nifH DNA sequences from T. ferrooxidans and eight other nitrogen-fixing microbes, a Rhizobium sp. isolated from Parasponia andersonii showed the greatest homology (74%) and Clostridium pasteurianum (nifH1) showed the least homology (54%). In the comparison of the amino acid sequences of the Fe proteins, the Rhizobium sp. and Rhizobium japonicum showed the greatest homology (both 86%) and C. pasteurianum (nifH1 gene product) demonstrated the least homology (56%) to the T. ferrooxidans Fe protein.
Identification and Structural Characterization of the ALIX-Binding Late Domains of Simian Immunodeficiency Virus SIV mac239 and SIV agmTan-1

DOE Office of Scientific and Technical Information (OSTI.GOV)

Q Zhai; M Landesman; H Robinson

2011-12-31

Retroviral Gag proteins contain short late-domain motifs that recruit cellular ESCRT pathway proteins to facilitate virus budding. ALIX-binding late domains often contain the core consensus sequence YPXnL (where Xn can vary in sequence and length). However, some simian immunodeficiency virus (SIV) Gag proteins lack this consensus sequence, yet still bind ALIX. We mapped divergent, ALIX-binding late domains within the p6Gag proteins of SIVmac239 (residues 40-59, SREK[P]YKE[VT]ED[L]LHLNSLF) and SIVagmTan-1 (residues 24-41, AAG[A]YDP[AR]KL[L]EQYAKK). Crystal structures revealed that anchoring tyrosines and nearby hydrophobic residues (bracketed) contact the ALIX V domain, revealing how lentiviruses employ a diverse family of late-domain sequences to bind ALIX and promote virus budding.
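The statement that these SIV late domains lack the YPXnL consensus can be checked directly with a pattern match. The sketch below assumes a range of 1 to 4 residues for Xn, which the abstract does not specify, and uses a hypothetical positive-control peptide; the two SIV segments are the sequences quoted in the abstract with the emphasis markup removed.

```python
import re

def ypxnl_pattern(min_n=1, max_n=4):
    """Regex for Y-P-(any n residues)-L; the bounds on n are an assumption."""
    return re.compile(rf"YP[A-Z]{{{min_n},{max_n}}}L")

def find_ypxnl(peptide, min_n=1, max_n=4):
    """Return the first YPXnL match in the peptide, or None if there is none."""
    match = ypxnl_pattern(min_n, max_n).search(peptide.upper())
    return match.group() if match else None

siv_mac239 = "SREKPYKEVTEDLLHLNSLF"      # residues 40-59, from the abstract
siv_agm_tan1 = "AAGAYDPARKLLEQYAKK"      # residues 24-41, from the abstract
hypothetical_control = "AAYPDGSLKK"      # invented peptide containing YPX3L

for name, seq in [("SIVmac239", siv_mac239),
                  ("SIVagmTan-1", siv_agm_tan1),
                  ("control", hypothetical_control)]:
    print(name, find_ypxnl(seq))
```

Neither SIV segment contains a tyrosine followed directly by a proline, so the canonical pattern cannot match, which is consistent with the abstract's description of these late domains as divergent.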
Identification and Structural Characterization of the ALIX-Binding Late Domains of Simian Immunodeficiency Virus SIVmac239 and SIVagmTan-1

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhai, Q.; Robinson, H.; Landesman, M. B.

2011-01-01

Retroviral Gag proteins contain short late-domain motifs that recruit cellular ESCRT pathway proteins to facilitate virus budding. ALIX-binding late domains often contain the core consensus sequence YPXnL (where Xn can vary in sequence and length). However, some simian immunodeficiency virus (SIV) Gag proteins lack this consensus sequence, yet still bind ALIX. We mapped divergent, ALIX-binding late domains within the p6Gag proteins of SIVmac239 (residues 40-59, SREK[P]YKE[VT]ED[L]LHLNSLF) and SIVagmTan-1 (residues 24-41, AAG[A]YDP[AR]KL[L]EQYAKK). Crystal structures revealed that anchoring tyrosines and nearby hydrophobic residues (bracketed) contact the ALIX V domain, revealing how lentiviruses employ a diverse family of late-domain sequences to bind ALIX and promote virus budding.
GeneSilico protein structure prediction meta-server.

PubMed

Kurowski, Michal A; Bujnicki, Janusz M

2003-07-01

Rigorous assessments of protein structure prediction have demonstrated that fold recognition methods can identify remote similarities between proteins when standard sequence search methods fail. It has been shown that the accuracy of predictions is improved when refined multiple sequence alignments are used instead of single sequences and if different methods are combined to generate a consensus model. There are several meta-servers available that integrate protein structure predictions performed by various methods, but they do not allow for submission of user-defined multiple sequence alignments and they seldom offer confidentiality of the results. We developed a novel WWW gateway for protein structure prediction, which combines the useful features of other meta-servers available, but with much greater flexibility of the input. The user may submit an amino acid sequence or a multiple sequence alignment to a set of methods for primary, secondary and tertiary structure prediction. Fold-recognition results (target-template alignments) are converted into full-atom 3D models and the quality of these models is uniformly assessed. A consensus between different FR methods is also inferred. The results are conveniently presented on-line on a single web page over a secure, password-protected connection. The GeneSilico protein structure prediction meta-server is freely available for academic users at http://genesilico.pl/meta.
GeneSilico protein structure prediction meta-server

PubMed Central

Kurowski, Michal A.; Bujnicki, Janusz M.

2003-01-01

Rigorous assessments of protein structure prediction have demonstrated that fold recognition methods can identify remote similarities between proteins when standard sequence search methods fail. It has been shown that the accuracy of predictions is improved when refined multiple sequence alignments are used instead of single sequences and if different methods are combined to generate a consensus model. There are several meta-servers available that integrate protein structure predictions performed by various methods, but they do not allow for submission of user-defined multiple sequence alignments and they seldom offer confidentiality of the results. We developed a novel WWW gateway for protein structure prediction, which combines the useful features of other meta-servers available, but with much greater flexibility of the input. The user may submit an amino acid sequence or a multiple sequence alignment to a set of methods for primary, secondary and tertiary structure prediction. Fold-recognition results (target-template alignments) are converted into full-atom 3D models and the quality of these models is uniformly assessed. A consensus between different FR methods is also inferred. The results are conveniently presented on-line on a single web page over a secure, password-protected connection. The GeneSilico protein structure prediction meta-server is freely available for academic users at http://genesilico.pl/meta. PMID:12824313
A core phylogeny of Dictyostelia inferred from genomes representative of the eight major and minor taxonomic divisions of the group.

PubMed

Singh, Reema; Schilde, Christina; Schaap, Pauline

2016-11-17

Dictyostelia are a well-studied group of organisms with colonial multicellularity, which are members of the mostly unicellular Amoebozoa. A phylogeny based on SSU rDNA data subdivided all Dictyostelia into four major groups, but left the position of the root and of six group-intermediate taxa unresolved. Recent phylogenies inferred from 30 or 213 proteins from sequenced genomes positioned the root between two branches, each containing two major groups, but lacked data to position the group-intermediate taxa. Since the positions of these early diverging taxa are crucial for understanding the evolution of phenotypic complexity in Dictyostelia, we sequenced six representative genomes of early diverging taxa. We retrieved orthologs of 47 housekeeping proteins with an average size of 890 amino acids from six newly sequenced and eight published genomes of Dictyostelia and unicellular Amoebozoa and inferred phylogenies from single and concatenated protein sequence alignments. Concatenated alignments of all 47 proteins, and four out of five subsets of nine concatenated proteins all produced the same consensus phylogeny with 100% statistical support. Trees inferred from just two out of the 47 proteins individually reproduced the consensus phylogeny, highlighting that single gene phylogenies will rarely reflect correct species relationships. However, sets of two or three concatenated proteins again reproduced the consensus phylogeny, indicating that a small selection of genes suffices for low cost classification of as yet unincorporated or newly discovered dictyostelid and amoebozoan taxa by gene amplification. The multi-locus consensus phylogeny shows that groups 1 and 2 are sister clades in branch I, with the group-intermediate taxon D. polycarpum positioned as outgroup to group 2. Branch II consists of groups 3 and 4, with the group-intermediate taxon Polysphondylium violaceum positioned as sister to group 4, and the group-intermediate taxon Dictyostelium polycephalum branching at the base of that whole clade. Given the data, the approximately unbiased test rejects all alternative topologies favoured by SSU rDNA and individual proteins with high statistical support. The test also rejects monophyletic origins for the genera Acytostelium, Polysphondylium and Dictyostelium. The current position of Acytostelium ellipticum in the consensus phylogeny indicates that somatic cells were lost twice in Dictyostelia.
Predictors of natively unfolded proteins: unanimous consensus score to detect a twilight zone between order and disorder in generic datasets.

PubMed

Deiana, Antonio; Giansanti, Andrea

2010-04-21

Natively unfolded proteins lack a well defined three dimensional structure but have important biological functions, suggesting a re-assignment of the structure-function paradigm. To assess that a given protein is natively unfolded requires laborious experimental investigations, then reliable sequence-only methods for predicting whether a sequence corresponds to a folded or to an unfolded protein are of interest in fundamental and applicative studies. Many proteins have amino acidic compositions compatible both with the folded and unfolded status, and belong to a twilight zone between order and disorder. This makes difficult a dichotomic classification of protein sequences into folded and natively unfolded ones. In this work we propose an operational method to identify proteins belonging to the twilight zone by combining into a consensus score good performing single predictors of folding. In this methodological paper dichotomic folding indexes are considered: hydrophobicity-charge, mean packing, mean pairwise energy, Poodle-W and a new global index, that is called here gVSL2, based on the local disorder predictor VSL2. The performance of these indexes is evaluated on different datasets, in particular on a new dataset composed by 2369 folded and 81 natively unfolded proteins. Poodle-W, gVSL2 and mean pairwise energy have good performance and stability in all the datasets considered and are combined into a strictly unanimous combination score SSU, that leaves proteins unclassified when the consensus of all combined indexes is not reached. The unclassified proteins: i) belong to an overlap region in the vector space of amino acidic compositions occupied by both folded and unfolded proteins; ii) are composed by approximately the same number of order-promoting and disorder-promoting amino acids; iii) have a mean flexibility intermediate between that of folded and that of unfolded proteins. Our results show that proteins unclassified by SSU belong to a twilight zone. 
Proteins left unclassified by the consensus score SSU have physical properties intermediate between those of folded and those of natively unfolded proteins, and their structural properties and evolutionary history are worth investigating.
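
As a rough illustration of the strictly unanimous combination described in this abstract, the Python sketch below returns a class only when every predictor agrees and otherwise leaves the protein unclassified. The predictor keys and the example calls are hypothetical placeholders; the actual indexes (Poodle-W, gVSL2, mean pairwise energy) are not reimplemented here.

```python
from typing import Mapping, Optional


def unanimous_consensus(calls: Mapping[str, bool]) -> Optional[bool]:
    """Combine binary folding calls (True = folded, False = unfolded).

    Returns the shared call when all predictors agree, and None
    (unclassified, i.e. a twilight-zone candidate) otherwise.
    """
    values = set(calls.values())
    if len(values) == 1:
        return values.pop()
    return None


# Hypothetical calls for one protein; the keys are illustrative names only.
example = {"poodle_w": True, "gvsl2": True, "mean_pairwise_energy": False}
print(unanimous_consensus(example))
```

Returning None rather than a majority vote mirrors the abstract's point that disagreement itself is informative.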

Predictors of natively unfolded proteins: unanimous consensus score to detect a twilight zone between order and disorder in generic datasets

PubMed Central

2010-01-01

Background Natively unfolded proteins lack a well defined three dimensional structure but have important biological functions, suggesting a re-assignment of the structure-function paradigm. To assess that a given protein is natively unfolded requires laborious experimental investigations, then reliable sequence-only methods for predicting whether a sequence corresponds to a folded or to an unfolded protein are of interest in fundamental and applicative studies. Many proteins have amino acidic compositions compatible both with the folded and unfolded status, and belong to a twilight zone between order and disorder. This makes difficult a dichotomic classification of protein sequences into folded and natively unfolded ones. In this work we propose an operational method to identify proteins belonging to the twilight zone by combining into a consensus score good performing single predictors of folding. Results In this methodological paper dichotomic folding indexes are considered: hydrophobicity-charge, mean packing, mean pairwise energy, Poodle-W and a new global index, that is called here gVSL2, based on the local disorder predictor VSL2. The performance of these indexes is evaluated on different datasets, in particular on a new dataset composed by 2369 folded and 81 natively unfolded proteins. Poodle-W, gVSL2 and mean pairwise energy have good performance and stability in all the datasets considered and are combined into a strictly unanimous combination score SSU, that leaves proteins unclassified when the consensus of all combined indexes is not reached. The unclassified proteins: i) belong to an overlap region in the vector space of amino acidic compositions occupied by both folded and unfolded proteins; ii) are composed by approximately the same number of order-promoting and disorder-promoting amino acids; iii) have a mean flexibility intermediate between that of folded and that of unfolded proteins. 
Conclusions Our results show that proteins unclassified by SSU belong to a twilight zone. Proteins left unclassified by the consensus score SSU have physical properties intermediate between those of folded and those of natively unfolded proteins, and their structural properties and evolutionary history are worth investigating. PMID:20409339
A possible structural model of members of the CPF family of cuticular proteins implicating binding to components other than chitin

PubMed Central

Papandreou, Nikos C.; Iconomidou, Vassiliki A.; Willis, Judith H.; Hamodrakas, Stavros J.

2010-01-01

The physical properties of cuticle are determined by the structure of its two major components, cuticular proteins (CPs) and chitin, and, also, by their interactions. A common consensus region (extended R&R Consensus) found in the majority of cuticular proteins, the CPRs, binds to chitin. Previous work established that β-pleated sheet predominates in the Consensus region and we proposed that it is responsible for the formation of helicoidal cuticle. Remote sequence similarity between CPRs and a lipocalin, bovine plasma retinol binding protein (RBP), led us to suggest an antiparallel β-sheet half-barrel structure as the basic folding motif of the R&R Consensus. There are several other families of cuticular proteins. One of the best defined is CPF. Its four members in Anopheles gambiae are expressed during the early stages of either pharate pupal or pharate adult development, suggesting that the proteins contribute to the outer regions of the cuticle, the epi- and/or exocuticle. These proteins did not bind to chitin in the same assay used successfully for CPRs. Although CPFs are distinct in sequence from CPRs, the same lipocalin could also be used to derive homology models for one Anopheles gambiae and one Drosophila melanogaster CPF. For the CPFs, the basic folding motif predicted is an eight-stranded, antiparallel β-sheet, full-barrel structure. Possible implications of this structure are discussed and docking experiments were carried out with one possible Drosophila ligand, 7(Z), 11(Z)-heptacosadiene. PMID:20417215
Diversity and Evolution of Bacterial Twin Arginine Translocase Protein, TatC, Reveals a Protein Secretion System That Is Evolving to Fit Its Environmental Niche

PubMed Central

Simone, Domenico; Bay, Denice C.; Leach, Thorin; Turner, Raymond J.

2013-01-01

Background The twin-arginine translocation (Tat) protein export system enables the transport of fully folded proteins across a membrane. This system is composed of two integral membrane proteins belonging to TatA and TatC protein families and in some systems a third component, TatB, a homolog of TatA. TatC participates in substrate protein recognition through its interaction with a twin arginine leader peptide sequence. Methodology/Principal Findings The aim of this study was to explore TatC diversity, evolution and sequence conservation in bacteria to identify how TatC is evolving and diversifying in various bacterial phyla. Surveying bacterial genomes revealed that 77% of all species possess one or more tatC loci and half of these classes possessed only tatC and tatA genes. Phylogenetic analysis of diverse TatC homologues showed that they were primarily inherited but identified a small subset of taxonomically unrelated bacteria that exhibited evidence supporting lateral gene transfer within an ecological niche. Examination of bacilli tatCd/tatCy isoform operons identified a number of known and potentially new Tat substrate genes based on their frequent association to tatC loci. Evolutionary analysis of these Bacilli isoforms determined that TatCy was the progenitor of TatCd. A bacterial TatC consensus sequence was determined and highlighted conserved and variable regions within a three dimensional model of the Escherichia coli TatC protein. Comparative analysis between the TatC consensus sequence and Bacilli TatCd/y isoform consensus sequences revealed unique sites that may contribute to isoform substrate specificity or make TatA specific contacts. Synonymous to non-synonymous nucleotide substitution analyses of bacterial tatC homologues determined that tatC sequence variation differs dramatically between various classes and suggests TatC specialization in these species. 
Conclusions/Significance TatC proteins appear to be diversifying within particular bacterial classes and its specialization may be driven by the substrates it transports and the environment of its host. PMID:24236045
Prototype foamy virus envelope glycoprotein leader peptide processing is mediated by a furin-like cellular protease, but cleavage is not essential for viral infectivity.

PubMed

Duda, Anja; Stange, Annett; Lüftenegger, Daniel; Stanke, Nicole; Westphal, Dana; Pietschmann, Thomas; Eastman, Scott W; Linial, Maxine L; Rethwilm, Axel; Lindemann, Dirk

2004-12-01

Analogous to cellular glycoproteins, viral envelope proteins contain N-terminal signal sequences responsible for targeting them to the secretory pathway. The prototype foamy virus (PFV) envelope (Env) shows a highly unusual biosynthesis. Its precursor protein has a type III membrane topology with both the N and C terminus located in the cytoplasm. Coexpression of FV glycoprotein and interaction of its leader peptide (LP) with the viral capsid is essential for viral particle budding and egress. Processing of PFV Env into the particle-associated LP, surface (SU), and transmembrane (TM) subunits occurs posttranslationally during transport to the cell surface by yet-unidentified cellular proteases. Here we provide strong evidence that furin itself or a furin-like protease and not the signal peptidase complex is responsible for both processing events. N-terminal protein sequencing of the SU and TM subunits of purified PFV Env-immunoglobulin G immunoadhesin identified furin consensus sequences upstream of both cleavage sites. Mutagenesis analysis of two overlapping furin consensus sequences at the PFV LP/SU cleavage site in the wild-type protein confirmed the sequencing data and demonstrated utilization of only the first site. Fully processed SU was almost completely absent in viral particles of mutants having conserved arginine residues replaced by alanines in the first furin consensus sequence, but normal processing was observed upon mutation of the second motif. Although these mutants displayed a significant loss in infectivity as a result of reduced particle release, no correlation to processing inhibition was observed, since another mutant having normal LP/SU processing had a similar defect.
Novel Bioinformatics-Based Approach for Proteomic Biomarkers Prediction of Calpain-2 & Caspase-3 Protease Fragmentation: Application to βII-Spectrin Protein

NASA Astrophysics Data System (ADS)

El-Assaad, Atlal; Dawy, Zaher; Nemer, Georges; Kobeissy, Firas

2017-01-01

The crucial biological role of proteases has become evident with the development of degradomics, the discipline concerned with identifying proteases and their substrates, whose cleavage yields breakdown products (BDPs) that can serve as putative biomarkers of differing biological and clinical significance. In cancer biology, matrix metalloproteinases (MMPs) have been shown to generate protein BDPs that are indicative of malignant growth, while in neural injury, calpain-2 and caspase-3 generate BDP fragments that are indicative of different neural cell death mechanisms in different injury scenarios. Advanced proteomic techniques have made remarkable progress in identifying these BDPs experimentally. In this work, we present a bioinformatics-based prediction method that identifies protease-associated BDPs with high precision and efficiency. The method utilizes state-of-the-art sequence matching and alignment algorithms. It starts by locating consensus sequence occurrences and their variants in any set of protein substrates, generating all fragments resulting from cleavage. The method requires O(mn) space and O(Nmn) time, where N, m, and n are the number of protein sequences, the length of the consensus sequence, and the length of each protein sequence, respectively. Finally, the proposed methodology is validated against βII-spectrin, a validated brain injury biomarker.
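
The first step described above, locating consensus occurrences and generating the resulting fragments, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the motif DEVD, the cut position and the toy substrate are illustrative choices, variants of the motif are not handled, and only the complete digest is produced (no partial-cleavage combinations).

```python
import re
from typing import List


def cleave(sequence: str, motif: str, cut_offset: int) -> List[str]:
    """Split a protein at every motif occurrence.

    The cut is placed cut_offset residues after the start of each
    (possibly overlapping) motif match.
    """
    # A zero-width lookahead lets overlapping occurrences be reported.
    starts = [m.start() for m in re.finditer(f"(?=({motif}))", sequence)]
    cuts = sorted({s + cut_offset for s in starts})
    cuts = [c for c in cuts if 0 < c < len(sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


# Toy substrate (invented) cut after the final residue of each DEVD motif.
print(cleave("MKDEVDGSTDEVDAL", "DEVD", 4))
```

Scanning one substrate of length n for a motif of length m costs on the order of mn character comparisons in the worst case, which is consistent with the O(Nmn) time bound quoted for N substrates.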
Conservative secondary structure motifs already present in early-stage folding (in silico) as found in serpines family.

PubMed

Brylinski, Michal; Konieczny, Leszek; Kononowicz, Andrzej; Roterman, Irena

2008-03-21

The well-known ClustalW procedure, designed for sequence comparison, was applied to structure comparison. A consensus sequence and a consensus structure were defined for proteins belonging to the serpin family. The structure of the early-stage folding intermediate was the object of the similarity search. High values of W(sequence) were found to accord with high values of W(structure), making it possible to compare structures using the same criteria as for sequences. Since the early-stage structural form is created within a limited conformational sub-space that does not include the beta-structure (this structure is mediated by the C7eq structural form), it is particularly important to note that the C7eq structural form may be treated as the seed for the beta-structure present in the final native structure of the protein. The applicability of the ClustalW procedure to structure comparison unifies these two comparisons.
Peptide Array X-Linking (PAX): A New Peptide-Protein Identification Approach

PubMed Central

Okada, Hirokazu; Uezu, Akiyoshi; Soderblom, Erik J.; Moseley, M. Arthur; Gertler, Frank B.; Soderling, Scott H.

2012-01-01

Many protein interaction domains bind short peptides based on canonical sequence consensus motifs. Here we report the development of a peptide array-based proteomics tool to identify proteins directly interacting with ligand peptides from cell lysates. Array-formatted bait peptides containing an amino acid-derived cross-linker are photo-induced to crosslink with interacting proteins from lysates of interest. Indirect associations are removed by high stringency washes under denaturing conditions. Covalently trapped proteins are subsequently identified by LC-MS/MS and screened by cluster analysis and domain scanning. We apply this methodology to peptides with different proline-containing consensus sequences and show successful identifications from brain lysates of known and novel proteins containing polyproline motif-binding domains such as EH, EVH1, SH3, WW domains. These results suggest the capacity of arrayed peptide ligands to capture and subsequently identify proteins by mass spectrometry is relatively broad and robust. Additionally, the approach is rapid and applicable to cell or tissue fractions from any source, making the approach a flexible tool for initial protein-protein interaction discovery. PMID:22606326
ProfileGrids: a sequence alignment visualization paradigm that avoids the limitations of Sequence Logos.

PubMed

Roca, Alberto I

2014-01-01

The 2013 BioVis Contest provided an opportunity to evaluate different paradigms for visualizing protein multiple sequence alignments. Such data sets are becoming extremely large and thus taxing current visualization paradigms. Sequence Logos represent consensus sequences but have limitations for protein alignments. As an alternative, ProfileGrids are a new protein sequence alignment visualization paradigm that represents an alignment as a color-coded matrix of the residue frequency occurring at every homologous position in the aligned protein family. The JProfileGrid software program was used to analyze the BioVis contest data sets to generate figures for comparison with the Sequence Logo reference images. The ProfileGrid representation allows for the clear and effective analysis of protein multiple sequence alignments. This includes both a general overview of the conservation and diversity sequence patterns as well as the interactive ability to query the details of the protein residue distributions in the alignment. The JProfileGrid software is free and available from http://www.ProfileGrid.org.
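
The matrix underlying a ProfileGrid, the residue frequency at every aligned position, can be computed with a few lines of Python. This sketch is not the JProfileGrid implementation; the function name and the toy alignment are assumptions made for illustration, and gap characters are simply counted as one more symbol.

```python
from collections import Counter
from typing import Dict, List


def residue_frequencies(alignment: List[str]) -> List[Dict[str, float]]:
    """Return one {residue: frequency} mapping per alignment column."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    n = len(alignment)
    # zip(*alignment) iterates over columns of the alignment.
    return [
        {res: count / n for res, count in Counter(column).items()}
        for column in zip(*alignment)
    ]


# Toy alignment (invented) of four sequences and four columns.
toy = ["MKV-", "MRVA", "MKIA", "MKVA"]
for position, freqs in enumerate(residue_frequencies(toy), start=1):
    print(position, freqs)
```

Unlike a single consensus string, the per-column table keeps the minority residues visible, which is the limitation of consensus-style displays that the abstract points to.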
Using information content and base frequencies to distinguish mutations from genetic polymorphisms in splice junction recognition sites.

PubMed

Rogan, P K; Schneider, T D

1995-01-01

Predicting the effects of nucleotide substitutions in human splice sites has been based on analysis of consensus sequences. We used a graphic representation of sequence conservation and base frequency, the sequence logo, to demonstrate that a change in a splice acceptor of hMSH2 (a gene associated with familial nonpolyposis colon cancer) probably does not reduce splicing efficiency. This confirms a population genetic study that suggested that this substitution is a genetic polymorphism. The information theory-based sequence logo is quantitative and more sensitive than the corresponding splice acceptor consensus sequence for detection of true mutations. Information analysis may potentially be used to distinguish polymorphisms from mutations in other types of transcriptional, translational, or protein-coding motifs.
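
The information measure behind a sequence logo can be illustrated with a short Python sketch. It computes, for each position of a set of aligned DNA sites, 2 minus the Shannon entropy in bits. The small-sample correction used in practice is omitted, and the four toy sites are invented, so this is only an approximation of the published procedure.

```python
import math
from collections import Counter
from typing import List


def information_content(sites: List[str]) -> List[float]:
    """Bits of information per position for aligned DNA sites.

    Uses 2 - H, where H is the Shannon entropy of the base
    frequencies in a column; no small-sample correction is applied.
    """
    n = len(sites)
    bits = []
    for column in zip(*sites):
        entropy = -sum(
            (count / n) * math.log2(count / n)
            for count in Counter(column).values()
        )
        bits.append(2.0 - entropy)
    return bits


# Invented example sites; real analyses use many aligned splice junctions.
toy_sites = ["CAGG", "CAGA", "TAGG", "CAGT"]
print([round(b, 2) for b in information_content(toy_sites)])
```

A fully conserved position carries 2 bits, while a variable position carries less, so a substitution at a low-information position is less likely to be a true mutation than one at a conserved position.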
Nucleotide sequence of a cluster of early and late genes in a conserved segment of the vaccinia virus genome.

PubMed Central

Plucienniczak, A; Schroeder, E; Zettlmeissl, G; Streeck, R E

1985-01-01

The nucleotide sequence of a 7.6 kb vaccinia DNA segment from a genomic region conserved among different orthopoxviruses has been determined. This segment contains a tight cluster of 12 partly overlapping open reading frames, most of which can be correlated with previously identified early and late proteins and mRNAs. Regulatory signals used by vaccinia virus have been studied. Presumptive promoter regions are rich in A and T and carry the consensus sequences TATA and AATAA spaced at 20-24 base pairs. Tandem repeats of a CTATTC consensus sequence are proposed to be involved in the termination of early transcription. PMID:2987815
Molecular cloning of MSSP-2, a c-myc gene single-strand binding protein: characterization of binding specificity and DNA replication activity.

PubMed Central

Takai, T; Nishita, Y; Iguchi-Ariga, S M; Ariga, H

1994-01-01

We have previously reported the human cDNA encoding MSSP-1, a sequence-specific double- and single-stranded DNA binding protein [Negishi, Nishita, Saëgusa, Kakizaki, Galli, Kihara, Tamai, Miyajima, Iguchi-Ariga and Ariga (1994) Oncogene, 9, 1133-1143]. MSSP-1 binds to a DNA replication origin/transcriptional enhancer of the human c-myc gene and has turned out to be identical with Scr2, a human protein which complements the defect of cdc2 kinase in S.pombe [Kataoka and Nojima (1994) Nucleic Acid Res., 22, 2687-2693]. We have cloned the cDNA for MSSP-2, another member of the MSSP family of proteins. The MSSP-2 cDNA shares highly homologous sequences with MSSP-1 cDNA, except for the insertion of 48 bp coding 16 amino acids near the C-terminus. Like MSSP-1, MSSP-2 has RNP-1 consensus sequences. The results of the experiments using bacterially expressed MSSP-2, and its deletion mutants, as histidine fusion proteins suggested that the binding specificity of MSSP-2 to double- and single-stranded DNA is the same as that of MSSP-1, and that the RNP consensus sequences are required for the DNA binding of the protein. MSSP-2 stimulated the DNA replication of an SV40-derived plasmid containing the binding sequence for MSSP-1 or -2. MSSP-2 is hence suggested to play an important role in regulation of DNA replication. Images PMID:7838710
Integration of transcriptomic and proteomic data from a single wheat cultivar provides new tools for understanding the roles of individual alpha gliadin proteins in flour quality and celiac disease

USDA-ARS's Scientific Manuscript database

One-hundred-thirty-six expressed sequence tags (ESTs) encoding alpha gliadins from Triticum aestivum cv Butte 86 were identified in public databases and assembled into 19 contigs. Consensus sequences for 12 of the contigs encoded complete alpha gliadin proteins, but only two were identical to protei...
Protein evolution analysis of S-hydroxynitrile lyase by complete sequence design utilizing the INTMSAlign software.

PubMed

Nakano, Shogo; Asano, Yasuhisa

2015-02-03

Development of software and methods for design of complete sequences of functional proteins could contribute to studies of protein engineering and protein evolution. To this end, we developed the INTMSAlign software, and used it to design functional proteins and evaluate their usefulness. The software could assign both consensus and correlation residues of target proteins. We generated three protein sequences with S-selective hydroxynitrile lyase (S-HNL) activity, which we call designed S-HNLs; these proteins folded as efficiently as the native S-HNL. Sequence and biochemical analysis of the designed S-HNLs suggested that accumulation of neutral mutations occurs during the process of S-HNLs evolution from a low-activity form to a high-activity (native) form. Taken together, our results demonstrate that our software and the associated methods could be applied not only to design of complete sequences, but also to predictions of protein evolution, especially within families such as esterases and S-HNLs.
Protein evolution analysis of S-hydroxynitrile lyase by complete sequence design utilizing the INTMSAlign software

NASA Astrophysics Data System (ADS)

Nakano, Shogo; Asano, Yasuhisa

2015-02-01

Development of software and methods for design of complete sequences of functional proteins could contribute to studies of protein engineering and protein evolution. To this end, we developed the INTMSAlign software, and used it to design functional proteins and evaluate their usefulness. The software could assign both consensus and correlation residues of target proteins. We generated three protein sequences with S-selective hydroxynitrile lyase (S-HNL) activity, which we call designed S-HNLs; these proteins folded as efficiently as the native S-HNL. Sequence and biochemical analysis of the designed S-HNLs suggested that accumulation of neutral mutations occurs during the process of S-HNLs evolution from a low-activity form to a high-activity (native) form. Taken together, our results demonstrate that our software and the associated methods could be applied not only to design of complete sequences, but also to predictions of protein evolution, especially within families such as esterases and S-HNLs.
ProfileGrids: a sequence alignment visualization paradigm that avoids the limitations of Sequence Logos

PubMed Central

2014-01-01

Background The 2013 BioVis Contest provided an opportunity to evaluate different paradigms for visualizing protein multiple sequence alignments. Such data sets are becoming extremely large and thus taxing current visualization paradigms. Sequence Logos represent consensus sequences but have limitations for protein alignments. As an alternative, ProfileGrids are a new protein sequence alignment visualization paradigm that represents an alignment as a color-coded matrix of the residue frequency occurring at every homologous position in the aligned protein family. Results The JProfileGrid software program was used to analyze the BioVis contest data sets to generate figures for comparison with the Sequence Logo reference images. Conclusions The ProfileGrid representation allows for the clear and effective analysis of protein multiple sequence alignments. This includes both a general overview of the conservation and diversity sequence patterns as well as the interactive ability to query the details of the protein residue distributions in the alignment. The JProfileGrid software is free and available from http://www.ProfileGrid.org. PMID:25237393
The Malarial Host-Targeting Signal Is Conserved in the Irish Potato Famine Pathogen

PubMed Central

Liolios, Konstantinos; Win, Joe; Kanneganti, Thirumala-Devi; Young, Carolyn; Kamoun, Sophien; Haldar, Kasturi

2006-01-01

Animal and plant eukaryotic pathogens, such as the human malaria parasite Plasmodium falciparum and the potato late blight agent Phytophthora infestans, are widely divergent eukaryotic microbes. Yet they both produce secretory virulence and pathogenic proteins that alter host cell functions. In P. falciparum, export of parasite proteins to the host erythrocyte is mediated by leader sequences shown to contain a host-targeting (HT) motif centered on an RxLx (E, D, or Q) core: this motif appears to signify a major pathogenic export pathway with hundreds of putative effectors. Here we show that a secretory protein of P. infestans, which is perceived by plant disease resistance proteins and induces hypersensitive plant cell death, contains a leader sequence that is equivalent to the Plasmodium HT-leader in its ability to export fusion of green fluorescent protein (GFP) from the P. falciparum parasite to the host erythrocyte. This export is dependent on an RxLR sequence conserved in P. infestans leaders, as well as in leaders of all ten secretory oomycete proteins shown to function inside plant cells. The RxLR motif is also detected in hundreds of secretory proteins of P. infestans, Phytophthora sojae, and Phytophthora ramorum and has high value in predicting host-targeted leaders. A consensus motif further reveals E/D residues enriched within ~25 amino acids downstream of the RxLR, which are also needed for export. Together the data suggest that in these plant pathogenic oomycetes, a consensus HT motif may reside in an extended sequence of ~25–30 amino acids, rather than in a short linear sequence. Evidence is presented that although the consensus is much shorter in P. falciparum, information sufficient for vacuolar export is contained in a region of ~30 amino acids, which includes sequences flanking the HT core. Finally, positional conservation between Phytophthora RxLR and P. falciparum RxLx (E, D, Q) is consistent with the idea that the context of their presentation is constrained. These studies provide the first evidence to our knowledge that eukaryotic microbes share equivalent pathogenic HT signals and thus conserved mechanisms to access host cells across plant and animal kingdoms that may present unique targets for prophylaxis across divergent pathogens. PMID:16733545
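
A simple way to screen a protein for the RxLR core together with the downstream E/D enrichment mentioned in this abstract is sketched below in Python. The 25-residue window follows the approximate figure given in the abstract; the function name and the toy sequence are invented for illustration, and no threshold for calling a sequence host-targeted is implied.

```python
import re
from typing import List, Tuple

# R, any residue, L, R; the lookahead allows overlapping hits.
RXLR = re.compile(r"(?=(R.LR))")


def find_rxlr(seq: str, window: int = 25) -> List[Tuple[int, int]]:
    """Return (1-based motif start, number of E/D residues downstream).

    The downstream region is the `window` residues that immediately
    follow the four-residue RxLR core.
    """
    hits = []
    for match in RXLR.finditer(seq):
        start = match.start()
        downstream = seq[start + 4 : start + 4 + window]
        acidic = sum(downstream.count(res) for res in "ED")
        hits.append((start + 1, acidic))
    return hits


# Invented toy leader sequence, not a real Phytophthora protein.
toy = "MKLSAARSLRTTGEEDAAEERQQ"
print(find_rxlr(toy))
```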
A Consensus Method for the Prediction of ‘Aggregation-Prone’ Peptides in Globular Proteins

PubMed Central

Tsolis, Antonios C.; Papandreou, Nikos C.; Iconomidou, Vassiliki A.; Hamodrakas, Stavros J.

2013-01-01

The purpose of this work was to construct a consensus prediction algorithm of ‘aggregation-prone’ peptides in globular proteins, combining existing tools. This allows comparison of the different algorithms and the production of more objective and accurate results. Eleven (11) individual methods are combined and produce AMYLPRED2, a web tool publicly and freely available to academic users (http://biophysics.biol.uoa.gr/AMYLPRED2), for the consensus prediction of amyloidogenic determinants/‘aggregation-prone’ peptides in proteins, from sequence alone. The performance of AMYLPRED2 indicates that it functions better than individual aggregation-prediction algorithms, as perhaps expected. AMYLPRED2 is a useful tool for identifying amyloid-forming regions in proteins that are associated with several conformational diseases, called amyloidoses, such as Alzheimer's, Parkinson's, prion diseases and type II diabetes. It may also be useful for understanding the properties of protein folding and misfolding and for helping to control protein aggregation/solubility in biotechnology (recombinant proteins forming bacterial inclusion bodies) and biotherapeutics (monoclonal antibodies and biopharmaceutical proteins). PMID:23326595
Bioinformatics prediction of siRNAs as potential antiviral agents against dengue viruses

PubMed Central

Villegas-Rosales, Paula M; Méndez-Tenorio, Alfonso; Ortega-Soto, Elizabeth; Barrón, Blanca L

2012-01-01

Dengue virus (DENV 1-4) represents the major emerging arthropod-borne viral infection in the world. Currently, there is neither an available vaccine nor a specific treatment. Hence, there is a need for antiviral drugs against these viral infections; we describe the prediction of short interfering RNAs (siRNAs) as potential therapeutic agents against the four DENV serotypes. Our strategy was to carry out a series of multiple alignments using the ClustalX program to find conserved sequences among the four DENV serotype genomes and obtain a consensus sequence for siRNA design. A highly conserved sequence among the four DENV serotypes, located in the coding sequence for the NS4B and NS5 proteins, was found. A total of 2,893 complete DENV genomes were downloaded from the NCBI, and after a curation step to identify and remove identical sequences, 220 complete DENV genomes were left. They were edited to select the NS4B and NS5 sequences, which were aligned to obtain a consensus sequence. Three different servers were used for siRNA design, and the resulting siRNAs were aligned to identify the most prevalent sequences. Three siRNAs were chosen: one targets the genome region that codes for the NS4B protein and the other two target the region for the NS5 protein. Predicted secondary structure for DENV genomes was used to demonstrate that the siRNAs were able to target the viral genome, forming the double-stranded structures necessary to activate the RNA silencing machinery. PMID:22829722
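
Two steps of the workflow in this abstract, collapsing identical sequences and deriving a consensus from an alignment, can be sketched in Python. This is not the ClustalX procedure itself: the sequences are assumed to be already aligned, the 50% majority rule and the use of N for unresolved columns are illustrative assumptions, and the toy sequences are invented.

```python
from collections import Counter
from typing import Iterable, List


def deduplicate(seqs: Iterable[str]) -> List[str]:
    """Drop exact duplicates while preserving first-seen order."""
    return list(dict.fromkeys(seqs))


def majority_consensus(alignment: List[str], min_fraction: float = 0.5) -> str:
    """Most frequent base per column; 'N' when no base exceeds min_fraction."""
    consensus = []
    for column in zip(*alignment):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) > min_fraction else "N")
    return "".join(consensus)


# Invented, pre-aligned toy sequences; the second is an exact duplicate.
aligned = deduplicate(["ATGGCA", "ATGGCA", "ATGACA", "ACGGCT", "ATGGCT"])
print(len(aligned), majority_consensus(aligned))
```

Removing duplicates before calling the consensus keeps heavily resampled strains from dominating the column counts.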
Sequence specificity of single-stranded DNA-binding proteins: a novel DNA microarray approach

PubMed Central

Morgan, Hugh P.; Estibeiro, Peter; Wear, Martin A.; Max, Klaas E.A.; Heinemann, Udo; Cubeddu, Liza; Gallagher, Maurice P.; Sadler, Peter J.; Walkinshaw, Malcolm D.

2007-01-01

We have developed a novel DNA microarray-based approach for identification of the sequence-specificity of single-stranded nucleic-acid-binding proteins (SNABPs). For verification, we have shown that the major cold shock protein (CspB) from Bacillus subtilis binds with high affinity to pyrimidine-rich sequences, with a binding preference for the consensus sequence, 5′-GTCTTTG/T-3′. The sequence was modelled onto the known structure of CspB and a cytosine-binding pocket was identified, which explains the strong preference for a cytosine base at position 3. This microarray method offers a rapid high-throughput approach for determining the specificity and strength of ss DNA–protein interactions. Further screening of this newly emerging family of transcription factors will help provide an insight into their cellular function. PMID:17488853
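
To show how a degenerate consensus such as 5′-GTCTTTG/T-3′ can be used to rank candidate binding sites, the Python sketch below counts mismatches between each 7-nucleotide window of a single-stranded sequence and the consensus, written with the IUPAC code K for G or T. Mismatch counting is a stand-in chosen for illustration and is not the affinity measure obtained from the microarray; the toy oligonucleotide is invented.

```python
from typing import Optional, Tuple

# IUPAC codes needed for this consensus; K stands for G or T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "K": "GT"}
CONSENSUS = "GTCTTTK"


def mismatches(window: str) -> int:
    """Number of positions where the window disagrees with the consensus."""
    return sum(base not in IUPAC[code] for base, code in zip(window, CONSENSUS))


def best_window(seq: str) -> Optional[Tuple[int, int, str]]:
    """Return (mismatch count, 1-based start, window) for the closest window."""
    seq = seq.upper()
    k = len(CONSENSUS)
    windows = [
        (mismatches(seq[i : i + k]), i + 1, seq[i : i + k])
        for i in range(len(seq) - k + 1)
    ]
    return min(windows) if windows else None


# Invented oligonucleotide with one mismatch to the consensus.
print(best_window("AAGTCATTGCC"))
```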
Database-independent Protein Sequencing (DiPS) Enables Full-length de Novo Protein and Antibody Sequence Determination.

PubMed

Savidor, Alon; Barzilay, Rotem; Elinger, Dalia; Yarden, Yosef; Lindzen, Moshit; Gabashvili, Alexandra; Adiv Tal, Ophir; Levin, Yishai

2017-06-01

Traditional "bottom-up" proteomic approaches use proteolytic digestion, LC-MS/MS, and database searching to elucidate peptide identities and their parent proteins. Protein sequences absent from the database cannot be identified, and even if present in the database, complete sequence coverage is rarely achieved even for the most abundant proteins in the sample. Thus, sequencing of unknown proteins such as antibodies or constituents of metaproteomes remains a challenging problem. To date, there is no available method for full-length protein sequencing, independent of a reference database, in high throughput. Here, we present Database-independent Protein Sequencing, a method for unambiguous, rapid, database-independent, full-length protein sequencing. The method is a novel combination of non-enzymatic, semi-random cleavage of the protein, LC-MS/MS analysis, peptide de novo sequencing, extraction of peptide tags, and their assembly into a consensus sequence using an algorithm named "Peptide Tag Assembler." As proof-of-concept, the method was applied to samples of three known proteins representing three size classes and to a previously un-sequenced, clinically relevant monoclonal antibody. Excluding leucine/isoleucine and glutamic acid/deamidated glutamine ambiguities, end-to-end full-length de novo sequencing was achieved with 99-100% accuracy for all benchmarking proteins and the antibody light chain. Accuracy of the sequenced antibody heavy chain, including the entire variable region, was also 100%, but there was a 23-residue gap in the constant region sequence. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.

The combinatorial PP1-binding consensus motif (R/K)x(0,1)V/IxFxx(R/K)x(R/K) is a new apoptotic signature.

PubMed

Godet, Angélique N; Guergnon, Julien; Maire, Virginie; Croset, Amélie; Garcia, Alphonse

2010-04-01

Previous studies established that PP1 is a target for Bcl-2 proteins and an important regulator of apoptosis. The two distinct functional PP1 consensus docking motifs, R/Kx(0,1)V/IxF and FxxR/KxR/K, involved in PP1 binding and cell death were previously characterized in the BH1 and BH3 domains of some Bcl-2 proteins. In this study, we demonstrate that DPT-AIF(1), a peptide containing the AIF(562-571) sequence located in a C-terminal domain of AIF, is a new PP1-interacting and cell-penetrating molecule. We also show that DPT-AIF(1) provoked apoptosis in several human cell lines. Furthermore, DPT-APAF(1), a bipartite cell-penetrating peptide containing APAF-1(122-131), a non-penetrating sequence from the APAF-1 protein, linked to our previously described DPT-sh1 peptide shuttle, is also a PP1-interacting death molecule. Both the AIF(562-571) and APAF-1(122-131) sequences contain a common R/Kx(0,1)V/IxFxxR/KxR/K motif, shared by several proteins involved in the control of cell survival pathways. This motif combines the two distinct PP1c consensus docking motifs initially identified in some Bcl-2 proteins. Interestingly, DPT-AIF(2) and DPT-APAF(2), which carry an F-to-A mutation within this combinatorial motif, no longer exhibited any PP1c binding or apoptotic effects. Moreover, the F-to-A mutation in DPT-AIF(2) also suppressed cell penetration. These results indicate that the combinatorial PP1c docking motif R/Kx(0,1)V/IxFxxR/KxR/K, deduced from the AIF(562-571) and APAF-1(122-131) sequences, is a new PP1c-dependent apoptotic signature. This motif is also a new tool for drug design that could be used to characterize potential anti-tumour molecules.
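The combinatorial motif in this record can be read as a pattern over one-letter amino acid codes. The following is a minimal Python sketch, assuming that "x" stands for any residue and that x(0,1) means zero or one residue. The peptide strings and the function name are hypothetical illustrations, not the AIF or APAF-1 sequences.

```python
import re

# R/K, x(0,1), V/I, x, F, x, x, R/K, x, R/K, with "x" taken to mean any residue.
PP1_MOTIF = re.compile(r"[RK].{0,1}[VI].F..[RK].[RK]")


def has_pp1_motif(peptide: str) -> bool:
    """Return True if the peptide contains the combinatorial docking motif."""
    return PP1_MOTIF.search(peptide) is not None


# Hypothetical peptides, for illustration only.
with_spacer = "GRAVAFAAKARG"     # one residue between R/K and V/I
without_spacer = "GRVAFAAKARG"   # no residue between R/K and V/I
f_to_a_mutant = "GRAVAAAAKARG"   # the central F replaced by A

for name, peptide in [("with spacer", with_spacer),
                      ("without spacer", without_spacer),
                      ("F-to-A mutant", f_to_a_mutant)]:
    print(name, has_pp1_motif(peptide))
```

A pattern match only flags a candidate site; whether a peptide actually binds PP1c is an experimental question, as the record itself shows.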
A Bioinformatics-Based Alternative mRNA Splicing Code that May Explain Some Disease Mutations Is Conserved in Animals.

PubMed

Qu, Wen; Cingolani, Pablo; Zeeberg, Barry R; Ruden, Douglas M

2017-01-01

Deep sequencing of cDNAs made from spliced mRNAs indicates that most coding genes in many animals and plants have pre-mRNA transcripts that are alternatively spliced. In pre-mRNAs, in addition to invariant exons that are present in almost all mature mRNA products, there are at least 6 additional types of exons, such as exons from alternative promoters or with alternative polyA sites, mutually exclusive exons, skipped exons, or exons with alternative 5' or 3' splice sites. Our bioinformatics-based hypothesis is that, in analogy to the genetic code, there is an "alternative-splicing code" in introns and flanking exon sequences that directs alternative splicing of many of the 36 types of introns. In humans, we identified 42 different consensus sequences that are each present in at least 100 human introns. 37 of the 42 top consensus sequences are significantly enriched or depleted in at least one of the 36 types of introns. We further supported our hypothesis by showing that 96 out of 96 analyzed human disease mutations that affect RNA splicing, and change alternative splicing from one class to another, can be partially explained by a mutation altering a consensus sequence from one type of intron to that of another type of intron. Some of the alternative splicing consensus sequences, and presumably their small-RNA or protein targets, are evolutionarily conserved from 50 plant to animal species. We also noticed that the set of introns within a gene usually share the same splicing codes, thus arguing that one sub-type of spliceosome might process all (or most) of the introns in a given gene. Our work sheds new light on a possible mechanism for generating the tremendous diversity in protein structure by alternative splicing of pre-mRNAs.
Metagenome assembly through clustering of next-generation sequencing data using protein sequences.

PubMed

Sim, Mikang; Kim, Jaebum

2015-02-01

The study of environmental microbial communities, called metagenomics, has gained a lot of attention because of the recent advances in next-generation sequencing (NGS) technologies. Microbes play a critical role in changing their environments, and how they exert this effect can be studied by investigating metagenomes. However, the complexity of metagenomes, such as the mixture of multiple microbes and the differences in species abundance, makes metagenome assembly more challenging. In this paper, we developed a new metagenome assembly method that uses protein sequences in addition to the NGS read sequences. Our method (i) builds read clusters by using mapping information against available protein sequences, and (ii) creates contig sequences by finding consensus sequences through probabilistic choices from the read clusters. By using simulated NGS read sequences from real microbial genome sequences, we evaluated our method in comparison with four existing assembly programs. We found that our method could generate relatively long and accurate metagenome assemblies, indicating that the idea of using protein sequences as a guide for the assembly is promising. Copyright © 2015 Elsevier B.V. All rights reserved.
The gamma subunit of transducin is farnesylated.

PubMed Central

Lai, R K; Perez-Sala, D; Cañada, F J; Rando, R R

1990-01-01

Protein prenylation with farnesyl or geranylgeranyl moieties is an important posttranslational modification that affects the activity of such diverse proteins as the nuclear lamins, the yeast mating factor mata, and the ras oncogene products. In this article, we show that whole retinal cultures incorporate radioactive mevalonic acid into proteins of 23-26 kDa and one of 8 kDa. The former proteins are probably the "small" guanine nucleotide-binding regulatory proteins (G proteins) and the 8-kDa protein is the gamma subunit of the well-studied retinal heterotrimeric G protein (transducin). After deprenylating purified transducin and its subunits with Raney nickel or methyl iodide/base, the adducted prenyl group can be identified as an all-trans-farnesyl moiety covalently linked to a cysteine residue. Thus far, prenylation reactions have been found to occur at cysteine in a carboxyl-terminal consensus CAAX sequence, where C is the cysteine, A is an aliphatic amino acid, and X is undefined. Both the alpha and gamma subunits of transducin have this consensus sequence, but only the gamma subunit is prenylated. Therefore, the CAAX motif is not by itself sufficient to direct prenylation. Finally, since transducin is the best understood G protein, both structurally and mechanistically, the discovery that it is farnesylated should allow for a quantitative understanding of this post-translational modification. PMID:2217200
Superior ab initio identification, annotation and characterisation of TEs and segmental duplications from genome assemblies.

PubMed

Zeng, Lu; Kortschak, R Daniel; Raison, Joy M; Bertozzi, Terry; Adelson, David L

2018-01-01

Transposable Elements (TEs) are mobile DNA sequences that make up significant fractions of amniote genomes. However, they are difficult to detect and annotate ab initio because of their variable features, lengths and clade-specific variants. We have addressed this problem by refining and developing a Comprehensive ab initio Repeat Pipeline (CARP) to identify and cluster TEs and other repetitive sequences in genome assemblies. The pipeline begins with a pairwise alignment using krishna, a custom aligner. Single linkage clustering is then carried out to produce families of repetitive elements. Consensus sequences are then filtered for protein coding genes and then annotated using Repbase and a custom library of retrovirus and reverse transcriptase sequences. This process yields three types of family: fully annotated, partially annotated and unannotated. Fully annotated families reflect recently diverged/young known TEs present in Repbase. The remaining two types of families contain a mixture of novel TEs and segmental duplications. These can be resolved by aligning these consensus sequences back to the genome to assess copy number vs. length distribution. Our pipeline has three significant advantages compared to other methods for ab initio repeat identification: 1) we generate not only consensus sequences, but keep the genomic intervals for the original aligned sequences, allowing straightforward analysis of evolutionary dynamics, 2) consensus sequences represent low-divergence, recently/currently active TE families, 3) segmental duplications are annotated as a useful by-product. We have compared our ab initio repeat annotations for 7 genome assemblies to other methods and demonstrate that CARP compares favourably with RepeatModeler, the most widely used repeat annotation package.
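The single linkage clustering step named in this record can be illustrated with a small union-find routine. This is a sketch under stated assumptions, not CARP's implementation: the input is assumed to be a list of sequence-ID pairs that a pairwise aligner reported as similar, and the function name and IDs are hypothetical.

```python
from collections import defaultdict


def single_linkage_clusters(pairs):
    """Group IDs so that any two joined by a chain of pairwise hits share a family."""
    parent = {}

    def find(x):
        # Walk up to the representative of x, shortening the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two families

    clusters = defaultdict(set)
    for x in list(parent):
        clusters[find(x)].add(x)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical pairwise hits: s1~s2 and s2~s3 chain into one family.
hits = [("s1", "s2"), ("s2", "s3"), ("s4", "s5")]
families = single_linkage_clusters(hits)
print(families)
```

Single linkage merges families through any one shared hit, so s1 and s3 land together even though they were never aligned directly.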
Superior ab initio identification, annotation and characterisation of TEs and segmental duplications from genome assemblies

PubMed Central

Zeng, Lu; Kortschak, R. Daniel; Raison, Joy M.

2018-01-01

Transposable Elements (TEs) are mobile DNA sequences that make up significant fractions of amniote genomes. However, they are difficult to detect and annotate ab initio because of their variable features, lengths and clade-specific variants. We have addressed this problem by refining and developing a Comprehensive ab initio Repeat Pipeline (CARP) to identify and cluster TEs and other repetitive sequences in genome assemblies. The pipeline begins with a pairwise alignment using krishna, a custom aligner. Single linkage clustering is then carried out to produce families of repetitive elements. Consensus sequences are then filtered for protein coding genes and then annotated using Repbase and a custom library of retrovirus and reverse transcriptase sequences. This process yields three types of family: fully annotated, partially annotated and unannotated. Fully annotated families reflect recently diverged/young known TEs present in Repbase. The remaining two types of families contain a mixture of novel TEs and segmental duplications. These can be resolved by aligning these consensus sequences back to the genome to assess copy number vs. length distribution. Our pipeline has three significant advantages compared to other methods for ab initio repeat identification: 1) we generate not only consensus sequences, but keep the genomic intervals for the original aligned sequences, allowing straightforward analysis of evolutionary dynamics, 2) consensus sequences represent low-divergence, recently/currently active TE families, 3) segmental duplications are annotated as a useful by-product. We have compared our ab initio repeat annotations for 7 genome assemblies to other methods and demonstrate that CARP compares favourably with RepeatModeler, the most widely used repeat annotation package. PMID:29538441
Transcription activation mediated by a cyclic AMP receptor protein from Thermus thermophilus HB8.

PubMed

Shinkai, Akeo; Kira, Satoshi; Nakagawa, Noriko; Kashihara, Aiko; Kuramitsu, Seiki; Yokoyama, Shigeyuki

2007-05-01

The extremely thermophilic bacterium Thermus thermophilus HB8, which belongs to the phylum Deinococcus-Thermus, has an open reading frame encoding a protein belonging to the cyclic AMP (cAMP) receptor protein (CRP) family present in many bacteria. The protein, named T. thermophilus CRP, is highly homologous to the CRP family proteins from the phyla Firmicutes, Actinobacteria, and Cyanobacteria, and it forms a homodimer and interacts with cAMP. CRP mRNA and intracellular cAMP were detected in this strain, and their levels did not fluctuate drastically during cultivation in a rich medium. The expression of several genes was altered upon disruption of the T. thermophilus CRP gene. We found six CRP-cAMP-dependent promoters in in vitro transcription assays involving DNA fragments containing the upstream regions of the genes exhibiting decreased expression in the CRP disruptant, indicating that the CRP is a transcriptional activator. The consensus T. thermophilus CRP-binding site predicted upon nucleotide sequence alignment is 5'-(C/T)NNG(G/T)(G/T)C(A/C)N(A/T)NNTCACAN(G/C)(G/C)-3'. This sequence is unique compared with the known consensus binding sequences of CRP family proteins. A putative -10 hexamer sequence resides 18 to 19 bp downstream of the predicted T. thermophilus CRP-binding site. The CRP-regulated genes found in this study comprise clustered regularly interspaced short palindromic repeat (CRISPR)-associated (cas) ones, and the genes of a putative transcriptional regulator, a protein containing the exonuclease III-like domain of DNA polymerase, a GCN5-related acetyltransferase homolog, and T. thermophilus-specific proteins of unknown function. These results suggest a role for cAMP signal transduction in T. thermophilus and imply that the T. thermophilus CRP is a cAMP-responsive regulator.
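A degenerate consensus such as the CRP-binding site quoted in this record can be turned into a search pattern mechanically. The sketch below assumes that N means any of A, C, G or T and that "(C/T)" means either base; the function name and the test DNA string are hypothetical, and only the forward strand is scanned.

```python
import re

# Consensus string as written in the record (5' to 3').
CONSENSUS = "(C/T)NNG(G/T)(G/T)C(A/C)N(A/T)NNTCACAN(G/C)(G/C)"


def consensus_to_regex(consensus: str) -> str:
    """Rewrite "(C/T)" as "[CT]" and "N" as "[ACGT]"."""
    pattern = re.sub(
        r"\(([ACGT](?:/[ACGT])+)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        consensus,
    )
    return pattern.replace("N", "[ACGT]")


SITE = re.compile(consensus_to_regex(CONSENSUS))

# Hypothetical DNA: a 20-bp site that fits the consensus, flanked by TTT.
dna = "TTT" + "CAAGGGCAAAAATCACAAGG" + "TTT"
match = SITE.search(dna)
print(match.start(), match.group())
```

Real binding sites often deviate from the consensus at one or more positions, so an exact pattern like this one is a strict filter; a position weight matrix would be the usual choice for scoring near matches.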
Shotgun Protein Sequencing with Meta-contig Assembly

PubMed Central

Guthals, Adrian; Clauser, Karl R.; Bandeira, Nuno

2012-01-01

Full-length de novo sequencing from tandem mass (MS/MS) spectra of unknown proteins such as antibodies or proteins from organisms with unsequenced genomes remains a challenging open problem. Conventional algorithms designed to individually sequence each MS/MS spectrum are limited by incomplete peptide fragmentation or low signal to noise ratios and tend to result in short de novo sequences at low sequencing accuracy. Our shotgun protein sequencing (SPS) approach was developed to ameliorate these limitations by first finding groups of unidentified spectra from the same peptides (contigs) and then deriving a consensus de novo sequence for each assembled set of spectra (contig sequences). But whereas SPS enables much more accurate reconstruction of de novo sequences longer than can be recovered from individual MS/MS spectra, it still requires error-tolerant matching to homologous proteins to group smaller contig sequences into full-length protein sequences, thus limiting its effectiveness on sequences from poorly annotated proteins. Using low and high resolution CID and high resolution HCD MS/MS spectra, we address this limitation with a Meta-SPS algorithm designed to overlap and further assemble SPS contigs into Meta-SPS de novo contig sequences extending as long as 100 amino acids at over 97% accuracy without requiring any knowledge of homologous protein sequences. We demonstrate Meta-SPS using distinct MS/MS data sets obtained with separate enzymatic digestions and discuss how the remaining de novo sequencing limitations relate to MS/MS acquisition settings. PMID:22798278
Shotgun protein sequencing with meta-contig assembly.

PubMed

Guthals, Adrian; Clauser, Karl R; Bandeira, Nuno

2012-10-01

Full-length de novo sequencing from tandem mass (MS/MS) spectra of unknown proteins such as antibodies or proteins from organisms with unsequenced genomes remains a challenging open problem. Conventional algorithms designed to individually sequence each MS/MS spectrum are limited by incomplete peptide fragmentation or low signal to noise ratios and tend to result in short de novo sequences at low sequencing accuracy. Our shotgun protein sequencing (SPS) approach was developed to ameliorate these limitations by first finding groups of unidentified spectra from the same peptides (contigs) and then deriving a consensus de novo sequence for each assembled set of spectra (contig sequences). But whereas SPS enables much more accurate reconstruction of de novo sequences longer than can be recovered from individual MS/MS spectra, it still requires error-tolerant matching to homologous proteins to group smaller contig sequences into full-length protein sequences, thus limiting its effectiveness on sequences from poorly annotated proteins. Using low and high resolution CID and high resolution HCD MS/MS spectra, we address this limitation with a Meta-SPS algorithm designed to overlap and further assemble SPS contigs into Meta-SPS de novo contig sequences extending as long as 100 amino acids at over 97% accuracy without requiring any knowledge of homologous protein sequences. We demonstrate Meta-SPS using distinct MS/MS data sets obtained with separate enzymatic digestions and discuss how the remaining de novo sequencing limitations relate to MS/MS acquisition settings.
Accurate RNA consensus sequencing for high-fidelity detection of transcriptional mutagenesis-induced epimutations.

PubMed

Reid-Bayliss, Kate S; Loeb, Lawrence A

2017-08-29

Transcriptional mutagenesis (TM) due to misincorporation during RNA transcription can result in mutant RNAs, or epimutations, that generate proteins with altered properties. TM has long been hypothesized to play a role in aging, cancer, and viral and bacterial evolution. However, inadequate methodologies have limited progress in elucidating a causal association. We present a high-throughput, highly accurate RNA sequencing method to measure epimutations with single-molecule sensitivity. Accurate RNA consensus sequencing (ARC-seq) uniquely combines RNA barcoding and generation of multiple cDNA copies per RNA molecule to eliminate errors introduced during cDNA synthesis, PCR, and sequencing. The stringency of ARC-seq can be scaled to accommodate the quality of input RNAs. We apply ARC-seq to directly assess transcriptome-wide epimutations resulting from RNA polymerase mutants and oxidative stress.
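The idea of combining barcodes with multiple copies per molecule can be sketched as follows. This is not the ARC-seq pipeline: the unanimity rule, the `min_copies` threshold, the function name and the reads are all assumptions made for illustration. Copies that share a barcode are taken to derive from one RNA molecule, and a base is kept only where every copy agrees.

```python
from collections import defaultdict


def barcode_consensus(reads, min_copies=2):
    """reads: iterable of (barcode, sequence) pairs. Returns {barcode: consensus}."""
    by_barcode = defaultdict(list)
    for barcode, seq in reads:
        by_barcode[barcode].append(seq)

    consensus = {}
    for barcode, copies in by_barcode.items():
        # Skip molecules with too few copies or copies of unequal length.
        if len(copies) < min_copies or len({len(c) for c in copies}) != 1:
            continue
        # Keep a base only where all copies agree; otherwise mask it with N.
        consensus[barcode] = "".join(
            column[0] if len(set(column)) == 1 else "N"
            for column in zip(*copies)
        )
    return consensus


# Hypothetical reads: BC1 has three copies, one with a copying error at position 3.
reads = [("BC1", "ACGT"), ("BC1", "ACGT"), ("BC1", "ACTT"), ("BC2", "GGCA")]
result = barcode_consensus(reads)
print(result)
```

In this sketch, raising `min_copies` is one way a stringency setting could be expressed: a change present in every copy is more likely to have been in the original molecule than one introduced during copying or sequencing.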
Overproduction, purification, and ATPase activity of the Escherichia coli RuvB protein involved in DNA repair.

PubMed Central

Iwasaki, H; Shiba, T; Makino, K; Nakata, A; Shinagawa, H

1989-01-01

The ruvA and ruvB genes of Escherichia coli constitute an operon which belongs to the SOS regulon. Genetic evidence suggests that the products of the ruv operon are involved in DNA repair and recombination. To begin biochemical characterization of these proteins, we developed a plasmid system that overproduced RuvB protein to 20% of total cell protein. Starting from the overproducing system, we purified RuvB protein. The purified RuvB protein behaved like a monomer in gel filtration chromatography and had an apparent relative molecular mass of 38 kilodaltons in sodium dodecyl sulfate-polyacrylamide gel electrophoresis, which agrees with the value predicted from the DNA sequence. The amino acid sequence of the amino-terminal region of the purified protein was analyzed, and the sequence agreed with the one deduced from the DNA sequence. Since the deduced sequence of RuvB protein contained the consensus sequence for ATP-binding proteins, we examined the ATP-binding and ATPase activities of the purified RuvB protein. RuvB protein had a stronger affinity for ADP than for ATP and weak ATPase activity. The results suggest that the weak ATPase activity of RuvB protein is at least partly due to end product inhibition by ADP. PMID:2529252
B Cell Receptor Activation Predominantly Regulates AKT-mTORC1/2 Substrates Functionally Related to RNA Processing

PubMed Central

Mohammad, Dara K.; Ali, Raja H.; Turunen, Janne J.; Nore, Beston F.; Smith, C. I. Edvard

2016-01-01

Protein kinase B (AKT) phosphorylates numerous substrates on the consensus motif RXRXXpS/T, a docking site for 14-3-3 interactions. To identify novel AKT-induced phosphorylation events following B cell receptor (BCR) activation, we performed proteomics, biochemical and bioinformatics analyses. Phosphorylated consensus motif-specific antibody enrichment, followed by tandem mass spectrometry, identified 446 proteins, containing 186 novel phosphorylation events. Moreover, we found 85 proteins with upregulated phosphorylation, while in 277 it was downregulated following stimulation. Upregulation was mainly in proteins involved in ribosomal and translational regulation, DNA binding and transcription regulation. Conversely, downregulation was preferentially in RNA binding, mRNA splicing and mRNP export proteins. Immunoblotting of two identified RNA regulatory proteins, RBM25 and MEF-2D, confirmed the proteomics data. Consistent with these findings, the AKT inhibitor MK-2206 dramatically reduced, while the mTORC inhibitor PP242 totally blocked, phosphorylation on the RXRXXpS/T motif. This demonstrates that this motif, previously suggested as an AKT target sequence, is also a substrate for mTORC1/2. Proteins with PDZ, PH and/or SH3 domains contained the consensus motif, whereas in those with an HMG-box, H15 domains and/or NF-X1-zinc-fingers, the motif was absent. Proteins carrying the consensus motif were found in all eukaryotic clades, indicating that they regulate a phylogenetically conserved set of proteins. PMID:27487157
Molecular cloning of actin genes in Trichomonas vaginalis and phylogeny inferred from actin sequences.

PubMed

Bricheux, G; Brugerolle, G

1997-08-01

The parasitic protozoan Trichomonas vaginalis is known to contain the ubiquitous and highly conserved protein actin. A genomic library and a cDNA library have been screened to identify and clone the actin gene(s) of T. vaginalis. The nucleotide sequence of one gene and its flanking regions have been determined. The open reading frame encodes a protein of 376 amino acids. The sequence is not interrupted by any introns and the promoter could be represented by a 10 bp motif close to a consensus motif also found upstream of most sequenced T. vaginalis genes. The five different clones isolated from the cDNA library have similar sequences and encode three actin proteins differing only by one or two amino acids. A phylogenetic analysis of 31 actin sequences by distance matrix and parsimony methods, using centractin as outgroup, gives congruent trees with Parabasala branching above Diplomonadida.
Predicting the reactivity of proteins from their sequence alone: Kazal family of protein inhibitors of serine proteinases

PubMed Central

Lu, Stephen M.; Lu, Wuyuan; Qasim, M. A.; Anderson, Stephen; Apostol, Izydor; Ardelt, Wojciech; Bigler, Theresa; Chiang, Yi Wen; Cook, James; James, Michael N. G.; Kato, Ikunoshin; Kelly, Clyde; Kohr, William; Komiyama, Tomoko; Lin, Tiao-Yin; Ogawa, Michio; Otlewski, Jacek; Park, Soon-Jae; Qasim, Sabiha; Ranjbar, Michael; Tashiro, Misao; Warne, Nicholas; Whatley, Harry; Wieczorek, Anna; Wieczorek, Maciej; Wilusz, Tadeusz; Wynn, Richard; Zhang, Wenlei; Laskowski, Michael

2001-01-01

An additivity-based sequence to reactivity algorithm for the interaction of members of the Kazal family of protein inhibitors with six selected serine proteinases is described. Ten consensus variable contact positions in the inhibitor were identified, and the 19 possible variants at each of these positions were expressed. The free energies of interaction of these variants and the wild type were measured. For an additive system, this data set allows for the calculation of all possible sequences, subject to some restrictions. The algorithm was extensively tested. It is exceptionally fast so that all possible sequences can be predicted. The strongest, the most specific possible, and the least specific inhibitors were designed, and an evolutionary problem was solved. PMID:11171964
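The additivity assumption in this record can be written out in a few lines: the predicted free energy of a multiply substituted inhibitor is the wild-type value plus the change measured for each single substitution. The sketch below uses made-up numbers in arbitrary energy units and hypothetical names; it illustrates the arithmetic only and is not the authors' algorithm or data.

```python
def predict_free_energy(wild_type_dg, variant_dg, substitutions):
    """Additive estimate: wild-type value plus the change from each substitution.

    variant_dg maps (position, residue) to the free energy measured for that
    single variant; substitutions is a list of (position, residue) pairs.
    """
    total = wild_type_dg
    for position, residue in substitutions:
        total += variant_dg[(position, residue)] - wild_type_dg
    return total


# Made-up values, for illustration only.
WILD_TYPE_DG = -10.0
VARIANT_DG = {(1, "A"): -9.0, (2, "L"): -11.5}

double_variant = predict_free_energy(WILD_TYPE_DG, VARIANT_DG, [(1, "A"), (2, "L")])
print(double_variant)
```

Because each position contributes independently under this assumption, measurements of single variants are enough to estimate any combination of them, which is why such a calculation can be very fast.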
AMS 4.0: consensus prediction of post-translational modifications in protein sequences.

PubMed

Plewczynski, Dariusz; Basu, Subhadip; Saha, Indrajit

2012-08-01

We present here the 2011 update of the AutoMotif Service (AMS 4.0), which predicts a wide selection of 88 different types of single amino acid post-translational modifications (PTMs) in protein sequences. The selection of experimentally confirmed modifications is acquired from the latest UniProt and Phospho.ELM databases for training. The sequence vicinity of each modified residue is represented using amino acid physico-chemical features encoded using high quality indices (HQI) obtained by automatic clustering of known indices extracted from the AAindex database. For each type of numerical representation, the method builds an ensemble of Multi-Layer Perceptron (MLP) pattern classifiers, each optimising a different objective during training (for example recall, precision or area under the ROC curve (AUC)). The consensus is built using brainstorming technology, which combines multi-objective instances of a machine learning algorithm and the data fusion of different representations of the training objects, in order to boost the overall prediction accuracy for conserved short sequence motifs. The performance of AMS 4.0 is compared with the accuracy of previous versions, which were constructed using single machine learning methods (artificial neural networks, support vector machine). Our software improves the average AUC score of the earlier version by close to 7% as calculated on the test datasets of all 88 PTM types. Moreover, for the selected most difficult sequence motif types it is able to improve the prediction performance by almost 32% when compared with previously used single machine learning methods. In summary, the brainstorming consensus meta-learning methodology boosts the AUC score to around 89%, averaged over all 88 PTM types. Detailed results for single machine learning methods and the consensus methodology are also provided, together with the comparison to previously published methods and state-of-the-art software tools. The source code and precompiled binaries of the brainstorming tool are available at http://code.google.com/p/automotifserver/ under Apache 2.0 licensing.
A FRET Biosensor for ROCK Based on a Consensus Substrate Sequence Identified by KISS Technology.

PubMed

Li, Chunjie; Imanishi, Ayako; Komatsu, Naoki; Terai, Kenta; Amano, Mutsuki; Kaibuchi, Kozo; Matsuda, Michiyuki

2017-01-11

Genetically encoded biosensors based on Förster/fluorescence resonance energy transfer (FRET) are versatile tools for studying the spatio-temporal regulation of signaling molecules not only within cells but also within tissues. Perhaps the hardest task in the development of a FRET biosensor for protein kinases is to identify the kinase-specific substrate peptide to be used in the FRET biosensor. To solve this problem, we took advantage of kinase-interacting substrate screening (KISS) technology, which deduces a consensus substrate sequence for the protein kinase of interest. Here, we show that a consensus substrate sequence for ROCK identified by KISS yielded a FRET biosensor for ROCK, named Eevee-ROCK, with high sensitivity and specificity. By treating HeLa cells with inhibitors or siRNAs against ROCK, we show that a substantial part of the basal FRET signal of Eevee-ROCK was derived from the activities of ROCK1 and ROCK2. Eevee-ROCK readily detected ROCK activation by epidermal growth factor, lysophosphatidic acid, and serum. When cells stably expressing Eevee-ROCK were time-lapse imaged for three days, ROCK activity was found to increase after the completion of cytokinesis, concomitant with the spreading of cells. Eevee-ROCK also revealed a gradual increase in ROCK activity during apoptosis. Thus, Eevee-ROCK, which was developed from a substrate sequence predicted by the KISS technology, will pave the way to a better understanding of the function of ROCK in a physiological context.
ChIP-seq analysis of the σE regulon of Salmonella enterica serovar Typhimurium reveals new genes implicated in heat shock and oxidative stress response

DOE PAGES

Li, Jie; Overall, Christopher C.; Johnson, Rudd C.; ...

2015-09-21

The alternative sigma factor σE functions to maintain bacterial homeostasis and membrane integrity in response to extracytoplasmic stress by regulating thousands of genes both directly and indirectly. The transcriptional regulatory network governed by σE in Salmonella and E. coli has been examined using microarrays; however, a genome-wide analysis of σE-binding sites in Salmonella has not yet been reported. We infected macrophages with Salmonella Typhimurium over a select time course. Using chromatin immunoprecipitation followed by high-throughput DNA sequencing (ChIP-seq), 31 σE-binding sites were identified. Seventeen sites were new, which included outer membrane proteins, a quorum-sensing protein, a cell division factor, and a signal transduction modulator. The consensus sequence identified for σE in vivo binding was similar to the one previously reported, except for a conserved G and A between the -35 and -10 regions. One third of the σE-binding sites did not contain the consensus sequence, suggesting there may be alternative mechanisms by which σE modulates transcription. By dissecting direct and indirect modes of σE-mediated regulation, we found that σE activates gene expression through recognition of both canonical and reversed consensus sequences. Lastly, new σE-regulated genes (greA, luxS, ompA and ompX) are shown to be involved in heat shock and oxidative stress responses.
ChIP-seq analysis of the σE regulon of Salmonella enterica serovar Typhimurium reveals new genes implicated in heat shock and oxidative stress response

DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, Jie; Overall, Christopher C.; Johnson, Rudd C.

The alternative sigma factor σE functions to maintain bacterial homeostasis and membrane integrity in response to extracytoplasmic stress by regulating thousands of genes both directly and indirectly. The transcriptional regulatory network governed by σE in Salmonella and E. coli has been examined using microarrays; however, a genome-wide analysis of σE-binding sites in Salmonella has not yet been reported. We infected macrophages with Salmonella Typhimurium over a select time course. Using chromatin immunoprecipitation followed by high-throughput DNA sequencing (ChIP-seq), 31 σE-binding sites were identified. Seventeen sites were new, which included outer membrane proteins, a quorum-sensing protein, a cell division factor, and a signal transduction modulator. The consensus sequence identified for σE in vivo binding was similar to the one previously reported, except for a conserved G and A between the -35 and -10 regions. One third of the σE-binding sites did not contain the consensus sequence, suggesting there may be alternative mechanisms by which σE modulates transcription. By dissecting direct and indirect modes of σE-mediated regulation, we found that σE activates gene expression through recognition of both canonical and reversed consensus sequences. Lastly, new σE-regulated genes (greA, luxS, ompA and ompX) are shown to be involved in heat shock and oxidative stress responses.
Cofactor specificity switch in Shikimate dehydrogenase by rational design and consensus engineering.

PubMed

García-Guevara, Fernando; Bravo, Iris; Martínez-Anaya, Claudia; Segovia, Lorenzo

2017-08-01

Consensus engineering has been used to design more stable variants using the most frequent amino acid at each site of a multiple sequence alignment; sometimes consensus engineering modifies function, but efforts have mainly been focused on studying stability. Here we constructed a consensus Rossmann domain for the Shikimate dehydrogenase enzyme; separately we decided to switch the cofactor specificity through rational design in the Escherichia coli Shikimate dehydrogenase enzyme and then analyzed the effect of consensus mutations on top of our design. We found that consensus mutations closest to the 2' adenine moiety increased the activity in our design. Consensus engineering has been shown to result in more stable proteins and our findings suggest it could also be used as a complementary tool for increasing or modifying enzyme activity during design. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
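The first sentence of this record describes the core operation of consensus engineering: take the most frequent amino acid at each column of a multiple sequence alignment. A minimal Python sketch follows, assuming the alignment is given as equal-length strings with "-" for gaps. The function name and the toy alignment are hypothetical and are not Shikimate dehydrogenase sequences.

```python
from collections import Counter


def consensus_sequence(alignment, gap="-"):
    """Return the most frequent non-gap residue at each alignment column."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    consensus = []
    for column in zip(*alignment):
        counts = Counter(residue for residue in column if residue != gap)
        # Ties go to the residue seen first in the alignment order;
        # a column made only of gaps stays a gap.
        consensus.append(counts.most_common(1)[0][0] if counts else gap)
    return "".join(consensus)


# Hypothetical toy alignment, for illustration only.
toy_alignment = ["MKVLA", "MKILA", "MRVLG", "MKV-A"]
toy_consensus = consensus_sequence(toy_alignment)
print(toy_consensus)
```

The result depends heavily on which sequences are in the alignment and how they are weighted, so a simple majority vote like this is a starting point rather than a design rule.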
σ54-Dependent Response to Nitrogen Limitation and Virulence in Burkholderia cenocepacia Strain H111

PubMed Central

Lardi, Martina; Aguilar, Claudio; Pedrioli, Alessandro; Omasits, Ulrich; Suppiger, Angela; Cárcamo-Oyarce, Gerardo; Schmid, Nadine; Ahrens, Christian H.

2015-01-01

Members of the genus Burkholderia are versatile bacteria capable of colonizing highly diverse environmental niches. In this study, we investigated the global response of the opportunistic pathogen Burkholderia cenocepacia H111 to nitrogen limitation at the transcript and protein expression levels. In addition to a classical response to nitrogen starvation, including the activation of glutamine synthetase, PII proteins, and the two-component regulatory system NtrBC, B. cenocepacia H111 also upregulated polyhydroxybutyrate (PHB) accumulation and exopolysaccharide (EPS) production in response to nitrogen shortage. A search for consensus sequences in promoter regions of nitrogen-responsive genes identified a σ54 consensus sequence. The mapping of the σ54 regulon as well as the characterization of a σ54 mutant suggests an important role of σ54 not only in control of nitrogen metabolism but also in the virulence of this organism. PMID:25841012

Characterization of cis-acting elements required for autorepression of the equine herpesvirus 1 IE gene

PubMed Central

Kim, Seongman; Dai, Gan; O’Callaghan, Dennis J.; Kim, Seong Kee

2012-01-01

The immediate-early protein (IEP), the major regulatory protein encoded by the IE gene of equine herpesvirus 1 (EHV-1), plays a crucial role as both transcription activator and repressor during a productive lytic infection. To investigate the mechanism by which the EHV-1 IEP inhibits its own promoter, IE promoter-luciferase reporter plasmids containing wild-type and mutant IEP-binding site (IEBS) were constructed and used for luciferase reporter assays. The IEP inhibited transcription from its own promoter in the presence of a consensus IEBS (5’-ATCGT-3’) located near the transcription initiation site but did not inhibit when the consensus sequence was deleted. To determine whether the distance between the TATA box and the IEBS affects transcriptional repression, the IEBS was displaced from the original site by the insertion of synthetic DNA sequences. Luciferase reporter assays revealed that the IEP is able to repress its own promoter when the IEBS is located within 26-bp from the TATA box. We also found that the proper orientation and position of the IEBS were required for the repression by the IEP. Interestingly, the level of repression was significantly reduced when a consensus TATA sequence was deleted from the promoter region, indicating that the IEP efficiently inhibits its own promoter in a TATA box-dependent manner. Taken together, these results suggest that the EHV-1 IEP delicately modulates autoregulation of its gene through the consensus IEBS that is near the transcription initiation site and the TATA box. PMID:22265772
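The positional rule reported above (repression when the consensus IEBS lies within 26 bp of the TATA box) can be illustrated with a simple string scan. The sketch below makes several assumptions that are not in the abstract: the TATA box is approximated by the pattern TATAAA, distance is counted from the end of the TATA box to the start of the IEBS, only the given strand is scanned, and the promoter sequences are invented.

```python
IEBS = "ATCGT"    # consensus IEBS quoted in the abstract
TATA = "TATAAA"   # assumed TATA-box pattern for this toy example
MAX_GAP = 26      # distance threshold reported in the abstract (bp)


def find_all(seq, motif):
    """Return every start index of motif in seq, including overlapping hits."""
    starts, i = [], seq.find(motif)
    while i != -1:
        starts.append(i)
        i = seq.find(motif, i + 1)
    return starts


def iebs_near_tata(promoter):
    """List (tata_start, iebs_start, gap) for IEBS copies downstream of a
    TATA box with a gap of at most MAX_GAP bp."""
    hits = []
    for t in find_all(promoter, TATA):
        tata_end = t + len(TATA)
        for s in find_all(promoter, IEBS):
            gap = s - tata_end
            if 0 <= gap <= MAX_GAP:
                hits.append((t, s, gap))
    return hits


# Hypothetical promoters: one with a 10-bp gap, one with a 30-bp gap.
near = "GGC" + TATA + "GCGCGCGCGC" + IEBS + "CC"
far = TATA + "G" * 30 + IEBS
print(iebs_near_tata(near))  # [(3, 19, 10)]
print(iebs_near_tata(far))   # []
```

Because the abstract also reports that orientation matters, a fuller version would treat the reverse-complement strand separately rather than pooling both.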
Mapping protein-protein interactions with phage-displayed combinatorial peptide libraries.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kay, B. K.; Castagnoli, L.; Biosciences Division

This unit describes the process and analysis of affinity selecting bacteriophage M13 from libraries displaying combinatorial peptides fused to either a minor or major capsid protein. Direct affinity selection uses target protein bound to a microtiter plate followed by purification of selected phage by ELISA. Alternatively, there is a bead-based affinity selection method. These methods allow one to readily isolate peptide ligands that bind to a protein target of interest and use the consensus sequence to search proteomic databases for putative interacting proteins.
CABS-fold: server for the de novo and consensus-based prediction of protein structure

PubMed Central

Blaszczyk, Maciej; Jamroz, Michal; Kmiecik, Sebastian; Kolinski, Andrzej

2013-01-01

The CABS-fold web server provides tools for protein structure prediction from sequence only (de novo modeling) and also using alternative templates (consensus modeling). The web server is based on the CABS modeling procedures ranked in previous Critical Assessment of techniques for protein Structure Prediction competitions as one of the leading approaches for de novo and template-based modeling. In addition to template data, fragmentary distance restraints can also be incorporated into the modeling process. The web server output is a coarse-grained trajectory of generated conformations, its Jmol representation and predicted models in all-atom resolution (together with accompanying analysis). CABS-fold can be freely accessed at http://biocomp.chem.uw.edu.pl/CABSfold. PMID:23748950
A conserved mechanism for replication origin recognition and binding in archaea.

PubMed

Majerník, Alan I; Chong, James P J

2008-01-15

To date, methanogens are the only group within the archaea where firing DNA replication origins have not been demonstrated in vivo. In the present study we show that a previously identified cluster of ORB (origin recognition box) sequences do indeed function as an origin of replication in vivo in the archaeon Methanothermobacter thermautotrophicus. Although the consensus sequence of ORBs in M. thermautotrophicus is somewhat conserved when compared with ORB sequences in other archaea, the Cdc6-1 protein from M. thermautotrophicus (termed MthCdc6-1) displays sequence-specific binding that is selective for the MthORB sequence and does not recognize ORBs from other archaeal species. Stabilization of in vitro MthORB DNA binding by MthCdc6-1 requires additional conserved sequences 3' to those originally described for M. thermautotrophicus. By testing synthetic sequences bearing mutations in the MthORB consensus sequence, we show that Cdc6/ORB binding is critically dependent on the presence of an invariant guanine found in all archaeal ORB sequences. Mutation of a universally conserved arginine residue in the recognition helix of the winged helix domain of archaeal Cdc6-1 shows that specific origin sequence recognition is dependent on the interaction of this arginine residue with the invariant guanine. Recognition of a mutated origin sequence can be achieved by mutation of the conserved arginine residue to a lysine or glutamine residue. Thus despite a number of differences in protein and DNA sequences between species, the mechanism of origin recognition and binding appears to be conserved throughout the archaea.
[Screening specific recognition motif of RNA-binding proteins by SELEX in combination with next-generation sequencing technique].

PubMed

Zhang, Lu; Xu, Jinhao; Ma, Jinbiao

2016-07-25

RNA-binding proteins exert important biological functions by specifically recognizing RNA motifs. SELEX (systematic evolution of ligands by exponential enrichment), an in vitro selection method, can obtain consensus motifs with high affinity and specificity for many target molecules from DNA or RNA libraries. Here, we combined SELEX with next-generation sequencing to study protein-RNA interactions in vitro. A pool of RNAs containing 20-nt random sequences was transcribed from a T7 promoter, and the gene for the target protein was inserted into a plasmid containing an SBP tag, so that the tagged protein can be captured by streptavidin beads. The specific RNA motif could be obtained after only one cycle, which dramatically improved selection efficiency. Using this method, we found that the RRM domains of human hnRNP A1 (the UP1 domain) bound RNA motifs containing AGG and AG sequences. An EMSA experiment indicated that the hnRNP A1 RRMs could bind the obtained RNA motif. Taken together, this approach provides a rapid and effective way to study the RNA-binding specificity of proteins.
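One common way to turn sequenced SELEX pools into a candidate motif is to compare k-mer frequencies in the selected reads against the starting library. The abstract does not describe the authors' analysis, so the sketch below is an assumption throughout: the reads are invented, k = 3 is arbitrary, and the pseudocount only serves to avoid division by zero.

```python
from collections import Counter


def kmer_freqs(reads, k):
    """Relative frequency of every k-mer across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def enrichment(selected, background, k=3, pseudo=1e-3):
    """Ratio of k-mer frequency in selected reads to the background library."""
    sel = kmer_freqs(selected, k)
    bg = kmer_freqs(background, k)
    return {
        kmer: (sel.get(kmer, 0.0) + pseudo) / (bg.get(kmer, 0.0) + pseudo)
        for kmer in set(sel) | set(bg)
    }


# Hypothetical reads, for illustration only.
selected_reads = ["UAGGGA", "CAGGUA", "UUAGGC"]
library_reads = ["UACGUA", "CCGAUU", "GUACGA"]

scores = enrichment(selected_reads, library_reads)
top = max(scores, key=scores.get)
print(top)  # AGG
```

With real data the top k-mers would usually be aligned and merged into a position weight matrix before calling a consensus motif.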
New consensus nomenclature for mammalian keratins

PubMed Central

Schweizer, Jürgen; Bowden, Paul E.; Coulombe, Pierre A.; Langbein, Lutz; Lane, E. Birgitte; Magin, Thomas M.; Maltais, Lois; Omary, M. Bishr; Parry, David A.D.; Rogers, Michael A.; Wright, Mathew W.

2006-01-01

Keratins are intermediate filament–forming proteins that provide mechanical support and fulfill a variety of additional functions in epithelial cells. In 1982, a nomenclature was devised to name the keratin proteins that were known at that point. The systematic sequencing of the human genome in recent years uncovered the existence of several novel keratin genes and their encoded proteins. Their naming could not be adequately handled in the context of the original system. We propose a new consensus nomenclature for keratin genes and proteins that relies upon and extends the 1982 system and adheres to the guidelines issued by the Human and Mouse Genome Nomenclature Committees. This revised nomenclature accommodates functional genes and pseudogenes, and although designed specifically for the full complement of human keratins, it offers the flexibility needed to incorporate additional keratins from other mammalian species. PMID:16831889
Sequence and structural implications of a bovine corneal keratan sulfate proteoglycan core protein. Protein 37B represents bovine lumican and proteins 37A and 25 are unique

NASA Technical Reports Server (NTRS)

Funderburgh, J. L.; Funderburgh, M. L.; Brown, S. J.; Vergnes, J. P.; Hassell, J. R.; Mann, M. M.; Conrad, G. W.; Spooner, B. S. (Principal Investigator)

1993-01-01

Amino acid sequence from tryptic peptides of three different bovine corneal keratan sulfate proteoglycan (KSPG) core proteins (designated 37A, 37B, and 25) showed similarities to the sequence of a chicken KSPG core protein lumican. Bovine lumican cDNA was isolated from a bovine corneal expression library by screening with chicken lumican cDNA. The bovine cDNA codes for a 342-amino acid protein, M(r) 38,712, containing amino acid sequences identified in the 37B KSPG core protein. The bovine lumican is 68% identical to chicken lumican, with an 83% identity excluding the N-terminal 40 amino acids. The locations of 6 cysteines and 4 consensus N-glycosylation sites in the bovine sequence were identical to those in chicken lumican. Bovine lumican had about 50% identity to bovine fibromodulin and 20% identity to bovine decorin and biglycan. About two-thirds of the lumican protein consists of a series of 10 amino acid leucine-rich repeats that occur in regions of calculated high beta-hydrophobic moment, suggesting that the leucine-rich repeats contribute to beta-sheet formation in these proteins. Sequences obtained from 37A and 25 core proteins were absent in bovine lumican, thus predicting a unique primary structure and separate mRNA for each of the three bovine KSPG core proteins.
MobiDB-lite: fast and highly specific consensus prediction of intrinsic disorder in proteins.

PubMed

Necci, Marco; Piovesan, Damiano; Dosztányi, Zsuzsanna; Tosatto, Silvio C E

2017-05-01

Intrinsic disorder (ID) is established as an important feature of protein sequences. Its use in proteome annotation is however hampered by the availability of many methods with similar performance at the single residue level, which have mostly not been optimized to predict long ID regions of size comparable to domains. Here, we have focused on providing a single consensus-based prediction, MobiDB-lite, optimized for highly specific (i.e. few false positive) predictions of long disorder. The method uses eight different predictors to derive a consensus which is then filtered for spurious short predictions. Consensus prediction is shown to outperform the single methods when annotating long ID regions. MobiDB-lite can be useful in large-scale annotation scenarios and has indeed already been integrated in the MobiDB, DisProt and InterPro databases. MobiDB-lite is available as part of the MobiDB database from URL: http://mobidb.bio.unipd.it/. An executable can be downloaded from URL: http://protein.bio.unipd.it/mobidblite/. Contact: silvio.tosatto@unipd.it. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
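The two steps named in the abstract, deriving a consensus from several predictors and then filtering out spurious short predictions, can be sketched as a per-residue vote followed by a length filter. The vote threshold and minimum region length below are illustrative assumptions, as are the three toy predictors; the abstract states neither the actual cut-offs nor the smoothing used by MobiDB-lite.

```python
def consensus_disorder(predictions, min_votes, min_length):
    """Combine binary per-residue disorder calls from several predictors.

    predictions: list of equal-length lists of 0/1 calls, one per predictor.
    Returns half-open (start, end) regions where at least min_votes
    predictors agree and the region spans at least min_length residues.
    """
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all predictors must cover the same residues")
    votes = [sum(p[i] for p in predictions) for i in range(n)]
    calls = [v >= min_votes for v in votes]

    regions, start = [], None
    # A trailing False closes any region that runs to the end of the protein.
    for i, called in enumerate(calls + [False]):
        if called and start is None:
            start = i
        elif not called and start is not None:
            if i - start >= min_length:  # drop spurious short stretches
                regions.append((start, i))
            start = None
    return regions


# Hypothetical calls from three predictors over a 12-residue toy protein.
toy_predictions = [
    [0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1],
]
print(consensus_disorder(toy_predictions, min_votes=2, min_length=4))  # [(4, 9)]
```

The single-residue agreement at position 1 is discarded by the length filter, which is the behaviour that favours specificity for long disordered regions.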
A synthetic promoter library for constitutive gene expression in Lactobacillus plantarum.

PubMed

Rud, Ida; Jensen, Peter Ruhdal; Naterstad, Kristine; Axelsson, Lars

2006-04-01

A synthetic promoter library (SPL) for Lactobacillus plantarum has been developed, which generalizes the approach for obtaining synthetic promoters. The consensus sequence, derived from rRNA promoters extracted from the L. plantarum WCFS1 genome, was kept constant, and the non-consensus sequences were randomized. Construction of the SPL was performed in a vector (pSIP409) previously developed for high-level, inducible gene expression in L. plantarum and Lactobacillus sakei. A wide range of promoter strengths was obtained with the approach, covering 3-4 logs of expression levels in small increments of activity. The SPL was evaluated for the ability to drive beta-glucuronidase (GusA) and aminopeptidase N (PepN) expression. Protein production from the synthetic promoters was constitutive, and the most potent promoters gave high protein production with levels comparable to those of native rRNA promoters, and production of PepN protein corresponding to approximately 10-15 % of the total cellular protein. High correlation was obtained between the activities of promoters when tested in L. sakei and L. plantarum, which indicates the potential of the SPL for other Lactobacillus species. The SPL enables fine-tuning of stable gene expression for various applications in L. plantarum.
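The library design described above, fixed consensus positions with randomized non-consensus positions, can be illustrated by expanding a template in which N marks a randomized base. The template below is purely hypothetical: it uses the textbook bacterial -35 (TTGACA) and -10 (TATAAT) hexamers with a 17-bp spacer, not the L. plantarum rRNA-derived consensus, which the abstract does not give.

```python
import random

# Hypothetical template: upper-case bases are held constant, N is randomized.
TEMPLATE = "NNNNN" + "TTGACA" + "N" * 17 + "TATAAT" + "NNNNNN"


def make_library(template, size, seed=0):
    """Generate `size` promoter variants by filling each N with a random base."""
    rng = random.Random(seed)  # seeded so the toy library is reproducible
    return [
        "".join(rng.choice("ACGT") if base == "N" else base for base in template)
        for _ in range(size)
    ]


library = make_library(TEMPLATE, size=5)
for promoter in library:
    print(promoter)
```

In the laboratory the same effect is obtained by ordering degenerate oligonucleotides; the code only makes explicit which positions vary between clones.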
Identification of the regulatory autophosphorylation site of autophosphorylation-dependent protein kinase (auto-kinase). Evidence that auto-kinase belongs to a member of the p21-activated kinase family.

PubMed Central

Yu, J S; Chen, W J; Ni, M H; Chan, W H; Yang, S D

1998-01-01

Autophosphorylation-dependent protein kinase (auto-kinase) was identified from pig brain and liver on the basis of its unique autophosphorylation/activation property [Yang, Fong, Yu and Liu (1987) J. Biol. Chem. 262, 7034-7040; Yang, Chang and Soderling (1987) J. Biol. Chem. 262, 9421-9427]. Its substrate consensus sequence motif was determined as being -R-X-(X)-S*/T*-X3-S/T-. To characterize auto-kinase further, we partly sequenced the kinase purified from pig liver. The N-terminal sequence (VDGGAKTSDKQKKKAXMTDE) and two internal peptide sequences (EKLRTIV and LQNPEK/ILTP/FI) of auto-kinase were obtained. These sequences identify auto-kinase as a C-terminal catalytic fragment of p21-activated protein kinase 2 (PAK2 or gamma-PAK) lacking its N-terminal regulatory region. Auto-kinase can be recognized by an antibody raised against the C-terminal peptide of human PAK2 by immunoblotting. Furthermore the autophosphorylation site sequence of auto-kinase was successfully predicted on the basis of its substrate consensus sequence motif and the known PAK2 sequence, and was further demonstrated to be RST(P)MVGTPYWMAPEVVTR by phosphoamino acid analysis, manual Edman degradation and phosphopeptide mapping via the help of phosphorylation site analysis of a synthetic peptide corresponding to the sequence of PAK2 from residues 396 to 418. During the activation process, auto-kinase autophosphorylates mainly on a single threonine residue Thr402 (according to the sequence numbering of human PAK2). In addition, a phospho-specific antibody against a synthetic phosphopeptide containing this identified sequence was generated and shown to be able to differentially recognize the activated auto-kinase autophosphorylated at Thr402 but not the non-phosphorylated/inactive auto-kinase. Immunoblot analysis with this phospho-specific antibody further revealed that the change in phosphorylation level of Thr402 of auto-kinase was well correlated with the activity change of the kinase during both autophosphorylation/activation and protein phosphatase-mediated dephosphorylation/inactivation processes. Taken together, our results identify Thr402 as the regulatory autophosphorylation site of auto-kinase, which is a C-terminal catalytic fragment of PAK2. PMID:9693111
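The substrate consensus motif quoted in the abstract, -R-X-(X)-S*/T*-X3-S/T-, translates naturally into a regular expression, which shows how a phosphorylation site could be predicted from a known sequence. The sketch below is one possible reading of that notation (X as any residue, (X) as optional, X3 as exactly three residues) and is applied only to the peptide given in the abstract with the phosphate marker removed; it is not the authors' prediction procedure.

```python
import re

# R, one or two arbitrary residues, the phospho-acceptor S/T, three arbitrary
# residues, then S/T. The lookahead lets overlapping candidates be reported.
MOTIF = re.compile(r"(?=(R(?P<spacer>.{1,2})(?P<site>[ST]).{3}[ST]))")


def find_sites(sequence):
    """Return (motif_start, matched_text, acceptor_index) for each hit.

    Indices are 0-based positions within the supplied sequence.
    """
    return [
        (m.start(), m.group(1), m.start("site"))
        for m in MOTIF.finditer(sequence)
    ]


# Peptide from the abstract, written without the (P) phosphate annotation.
peptide = "RSTMVGTPYWMAPEVVTR"
print(find_sites(peptide))  # [(0, 'RSTMVGT', 2)]
```

The single hit places the acceptor on the threonine that follows Arg-Ser, which agrees with the phosphorylated threonine marked T(P) in the peptide reported above.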
DeepText2GO: Improving large-scale protein function prediction with deep semantic text representation.

PubMed

You, Ronghui; Huang, Xiaodi; Zhu, Shanfeng

2018-06-06

As of April 2018, UniProtKB has collected more than 115 million protein sequences. Less than 0.15% of these proteins, however, have been associated with experimental GO annotations. As such, the use of automatic protein function prediction (AFP) to reduce this huge gap becomes increasingly important. The previous studies conclude that sequence homology based methods are highly effective in AFP. In addition, mining motif, domain, and functional information from protein sequences has been found very helpful for AFP. Other than sequences, alternative information sources such as text, however, may be useful for AFP as well. Instead of using BOW (bag of words) representation in traditional text-based AFP, we propose a new method called DeepText2GO that relies on deep semantic text representation, together with different kinds of available protein information such as sequence homology, families, domains, and motifs, to improve large-scale AFP. Furthermore, DeepText2GO integrates text-based methods with sequence-based ones by means of a consensus approach. Extensive experiments on the benchmark dataset extracted from UniProt/SwissProt have demonstrated that DeepText2GO significantly outperformed both text-based and sequence-based methods, validating its superiority. Copyright © 2018 Elsevier Inc. All rights reserved.
CHROMA: consensus-based colouring of multiple alignments for publication.

PubMed

Goodstadt, L; Ponting, C P

2001-09-01

CHROMA annotates multiple protein sequence alignments by consensus to produce formatted and coloured text suitable for incorporation into other documents for publication. The package is designed to be flexible and reliable, and has a simple-to-use graphical user interface running under Microsoft Windows. Both the executables and source code for CHROMA running under Windows and Linux (portable command-line only) are freely available at http://www.lg.ndirect.co.uk/chroma. Software enquiries should be directed to CHROMA@lg.ndirect.co.uk.
RNA-Seq analysis and transcriptome assembly for blackberry (Rubus sp. Var. Lochness) fruit.

PubMed

Garcia-Seco, Daniel; Zhang, Yang; Gutierrez-Mañero, Francisco J; Martin, Cathie; Ramos-Solano, Beatriz

2015-01-22

There is an increasing interest in berries, especially blackberries in the diet, because of recent reports of their health benefits due to their high content of flavonoids. A broad range of genomic tools are available for other Rosaceae species but these tools are still lacking in the Rubus genus, thus limiting gene discovery and the breeding of improved varieties. De novo RNA-seq of ripe blackberries grown under field conditions was performed using Illumina Hiseq 2000. Almost 9 billion nucleotide bases were sequenced in total. Following assembly, 42,062 consensus sequences were detected. For functional annotation, 33,040 (NR), 32,762 (NT), 21,932 (Swiss-Prot), 20,134 (KEGG), 13,676 (COG), 24,168 (GO) consensus sequences were annotated using different databases; in total 34,552 annotated sequences were identified. For protein prediction analysis, the number of coding DNA sequences (CDS) that mapped to the protein database was 32,540. Non-redundant (NR) annotation showed that 25,418 genes (73.5%) have the highest similarity with Fragaria vesca subspecies vesca. Reanalysis was undertaken by aligning the reads with this reference genome for a deeper analysis of the transcriptome. We demonstrated that de novo assembly using Trinity, with later annotation by Blast against different databases, was complementary to alignment to the reference sequence using SOAPaligner/SOAP2. The Fragaria reference genome belongs to a species in the same family as blackberry (Rosaceae) but to a different genus. Since blackberries are tetraploids, the possibility of artefactual gene chimeras resulting from mis-assembly was tested with one of the genes sequenced by RNAseq, Chalcone Synthase (CHS). cDNAs encoding this protein were cloned and sequenced. Primers designed to the assembled sequences accurately distinguished different contigs, at least for chalcone synthase genes. We prepared and analysed transcriptome data from ripe blackberries, for which prior genomic information was limited. This new sequence information will improve the knowledge of this important and healthy fruit, providing an invaluable new tool for biological research.
Brain cDNA clone for human cholinesterase

DOE Office of Scientific and Technical Information (OSTI.GOV)

McTiernan, C.; Adkins, S.; Chatonnet, A.

1987-10-01

A cDNA library from human basal ganglia was screened with oligonucleotide probes corresponding to portions of the amino acid sequence of human serum cholinesterase. Five overlapping clones, representing 2.4 kilobases, were isolated. The sequenced cDNA contained 207 base pairs of coding sequence 5' to the amino terminus of the mature protein in which there were four ATG translation start sites in the same reading frame as the protein. Only the ATG coding for Met-(-28) lay within a favorable consensus sequence for functional initiators. There were 1722 base pairs of coding sequence corresponding to the protein found circulating in human serum. The amino acid sequence deduced from the cDNA exactly matched the 574 amino acid sequence of human serum cholinesterase, as previously determined by Edman degradation. Therefore, our clones represented cholinesterase rather than acetylcholinesterase. It was concluded that the amino acid sequences of cholinesterase from two different tissues, human brain and human serum, were identical. Hybridization of genomic DNA blots suggested that a single gene, or very few genes, coded for cholinesterase.
Revisiting and re-engineering the classical zinc finger peptide: consensus peptide-1 (CP-1).

PubMed

Besold, Angelique N; Widger, Leland R; Namuswe, Frances; Michalek, Jamie L; Michel, Sarah L J; Goldberg, David P

2016-04-01

Zinc plays key structural and catalytic roles in biology. Structural zinc sites are often referred to as zinc finger (ZF) sites, and the classical ZF contains a Cys2His2 motif that is involved in coordinating Zn(II). An optimized Cys2His2 ZF, named consensus peptide 1 (CP-1), was identified more than 20 years ago using a limited set of sequenced proteins. We have reexamined the CP-1 sequence, using our current, much larger database of sequenced proteins that have been identified from high-throughput sequencing methods, and found the sequence to be largely unchanged. The CCHH ligand set of CP-1 was then altered to a CAHH motif to impart hydrolytic activity. This ligand set mimics the His2Cys ligand set of peptide deformylase (PDF), a hydrolytically active M(II)-centered (M = Zn or Fe) protein. The resultant peptide [CP-1(CAHH)] was evaluated for its ability to coordinate Zn(II) and Co(II) ions, adopt secondary structure, and promote hydrolysis. CP-1(CAHH) was found to coordinate Co(II) and Zn(II) and a pentacoordinate geometry for Co(II)-CP-1(CAHH) was implicated from UV-vis data. This suggests a His2Cys(H2O)2 environment at the metal center. The Zn(II)-bound CP-1(CAHH) was shown to adopt partial secondary structure by 1-D ¹H NMR spectroscopy. Both Zn(II)-CP-1(CAHH) and Co(II)-CP-1(CAHH) show good hydrolytic activity toward the test substrate 4-nitrophenyl acetate, exhibiting faster rates than most active synthetic Zn(II) complexes.
hPDI: a database of experimental human protein-DNA interactions.

PubMed

Xie, Zhi; Hu, Shaohui; Blackshaw, Seth; Zhu, Heng; Qian, Jiang

2010-01-15

The human protein DNA Interactome (hPDI) database holds experimental protein-DNA interaction data for humans identified by protein microarray assays. The unique characteristic of hPDI is that it contains consensus DNA-binding sequences not only for nearly 500 human transcription factors but also for >500 unconventional DNA-binding proteins, which were previously completely uncharacterized. Users can browse, search and download a subset or the entire data set via a web interface. This database is freely accessible for any academic purposes. http://bioinfo.wilmer.jhu.edu/PDI/.
Screening of matrix metalloproteinases available from the protein data bank: insights into biological functions, domain organization, and zinc binding groups.

PubMed

Nicolotti, Orazio; Miscioscia, Teresa Fabiola; Leonetti, Francesco; Muncipinto, Giovanni; Carotti, Angelo

2007-01-01

A total of 142 matrix metalloproteinase (MMP) X-ray crystallographic structures were retrieved from the Protein Data Bank (PDB) and analyzed by an automated and efficient routine, developed in-house, with a series of bioinformatic tools. Highly informative heat maps and hierarchical clusterograms provided a reliable and comprehensive representation of the relationships existing among MMPs, enlarging and complementing the current knowledge in the field. Multiple sequence and structural alignments permitted better location and display of key MMP motifs and quantification of the residue consensus at each amino acid position in the most critical binding subsites of MMPs. The MMP active site consensus sequences, the C-alpha root-mean-square deviation (RMSd) analysis of diverse enzymatic subsites, and the examination of the chemical nature, binding topologies, and zinc binding groups (ZBGs) of ligands extracted from crystallographic complexes provided useful insights on the structural arrangements of the most potent MMP inhibitors.

Transcriptional activation of the Escherichia coli adaptive response gene aidB is mediated by binding of methylated Ada protein. Evidence for a new consensus sequence for Ada-binding sites.

PubMed

Landini, P; Volkert, M R

1995-04-07

The Escherichia coli aidB gene is part of the adaptive response to DNA methylation damage. Genes belonging to the adaptive response are positively regulated by the ada gene; the Ada protein acts as a transcriptional activator when methylated in one of its cysteine residues at position 69. Through DNaseI protection assays, we show that methylated Ada (meAda) is able to bind a DNA sequence between 40 and 60 base pairs upstream of the aidB transcriptional startpoint. Binding of meAda is necessary to activate transcription of the adaptive response genes; accordingly, in vitro transcription of aidB is dependent on the presence of meAda. Unmethylated Ada protein shows no protection against DNaseI digestion in the aidB promoter region nor does it promote aidB in vitro transcription. The aidB Ada-binding site shows only weak homology to the proposed consensus sequences for Ada-binding sites in E. coli (AAANNAA and AAAGCGCA) but shares a higher degree of similarity with the Ada-binding regions from other bacterial species, such as Salmonella typhimurium and Bacillus subtilis. Based on the comparison of five different Ada-dependent promoter regions, we suggest that a possible recognition sequence for meAda might be AATnnnnnnG-CAA. Higher concentrations of Ada are required for the binding of aidB than for the ada promoter, suggesting lower affinity of the protein for the aidB Ada-binding site. Common features in the Ada-binding regions of ada and aidB are a high A/T content, the presence of an inverted repeat structure, and their position relative to the transcriptional start site. We propose that these elements, in addition to the proposed recognition sequence, are important for binding of the Ada protein.
STAT1:DNA sequence-dependent binding modulation by phosphorylation, protein:protein interactions and small-molecule inhibition

PubMed Central

Bonham, Andrew J.; Wenta, Nikola; Osslund, Leah M.; Prussin, Aaron J.; Vinkemeier, Uwe; Reich, Norbert O.

2013-01-01

The DNA-binding specificity and affinity of the dimeric human transcription factor (TF) STAT1 were assessed by total internal reflectance fluorescence protein-binding microarrays (TIRF-PBM) to evaluate the effects of protein phosphorylation, higher-order polymerization and small-molecule inhibition. Active, phosphorylated STAT1 showed binding preferences consistent with prior characterization, whereas unphosphorylated STAT1 showed a weak-binding preference for one-half of the GAS consensus site, consistent with recent models of STAT1 structure and function in response to phosphorylation. This altered-binding preference was further tested by use of the inhibitor LLL3, which we show to disrupt STAT1 binding in a sequence-dependent fashion. To determine if this sequence-dependence is specific to STAT1 and not a general feature of human TF biology, the TF Myc/Max was analysed and tested with the inhibitor Mycro3. Myc/Max inhibition by Mycro3 is sequence independent, suggesting that the sequence-dependent inhibition of STAT1 may be specific to this system and a useful target for future inhibitor design. PMID:23180800
Human adenovirus serotype 12 virion precursors pMu and pVI are cleaved at amino-terminal and carboxy-terminal sites that conform to the adenovirus 2 endoproteinase cleavage consensus sequence.

PubMed

Freimuth, P; Anderson, C W

1993-03-01

The sequence of a 1158-base pair fragment of the human adenovirus serotype 12 (Ad12) genome was determined. This segment encodes the precursors for virion components Mu and VI. Both Ad12 precursors contain two sequences that conform to a consensus sequence motif for cleavage by the endoproteinase of adenovirus 2 (Ad2). Analysis of the amino terminus of VI and of the peptide fragments found in Ad12 virions demonstrated that these sites are cleaved during Ad12 maturation. This observation suggests that the recognition motif for adenovirus endoproteinases is highly conserved among human serotypes. The adenovirus 2 endoproteinase polypeptide requires additional co-factors for activity (C. W. Anderson, Protein Expression Purif., 1993, 4, 8-15). Synthetic Ad12 or Ad2 pVI carboxy-terminal peptides each permitted efficient cleavage of an artificial endoproteinase substrate by recombinant Ad2 endoproteinase polypeptide.
Novel isoprenylated proteins identified by an expression library screen.

PubMed

Biermann, B J; Morehead, T A; Tate, S E; Price, J R; Randall, S K; Crowell, D N

1994-10-14

Isoprenylated proteins are involved in eukaryotic cell growth and signal transduction. The protein determinant for prenylation is a short carboxyl-terminal motif containing a cysteine, to which the isoprenoid is covalently attached via thioether linkage. To date, isoprenylated proteins have almost all been identified by demonstrating the attachment of an isoprenoid to previously known proteins. Thus, many isoprenylated proteins probably remain undiscovered. To identify novel isoprenylated proteins for subsequent biochemical study, colony blots of a Glycine max cDNA expression library were [3H]farnesyl-labeled in vitro. Proteins identified by this screen contained several different carboxyl termini that conform to consensus farnesylation motifs. These proteins included known farnesylated proteins (DnaJ homologs) and several novel proteins, two of which contained six or more tandem repeats of a hexapeptide having the consensus sequence (E/G)(G/P)EK(P/K)K. Thus, plants contain a diverse array of genes encoding farnesylated proteins, and our results indicate that fundamental differences in the identities of farnesylated proteins may exist between plants and other eukaryotes. Expression library screening by direct labeling can be adapted to identify isoprenylated proteins from other organisms, as well as proteins with other post-translational modifications.
Mechanisms of radiation-induced gene responses

DOE Office of Scientific and Technical Information (OSTI.GOV)

Woloschak, G.E.; Paunesku, T.

1996-10-01

In the process of identifying genes differentially expressed in cells exposed to ultraviolet radiation, we have identified a transcript having a 26-bp region that is highly conserved in a variety of species including Bacillus circulans, yeast, pumpkin, Drosophila, mouse, and man. When in the 5′ region (flanking region or UTR) of a gene, the sequence is predominantly in +/+ orientation with respect to the coding DNA strand; while in the coding region and the 3′ region (UTR), the sequence is most frequently in the +/- orientation with respect to the coding DNA strand. In two genes, the element is split into two parts; however, in most cases, it is found only once but with a minimum of 11 consecutive nucleotides precisely depicting the original sequence. The element is found in a large number of different genes with diverse functions (from human ras p21 to B. circulans chitinase). Gel shift assays demonstrated the presence of a protein in HeLa cell extracts that binds to the sense and antisense single-stranded consensus oligomers, as well as to the double-stranded oligonucleotide. When double-stranded oligomer was used, the size shift demonstrated an additional protein-oligomer complex larger than the one bound to either sense or antisense single-stranded consensus oligomers alone. It is speculated either that this element binds to protein(s) important in maintaining DNA in a single-stranded orientation for transcription or, alternatively, that this element is important in the transcription-coupled DNA repair process.
Cloning and Expression of the Erwinia carotovora subsp. carotovora Gene Encoding the Low-Molecular-Weight Bacteriocin Carocin S1

PubMed Central

Chuang, Duen-yau; Chien, Yung-chei; Wu, Huang-Pin

2007-01-01

The purpose of this study was to clone the carocin S1 gene and express it in a non-carocin-producing strain of Erwinia carotovora. A mutant, TH22-10, which produced a high-molecular-weight bacteriocin but not a low-molecular-weight bacteriocin, was obtained by Tn5 insertional mutagenesis using H-rif-8-2 (a spontaneous rifampin-resistant mutant of Erwinia carotovora subsp. carotovora 89-H-4). Using thermal asymmetric interlaced PCR, the DNA sequence from the Tn5 insertion site and the DNA sequence of the contiguous 2,280-bp region were determined. Two complete open reading frames (ORF), designated ORF2 and ORF3, were identified within the sequence fragment. ORF2 and ORF3 were identified with the carocin S1 genes, caroS1K (ORF2) and caroS1I (ORF3), which, respectively, encode a killing protein (CaroS1K) and an immunity protein (CaroS1I). These genes were homologous to the pyocin S3 gene and the pyocin AP41 gene. Carocin S1 was expressed in E. carotovora subsp. carotovora Ea1068 and replicated in TH22-10 but could not be expressed in Escherichia coli (JM101) because a consensus sequence resembling an SOS box was absent. A putative sequence similar to the consensus sequence for the E. coli cyclic AMP receptor protein binding site (−312 bp) was found upstream of the start codon. Production of this bacteriocin was also induced by glucose and lactose. The homology search results indicated that the carocin S1 gene (between bp 1078 and bp 1704) was homologous to the pyocin S3 and pyocin AP41 genes in Pseudomonas aeruginosa. These genes encode proteins with nuclease activity (domain 4). This study found that carocin S1 also has nuclease activity. PMID:17071754
ClubSub-P: Cluster-Based Subcellular Localization Prediction for Gram-Negative Bacteria and Archaea

PubMed Central

Paramasivam, Nagarajan; Linke, Dirk

2011-01-01

The subcellular localization (SCL) of proteins provides important clues to their function in a cell. In our efforts to predict useful vaccine targets against Gram-negative bacteria, we noticed that misannotated start codons frequently lead to wrongly assigned SCLs. This and other problems in SCL prediction, such as the relatively high false-positive and false-negative rates of some tools, can be avoided by applying multiple prediction tools to groups of homologous proteins. Here we present ClubSub-P, an online database that combines existing SCL prediction tools into a consensus pipeline from more than 600 proteomes of fully sequenced microorganisms. On top of the consensus prediction at the level of single sequences, the tool uses clusters of homologous proteins from Gram-negative bacteria and from Archaea to eliminate false-positive and false-negative predictions. ClubSub-P can assign the SCL of proteins from Gram-negative bacteria and Archaea with high precision. The database is searchable, and can easily be expanded using either new bacterial genomes or new prediction tools as they become available. This will further improve the performance of the SCL prediction, as well as the detection of misannotated start codons and other annotation errors. ClubSub-P is available online at http://toolkit.tuebingen.mpg.de/clubsubp/ PMID:22073040
cWINNOWER Algorithm for Finding Fuzzy DNA Motifs

NASA Technical Reports Server (NTRS)

Liang, S.; Samanta, M. P.; Biegel, B. A.

2004-01-01

The cWINNOWER algorithm detects fuzzy motifs in DNA sequences rich in protein-binding signals. A signal is defined as any short nucleotide pattern having up to d mutations differing from a motif of length l. The algorithm finds such motifs if a clique consisting of a sufficiently large number of mutated copies of the motif (i.e., the signals) is present in the DNA sequence. The cWINNOWER algorithm substantially improves the sensitivity of the winnower method of Pevzner and Sze by imposing a consensus constraint, enabling it to detect much weaker signals. We studied the minimum detectable clique size qc as a function of sequence length N for random sequences. We found that qc increases linearly with N for a fast version of the algorithm based on counting three-member sub-cliques. Imposing consensus constraints reduces qc by a factor of three in this case, which makes the algorithm dramatically more sensitive. Our most sensitive algorithm, which counts four-member sub-cliques, needs a minimum of only 13 signals to detect motifs in a sequence of length N = 12,000 for (l, d) = (15, 4). Copyright Imperial College Press.
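The definition of a signal (an l-mer within d mutations of the motif) can be sketched as follows. This is not the cWINNOWER algorithm itself: it assumes the motif is already known and only shows the signal definition, plus the pairwise condition that follows from it (two signals of the same motif differ at no more than 2d positions). Function names and the toy sequence are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_signals(sequence, motif, d):
    """Return (start, l-mer) for every window within d mutations of the motif."""
    l = len(motif)
    return [
        (i, sequence[i:i + l])
        for i in range(len(sequence) - l + 1)
        if hamming(sequence[i:i + l], motif) <= d
    ]


def mutually_compatible(signals, d):
    """Check that every pair of signals differs at no more than 2*d positions."""
    return all(hamming(a, b) <= 2 * d for (_, a), (_, b) in combinations(signals, 2))


# Invented toy data: one exact copy and one copy with two mutations
toy_sequence = "GG" + "ACGTACGT" + "CC" + "ACCTACGA" + "GG"
signals = find_signals(toy_sequence, "ACGTACGT", d=2)
print(signals, mutually_compatible(signals, d=2))
```

In the motif-finding setting the motif is unknown, so methods of this family start from the pairwise condition and look for groups of mutually compatible l-mers; the clique search and the consensus constraint are beyond this sketch.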
cWINNOWER Algorithm for Finding Fuzzy DNA Motifs

NASA Technical Reports Server (NTRS)

Liang, Shoudan

2003-01-01

The cWINNOWER algorithm detects fuzzy motifs in DNA sequences rich in protein-binding signals. A signal is defined as any short nucleotide pattern having up to d mutations differing from a motif of length l. The algorithm finds such motifs if multiple mutated copies of the motif (i.e., the signals) are present in the DNA sequence in sufficient abundance. The cWINNOWER algorithm substantially improves the sensitivity of the winnower method of Pevzner and Sze by imposing a consensus constraint, enabling it to detect much weaker signals. We studied the minimum number of detectable motifs qc as a function of sequence length N for random sequences. We found that qc increases linearly with N for a fast version of the algorithm based on counting three-member sub-cliques. Imposing consensus constraints reduces qc, by a factor of three in this case, which makes the algorithm dramatically more sensitive. Our most sensitive algorithm, which counts four-member sub-cliques, needs a minimum of only 13 signals to detect motifs in a sequence of length N = 12000 for (l,d) = (15,4).
CODEHOP (COnsensus-DEgenerate Hybrid Oligonucleotide Primer) PCR primer design

PubMed Central

Rose, Timothy M.; Henikoff, Jorja G.; Henikoff, Steven

2003-01-01

We have developed a new primer design strategy for PCR amplification of distantly related gene sequences based on consensus-degenerate hybrid oligonucleotide primers (CODEHOPs). An interactive program has been written to design CODEHOP PCR primers from conserved blocks of amino acids within multiply-aligned protein sequences. Each CODEHOP consists of a pool of related primers containing all possible nucleotide sequences encoding 3–4 highly conserved amino acids within a 3′ degenerate core. A longer 5′ non-degenerate clamp region contains the most probable nucleotide predicted for each flanking codon. CODEHOPs are used in PCR amplification to isolate distantly related sequences encoding the conserved amino acid sequence. The primer design software and the CODEHOP PCR strategy have been utilized for the identification and characterization of new gene orthologs and paralogs in different plant, animal and bacterial species. In addition, this approach has been successful in identifying new pathogen species. The CODEHOP designer (http://blocks.fhcrc.org/codehop.html) is linked to BlockMaker and the Multiple Alignment Processor within the Blocks Database World Wide Web (http://blocks.fhcrc.org). PMID:12824413
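The 3′ degenerate core described above, a pool containing all possible nucleotide sequences encoding a few conserved amino acids, can be enumerated with a short sketch. It assumes the standard genetic code; the names `CODONS` and `degenerate_core` are hypothetical and do not reflect the CODEHOP program's interface. The 5′ consensus clamp is not modelled because it depends on a codon usage table.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...)
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODONS = {}
for (b1, b2, b3), aa in zip(product(BASES, repeat=3), AMINO):
    CODONS.setdefault(aa, []).append(b1 + b2 + b3)


def degenerate_core(peptide):
    """Return every nucleotide sequence that encodes the given short peptide."""
    return ["".join(codons) for codons in product(*(CODONS[aa] for aa in peptide))]


# Pool size grows as the product of codon counts, which is why cores are kept short
print(len(degenerate_core("WDMK")))  # Trp(1) x Asp(2) x Met(1) x Lys(2)
print(degenerate_core("MW"))
```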
Intramolecular control of transcriptional activity by the NK2-specific domain in NK-2 homeodomain proteins

PubMed Central

Watada, Hirotaka; Mirmira, Raghavendra G.; Kalamaras, Julie; German, Michael S.

2000-01-01

The developmentally important homeodomain transcription factors of the NK-2 class contain a highly conserved region, the NK2-specific domain (NK2-SD). The function of this domain, however, remains unknown. The primary structure of the NK2-SD suggests that it might function as an accessory DNA-binding domain or as a protein–protein interaction interface. To assess the possibility that the NK2-SD may contribute to DNA-binding specificity, we used a PCR-based approach to identify a consensus DNA-binding sequence for Nkx2.2, an NK-2 family member involved in pancreas and central nervous system development. The consensus sequence (TCTAAGTGAGCTT) is similar to the known binding sequences for other NK-2 homeodomain proteins, but we show that the NK2-SD does not contribute significantly to specific DNA binding to this sequence. To determine whether the NK2-SD contributes to transactivation, we used GAL4-Nkx2.2 fusion constructs to map a powerful transcriptional activation domain in the C-terminal region beyond the conserved NK2-SD. Interestingly, this C-terminal region functions as a transcriptional activator only in the absence of an intact NK2-SD. The NK2-SD also can mask transactivation from the paired homeodomain transcription factor Pax6, but it has no effect on transcription by itself. These results demonstrate that the NK2-SD functions as an intramolecular regulator of the C-terminal activation domain in Nkx2.2 and support a model in which interactions through the NK2-SD regulate the ability of NK-2-class proteins to activate specific genes during development. PMID:10944215
Mosaic protein and nucleic acid vaccines against hepatitis C virus

DOEpatents

Yusim, Karina; Korber, Bette T. M.; Kuiken, Carla L.; Fischer, William M.

2013-06-11

The invention relates to immunogenic compositions useful as HCV vaccines. Provided are HCV mosaic polypeptide and nucleic acid compositions which provide higher levels of T-cell epitope coverage while minimizing the occurrence of unnatural and rare epitopes compared to natural HCV polypeptides and consensus HCV sequences.
Sequence of a second gene encoding bovine submaxillary mucin: implication for mucin heterogeneity and cloning.

PubMed

Jiang, W; Woitach, J T; Gupta, D; Bhavanandan, V P

1998-10-20

Secreted epithelial mucins are extremely large and heterogeneous glycoproteins. We report the 5 kilobase DNA sequence of a second gene, BSM2, which encodes bovine submaxillary mucin. The determined nucleotide and deduced amino acid sequences of BSM2 are 95.2% and 92.2% identical, respectively, to those of the previously described BSM1 gene isolated from the same cow. Further, the five predicted protein domains of the two genes are 100%, 94%, 93%, 77%, and 88% identical. Based on the above results, we propose that expression of multiple homologous core proteins from a single animal is a factor in generating diversity of saccharides in mucins and in providing resistance of the molecules to proteolysis. In addition, this work raises several important issues in mucin cloning such as assembling sequences from seemingly overlapping clones and deducing consensus sequences for nearly identical tandem repeats. Copyright 1998 Academic Press.
Polypeptide p41 of a Norwalk-Like Virus Is a Nucleic Acid-Independent Nucleoside Triphosphatase

PubMed Central

Pfister, Thomas; Wimmer, Eckard

2001-01-01

Southampton virus (SHV) is a member of the Norwalk-like viruses (NLVs), one of four genera of the family Caliciviridae. The genome of SHV contains three open reading frames (ORFs). ORF 1 encodes a polyprotein that is autocatalytically processed into six proteins, one of which is p41. p41 shares sequence motifs with protein 2C of picornaviruses and superfamily 3 helicases. We have expressed p41 of SHV in bacteria. Purified p41 exhibited nucleoside triphosphate (NTP)-binding and NTP hydrolysis activities. The NTPase activity was not stimulated by single-stranded nucleic acids. SHV p41 had no detectable helicase activity. Protein sequence comparison between the consensus sequences of NLV p41 and enterovirus protein 2C revealed regions of high similarity. According to secondary structure prediction, the conserved regions were located within a putative central domain of alpha helices and beta strands. This study reveals for the first time an NTPase activity associated with a calicivirus-encoded protein. Based on enzymatic properties and sequence information, a functional relationship between NLV p41 and enterovirus 2C is discussed in regard to the role of 2C-like proteins in virus replication. PMID:11160659
Characterization of protein--DNA interactions using surface plasmon resonance spectroscopy with various assay schemes.

PubMed

Teh, Huey Fang; Peh, Wendy Y X; Su, Xiaodi; Thomsen, Jane S

2007-02-27

Specific protein-DNA interactions play a central role in transcription and other biological processes. A comprehensive characterization of protein-DNA interactions should include information about binding affinity, kinetics, sequence specificity, and binding stoichiometry. In this study, we have used surface plasmon resonance spectroscopy (SPR) to study the interactions between human estrogen receptors (ER, alpha and beta subtypes) and estrogen response elements (ERE), with four assay schemes. First, we determined the sequence-dependent receptors' binding capacity by monitoring the binding of ER to various ERE sequences immobilized on a sensor surface (assay format denoted as the direct assay). Second, we screened the relative affinity of ER for various ERE sequences using a competition assay, in which the receptors bind to an ERE-immobilized surface in the presence of competitor ERE sequences. Third, we monitored the assembly of ER-ERE complexes on a SPR surface and thereafter the removal and/or dissociation of the ER (assay scheme denoted as the dissociation assay) to determine the binding stoichiometry. Last, a sandwich assay (ER binding to ERE followed by anti-ER recognition of a specific ER subtype) was performed in an effort to understand how ERalpha and ERbeta may associate and compete when binding to the DNA. With these assay schemes, we reaffirmed that (1) ERalpha is more sensitive than ERbeta to base pair change(s) in the consensus ERE, (2) ERalpha and ERbeta form a heterodimer when they bind to the consensus ERE, and (3) the binding stoichiometry of both ERalpha- and ERbeta-ERE complexes is dependent on salt concentration. With this study, we demonstrate the versatility of the SPR analysis. With the involvement of various assay arrangements, the SPR analysis can be further extended to more than kinetics and affinity study.
Minimotif Miner 3.0: database expansion and significantly improved reduction of false-positive predictions from consensus sequences.

PubMed

Mi, Tian; Merlin, Jerlin Camilus; Deverasetty, Sandeep; Gryk, Michael R; Bill, Travis J; Brooks, Andrew W; Lee, Logan Y; Rathnayake, Viraj; Ross, Christian A; Sargeant, David P; Strong, Christy L; Watts, Paula; Rajasekaran, Sanguthevar; Schiller, Martin R

2012-01-01

Minimotif Miner (MnM available at http://minimotifminer.org or http://mnm.engr.uconn.edu) is an online database for identifying new minimotifs in protein queries. Minimotifs are short contiguous peptide sequences that have a known function in at least one protein. Here we report the third release of the MnM database which has now grown 60-fold to approximately 300,000 minimotifs. Since short minimotifs are by their nature not very complex we also summarize a new set of false-positive filters and linear regression scoring that vastly enhance minimotif prediction accuracy on a test data set. This online database can be used to predict new functions in proteins and causes of disease.
Genotypic and Functional Impact of HIV-1 Adaptation to Its Host Population during the North American Epidemic

PubMed Central

Carlson, Jonathan M.; Chan, Benjamin; Chopera, Denis R.; Brumme, Chanson J.; Markle, Tristan J.; Martin, Eric; Shahid, Aniqa; Anmole, Gursev; Mwimanzi, Philip; Nassab, Pauline; Penney, Kali A.; Rahman, Manal A.; Milloy, M.-J.; Schechter, Martin T.; Markowitz, Martin; Carrington, Mary; Walker, Bruce D.; Wagner, Theresa; Buchbinder, Susan; Fuchs, Jonathan; Koblin, Beryl; Mayer, Kenneth H.; Harrigan, P. Richard; Brockman, Mark A.; Poon, Art F. Y.; Brumme, Zabrina L.

2014-01-01

HLA-restricted immune escape mutations that persist following HIV transmission could gradually spread through the viral population, thereby compromising host antiviral immunity as the epidemic progresses. To assess the extent and phenotypic impact of this phenomenon in an immunogenetically diverse population, we genotypically and functionally compared linked HLA and HIV (Gag/Nef) sequences from 358 historic (1979–1989) and 382 modern (2000–2011) specimens from four key cities in the North American epidemic (New York, Boston, San Francisco, Vancouver). Inferred HIV phylogenies were star-like, with approximately two-fold greater mean pairwise distances in modern versus historic sequences. The reconstructed epidemic ancestral (founder) HIV sequence was essentially identical to the North American subtype B consensus. Consistent with gradual diversification of a “consensus-like” founder virus, the median “background” frequencies of individual HLA-associated polymorphisms in HIV (in individuals lacking the restricting HLA[s]) were ∼2-fold higher in modern versus historic HIV sequences, though these remained notably low overall (e.g. in Gag, medians were 3.7% in the 2000s versus 2.0% in the 1980s). HIV polymorphisms exhibiting the greatest relative spread were those restricted by protective HLAs. Despite these increases, when HIV sequences were analyzed as a whole, their total average burden of polymorphisms that were “pre-adapted” to the average host HLA profile was only ∼2% greater in modern versus historic eras. Furthermore, HLA-associated polymorphisms identified in historic HIV sequences were consistent with those detectable today, with none identified that could explain the few HIV codons where the inferred epidemic ancestor differed from the modern consensus. Results are therefore consistent with slow HIV adaptation to HLA, but at a rate unlikely to yield imminent negative implications for cellular immunity, at least in North America. 
Intriguingly, temporal changes in protein activity of patient-derived Nef (though not Gag) sequences were observed, suggesting functional implications of population-level HIV evolution on certain viral proteins. PMID:24762668
Generation and Analysis of Expressed Sequence Tags from Olea europaea L.

PubMed Central

Ozdemir Ozgenturk, Nehir; Oruç, Fatma; Sezerman, Ugur; Kuçukural, Alper; Vural Korkut, Senay; Toksoz, Feriha; Un, Cemal

2010-01-01

Olive (Olea europaea L.) is an important source of edible oil that originated in the Near East region. In this study, two cDNA libraries were constructed from young olive leaves and immature olive fruits to generate ESTs, discover novel genes, and investigate the function of unknown olive genes. A total of 3840 randomly selected colonies were sequenced for EST collection from both libraries. The 2228 readable sequences from olive leaf and 1506 from olive fruit were assembled into 205 and 69 contigs, respectively, whereas 2478 were singletons. Putative functions of all 2752 differentially expressed unique sequences were assigned by gene homology based on BLAST and annotated using BLAST2GO. While 1339 ESTs show no homology to the database, 2024 ESTs have homology (under 80%) with hypothetical proteins, putative proteins, expressed proteins, and unknown proteins in NCBI-GenBank. Unique gene sequences for 635 ESTs were identified by over 80% homology to genes of known function in other species, which were not previously described in the Olea family. Only 3.1% of the total ESTs showed similarity with the olive database existing in NCBI. The generated EST data and consensus sequences were submitted to NCBI as a valuable resource for functional genome studies of olive. PMID:21197085
Sequence and structural analyses of nuclear export signals in the NESdb database

PubMed Central

Xu, Darui; Farmer, Alicia; Collett, Garen; Grishin, Nick V.; Chook, Yuh Min

2012-01-01

We compiled >200 nuclear export signal (NES)–containing CRM1 cargoes in a database named NESdb. We analyzed the sequences and three-dimensional structures of natural, experimentally identified NESs and of false-positive NESs that were generated from the database in order to identify properties that might distinguish the two groups of sequences. Analyses of amino acid frequencies, sequence logos, and agreement with existing NES consensus sequences revealed strong preferences for the Φ1-X3-Φ2-X2-Φ3-X-Φ4 pattern and for negatively charged amino acids in the nonhydrophobic positions of experimentally identified NESs but not of false positives. Strong preferences against certain hydrophobic amino acids in the hydrophobic positions were also revealed. These findings led to a new and more precise NES consensus. More important, three-dimensional structures are now available for 68 NESs within 56 different cargo proteins. Analyses of these structures showed that experimentally identified NESs are more likely than the false positives to adopt α-helical conformations that transition to loops at their C-termini and more likely to be surface accessible within their protein domains or be present in disordered or unobserved parts of the structures. Such distinguishing features for real NESs might be useful in future NES prediction efforts. Finally, we also tested CRM1-binding of 40 NESs that were found in the 56 structures. We found that 16 of the NES peptides did not bind CRM1, hence illustrating how NESs are easily misidentified. PMID:22833565
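The Φ1-X3-Φ2-X2-Φ3-X-Φ4 spacing pattern can be written as a regular expression, as in the sketch below. The set of residues allowed at the hydrophobic (Φ) positions is an assumption (Leu, Ile, Val, Phe, Met) and is not the refined consensus reported in the study; the test peptide is an illustrative string.

```python
import re

# Assumed hydrophobic residues for the Φ positions; the study's refined set may differ
PHI = "LIVFM"

# Φ1-X3-Φ2-X2-Φ3-X-Φ4, wrapped in a lookahead so overlapping candidates are all reported
NES_PATTERN = re.compile(rf"(?=([{PHI}].{{3}}[{PHI}].{{2}}[{PHI}].[{PHI}]))")


def find_nes_candidates(protein):
    """Return (0-based start, matched 10-residue segment) for each pattern match."""
    return [(m.start(), m.group(1)) for m in NES_PATTERN.finditer(protein)]


# Illustrative peptide containing one segment with the Φ1-X3-Φ2-X2-Φ3-X-Φ4 spacing
print(find_nes_candidates("NSNELALKLAGLDINK"))
```

As the abstract stresses, matches to a sequence pattern alone are easily misidentified; structural context such as helicity and surface accessibility is needed to separate real NESs from false positives.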
Protein consensus-based surface engineering (ProCoS): a computer-assisted method for directed protein evolution.

PubMed

Shivange, Amol V; Hoeffken, Hans Wolfgang; Haefner, Stefan; Schwaneberg, Ulrich

2016-12-01

Protein consensus-based surface engineering (ProCoS) is a simple and efficient method for directed protein evolution combining computational analysis and molecular biology tools to engineer protein surfaces. ProCoS is based on the hypothesis that conserved residues originated from a common ancestor and that these residues are crucial for the function of a protein, whereas highly variable regions (situated on the surface of a protein) can be targeted for surface engineering to maximize performance. ProCoS comprises four main steps: (i) identification of conserved and highly variable regions; (ii) protein sequence design by substituting residues in the highly variable regions, and gene synthesis; (iii) in vitro DNA recombination of synthetic genes; and (iv) screening for active variants. ProCoS is a simple method for surface mutagenesis in which multiple sequence alignment is used for selection of surface residues based on a structural model. To demonstrate the technique's utility for directed evolution, the surface of a phytase enzyme from Yersinia mollaretii (Ymphytase) was subjected to ProCoS. Screening just 1050 clones from ProCoS engineering-guided mutant libraries yielded an enzyme with 34 amino acid substitutions. The surface-engineered Ymphytase exhibited 3.8-fold higher pH stability (at pH 2.8 for 3 h) and retained 40% of the enzyme's specific activity (400 U/mg) compared with the wild-type Ymphytase. The pH stability might be attributed to a significantly increased (20 percentage points; from 9% to 29%) number of negatively charged amino acids on the surface of the engineered phytase.
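Step (i), separating conserved from highly variable positions in a multiple sequence alignment, could be approximated with per-column Shannon entropy, as sketched here. The entropy measure, the 1.0-bit threshold, and the function names are assumptions for illustration; the abstract does not state which variability score ProCoS uses, and the structural model used to select surface residues is not covered.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residue distribution in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def classify_columns(alignment, threshold=1.0):
    """Label each column 'variable' if its entropy exceeds the (assumed) threshold."""
    return [
        "variable" if column_entropy(column) > threshold else "conserved"
        for column in zip(*alignment)
    ]


# Invented toy alignment: only the second position varies
toy_alignment = ["AKDE", "ARDE", "ASDE", "ATDE"]
print(classify_columns(toy_alignment))
```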

Synthetic signal sequences that enable efficient secretory protein production in the yeast Kluyveromyces marxianus.

PubMed

Yarimizu, Tohru; Nakamura, Mikiko; Hoshida, Hisashi; Akada, Rinji

2015-02-14

Targeting of cellular proteins to the extracellular environment is directed by a secretory signal sequence located at the N-terminus of a secretory protein. These signal sequences usually contain an N-terminal basic amino acid followed by a stretch containing hydrophobic residues, although no consensus signal sequence has been identified. In this study, simple modeling of signal sequences was attempted using Gaussia princeps secretory luciferase (GLuc) in the yeast Kluyveromyces marxianus, which allowed comprehensive recombinant gene construction to substitute synthetic signal sequences. Mutational analysis of the GLuc signal sequence revealed that the GLuc hydrophobic peptide length was at the lower limit for effective secretion and that the N-terminal basic residue was indispensable. Deletion of the 16th Glu caused enhanced levels of secreted protein, suggesting that this hydrophilic residue defined the boundary of a hydrophobic peptide stretch. Consequently, we redesigned this domain as a repeat of a single hydrophobic amino acid between the N-terminal Lys and C-terminal Glu. Stretches consisting of Phe, Leu, Ile, or Met were effective for secretion but the number of residues affected secretory activity. A stretch containing sixteen consecutive methionine residues (M16) showed the highest activity; the M16 sequence was therefore utilized for the secretory production of human leukemia inhibitory factor protein in yeast, resulting in enhanced secreted protein yield. We present a new concept for the provision of secretory signal sequence ability in the yeast K. marxianus, determined by the number of residues of a single hydrophobic residue located between N-terminal basic and C-terminal acidic amino acid boundaries.
Improving consensus contact prediction via server correlation reduction.

PubMed

Gao, Xin; Bu, Dongbo; Xu, Jinbo; Li, Ming

2009-05-06

Protein inter-residue contacts play a crucial role in the determination and prediction of protein structures. Previous studies on contact prediction indicate that although template-based consensus methods outperform sequence-based methods on targets with typical templates, such consensus methods perform poorly on new fold targets. However, we find out that even for new fold targets, the models generated by threading programs can contain many true contacts. The challenge is how to identify them. In this paper, we develop an integer linear programming model for consensus contact prediction. In contrast to the simple majority voting method assuming that all the individual servers are equally important and independent, the newly developed method evaluates their correlation by using maximum likelihood estimation and extracts independent latent servers from them by using principal component analysis. An integer linear programming method is then applied to assign a weight to each latent server to maximize the difference between true contacts and false ones. The proposed method is tested on the CASP7 data set. If the top L/5 predicted contacts are evaluated where L is the protein size, the average accuracy is 73%, which is much higher than that of any previously reported study. Moreover, if only the 15 new fold CASP7 targets are considered, our method achieves an average accuracy of 37%, which is much better than that of the majority voting method, SVM-LOMETS, SVM-SEQ, and SAM-T06. These methods demonstrate an average accuracy of 13.0%, 10.8%, 25.8% and 21.2%, respectively. Reducing server correlation and optimally combining independent latent servers show a significant improvement over the traditional consensus methods. This approach can hopefully provide a powerful tool for protein structure refinement and prediction use.
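For orientation, the simple majority-voting baseline and the top-L/5 accuracy measure referred to above can be sketched as follows. This is the baseline the paper compares against, not the integer linear programming method it proposes; the function names and toy predictions are hypothetical.

```python
from collections import Counter


def majority_vote(server_predictions):
    """Rank residue pairs by how many servers predict them (all servers weighted equally)."""
    votes = Counter()
    for contacts in server_predictions:
        # Normalise (i, j) and (j, i) to one key and count each server at most once per pair
        votes.update({tuple(sorted(pair)) for pair in contacts})
    # Most votes first; ties broken by residue indices for a deterministic order
    return sorted(votes, key=lambda pair: (-votes[pair], pair))


def top_l5_accuracy(ranked_contacts, true_contacts, length):
    """Fraction of the top L/5 ranked contacts that are true contacts."""
    top = ranked_contacts[:max(1, length // 5)]
    truth = {tuple(sorted(pair)) for pair in true_contacts}
    return sum(pair in truth for pair in top) / len(top) if top else 0.0


# Invented toy predictions from three servers for a protein of length 10
servers = [[(1, 9), (2, 8), (3, 12)], [(1, 9), (2, 8)], [(9, 1), (4, 10)]]
ranked = majority_vote(servers)
print(ranked, top_l5_accuracy(ranked, [(1, 9), (3, 12)], length=10))
```

Equal weighting treats correlated servers as independent evidence, which is the weakness the paper's latent-server weighting is designed to address.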
DNA sequence requirements for the accurate transcription of a protein-coding plastid gene in a plastid in vitro system from mustard (Sinapis alba L.)

PubMed Central

Link, Gerhard

1984-01-01

A nuclease-treated plastid extract from mustard (Sinapis alba L.) allows efficient transcription of cloned plastid DNA templates. In this in vitro system, the major runoff transcript of the truncated gene for the 32 000 mol. wt. photosystem II protein was accurately initiated from a site close to or identical with the in vivo start site. By using plasmids with deletions in the 5'-flanking region of this gene as templates, a DNA region required for efficient and selective initiation was detected ~28-35 nucleotides upstream of the transcription start site. This region contains the sequence element TTGACA, which matches the consensus sequence for prokaryotic `−35' promoter elements. In the absence of this region, a region ~13-27 nucleotides upstream of the start site still enables a basic level of specific transcription. This second region contains the sequence element TATATAA, which matches the consensus sequence for the `TATA' box of genes transcribed by RNA polymerase II (or B). The region between the `TATA'-like element and the transcription start site is not sufficient but may be required for specific transcription of the plastid gene. This latter region contains the sequence element TATACT, which resembles the prokaryotic `−10' (Pribnow) box. Based on the structural and transcriptional features of the 5' upstream region, a `promoter switch' mechanism is proposed, which may account for the developmentally regulated expression of this plastid gene. PMID:16453540
Analysis of Ribosome Inactivating Protein (RIP): A Bioinformatics Approach

NASA Astrophysics Data System (ADS)

Jothi, G. Edward Gnana; Majilla, G. Sahaya Jose; Subhashini, D.; Deivasigamani, B.

2012-10-01

In spite of the medical advances in recent years, the world is in need of different sources to encounter certain health issues. Ribosome Inactivating Proteins (RIPs) were found to be one among them. In order to get easy access to information about RIPs, there is a need to analyse RIPs towards constructing a database on RIPs. Also, multiple sequence alignment was done towards screening for homologues of significant RIPs from rare sources against RIPs from easily available sources in terms of similarity. Protein sequences were retrieved from SWISS-PROT and further analysed using pairwise and multiple sequence alignment. Analysis shows that 151 RIPs have been characterized to date. Amongst them, there are 87 type I, 37 type II, 1 type III and 25 unknown RIPs. The sequence length information of various RIPs about the availability of full or partial sequence was also found. The multiple sequence alignment of 37 type I RIPs using the online server Multalin indicates the presence of 20 conserved residues. Pairwise alignment and multiple sequence alignment of certain selected RIPs in two groups, namely Group I and Group II, were carried out and the consensus level was found to be 98%, 98% and 90% respectively.
The La-related protein 1-specific domain repurposes HEAT-like repeats to directly bind a 5'TOP sequence.

PubMed

Lahr, Roni M; Mack, Seshat M; Héroux, Annie; Blagden, Sarah P; Bousquet-Antonelli, Cécile; Deragon, Jean-Marc; Berman, Andrea J

2015-09-18

La-related protein 1 (LARP1) regulates the stability of many mRNAs. These include 5'TOPs, mTOR-kinase responsive mRNAs with pyrimidine-rich 5' UTRs, which encode ribosomal proteins and translation factors. We determined that the highly conserved LARP1-specific C-terminal DM15 region of human LARP1 directly binds a 5'TOP sequence. The crystal structure of this DM15 region refined to 1.86 Å resolution has three structurally related and evolutionarily conserved helix-turn-helix modules within each monomer. These motifs resemble HEAT repeats, ubiquitous helical protein-binding structures, but their sequences are inconsistent with consensus sequences of known HEAT modules, suggesting this structure has been repurposed for RNA interactions. A putative mTORC1-recognition sequence sits within a flexible loop C-terminal to these repeats. We also present modelling of pyrimidine-rich single-stranded RNA onto the highly conserved surface of the DM15 region. These studies lay the foundation necessary for proceeding toward a structural mechanism by which LARP1 links mTOR signalling to ribosome biogenesis. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
The La-related protein 1-specific domain repurposes HEAT-like repeats to directly bind a 5'TOP sequence

DOE PAGES

Lahr, Roni M.; Mack, Seshat M.; Heroux, Annie; ...

2015-07-22

La-related protein 1 (LARP1) regulates the stability of many mRNAs. These include 5'TOPs, mTOR-kinase responsive mRNAs with pyrimidine-rich 5' UTRs, which encode ribosomal proteins and translation factors. We determined that the highly conserved LARP1-specific C-terminal DM15 region of human LARP1 directly binds a 5'TOP sequence. The crystal structure of this DM15 region refined to 1.86 Å resolution has three structurally related and evolutionarily conserved helix-turn-helix modules within each monomer. These motifs resemble HEAT repeats, ubiquitous helical protein-binding structures, but their sequences are inconsistent with consensus sequences of known HEAT modules, suggesting this structure has been repurposed for RNA interactions. A putative mTORC1-recognition sequence sits within a flexible loop C-terminal to these repeats. We also present modelling of pyrimidine-rich single-stranded RNA onto the highly conserved surface of the DM15 region. Ultimately, these studies lay the foundation necessary for proceeding toward a structural mechanism by which LARP1 links mTOR signalling to ribosome biogenesis.
Genetic dissection of the consensus sequence for the class 2 and class 3 flagellar promoters

PubMed Central

Wozniak, Christopher E.; Hughes, Kelly T.

2008-01-01

Summary Computational searches for DNA binding sites often utilize consensus sequences. These search models make assumptions that the frequency of a base pair in an alignment relates to the base pair’s importance in binding and presume that base pairs contribute independently to the overall interaction with the DNA binding protein. These two assumptions have generally been found to be accurate for DNA binding sites. However, these assumptions are often not satisfied for promoters, which are involved in additional steps in transcription initiation after RNA polymerase has bound to the DNA. To test these assumptions for the flagellar regulatory hierarchy, class 2 and class 3 flagellar promoters were randomly mutagenized in Salmonella. Important positions were then saturated for mutagenesis and compared to scores calculated from the consensus sequence. Double mutants were constructed to determine how mutations combined for each promoter type. Mutations in the binding site for FlhD4C2, the activator of class 2 promoters, better satisfied the assumptions for the binding model than did mutations in the class 3 promoter, which is recognized by the σ28 transcription factor. These in vivo results indicate that the activator sites within flagellar promoters can be modeled using simple assumptions but that the DNA sequences recognized by the flagellar sigma factor require more complex models. PMID:18486950
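The two modelling assumptions named in this abstract (base frequency in an alignment reflects importance, and positions contribute independently) correspond to an additive per-position score. The sketch below illustrates that idea with a log-odds position frequency matrix; the aligned sites are made-up toy strings, not flagellar promoter data, and the pseudocount and uniform background are assumptions.

```python
import math

BASES = "ACGT"

def position_frequencies(sites, pseudocount=1.0):
    """Per-position base frequencies from equal-length aligned sites."""
    freqs = []
    for i in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        freqs.append({b: c / total for b, c in counts.items()})
    return freqs

def consensus(freqs):
    """Most frequent base at each position."""
    return "".join(max(BASES, key=lambda b: col[b]) for col in freqs)

def score(seq, freqs, background=0.25):
    """Sum of per-position log-odds: additive, so positions are independent."""
    return sum(math.log2(col[b] / background) for b, col in zip(seq, freqs))

# Hypothetical aligned binding sites.
sites = ["TAAA", "TAAG", "TCAA", "TAAA"]
freqs = position_frequencies(sites)
best = consensus(freqs)

# Under this model the penalty of a double mutant is the sum of the
# single-mutant penalties, which is the assumption the study tests.
d1 = score(best, freqs) - score("TCAA", freqs)
d2 = score(best, freqs) - score("GAAA", freqs)
d12 = score(best, freqs) - score("GCAA", freqs)
```

A promoter for which measured double-mutant effects deviate strongly from `d1 + d2` would be one that such a model describes poorly.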
Genetic Variation and Its Reflection on Posttranslational Modifications in Frequency Clock and Mating Type a-1 Proteins in Sordaria fimicola

PubMed Central

Arif, Rabia; Akram, Faiza; Jamil, Tazeen; Lee, Siu Fai

2017-01-01

Posttranslational modifications (PTMs) occur in all essential proteins taking command of their functions. There are many domains inside proteins where modifications take place on side-chains of amino acids through various enzymes to generate different species of proteins. In this manuscript we have, for the first time, predicted posttranslational modifications of frequency clock and mating type a-1 proteins in Sordaria fimicola collected from different sites to see the effect of environment on proteins or various amino acids pickings and their ultimate impact on consensus sequences present in mating type proteins using bioinformatics tools. Furthermore, we have also measured and walked through genomic DNA of various Sordaria strains to determine genetic diversity by genotyping the short sequence repeats (SSRs) of wild strains of S. fimicola collected from contrasting environments of two opposing slopes (harsh and xeric south facing slope and mild north facing slope) of Evolution Canyon (EC), Israel. Based on the whole genome sequence of S. macrospora, we targeted 20 genomic regions in S. fimicola which contain short sequence repeats (SSRs). Our data revealed genetic variations in strains from south facing slope and these findings assist in the hypothesis that genetic variations caused by stressful environments lead to evolution. PMID:28717646
Genetic Variation and Its Reflection on Posttranslational Modifications in Frequency Clock and Mating Type a-1 Proteins in Sordaria fimicola.

PubMed

Arif, Rabia; Akram, Faiza; Jamil, Tazeen; Mukhtar, Hamid; Lee, Siu Fai; Saleem, Muhammad

2017-01-01

Posttranslational modifications (PTMs) occur in all essential proteins taking command of their functions. There are many domains inside proteins where modifications take place on side-chains of amino acids through various enzymes to generate different species of proteins. In this manuscript we have, for the first time, predicted posttranslational modifications of frequency clock and mating type a-1 proteins in Sordaria fimicola collected from different sites to see the effect of environment on proteins or various amino acids pickings and their ultimate impact on consensus sequences present in mating type proteins using bioinformatics tools. Furthermore, we have also measured and walked through genomic DNA of various Sordaria strains to determine genetic diversity by genotyping the short sequence repeats (SSRs) of wild strains of S. fimicola collected from contrasting environments of two opposing slopes (harsh and xeric south facing slope and mild north facing slope) of Evolution Canyon (EC), Israel. Based on the whole genome sequence of S. macrospora, we targeted 20 genomic regions in S. fimicola which contain short sequence repeats (SSRs). Our data revealed genetic variations in strains from south facing slope and these findings assist in the hypothesis that genetic variations caused by stressful environments lead to evolution.
Randomization and In Vivo Selection Reveal a GGRG Motif Essential for Packaging Human Immunodeficiency Virus Type 2 RNA

PubMed Central

Baig, Tayyba T.; Lanchy, Jean-Marc; Lodmell, J. Stephen

2009-01-01

The packaging signal (ψ) of human immunodeficiency virus type 2 (HIV-2) is present in the 5′ noncoding region of RNA and contains a 10-nucleotide palindrome (pal; 5′-392-GGAGUGCUCC) located upstream of the dimerization signal stem-loop 1 (SL1). pal has been shown to be functionally important in vitro and in vivo. We previously showed that the 3′ side of pal (GCUCC-3′) is involved in base-pairing interactions with a sequence downstream of SL1 to make an extended SL1, which is important for replication in vivo and the regulation of dimerization in vitro. However, the role of the 5′ side of pal (5′-GGAGU) was less clear. Here, we characterized this role using an in vivo SELEX approach. We produced a population of HIV-2 DNA genomes with random sequences within the 5′ side of pal and transfected these into COS-7 cells. Viruses from COS-7 cells were used to infect C8166 permissive cells. After several weeks of serial passage in C8166 cells, surviving viruses were sequenced. On the 5′ side of pal there was a striking convergence toward a GGRGN consensus sequence. Individual clones with consensus and nonconsensus sequences were tested in infectivity and packaging assays. Analysis of individuals that diverged from the consensus sequence showed normal viral RNA and protein synthesis but had replication defects and impaired RNA packaging. These findings clearly indicate that the GGRG motif is essential for viral replication and genomic RNA packaging. PMID:18971263
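A degenerate consensus such as GGRGN is a column-wise summary of the selected sequences in IUPAC ambiguity codes (R = A or G, N = any base). The sketch below shows how such a summary could be computed; the input sequences are invented for illustration and are not the clones recovered in the study.

```python
# IUPAC ambiguity codes for RNA, keyed by the sorted set of observed bases.
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "AG": "R", "CU": "Y", "CG": "S", "AU": "W", "GU": "K", "AC": "M",
    "CGU": "B", "AGU": "D", "ACU": "H", "ACG": "V", "ACGU": "N",
}

def iupac_consensus(sequences):
    """Summarise each alignment column by the IUPAC code of its bases."""
    codes = []
    for column in zip(*sequences):
        key = "".join(sorted(set(column)))
        codes.append(IUPAC_RNA[key])
    return "".join(codes)

# Hypothetical survivors of a selection experiment (equal length, aligned).
selected = ["GGAGU", "GGGGA", "GGAGC", "GGGGG"]
motif = iupac_consensus(selected)
```

This version records every observed base; a practical analysis would usually apply a frequency threshold so that a single rare variant does not widen the code.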
Consensus generation and variant detection by Celera Assembler.

PubMed

Denisov, Gennady; Walenz, Brian; Halpern, Aaron L; Miller, Jason; Axelrod, Nelson; Levy, Samuel; Sutton, Granger

2008-04-15

We present an algorithm to identify allelic variation given a Whole Genome Shotgun (WGS) assembly of haploid sequences, and to produce a set of haploid consensus sequences rather than a single consensus sequence. Existing WGS assemblers take a column-by-column approach to consensus generation, and produce a single consensus sequence which can be inconsistent with the underlying haploid alleles, and inconsistent with any of the aligned sequence reads. Our new algorithm uses a dynamic windowing approach. It detects alleles by simultaneously processing the portions of aligned reads spanning a region of sequence variation, assigns reads to their respective alleles, phases adjacent variant alleles and generates a consensus sequence corresponding to each confirmed allele. This algorithm was used to produce the first diploid genome sequence of an individual human. It can also be applied to assemblies of multiple diploid individuals and hybrid assemblies of multiple haploid organisms. When applied to the individual human genome assembly, the new algorithm detects exactly two confirmed alleles and reports two consensus sequences in 98.98% of the 2,033,311 detected regions of sequence variation. In 33,269 out of 460,373 detected regions of size >1 bp, it fixes the constructed errors of a mosaic haploid representation of a diploid locus as produced by the original Celera Assembler consensus algorithm. Using an optimized procedure calibrated against 1,506,344 known SNPs, it detects 438,814 new heterozygous SNPs with a false positive rate of 12%. The open source code is available at: http://wgs-assembler.cvs.sourceforge.net/wgs-assembler/
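The mosaic problem described here can be reproduced with a toy alignment. The sketch below is not the Celera Assembler algorithm; it only contrasts a column-by-column majority vote with grouping the reads that span a window. The reads, the gap marker and the function names are assumptions made for the example.

```python
from collections import Counter

GAP = "."  # marks positions a read does not cover

def column_consensus(reads):
    """Majority base per column, ignoring uncovered positions."""
    out = []
    for column in zip(*reads):
        bases = [b for b in column if b != GAP]
        out.append(Counter(bases).most_common(1)[0][0])
    return "".join(out)

def window_alleles(reads, start, end):
    """Count distinct segments among reads that fully span [start, end)."""
    segments = (read[start:end] for read in reads)
    return Counter(seg for seg in segments if GAP not in seg)

# Hypothetical reads from two haplotypes, ACGTA and ATGCA.
reads = [
    "ACGTA",  # haplotype 1
    "ACG..",  # haplotype 1
    "ACG..",  # haplotype 1
    "ATGCA",  # haplotype 2
    "..GCA",  # haplotype 2
    "..GCA",  # haplotype 2
]
mosaic = column_consensus(reads)
alleles = window_alleles(reads, 1, 4)
```

Here `mosaic` is `ACGCA`, which matches neither haplotype, while the window view recovers the two segments `CGT` and `TGC` that actually occur in spanning reads.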
Archaebacterial rhodopsin sequences: Implications for evolution

NASA Technical Reports Server (NTRS)

Lanyi, J. K.

1991-01-01

It was proposed over 10 years ago that the archaebacteria represent a separate kingdom which diverged very early from the eubacteria and eukaryotes. It follows that investigations of archaebacterial characteristics might reveal features of early evolution. So far, two genes, one for bacteriorhodopsin and another for halorhodopsin, both from Halobacterium halobium, have been sequenced. We cloned and sequenced the gene coding for the polypeptide of another one of these rhodopsins, a halorhodopsin in Natronobacterium pharaonis. Peptide sequencing of cyanogen bromide fragments, and immuno-reactions of the protein and synthetic peptides derived from the C-terminal gene sequence, confirmed that the open reading frame was the structural gene for the pharaonis halorhodopsin polypeptide. The flanking DNA sequences of this gene, as well as those of other bacterial rhodopsins, were compared to previously proposed archaebacterial consensus sequences. In pairwise comparisons of the open reading frame with DNA sequences for bacterio-opsin and halo-opsin from Halobacterium halobium, silent divergences were calculated. These indicate very considerable evolutionary distance between each pair of genes, even in the same organism. In spite of this, the three protein sequences show extensive similarities, indicating strong selective pressures.
Sequence analysis and expression of the M1 and M2 matrix protein genes of hirame rhabdovirus (HIRRV)

USGS Publications Warehouse

Nishizawa, T.; Kurath, G.; Winton, J.R.

1997-01-01

We have cloned and sequenced a 2318 nucleotide region of the genomic RNA of hirame rhabdovirus (HIRRV), an important viral pathogen of Japanese flounder Paralichthys olivaceus. This region comprises approximately two-thirds of the 3' end of the nucleocapsid protein (N) gene and the complete matrix protein (M1 and M2) genes with the associated intergenic regions. The partial N gene sequence was 812 nucleotides in length with an open reading frame (ORF) that encoded the carboxyl-terminal 250 amino acids of the N protein. The M1 and M2 genes were 771 and 700 nucleotides in length, respectively, with ORFs encoding proteins of 227 and 193 amino acids. The M1 gene sequence contained an additional small ORF that could encode a highly basic, arginine-rich protein of 25 amino acids. Comparisons of the N, M1, and M2 gene sequences of HIRRV with the corresponding sequences of the fish rhabdoviruses, infectious hematopoietic necrosis virus (IHNV) or viral hemorrhagic septicemia virus (VHSV) indicated that HIRRV was more closely related to IHNV than to VHSV, but was clearly distinct from either. The putative consensus gene termination sequence for IHNV and VHSV, AGAYAG(A)(7), was present in the N-M1, M1-M2, and M2-G intergenic regions of HIRRV as were the putative transcription initiation sequences YGGCAC and AACA. An Escherichia coli expression system was used to produce recombinant proteins from the M1 and M2 genes of HIRRV. These were the same size as the authentic M1 and M2 proteins and reacted with anti-HIRRV rabbit serum in western blots. These reagents can be used for further study of the fish immune response and to test novel control methods.
Specific DNA binding of the two chicken Deformed family homeodomain proteins, Chox-1.4 and Chox-a.

PubMed Central

Sasaki, H; Yokoyama, E; Kuroiwa, A

1990-01-01

The cDNA clones encoding two chicken Deformed (Dfd) family homeobox containing genes Chox-1.4 and Chox-a were isolated. Comparison of their amino acid sequences with another chicken Dfd family homeodomain protein and with those of mouse homologues revealed that strong homologies are located in the amino terminal regions and around the homeodomains. Although homologies in other regions were relatively low, some short conserved sequences were also identified. E. coli-made full length proteins were purified and used for the production of specific antibodies and for DNA binding studies. The binding profiles of these proteins to the 5'-leader and 5'-upstream sequences of Chox-1.4 and Chox-a coding regions were analyzed by immunoprecipitation and DNase I footprint assays. These two Chox proteins bound to the same sites in the 5'-flanking sequences of their coding regions with various affinities and their binding affinities to each site were nearly the same. The consensus sequences of the high and low affinity binding sites were TAATGA(C/G) and CTAATTTT, respectively. A clustered binding site was identified in the 5'-upstream of the Chox-a gene, suggesting that this clustered binding site works as a cis-regulatory element for auto- and/or cross-regulation of Chox-a gene expression. PMID:1970866
Comparison of Immunogenicity in Rhesus Macaques of Transmitted-Founder, HIV-1 Group M Consensus, and Trivalent Mosaic Envelope Vaccines Formulated as a DNA Prime, NYVAC, and Envelope Protein Boost

PubMed Central

Hulot, Sandrine L.; Korber, Bette; Giorgi, Elena E.; Vandergrift, Nathan; Saunders, Kevin O.; Balachandran, Harikrishnan; Mach, Linh V.; Lifton, Michelle A.; Pantaleo, Giuseppe; Tartaglia, Jim; Phogat, Sanjay; Jacobs, Bertram; Kibler, Karen; Perdiguero, Beatriz; Gomez, Carmen E.; Esteban, Mariano; Rosati, Margherita; Felber, Barbara K.; Pavlakis, George N.; Parks, Robert; Lloyd, Krissey; Sutherland, Laura; Scearce, Richard; Letvin, Norman L.; Seaman, Michael S.; Alam, S. Munir; Montefiori, David; Liao, Hua-Xin; Haynes, Barton F.

2015-01-01

ABSTRACT An effective human immunodeficiency virus type 1 (HIV-1) vaccine must induce protective antibody responses, as well as CD4+ and CD8+ T cell responses, that can be effective despite extraordinary diversity of HIV-1. The consensus and mosaic immunogens are complete but artificial proteins, computationally designed to elicit immune responses with improved cross-reactive breadth, to attempt to overcome the challenge of global HIV diversity. In this study, we have compared the immunogenicity of a transmitted-founder (T/F) B clade Env (B.1059), a global group M consensus Env (Con-S), and a global trivalent mosaic Env protein in rhesus macaques. These antigens were delivered using a DNA prime-recombinant NYVAC (rNYVAC) vector and Env protein boost vaccination strategy. While Con-S Env was a single sequence, mosaic immunogens were a set of three Envs optimized to include the most common forms of potential T cell epitopes. Both Con-S and mosaic sequences retained common amino acids encompassed by both antibody and T cell epitopes and were central to globally circulating strains. Mosaics and Con-S Envs expressed as full-length proteins bound well to a number of neutralizing antibodies with discontinuous epitopes. Also, both consensus and mosaic immunogens induced significantly higher gamma interferon (IFN-γ) enzyme-linked immunosorbent spot assay (ELISpot) responses than B.1059 immunogen. Immunization with these proteins, particularly Con-S, also induced significantly higher neutralizing antibodies to viruses than B.1059 Env, primarily to tier 1 viruses. Both Con-S and mosaics stimulated more potent CD8-T cell responses against heterologous Envs than did B.1059. Both antibody and cellular data from this study strengthen the concept of using in silico-designed centralized immunogens for global HIV-1 vaccine development strategies. 
IMPORTANCE There is an increasing appreciation for the importance of vaccine-induced anti-Env antibody responses for preventing HIV-1 acquisition. This nonhuman primate study demonstrates that in silico-designed global HIV-1 immunogens, designed for a human clinical trial, are capable of eliciting not only T lymphocyte responses but also potent anti-Env antibody responses. PMID:25855741
The General Definition of the p97/Valosin-containing Protein (VCP)-interacting Motif (VIM) Delineates a New Family of p97 Cofactors

PubMed Central

Stapf, Christopher; Cartwright, Edward; Bycroft, Mark; Hofmann, Kay; Buchberger, Alexander

2011-01-01

Cellular functions of the essential, ubiquitin-selective AAA ATPase p97/valosin-containing protein (VCP) are controlled by regulatory cofactors determining substrate specificity and fate. Most cofactors bind p97 through a ubiquitin regulatory X (UBX) or UBX-like domain or linear sequence motifs, including the hitherto ill defined p97/VCP-interacting motif (VIM). Here, we present the new, minimal consensus sequence RX5AAX2R as a general definition of the VIM that unites a novel family of known and putative p97 cofactors, among them UBXD1 and ZNF744/ANKZF1. We demonstrate that this minimal VIM consensus sequence is necessary and sufficient for p97 binding. Using NMR chemical shift mapping, we identified several residues of the p97 N-terminal domain (N domain) that are critical for VIM binding. Importantly, we show that cellular stress resistance conferred by the yeast VIM-containing cofactor Vms1 depends on the physical interaction between its VIM and the critical N domain residues of the yeast p97 homolog, Cdc48. Thus, the VIM-N domain interaction characterized in this study is required for the physiological function of Vms1 and most likely other members of the newly defined VIM family of cofactors. PMID:21896481
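Reading the minimal consensus RX5AAX2R as "Arg, any five residues, Ala-Ala, any two residues, Arg" (the subscripts appear flattened in this listing), a candidate scan can be sketched with a regular expression. The test sequences below are invented for illustration and are not real cofactor sequences.

```python
import re

# Lookahead makes overlapping candidate motifs visible as well.
VIM_PATTERN = re.compile(r"(?=(R[A-Z]{5}AA[A-Z]{2}R))")

def find_vim_candidates(protein):
    """Return (0-based start, matched segment) for each candidate motif."""
    return [(m.start(), m.group(1)) for m in VIM_PATTERN.finditer(protein)]

hits = find_vim_candidates("MSRKLEDQAAELRT")     # hypothetical sequence
no_hits = find_vim_candidates("MSRKLEDQAGELRT")  # AA replaced by AG
```

A match only flags a candidate; the abstract establishes binding experimentally, and a pattern hit on its own says nothing about accessibility or structure.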
Regulation of the alpha-glucuronidase-encoding gene (aguA) from Aspergillus niger.

PubMed

de Vries, R P; van de Vondervoort, P J I; Hendriks, L; van de Belt, M; Visser, J

2002-09-01

The alpha-glucuronidase gene aguA from Aspergillus niger was cloned and characterised. Analysis of the promoter region of aguA revealed the presence of four putative binding sites for the major carbon catabolite repressor protein CREA and one putative binding site for the transcriptional activator XLNR. In addition, a sequence motif was detected which differed only in the last nucleotide from the XLNR consensus site. A construct in which part of the aguA coding region was deleted still resulted in production of a stable mRNA upon transformation of A. niger. The putative XLNR binding sites and two of the putative CREA binding sites were mutated individually in this construct and the effects on expression were examined in A. niger transformants. Northern analysis of the transformants revealed that the consensus XLNR site is not actually functional in the aguA promoter, whereas the sequence that diverges from the consensus at a single position is functional. This indicates that XLNR is also able to bind to the sequence GGCTAG, and the XLNR binding site consensus should therefore be changed to GGCTAR. Both CREA sites are functional, indicating that CREA has a strong influence on aguA expression. A detailed expression analysis of aguA in four genetic backgrounds revealed a second regulatory system involved in activation of aguA gene expression. This system responds to the presence of glucuronic and galacturonic acids, and is not dependent on XLNR.
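In IUPAC notation R stands for A or G, so the revised consensus GGCTAR covers both GGCTAA and the functional variant GGCTAG. A short sketch of expanding a degenerate site into its concrete sequences follows; the helper name is an assumption made for the example.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes for DNA.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_degenerate(site):
    """List every concrete sequence matched by a degenerate site."""
    choices = [IUPAC_DNA[code] for code in site]
    return ["".join(bases) for bases in product(*choices)]

xlnr_sites = expand_degenerate("GGCTAR")
```

The expansion has one entry per combination of allowed bases, so it stays small for sites with few degenerate positions.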
Identification of a DNA sequence motif required for expression of iron-regulated genes in pseudomonads.

PubMed

Rombel, I T; McMorran, B J; Lamont, I L

1995-02-20

Many bacteria respond to a lack of iron in the environment by synthesizing siderophores, which act as iron-scavenging compounds. Fluorescent pseudomonads synthesize strain-specific but chemically related siderophores called pyoverdines or pseudobactins. We have investigated the mechanisms by which iron controls expression of genes involved in pyoverdine metabolism in Pseudomonas aeruginosa. Transcription of these genes is repressed by the presence of iron in the growth medium. Three promoters from these genes were cloned and the activities of the promoters were dependent on the amounts of iron in the growth media. Two of the promoters were sequenced and the transcriptional start sites were identified by S1 nuclease analysis. Sequences similar to the consensus binding site for the Fur repressor protein, which controls expression of iron-repressible genes in several gram-negative species, were not present in the promoters, suggesting that they are unlikely to have a high affinity for Fur. However, comparison of the promoter sequences with those of iron-regulated genes from other Pseudomonas species and also the iron-regulated exotoxin gene of P. aeruginosa allowed identification of a shared sequence element, with the consensus sequence (G/C)CTAAAT-CCC, which is likely to act as a binding site for a transcriptional activator protein. Mutations in this sequence greatly reduced the activities of the promoters characterized here as well as those of other iron-regulated promoters. The requirement for this motif in the promoters of iron-regulated genes of different Pseudomonas species indicates that similar mechanisms are likely to be involved in controlling expression of a range of iron-regulated genes in pseudomonads.
Identification of high-specificity H-NS binding site in LEE5 promoter of enteropathogenic Escherichia coli (EPEC).

PubMed

Bhat, Abhay Prasad; Shin, Minsang; Choy, Hyon E

2014-07-01

Histone-like nucleoid structuring protein (H-NS) is a small but abundant protein present in enteric bacteria and is involved in compaction of the DNA and regulation of transcription. Recent reports have suggested that H-NS binds to a specific AT-rich DNA sequence rather than to intrinsically curved DNA in a sequence-independent manner. We detected two high-specificity H-NS binding sites in the LEE5 promoter of EPEC, centered at -110 and -138, which were close to the proposed consensus H-NS binding motif. To identify the H-NS binding sequence in the LEE5 promoter, we took a random mutagenesis approach and found that mutations at around -138 were specifically defective in regulation by H-NS. It was concluded that H-NS exerts maximum repression via the specific sequence at around -138 and subsequently contacts a subunit of RNAP through oligomerization.
SubCellProt: predicting protein subcellular localization using machine learning approaches.

PubMed

Garg, Prabha; Sharma, Virag; Chaudhari, Pradeep; Roy, Nilanjan

2009-01-01

High-throughput genome sequencing projects continue to churn out enormous amounts of raw sequence data. However, most of this raw sequence data is unannotated and, hence, not very useful. Among the various approaches to decipher the function of a protein, one is to determine its localization. Experimental approaches for proteome annotation including determination of a protein's subcellular localizations are very costly and labor intensive. Besides the available experimental methods, in silico methods present alternative approaches to accomplish this task. Here, we present two machine learning approaches for prediction of the subcellular localization of a protein from the primary sequence information. Two machine learning algorithms, k Nearest Neighbor (k-NN) and Probabilistic Neural Network (PNN) were used to classify an unknown protein into one of the 11 subcellular localizations. The final prediction is made on the basis of a consensus of the predictions made by two algorithms and a probability is assigned to it. The results indicate that the primary sequence derived features like amino acid composition, sequence order and physicochemical properties can be used to assign subcellular localization with a fair degree of accuracy. Moreover, with the enhanced accuracy of our approach and the definition of a prediction domain, this method can be used for proteome annotation in a high throughput manner. SubCellProt is available at www.databases.niper.ac.in/SubCellProt.
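One of the primary-sequence features named in this abstract, amino acid composition, and a k-nearest-neighbour vote over it can be sketched as below. This is not the SubCellProt implementation: the PNN and the consensus step between the two classifiers are not reproduced, and the training sequences and class labels are invented toy data.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Fraction of each of the 20 standard amino acids in the sequence."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]

def knn_predict(query, training, k=3):
    """Majority label among the k training sequences nearest in composition."""
    q = composition(query)
    ranked = sorted(training, key=lambda item: math.dist(q, composition(item[0])))
    labels = [label for _, label in ranked[:k]]
    return Counter(labels).most_common(1)[0][0]

# Hypothetical training set with two toy classes.
training = [
    ("LLLLVVVVIIII", "hydrophobic-rich"),
    ("LVILVIALLV", "hydrophobic-rich"),
    ("DEKRDEKRDE", "charged-rich"),
    ("KKEEDDRRKE", "charged-rich"),
]
predicted = knn_predict("LLIVVALLIV", training, k=3)
```

A real classifier would add the other feature types the abstract lists (sequence order, physicochemical properties) and would be trained on annotated proteins.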

Understanding the mechanisms of protein-DNA interactions

NASA Astrophysics Data System (ADS)

Lavery, Richard

2004-03-01

Structural, biochemical and thermodynamic data on protein-DNA interactions show that specific recognition cannot be reduced to a simple set of binary interactions between the partners (such as hydrogen bonds, ion pairs or steric contacts). The mechanical properties of the partners also play a role and, in the case of DNA, variations in both conformation and flexibility as a function of base sequence can be a significant factor in guiding a protein to the correct binding site. All-atom molecular modeling offers a means of analyzing the role of different binding mechanisms within protein-DNA complexes of known structure. This however requires estimating the binding strengths for the full range of sequences with which a given protein can interact. Since this number grows exponentially with the length of the binding site it is necessary to find a method to accelerate the calculations. We have achieved this by using a multi-copy approach (ADAPT) which allows us to build a DNA fragment with a variable base sequence. The results obtained with this method correlate well with experimental consensus binding sequences. They enable us to show that indirect recognition mechanisms involving the sequence dependent properties of DNA play a significant role in many complexes. This approach also offers a means of predicting protein binding sites on the basis of binding energies, which is complementary to conventional lexical techniques.
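The exponential growth mentioned here follows from four possible bases at each position, giving 4^n candidate sequences for a site of length n. A quick illustration, with the site lengths chosen arbitrarily:

```python
def n_candidate_sites(length, alphabet_size=4):
    """Number of distinct sequences of the given length."""
    return alphabet_size ** length

# Illustrative site lengths only.
for length in (6, 10, 15):
    print(length, n_candidate_sites(length))
```

At 15 base pairs there are already more than a billion sequences, which is why exhaustive per-sequence energy calculations need an accelerated scheme.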
Analysis and Functional Annotation of an Expressed Sequence Tag Collection for Tropical Crop Sugarcane

PubMed Central

Vettore, André L.; da Silva, Felipe R.; Kemper, Edson L.; Souza, Glaucia M.; da Silva, Aline M.; Ferro, Maria Inês T.; Henrique-Silva, Flavio; Giglioti, Éder A.; Lemos, Manoel V.F.; Coutinho, Luiz L.; Nobrega, Marina P.; Carrer, Helaine; França, Suzelei C.; Bacci, Maurício; Goldman, Maria Helena S.; Gomes, Suely L.; Nunes, Luiz R.; Camargo, Luis E.A.; Siqueira, Walter J.; Van Sluys, Marie-Anne; Thiemann, Otavio H.; Kuramae, Eiko E.; Santelli, Roberto V.; Marino, Celso L.; Targon, Maria L.P.N.; Ferro, Jesus A.; Silveira, Henrique C.S.; Marini, Danyelle C.; Lemos, Eliana G.M.; Monteiro-Vitorello, Claudia B.; Tambor, José H.M.; Carraro, Dirce M.; Roberto, Patrícia G.; Martins, Vanderlei G.; Goldman, Gustavo H.; de Oliveira, Regina C.; Truffi, Daniela; Colombo, Carlos A.; Rossi, Magdalena; de Araujo, Paula G.; Sculaccio, Susana A.; Angella, Aline; Lima, Marleide M.A.; de Rosa, Vicente E.; Siviero, Fábio; Coscrato, Virginia E.; Machado, Marcos A.; Grivet, Laurent; Di Mauro, Sonia M.Z.; Nobrega, Francisco G.; Menck, Carlos F.M.; Braga, Marilia D.V.; Telles, Guilherme P.; Cara, Frank A.A.; Pedrosa, Guilherme; Meidanis, João; Arruda, Paulo

2003-01-01

To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST) program. More than 260,000 cDNA clones were partially sequenced from 26 standard cDNA libraries generated from different sugarcane tissues. After the processing of the sequences, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. Of the assembled sequences, 35.6% presented no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putative identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed a 22% redundancy in the first assembling. This indicated that possibly 33,620 unique genes had been identified and indicated that >90% of the sugarcane expressed genes were tagged. PMID:14613979
Applying the Concept of Peptide Uniqueness to Anti-Polio Vaccination.

PubMed

Kanduc, Darja; Fasano, Candida; Capone, Giovanni; Pesce Delfino, Antonella; Calabrò, Michele; Polimeno, Lorenzo

2015-01-01

Although rare, adverse events may be associated with anti-poliovirus vaccination, possibly hampering global polio eradication. The aim of this study was to design peptide-based anti-polio vaccines exempt from potential cross-reactivity risks and possibly able to reduce rare adverse events, such as postvaccine paralytic poliomyelitis, that arise from the tendency of the poliovirus genome to mutate. Proteins from poliovirus type 1, strain Mahoney, were analyzed for amino acid sequence identity to the human proteome at the pentapeptide level, searching for sequences that (1) have zero percent identity to human proteins, (2) are potentially endowed with immunologic potential, and (3) are highly conserved among poliovirus strains. Sequence analyses produced a set of consensus epitopic peptides potentially able to generate specific anti-polio immune responses exempt from cross-reactivity with the human host. Peptide sequences unique to poliovirus proteins and conserved among polio strains might help formulate a specific and universal anti-polio vaccine able to react with multiple viral strains and exempt from the burden of possible cross-reactions with human proteins. As an additional advantage, using a peptide-based vaccine instead of current anti-polio DNA vaccines would eliminate the rare post-polio poliomyelitis cases and other disabling symptoms that may appear following vaccination.
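The pentapeptide-level comparison described in this abstract can be illustrated with a minimal sketch. It assumes that "uniqueness" means a viral 5-mer does not occur anywhere in the host protein set; the function names and the toy sequences are hypothetical and are not taken from the study, and the immunogenicity and strain-conservation filters are not modeled.

```python
def kmers(seq: str, k: int = 5) -> set[str]:
    """Return the set of all overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_peptides(viral: str, host_proteins: list[str], k: int = 5) -> list[str]:
    """List viral k-mers, in order of first occurrence, absent from every host protein."""
    host: set[str] = set()
    for protein in host_proteins:
        host |= kmers(protein, k)
    seen: set[str] = set()
    result: list[str] = []
    for i in range(len(viral) - k + 1):
        peptide = viral[i:i + k]
        if peptide not in host and peptide not in seen:
            seen.add(peptide)
            result.append(peptide)
    return result


# Toy data for illustration only (not real proteome content).
toy_viral = "MGAQVSSQKV"
toy_host = ["AAMGAQVAA", "SSQKVLL"]
print(unique_peptides(toy_viral, toy_host))
```

With a real proteome the host k-mer set would be built once and reused across all viral proteins, since it is by far the larger of the two inputs.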
A cell-free stock of simian-human immunodeficiency virus that causes AIDS in pig-tailed macaques has a limited number of amino acid substitutions in both SIVmac and HIV-1 regions of the genome and has altered cytotropism.

PubMed

Stephens, E B; Mukherjee, S; Sahni, M; Zhuge, W; Raghavan, R; Singh, D K; Leung, K; Atkinson, B; Li, Z; Joag, S V; Liu, Z Q; Narayan, O

1997-05-12

We have examined both the sequence changes in the LTR, gag, vif, vpr, vpx, tat, rev, vpu, env, and nef genes and the cell tropism of a cell-free stock of chimeric simian-human immunodeficiency virus (SHIV) isolated from the cerebrospinal fluid of a pig-tailed macaque (PNb) that developed AIDS. This virus (SHIVKU-1) is highly pathogenic when inoculated into other macaques. DNA sequence analysis of PCR-amplified products revealed a total of 5 nucleotide changes in the LTR while vif had 2 consensus amino acid changes. The gag, vif, and vpx had no consensus amino acid substitutions, whereas vpr had 1 consensus substitution. The tat and rev genes of the HXB2 region of SHIVKU-1 had 2 and 1 consensus amino acid changes, respectively. The vpu gene of the HXB2 region of SHIV, which originally had an ACG at the beginning of the gene, reverted to an initiation ATG codon and in addition contained a consensus amino acid substitution at position 69 of this protein. As expected, the majority of the nucleotide substitutions were found in the env and nef genes. Thirteen and 5 amino acid changes were predicted for the corresponding Env and Nef proteins, respectively. In addition, one-third of the env gene clones isolated from the SHIVKU-1 stock had a 5-amino-acid deletion in the V4 region. Using three independent assays, we determined that the changes in the SHIVKU-1 were associated with an increase in the efficiency of replication in macrophages. The strikingly few consensus changes in the virus suggest that conversion of this virus to one capable of causing AIDS in pig-tailed macaques was associated with relatively few changes in the viral envelope and/or accessory genes. These results will provide the basis for the development of a pathogenic, molecular clone of SHIV capable of causing AIDS in pig-tailed macaques.
Expressed sequence tags from the oomycete fish pathogen Saprolegnia parasitica reveal putative virulence factors

PubMed Central

Torto-Alalibo, Trudy; Tian, Miaoying; Gajendran, Kamal; Waugh, Mark E; van West, Pieter; Kamoun, Sophien

2005-01-01

Background: The oomycete Saprolegnia parasitica is one of the most economically important fish pathogens. There has been a dramatic recrudescence of Saprolegnia infections in aquaculture since the use of the toxic organic dye malachite green was banned in 2002. Little is known about the molecular mechanisms underlying pathogenicity in S. parasitica and other animal pathogenic oomycetes. In this study we used a genomics approach to gain a first insight into the transcriptome of S. parasitica. Results: We generated 1510 expressed sequence tags (ESTs) from a mycelial cDNA library of S. parasitica. A total of 1279 consensus sequences corresponding to 525944 base pairs were assembled. About half of the unigenes showed similarities to known protein sequences or motifs. The S. parasitica sequences tended to be relatively divergent from Phytophthora sequences. Based on the sequence alignments of 18 conserved proteins, the average amino acid identity between S. parasitica and three Phytophthora species was 77% compared to 93% within Phytophthora. Several S. parasitica cDNAs, such as those with similarity to fungal type I cellulose binding domain proteins, PAN/Apple module proteins, glycosyl hydrolases, proteases, as well as serine and cysteine protease inhibitors, were predicted to encode secreted proteins that could function in virulence. Some of these cDNAs were more similar to fungal proteins than to other eukaryotic proteins, confirming that oomycetes and fungi share some virulence components despite their evolutionary distance. Conclusion: We provide a first glimpse into the gene content of S. parasitica, a reemerging oomycete fish pathogen. These resources will greatly accelerate research on this important pathogen. The data is available online through the Oomycete Genomics Database [1]. PMID:16076392
Cloning, sequencing, and expression of dnaK-operon proteins from the thermophilic bacterium Thermus thermophilus.

PubMed

Osipiuk, J; Joachimiak, A

1997-09-12

We propose that the dnaK operon of Thermus thermophilus HB8 is composed of three functionally linked genes: dnaK, grpE, and dnaJ. The dnaK and dnaJ gene products are most closely related to their cyanobacterial homologs. The DnaK protein sequence places T. thermophilus in the plastid Hsp70 subfamily. In contrast, the grpE translated sequence is most similar to GrpE from Clostridium acetobutylicum, a Gram-positive anaerobic bacterium. A single promoter region, with homology to the Escherichia coli consensus promoter sequences recognized by the sigma70 and sigma32 transcription factors, precedes the postulated operon. This promoter is heat-shock inducible. The dnaK mRNA level increased more than 30 times upon 10 min of heat shock (from 70 degrees C to 85 degrees C). A strong transcription terminating sequence was found between the dnaK and grpE genes. The individual genes were cloned into pET expression vectors and the thermophilic proteins were overproduced at high levels in E. coli and purified to homogeneity. The recombinant T. thermophilus DnaK protein was shown to have a weak ATP-hydrolytic activity, with an optimum at 90 degrees C. The ATPase was stimulated by the presence of GrpE and DnaJ. Another open reading frame, coding for ClpB heat-shock protein, was found downstream of the dnaK operon.
Sequencing and functional analysis of the nifENXorf1orf2 gene cluster of Herbaspirillum seropedicae.

PubMed

Klassen, G; Pedrosa, F O; Souza, E M; Yates, M G; Rigo, L U

1999-12-01

A 5.1-kb DNA fragment from the nifHDK region of H. seropedicae was isolated and sequenced. Sequence analysis showed the presence of nifENXorf1orf2 but nifTY were not present. No nif or consensus promoter was identified. Furthermore, orf1 expression occurred only under nitrogen-fixing conditions and no promoter activity was detected between nifK and nifE, suggesting that these genes are expressed from the upstream nifH promoter and are parts of a unique nif operon. Mutagenesis studies indicate that nifN was essential for nitrogenase activity whereas nifXorf1orf2 were not. High homology between the C-terminal region of the NifX and NifB proteins from H. seropedicae was observed. Since the NifX and NifY proteins are important for FeMo cofactor (FeMoco) synthesis, we propose that alternative proteins with similar activities exist in H. seropedicae.
Identification of natural and artificial DNA substrates for the light-activated LOV-HTH transcription factor EL222

PubMed Central

Rivera-Cancel, Giomar; Motta-Mena, Laura B.; Gardner, Kevin H.

2012-01-01

Light-oxygen-voltage (LOV) domains serve as the photosensory modules for a wide range of plant and bacterial proteins, conferring blue light dependent regulation to effector activities as diverse as enzymes and DNA binding. LOV domains can also be engineered into a variety of exogenous targets, enabling similar regulation for new protein-based reagents. Common to these proteins is the ability for LOV domains to reversibly form a photochemical adduct between an internal flavin chromophore and the surrounding protein, using this to trigger conformational changes that affect output activity. Using the Erythrobacter litoralis protein EL222 model system which links LOV regulation to a helix-turn-helix (HTH) DNA binding domain, we demonstrated that the LOV domain binds and inhibits the HTH domain in the dark, releasing these interactions upon illumination [Nash et al. (2011) Proc. Natl. Acad. Sci. USA 108, 9449–9454]. Here we combine genomic and in vitro selection approaches to identify optimal DNA binding sites for EL222. Within the bacterial host, we observe binding to several genomic sites using a 12 bp sequence consensus that is also found by in vitro selection methods. Sequence-specific alterations in the DNA consensus reduce EL222-binding affinity in a manner consistent with the expected binding mode: a protein dimer binding to two repeats. Finally, we demonstrate the light-dependent activation of transcription of two genes adjacent to an EL222 binding site. Taken together, these results shed light on the native function of EL222 and provide useful reagents for further basic and applications research of this versatile protein. PMID:23205774
StralSV: assessment of sequence variability within similar 3D structures and application to polio RNA-dependent RNA polymerase.

PubMed

Zemla, Adam T; Lang, Dorothy M; Kostova, Tanya; Andino, Raul; Ecale Zhou, Carol L

2011-06-02

Most of the currently used methods for protein function prediction rely on sequence-based comparisons between a query protein and those for which a functional annotation is provided. A serious limitation of sequence similarity-based approaches for identifying residue conservation among proteins is the low confidence in assigning residue-residue correspondences among proteins when the level of sequence identity between the compared proteins is poor. Multiple sequence alignment methods are more satisfactory--still, they cannot provide reliable results at low levels of sequence identity. Our goal in the current work was to develop an algorithm that could help overcome these difficulties by facilitating the identification of structurally (and possibly functionally) relevant residue-residue correspondences between compared protein structures. Here we present StralSV (structure-alignment sequence variability), a new algorithm for detecting closely related structure fragments and quantifying residue frequency from tight local structure alignments. We apply StralSV in a study of the RNA-dependent RNA polymerase of poliovirus, and we demonstrate that the algorithm can be used to determine regions of the protein that are relatively unique, or that share structural similarity with proteins that would be considered distantly related. By quantifying residue frequencies among many residue-residue pairs extracted from local structural alignments, one can infer potential structural or functional importance of specific residues that are determined to be highly conserved or that deviate from a consensus. We further demonstrate that considerable detailed structural and phylogenetic information can be derived from StralSV analyses. StralSV is a new structure-based algorithm for identifying and aligning structure fragments that have similarity to a reference protein. StralSV analysis can be used to quantify residue-residue correspondences and identify residues that may be of particular structural or functional importance, as well as unusual or unexpected residues at a given sequence position. StralSV is provided as a web service at http://proteinmodel.org/AS2TS/STRALSV/.
Arg-Pro-X-Ser/Thr is a Consensus Phosphoacceptor Sequence for the Meiosis-Specific Ime2 Protein Kinase in Saccharomyces cerevisiae†

PubMed Central

Moore, Michael; Shin, Marcus; Bruning, Adrian; Schindler, Karen; Vershon, Andrew; Winter, Edward

2008-01-01

Ime2 is a meiosis-specific protein kinase in Saccharomyces cerevisiae that is functionally related to cyclin-dependent kinase. Although Ime2 regulates multiple steps in meiosis, only a few of its substrates have been identified. Here we show that Ime2 phosphorylates Sum1, a repressor of meiotic gene transcription, on Thr-306. Ime2 protein kinase assays on Sum1 mutants and synthetic peptides define a consensus motif Arg-Pro-X-Ser/Thr that is required for efficient phosphorylation by Ime2. The carboxyl residue adjacent to the phosphoacceptor (+1 position) also influences the efficiency of Ime2 phosphorylation with alanine being a preferred residue. This information has predictive value in identifying new potential Ime2 targets as shown by the ability of Ime2 to phosphorylate Sgs1 and Gip1 in vitro, and could be important in differentiating mitotic and meiotic regulatory pathways. PMID:17198398
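As an illustration of how a consensus motif such as Arg-Pro-X-Ser/Thr can be used predictively, the sketch below scans a protein sequence for the pattern with a regular expression. The function name and the toy sequence are hypothetical; the reported preference for alanine at the +1 position is not modeled, and a match is only a candidate site, not evidence of phosphorylation.

```python
import re

# R-P-x-[S/T]; the lookahead lets overlapping candidate sites be reported.
IME2_MOTIF = re.compile(r"(?=(RP.[ST]))")


def find_motif_sites(protein: str) -> list[tuple[int, str]]:
    """Return (1-based position of the Ser/Thr phosphoacceptor, matched tetrapeptide)."""
    return [(m.start() + 4, m.group(1)) for m in IME2_MOTIF.finditer(protein)]


# Toy sequence for illustration only (not a real protein).
toy_protein = "MKRPLTAAGRPASAK"
for position, site in find_motif_sites(toy_protein):
    print(position, site)
```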
Evaluation of Phage Display Discovered Peptides as Ligands for Prostate-Specific Membrane Antigen (PSMA)

PubMed Central

Edwards, W. Barry

2013-01-01

The aim of this study was to identify potential ligands of PSMA suitable for further development as novel PSMA-targeted peptides using phage display technology. The human PSMA protein was immobilized as a target followed by incubation with a 15-mer phage display random peptide library. After one round of prescreening and two rounds of screening, high-stringency screening at the third round of panning was performed to identify the highest affinity binders. Phages which had a specific binding activity to PSMA in human prostate cancer cells were isolated and the DNA corresponding to the 15-mers were sequenced to provide three consensus sequences: GDHSPFT, SHFSVGS and EVPRLSLLAVFL as well as other sequences that did not display consensus. Two of the peptide sequences deduced from DNA sequencing of binding phages, SHSFSVGSGDHSPFT and GRFLTGGTGRLLRIS were labeled with 5-carboxyfluorescein and shown to bind and co-internalize with PSMA on human prostate cancer cells by fluorescence microscopy. The high stringency requirements yielded peptides with affinities KD∼1 µM or greater which are suitable starting points for affinity maturation. While these values were less than anticipated, the high stringency did yield peptide sequences that apparently bound to different surfaces on PSMA. These peptide sequences could be the basis for further development of peptides for prostate cancer tumor imaging and therapy. PMID:23935860
Expression of the Caulobacter heat shock gene dnaK is developmentally controlled during growth at normal temperatures.

PubMed Central

Gomes, S L; Gober, J W; Shapiro, L

1990-01-01

Caulobacter crescentus has a single dnaK gene that is highly homologous to the hsp70 family of heat shock genes. Analysis of the cloned and sequenced dnaK gene has shown that the deduced amino acid sequence could encode a protein of 67.6 kilodaltons that is 68% identical to the DnaK protein of Escherichia coli and 49% identical to the Drosophila and human hsp70 protein family. A partial open reading frame 165 base pairs 3' to the end of dnaK encodes a peptide of 190 amino acids that is 59% identical to DnaJ of E. coli. Northern blot analysis revealed a single 4.0-kilobase mRNA homologous to the cloned fragment. Since the dnaK coding region is 1.89 kilobases, dnaK and dnaJ may be transcribed as a polycistronic message. S1 mapping and primer extension experiments showed that transcription initiated at two sites 5' to the dnaK coding sequence. A single start site of transcription was identified during heat shock at 42 degrees C, and the predicted promoter sequence conformed to the consensus heat shock promoters of E. coli. At normal growth temperature (30 degrees C), a different start site was identified 3' to the heat shock start site that conformed to the E. coli sigma 70 promoter consensus sequence. S1 protection assays and analysis of expression of the dnaK gene fused to the lux transcription reporter gene showed that expression of dnaK is temporally controlled under normal physiological conditions and that transcription occurs just before the initiation of DNA replication. Thus, in both human cells (I. K. L. Milarski and R. I. Morimoto, Proc. Natl. Acad. Sci. USA 83:9517-9521, 1986) and in a simple bacterium, the transcription of a hsp70 gene is temporally controlled as a function of the cell cycle under normal growth conditions. Images PMID:2345134
Site directed recombination

DOEpatents

Jurka, Jerzy W.

1997-01-01

Enhanced homologous recombination is obtained by employing a consensus sequence which has been found to be associated with integration of repeat sequences, such as Alu and ID. The consensus sequence or sequence having a single transition mutation determines one site of a double break which allows for high efficiency of integration at the site. By introducing single or double stranded DNA having the consensus sequence flanking region joined to a sequence of interest, one can reproducibly direct integration of the sequence of interest at one or a limited number of sites. In this way, specific sites can be identified and homologous recombination achieved at the site by employing a second flanking sequence associated with a sequence proximal to the 3'-nick.
Maximizing the sensitivity and reliability of peptide identification in large-scale proteomic experiments by harnessing multiple search engines.

PubMed

Yu, Wen; Taylor, J Alex; Davis, Michael T; Bonilla, Leo E; Lee, Kimberly A; Auger, Paul L; Farnsworth, Chris C; Welcher, Andrew A; Patterson, Scott D

2010-03-01

Despite recent advances in qualitative proteomics, the automatic identification of peptides with optimal sensitivity and accuracy remains a difficult goal. To address this deficiency, a novel algorithm, Multiple Search Engines, Normalization and Consensus is described. The method employs six search engines and a re-scoring engine to search MS/MS spectra against protein and decoy sequences. After the peptide hits from each engine are normalized to error rates estimated from the decoy hits, peptide assignments are then deduced using a minimum consensus model. These assignments are produced in a series of progressively relaxed false-discovery rates, thus enabling a comprehensive interpretation of the data set. Additionally, the estimated false-discovery rate was found to have good concordance with the observed false-positive rate calculated from known identities. Benchmarking against standard proteins data sets (ISBv1, sPRG2006) and their published analysis, demonstrated that the Multiple Search Engines, Normalization and Consensus algorithm consistently achieved significantly higher sensitivity in peptide identifications, which led to increased or more robust protein identifications in all data sets compared with prior methods. The sensitivity and the false-positive rate of peptide identification exhibit an inverse-proportional and linear relationship with the number of participating search engines.
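The abstract does not spell out its minimum consensus model, so the following is only a sketch of one simple form such a rule could take: a peptide assignment for a spectrum is accepted when at least a chosen number of engines report the same peptide. The function name, the `min_votes` parameter, and the input layout are assumptions for illustration, and the normalization of engine scores to decoy-estimated error rates described above is not modeled.

```python
from collections import Counter


def consensus_assignments(
    engine_hits: dict[str, dict[str, str]], min_votes: int = 2
) -> dict[str, str]:
    """Keep, per spectrum, the most-reported peptide if at least min_votes engines agree.

    engine_hits maps engine name -> {spectrum id -> best peptide}.
    Ties are broken by first-seen order, which a real pipeline would handle explicitly.
    """
    votes: dict[str, Counter] = {}
    for hits in engine_hits.values():
        for spectrum, peptide in hits.items():
            votes.setdefault(spectrum, Counter())[peptide] += 1

    accepted: dict[str, str] = {}
    for spectrum, counter in votes.items():
        peptide, count = counter.most_common(1)[0]
        if count >= min_votes:
            accepted[spectrum] = peptide
    return accepted


# Hypothetical engine output for two spectra.
toy_hits = {
    "engineA": {"s1": "PEPTIDEK", "s2": "AAAK"},
    "engineB": {"s1": "PEPTIDEK", "s2": "CCCK"},
    "engineC": {"s1": "PEPTIDEK"},
}
print(consensus_assignments(toy_hits, min_votes=2))
```

Raising `min_votes` trades sensitivity for a lower false-positive rate, which is the kind of trade-off the abstract reports as a function of the number of participating engines.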
Homochiral stereochemistry: the missing link of structure to energetics in protein folding.

PubMed

Kumar, Anil; Ramakrishnan, Vibin; Ranbhor, Ranjit; Patel, Kirti; Durani, Susheel

2009-12-24

The notion is tested that homochiral stereochemistry being ubiquitous to protein structure could be critical to protein folding as well, causing it to become frustrated energetically providing the basis for its solvent- and sequence-mediated control. The proof in support of the notion is found in a consensus of experiment and computation according to which suitable oligopeptides are in their folding-unfolding equilibria, at both macrostate and microstate levels, susceptible to dielectric because of the conflict of peptide-chain electrostatics with interpeptide hydrogen bonds when the structure is poly-L but not when it is alternating-L,D. The argument is thus made that homochiral stereochemistry may in protein folding provide the unifying basis for its solvent- and sequence-mediated control based on screening of peptide-chain electrostatics under conflict with folding of the chain due to homochiral stereochemistry. Dielectric is brought into spotlight as the effect comparatively obscure but presumably critical to the folding in protein structure for its control.
Transcriptome characterization and polymorphism detection between subspecies of big sagebrush (Artemisia tridentata)

PubMed Central

2011-01-01

Background: Big sagebrush (Artemisia tridentata) is one of the most widely distributed and ecologically important shrub species in western North America. This species serves as a critical habitat and food resource for many animals and invertebrates. Habitat loss due to a combination of disturbances followed by establishment of invasive plant species is a serious threat to big sagebrush ecosystem sustainability. Lack of genomic data has limited our understanding of the evolutionary history and ecological adaptation in this species. Here, we report on the sequencing of expressed sequence tags (ESTs) and detection of single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) markers in subspecies of big sagebrush. Results: cDNA of A. tridentata sspp. tridentata and vaseyana were normalized and sequenced using the 454 GS FLX Titanium pyrosequencing technology. Assembly of the reads resulted in 20,357 contig consensus sequences in ssp. tridentata and 20,250 contigs in ssp. vaseyana. A BLASTx search against the non-redundant (NR) protein database using 29,541 consensus sequences obtained from a combined assembly resulted in 21,436 sequences with significant blast alignments (≤ 1e-15). A total of 20,952 SNPs and 119 polymorphic SSRs were detected between the two subspecies. SNPs were validated through various methods including sequence capture. Validation of SNPs in different individuals uncovered a high level of nucleotide variation in EST sequences. EST sequences of a third, tetraploid subspecies (ssp. wyomingensis) obtained by Illumina sequencing were mapped to the consensus sequences of the combined 454 EST assembly. Approximately one-third of the SNPs between sspp. tridentata and vaseyana identified in the combined assembly were also polymorphic within the two geographically distant ssp. wyomingensis samples. Conclusion: We have produced a large EST dataset for Artemisia tridentata, which contains a large sample of the big sagebrush leaf transcriptome. SNP mapping among the three subspecies suggests the origin of ssp. wyomingensis via mixed ancestry. A large number of SNP and SSR markers provide the foundation for future research to address questions in big sagebrush evolution, ecological genetics, and conservation using genomic approaches. PMID:21767398
Turning gold into ‘junk’: transposable elements utilize central proteins of cellular networks

PubMed Central

Abrusán, György; Szilágyi, András; Zhang, Yang; Papp, Balázs

2013-01-01

The numerous discovered cases of domesticated transposable element (TE) proteins led to the recognition that TEs are a significant source of evolutionary innovation. However, much less is known about the reverse process, whether and to what degree the evolution of TEs is influenced by the genome of their hosts. We addressed this issue by searching for cases of incorporation of host genes into the sequence of TEs and examined the systems-level properties of these genes using the Saccharomyces cerevisiae and Drosophila melanogaster genomes. We identified 51 cases where the evolutionary scenario was the incorporation of a host gene fragment into a TE consensus sequence, and we show that both the yeast and fly homologues of the incorporated protein sequences have central positions in the cellular networks. An analysis of selective pressure (Ka/Ks ratio) detected significant selection in 37% of the cases. Recent research on retrovirus-host interactions shows that virus proteins preferentially target hubs of the host interaction networks enabling them to take over the host cell using only a few proteins. We propose that TEs face a similar evolutionary pressure to evolve proteins with high interacting capacities and take some of the necessary protein domains directly from their hosts. PMID:23341038
Structure of human POFUT2: insights into thrombospondin type 1 repeat fold and O-fucosylation

PubMed Central

Chen, Chun-I; Keusch, Jeremy J; Klein, Dominique; Hess, Daniel; Hofsteenge, Jan; Gut, Heinz

2012-01-01

Protein O-fucosylation is a post-translational modification found on serine/threonine residues of thrombospondin type 1 repeats (TSR). The fucose transfer is catalysed by the enzyme protein O-fucosyltransferase 2 (POFUT2) and >40 human proteins contain the TSR consensus sequence for POFUT2-dependent fucosylation. To better understand O-fucosylation on TSR, we carried out a structural and functional analysis of human POFUT2 and its TSR substrate. Crystal structures of POFUT2 reveal a variation of the classical GT-B fold and identify sugar donor and TSR acceptor binding sites. Structural findings are correlated with steady-state kinetic measurements of wild-type and mutant POFUT2 and TSR and give insight into the catalytic mechanism and substrate specificity. By using an artificial mini-TSR substrate, we show that specificity is not primarily encoded in the TSR protein sequence but rather in the unusual 3D structure of a small part of the TSR. Our findings uncover that recognition of distinct conserved 3D fold motifs can be used as a mechanism to achieve substrate specificity by enzymes modifying completely folded proteins of very wide sequence diversity and biological function. PMID:22588082
SSMART: Sequence-structure motif identification for RNA-binding proteins.

PubMed

Munteanu, Alina; Mukherjee, Neelanjan; Ohler, Uwe

2018-06-11

RNA-binding proteins (RBPs) regulate every aspect of RNA metabolism and function. There are hundreds of RBPs encoded in eukaryotic genomes, and each recognizes its RNA targets through a specific mixture of RNA sequence and structure properties. For most RBPs, however, only a primary sequence motif has been determined, while the structure of the binding sites is uncharacterized. We developed SSMART, an RNA motif finder that simultaneously models the primary sequence and the structural properties of the RNA target sites. The sequence-structure motifs are represented as consensus strings over a degenerate alphabet, extending the IUPAC codes for nucleotides to account for secondary structure preferences. Evaluation on synthetic data showed that SSMART is able to recover both sequence and structure motifs implanted into 3'UTR-like sequences, for various degrees of structured/unstructured binding sites. In addition, we successfully used SSMART on high-throughput in vivo and in vitro data, showing that we not only recover the known sequence motif, but also gain insight into the structural preferences of the RBP. Availability: SSMART is freely available at https://ohlerlab.mdc-berlin.de/software/SSMART_137/. Supplementary data are available at Bioinformatics online.
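To make the idea of a consensus string over a degenerate alphabet concrete, the sketch below expands standard IUPAC nucleotide codes (written for RNA, with U) into a regular expression and reports where a consensus occurs in a sequence. This covers only the sequence part of the representation; the structure-aware extension of the alphabet used by SSMART is not reproduced here, and the function names, the example consensus, and the toy RNA are illustrative assumptions.

```python
import re

# Standard IUPAC nucleotide codes, expressed with U for RNA.
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Translate a degenerate consensus string into a regex with one character class per position."""
    body = "".join(f"[{IUPAC_RNA[code]}]" for code in consensus.upper())
    # Lookahead so overlapping occurrences are all found.
    return re.compile(f"(?=({body}))")


def find_consensus(consensus: str, rna: str) -> list[int]:
    """Return 0-based start positions of every occurrence of the consensus in rna."""
    return [m.start() for m in consensus_to_regex(consensus).finditer(rna)]


# Toy example: the consensus has one fully degenerate position (N).
print(find_consensus("UGUANAUA", "AAUGUACAUAGGUGUAUAUACC"))
```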
Identification of an estrogen response element in the 3'-flanking region of the murine c-fos protooncogene.

PubMed

Hyder, S M; Stancel, G M; Nawaz, Z; McDonnell, D P; Loose-Mitchell, D S

1992-09-05

We have used transient transfection assays with reporter plasmids expressing chloramphenicol acetyltransferase, linked to regions of mouse c-fos, to identify a specific estrogen response element (ERE) in this protooncogene. This element is located in the untranslated 3'-flanking region of the c-fos gene, 5 kilobases (kb) downstream from the c-fos promoter and 1.5 kb downstream of the poly(A) signal. This element confers estrogen responsiveness to chloramphenicol acetyltransferase reporters linked to both the herpes simplex virus thymidine kinase promoter and the homologous c-fos promoter. Deletion analysis localized the response element to a 200-base pair fragment which contains the element GGTCACCACAGCC that resembles the consensus ERE sequence GGTCACAGTGACC originally identified in Xenopus vitellogenin A2 gene. A synthetic 36-base pair oligodeoxynucleotide containing this c-fos sequence conferred estrogen inducibility to the thymidine kinase promoter. The corresponding sequence also induced reporter activity when present in the c-fos gene fragment 3 kb from the thymidine kinase promoter. Gel-shift experiments demonstrated that synthetic oligonucleotides containing either the consensus ERE or the c-fos element bind human estrogen receptor obtained from a yeast expression system. However, the mobility of the shifted band is faster for the fos-ERE-complex than the consensus ERE complex suggesting that the three-dimensional structure of the protein-DNA complexes is different or that other factors are differentially involved in the two reactions. When the 5'-GGTCA sequence present in the c-fos ERE is mutated to 5'-TTTCA, transcriptional activation and receptor binding activities are both lost. Mutation of the CAGCC-3' element corresponding to the second half-site of the c-fos sequence also led to the loss of receptor binding activity, suggesting that both half-sites of this element are involved in this function. The estrogen induction mediated by either the c-fos or the consensus ERE was blunted by the antiestrogen tamoxifen. Based on these studies, we believe the 3'-fos ERE sequence we have identified may be a major cis-acting element involved in the physiological regulation of the gene by estrogens in vivo.

Expression of ADP-ribosylation factor (ARF)-like protein 6 during mouse embryonic development.

PubMed

Takada, Tatsuyuki; Iida, Keiko; Sasaki, Hiroshi; Taira, Masanori; Kimura, Hiroshi

2005-01-01

ADP-ribosylation factor (ARF)-like protein 6 (ARL6) is a member of the ARF-like protein (ARL) subfamily of small GTPases (Moss, 1995; Chavrier, 1999). ARLs are highly conserved through evolution and most of them possess the consensus sequence required for GTP binding and hydrolysis (Pasquallato, 2002). Among ARLs, ARL6, which was initially isolated from a J2E erythroleukemic cell line, is divergent in its consensus sequences, and its expression has been shown to be limited to the brain and kidney in adult mouse (Ingley, 1999). Recently, it was reported that mutations of the ARL6 gene cause type 3 Bardet-Biedl syndrome in humans and that ARL6 is involved in ciliary transport in C. elegans (Chiang, 2004; Fan, 2004). Here, we investigated the expression pattern of ARL6 during early mouse development by whole-mount in situ hybridization and found that, interestingly, ARL6 mRNA was localized around the node in embryos at 7.0-7.5 days post coitum (dpc), while weak expression was also found in the ectoderm. At a later stage (8.5 dpc), ARL6 was expressed in the neural plate and probably in the somites. Based on these results, a possible role of ARL6 in early development is discussed in relation to the findings in human and C. elegans (Chiang, 2004; Fan, 2004).
Structural determinants of nuclear export signal orientation in binding to exportin CRM1

DOE PAGES

Fung, Ho Yee Joyce; Fu, Szu -Chin; Brautigam, Chad A.; ...

2015-09-08

The Chromosome Region of Maintenance 1 (CRM1) protein mediates nuclear export of hundreds of proteins through recognition of their nuclear export signals (NESs), which are highly variable in sequence and structure. The plasticity of the CRM1-NES interaction is not well understood, as there are many NES sequences that seem incompatible with structures of the NES-bound CRM1 groove. Crystal structures of CRM1 bound to two different NESs with unusual sequences showed the NES peptides binding the CRM1 groove in the opposite orientation (minus) to that of previously studied NESs (plus). A comparison of minus and plus NESs identified structural and sequence determinants for NES orientation. The binding of NESs to CRM1 in both orientations results in a large expansion in NES consensus patterns and therefore a corresponding expansion of potential NESs in the proteome.
BASIC PENTACYSTEINE Proteins Mediate MADS Domain Complex Binding to the DNA for Tissue-Specific Expression of Target Genes in Arabidopsis

PubMed Central

Simonini, Sara; Roig-Villanova, Irma; Gregis, Veronica; Colombo, Bilitis; Colombo, Lucia; Kater, Martin M.

2012-01-01

BASIC PENTACYSTEINE (BPC) transcription factors have been identified in a large variety of plant species. In Arabidopsis thaliana there are seven BPC genes, which, except for BPC5, are expressed ubiquitously. BPC genes are functionally redundant in a wide range of developmental processes. Recently, we reported that BPC1 binds to guanine and adenine (GA)–rich consensus sequences in the SEEDSTICK (STK) promoter in vitro and induces conformational changes. Here we show by chromatin immunoprecipitation experiments that in vivo BPCs also bind to the consensus boxes, and when these were mutated, expression from the STK promoter was derepressed, resulting in ectopic expression in the inflorescence. We also reveal that SHORT VEGETATIVE PHASE (SVP) is a direct regulator of STK. SVP is a floral meristem identity gene belonging to the MADS box gene family. The SVP-APETALA1 (AP1) dimer recruits the SEUSS (SEU)-LEUNIG (LUG) transcriptional cosuppressor to repress floral homeotic gene expression in the floral meristem. Interestingly, we found that GA consensus sequences in the STK promoter to which BPCs bind are essential for recruitment of the corepressor complex to this promoter. Our data suggest that we have identified a new regulatory mechanism controlling plant gene expression that is probably generally used, when considering BPCs’ wide expression profile and the frequent presence of consensus binding sites in plant promoters. PMID:23054472
bfr1+, a novel gene of Schizosaccharomyces pombe which confers brefeldin A resistance, is structurally related to the ATP-binding cassette superfamily.

PubMed Central

Nagao, K; Taguchi, Y; Arioka, M; Kadokura, H; Takatsuki, A; Yoda, K; Yamasaki, M

1995-01-01

We have isolated a Schizosaccharomyces pombe gene, bfr1+, which on a multicopy plasmid vector, pDB248', confers resistance to brefeldin A (BFA), an inhibitor of intracellular protein transport. This gene encodes a novel protein of 1,531 amino acids with an intramolecular duplicated structure, each half containing a single ATP-binding consensus sequence and a set of six transmembrane sequences. This structural characteristic of bfr1+ protein resembles that of mammalian P-glycoprotein, which, by exporting a variety of anticancer drugs, has been shown to be responsible for multidrug resistance in tumor cells. Consistent with this is that S. pombe cells harboring bfr1+ on pDB248' are resistant to actinomycin D, cerulenin, and cytochalasin B, as well as to BFA. The relative positions of the ATP-binding sequences and the clusters of transmembrane sequences within the bfr1+ protein are, however, transposed in comparison with those in P-glycoprotein; the bfr1+ protein has N-terminal ATP-binding sequence followed by transmembrane segments in each half of the molecule. The bfr1+ protein exhibited significant homology in primary and secondary structures with two recently identified multidrug resistance gene products of Saccharomyces cerevisiae, Snq2 and Sts1/Pdr5/Ydr1. The bfr1+ gene is not essential for cell growth or mating, but a delta bfr1 mutant exhibited hypersensitivity to BFA. We propose that the bfr1+ protein is another member of the ATP-binding cassette superfamily and serves as an efflux pump of various antibiotics. PMID:7883711
To Clone or Not To Clone: Method Analysis for Retrieving Consensus Sequences In Ancient DNA Samples

PubMed Central

Winters, Misa; Barta, Jodi Lynn; Monroe, Cara; Kemp, Brian M.

2011-01-01

The challenges associated with the retrieval and authentication of ancient DNA (aDNA) evidence are principally due to post-mortem damage which makes ancient samples particularly prone to contamination from “modern” DNA sources. The necessity for authentication of results has led many aDNA researchers to adopt methods considered to be “gold standards” in the field, including cloning aDNA amplicons as opposed to directly sequencing them. However, no standardized protocol has emerged regarding the necessary number of clones to sequence, how a consensus sequence is most appropriately derived, or how results should be reported in the literature. In addition, there has been no systematic demonstration of the degree to which direct sequences are affected by damage or whether direct sequencing would provide disparate results from a consensus of clones. To address this issue, a comparative study was designed to examine both cloned and direct sequences amplified from ∼3,500 year-old ancient northern fur seal DNA extracts. Majority rules and the Consensus Confidence Program were used to generate consensus sequences for each individual from the cloned sequences, which exhibited damage at 31 of 139 base pairs across all clones. In no instance did the consensus of clones differ from the direct sequence. This study demonstrates that, when appropriate, cloning need not be the default method, but instead, should be used as a measure of authentication on a case-by-case basis, especially when this practice adds time and cost to studies where it may be superfluous. PMID:21738625
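The "majority rules" step mentioned in the abstract above can be sketched as a per-column vote over aligned clone sequences. This is an illustration under stated assumptions: the clones are assumed to be pre-aligned and of equal length, ties are reported as `N` (a choice made here, not taken from the study), the toy clone sequences are invented, and the Consensus Confidence Program is not reproduced.

```python
from collections import Counter


def majority_consensus(clones: list[str], ambiguous: str = "N") -> str:
    """Majority-rule consensus of pre-aligned, equal-length clone sequences."""
    if not clones:
        raise ValueError("at least one clone sequence is required")
    if any(len(c) != len(clones[0]) for c in clones):
        raise ValueError("clone sequences must be aligned to the same length")

    consensus = []
    for column in zip(*clones):
        ranked = Counter(column).most_common()
        top_base, top_count = ranked[0]
        # Assumption: a tie for first place is reported as ambiguous.
        if len(ranked) > 1 and ranked[1][1] == top_count:
            consensus.append(ambiguous)
        else:
            consensus.append(top_base)
    return "".join(consensus)


# Hypothetical clones; isolated differences mimic damage-derived miscoding.
toy_clones = ["ACGTAC", "ACGTAT", "ATGTAC", "ACGTAC"]
print(majority_consensus(toy_clones))
```

Because damage-derived substitutions tend to appear in only a minority of clones at any one position, a vote of this kind is expected to recover the undamaged base when enough clones are sequenced.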
Sequence patterns mediating functions of disordered proteins.

PubMed

Exarchos, Konstantinos P; Kourou, Konstantina; Exarchos, Themis P; Papaloukas, Costas; Karamouzis, Michalis V; Fotiadis, Dimitrios I

2015-01-01

Disordered proteins lack specific 3D structure in their native state and have been implicated with numerous cellular functions as well as with the induction of severe diseases, e.g., cardiovascular and neurodegenerative diseases as well as diabetes. Due to their conformational flexibility they are often found to interact with a multitude of protein molecules; this one-to-many interaction which is vital for their versatile functioning involves short consensus protein sequences, which are normally detected using slow and cumbersome experimental procedures. In this work we exploit information from disorder-oriented protein interaction networks focused specifically on humans, in order to assemble, by means of overrepresentation, a set of sequence patterns that mediate the functioning of disordered proteins; hence, we are able to identify how a single protein achieves such functional promiscuity. Next, we study the sequential characteristics of the extracted patterns, which exhibit a striking preference towards a very limited subset of amino acids; specifically, residues leucine, glutamic acid, and serine are particularly frequent among the extracted patterns, and we also observe a nontrivial propensity towards alanine and glycine. Furthermore, based on the extracted patterns we set off to infer potential functional implications in order to verify our findings and potentially further extrapolate our knowledge regarding the functioning of disordered proteins. We observe that the extracted patterns are primarily involved with regulation, binding and posttranslational modifications, which constitute the most prominent functions of disordered proteins.
Full trans-activation mediated by the immediate-early protein of equine herpesvirus 1 requires a consensus TATA box, but not its cognate binding sequence.

PubMed

Kim, Seong K; Shakya, Akhalesh K; O'Callaghan, Dennis J

2016-01-04

The immediate-early protein (IEP) of equine herpesvirus 1 (EHV-1) has extensive homology to the IEP of alphaherpesviruses and possesses domains essential for trans-activation, including an acidic trans-activation domain (TAD) and binding domains for DNA, TFIIB, and TBP. Our data showed that the IEP directly interacted with transcription factor TFIIA, which is known to stabilize the binding of TBP and TFIID to the TATA box of core promoters. When the TATA box of the EICP0 promoter was mutated to a nonfunctional TATA box, IEP-mediated trans-activation was reduced from 22-fold to 7-fold. The IEP trans-activated the viral promoters in a TATA motif-dependent manner. Our previous data showed that the IEP is able to repress its own promoter when the IEP-binding sequence (IEBS) is located within 26-bp from the TATA box. When the IEBS was located at 100 bp upstream of the TATA box, IEP-mediated trans-activation was very similar to that of the minimal IE(nt -89 to +73) promoter lacking the IEBS. As the distance from the IEBS to the TATA box decreased, IEP-mediated trans-activation progressively decreased, indicating that the IEBS located within 100 bp from the TATA box sequence functions as a distance-dependent repressive element. These results indicated that IEP-mediated full trans-activation requires a consensus TATA box of core promoters, but not its binding to the cognate sequence (IEBS). Copyright © 2015 Elsevier B.V. All rights reserved.
Full trans–activation mediated by the immediate–early protein of equine herpesvirus 1 requires a consensus TATA box, but not its cognate binding sequence

PubMed Central

Kim, Seong K.; Shakya, Akhalesh K.; O'Callaghan, Dennis J.

2015-01-01

The immediate-early protein (IEP) of equine herpesvirus 1 (EHV-1) has extensive homology to the IEP of alphaherpesviruses and possesses domains essential for trans-activation, including an acidic trans-activation domain (TAD) and binding domains for DNA, TFIIB, and TBP. Our data showed that the IEP directly interacted with transcription factor TFIIA, which is known to stabilize the binding of TBP and TFIID to the TATA box of core promoters. When the TATA box of the EICP0 promoter was mutated to a nonfunctional TATA box, IEP-mediated trans-activation was reduced from 22-fold to 7-fold. The IEP trans-activated the viral promoters in a TATA motif-dependent manner. Our previous data showed that the IEP is able to repress its own promoter when the IEP-binding sequence (IEBS) is located within 26-bp from the TATA box. When the IEBS was located at 100 bp upstream of the TATA box, IEP-mediated trans-activation was very similar to that of the minimal IE(nt −89 to +73) promoter lacking the IEBS. As the distance from the IEBS to the TATA box decreased, IEP-mediated trans-activation progressively decreased, indicating that the IEBS located within 100 bp from the TATA box sequence functions as a distance-dependent repressive element. These results indicated that IEP-mediated full trans-activation requires a consensus TATA box of core promoters, but not its binding to the cognate sequence (IEBS). PMID:26541315
Human Ro60 (SSA2) genomic organization and sequence alterations, examined in cutaneous lupus erythematosus.

PubMed

Millard, T P; Ashton, G H S; Kondeatis, E; Vaughan, R W; Hughes, G R V; Khamashta, M A; Hawk, J L M; McGregor, J M; McGrath, J A

2002-02-01

The Ro 60 kDa protein (Ro60 or SSA2) is the major component of the Ro ribonucleoprotein (Ro RNP) complex, to which an immune response is a specific feature of several autoimmune diseases. The genomic organization and any sequence variation within the DNA encoding Ro60 are unknown. We aimed to characterize the Ro60 gene structure and to assess whether any sequence alterations might be associated with serum anti-Ro antibody in subacute cutaneous lupus erythematosus (SCLE), thus potentially providing new insight into disease pathogenesis. The cDNA sequence for Ro60 was obtained from the NCBI database and used for a BLAST search for a clone containing the entire genomic sequence. The intron-exon borders were confirmed by designing intronic primer pairs to flank each exon, which were then used to amplify genomic DNA for automated sequencing from 36 Caucasian patients with SCLE (anti-Ro positive) and 49 with discoid LE (DLE, anti-Ro negative), in addition to 36 healthy Caucasian controls. Heteroduplex analysis of polymerase chain reaction (PCR) products from patients and controls spanning all Ro60 exons (1-8) revealed a common bandshift in the PCR products spanning exon 7. Sequencing of the corresponding PCR products demonstrated an A > G substitution at nucleotide position 1318-7, within the consensus acceptor splice site of exon 7 (GenBank XM001901). The allele frequencies were major allele A (0.71) and minor allele G (0.29) in 72 control chromosomes, with no significant differences found between SCLE patients, DLE patients and controls. The genomic organization of the DNA encoding the Ro60 protein is described, including a common polymorphism within the consensus acceptor splice site of exon 7. Our delineation of a strategy for the genomic amplification of Ro60 forms a basis for further examination of the pathological functions of the Ro RNP in autoimmune disease.
Design, production and molecular structure of a new family of artificial alpha-helicoidal repeat proteins (αRep) based on thermostable HEAT-like repeats.

PubMed

Urvoas, Agathe; Guellouz, Asma; Valerio-Lepiniec, Marie; Graille, Marc; Durand, Dominique; Desravines, Danielle C; van Tilbeurgh, Herman; Desmadril, Michel; Minard, Philippe

2010-11-26

Repeat proteins have a modular organization and a regular architecture that make them attractive models for design and directed evolution experiments. HEAT repeat proteins, although very common, have not been used as a scaffold for artificial proteins, probably because they are made of long and irregular repeats. Here, we present and validate a consensus sequence for artificial HEAT repeat proteins. The sequence was defined from the structure-based sequence analysis of a thermostable HEAT-like repeat protein. Appropriate sequences were identified for the N- and C-caps. A library of genes coding for artificial proteins based on this sequence design, named αRep, was assembled using new and versatile methodology based on circular amplification. Proteins picked randomly from this library are expressed as soluble proteins. The biophysical properties of proteins with different numbers of repeats and different combinations of side chains in hypervariable positions were characterized. Circular dichroism and differential scanning calorimetry experiments showed that all these proteins are folded cooperatively and are very stable (T(m) >70 °C). Stability of these proteins increases with the number of repeats. Detailed gel filtration and small-angle X-ray scattering studies showed that the purified proteins form either monomers or dimers. The X-ray structure of a stable dimeric variant was solved. The protein is folded with a highly regular topology and the repeat structure is organized, as expected, as pairs of alpha helices. In this protein variant, the dimerization interface results directly from the variable surface enriched in aromatic residues located in the randomized positions of the repeats. The dimer was crystallized both in an apo and in a PEG-bound form, revealing a very well defined binding crevice and some structure flexibility at the interface. This fortuitous binding site could later prove to be a useful binding site for other low molecular mass partners. Copyright © 2010 Elsevier Ltd. All rights reserved.
Applying the Concept of Peptide Uniqueness to Anti-Polio Vaccination

PubMed Central

Kanduc, Darja; Fasano, Candida; Capone, Giovanni; Pesce Delfino, Antonella; Calabrò, Michele; Polimeno, Lorenzo

2015-01-01

Background. Although rare, adverse events may associate with anti-poliovirus vaccination thus possibly hampering global polio eradication worldwide. Objective. To design peptide-based anti-polio vaccines exempt from potential cross-reactivity risks and possibly able to reduce rare potential adverse events such as the postvaccine paralytic poliomyelitis due to the tendency of the poliovirus genome to mutate. Methods. Proteins from poliovirus type 1, strain Mahoney, were analyzed for amino acid sequence identity to the human proteome at the pentapeptide level, searching for sequences that (1) have zero percent of identity to human proteins, (2) are potentially endowed with an immunologic potential, and (3) are highly conserved among poliovirus strains. Results. Sequence analyses produced a set of consensus epitopic peptides potentially able to generate specific anti-polio immune responses exempt from cross-reactivity with the human host. Conclusion. Peptide sequences unique to poliovirus proteins and conserved among polio strains might help formulate a specific and universal anti-polio vaccine able to react with multiple viral strains and exempt from the burden of possible cross-reactions with human proteins. As an additional advantage, using a peptide-based vaccine instead of current anti-polio DNA vaccines would eliminate the rare post-polio poliomyelitis cases and other disabling symptoms that may appear following vaccination. PMID:26568962
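The pentapeptide-level comparison described in the Methods above amounts to asking which 5-residue windows of a viral protein never occur in any host protein. The sketch below illustrates that set operation only; the function names and the toy sequences are hypothetical, and the study's additional criteria (immunologic potential, conservation among strains) are not modeled.

```python
def kmers(sequence: str, k: int = 5) -> set[str]:
    """All overlapping k-residue windows of a protein sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def unique_peptides(viral: str, host_proteins: list[str], k: int = 5) -> list[str]:
    """Viral k-mers (in sequence order) that are absent from every host protein."""
    host_kmers: set[str] = set()
    for protein in host_proteins:
        host_kmers |= kmers(protein, k)
    return [
        viral[i:i + k]
        for i in range(len(viral) - k + 1)
        if viral[i:i + k] not in host_kmers
    ]


# Toy sequences, invented for illustration; not real proteome data.
toy_viral = "MGAQVSSQK"
toy_host = ["AAMGAQVTT", "SSQKLL"]
print(unique_peptides(toy_viral, toy_host))
```

With a real proteome the host k-mer set would be built once and reused for every viral protein, since the lookup cost per window is then constant.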
StralSV: assessment of sequence variability within similar 3D structures and application to polio RNA-dependent RNA polymerase

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zemla, A; Lang, D; Kostova, T

2010-11-29

Most of the currently used methods for protein function prediction rely on sequence-based comparisons between a query protein and those for which a functional annotation is provided. A serious limitation of sequence similarity-based approaches for identifying residue conservation among proteins is the low confidence in assigning residue-residue correspondences among proteins when the level of sequence identity between the compared proteins is poor. Multiple sequence alignment methods are more satisfactory - still, they cannot provide reliable results at low levels of sequence identity. Our goal in the current work was to develop an algorithm that could overcome these difficulties and facilitate the identification of structurally (and possibly functionally) relevant residue-residue correspondences between compared protein structures. Here we present StralSV, a new algorithm for detecting closely related structure fragments and quantifying residue frequency from tight local structure alignments. We apply StralSV in a study of the RNA-dependent RNA polymerase of poliovirus and demonstrate that the algorithm can be used to determine regions of the protein that are relatively unique or that shared structural similarity with structures that are distantly related. By quantifying residue frequencies among many residue-residue pairs extracted from local alignments, one can infer potential structural or functional importance of specific residues that are determined to be highly conserved or that deviate from a consensus. We further demonstrate that considerable detailed structural and phylogenetic information can be derived from StralSV analyses. StralSV is a new structure-based algorithm for identifying and aligning structure fragments that have similarity to a reference protein. StralSV analysis can be used to quantify residue-residue correspondences and identify residues that may be of particular structural or functional importance, as well as unusual or unexpected residues at a given sequence position.
Improving transmembrane protein consensus topology prediction using inter-helical interaction.

PubMed

Wang, Han; Zhang, Chao; Shi, Xiaohu; Zhang, Li; Zhou, You

2012-11-01

Alpha helix transmembrane proteins (αTMPs) represent roughly 30% of all open reading frames (ORFs) in a typical genome and are involved in many critical biological processes. Because of their special physicochemical properties, they are hard to crystallize and high-resolution structures are difficult to obtain experimentally; sequence-based topology prediction is therefore highly desirable for the study of transmembrane proteins (TMPs), both in structure prediction and function prediction. Various model-based topology prediction methods have been developed, but the accuracy of those individual predictors remains poor due to the limitations of the methods or the features they use. Thus, the consensus topology prediction method becomes practical for high accuracy applications by combining the advances of the individual predictors. Here, based on the observation that inter-helical interactions are commonly found within the transmembrane helixes (TMHs) and strongly indicate the existence of them, we present a novel consensus topology prediction method for αTMPs, CNTOP, which incorporates four top leading individual topology predictors, and further improves the prediction accuracy by using the predicted inter-helical interactions. The method achieved 87% prediction accuracy based on a benchmark dataset and 78% accuracy based on a non-redundant dataset which is composed of polytopic αTMPs. Our method achieves higher topology accuracy than any of the other individual predictors and consensus predictors; at the same time, the TMHs are more accurately predicted in their length and locations, with both the false positives (FPs) and the false negatives (FNs) decreasing dramatically. The CNTOP is available at: http://ccst.jlu.edu.cn/JCSB/cntop/CNTOP.html. Copyright © 2012 Elsevier B.V. All rights reserved.
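The general idea of combining several topology predictors, as described in the abstract above, can be illustrated with a per-residue majority vote. This sketch is not CNTOP: it assumes each predictor emits one label per residue (`i` inside, `M` membrane, `o` outside, a convention chosen here), uses three invented predictions rather than four real predictors, and omits the inter-helical interaction step entirely.

```python
from collections import Counter


def consensus_topology(predictions: list[str]) -> str:
    """Per-residue majority vote over equal-length topology label strings."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    if any(len(p) != len(predictions[0]) for p in predictions):
        raise ValueError("all predictions must cover the same residues")
    # On a tie, Counter.most_common keeps the label seen first in the column,
    # i.e. the label from the earliest-listed predictor among the tied ones.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*predictions)
    )


# Hypothetical outputs of three predictors for an 8-residue toy protein.
toy_predictions = ["iiMMMMoo", "iiiMMMoo", "iMMMMMMo"]
print(consensus_topology(toy_predictions))
```

A vote of this kind mainly smooths disagreements at helix boundaries, which is where individual predictors tend to differ most.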
Identification of a maize nucleic acid-binding protein (NBP) belonging to a family of nuclear-encoded chloroplast proteins.

PubMed Central

Cook, W B; Walker, J C

1992-01-01

A cDNA encoding a nuclear-encoded chloroplast nucleic acid-binding protein (NBP) has been isolated from maize. Identified as an in vitro DNA-binding activity, NBP belongs to a family of nuclear-encoded chloroplast proteins which share a common domain structure and are thought to be involved in posttranscriptional regulation of chloroplast gene expression. NBP contains an N-terminal chloroplast transit peptide, a highly acidic domain and a pair of ribonucleoprotein consensus sequence domains. NBP is expressed in a light-dependent, organ-specific manner which is consistent with its involvement in chloroplast biogenesis. The relationship of NBP to the other members of this protein family and their possible regulatory functions are discussed. PMID:1346929
Structure and sequence analyses of Bacteroides proteins BVU_4064 and BF1687 reveal presence of two novel predominantly-beta domains, predicted to be involved in lipid and cell surface interactions

DOE PAGES

Natarajan, Padmaja; Punta, Marco; Kumar, Abhinav; ...

2015-01-16

N-terminal domains of BVU_4064 and BF1687 proteins from Bacteroides vulgatus and Bacteroides fragilis respectively are members of the Pfam family PF12985 (DUF3869). Proteins containing a domain from this family can be found in most Bacteroides species and, in large numbers, in all human gut microbiome samples. Both BVU_4064 and BF1687 proteins have a consensus lipobox motif implying they are anchored to the membrane, but their functions are otherwise unknown. The C-terminal half of BVU_4064 is assigned to protein family PF12986 (DUF3870); the equivalent part of BF1687 was unclassified.
Bioengineered Chimeric Spider Silk-Uranium Binding Proteins

PubMed Central

Krishnaji, Sreevidhya Tarakkad; Kaplan, David L.

2014-01-01

Heavy metals constitute a source of environmental pollution. Here, novel functional hybrid biomaterials for specific interactions with heavy metals are designed by bioengineering consensus sequence repeats from spider silk of Nephila clavipes with repeats of a uranium peptide recognition motif from a mutated 33-residue segment of the calmodulin protein from Paramecium tetraurelia. The self-assembly features of the silk, which control nanoscale organic/inorganic material interfaces, provide new biomaterials for uranium recovery. With subsequent enzymatic digestion of the silk to concentrate the sequestered metals, options can be envisaged to use these new chimeric protein systems in environmental engineering, including to remediate environments contaminated by uranium. PMID:23212989
Identification of a penicillin-sensitive carboxypeptidase in the cellular slime mold Dictyostelium discoideum.

PubMed

Yasukawa, Hiro; Kuroita, Toshihiro; Tamura, Kentaro; Yamaguchi, Kazuo

2003-07-01

Penicillin binding proteins (PBPs) are penicillin-sensitive DD-peptidases catalyzing the terminal stages of bacterial cell wall assembly. We identified a Dictyostelium discoideum gene that encodes a protein of 522 amino acids showing similarity to Escherichia coli PBP4. The D. discoideum protein conserves three consensus sequences (SXXK, SXN and KTG) that are responsible for the catalytic activities of PBPs. The gene product prepared in the cell-free translation system showed carboxypeptidase activity but the activity was not detected in the presence of penicillin G. These results demonstrate that the D. discoideum gene encodes a eukaryotic form of penicillin-sensitive carboxypeptidase.
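The three consensus sequences named in the abstract above (SXXK, SXN and KTG, with X standing for any residue) are simple enough to locate with regular expressions. The sketch below assumes a one-letter amino acid string as input; the function name and the toy sequence are hypothetical and are not the D. discoideum protein.

```python
import re

# X in the motif notation means "any residue", written as "." in a regex.
PBP_MOTIFS = {"SXXK": "S..K", "SXN": "S.N", "KTG": "KTG"}


def find_motifs(protein: str) -> dict[str, list[int]]:
    """Map each motif name to the 1-based start positions of its occurrences."""
    return {
        name: [m.start() + 1 for m in re.finditer(f"(?={pattern})", protein)]
        for name, pattern in PBP_MOTIFS.items()
    }


# Toy sequence invented for illustration.
toy_protein = "MASTAKLLSGNPPKTGA"
print(find_motifs(toy_protein))
```

The lookahead `(?=...)` is used so that overlapping occurrences are all reported. Short motifs like these match by chance quite often, so their order and spacing along the sequence would also need checking in practice.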
Effect of the linkers between the zinc fingers in zinc finger protein 809 on gene silencing and nuclear localization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ichida, Yu; Utsunomiya, Yuko; Onodera, Masafumi

2016-03-18

Zinc finger protein 809 (ZFP809) belongs to the Kruppel-associated box-containing zinc finger protein (KRAB-ZFP) family and functions in repressing the expression of Moloney murine leukemia virus (MoMLV). ZFP809 binds to the primer-binding site (PBS) located downstream of the MoMLV-long terminal repeat (LTR) and induces epigenetic modifications at integration sites, such as repressive histone modifications and de novo DNA methylation. KRAB-ZFPs contain consensus TGEKP linkers between C2H2 zinc fingers. The phosphorylation of threonine residues within linkers leads to the inactivation of zinc finger binding to target sequences. ZFP809 also contains consensus linkers between zinc fingers. However, the function of ZFP809 linkers remains unknown. In the present study, we constructed ZFP809 proteins containing mutated linkers and examined their ability to silence transgene expression driven by MLV, binding ability to MLV PBS, and cellular localization. The results of the present study revealed that the linkers affected the ability of ZFP809 to silence transgene expression. Furthermore, this effect could be partly attributed to changes in the localization of ZFP809 proteins containing mutated linkers. Further characterization of ZFP809 linkers is required for understanding the functions and features of KRAB-ZFP-containing linkers. Highlights: • ZFP809 has three consensus linkers between the zinc fingers. • Linkers are required for ZFP809 to silence transgene expression driven by MLV-LTR. • Linkers affect the precise nuclear localization of ZFP809.
A safe an easy method for building consensus HIV sequences from 454 massively parallel sequencing data.

PubMed

Fernández-Caballero Rico, Jose Ángel; Chueca Porcuna, Natalia; Álvarez Estévez, Marta; Mosquera Gutiérrez, María Del Mar; Marcos Maeso, María Ángeles; García, Federico

2018-02-01

To show how to generate a consensus sequence from the massively parallel sequencing data obtained from routine HIV anti-retroviral resistance studies that may be suitable for molecular epidemiology studies. Paired Sanger (Trugene-Siemens) and next-generation sequencing (NGS) (454 GSJunior-Roche) HIV RT and protease sequences from 62 patients were studied. NGS consensus sequences were generated using Mesquite, using 10%, 15%, and 20% thresholds. Molecular evolutionary genetics analysis (MEGA) was used for phylogenetic studies. At a 10% threshold, NGS-Sanger sequences from 17/62 patients were phylogenetically related, with a median bootstrap value of 88% (IQR 83.5-95.5). Association increased to 36/62 sequences, with a median bootstrap value of 94% (IQR 85.5-98), using a 15% threshold. Maximum association was at the 20% threshold, with 61/62 sequences associated, and a median bootstrap value of 99% (IQR 98-100). A safe method is presented to generate consensus sequences from HIV-NGS data at 20% threshold, which will prove useful for molecular epidemiological studies. Copyright © 2016 Elsevier España, S.L.U. and Sociedad Española de Enfermedades Infecciosas y Microbiología Clínica. All rights reserved.
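The effect of the 10%, 15% and 20% thresholds in the abstract above can be illustrated with a threshold-based consensus: at each position, every base whose read frequency reaches the threshold is kept, and the kept set is written as an IUPAC ambiguity code. This is a sketch under assumptions (pre-aligned reads of equal length, invented toy reads, a hypothetical function name); it does not reproduce how Mesquite builds its consensus.

```python
from collections import Counter

# IUPAC nucleotide ambiguity codes, keyed by the alphabetically sorted base set.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}


def threshold_consensus(reads: list[str], threshold: float = 0.20) -> str:
    """Consensus keeping every base whose per-position frequency >= threshold."""
    if not reads or any(len(r) != len(reads[0]) for r in reads):
        raise ValueError("reads must be non-empty and aligned to equal length")
    consensus = []
    for column in zip(*reads):
        counts = Counter(column)
        kept = sorted(b for b, n in counts.items() if n / len(column) >= threshold)
        # Anything outside the table (gaps, empty set) is reported as N.
        consensus.append(IUPAC.get("".join(kept), "N"))
    return "".join(consensus)


# Toy data: a minority G variant present in 3 of 25 reads (12%).
toy_reads = ["AA"] * 22 + ["AG"] * 3
for t in (0.10, 0.15, 0.20):
    print(t, threshold_consensus(toy_reads, t))
```

A lower threshold admits more minority variants as ambiguity codes, which is one plausible reason the lower-threshold NGS consensus sequences grouped less reliably with their Sanger counterparts in the phylogenetic analysis.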
Euglena gracilis chloroplast DNA: analysis of a 1.6 kb intron of the psb C gene containing an open reading frame of 458 codons.

PubMed

Montandon, P E; Vasserot, A; Stutz, E

1986-01-01

We retrieved a 1.6 kbp intron separating two exons of the psb C gene which codes for the 44 kDa reaction center protein of photosystem II. This intron is 3 to 4 times the size of all previously sequenced Euglena gracilis chloroplast introns. It contains an open reading frame of 458 codons potentially coding for a basic protein of 54 kDa of yet unknown function. The intron boundaries follow consensus sequences established for chloroplast introns related to class II and nuclear pre-mRNA introns. Its 3'-terminal segment has structural features similar to class II mitochondrial introns with an invariant base A as possible branch point for lariat formation.

QMEANclust: estimation of protein model quality by combining a composite scoring function with structural density information.

PubMed

Benkert, Pascal; Schwede, Torsten; Tosatto, Silvio Ce

2009-05-20

The selection of the most accurate protein model from a set of alternatives is a crucial step in protein structure prediction both in template-based and ab initio approaches. Scoring functions have been developed which can either return a quality estimate for a single model or derive a score from the information contained in the ensemble of models for a given sequence. Local structural features occurring more frequently in the ensemble have a greater probability of being correct. Within the context of the CASP experiment, these so called consensus methods have been shown to perform considerably better in selecting good candidate models, but tend to fail if the best models are far from the dominant structural cluster. In this paper we show that model selection can be improved if both approaches are combined by pre-filtering the models used during the calculation of the structural consensus. Our recently published QMEAN composite scoring function has been improved by including an all-atom interaction potential term. The preliminary model ranking based on the new QMEAN score is used to select a subset of reliable models against which the structural consensus score is calculated. This scoring function called QMEANclust achieves a correlation coefficient of predicted quality score and GDT_TS of 0.9 averaged over the 98 CASP7 targets and perform significantly better in selecting good models from the ensemble of server models than any other groups participating in the quality estimation category of CASP7. Both scoring functions are also benchmarked on the MOULDER test set consisting of 20 target proteins each with 300 alternatives models generated by MODELLER. QMEAN outperforms all other tested scoring functions operating on individual models, while the consensus method QMEANclust only works properly on decoy sets containing a certain fraction of near-native conformations. 
We also present a local version of QMEAN for the per-residue estimation of model quality (QMEANlocal) and compare it to a new local consensus-based approach. Improved model selection is obtained by using a composite scoring function operating on single models in order to enrich higher quality models which are subsequently used to calculate the structural consensus. The performance of consensus-based methods such as QMEANclust highly depends on the composition and quality of the model ensemble to be analysed. Therefore, performance estimates for consensus methods based on large meta-datasets (e.g. CASP) might overrate their applicability in more realistic modelling situations with smaller sets of models based on individual methods.
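The pre-filtering idea described in this abstract can be illustrated with a minimal sketch. It is not the QMEANclust implementation: the function name, the `keep_fraction` parameter, and the use of a precomputed pairwise similarity matrix (for example, GDT_TS between models) are all assumptions made for illustration. Models are first ranked by a single-model score, and the consensus score of every model is then its mean similarity to the retained subset only.

```python
import math

def prefiltered_consensus(single_scores, similarity, keep_fraction=0.5):
    """Consensus score per model, computed against a pre-filtered subset.

    single_scores: one quality estimate per model (higher is better).
    similarity:    square matrix of pairwise model similarities.
    keep_fraction: hypothetical parameter, share of top-ranked models kept.
    """
    n = len(single_scores)
    if n < 2:
        raise ValueError("need at least two models")
    # Keep at least two models so every model has a reference partner.
    n_keep = max(2, math.ceil(n * keep_fraction))
    ranked = sorted(range(n), key=lambda i: single_scores[i], reverse=True)
    reference = ranked[:n_keep]
    consensus = []
    for i in range(n):
        # Exclude the model itself so self-similarity does not inflate it.
        others = [j for j in reference if j != i]
        consensus.append(sum(similarity[i][j] for j in others) / len(others))
    return consensus

# Toy ensemble: models 0 and 1 score well individually; models 2 and 3
# form a tight cluster of poor models.
scores = [0.9, 0.8, 0.2, 0.1]
sim = [
    [1.0, 0.8, 0.3, 0.2],
    [0.8, 1.0, 0.4, 0.3],
    [0.3, 0.4, 1.0, 0.9],
    [0.2, 0.3, 0.9, 1.0],
]
filtered = prefiltered_consensus(scores, sim, keep_fraction=0.5)
unfiltered = prefiltered_consensus(scores, sim, keep_fraction=1.0)
```

In this toy ensemble the unfiltered consensus favours model 2, a member of the poor cluster, while the pre-filtered consensus favours models 0 and 1. This mirrors the failure mode of plain consensus methods that the abstract describes.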
Complete sequence of the genome of avian paramyxovirus type 9 and comparison with other paramyxoviruses

PubMed Central

Samuel, Arthur S.; Kumar, Sachin; Madhuri, Subbiah; Collins, Peter L.; Samal, Siba K.

2009-01-01

The complete genome consensus sequence was determined for avian paramyxovirus (APMV) serotype 9 prototype strain PMV-9/domestic Duck/New York/22/78. The genome is 15,438 nucleotides (nt) long and encodes six non-overlapping genes in the order of 3′-N-P/V/W-M-F-HN-L-5′ with intergenic regions of 0–30 nt. The genome length follows the “rule of six” and contains a 55-nt leader sequence at the 3′ end and a 47-nt trailer sequence at the 5′ end. The cleavage site of the F protein is I-R-E-G-R-I↓F, which does not conform to the conventional cleavage site of the ubiquitous cellular protease furin. The virus required exogenous protease for in vitro replication and grew only in a few established cell lines, indicating a restricted host range. Alignment and phylogenetic analysis of the predicted amino acid sequences of APMV-9 proteins with the cognate proteins of viruses of all five genera of family Paramyxoviridae showed that APMV-9 is more closely related to APMV-1 than to other APMVs. The mean death time in embryonated chicken eggs was found to be more than 120 h, indicating APMV-9 to be avirulent for chickens. PMID:19185593
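The "rule of six" statement can be checked with simple arithmetic: the rule requires the genome length to be a multiple of six nucleotides. The short sketch below uses a hypothetical helper name and the genome length reported in the abstract.

```python
GENOME_LENGTH_NT = 15_438  # APMV-9 genome length reported in the abstract

def follows_rule_of_six(length_nt: int) -> bool:
    """Return True if the genome length is an exact multiple of six."""
    return length_nt % 6 == 0

# Number of complete six-nucleotide units in the genome.
hexamer_units = GENOME_LENGTH_NT // 6

print(follows_rule_of_six(GENOME_LENGTH_NT), hexamer_units)
```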
Identification and cloning of a gamma 3 subunit splice variant of the human GABA(A) receptor.

PubMed

Poulsen, C F; Christjansen, K N; Hastrup, S; Hartvig, L

2000-05-31

cDNA sequences encoding two forms of the GABA(A) gamma 3 receptor subunit were cloned from human hippocampus. The nucleotide sequences differ by the absence (gamma 3S) or presence (gamma 3L) of 18 bp located in the presumed intracellular loop between transmembrane region (TM) III and IV. The extra 18 bp in the gamma 3L subunit generates a consensus site for phosphorylation by protein kinase C (PKC). Analysis of human genomic DNA encoding the gamma 3 subunit reveals that the 18 bp insert is contiguous with the upstream proximal exon.
The pig CYP2E1 promoter is activated by COUP-TF1 and HNF-1 and is inhibited by androstenone.

PubMed

Tambyrajah, Winston S; Doran, Elena; Wood, Jeffrey D; McGivan, John D

2004-11-15

Functional analysis of the pig cytochrome P4502E1 (CYP2E1) promoter identified two major activating elements. One corresponded to the hepatic nuclear factor 1 (HNF-1) consensus binding sequence at nucleotides -128/-98 and the other was located in the region -292/-266. The binding of proteins in pig liver nuclear extracts to a synthetic double-stranded oligonucleotide corresponding to this more distal activating sequence was studied by electrophoretic mobility shift assay. The minimum protein binding sequence was identified as TGTTCTGACCTCTGGG. Gel super-shift assays identified the protein binding to this site as chick ovalbumin upstream promoter transcription factor 1 (COUP-TF1). Androstenone inhibited promoter activity in transfection experiments only with constructs which included the COUP-TF1 binding site. Androstenone inhibited COUP-TF1 binding to synthetic oligonucleotides but did not affect HNF-1 binding. The results offer an explanation for the inhibition of CYP2E1 protein expression by androstenone in isolated pig hepatocytes and may be relevant to the low expression of hepatic CYP2E1 in those pigs which accumulate high levels of androstenone in vivo.
Elevated expression of ribosomal protein genes L37, RPP-1, and S2 in the presence of mutant p53.

PubMed

Loging, W T; Reisman, D

1999-11-01

The wild-type p53 protein is a DNA-binding transcription factor that activates genes such as p21, MDM2, GADD45, and Bax that are required for the regulation of cell cycle progression or apoptosis in response to DNA damage. Mutant forms of p53, which are transforming oncogenes and are expressed at high levels in tumor cells, generally have a reduced binding affinity for the consensus DNA sequence. Interestingly, some p53 mutants that are no longer effective at binding to the consensus DNA sequence and transactivating promoters containing this target site have acquired the ability to transform cells in culture, in part through their ability to transactivate promoters of a number of genes that are not targets of the wild-type protein. Certain p53 mutants are therefore considered to be gain-of-function mutants and appear to be promoting proliferation or transforming cells through their ability to alter the expression of novel sets of genes. Our goal is to identify genes that have altered expression in the presence of a specific mutant p53 (Arg to Trp mutation at codon 248) protein. Through examining differential gene expression in cells devoid of p53 expression and in cells that express high levels of mutant p53 protein, we have identified three ribosomal protein genes that have elevated expression in response to mutant p53. Consistent with these findings, the overexpression of a number of ribosomal protein genes in human tumors and evidence for their contribution to oncogenic transformation have been reported previously, although the mechanism leading to this overexpression has remained elusive. We show results that indicate that expression of these specific ribosomal protein genes is increased in the presence of the R248W p53 mutant, which provides a mechanism for their overexpression in human tumors.
Molecular cloning and characterization of RGA1 encoding a G protein alpha subunit from rice (Oryza sativa L. IR-36).

PubMed

Seo, H S; Kim, H Y; Jeong, J Y; Lee, S Y; Cho, M J; Bahk, J D

1995-03-01

A cDNA clone, RGA1, was isolated from a rice (Oryza sativa L. IR-36) seedling cDNA library prepared from roots and leaves, using a GPA1 cDNA clone encoding the Arabidopsis thaliana G protein alpha subunit as a probe. Sequence analysis of the genomic clone reveals that the RGA1 gene has 14 exons and 13 introns, and encodes a polypeptide of 380 amino acid residues with a calculated molecular weight of 44.5 kDa. The encoded protein exhibits a considerable degree of amino acid sequence similarity to all the other known G protein alpha subunits. A putative TATA sequence (ATATGA), a potential CAAT box sequence (AGCAATAC), and a cis-acting element, CCACGTGG (ABRE), known to be involved in ABA induction are found in the promoter region. The RGA1 protein contains all the consensus regions of G protein alpha subunits except the cysteine residue near the C-terminus for ADP-ribosylation by pertussis toxin. The RGA1 polypeptide expressed in Escherichia coli was, however, ADP-ribosylated by 10 microM [adenylate-32P] NAD and activated cholera toxin. Southern analysis indicates that there are no other genes similar to the RGA1 gene in the rice genome. Northern analysis reveals that the RGA1 mRNA is 1.85 kb long and expressed in vegetative tissues, including leaves and roots, and that its expression is regulated by light.
Cloning and characterization of the gene encoding IMP dehydrogenase from Arabidopsis thaliana.

PubMed

Collart, F R; Osipiuk, J; Trent, J; Olsen, G J; Huberman, E

1996-10-03

We have cloned and characterized the gene encoding inosine monophosphate dehydrogenase (IMPDH) from Arabidopsis thaliana (At). The transcription unit of the At gene spans approximately 1900 bp and specifies a protein of 503 amino acids with a calculated relative molecular mass (M(r)) of 54,190. The gene is comprised of a minimum of four introns and five exons with all donor and acceptor splice sequences conforming to previously proposed consensus sequences. The deduced IMPDH amino-acid sequence from At shows a remarkable similarity to other eukaryotic IMPDH sequences, with a 48% identity to human Type II enzyme. Allowing for conservative substitutions, the enzyme is 69% similar to human Type II IMPDH. The putative active-site sequence of At IMPDH conforms to the IMP dehydrogenase/guanosine monophosphate reductase motif and contains an essential active-site cysteine residue.
Characterization of the molecular chaperone calnexin in the channel catfish, Ictalurus punctatus, and its association with MHC class II molecules.

PubMed

Fuller, James R; Pitzer, Joshua E; Godwin, Ulla; Albertino, Mark; Machon, Benjamin D; Kearse, Kelly P; McConnell, Thomas J

2004-05-17

Folding and assembly of MHC molecules in mammals occurs in the endoplasmic reticulum (ER), but has not been studied in teleosts. Calnexin (CNX) is an ER chaperone that associates with glycoproteins bearing a monoglucosylated N-linked oligosaccharide side chain. Here we report the first identification and characterization of a full-length CNX cDNA clone in a teleost, and the association of the CNX chaperone with MHC class II in a channel catfish T cell line. The 1.8 kb CNX clone encodes a protein of 607 amino acids that is 72% identical to the consensus sequence of mammalian CNXs. The association of CNX with class II is of particular interest because the native MHC class II alpha chain of Ictalurus punctatus does not bear any N-linked oligosaccharide consensus glycosylation sequences. Thus the assembly of class II molecules in the catfish probably proceeds via different steps than occurs in mammals. Copyright 2003 Elsevier Ltd.
Identification of a novel calcium binding motif based on the detection of sequence insertions in the animal peroxidase domain of bacterial proteins.

PubMed

Santamaría-Hernando, Saray; Krell, Tino; Ramos-González, María-Isabel

2012-01-01

Proteins of the animal heme peroxidase (ANP) superfamily differ greatly in size since they have either one or two catalytic domains that match profile PS50292. The orf PP_2561 of Pseudomonas putida KT2440 that we have called PepA encodes a two-domain ANP. The alignment of these domains with those of PepA homologues revealed a variable number of insertions with the consensus G-x-D-G-x-x-[GN]-[TN]-x-D-D. This motif has also been detected in the structure of pseudopilin (pdb 3G20), where it was found to be involved in Ca(2+) coordination although a sequence analysis did not reveal the presence of any known calcium binding motifs in this protein. Isothermal titration calorimetry revealed that a peptide containing this consensus motif specifically bound calcium ions with affinities ranging between 33 and 79 µM depending on the pH. Microcalorimetric titrations of the purified N-terminal ANP-like domain of PepA revealed Ca(2+) binding with a K(D) of 12 µM and stoichiometry of 1.25 calcium ions per protein monomer. This domain exhibited peroxidase activity after its reconstitution with heme. These data led to the definition of a novel calcium binding motif that we have termed PERCAL and which was abundantly present in animal peroxidase-like domains of bacterial proteins. Bacterial heme peroxidases thus possess two different types of calcium binding motifs, namely PERCAL and the related hemolysin type calcium binding motif, with the latter being located outside the catalytic domains and in their C-terminal end. A phylogenetic tree of ANP-like catalytic domains of bacterial proteins with PERCAL motifs, including single domain peroxidases, was divided into two major clusters, representing domains with and without PERCAL motif containing insertions. We have verified that the recently reported classification of bacterial heme peroxidases in two families (cd09819 and cd09821) is unrelated to these insertions. Sequences matching PERCAL were detected in all kingdoms of life.
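The consensus G-x-D-G-x-x-[GN]-[TN]-x-D-D is written in a PROSITE-like notation, where `x` stands for any residue and bracketed letters list the allowed residues at a position. As a minimal sketch, assuming only these two notation elements are needed, the pattern can be translated into a regular expression and used to scan a protein sequence. The helper names and the test peptide below are hypothetical and are not taken from the paper.

```python
import re

PERCAL_PATTERN = "G-x-D-G-x-x-[GN]-[TN]-x-D-D"  # consensus from the abstract

def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a regex string.

    Only plain residues, 'x' wildcards and [..] residue sets are handled;
    repeat counts such as x(2) are deliberately not supported here.
    """
    return "".join("." if element == "x" else element
                   for element in pattern.split("-"))

def find_motif(sequence: str, pattern: str):
    """Return (start, matched_text) for every match, including overlaps."""
    # A lookahead with a capture group lets overlapping matches be reported.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]

# Made-up peptide that contains one instance of the motif.
hits = find_motif("AAGADGAAGTADDKK", PERCAL_PATTERN)
```

Start positions are zero-based. A full PROSITE parser would also need to handle repeat counts, exclusions in braces, and anchors, which this sketch omits.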
Mass Spectrometry to Identify New Biomarkers of Nerve Agent Exposure

DTIC Science & Technology

2010-04-01

target for organophosphorus agent (OP) binding to enzymes is the active site serine in the consensus sequence GlyXSerXGly of acetylcholinesterase. By...human plasma. Task 6. Use a second method, for example enzyme activity assays or immunoprecipitation, to confirm the identity of soman-labeled proteins...spectrometry identifies covalent binding of soman, sarin, chlorpyrifos oxon, diisopropyl fluorophosphate, and FP-biotin to tyrosines on tubulin: a potential
Dipeptide frequency/bias analysis identifies conserved sites of nonrandomness shared by cysteine-rich motifs.

PubMed

Campion, S R; Ameen, A S; Lai, L; King, J M; Munzenmaier, T N

2001-08-15

This report describes the application of a simple computational tool, AAPAIR.TAB, for the systematic analysis of the cysteine-rich EGF, Sushi, and Laminin motif/sequence families at the two-amino acid level. Automated dipeptide frequency/bias analysis detects preferences in the distribution of amino acids in established protein families, by determining which "ordered dipeptides" occur most frequently in comprehensive motif-specific sequence data sets. Graphic display of the dipeptide frequency/bias data revealed family-specific preferences for certain dipeptides, but more importantly detected a shared preference for employment of the ordered dipeptides Gly-Tyr (GY) and Gly-Phe (GF) in all three protein families. The dipeptide Asn-Gly (NG) also exhibited high-frequency and bias in the EGF and Sushi motif families, whereas Asn-Thr (NT) was distinguished in the Laminin family. Evaluation of the distribution of dipeptides identified by frequency/bias analysis subsequently revealed the highly restricted localization of the G(F/Y) and N(G/T) sequence elements at two separate sites of extreme conservation in the consensus sequence of all three sequence families. The similar employment of the high-frequency/bias dipeptides in three distinct protein sequence families was further correlated with the concurrence of these shared molecular determinants at similar positions within the distinctive scaffolds of three structurally divergent, but similarly employed, motif modules.
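The kind of ordered-dipeptide counting described here can be sketched in a few lines. This is not AAPAIR.TAB: the function names are hypothetical, and the bias measure used below (observed count divided by the count expected from single-residue frequencies) is one common choice assumed for illustration, since the abstract does not define the tool's exact statistic.

```python
from collections import Counter

def dipeptide_counts(sequences):
    """Count ordered dipeptides (adjacent residue pairs) over all sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - 1):
            counts[seq[i:i + 2]] += 1
    return counts

def dipeptide_bias(sequences):
    """Observed/expected ratio per dipeptide (assumed bias definition).

    Expected counts come from the overall single-residue frequencies,
    i.e. the count expected if neighbouring residues were independent.
    """
    pair_counts = dipeptide_counts(sequences)
    residue_counts = Counter("".join(sequences))
    total_residues = sum(residue_counts.values())
    total_pairs = sum(pair_counts.values())
    bias = {}
    for pair, observed in pair_counts.items():
        freq_first = residue_counts[pair[0]] / total_residues
        freq_second = residue_counts[pair[1]] / total_residues
        bias[pair] = observed / (total_pairs * freq_first * freq_second)
    return bias

# Tiny made-up data set, only to show the call pattern.
toy_sequences = ["CGYC", "GYGF"]
counts = dipeptide_counts(toy_sequences)
bias = dipeptide_bias(toy_sequences)
```

A ratio above 1 means the dipeptide occurs more often than its residue composition alone would predict. A real analysis would need full motif-family alignments and far more sequences than this toy input.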
Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation

PubMed Central

Pujar, Shashikant; O’Leary, Nuala A; Farrell, Catherine M; Mudge, Jonathan M; Wallin, Craig; Diekhans, Mark; Barnes, If; Bennett, Ruth; Berry, Andrew E; Cox, Eric; Davidson, Claire; Goldfarb, Tamara; Gonzalez, Jose M; Hunt, Toby; Jackson, John; Joardar, Vinita; Kay, Mike P; Kodali, Vamsi K; McAndrews, Monica; McGarvey, Kelly M; Murphy, Michael; Rajput, Bhanu; Rangwala, Sanjida H; Riddick, Lillian D; Seal, Ruth L; Webb, David; Zhu, Sophia; Aken, Bronwen L; Bult, Carol J; Frankish, Adam; Pruitt, Kim D

2018-01-01

Abstract The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community. PMID:29126148
Flexible DNA binding of the BTB/POZ-domain protein FBI-1.

PubMed

Pessler, Frank; Hernandez, Nouria

2003-08-01

POZ-domain transcription factors are characterized by the presence of a protein-protein interaction domain called the POZ or BTB domain at their N terminus and zinc fingers at their C terminus. Despite the large number of POZ-domain transcription factors that have been identified to date and the significant insights that have been gained into their cellular functions, relatively little is known about their DNA binding properties. FBI-1 is a BTB/POZ-domain protein that has been shown to modulate HIV-1 Tat trans-activation and to repress transcription of some cellular genes. We have used various viral and cellular FBI-1 binding sites to characterize the interaction of a POZ-domain protein with DNA in detail. We find that FBI-1 binds to inverted sequence repeats downstream of the HIV-1 transcription start site. Remarkably, it binds efficiently to probes carrying these repeats in various orientations and spacings with no particular rotational alignment, indicating that its interaction with DNA is highly flexible. Indeed, FBI-1 binding sites in the adenovirus 2 major late promoter, the c-fos gene, and the c-myc P1 and P2 promoters reveal variously spaced direct, inverted, and everted sequence repeats with the consensus sequence G(A/G)GGG(T/C)(C/T)(T/C)(C/T) for each repeat.
[Engineered spider silk: the intelligent biomaterial of the future. Part I].

PubMed

Florczak, Anna; Piekoś, Konrad; Kaźmierska, Katarzyna; Mackiewicz, Andrzej; Dams-Kozłowska, Hanna

2011-06-17

The unique properties of spider silk such as strength, extensibility, toughness, biocompatibility and biodegradability are the reasons for the recent development in silk biomaterial technology. For a long time scientific progress was impeded by limited access to spider silk. However, the development of molecular biology strategies was a turning point in synthetic spider silk protein design. The sequences of engineered spider silk are based on the consensus motifs of the corresponding natural equivalents. Moreover, the engineered silk proteins may be modified in order to gain a new function. The strategy of hybrid proteins constructed at the DNA level combines the sequence of engineered silk, which is responsible for the biomaterial structure, with the sequence of a polypeptide that allows functionalization of the silk biomaterial. The functional domains may comprise receptor binding sites, enzymes, metal or sugar binding sites and others. Current advanced research focuses, on the one hand, on establishing the particular silk structure and understanding the process of silk thread formation in nature. On the other hand, there are attempts to improve methods of engineered spider silk protein production. Due to acquired knowledge and recent progress in synthetic protein technology, engineered silk will become the intelligent biomaterial of the future, while its production on an industrial scale will trigger a biotechnological revolution.
Cyclosporin A and FK-506 both affect DNA binding of regulatory nuclear proteins to the human interleukin-2 promoter.

PubMed

Baumann, G; Geisse, S; Sullivan, M

1991-03-01

The structurally unrelated immunosuppressive drugs cyclosporin A (Sandimmun) and FK-506 both interfere with the process of T-cell proliferation by blocking the transcription of the T-cell growth factor interleukin-2 (IL-2). Here we demonstrate that the transcriptional activation of this gene requires the binding of regulatory nuclear proteins to a promoter element with sequence similarity to the consensus binding site for NF-kappa B-related transcription factors. We present evidence that the binding by regulatory nuclear proteins to the kappa B element of the IL-2 promoter is affected negatively by cyclosporin A and FK-506 at concentrations paralleling their immunosuppressive activity in vivo. The decrease in DNA-protein complex formation induced by the immunosuppressive drugs correlates with a decrease in IL-2 production. FK-506 is 10 to 100 times more potent than cyclosporin A in its ability to inhibit sequence-specific DNA binding and IL-2 production. Our findings suggest that the actions of both drugs converge at the level of DNA-protein interaction.
Molecular cloning, sequencing, and expression of the outer membrane protein P2 gene of Haemophilus parasuis.

PubMed

Li, Peng; Bai, Juan; Li, Jun-xing; Zhang, Guo-long; Song, Yan-hua; Li, Yu-feng; Wang, Xian-wei; Jiang, Ping

2012-10-01

Haemophilus parasuis is the etiological agent of Glässer's disease, characterized by fibrinous polyserositis, polyarthritis, and meningitis in young pigs. However, it is difficult to develop universal serological diagnostic tools and effective vaccines against this disease because of the serovar diversity of the isolates. In this study, enterobacterial repetitive intergenic consensus-polymerase chain reaction was performed to investigate the gene profile of 111 isolates of H. parasuis from China. A specific common gene of H. parasuis was cloned and identified as the outer-membrane protein (OMP) P2 gene. Sequencing results of OMP P2 genes of 22 isolates showed that they had high homology and could be divided into 2 genetic types. Moreover, the OMP P2 protein was expressed in an Escherichia coli expression system, and the purified recombinant protein provided partial protection against H. parasuis infection in mice. These results suggest that OMP P2 is an immunogenic protein with great potential to serve as a vaccine and diagnostic antigen. Copyright © 2011 Elsevier Ltd. All rights reserved.
The TOPCONS web server for consensus prediction of membrane protein topology and signal peptides

PubMed Central

Tsirigos, Konstantinos D.; Peters, Christoph; Shu, Nanjiang; Käll, Lukas; Elofsson, Arne

2015-01-01

TOPCONS (http://topcons.net/) is a widely used web server for consensus prediction of membrane protein topology. We hereby present a major update to the server, with some substantial improvements, including the following: (i) TOPCONS can now efficiently separate signal peptides from transmembrane regions. (ii) The server can now differentiate more successfully between globular and membrane proteins. (iii) The server now is even slightly faster, although a much larger database is used to generate the multiple sequence alignments. For most proteins, the final prediction is produced in a matter of seconds. (iv) The user-friendly interface is retained, with the additional feature of submitting batch files and accessing the server programmatically using standard interfaces, making it thus ideal for proteome-wide analyses. Indicatively, the user can now scan the entire human proteome in a few days. (v) For proteins with homology to a known 3D structure, the homology-inferred topology is also displayed. (vi) Finally, the combination of methods currently implemented achieves an overall increase in performance by 4% as compared to the currently available best-scoring methods and TOPCONS is the only method that can identify signal peptides and still maintain a state-of-the-art performance in topology predictions. PMID:25969446
The structure of TON1937 from archaeon Thermococcus onnurineus NA1 reveals a eukaryotic HEAT-like architecture.

PubMed

Jeong, Jae-Hee; Kim, Yi-Seul; Rojviriya, Catleya; Cha, Hyung Jin; Ha, Sung-Chul; Kim, Yeon-Gil

2013-10-01

The members of the ARM/HEAT repeat-containing protein superfamily in eukaryotes have been known to mediate protein-protein interactions by using their concave surface. However, little is known about the ARM/HEAT repeat proteins in prokaryotes. Here we report the crystal structure of TON1937, a hypothetical protein from the hyperthermophilic archaeon Thermococcus onnurineus NA1. The structure reveals a crescent-shaped molecule composed of a double layer of α-helices with seven anti-parallel α-helical repeats. A structure-based sequence alignment of the α-helical repeats identified a conserved pattern of hydrophobic or aliphatic residues reminiscent of the consensus sequence of eukaryotic HEAT repeats. The individual repeats of TON1937 also share high structural similarity with the canonical eukaryotic HEAT repeats. In addition, the concave surface of TON1937 is proposed to be its potential binding interface based on this structural comparison and its surface properties. These observations lead us to speculate that the archaeal HEAT-like repeats of TON1937 have evolved to engage in protein-protein interactions in the same manner as eukaryotic HEAT repeats. Copyright © 2013 Elsevier B.V. All rights reserved.
Molecular identification and characterization of clustered regularly interspaced short palindromic repeat (CRISPR) gene cluster in Taylorella equigenitalis.

PubMed

Hara, Yasushi; Hayashi, Kyohei; Nakajima, Takuya; Kagawa, Shizuko; Tazumi, Akihiro; Moore, John E; Matsuda, Motoo

2013-09-01

Clustered regularly interspaced short palindromic repeats (CRISPRs), of approximately 10,000 base pairs (bp) in length, were shown to occur in the Japanese Taylorella equigenitalis strain, EQ59. The locus was composed of the putative CRISPRs-associated with 5 (cas5), RAMP csd1, csd2, recB, cas1, a leader region, 13 CRISPR consensus sequence repeats (each 32 bp; 5'-TCAGCCACGTTCGCGTGGCTGTGTGTTTAAAG-3'). These were in turn separated by 12 non repetitive unique spacer regions of similar length. In addition, a leader region, a transposase/IS protein, a leader region, and cas3 were also seen. All seven putative open reading frames carry their ribosome binding sites. Promoter consensus sequences at the -35 and -10 regions and putative intrinsic ρ-independent transcription terminator regions also occurred. A possible long overlap of 170 bp in length occurred between the recB and cas1 loci. Positive reverse transcription PCR signals of cas5, RAMP csd1, csd2-recB/cas1, and cas3 were generated. A putative secondary structure of the CRISPR consensus repeats was constructed. Following this, CRISPR results of the T. equigenitalis EQ59 isolate were subsequently compared with those from the Taylorella asinigenitalis MCE3 isolate.
Modulation of Protein Phosphorylation, N-Glycosylation and Lys-Acetylation in Grape (Vitis vinifera) Mesocarp and Exocarp Owing to Lobesia botrana Infection

PubMed Central

Melo-Braga, Marcella N.; Verano-Braga, Thiago; León, Ileana R.; Antonacci, Donato; Nogueira, Fábio C. S.; Thelen, Jay J.; Larsen, Martin R.; Palmisano, Giuseppe

2012-01-01

Grapevine (Vitis vinifera) is an economically important fruit crop that is subject to many types of insect and pathogen attack. To better elucidate the plant response to Lobesia botrana pathogen infection, we initiated a global comparative proteomic study monitoring steady-state protein expression as well as changes in N-glycosylation, phosphorylation, and Lys-acetylation in control and infected mesocarp and exocarp from V. vinifera cv Italia. A multi-parallel, large-scale proteomic approach employing iTRAQ labeling prior to three peptide enrichment techniques followed by tandem mass spectrometry led to the identification of a total of 3059 proteins, 1135 phosphorylation sites, 323 N-linked glycosylation sites and 138 Lys-acetylation sites. Of these, we could identify changes in abundance of 899 proteins. The occupancy of 110 phosphorylation sites, 10 N-glycosylation sites and 20 Lys-acetylation sites differentially changed during L. botrana infection. Sequence consensus analysis for phosphorylation sites showed eight significant motifs, two of which contained up-regulated phosphopeptides (X-G-S-X and S-X-X-D) and two of which contained down-regulated phosphopeptides (R-X-X-S and S-D-X-E) in response to pathogen infection. Topographical distribution of phosphorylation sites within primary sequences revealed preferential phosphorylation at both the N- and C-termini, and a clear preference for C-terminal phosphorylation in response to pathogen infection, suggesting induction of region-specific kinase(s). Lys-acetylation analysis confirmed the consensus X-K-Y-X motif previously detected in mammals and revealed the importance of this modification in plant defense. The importance of N-linked protein glycosylation in plant response to biotic stimulus was evident by an up-regulated glycopeptide belonging to the disease resistance response protein 206.
This study represents a substantial step toward the understanding of protein and PTMs-mediated plant-pathogen interaction shedding light on the mechanisms underlying the grape infection. PMID:22778145

Bioinformatic flowchart and database to investigate the origins and diversity of Clan AA peptidases

PubMed Central

Llorens, Carlos; Futami, Ricardo; Renaud, Gabriel; Moya, Andrés

2009-01-01

Background Clan AA of aspartic peptidases relates the family of pepsin monomers evolutionarily with all dimeric peptidases encoded by eukaryotic LTR retroelements. Recent findings describing various pools of single-domain nonviral host peptidases, in prokaryotes and eukaryotes, indicate that the diversity of clan AA is larger than previously thought. The ensuing approach to investigate this enzyme group is by studying its phylogeny. However, clan AA is a difficult case to study due to the low similarity and different rates of evolution. This work is an ongoing attempt to investigate the different clan AA families to understand the cause of their diversity. Results In this paper, we describe an in-progress database and bioinformatic flowchart designed to characterize the clan AA protein domain based on all possible protein families through ancestral reconstructions, sequence logos, and hidden Markov models (HMMs). The flowchart includes the characterization of a major consensus sequence based on 6 amino acid patterns with correspondence with Andreeva's model, the structural template describing the clan AA peptidase fold. The set of tools is a work in progress that we have organized in a database within the GyDB project, referred to as Clan AA Reference Database. Conclusion The pre-existing classification combined with the evolutionary history of LTR retroelements permits a consistent taxonomical collection of sequence logos and HMMs. This set is useful for gene annotation but also a reference to evaluate the diversity of, and the relationships among, the different families. Comparisons among HMMs suggest a common ancestor for all dimeric clan AA peptidases that is halfway between single-domain nonviral peptidases and those coded by Ty3/Gypsy LTR retroelements. Sequence logos reveal how all clan AA families follow similar protein domain architecture related to the peptidase fold.
In particular, each family nucleates a particular consensus motif in the sequence position related to the flap. The different motifs constitute a network where an alanine-asparagine-like variable motif predominates, instead of the canonical flap of the HIV-1 peptidase and closer relatives. Reviewers This article was reviewed by Daniel H. Haft, Vladimir Kapitonov (nominated by Jerry Jurka), and Ben M. Dunn (nominated by Claus Wilke). PMID:19173708
BioWord: A sequence manipulation suite for Microsoft Word

PubMed Central

2012-01-01

Background The ability to manipulate, edit and process DNA and protein sequences has rapidly become a necessary skill for practicing biologists across a wide swath of disciplines. In spite of this, most everyday sequence manipulation tools are distributed across several programs and web servers, sometimes requiring installation and typically involving frequent switching between applications. To address this problem, here we have developed BioWord, a macro-enabled self-installing template for Microsoft Word documents that integrates an extensive suite of DNA and protein sequence manipulation tools. Results BioWord is distributed as a single macro-enabled template that self-installs with a single click. After installation, BioWord will open as a tab in the Office ribbon. Biologists can then easily manipulate DNA and protein sequences using a familiar interface and minimize the need to switch between applications. Beyond simple sequence manipulation, BioWord integrates functionality ranging from dyad search and consensus logos to motif discovery and pair-wise alignment. Written in Visual Basic for Applications (VBA) as an open source, object-oriented project, BioWord allows users with varying programming experience to expand and customize the program to better meet their own needs. Conclusions BioWord integrates a powerful set of tools for biological sequence manipulation within a handy, user-friendly tab in a widely used word processing software package. The use of a simple scripting language and an object-oriented scheme facilitates customization by users and provides a very accessible educational platform for introducing students to basic bioinformatics algorithms. PMID:22676326
BioWord: a sequence manipulation suite for Microsoft Word.

PubMed

Anzaldi, Laura J; Muñoz-Fernández, Daniel; Erill, Ivan

2012-06-07

The ability to manipulate, edit and process DNA and protein sequences has rapidly become a necessary skill for practicing biologists across a wide swath of disciplines. In spite of this, most everyday sequence manipulation tools are distributed across several programs and web servers, sometimes requiring installation and typically involving frequent switching between applications. To address this problem, here we have developed BioWord, a macro-enabled self-installing template for Microsoft Word documents that integrates an extensive suite of DNA and protein sequence manipulation tools. BioWord is distributed as a single macro-enabled template that self-installs with a single click. After installation, BioWord will open as a tab in the Office ribbon. Biologists can then easily manipulate DNA and protein sequences using a familiar interface and minimize the need to switch between applications. Beyond simple sequence manipulation, BioWord integrates functionality ranging from dyad search and consensus logos to motif discovery and pair-wise alignment. Written in Visual Basic for Applications (VBA) as an open source, object-oriented project, BioWord allows users with varying programming experience to expand and customize the program to better meet their own needs. BioWord integrates a powerful set of tools for biological sequence manipulation within a handy, user-friendly tab in a widely used word processing software package. The use of a simple scripting language and an object-oriented scheme facilitates customization by users and provides a very accessible educational platform for introducing students to basic bioinformatics algorithms.
Multiple splicing defects in an intronic false exon.

PubMed

Sun, H; Chasin, L A

2000-09-01

Splice site consensus sequences alone are insufficient to dictate the recognition of real constitutive splice sites within the typically large transcripts of higher eukaryotes, and large numbers of pseudoexons flanked by pseudosplice sites with good matches to the consensus sequences can be easily designated. In an attempt to identify elements that prevent pseudoexon splicing, we have systematically altered known splicing signals, as well as immediately adjacent flanking sequences, of an arbitrarily chosen pseudoexon from intron 1 of the human hprt gene. The substitution of a 5' splice site that perfectly matches the 5' consensus combined with mutation to match the CAG/G sequence of the 3' consensus failed to get this model pseudoexon included as the central exon in a dhfr minigene context. Provision of a real 3' splice site and a consensus 5' splice site and removal of an upstream inhibitory sequence were necessary and sufficient to confer splicing on the pseudoexon. This activated context also supported the splicing of a second pseudoexon sequence containing no apparent enhancer. Thus, both the 5' splice site sequence and the polypyrimidine tract of the pseudoexon are defective despite their good agreement with the consensus. On the other hand, the pseudoexon body did not exert a negative influence on splicing. The introduction into the pseudoexon of a sequence selected for binding to ASF/SF2 or its replacement with beta-globin exon 2 only partially reversed the effect of the upstream negative element and the defective polypyrimidine tract. These results support the idea that exon-bridging enhancers are not a prerequisite for constitutive exon definition and suggest that intrinsically defective splice sites and negative elements play important roles in distinguishing the real splicing signal from the vast number of false splicing signals.
Identification of a novel plant amalgavirus (Amalgavirus, Amalgaviridae) genome sequence in Cistus incanus.

PubMed

Goh, C J; Park, D; Lee, J S; Sebastiani, F; Hahn, Y

2018-01-01

Amalgaviridae is a family of double-stranded, monosegmented RNA viruses that are associated with plants, fungi, microsporidians, and animals. A sequence contig derived from the transcriptome of a eudicot, Cistus incanus (the family Cistaceae; commonly known as hoary rockrose), was identified as the genome sequence of a novel plant RNA virus and named Cistus incanus RNA virus 1 (CiRV1). Sequence comparison and phylogenetic analysis indicated that CiRV1 is a novel species of the genus Amalgavirus in the family Amalgaviridae. The CiRV1 genome contig has two overlapping open reading frames (ORFs). ORF1 encodes a putative replication factory matrix-like protein, while ORF2 encodes a RNA-dependent RNA polymerase (RdRp) domain. An ORF1+2 fusion protein, which functions in viral RNA replication, is produced by a +1 programmed ribosomal frameshifting (PRF) mechanism. A +1 PRF motif UUU_CGU, which matches the conserved amalgavirus +1 PRF consensus sequence UUU_CGN, was found at the boundary of CiRV1 ORF1 and ORF2. Comparison of 25 amalgavirus ORF1+2 fusion proteins revealed that only three different positions within a 13-amino acid segment were recurrently used at the boundary, possibly being selected so as not to interfere with correct folding and function of the fusion protein. CiRV1 is the first virus found to be associated with the Cistus species and may be useful for studying amalgaviruses.
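The +1 PRF consensus UUU_CGN mentioned in this record lends itself to a simple pattern scan. The following is a minimal sketch, assuming that N stands for any of A, C, G or U and that the underscore only marks the frameshift position; the function name `find_prf_motifs` and the example sequences are hypothetical and are not taken from the study.

```python
import re

# Assumption: UUU_CGN means the hexamer UUUCGN, with N = A, C, G or U.
# A lookahead is used so that overlapping candidate sites are all reported.
PRF_MOTIF = re.compile(r"(?=UUUCG[ACGU])")


def find_prf_motifs(sequence):
    """Return 0-based start positions of UUUCGN hexamers in a nucleotide string.

    DNA input is accepted: T is converted to U before scanning.
    """
    rna = sequence.upper().replace("T", "U")
    return [match.start() for match in PRF_MOTIF.finditer(rna)]


if __name__ == "__main__":
    # Hypothetical toy sequence, for illustration only.
    print(find_prf_motifs("AAUUUCGUGG"))
```

A match only flags a candidate site; whether frameshifting actually occurs there depends on sequence context that a pattern scan does not capture.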
Armored RNA Technology for Production of Ribonuclease-Resistant Viral RNA Controls and Standards

PubMed Central

Pasloske, Brittan L.; Walkerpeach, Cindy R.; Obermoeller, R. Dawn; Winkler, Matthew; DuBois, Dwight B.

1998-01-01

The widespread use of sensitive assays for the detection of viral and cellular RNA sequences has created a need for stable, well-characterized controls and standards. We describe the development of a versatile, novel system for creating RNase-resistant RNA. “Armored RNA” is a complex of MS2 bacteriophage coat protein and RNA produced in Escherichia coli by the induction of an expression plasmid that encodes the coat protein and an RNA standard sequence. The RNA sequences are completely protected from RNase digestion within the bacteriophage-like complexes. As a prototype, a 172-base consensus sequence from a portion of the human immunodeficiency virus type 1 (HIV-1) gag gene was synthesized and cloned into the packaging vector used to produce the bacteriophage-like particles. After production and purification, the resulting HIV-1 Armored RNA particles were shown to be resistant to degradation in human plasma and produced reproducible results in the Amplicor HIV-1 Monitor assay for 180 days when stored at −20°C or for 60 days at 4°C. Additionally, Armored RNA preparations are homogeneous and noninfectious. PMID:9817878
Characterization and chromosomal mapping of the human TFG gene involved in thyroid carcinoma

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mencinger, M.; Panagopoulos, I.; Andreasson, P.

1997-05-01

Homology searches in the Expressed Sequence Tag Database were performed using SPYGQ-rich regions as query sequences to find genes encoding protein regions similar to the N-terminal parts of the sarcoma-associated EWS and FUS proteins. Clone 22911 (T74973), encoding a SPYGQ-rich region in its 5′ end, and several other clones that overlapped 22911 were selected. The combined data made it possible to assemble a full-length cDNA sequence. This cDNA sequence is 1677 bp, containing an initiation codon ATG, an open reading frame of 400 amino acids, a poly(A) signal, and a poly(A) tail. We found 100% identity between the 5′ part of the consensus sequence and the 598-bp-long sequence named TFG. The TFG sequence is fused to the 3′ end of NTRK1, generating the TRK-T3 fusion transcript found in papillary thyroid carcinoma. The cDNA therefore represents the full-length transcript of the TFG gene. TFG was localized to 3q11-q12 by fluorescence in situ hybridization. The 3′ and the 5′ ends of the TFG cDNA probe hybridized to a 2.2-kb band on Northern blot filters in all tissues examined. 28 refs., 5 figs., 1 tab.
GFam: a platform for automatic annotation of gene families.

PubMed

Sasidharan, Rajkumar; Nepusz, Tamás; Swarbreck, David; Huala, Eva; Paccanaro, Alberto

2012-10-01

We have developed GFam, a platform for automatic annotation of gene/protein families. GFam provides a framework for genome initiatives and model organism resources to build domain-based families and derive meaningful functional labels, and offers a seamless approach to propagate functional annotation across periodic genome updates. GFam is a hybrid approach that uses a greedy algorithm to chain component domains from InterPro annotation provided by its 12 member resources, followed by a sequence-based connected component analysis of un-annotated sequence regions, to derive a consensus domain architecture for each sequence and subsequently generate families based on common architectures. Our integrated approach increases sequence coverage by 7.2 percentage points and residue coverage by 14.6 percentage points relative to the best single-constituent database within InterPro for the proteome of Arabidopsis. The true power of GFam lies in maximizing annotation provided by the different InterPro data sources that offer resource-specific coverage for different regions of a sequence. GFam's capability to capture higher sequence and residue coverage can be useful for genome annotation, comparative genomics and functional studies. GFam is general-purpose software and can be used for any collection of protein sequences. The software is open source and can be obtained from http://www.paccanarolab.org/software/gfam/.
Structural and genetic analysis of a mutant of Rhodobacter sphaeroides WS8 deficient in hook length control.

PubMed Central

González-Pedrajo, B; Ballado, T; Campos, A; Sockett, R E; Camarena, L; Dreyfus, G

1997-01-01

Motility in the photosynthetic bacterium Rhodobacter sphaeroides is achieved by the unidirectional rotation of a single subpolar flagellum. In this study, transposon mutagenesis was used to obtain nonmotile flagellar mutants from this bacterium. We report here the isolation and characterization of a mutant that shows a polyhook phenotype. Morphological characterization of the mutant was done by electron microscopy. Polyhooks were obtained by shearing and were used to purify the hook protein monomer (FlgE). The apparent molecular mass of the hook protein was 50 kDa. N-terminal amino acid sequencing and comparisons with the hook proteins of other flagellated bacteria indicated that the Rhodobacter hook protein has consensus sequences common to axial flagellar components. A 25-kb fragment from an R. sphaeroides WS8 cosmid library restored wild-type flagellation and motility to the mutant. Using DNA adjacent to the inserted transposon as a probe, we identified a 4.6-kb SalI restriction fragment that contained the gene responsible for the polyhook phenotype. Nucleotide sequence analysis of this region revealed an open reading frame with a deduced amino acid sequence that was 23.4% identical to that of FliK of Salmonella typhimurium, the polypeptide responsible for hook length control in that enteric bacterium. The relevance of a gene homologous to fliK in the uniflagellated bacterium R. sphaeroides is discussed. PMID:9352903
A charge-dependent mechanism is responsible for the dynamic accumulation of proteins inside nucleoli.

PubMed

Musinova, Yana R; Kananykhina, Eugenia Y; Potashnikova, Daria M; Lisitsyna, Olga M; Sheval, Eugene V

2015-01-01

The majority of known nucleolar proteins are freely exchanged between the nucleolus and the surrounding nucleoplasm. One way proteins are retained in the nucleoli is by the presence of specific amino acid sequences, namely nucleolar localization signals (NoLSs). The mechanism by which NoLSs retain proteins inside the nucleoli is still unclear. Here, we present data showing that the charge-dependent (electrostatic) interactions of NoLSs with nucleolar components lead to nucleolar accumulation as follows: (i) known NoLSs are enriched in positively charged amino acids, but the NoLS structure is highly heterogeneous, and it is not possible to identify a consensus sequence for this type of signal; (ii) in two analyzed proteins (NF-κB-inducing kinase and HIV-1 Tat), the NoLS corresponds to a region that is enriched for positively charged amino acid residues; substituting charged amino acids with non-charged ones reduced the nucleolar accumulation in proportion to the charge reduction, and nucleolar accumulation efficiency was strongly correlated with the predicted charge of the tested sequences; and (iii) sequences containing only lysine or arginine residues (which were referred to as imitative NoLSs, or iNoLSs) are accumulated in the nucleoli in a charge-dependent manner. The results of experiments with iNoLSs suggested that charge-dependent accumulation inside the nucleoli was dependent on interactions with nucleolar RNAs. The results of this work are consistent with the hypothesis that nucleolar protein accumulation by NoLSs can be determined by the electrostatic interaction of positively charged regions with nucleolar RNAs rather than by any sequence-specific mechanism. Copyright © 2014 Elsevier B.V. All rights reserved.
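The charge dependence described in this record can be illustrated with a crude net-charge estimate. This is a minimal sketch under stated assumptions, not the prediction method used by the authors: lysine and arginine are counted as +1, aspartate and glutamate as -1, and histidine, the termini and pH effects are ignored. The function name `net_charge` and the example peptides are hypothetical.

```python
# Assumed, simplified residue charges at roughly neutral pH.
# Histidine and the N- and C-termini are deliberately ignored in this sketch.
RESIDUE_CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_charge(peptide):
    """Return a crude integer net charge for a one-letter amino acid string."""
    return sum(RESIDUE_CHARGE.get(residue, 0) for residue in peptide.upper())


if __name__ == "__main__":
    # Hypothetical lysine-only and arginine-only peptides of increasing length,
    # loosely in the spirit of the imitative signals described above.
    for peptide in ("KKK", "KKKKKK", "RRRRRRRRR"):
        print(peptide, net_charge(peptide))
```

Ranking candidate sequences by such a score is only a first approximation; a pKa-based calculation would be needed for anything quantitative.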
Crystallographic and Modeling Studies of RNase III Suggest a Mechanism for Double-Stranded RNA Cleavage | Center for Cancer Research

Cancer.gov

Background: Ribonuclease III belongs to the family of Mg2+-dependent endonucleases that show specificity for double-stranded RNA (dsRNA). RNase III is conserved in all known bacteria and eukaryotes and has 1–2 copies of a 9-residue consensus sequence, known as the RNase III signature motif. The bacterial RNase III proteins are the simplest, consisting of two domains: an
Mass Spectrometry to Identify New Biomarkers of Nerve Agent Exposure

DTIC Science & Technology

2009-04-01

covalent bond with the active site serine in the consensus sequence GXSXG of esterases and proteases. However, the site of attachment to proteins...that have no active site serine has only recently been recognized as tyrosine. In last year’s report we provided mass spectrometry evidence that...PMID: 18502412 Lockridge O, Xue W, Gaydess A, Grigoryan H, Ding SJ, Schopfer LM, Hinrichs SH, Masson P. Pseudo- esterase activity of human albumin
A novel paired domain DNA recognition motif can mediate Pax2 repression of gene transcription.

PubMed

Håvik, B; Ragnhildstveit, E; Lorens, J B; Saelemyr, K; Fauske, O; Knudsen, L K; Fjose, A

1999-12-20

The paired domain (PD) is an evolutionarily conserved DNA-binding domain encoded by the Pax gene family of developmental regulators. The Pax proteins are transcription factors and are involved in a variety of processes such as brain development, patterning of the central nervous system (CNS), and B-cell development. In this report we demonstrate that the zebrafish Pax2 PD can interact with a novel type of DNA sequences in vitro, the triple-A motif, consisting of a heptameric nucleotide sequence G/CAAACA/TC with an invariant core of three adjacent adenosines. This recognition sequence was found to be conserved in known natural Pax5 repressor elements involved in controlling the expression of the p53 and J-chain genes. By identifying similar high affinity binding sites in potential target genes of the Pax2 protein, including the pax2 gene itself, we obtained further evidence that the triple-A sites are biologically significant. The putative natural target sites also provide a basis for defining an extended consensus recognition sequence. In addition, we observed in transformation assays a direct correlation between Pax2 repressor activity and the presence of triple-A sites. The results suggest that a transcriptional regulatory function of Pax proteins can be modulated by PD binding to different categories of target sequences. Copyright 1999 Academic Press.
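The triple-A heptamer G/CAAACA/TC quoted in this record can be written as a regular expression. The following is a minimal sketch, assuming the slashes denote alternatives at a single position (G or C first, A or T sixth); it scans only the given strand, and the function name `find_triple_a_sites` and the test sequences are hypothetical.

```python
import re

# Assumption: G/CAAACA/TC reads as [G or C] A A A C [A or T] C (seven positions).
# A lookahead with a capture group reports overlapping sites as well.
TRIPLE_A = re.compile(r"(?=([GC]AAAC[AT]C))")


def find_triple_a_sites(dna):
    """Return (0-based start, matched heptamer) pairs on the given strand only."""
    dna = dna.upper()
    return [(match.start(), match.group(1)) for match in TRIPLE_A.finditer(dna)]


if __name__ == "__main__":
    # Hypothetical toy sequence, for illustration only.
    print(find_triple_a_sites("TTGAAACACTT"))
```

To cover both strands, the same scan could be repeated on the reverse complement of the input.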
Homology analyses of the protein sequences of fatty acid synthases from chicken liver, rat mammary gland, and yeast

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chang, Soo-Ik; Hammes, G.G.

1989-11-01

Homology analyses of the protein sequences of chicken liver and rat mammary gland fatty acid synthases were carried out. The amino acid sequences of the chicken and rat enzymes are 67% identical. If conservative substitutions are allowed, 78% of the amino acids are matched. A region of low homology exists between the functional domains, in particular around amino acid residues 1059-1264 of the chicken enzyme. Homologies between the active sites of chicken and rat and of chicken and yeast enzymes have been analyzed by an alignment method. A high degree of homology exists between the active sites of the chicken and rat enzymes. However, the chicken and yeast enzymes show a lower degree of homology. The NADPH-binding dinucleotide folds of the β-ketoacyl reductase and the enoyl reductase sites were identified by comparison with a known consensus sequence for the NADP- and FAD-binding dinucleotide folds. The active sites of all of the enzymes are primarily in hydrophobic regions of the protein. This study suggests that the genes for the functional domains of fatty acid synthase were originally separated, and these genes were connected to each other by using different connecting nucleotide sequences in different species. An alternative explanation for the differences in rat and chicken is a common ancestry and mutations in the joining regions during evolution.
Structure-function analysis of HKE4, a member of the new LIV-1 subfamily of zinc transporters.

PubMed Central

Taylor, Kathryn M; Morgan, Helen E; Johnson, Andrea; Nicholson, Robert I

2004-01-01

The KE4 proteins are an emerging group of proteins with little known functional data. In the present study, we report the first characterization of the recombinant human KE4 protein in mammalian cells. The KE4 sequences are included in the subfamily of ZIP (Zrt-, Irt-like Proteins) zinc transporters, which we have termed LZT (LIV-1 subfamily of ZIP zinc Transporters). All these LZT sequences contain similarities to ZIP transporters, including the consensus sequence in transmembrane domain IV, which is essential for zinc transport. However, the new LZT subfamily can be separated from other ZIP transporters by the presence of a highly conserved potential metalloprotease motif (HEXPHEXGD) in transmembrane domain V. Here we report the location of HKE4 on intracellular membranes, including the endoplasmic reticulum, and its ability to increase the intracellular free zinc as measured with the zinc-specific fluorescent dye, Newport Green, in a time-, temperature- and concentration-dependent manner. This is in contrast with the zinc influx ability of another LZT protein, LIV-1, which was due to its plasma membrane location. Therefore we have added to the functionality of LZT proteins by reporting their ability to increase intracellular-free zinc, whether they are located on the plasma membrane or on intracellular membranes. This result, in combination with the crucial role that zinc plays in cell growth, emphasizes the importance of this new LZT subfamily, including the KE4 sequences, in the control of intracellular zinc homoeostasis, aberrations of which can lead to diseases such as cancer, immunological disorders and neurological dysfunction. PMID:14525538
Acanthamoeba castellanii contains a ribosomal RNA enhancer binding protein which stimulates TIF-IB binding and transcription under stringent conditions.

PubMed

Yang, Q; Radebaugh, C A; Kubaska, W; Geiss, G K; Paule, M R

1995-11-11

The intergenic spacer (IGS) of Acanthamoeba castellanii rRNA genes contains repeated elements which are weak enhancers for transcription by RNA polymerase I. A protein, EBF, was identified and partially purified which binds to the enhancers and to several other sequences within the IGS, but not to other DNA fragments, including the rRNA core promoter. No consensus binding sequence could be discerned in these fragments and bound factor is in rapid equilibrium with unbound. EBF has functional characteristics similar to vertebrate upstream binding factors (UBF). Not only does it bind to the enhancer and other IGS elements, but it also stimulates binding of TIF-IB, the fundamental transcription initiation factor, to the core promoter and stimulates transcription from the promoter. Attempts to identify polypeptides with epitopes similar to rat or Xenopus laevis UBF suggest that structurally the protein from A.castellanii is not closely related to vertebrate UBF.
Acanthamoeba castellanii contains a ribosomal RNA enhancer binding protein which stimulates TIF-IB binding and transcription under stringent conditions.

PubMed Central

Yang, Q; Radebaugh, C A; Kubaska, W; Geiss, G K; Paule, M R

1995-01-01

The intergenic spacer (IGS) of Acanthamoeba castellanii rRNA genes contains repeated elements which are weak enhancers for transcription by RNA polymerase I. A protein, EBF, was identified and partially purified which binds to the enhancers and to several other sequences within the IGS, but not to other DNA fragments, including the rRNA core promoter. No consensus binding sequence could be discerned in these fragments and bound factor is in rapid equilibrium with unbound. EBF has functional characteristics similar to vertebrate upstream binding factors (UBF). Not only does it bind to the enhancer and other IGS elements, but it also stimulates binding of TIF-IB, the fundamental transcription initiation factor, to the core promoter and stimulates transcription from the promoter. Attempts to identify polypeptides with epitopes similar to rat or Xenopus laevis UBF suggest that structurally the protein from A.castellanii is not closely related to vertebrate UBF. PMID:7501455
Mining a database of single amplified genomes from Red Sea brine pool extremophiles—improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA)

PubMed Central

Grötzinger, Stefan W.; Alam, Intikhab; Ba Alawi, Wail; Bajic, Vladimir B.; Stingl, Ulrich; Eppinger, Jörg

2014-01-01

Reliable functional annotation of genomic data is the key-step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and poor homology of novel extremophile's genomes pose significant challenges for the attribution of functions to the coding sequences identified. The anoxic deep-sea brine pools of the Red Sea are a promising source of novel enzymes with unique evolutionary adaptation. Sequencing data from Red Sea brine pool cultures and SAGs are annotated and stored in the Integrated Data Warehouse of Microbial Genomes (INDIGO) data warehouse. Low sequence homology of annotated genes (no similarity for 35% of these genes) may translate into false positives when searching for specific functions. The Profile and Pattern Matching (PPM) strategy described here was developed to eliminate false positive annotations of enzyme function before progressing to labor-intensive hyper-saline gene expression and characterization. It utilizes InterPro-derived Gene Ontology (GO)-terms (which represent enzyme function profiles) and annotated relevant PROSITE IDs (which are linked to an amino acid consensus pattern). The PPM algorithm was tested on 15 protein families, which were selected based on scientific and commercial potential. An initial list of 2577 enzyme commission (E.C.) numbers was translated into 171 GO-terms and 49 consensus patterns. A subset of INDIGO-sequences consisting of 58 SAGs from six different taxons of bacteria and archaea were selected from six different brine pool environments. Those SAGs code for 74,516 genes, which were independently scanned for the GO-terms (profile filter) and PROSITE IDs (pattern filter). Following stringent reliability filtering, the non-redundant hits (106 profile hits and 147 pattern hits) are classified as reliable, if at least two relevant descriptors (GO-terms and/or consensus patterns) are present. 
Scripts for annotation, as well as for the PPM algorithm, are available through the INDIGO website. PMID:24778629
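The reliability rule stated in this record (a hit counts as reliable when at least two relevant descriptors, GO-terms and/or consensus patterns, are present) can be sketched in a few lines. This is an illustrative reading of that one sentence, not the published PPM scripts; the function name `is_reliable`, the `min_descriptors` parameter and the identifiers in the example are hypothetical placeholders.

```python
def is_reliable(go_terms, pattern_ids, min_descriptors=2):
    """Return True if a gene carries enough distinct relevant descriptors.

    go_terms:     relevant GO-term identifiers found for the gene (profile filter)
    pattern_ids:  relevant PROSITE pattern identifiers found (pattern filter)
    Assumption: descriptors from both filters are pooled and duplicates ignored.
    """
    distinct = set(go_terms) | set(pattern_ids)
    return len(distinct) >= min_descriptors


if __name__ == "__main__":
    # Placeholder identifiers, for illustration only.
    print(is_reliable(["GO:A"], ["PS:X"]))  # one profile hit plus one pattern hit
    print(is_reliable(["GO:A"], []))        # a single descriptor only
```

Pooling both descriptor types into one set reflects the "and/or" wording; a stricter variant could require at least one hit from each filter.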
Generation and Characterization of HIV-1 Transmitted and Founder Virus Consensus Sequence from Intravenous Drug Users in Xinjiang, China.

PubMed

Li, Fan; Ma, Liying; Feng, Yi; Hu, Jing; Ni, Na; Ruan, Yuhua; Shao, Yiming

2017-06-01

HIV-1 transmission in intravenous drug users (IDUs) has been characterized by high genetic multiplicity, which suggests a greater challenge for blocking HIV-1 infection. We investigated a total of 749 sequences of the full-length gp160 gene obtained by single genome sequencing (SGS) from 22 HIV-1 early infected IDUs in Xinjiang province, northwest China, and generated a transmitted and founder virus (T/F virus) consensus sequence (IDU.CON). The T/F virus was classified as subtype CRF07_BC and predicted to be a CCR5-tropic virus. The variable regions (V1, V2, and V4 loops) of IDU.CON showed length variation compared with the heterosexual T/F virus consensus sequence (HSX.CON) and the homosexual T/F virus consensus sequence (MSM.CON). A total of 26 N-linked glycosylation sites were discovered in the IDU.CON sequence, fewer than in MSM.CON and HSX.CON. Characterization of T/F virus from IDUs highlights the genetic make-up and complexity of virus near the moment of transmission or in early infection preceding systemic dissemination and is important for the development of effective HIV-1 preventive methods, including vaccines.
Isoprenylation of the plant molecular chaperone ANJ1 facilitates membrane association and function at high temperature.

PubMed

Zhu, J K; Bressan, R A; Hasegawa, P M

1993-09-15

We demonstrate that ANJ1, a higher plant homolog of the bacterial molecular chaperone DnaJ, is a substrate in vitro for protein farnesyl- and geranylgeranyl-transferase activities present in cell extracts of the plant Atriplex nummularia and yeast Saccharomyces cerevisiae. Isoprenylation did not occur when cysteine was replaced by serine in the CAQQ motif at the carboxyl terminus of ANJ1, indicating that this sequence functions as a CaaX consensus sequence for polyisoprenylation (where C is cysteine, a is an aliphatic residue, and X is any amino acid residue). Substitution of leucine for the terminal glutamine did not result in the expected geranylgeranylation as occurs with mammalian proteins containing a carboxyl-terminal leucine. Unlike the wild-type ANJ1, neither of the proteins containing these amino acid substitutions could functionally complement the yeast temperature-sensitive mutant mas5. Farnesylation enhanced the association of ANJ1 with A. nummularia microsomal membranes. Electrophoretic mobility of ANJ1 from the plant indicated that the protein is isoprenylated in vivo.

Isoprenylation of the plant molecular chaperone ANJ1 facilitates membrane association and function at high temperature.

PubMed Central

Zhu, J K; Bressan, R A; Hasegawa, P M

1993-01-01

We demonstrate that ANJ1, a higher plant homolog of the bacterial molecular chaperone DnaJ, is a substrate in vitro for protein farnesyl- and geranylgeranyl-transferase activities present in cell extracts of the plant Atriplex nummularia and yeast Saccharomyces cerevisiae. Isoprenylation did not occur when cysteine was replaced by serine in the CAQQ motif at the carboxyl terminus of ANJ1, indicating that this sequence functions as a CaaX consensus sequence for polyisoprenylation (where C is cysteine, a is an aliphatic residue, and X is any amino acid residue). Substitution of leucine for the terminal glutamine did not result in the expected geranylgeranylation as occurs with mammalian proteins containing a carboxyl-terminal leucine. Unlike the wild-type ANJ1, neither of the proteins containing these amino acid substitutions could functionally complement the yeast temperature-sensitive mutant mas5. Farnesylation enhanced the association of ANJ1 with A. nummularia microsomal membranes. Electrophoretic mobility of ANJ1 from the plant indicated that the protein is isoprenylated in vivo. PMID:8378331
The biological activity of ABA-1-like protein from Ascaris lumbricoides.

PubMed

Muto, R; Imai, S; Tezuka, H; Furuhashi, Y; Fujita, K

2001-09-01

The elevation of non-specific IgE (total IgE) in Ascaris infection can be seen one week after infection, and reaches a peak after approximately two weeks. It has been reported that ABA-1 protein is the main constituent in the pseudocoelomic fluid of Ascaris suum. To investigate the effect of the ABA-1-like protein from Ascaris lumbricoides (ALB), the cDNA was cloned by reverse transcriptase polymerase chain reaction, using original primers based on the consensus sequences of ABA-1 and TBA-1, an ABA-1-like protein from Toxocara canis. The clone was sequenced, recombinant ALB polyproteins (rALB14 and rALB7) were constructed based on the ALB sequence, and rALB was administered to BALB/c mice. Fourteen days after inoculation with rALB14, which is the full length of ALB, an elevation of total IgE, which we supposed to contain non-specific IgE, was observed, as expected. Furthermore, in an in vitro experiment, we confirmed that spleen cells proliferated when stimulated by rALB14 and concanavalin A. Therefore, the whole conformation of ALB is considered to be involved in the elevation of non-specific IgE and in the activation of T cells.
Coelenterazine-binding protein of Renilla muelleri: cDNA cloning, overexpression, and characterization as a substrate of luciferase.

PubMed

Titushin, Maxim S; Markova, Svetlana V; Frank, Ludmila A; Malikova, Natalia P; Stepanyuk, Galina A; Lee, John; Vysotski, Eugene S

2008-02-01

The Renilla bioluminescent system in vivo comprises three proteins: the luciferase, green-fluorescent protein, and coelenterazine-binding protein (CBP), previously called luciferin-binding protein (LBP). This work reports the cloning of the full-size cDNA encoding CBP from the soft coral Renilla muelleri, its overexpression, and the properties of the recombinant protein. The apo-CBP was quantitatively converted to CBP by simple incubation with coelenterazine. The physicochemical properties of this recombinant CBP are determined to be practically the same as those reported for the CBP (LBP) of R. reniformis. CBP is a member of the four-EF-hand Ca(2+)-binding superfamily of proteins with only three of the EF-hand loops having the Ca(2+)-binding consensus sequences. There is weak sequence homology with the Ca(2+)-regulated photoproteins but only as a result of the necessary Ca(2+)-binding loop structure. In combination with Renilla luciferase, addition of only one Ca(2+) is sufficient to release the coelenterazine as a substrate for the luciferase for bioluminescence. This combination of the two proteins generates bioluminescence with higher reaction efficiency than using free coelenterazine alone as the substrate for luciferase. This increased quantum yield, a difference in bioluminescence spectra, and markedly different kinetics imply that a CBP-luciferase complex might be involved.
Identification of a new genotype H wild-type mumps virus strain and its molecular relatedness to other virulent and attenuated strains.

PubMed

Amexis, Georgios; Rubin, Steven; Chatterjee, Nando; Carbone, Kathryn; Chumakov, Kostantin

2003-06-01

A single clinical isolate of mumps virus designated 88-1961 was obtained from a patient hospitalized with a clinical history of upper respiratory tract infection, parotitis, severe headache, fever and lymphadenopathy. We have sequenced the full-length genome of 88-1961 and compared it against all available full-length sequences of mumps virus. Based upon its nucleotide sequence of the SH gene 88-1961 was identified as a genotype H mumps strain. The overall extent of nucleotide and amino acid differences between each individual gene and protein of 88-1961 and the full-length mumps samples showed that the missense to silent ratios were unevenly distributed. Upon evaluation of the consensus sequence of 88-1961, four positions were found to be clearly heterogeneous at the nucleotide level (NP 315C/T, NP 318C/T, F 271A/C, and HN 855C/T). Sequence analysis revealed that the amino acid sequences for the NP, M, and the L protein were the most conserved, whereas the SH protein exhibited the highest variability among the compared mumps genotypes A, B, and G. No identifying molecular patterns in the non-coding (intergenic) or coding regions of 88-1961 were found when we compared it against relatively virulent (Urabe AM9 B, Glouc1/UK96, 87-1004 and 87-1005) and non-virulent mumps strains (Jeryl Lynn and all Urabe Am9 A substrains). Copyright 2003 Wiley-Liss, Inc.
Context based computational analysis and characterization of ARS consensus sequences (ACS) of Saccharomyces cerevisiae genome.

PubMed

Singh, Vinod Kumar; Krishnamachari, Annangarachari

2016-09-01

Genome-wide experimental studies in Saccharomyces cerevisiae reveal that autonomous replicating sequence (ARS) requires an essential consensus sequence (ACS) for replication activity. Computational studies identified thousands of ACS like patterns in the genome. However, only a few hundreds of these sites act as replicating sites and the rest are considered as dormant or evolving sites. In a bid to understand the sequence makeup of replication sites, a content and context-based analysis was performed on a set of replicating ACS sequences that binds to origin-recognition complex (ORC) denoted as ORC-ACS and non-replicating ACS sequences (nrACS), that are not bound by ORC. In this study, DNA properties such as base composition, correlation, sequence dependent thermodynamic and DNA structural profiles, and their positions have been considered for characterizing ORC-ACS and nrACS. Analysis reveals that ORC-ACS depict marked differences in nucleotide composition and context features in its vicinity compared to nrACS. Interestingly, an A-rich motif was also discovered in ORC-ACS sequences within its nucleosome-free region. Profound changes in the conformational features, such as DNA helical twist, inclination angle and stacking energy between ORC-ACS and nrACS were observed. Distribution of ACS motifs in the non-coding segments points to the locations of ORC-ACS which are found far away from the adjacent gene start position compared to nrACS thereby enabling an accessible environment for ORC-proteins. Our attempt is novel in considering the contextual view of ACS and its flanking region along with nucleosome positioning in the S. cerevisiae genome and may be useful for any computational prediction scheme.
An Evolutionarily Young Polar Bear (Ursus maritimus) Endogenous Retrovirus Identified from Next Generation Sequence Data.

PubMed

Tsangaras, Kyriakos; Mayer, Jens; Alquezar-Planas, David E; Greenwood, Alex D

2015-11-24

Transcriptome analysis of polar bear (Ursus maritimus) tissues identified sequences with similarity to Porcine Endogenous Retroviruses (PERV). Based on these sequences, four proviral copies and 15 solo long terminal repeats (LTRs) of a newly described endogenous retrovirus were characterized from the polar bear draft genome sequence. Closely related sequences were identified by PCR analysis of brown bear (Ursus arctos) and black bear (Ursus americanus) but were absent in non-Ursinae bear species. The virus was therefore designated UrsusERV. Two distinct groups of LTRs were observed including a recombinant ERV that contained one LTR belonging to each group indicating that genomic invasions by at least two UrsusERV variants have recently occurred. Age estimates based on proviral LTR divergence and conservation of integration sites among ursids suggest the viral group is only a few million years old. The youngest provirus was polar bear specific, had intact open reading frames (ORFs) and could potentially encode functional proteins. Phylogenetic analyses of UrsusERV consensus protein sequences suggest that it is part of a pig, gibbon and koala retrovirus clade. The young age estimates and lineage specificity of the virus suggests UrsusERV is a recent cross species transmission from an unknown reservoir and places the viral group among the youngest of ERVs identified in mammals. PMID:26610552
Age-related regulation of genes: slow homeostatic changes and age-dimension technology

NASA Astrophysics Data System (ADS)

Kurachi, Kotoku; Zhang, Kezhong; Huo, Jeffrey; Ameri, Afshin; Kuwahara, Mitsuhiro; Fontaine, Jean-Marc; Yamamoto, Kei; Kurachi, Sumiko

2002-11-01

Through systematic studies of pro- and anti-blood coagulation factors, we have determined molecular mechanisms involving two genetic elements, age-related stability element (ASE), GAGGAAG and age-related increase element (AIE), a unique stretch of dinucleotide repeats. ASE and AIE are essential for age-related patterns of stable and increased gene expression, respectively. Such age-related gene regulatory mechanisms are also critical for explaining homeostasis in various physiological reactions as well as slow homeostatic changes in them. The age-related increase in expression of the human factor IX (hFIX) gene requires the presence of both ASE and AIE, which apparently function additively. The anti-coagulant factor protein C (hPC) gene uses an ASE (CAGGAG) to produce age-related stable expression. Both ASE sequences (G/CAGAAG) share the consensus sequence of the transcription factor PEA-3 element. No other similar sequences, including another PEA-3 consensus sequence, GAGGATG, function in conferring age-related gene regulation. The age-regulatory mechanisms involving ASE and AIE apparently function universally with different genes and across different animal species. These findings have led us to develop a new field of research and applications, which we named “age-dimension technology (ADT)”. ADT has exciting potential for modifying age-related expression of genes as well as associated physiological processes, and developing novel, more effective prophylaxis or treatments for age-related diseases.
Genomic structure and chromosomal localization of GML (GPI-anchored molecule-like protein), a gene induced by p53

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kimura, Yasutoshi; Furuhata, Tomohisa; Nakamura, Yusuke

1997-05-01

Among its known functions, tumor suppressor gene p53 serves as a transcriptional regulator and mediates various signals through activation of downstream genes. We recently identified a novel gene, GML (glycosylphosphatidylinositol (GPI)-anchored molecule-like protein), whose expression is specifically induced by wildtype p53. To characterize the GML gene further, we determined 35.8 kb of DNA sequence that included a consensus binding sequence for p53 and the entire GML gene. The GML gene consists of four exons, and the p53-binding sequence is present in the 5'-flanking region. In genomic organization this gene resembles genes encoding murine Ly-6 glycoproteins, a human homologue of the Ly-6 family called RIG-E, and CD59; products of these genes, known as GPI-anchored proteins, are variously involved in signal transduction, cell-cell adhesion, and cell-matrix attachment. FISH analysis revealed that the GML gene is located on human chromosome 8q24.3. Genes encoding at least two other GPI-anchored molecules, E48 and RIG-E, are also located in this region. 20 refs., 2 figs., 1 tab.
Cloning, sequencing, and expression of the gene encoding amylopullulanase from Pyrococcus furiosus and biochemical characterization of the recombinant enzyme.

PubMed Central

Dong, G; Vieille, C; Zeikus, J G

1997-01-01

The gene encoding the Pyrococcus furiosus hyperthermophilic amylopullulanase (APU) was cloned, sequenced, and expressed in Escherichia coli. The gene encoded a single 827-residue polypeptide with a 26-residue signal peptide. The protein sequence had very low homology (17 to 21% identity) with other APUs and enzymes of the alpha-amylase family. In particular, none of the consensus regions present in the alpha-amylase family could be identified. P. furiosus APU showed similarity to three proteins, including the P. furiosus intracellular alpha-amylase and Dictyoglomus thermophilum alpha-amylase A. The mature protein had a molecular weight of 89,000. The recombinant P. furiosus APU remained folded after denaturation at temperatures of ≤70 °C and showed an apparent molecular weight of 50,000 in sodium dodecyl sulfate-polyacrylamide gel electrophoresis. Denaturing temperatures above 100 °C were required for complete unfolding. The enzyme was extremely thermostable, with an optimal activity at 105 °C and pH 5.5. Ca2+ increased the enzyme activity, thermostability, and substrate affinity. The enzyme was highly resistant to chemical denaturing reagents, and its activity increased up to twofold in the presence of surfactants. PMID:9293009
Construction of an ultra-high density consensus genetic map, and enhancement of the physical map from genome sequencing in Lupinus angustifolius.

PubMed

Zhou, Gaofeng; Jian, Jianbo; Wang, Penghao; Li, Chengdao; Tao, Ye; Li, Xuan; Renshaw, Daniel; Clements, Jonathan; Sweetingham, Mark; Yang, Huaan

2018-01-01

An ultra-high density genetic map containing 34,574 sequence-defined markers was developed in Lupinus angustifolius. Markers closely linked to nine genes of agronomic traits were identified. A physical map was improved to cover 560.5 Mb genome sequence. Lupin (Lupinus angustifolius L.) is a recently domesticated legume grain crop. In this study, we applied the restriction-site associated DNA sequencing (RADseq) method to genotype an F9 recombinant inbred line population derived from a wild type × domesticated cultivar (W × D) cross. A high density linkage map was developed based on the W × D population. By integrating sequence-defined DNA markers reported in previous mapping studies, we established an ultra-high density consensus genetic map, which contains 34,574 markers consisting of 3508 loci covering 2399 cM on 20 linkage groups. The largest gap in the entire consensus map was 4.73 cM. The high density W × D map and the consensus map were used to develop an improved physical map, which covered 560.5 Mb of genome sequence data. The ultra-high density consensus linkage map, the improved physical map and the markers linked to genes of breeding interest reported in this study provide a common tool for genome sequence assembly, structural genomics, comparative genomics, functional genomics, QTL mapping, and molecular plant breeding in lupin.
In silico methods for evaluating human allergenicity to novel proteins: International Bioinformatics Workshop Meeting Report, 23-24 February 2005.

PubMed

Thomas, Karluss; Bannon, Gary; Hefle, Susan; Herouet, Corinne; Holsapple, Michael; Ladics, Gregory; MacIntosh, Sue; Privalle, Laura

2005-12-01

The ILSI Health and Environmental Sciences Institute (HESI) hosted an expert workshop 22-24 February 2005 in Mallorca, Spain, to review the state-of-the-science for conducting a sequence homology/bioinformatics evaluation in the context of a comprehensive allergenicity assessment for novel proteins, to obtain consensus on the value and role of bioinformatics in evaluating novel proteins, and to discuss the utility and methods of allergen-specific IgE testing in the diagnosis of food allergy. The workshop participants included over forty international experts from academia, industry, and government. The workshop was hosted by the HESI Protein Allergenicity Technical committee, which has established a long-term program whose mission is to advance the scientific understanding of the relevant parameters for characterizing the allergenic potential of novel proteins.
Modulation of the multistate folding of designed TPR proteins through intrinsic and extrinsic factors

PubMed Central

Phillips, J J; Javadi, Y; Millership, C; Main, E R G

2012-01-01

Tetratricopeptide repeats (TPRs) are a class of all alpha-helical repeat proteins that are comprised of 34-aa helix-turn-helix motifs. These stack together to form nonglobular structures that are stabilized by short-range interactions from residues close in primary sequence. Unlike globular proteins, they have few, if any, long-range nonlocal stabilizing interactions. Several studies on designed TPR proteins have shown that this modular structure is reflected in their folding, that is, modular multistate folding is observed as opposed to two-state folding. Here we show that TPR multistate folding can be suppressed to approximate two-state folding through modulation of intrinsic stability or extrinsic environmental variables. This modulation was investigated by comparing the thermodynamic unfolding under differing buffer regimes of two distinct series of consensus-designed TPR proteins, which possess different intrinsic stabilities. A total of nine proteins of differing sizes and differing consensus TPR motifs were each thermally and chemically denatured and their unfolding monitored using differential scanning calorimetry (DSC) and CD/fluorescence, respectively. Analyses of both the DSC and chemical denaturation data show that reducing the total stability of each protein and repeat units leads to observable two-state unfolding. These data highlight the intimate link between global and intrinsic repeat stability that governs whether folding proceeds by an observably two-state mechanism, or whether partial unfolding yields stable intermediate structures which retain sufficient stability to be populated at equilibrium. PMID:22170589
Protection against Multiple Influenza A Virus Strains Induced by Candidate Recombinant Vaccine Based on Heterologous M2e Peptides Linked to Flagellin

PubMed Central

Kovaleva, Anna A.; Potapchuk, Marina V.; Korotkov, Alexandr V.; Sergeeva, Mariia V.; Kasianenko, Marina A.; Kuprianov, Victor V.; Ravin, Nikolai V.; Tsybalova, Liudmila M.; Skryabin, Konstantin G.; Kiselev, Oleg I.

2015-01-01

Matrix 2 protein ectodomain (M2e) is considered a promising candidate for a broadly protective influenza vaccine. M2e-based vaccines against human influenza A provide only partial protection against avian influenza viruses because of differences in the M2e sequences. In this work, we evaluated the possibility of obtaining equal protection and immune response by using recombinant protein on the basis of flagellin as a carrier of the M2e peptides of human and avian influenza A viruses. Recombinant protein was generated by the fusion of two tandem copies of consensus M2e sequence from human influenza A and two copies of M2e from avian A/H5N1 viruses to flagellin (Flg-2M2eh2M2ek). Intranasal immunisation of Balb/c mice with recombinant protein significantly elicited anti-M2e IgG in serum, IgG and sIgA in BAL. Antibodies induced by the fusion protein Flg-2M2eh2M2ek bound efficiently to synthetic peptides corresponding to the human consensus M2e sequence as well as to the M2e sequence of A/Chicken/Kurgan/05/05 RG (H5N1) and recognised native M2e epitopes exposed on the surface of the MDCK cells infected with A/PR/8/34 (H1N1) and A/Chicken/Kurgan/05/05 RG (H5N1) to an equal degree. Immunisation led to both anti-M2e IgG1 and IgG2a response with IgG1 prevalence. We observed a significant intracellular production of IL-4, but not IFN-γ, by CD4+ T-cells in spleen of mice following immunisation with Flg-2M2eh2M2ek. Immunisation with the Flg-2M2eh2M2ek fusion protein provided similar protection from lethal challenge with human influenza A viruses (H1N1, H3N2) and avian influenza virus (H5N1). Immunised mice experienced significantly less weight loss and decreased lung viral titres compared to control mice. The data obtained show the potential for the development of an M2e-flagellin candidate influenza vaccine with broad spectrum protection against influenza A viruses of various origins. PMID:25799221
Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation.

PubMed

Pujar, Shashikant; O'Leary, Nuala A; Farrell, Catherine M; Loveland, Jane E; Mudge, Jonathan M; Wallin, Craig; Girón, Carlos G; Diekhans, Mark; Barnes, If; Bennett, Ruth; Berry, Andrew E; Cox, Eric; Davidson, Claire; Goldfarb, Tamara; Gonzalez, Jose M; Hunt, Toby; Jackson, John; Joardar, Vinita; Kay, Mike P; Kodali, Vamsi K; Martin, Fergal J; McAndrews, Monica; McGarvey, Kelly M; Murphy, Michael; Rajput, Bhanu; Rangwala, Sanjida H; Riddick, Lillian D; Seal, Ruth L; Suner, Marie-Marthe; Webb, David; Zhu, Sophia; Aken, Bronwen L; Bruford, Elspeth A; Bult, Carol J; Frankish, Adam; Murphy, Terence; Pruitt, Kim D

2018-01-04

The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community. Published by Oxford University Press on behalf of Nucleic Acids Research 2017.
Towards the Rational Design of a Candidate Vaccine against Pregnancy Associated Malaria: Conserved Sequences of the DBL6ε Domain of VAR2CSA

PubMed Central

Badaut, Cyril; Bertin, Gwladys; Rustico, Tatiana; Fievet, Nadine; Massougbodji, Achille; Gaye, Alioune; Deloron, Philippe

2010-01-01

Background Placental malaria is a disease linked to the sequestration of Plasmodium falciparum infected red blood cells (IRBC) in the placenta, leading to reduced materno-fetal exchanges and to local inflammation. One of the virulence factors of P. falciparum involved in cytoadherence to chondroitin sulfate A, its placental receptor, is the adhesive protein VAR2CSA. Its localisation on the surface of IRBC makes it accessible to the immune system. VAR2CSA contains six DBL domains. The DBL6ε domain is the most variable. High variability constitutes a means for the parasite to evade the host immune response. The DBL6ε domain could constitute a very attractive basis for a vaccine candidate but its reported variability necessitates, for antigenic characterisations, identifying and classifying commonalities across isolates. Methodology/Principal Findings Local alignment analysis of the DBL6ε domain revealed that it is not as variable as previously described. Variability is concentrated in seven regions present on the surface of the DBL6ε domain. The main goal of our work is to classify and group variable sequences in a way that will simplify further research to determine dominant epitopes. Firstly, variable sequences were grouped following their average percent pairwise identity (APPI). Groups comprising many variable sequences sharing low variability were found. Secondly, ELISA experiments following the IgG recognition of a recombinant DBL6ε domain, and of peptides mimicking its seven variable blocks, allowed us to determine an APPI cut-off and to isolate groups represented by a single consensus sequence. Conclusions/Significance A new sequence approach is used to compare variable regions in sequences that have extensive segmental gene relationship. Using this approach, the VAR2CSA DBL6ε domain is composed of 7 variable blocks with limited polymorphism. Each variable block is composed of a limited number of consensus types. Based on peptide-based ELISA, variable blocks with 85% or greater sequence identity are expected to be recognized equally well by antibody and can be considered the same consensus type. Therefore, the analysis of the antibody response against the classified small number of sequences should be helpful to determine epitopes. PMID:20585655
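The grouping criterion in the record above (average percent pairwise identity with an 85% cut-off) can be illustrated with a minimal Python sketch. It assumes the variable-block sequences are already aligned to equal length and contain no gaps; the function names and the example peptides are hypothetical, and the authors' actual alignment and grouping procedure is not described in the abstract.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent of positions carrying the same residue in two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def average_percent_pairwise_identity(seqs: list[str]) -> float:
    """Mean percent identity over all unordered pairs of sequences (APPI)."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)


def same_consensus_type(seqs: list[str], cutoff: float = 85.0) -> bool:
    """Apply the 85% cut-off quoted in the abstract to a group of sequences."""
    return average_percent_pairwise_identity(seqs) >= cutoff


# Hypothetical 10-residue variable-block peptides, for illustration only.
block = ["ACDEFGHIKL", "ACDEFGHIKV", "ACDEFGHIRV"]
appi = average_percent_pairwise_identity(block)
print(f"APPI = {appi:.1f}%, same consensus type: {same_consensus_type(block)}")
```

Real data would first need a multiple sequence alignment and a decision on how to treat gapped columns, neither of which this sketch addresses.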
Structure and stability of the ankyrin domain of the Drosophila Notch receptor.

PubMed

Zweifel, Mark E; Leahy, Daniel J; Hughson, Frederick M; Barrick, Doug

2003-11-01

The Notch receptor contains a conserved ankyrin repeat domain that is required for Notch-mediated signal transduction. The ankyrin domain of Drosophila Notch contains six ankyrin sequence repeats previously identified as closely matching the ankyrin repeat consensus sequence, and a putative seventh C-terminal sequence repeat that exhibits lower similarity to the consensus sequence. To better understand the role of the Notch ankyrin domain in Notch-mediated signaling and to examine how structure is distributed among the seven ankyrin sequence repeats, we have determined the crystal structure of this domain to 2.0 angstroms resolution. The seventh, C-terminal, ankyrin sequence repeat adopts a regular ankyrin fold, but the first, N-terminal ankyrin repeat, which contains a 15-residue insertion, appears to be largely disordered. The structure reveals a substantial interface between ankyrin polypeptides, showing a high degree of shape and charge complementarity, which may be related to homotypic interactions suggested from indirect studies. However, the Notch ankyrin domain remains largely monomeric in solution, demonstrating that this interface alone is not sufficient to promote tight association. Using the structure, we have classified reported mutations within the Notch ankyrin domain that are known to disrupt signaling into those that affect buried residues and those restricted to surface residues. We show that the buried substitutions greatly decrease protein stability, whereas the surface substitutions have only a marginal effect on stability. The surface substitutions are thus likely to interfere with Notch signaling by disrupting specific Notch-effector interactions and map the sites of these interactions.
Mutations of the central tyrosines of putative cholesterol recognition amino acid consensus (CRAC) sequences modify folding, activity, and sterol-sensing of the human ABCG2 multidrug transporter.

PubMed

Gál, Zita; Hegedüs, Csilla; Szakács, Gergely; Váradi, András; Sarkadi, Balázs; Özvegy-Laczka, Csilla

2015-02-01

Human ABCG2 is a plasma membrane glycoprotein causing multidrug resistance in cancer. Membrane cholesterol and bile acids are efficient regulators of ABCG2 function, while the molecular nature of the sterol-sensing sites has not been elucidated. The cholesterol recognition amino acid consensus (CRAC, L/V-(X)(1-5)-Y-(X)(1-5)-R/K) sequence is one of the conserved motifs involved in cholesterol binding in several proteins. We have identified five potential CRAC motifs in the transmembrane domain of the human ABCG2 protein. In order to define their roles in sterol-sensing, the central tyrosines of these CRACs (Y413, 459, 469, 570 and 645) were mutated to S or F and the mutants were expressed both in insect and mammalian cells. We found that mutation in Y459 prevented protein expression; the Y469S and Y645S mutants lost their activity; while the Y570S, Y469F, and Y645F mutants retained function as well as cholesterol and bile acid sensitivity. We found that in the case of the Y413S mutant, drug transport was efficient, while modulation of the ATPase activity by cholesterol and bile acids was significantly altered. We suggest that the Y413 residue within a putative CRAC motif has a role in sterol-sensing and the ATPase/drug transport coupling in the ABCG2 multidrug transporter. Copyright © 2014. Published by Elsevier B.V.
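The CRAC pattern quoted in the record above, L/V-(X)(1-5)-Y-(X)(1-5)-R/K, maps directly onto a regular expression. The following Python sketch scans a protein sequence for candidate motifs and reports the position of the central tyrosine. The function name and the test sequences are hypothetical, and a sequence match is only a candidate: as the abstract notes, the role of each putative motif has to be tested experimentally.

```python
import re

# L/V, then 1-5 residues, Y, then 1-5 residues, then R/K.
# The lookahead lets candidate motifs that overlap each other all be reported.
CRAC_PATTERN = re.compile(r"(?=([LV][A-Z]{1,5}(Y)[A-Z]{1,5}[RK]))")


def find_crac_candidates(sequence: str) -> list[tuple[int, int, str]]:
    """Return (motif start, central Tyr position, motif) with 1-based positions.

    For each start position only one match is reported (the one the regex
    engine finds first), so alternative tyrosines within the same window
    are not enumerated.
    """
    hits = []
    for m in CRAC_PATTERN.finditer(sequence.upper()):
        hits.append((m.start(1) + 1, m.start(2) + 1, m.group(1)))
    return hits


# Hypothetical sequences, for illustration only (not ABCG2).
print(find_crac_candidates("AAALGGYGGKAAA"))
print(find_crac_candidates("LGGGGGGYGGK"))  # six residues before Y: no match
```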
DNA-PK assay

DOEpatents

Anderson, Carl W.; Connelly, Margery A.

2004-10-12

The present invention provides a method for detecting DNA-activated protein kinase (DNA-PK) activity in a biological sample. The method includes contacting a biological sample with a detectably-labeled phosphate donor and a synthetic peptide substrate defined by the following features to provide specific recognition and phosphorylation by DNA-PK: (1) a phosphate-accepting amino acid pair which may include serine-glutamine (Ser-Gln) (SQ), threonine-glutamine (Thr-Gln) (TQ), glutamine-serine (Gln-Ser) (QS), or glutamine-threonine (Gln-Thr) (QT); (2) enhancer amino acids which may include glutamic acid or glutamine immediately adjacent at the amino- or carboxyl- side of the amino acid pair and forming an amino acid pair-enhancer unit; (3) a first spacer sequence at the amino terminus of the amino acid pair-enhancer unit; (4) a second spacer sequence at the carboxyl terminus of the amino acid pair-enhancer unit, which spacer sequences may include any combination of amino acids that does not provide a phosphorylation site consensus sequence motif; and, (5) a tag moiety, which may be an amino acid sequence or another chemical entity that permits separating the synthetic peptide from the phosphate donor. A composition and a kit for the detection of DNA-PK activity are also provided. Methods for detecting DNA, protein phosphatases and substances that alter the activity of DNA-PK are also provided. The present invention also provides a method of monitoring protein kinase and DNA-PK activity in living cells. A composition and a kit for monitoring protein kinase activity in vitro and a composition and a kit for monitoring DNA-PK activities in living cells are also provided. A method for identifying agents that alter protein kinase activity in vitro and a method for identifying agents that alter DNA-PK activity in living cells are also provided.
Characterization of the Structural Gene Promoter of Aedes aegypti Densovirus

PubMed Central

Ward, Todd W.; Kimmick, Michael W.; Afanasiev, Boris N.; Carlson, Jonathan O.

2001-01-01

Aedes aegypti densonucleosis virus (AeDNV) has two promoters that have been shown to be active by reporter gene expression analysis (B. N. Afanasiev, Y. V. Koslov, J. O. Carlson, and B. J. Beaty, Exp. Parasitol. 79:322–339, 1994). Northern blot analysis of cells infected with AeDNV revealed two transcripts 1,200 and 3,500 nucleotides in length that are assumed to express the structural protein (VP) gene and nonstructural protein genes, respectively. Primer extension was used to map the transcriptional start site of the structural protein gene. Surprisingly, the structural protein gene transcript began at an initiator consensus sequence, CAGT, 60 nucleotides upstream from the map unit 61 TATAA sequence previously thought to define the promoter. Constructs with the β-galactosidase gene fused to the structural protein gene were used to determine elements necessary for promoter function. Deletion or mutation of the initiator sequence, CAGT, reduced protein expression by 93%, whereas mutation of the TATAA sequence at map unit 61 had little effect. An additional open reading frame was observed upstream of the structural protein gene that can express β-galactosidase at a low level (20% of that of VP fusions). Expression of the AeDNV structural protein gene was shown to be stimulated by the major nonstructural protein NS1 (Afanasiev et al., Exp. Parasitol., 1994). To determine the sequences required for transactivation, expression of structural protein gene–β-galactosidase gene fusion constructs differing in AeDNV genome content was measured with and without NS1. The presence of NS1 led to an 8- to 10-fold increase in expression when either genomic end was present, compared to a 2-fold increase with a construct lacking the genomic ends. An even higher (37-fold) increase in expression occurred with both genomic ends present; however, this was in part due to template replication as shown by Southern blot analysis. These data indicate the location and importance of various elements necessary for efficient protein expression and transactivation from the structural protein gene promoter of AeDNV. PMID:11152505

Enterocin T, a novel class IIa bacteriocin produced by Enterococcus sp. 812.

PubMed

Chen, Yi-Sheng; Yu, Chi-Rong; Ji, Si-Hua; Liou, Min-Shiuan; Leong, Kun-Hon; Pan, Shwu-Fen; Wu, Hui-Chung; Lin, Yu-Hsuan; Yu, Bi; Yanagida, Fujitoshi

2013-09-01

Enterococcus sp. 812, isolated from fresh broccoli, was previously found to produce a bacteriocin active against a number of Gram-positive bacteria, including Listeria monocytogenes. Bacteriocin activity decreased slightly after autoclaving (121 °C for 15 min), but was inactivated by protease K. Mass spectrometry analysis revealed the bacteriocin mass to be approximately 4,521.34 Da. N-terminal amino acid sequencing yielded a partial sequence, NH2-ATYYGNGVYXDKKKXWVEWGQA, by Edman degradation, which contained the consensus class IIa bacteriocin motif YGNGV in the N-terminal region. The obtained partial sequence showed high homology with some enterococcal bacteriocins; however, no identical peptide or protein was found. This peptide was therefore considered to be a novel bacteriocin produced by Enterococcus sp. 812 and was termed enterocin T.
In-depth proteomic analysis of a mollusc shell: acid-soluble and acid-insoluble matrix of the limpet Lottia gigantea

PubMed Central

2012-01-01

Background Invertebrate biominerals are characterized by their extraordinary functionality and physical properties, such as strength, stiffness and toughness that by far exceed those of the pure mineral component of such composites. This is attributed to the organic matrix, secreted by specialized cells, which pervades and envelops the mineral crystals. Despite the obvious importance of the protein fraction of the organic matrix, only few in-depth proteomic studies have been performed due to the lack of comprehensive protein sequence databases. The recent public release of the gastropod Lottia gigantea genome sequence and the associated protein sequence database provides for the first time the opportunity to do a state-of-the-art proteomic in-depth analysis of the organic matrix of a mollusc shell. Results Using three different sodium hypochlorite washing protocols before shell demineralization, a total of 569 proteins were identified in Lottia gigantea shell matrix. Of these, 311 were assembled in a consensus proteome comprising identifications contained in all proteomes irrespective of shell cleaning procedure. Some of these proteins were similar in amino acid sequence, amino acid composition, or domain structure to proteins identified previously in different bivalve or gastropod shells, such as BMSP, dermatopontin, nacrein, perlustrin, perlucin, or Pif. In addition there were dozens of previously uncharacterized proteins, many containing repeated short linear motifs or homorepeats. Such proteins may play a role in shell matrix construction or control of mineralization processes. Conclusions The organic matrix of Lottia gigantea shells is a complex mixture of proteins comprising possible homologs of some previously characterized mollusc shell proteins, but also many novel proteins with a possible function in biomineralization as framework building blocks or as regulatory components. We hope that this data set, the most comprehensive available at present, will provide a platform for the further exploration of biomineralization processes in molluscs. PMID:22540284
Rational identification of aggregation hotspots based on secondary structure and amino acid hydrophobicity.

PubMed

Matsui, Daisuke; Nakano, Shogo; Dadashipour, Mohammad; Asano, Yasuhisa

2017-08-25

Insolubility of proteins expressed in the Escherichia coli expression system hinders the progress of both basic and applied research. Insoluble proteins contain residues that decrease their solubility (aggregation hotspots). Mutating these hotspots to optimal amino acids is expected to improve protein solubility. To date, however, the identification of these hotspots has proven difficult. In this study, using a combination of approaches involving directed evolution and primary sequence analysis, we found two rules to help inductively identify hotspots: the α-helix rule, which focuses on the hydrophobicity of amino acids in the α-helix structure, and the hydropathy contradiction rule, which focuses on the difference in hydrophobicity relative to the corresponding amino acid in the consensus protein. By properly applying these two rules, we succeeded in improving the probability that expressed proteins would be soluble. Our methods should facilitate research on various insoluble proteins that were previously difficult to study due to their low solubility.
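The hydropathy contradiction rule described above compares each residue with the corresponding residue of the consensus protein. A minimal sketch of that comparison follows, assuming the Kyte-Doolittle hydropathy scale and an arbitrary cutoff of 2.0; the abstract states neither the scale nor any threshold, and the function name `hydropathy_contradictions` is hypothetical.

```python
# Kyte-Doolittle hydropathy values (standard published scale).
# Using this scale is an assumption; the abstract does not name one.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_contradictions(target, consensus, threshold=2.0):
    """List aligned positions where the target residue's hydropathy
    differs from the consensus residue's by at least `threshold`.

    Both inputs are equal-length aligned strings; '-' marks a gap.
    The threshold is an illustrative assumption, not a published value.
    """
    hits = []
    for pos, (t, c) in enumerate(zip(target, consensus), start=1):
        if t == "-" or c == "-" or t == c:
            continue  # skip gaps and identical residues
        delta = KD[t] - KD[c]
        if abs(delta) >= threshold:
            hits.append((pos, t, c, round(delta, 1)))
    return hits


if __name__ == "__main__":
    # Toy sequences for illustration only.
    print(hydropathy_contradictions("MKLV", "MKDV"))
```

Positions flagged this way are only candidates; the abstract combines this rule with an α-helix rule and directed evolution before any residue is treated as a hotspot.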
The TOPCONS web server for consensus prediction of membrane protein topology and signal peptides.

PubMed

Tsirigos, Konstantinos D; Peters, Christoph; Shu, Nanjiang; Käll, Lukas; Elofsson, Arne

2015-07-01

TOPCONS (http://topcons.net/) is a widely used web server for consensus prediction of membrane protein topology. We hereby present a major update to the server, with some substantial improvements, including the following: (i) TOPCONS can now efficiently separate signal peptides from transmembrane regions. (ii) The server can now differentiate more successfully between globular and membrane proteins. (iii) The server now is even slightly faster, although a much larger database is used to generate the multiple sequence alignments. For most proteins, the final prediction is produced in a matter of seconds. (iv) The user-friendly interface is retained, with the additional feature of submitting batch files and accessing the server programmatically using standard interfaces, making it thus ideal for proteome-wide analyses. Indicatively, the user can now scan the entire human proteome in a few days. (v) For proteins with homology to a known 3D structure, the homology-inferred topology is also displayed. (vi) Finally, the combination of methods currently implemented achieves an overall increase in performance by 4% as compared to the currently available best-scoring methods and TOPCONS is the only method that can identify signal peptides and still maintain a state-of-the-art performance in topology predictions. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Packaging signals in two single-stranded RNA viruses imply a conserved assembly mechanism and geometry of the packaged genome.

PubMed

Dykeman, Eric C; Stockley, Peter G; Twarock, Reidun

2013-09-09

The current paradigm for assembly of single-stranded RNA viruses is based on a mechanism involving non-sequence-specific packaging of genomic RNA driven by electrostatic interactions. Recent experiments, however, provide compelling evidence for sequence specificity in this process both in vitro and in vivo. The existence of multiple RNA packaging signals (PSs) within viral genomes has been proposed, which facilitates assembly by binding coat proteins in such a way that they promote the protein-protein contacts needed to build the capsid. The binding energy from these interactions enables the confinement or compaction of the genomic RNAs. Identifying the nature of such PSs is crucial for a full understanding of assembly, which is an as yet untapped potential drug target for this important class of pathogens. Here, for two related bacterial viruses, we determine the sequences and locations of their PSs using Hamiltonian paths, a concept from graph theory, in combination with bioinformatics and structural studies. Their PSs have a common secondary structure motif but distinct consensus sequences and positions within the respective genomes. Despite these differences, the distributions of PSs in both viruses imply defined conformations for the packaged RNA genomes in contact with the protein shell in the capsid, consistent with a recent asymmetric structure determination of the MS2 virion. The PS distributions identified moreover imply a preferred, evolutionarily conserved assembly pathway with respect to the RNA sequence with potentially profound implications for other single-stranded RNA viruses known to have RNA PSs, including many animal and human pathogens. Copyright © 2013 Elsevier Ltd. All rights reserved.
Sequence, molecular properties, and chromosomal mapping of mouse lumican

NASA Technical Reports Server (NTRS)

Funderburgh, J. L.; Funderburgh, M. L.; Hevelone, N. D.; Stech, M. E.; Justice, M. J.; Liu, C. Y.; Kao, W. W.; Conrad, G. W.; Spooner, B. S. (Principal Investigator)

1995-01-01

PURPOSE. Lumican is a major proteoglycan of vertebrate cornea. This study characterizes mouse lumican, its molecular form, cDNA sequence, and chromosomal localization. METHODS. Lumican sequence was determined from cDNA clones selected from a mouse corneal cDNA expression library using a bovine lumican cDNA probe. Tissue expression and size of lumican mRNA were determined using Northern hybridization. Glycosidase digestion followed by Western blot analysis provided characterization of molecular properties of purified mouse corneal lumican. Chromosomal mapping of the lumican gene (Lcn) used Southern hybridization of a panel of genomic DNAs from an interspecific murine backcross. RESULTS. Mouse lumican is a 338-amino acid protein with high-sequence identity to bovine and chicken lumican proteins. The N-terminus of the lumican protein contains consensus sequences for tyrosine sulfation. A 1.9-kb lumican mRNA is present in cornea and several other tissues. Antibody against bovine lumican reacted with recombinant mouse lumican expressed in Escherichia coli and also detected high molecular weight proteoglycans in extracts of mouse cornea. Keratanase digestion of corneal proteoglycans released lumican protein, demonstrating the presence of sulfated keratan sulfate chains on mouse corneal lumican in vivo. The lumican gene (Lcn) was mapped to the distal region of mouse chromosome 10. The Lcn map site is in the region of a previously identified developmental mutant, eye blebs, affecting corneal morphology. CONCLUSIONS. This study demonstrates sulfated keratan sulfate proteoglycan in mouse cornea and describes the tools (antibodies and cDNA) necessary to investigate the functional role of this important corneal molecule using naturally occurring and induced mutants of the murine lumican gene.
Identification of three novel B-cell epitopes of VMH protein from Vibrio mimicus by screening a phage display peptide library.

PubMed

Xiao, Ning; Cao, Ji; Zhou, Hao; Ding, Shu-Quan; Kong, Ling-Yan; Li, Jin-Nian

2016-12-01

Vibrio mimicus is the causative agent of ascites disease in fish. The heat-labile hemolytic toxin designated VMH is an immunoprotective antigen of V. mimicus. However, its epitopes have not been well characterized. Here, a commercially available phage-displayed 12-mer peptide library was used to screen epitopes of VMH protein using polyclonal rabbit anti-rVMH protein antibodies, and five positive phage clones were then identified by sandwich and competitive ELISA. Sequence analysis showed that the motif DPTLL displayed on phage clone 15 and the consensus motif SLDDDST displayed on clones 4/11 corresponded to residues 134-138 and 238-244 of VMH protein, respectively, and the synthetic motif peptides could also be recognized by anti-rVMH-HD antibody in peptide-ELISA. Thus, both motifs DPTLL and SLDDDST were identified as minimal linear B-cell epitopes of VMH protein. Although no similarity was found between VMH protein and the consensus motif ADGLVPR displayed on clones 2/6, the synthetic peptide ADGLVPR could absorb anti-rVMH-HD antibody and inhibit the antibody binding to rVMH protein in enhanced chemiluminescence Western blotting, whereas an irrelevant control peptide did not affect antibody binding to rVMH. These results revealed that the peptide ADGLVPR was a mimotope of VMH protein. Taken together, three novel B-cell epitopes of VMH protein were identified, which provide a foundation for developing an epitope-based vaccine against V. mimicus infection in fish. Copyright © 2016 Elsevier B.V. All rights reserved.
Mother-to-Child HIV Transmission Bottleneck Selects for Consensus Virus with Lower Gag-Protease-Driven Replication Capacity

PubMed Central

Naidoo, Vanessa L.; Mann, Jaclyn K.; Noble, Christie; Adland, Emily; Carlson, Jonathan M.; Thomas, Jake; Brumme, Chanson J.; Thobakgale-Tshabalala, Christina F.; Brumme, Zabrina L.; Goulder, Philip J. R.

2017-01-01

ABSTRACT In the large majority of cases, HIV infection is established by a single variant, and understanding the characteristics of successfully transmitted variants is relevant to prevention strategies. Few studies have investigated the viral determinants of mother-to-child transmission. To determine the impact of Gag-protease-driven viral replication capacity on mother-to-child transmission, the replication capacities of 148 recombinant viruses encoding plasma-derived Gag-protease from 53 nontransmitter mothers, 48 transmitter mothers, and 47 infected infants were assayed in an HIV-1-inducible green fluorescent protein reporter cell line. All study participants were infected with HIV-1 subtype C. There was no significant difference in replication capacities between the nontransmitter (n = 53) and transmitter (n = 44) mothers (P = 0.48). Infant-derived Gag-protease NL4-3 recombinant viruses (n = 41) were found to have a significantly lower Gag-protease-driven replication capacity than that of viruses derived from the mothers (P < 0.0001 by a paired t test). Higher percent similarities to consensus subtype C Gag, p17, p24, and protease sequences were also found in the infants (n = 28) in comparison to their mothers (P = 0.07, P = 0.002, P = 0.03, and P = 0.02, respectively, as determined by a paired t test). These data suggest that of the viral quasispecies found in mothers, the HIV mother-to-child transmission bottleneck favors the transmission of consensus-like viruses with lower viral replication capacities. IMPORTANCE Understanding the characteristics of successfully transmitted HIV variants has important implications for preventative interventions. Little is known about the viral determinants of HIV mother-to-child transmission (MTCT). 
We addressed the role of viral replication capacity driven by Gag, a major structural protein that is a significant determinant of overall viral replicative ability and an important target of the host immune response, in the MTCT bottleneck. This study advances our understanding of the genetic bottleneck in MTCT by revealing that viruses transmitted to infants have a lower replicative ability as well as a higher similarity to the population consensus (in this case HIV subtype C) than those of their mothers. Furthermore, the observation that “consensus-like” virus sequences correspond to lower in vitro replication abilities yet appear to be preferentially transmitted suggests that viral characteristics favoring transmission are decoupled from those that enhance replicative capacity. PMID:28637761
Advanced evolutionary molecular engineering to produce thermostable cellulase by using a small but efficient library.

PubMed

Ito, Y; Ikeuchi, A; Imamura, C

2013-01-01

We aimed at constructing thermostable cellulase variants of cellobiohydrolase II, derived from the mesophilic fungus Phanerochaete chrysosporium, by using an advanced evolutionary molecular engineering method. By aligning the amino acid sequences of the catalytic domains of five thermophilic fungal CBH2 and PcCBH2 proteins, we identified 45 positions where the PcCBH2 genes differ from the consensus sequence of two to five thermophilic fungal CBH2s. PcCBH2 variants with the consensus mutations were obtained by a cell-free translation system that was chosen for easy evaluation of thermostability. From the small library of consensus mutations, advantageous mutations for improving thermostability were found to occur with much higher frequency relative to a random library. To further improve thermostability, advantageous mutations were accumulated within the wild-type gene. Finally, we obtained the most thermostable variant Mall4, which contained all 15 advantageous mutations found in this study. This variant had the same specific cellulase activity as the wild type and retained sufficient activity at 50°C for >72 h, whereas wild-type PcCBH2 retained much less activity under the same conditions. The history of the accumulation process indicated that evolution of PcCBH2 toward improved thermostability was ideally and rapidly accomplished through the evolutionary process employed in this study.
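The consensus-mutation step described above (align the homologs, then list positions where the target differs from the residue shared by the thermophilic sequences) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the sequences are toy strings, the function names are hypothetical, and `min_count=2` mirrors the "two to five" wording of the abstract.

```python
from collections import Counter


def column_consensus(aligned, min_count=2):
    """Per-column consensus of equal-length aligned sequences.

    A column yields a residue only if the most common non-gap residue
    appears at least `min_count` times; otherwise it yields None.
    """
    consensus = []
    for column in zip(*aligned):
        counts = Counter(res for res in column if res != "-")
        if not counts:
            consensus.append(None)  # column contains only gaps
            continue
        residue, count = counts.most_common(1)[0]
        consensus.append(residue if count >= min_count else None)
    return consensus


def consensus_mutations(target, aligned, min_count=2):
    """Return mutations (e.g. 'T3S') that would convert the aligned
    target sequence to the consensus at each differing position.
    Positions are 1-based alignment columns."""
    mutations = []
    consensus = column_consensus(aligned, min_count)
    for pos, (t, c) in enumerate(zip(target, consensus), start=1):
        if c is not None and t != "-" and t != c:
            mutations.append(f"{t}{pos}{c}")
    return mutations


if __name__ == "__main__":
    # Toy alignment for illustration; not real CBH2 sequences.
    thermophiles = ["MKTAY", "MKSAY", "MRSAF", "MKSGY", "MKSAY"]
    print(consensus_mutations("MKTGY", thermophiles))
```

Each listed mutation is only a candidate; in the study, the candidates were expressed individually and screened for thermostability before being combined.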
A cell death assay for assessing the mitochondrial targeting of proteins.

PubMed

Camara Teixeira, Daniel; Cordonier, Elizabeth L; Wijeratne, Subhashinee S K; Huebbe, Patricia; Jamin, Augusta; Jarecke, Sarah; Wiebe, Matthew; Zempleni, Janos

2018-06-01

The mitochondrial proteome comprises 1000 to 1500 proteins, in addition to proteins for which the mitochondrial localization is uncertain. About 800 diseases have been linked with mutations in mitochondrial proteins. We devised a cell survival assay for assessing the mitochondrial localization in a high-throughput format. This protocol allows us to assess the mitochondrial localization of proteins and their mutants, and to identify drugs and nutrients that modulate the mitochondrial targeting of proteins. The assay works equally well for proteins directed to the outer mitochondrial membrane, inner mitochondrial membrane, and mitochondrial matrix, as demonstrated by assessing the mitochondrial targeting of the following proteins: carnitine palmitoyl transferase 1 (consensus sequence and R123C mutant), acetyl-CoA carboxylase 2, uncoupling protein 1 and holocarboxylase synthetase. Our screen may be useful for linking the mitochondrial proteome with rare diseases and for devising drug- and nutrition-based strategies for altering the mitochondrial targeting of proteins. Copyright © 2018 Elsevier Inc. All rights reserved.
A single amino acid change, Q114R, in the cleavage-site sequence of Newcastle disease virus fusion protein attenuates viral replication and pathogenicity.

PubMed

Samal, Sweety; Kumar, Sachin; Khattar, Sunil K; Samal, Siba K

2011-10-01

A key determinant of Newcastle disease virus (NDV) virulence is the amino acid sequence at the fusion (F) protein cleavage site. The NDV F protein is synthesized as an inactive precursor, F(0), and is activated by proteolytic cleavage between amino acids 116 and 117 to produce two disulfide-linked subunits, F(1) and F(2). The consensus sequence of the F protein cleavage site of virulent [(112)(R/K)-R-Q-(R/K)-R↓F-I(118)] and avirulent [(112)(G/E)-(K/R)-Q-(G/E)-R↓L-I(118)] strains contains a conserved glutamine residue at position 114. Recently, some NDV strains from Africa and Madagascar were isolated from healthy birds and have been reported to contain five basic residues (R-R-R-K-R↓F-I/V or R-R-R-R-R↓F-I/V) at the F protein cleavage site. In this study, we have evaluated the role of this conserved glutamine residue in the replication and pathogenicity of NDV by using the moderately pathogenic Beaudette C strain and by making Q114R, K115R and I118V mutants of the F protein in this strain. Our results showed that changing the glutamine to a basic arginine residue reduced viral replication and attenuated the pathogenicity of the virus in chickens. The pathogenicity was further reduced when the isoleucine at position 118 was replaced by valine.
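The cleavage-site consensus patterns quoted above translate directly into regular expressions over residues 112-118. The sketch below is illustrative only: the function name and labels are hypothetical, and a motif match is a sequence classification, not a pathogenicity prediction (the abstract itself reports that the five-basic-residue variant was attenuated).

```python
import re

# Residues 112-118 of the F protein, written without the cleavage arrow.
# Patterns are transcribed from the consensus sequences in the abstract.
VIRULENT = re.compile(r"[RK]RQ[RK]RFI")
AVIRULENT = re.compile(r"[GE][KR]Q[GE]RLI")
FIVE_BASIC = re.compile(r"RRR[KR]RF[IV]")


def classify_cleavage_site(site):
    """Label a 7-residue cleavage-site string (positions 112-118)
    by which consensus pattern it matches, if any."""
    site = site.upper()
    if VIRULENT.fullmatch(site):
        return "virulent consensus"
    if AVIRULENT.fullmatch(site):
        return "avirulent consensus"
    if FIVE_BASIC.fullmatch(site):
        return "five basic residues"
    return "no match"


if __name__ == "__main__":
    for site in ("RRQKRFI", "GRQGRLI", "RRRKRFV", "AAAAAAA"):
        print(site, "->", classify_cleavage_site(site))
```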
DNA breathing dynamics distinguish binding from nonbinding consensus sites for transcription factor YY1 in cells.

PubMed

Alexandrov, Boian S; Fukuyo, Yayoi; Lange, Martin; Horikoshi, Nobuo; Gelev, Vladimir; Rasmussen, Kim Ø; Bishop, Alan R; Usheva, Anny

2012-11-01

The genome-wide mapping of the major gene expression regulators, the transcription factors (TFs) and their DNA binding sites, is of great importance for describing cellular behavior and phenotypic diversity. Presently, the methods for prediction of genomic TF binding produce a large number of false positives, most likely due to insufficient description of the physicochemical mechanisms of protein-DNA binding. Growing evidence suggests that, in the cell, the double-stranded DNA (dsDNA) is subject to local transient strand separations (breathing) that contribute to genomic functions. By using site-specific chromatin immunoprecipitations, gel shifts, BIOBASE data, and our model that accurately describes the melting behavior and breathing dynamics of dsDNA, we report a specific DNA breathing profile found at YY1 binding sites in cells. We find that the genomic flanking sequence variations and SNPs may exert long-range effects on DNA dynamics and predetermine YY1 binding. The ubiquitous TF YY1 has a fundamental role in essential biological processes by activating, initiating or repressing transcription depending upon the sequence context it binds. We anticipate that consensus binding sequences together with the related DNA dynamics profile may significantly improve the accuracy of predicting genomic TF binding sites and TF binding-related functional SNPs.
Molecular cloning of crustins from the hemocytes of Brazilian penaeid shrimps.

PubMed

Rosa, Rafael Diego; Bandeira, Paula Terra; Barracco, Margherita Anna

2007-09-01

Crustins are antimicrobial peptides initially identified in the hemocytes of the crab Carcinus maenas (11.5-kDa peptide or carcinin) and recently also recognized in penaeid shrimps and other crustacean species. The aim of this study was to identify sequences encoding crustins from the hemocytes of four Brazilian penaeid species: Farfantepenaeus paulensis, Farfantepenaeus subtilis, Farfantepenaeus brasiliensis and Litopenaeus schmitti. Using primers based on a consensus nucleotide alignment of crustins from different crustaceans, cDNA sequences coding for crustins in all indigenous penaeid species were amplified. The four crustin sequences obtained encoded peptides containing a hydrophobic N-terminal region rich in glycine repeats and a C-terminal part with 12 cysteine residues and a conserved whey acidic protein domain. All crustin sequences obtained showed high amino acid similarity to each other and to crustins from litopenaeid shrimps (76-98%). This is the first report of crustins in native Brazilian penaeid shrimps.
Importing statistical measures into Artemis enhances gene identification in the Leishmania genome project.

PubMed

Aggarwal, Gautam; Worthey, E A; McDonagh, Paul D; Myler, Peter J

2003-06-07

Seattle Biomedical Research Institute (SBRI), as part of the Leishmania Genome Network (LGN), is sequencing chromosomes of the trypanosomatid protozoan species Leishmania major. At SBRI, chromosomal sequence is annotated using a combination of trained and untrained non-consensus gene-prediction algorithms with ARTEMIS, an annotation platform with rich and user-friendly interfaces. Here we describe a methodology used to import results from three different protein-coding gene-prediction algorithms (GLIMMER, TESTCODE and GENESCAN) into the ARTEMIS sequence viewer and annotation tool. Comparison of these methods, along with the CODONUSAGE algorithm built into ARTEMIS, shows the importance of combining methods to more accurately annotate the L. major genomic sequence. An improvised and powerful tool for gene prediction has been developed by importing data from widely-used algorithms into an existing annotation platform. This approach is especially fruitful in the Leishmania genome project, where there is a large proportion of novel genes requiring manual annotation.
The s29x gene of symbiotic bacteria in Amoeba proteus with a novel promoter.

PubMed

Pak, J W; Jeon, K W

1996-05-24

Gram-negative symbiotic bacteria (called X-bacteria), present in the xD strain of Amoeba proteus as required cell components, synthesize and export a large amount of a 29-kDa protein, S29x. S29x is exported into the host's cytoplasm across the bacterial membranes and the symbiosome membrane. The complete nucleotide (nt) sequence of the s29x gene of X-bacteria has been determined, and the promoter sequence and tsp have also been identified. The gene has a nonconventional promoter with putative nt sequences different from the known consensus sequences. When Escherichia coli cells are transformed with s29x, the gene is expressed and the product is secreted into the culture medium. Functions of S29x are not fully known, but it is suspected that S29x plays an important role in the symbiotic relationship between amoebae and X-bacteria.
Confirmation of translatability and functionality certifies the dual endothelin1/VEGFsp receptor (DEspR) protein.

PubMed

Herrera, Victoria L M; Steffen, Martin; Moran, Ann Marie; Tan, Glaiza A; Pasion, Khristine A; Rivera, Keith; Pappin, Darryl J; Ruiz-Opazo, Nelson

2016-06-14

In contrast to rat and mouse databases, the NCBI gene database lists the human dual-endothelin1/VEGFsp receptor (DEspR, formerly Dear) as a unitary transcribed pseudogene due to a stop [TGA]-codon at codon#14 in automated DNA and RNA sequences. However, re-analysis is needed given prior single gene studies detected a tryptophan [TGG]-codon#14 by manual Sanger sequencing, demonstrated DEspR translatability and functionality, and since the demonstration of actual non-translatability through expression studies, the standard-of-excellence for pseudogene designation, has not been performed. Re-analysis must meet UNIPROT criteria for demonstration of a protein's existence at the highest (protein) level, which a priori, would override DNA- or RNA-based deductions. To dissect the nucleotide sequence discrepancy, we performed Maxam-Gilbert sequencing and reviewed 727 RNA-seq entries. To comply with the highest level multiple UNIPROT criteria for determining DEspR's existence, we performed various experiments using multiple anti-DEspR monoclonal antibodies (mAbs) targeting distinct DEspR epitopes with one spanning the contested tryptophan [TGG]-codon#14, assessing: (a) DEspR protein expression, (b) predicted full-length protein size, (c) sequence-predicted protein-specific properties beyond codon#14: receptor glycosylation and internalization, (d) protein-partner interactions, and (e) DEspR functionality via DEspR-inhibition effects. Maxam-Gilbert sequencing and some RNA-seq entries demonstrate two guanines, hence a tryptophan [TGG]-codon#14 within a compression site spanning an error-prone compression sequence motif. Western blot analysis using anti-DEspR mAbs targeting distinct DEspR epitopes detect the identical glycosylated 17.5 kDa pull-down protein. Decrease in DEspR-protein size after PNGase-F digest demonstrates post-translational glycosylation, concordant with the consensus-glycosylation site beyond codon#14. 
As with other small single-transmembrane proteins, mass spectrometry analysis of anti-DEspR mAb pull-down proteins does not detect DEspR, but does detect DEspR-protein interactions with proteins implicated in intracellular trafficking and cancer. FACS analyses also detect DEspR-protein in different human cancer stem-like cells (CSCs). DEspR-inhibition studies identify DEspR-roles in CSC survival and growth. Live cell imaging detects fluorescently-labeled anti-DEspR mAb targeted-receptor internalization, concordant with the single internalization-recognition sequence also located beyond codon#14. Data confirm translatability of DEspR, the full-length DEspR protein beyond codon#14, and elucidate DEspR-specific functionality. Along with detection of the tryptophan [TGG]-codon#14 within an error-prone compression site, cumulative data demonstrating DEspR protein existence fulfill multiple UNIPROT criteria, thus refuting its pseudogene designation.
Specific interaction of capsid protein and importin-α/β influences West Nile virus production

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bhuvanakantham, Raghavan; Chong, Mun-Keat; Ng, Mah-Lee, E-mail: micngml@nus.edu.sg

2009-11-06

West Nile virus (WNV) capsid (C) protein has been shown to enter the nucleus of infected cells. However, the mechanism by which C protein enters the nucleus is unknown. In this study, we have unveiled for the first time that nuclear transport of WNV and Dengue virus C protein is mediated by their direct association with importin-α. This interplay is mediated by the consensus sequences of bipartite nuclear localization signal located between amino acid residues 85-101 together with amino acid residues 42 and 43 of C protein. Elucidation of biological significance of importin-α/C protein interaction demonstrated that the binding efficiency of this association influenced the nuclear entry of C protein and virus production. Collectively, this study illustrated the molecular mechanism by which the C protein of arthropod-borne flavivirus enters the nucleus and showed the importance of importin-α/C protein interaction in the context of flavivirus life-cycle.
Definition of a consensus DNA-binding site for PecS, a global regulator of virulence gene expression in Erwinia chrysanthemi and identification of new members of the PecS regulon.

PubMed

Rouanet, Carine; Reverchon, Sylvie; Rodionov, Dmitry A; Nasser, William

2004-07-16

In Erwinia chrysanthemi, production of pectic enzymes is modulated by a complex network involving several regulators. One of them, PecS, which belongs to the MarR family, also controls the synthesis of various other virulence factors, such as cellulases and indigoidine. Here, the PecS consensus-binding site is defined by combining a systematic evolution of ligands by an exponential enrichment approach and mutational analyses. The consensus consists of a 23-base pair palindromic-like sequence (C(-11)G(-10)A(-9)N(-8)W(-7)T(-6)C(-5)G(-4)T(-3)A(-2))T(-1)A(0)T(1)(T(2)A(3)C(4)G(5)A(6)N(7)N(8)N(9)C(10)G(11)). Mutational experiments revealed that (i) the palindromic organization is required for the binding of PecS, (ii) the very conserved part of the consensus (-6 to 6) allows for a specific interaction with PecS, but the presence of the relatively degenerated bases located apart significantly increases PecS affinity, (iii) the four bases G, A, T, and C are required for efficient binding of PecS, and (iv) the presence of several binding sites on the same promoter increases the affinity of PecS. This consensus is detected in the regions involved in PecS binding on the previously characterized target genes. This variable consensus is in agreement with the observation that the members of the MarR family are able to bind various DNA targets as dimers by means of a winged helix DNA-binding motif. Binding of PecS on a promoter region containing the defined consensus results in a repression of gene transcription in vitro. Preliminary scanning of the E. chrysanthemi genome sequence with the consensus revealed the presence of strong PecS-binding sites in the intergenic region between fliE and fliFGHIJKLMNOPQR which encode proteins involved in the biogenesis of flagellum. Accordingly, PecS directly represses fliE expression. Thus, PecS seems to control the synthesis of virulence factors required for the key steps of plant infection.
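Scanning a sequence with a degenerate consensus such as the 23-bp PecS site can be sketched by expanding IUPAC codes (N = any base, W = A or T) into a regular expression. The consensus string below is transcribed from positions -11 to 11 in the abstract; the function name, the toy test sequence, and the forward-strand-only exact matching are assumptions made for illustration. The abstract notes that degenerate positions modulate affinity, which a strict match does not capture.

```python
import re

# Standard IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# PecS consensus, positions -11..11 as given in the abstract (23 bp).
PECS_CONSENSUS = "CGANWTCGTATATTACGANNNCG"


def find_sites(sequence, consensus=PECS_CONSENSUS):
    """Return (1-based start, matched site) for every exact match of the
    degenerate consensus on the forward strand, including overlaps."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    pattern = re.compile(f"(?=({body}))")  # lookahead allows overlaps
    return [(m.start() + 1, m.group(1))
            for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # Toy sequence with one embedded site; not a real promoter.
    toy = "GG" + "CGATATCGTATATTACGAGGGCG" + "TT"
    print(find_sites(toy))
```

A genome scan along these lines would also need to search the reverse complement and tolerate mismatches at the less conserved positions.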
PROSPECT improves cis-acting regulatory element prediction by integrating expression profile data with consensus pattern searches

PubMed Central

Fujibuchi, Wataru; Anderson, John S. J.; Landsman, David

2001-01-01

Consensus pattern and matrix-based searches designed to predict cis-acting transcriptional regulatory sequences have historically been subject to large numbers of false positives. We sought to decrease false positives by incorporating expression profile data into a consensus pattern-based search method. We have systematically analyzed the expression phenotypes of over 6000 yeast genes, across 121 expression profile experiments, and correlated them with the distribution of 14 known regulatory elements over sequences upstream of the genes. Our method is based on a metric we term probabilistic element assessment (PEA), which is a ranking of potential sites based on sequence similarity in the upstream regions of genes with similar expression phenotypes. For eight of the 14 known elements that we examined, our method had a much higher selectivity than a naïve consensus pattern search. Based on our analysis, we have developed a web-based tool called PROSPECT, which allows consensus pattern-based searching of gene clusters obtained from microarray data. PMID:11574681
Physical and in silico approaches identify DNA-PK in a Tax DNA-damage response interactome

PubMed Central

Ramadan, Emad; Ward, Michael; Guo, Xin; Durkin, Sarah S; Sawyer, Adam; Vilela, Marcelo; Osgood, Christopher; Pothen, Alex; Semmes, Oliver J

2008-01-01

Background We have initiated an effort to exhaustively map interactions between HTLV-1 Tax and host cellular proteins. The resulting Tax interactome will have significant utility toward defining new and understanding known activities of this important viral protein. In addition, the completion of a full Tax interactome will also help shed light upon the functional consequences of these myriad Tax activities. The physical mapping process involved the affinity isolation of Tax complexes followed by sequence identification using tandem mass spectrometry. To date we have mapped 250 cellular components within this interactome. Here we present our approach to prioritizing these interactions via an in silico culling process. Results We first constructed an in silico Tax interactome comprised of 46 literature-confirmed protein-protein interactions. This number was then reduced to four Tax-interactions suspected to play a role in DNA damage response (Rad51, TOP1, Chk2, 53BP1). The first-neighbor and second-neighbor interactions of these four proteins were assembled from available human protein interaction databases. Through an analysis of betweenness and closeness centrality measures, and numbers of interactions, we ranked proteins in the first neighborhood. When this rank list was compared to the list of physical Tax-binding proteins, DNA-PK was the highest ranked protein common to both lists. An overlapping clustering of the Tax-specific second-neighborhood protein network showed DNA-PK to be one of three bridge proteins that link multiple clusters in the DNA damage response network. Conclusion The interaction of Tax with DNA-PK represents an important biological paradigm as suggested via consensus findings in vivo and in silico. We present this methodology as an approach to discovery and as a means of validating components of a consensus Tax interactome. PMID:18922151
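Ranking proteins in an interaction network by centrality, as described above, can be illustrated with closeness centrality computed by breadth-first search. This is a minimal standard-library sketch on a hypothetical toy graph with placeholder node names; it covers only closeness and degree, whereas the study also used betweenness centrality and real human interaction databases.

```python
from collections import deque


def closeness_centrality(graph):
    """Closeness centrality for each node of an undirected graph given
    as {node: set_of_neighbors}. Score = (reachable nodes) / (sum of
    shortest-path lengths), or 0.0 for an isolated node."""
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source node
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


def rank_nodes(graph):
    """Rank nodes by closeness, breaking ties by number of interactions."""
    closeness = closeness_centrality(graph)
    return sorted(graph,
                  key=lambda n: (closeness[n], len(graph[n])),
                  reverse=True)


if __name__ == "__main__":
    # Hypothetical toy network; node names are placeholders, not proteins.
    toy = {"P1": {"P2"}, "P2": {"P1", "P3", "P4"},
           "P3": {"P2"}, "P4": {"P2"}}
    print(rank_nodes(toy))
```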

Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments

PubMed Central

Haas, Brian J; Salzberg, Steven L; Zhu, Wei; Pertea, Mihaela; Allen, Jonathan E; Orvis, Joshua; White, Owen; Buell, C Robin; Wortman, Jennifer R

2008-01-01

EVidenceModeler (EVM) is presented as an automated eukaryotic gene structure annotation tool that reports eukaryotic gene structures as a weighted consensus of all available evidence. EVM, when combined with the Program to Assemble Spliced Alignments (PASA), yields a comprehensive, configurable annotation system that predicts protein-coding genes and alternatively spliced isoforms. Our experiments on both rice and human genome sequences demonstrate that EVM produces automated gene structure annotation approaching the quality of manual curation. PMID:18190707
The neurotoxicant PCB-95 by increasing the neuronal transcriptional repressor REST down-regulates caspase-8 and increases Ripk1, Ripk3 and MLKL expression determining necroptotic neuronal death.

PubMed

Guida, Natascia; Laudati, Giusy; Serani, Angelo; Mascolo, Luigi; Molinaro, Pasquale; Montuori, Paolo; Di Renzo, Gianfranco; Canzoniero, Lorella M T; Formisano, Luigi

2017-10-15

Our previous study showed that the environmental neurotoxicant non-dioxin-like polychlorinated biphenyl (PCB)-95 increases RE1-silencing transcription factor (REST) expression, which is related to necrosis, but not apoptosis, of neurons. Meanwhile, necroptosis is a type of programmed necrosis that is positively regulated by receptor interacting protein kinase 1 (RIPK1), RIPK3 and mixed lineage kinase domain-like (MLKL) and negatively regulated by caspase-8. Here we evaluated whether necroptosis contributes to PCB-95-induced neuronal death through REST up-regulation. Our results demonstrated that in cortical neurons PCB-95 increased RIPK1, RIPK3, and MLKL expression and decreased caspase-8 at the gene and protein level. Furthermore, the RIPK1 inhibitor necrostatin-1 or siRNA-mediated RIPK1, RIPK3 and MLKL expression knockdown significantly reduced PCB-95-induced neuronal death. Intriguingly, PCB-95-induced increases in RIPK1, RIPK3, MLKL expression and decreases in caspase-8 expression were reversed by knockdown of REST expression with a REST-specific siRNA (siREST). Notably, in silico analysis of the rat genome identified a REST consensus sequence in the caspase-8 gene promoter (Casp8-RE1), but not the RIPK1, RIPK3 and MLKL promoters. Interestingly, in PCB-95-treated neurons, REST binding to the Casp8-RE1 sequence increased in parallel with a reduction in its promoter activity, whereas under the same experimental conditions, transfection of siREST or mutation of the Casp8-RE1 sequence blocked PCB-95-induced caspase-8 reduction. Since RIPK1, RIPK3 and MLKL rat genes showed no putative REST binding site, we assessed whether the transcription factor cAMP Responsive Element Binding Protein (CREB), which has a consensus sequence in all three genes, affected neuronal death. In neurons treated with PCB-95, CREB protein expression decreased in parallel with a reduction in binding to the RIPK1, RIPK3 and MLKL gene promoter sequence. Furthermore, CREB overexpression was associated with reduced promoter activity of the RIPK1, RIPK3 and MLKL genes. Collectively, these results indicate that PCB-95 was associated with REST-induced necroptotic cell death by increasing RIPK1, RIPK3 and MLKL expression and reducing caspase-8 levels. In addition, since REST is involved in several neurological disorders, therapies that block REST-induced necroptosis could be a new strategy to revert the neurodetrimental effects associated with its overexpression. Copyright © 2017 Elsevier Inc. All rights reserved.
ΔN-P63α and TA-P63α exhibit intrinsic differences in transactivation specificities that depend on distinct features of DNA target sites

PubMed Central

Foggetti, Giorgia; Raimondi, Ivan; Campomenosi, Paola; Menichini, Paola

2014-01-01

TP63 is a member of the TP53 gene family that encodes for up to ten different TA and ΔN isoforms through alternative promoter usage and alternative splicing. Besides being a master regulator of gene expression for squamous epithelial proliferation, differentiation and maintenance, P63, through differential expression of its isoforms, plays important roles in tumorigenesis. All P63 isoforms share an immunoglobulin-like folded DNA binding domain responsible for binding to sequence-specific response elements (REs), whose overall consensus sequence is similar to that of the canonical p53 RE. Using a defined assay in yeast, where P63 isoforms and RE sequences are the only variables, and gene expression assays in human cell lines, we demonstrated that human TA- and ΔN-P63α proteins exhibited differences in transactivation specificity not observed with the corresponding P73 or P53 protein isoforms. These differences 1) were dependent on specific features of the RE sequence, 2) could be related to intrinsic differences in their oligomeric state and cooperative DNA binding, and 3) appeared to be conserved in evolution. Since genotoxic stress can change relative ratio of TA- and ΔN-P63α protein levels, the different transactivation specificity of each P63 isoform could potentially influence cellular responses to specific stresses. PMID:24926492
Adjacent DNA sequences modulate Sox9 transcriptional activation at paired Sox sites in three chondrocyte-specific enhancer elements

PubMed Central

Bridgewater, Laura C.; Walker, Marlan D.; Miller, Gwen C.; Ellison, Trevor A.; Holsinger, L. Daniel; Potter, Jennifer L.; Jackson, Todd L.; Chen, Reuben K.; Winkel, Vicki L.; Zhang, Zhaoping; McKinney, Sandra; de Crombrugghe, Benoit

2003-01-01

Expression of the type XI collagen gene Col11a2 is directed to cartilage by at least three chondrocyte-specific enhancer elements, two in the 5′ region and one in the first intron of the gene. The three enhancers each contain two heptameric sites with homology to the Sox protein-binding consensus sequence. The two sites are separated by 3 or 4 bp and arranged in opposite orientation to each other. Targeted mutational analyses of these three enhancers showed that in the intronic enhancer, as in the other two enhancers, both Sox sites in a pair are essential for enhancer activity. The transcription factor Sox9 binds as a dimer at the paired sites, and the introduction of insertion mutations between the sites demonstrated that physical interactions between the adjacently bound proteins are essential for enhancer activity. Additional mutational analyses demonstrated that although Sox9 binding at the paired Sox sites is necessary for enhancer activity, it alone is not sufficient. Adjacent DNA sequences in each enhancer are also required, and mutation of those sequences can eliminate enhancer activity without preventing Sox9 binding. The data suggest a new model in which adjacently bound proteins affect the DNA bend angle produced by Sox9, which in turn determines whether an active transcriptional enhancer complex is assembled. PMID:12595563
Identification of New Single Nucleotide Polymorphism-Based Markers for Inter- and Intraspecies Discrimination of Obligate Bacterial Parasites (Pasteuria spp.) of Invertebrates

PubMed Central

Mauchline, Tim H.; Knox, Rachel; Mohan, Sharad; Powers, Stephen J.; Kerry, Brian R.; Davies, Keith G.; Hirsch, Penny R.

2011-01-01

Protein-encoding and 16S rRNA genes of Pasteuria penetrans populations from a wide range of geographic locations were examined. Most interpopulation single nucleotide polymorphisms (SNPs) were detected in the 16S rRNA gene. However, in order to fully resolve all populations, these were supplemented with SNPs from protein-encoding genes in a multilocus SNP typing approach. Examination of individual 16S rRNA gene sequences revealed the occurrence of “cryptic” SNPs which were not present in the consensus sequences of any P. penetrans population. Additionally, hierarchical cluster analysis separated P. penetrans 16S rRNA gene clones into four groups, and one of which contained sequences from the most highly passaged population, demonstrating that it is possible to manipulate the population structure of this fastidious bacterium. The other groups were made from representatives of the other populations in various proportions. Comparison of sequences among three Pasteuria species, namely, P. penetrans, P. hartismeri, and P. ramosa, showed that the protein-encoding genes provided greater discrimination than the 16S rRNA gene. From these findings, we have developed a toolbox for the discrimination of Pasteuria at both the inter- and intraspecies levels. We also provide a model to monitor genetic variation in other obligate hyperparasites and difficult-to-culture microorganisms. PMID:21803895
Identification of new single nucleotide polymorphism-based markers for inter- and intraspecies discrimination of obligate bacterial parasites (Pasteuria spp.) of invertebrates.

PubMed

Mauchline, Tim H; Knox, Rachel; Mohan, Sharad; Powers, Stephen J; Kerry, Brian R; Davies, Keith G; Hirsch, Penny R

2011-09-01

Protein-encoding and 16S rRNA genes of Pasteuria penetrans populations from a wide range of geographic locations were examined. Most interpopulation single nucleotide polymorphisms (SNPs) were detected in the 16S rRNA gene. However, in order to fully resolve all populations, these were supplemented with SNPs from protein-encoding genes in a multilocus SNP typing approach. Examination of individual 16S rRNA gene sequences revealed the occurrence of "cryptic" SNPs which were not present in the consensus sequences of any P. penetrans population. Additionally, hierarchical cluster analysis separated P. penetrans 16S rRNA gene clones into four groups, and one of which contained sequences from the most highly passaged population, demonstrating that it is possible to manipulate the population structure of this fastidious bacterium. The other groups were made from representatives of the other populations in various proportions. Comparison of sequences among three Pasteuria species, namely, P. penetrans, P. hartismeri, and P. ramosa, showed that the protein-encoding genes provided greater discrimination than the 16S rRNA gene. From these findings, we have developed a toolbox for the discrimination of Pasteuria at both the inter- and intraspecies levels. We also provide a model to monitor genetic variation in other obligate hyperparasites and difficult-to-culture microorganisms.
Acylation-dependent protein export in Leishmania.

PubMed

Denny, P W; Gokool, S; Russell, D G; Field, M C; Smith, D F

2000-04-14

The surface of the protozoan parasite Leishmania is unusual in that it consists predominantly of glycosylphosphatidylinositol-anchored glycoconjugates and proteins. Additionally, a family of hydrophilic acylated surface proteins (HASPs) has been localized to the extracellular face of the plasma membrane in infective parasite stages. These surface polypeptides lack a recognizable endoplasmic reticulum secretory signal sequence, transmembrane spanning domain, or glycosylphosphatidylinositol-anchor consensus sequence, indicating that novel mechanisms are involved in their transport and localization. Here, we show that the N-terminal domain of HASPB contains primary structural information that directs both N-myristoylation and palmitoylation and is essential for correct localization of the protein to the plasma membrane. Furthermore, the N-terminal 18 amino acids of HASPB, encoding the dual acylation site, are sufficient to target the heterologous Aequorea victoria green fluorescent protein to the cell surface of Leishmania. Mutagenesis of the predicted acylated residues confirms that modification by both myristate and palmitate is required for correct trafficking. These data suggest that HASPB is a representative of a novel class of proteins whose translocation onto the surface of eukaryotic cells is dependent upon a "non-classical" pathway involving N-myristoylation/palmitoylation. Significantly, HASPB is also translocated on to the extracellular face of the plasma membrane of transfected mammalian cells, indicating that the export signal for HASPB is recognized by a higher eukaryotic export mechanism.
New glycoproteomics software, GlycoPep Evaluator, generates decoy glycopeptides de novo and enables accurate false discovery rate analysis for small data sets.

PubMed

Zhu, Zhikai; Su, Xiaomeng; Go, Eden P; Desaire, Heather

2014-09-16

Glycoproteins are biologically significant large molecules that participate in numerous cellular activities. In order to obtain site-specific protein glycosylation information, intact glycopeptides, with the glycan attached to the peptide sequence, are characterized by tandem mass spectrometry (MS/MS) methods such as collision-induced dissociation (CID) and electron transfer dissociation (ETD). While several emerging automated tools are developed, no consensus is present in the field about the best way to determine the reliability of the tools and/or provide the false discovery rate (FDR). A common approach to calculate FDRs for glycopeptide analysis, adopted from the target-decoy strategy in proteomics, employs a decoy database that is created based on the target protein sequence database. Nonetheless, this approach is not optimal in measuring the confidence of N-linked glycopeptide matches, because the glycopeptide data set is considerably smaller compared to that of peptides, and the requirement of a consensus sequence for N-glycosylation further limits the number of possible decoy glycopeptides tested in a database search. To address the need to accurately determine FDRs for automated glycopeptide assignments, we developed GlycoPep Evaluator (GPE), a tool that helps to measure FDRs in identifying glycopeptides without using a decoy database. GPE generates decoy glycopeptides de novo for every target glycopeptide, in a 1:20 target-to-decoy ratio. The decoys, along with target glycopeptides, are scored against the ETD data, from which FDRs can be calculated accurately based on the number of decoy matches and the ratio of the number of targets to decoys, for small data sets. GPE is freely accessible for download and can work with any search engine that interprets ETD data of N-linked glycopeptides. The software is provided at https://desairegroup.ku.edu/research.
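The FDR calculation mentioned in this abstract depends on the number of decoy matches and the target-to-decoy ratio. A minimal sketch follows, assuming the common target-decoy estimate in which decoy matches are scaled down by the number of decoys per target. The function name, its parameters, and the example counts are hypothetical; this is not necessarily the exact formula implemented in GPE.

```python
def estimate_fdr(target_matches, decoy_matches, decoys_per_target=20):
    """Estimate FDR from decoy hits, assuming decoys outnumber targets 20:1.

    Assumption: each decoy is as likely to match by chance as a false target,
    so the expected number of false target matches is decoy_matches / ratio.
    """
    if target_matches <= 0:
        raise ValueError("target_matches must be positive")
    expected_false_targets = decoy_matches / decoys_per_target
    return min(1.0, expected_false_targets / target_matches)


# Hypothetical counts: 50 accepted target glycopeptides, 20 accepted decoys.
print(estimate_fdr(target_matches=50, decoy_matches=20))
```

With 20 decoys per target, 20 decoy matches correspond to roughly one expected false target match, so the estimate for 50 accepted targets is 1/50.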
Specific Inhibition of the transcription factor Ci by a Cobalt(III)-Schiff base-DNA conjugate

PubMed Central

Hurtado, Ryan R.; Harney, Allison S.; Heffern, Marie C.; Holbrook, Robert J.; Holmgren, Robert A.; Meade, Thomas J.

2012-01-01

We describe the use of Co(III) Schiff base-DNA conjugates, a versatile class of research tools that target C2H2 transcription factors, to inhibit the Hedgehog (Hh) pathway. In developing mammalian embryos, Hh signaling is critical for the formation and development of many tissues and organs. Inappropriate activation of the Hedgehog (Hh) pathway has been implicated in a variety of cancers including medulloblastomas and basal cell carcinomas. It is well known that Hh regulates the activity of the Gli family of C2H2 zinc finger transcription factors in mammals. In Drosophila the function of the Gli proteins is performed by a single transcription factor with an identical DNA binding consensus sequence, Cubitus Interruptus (Ci). We have demonstrated previously that conjugation of a specific 17 base-pair oligonucleotide to a Co(III) Schiff base complex results in a targeted inhibitor of the Snail family C2H2 zinc finger transcription factors. Modification of the oligonucleotide sequence in the Co(III) Schiff base-DNA conjugate to that of Ci’s consensus sequence (Co(III)-Ci) generates an equally selective inhibitor of Ci. Co(III)-Ci irreversibly binds the Ci zinc finger domain and prevents it from binding DNA in vitro. In a Ci responsive tissue culture reporter gene assay, Co(III)-Ci reduces the transcriptional activity of Ci in a concentration dependent manner. In addition, injection of wild-type Drosophila embryos with Co(III)-Ci phenocopies a Ci loss of function phenotype, demonstrating effectiveness in vivo. This study provides evidence that Co(III) Schiff base-DNA conjugates are a versatile class of specific and potent tools for studying zinc finger domain proteins and have potential applications as customizable anti-cancer therapeutics. PMID:22214326
Prediction of protein-protein interactions from amino acid sequences with ensemble extreme learning machines and principal component analysis.

PubMed

You, Zhu-Hong; Lei, Ying-Ke; Zhu, Lin; Xia, Junfeng; Wang, Bing

2013-01-01

Protein-protein interactions (PPIs) play crucial roles in the execution of various cellular processes and form the basis of biological mechanisms. Although a large amount of PPI data for different species has been generated by high-throughput experimental techniques, the PPI pairs obtained with experimental methods cover only a fraction of the complete PPI networks, and the experimental methods for identifying PPIs are both time-consuming and expensive. Hence, it is urgent and challenging to develop automated computational methods to efficiently and accurately predict PPIs. We present here a novel hierarchical PCA-EELM (principal component analysis-ensemble extreme learning machine) model to predict protein-protein interactions using only the information in protein sequences. In the proposed method, 11188 protein pairs retrieved from the DIP database were encoded into feature vectors by using four kinds of protein sequence information. Focusing on dimension reduction, an effective feature extraction method, PCA, was then employed to construct the most discriminative new feature set. Finally, multiple extreme learning machines were trained and then aggregated into a consensus classifier by majority voting. The ensembling of extreme learning machines removes the dependence of results on initial random weights and improves the prediction performance. When applied to the PPI data of Saccharomyces cerevisiae, the proposed method achieved 87.00% prediction accuracy with 86.15% sensitivity at a precision of 87.59%. Extensive experiments were performed to compare our method with the state-of-the-art Support Vector Machine (SVM) technique. Experimental results demonstrate that the proposed PCA-EELM outperforms the SVM method under 5-fold cross-validation. In addition, PCA-EELM runs faster than the PCA-SVM based method. Consequently, the proposed approach can be considered a promising and powerful new tool for predicting PPIs with excellent performance in less time.
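The aggregation step in this abstract, combining several trained classifiers into a consensus classifier by majority voting, can be sketched as follows together with the three reported metrics (accuracy, sensitivity, precision). The prediction vectors and labels are made-up toy values, and the function names are hypothetical; no extreme learning machine is trained here.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one consensus label list."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def binary_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, precision) for labels 1 = interacting, 0 = not."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return accuracy, sensitivity, precision


# Toy outputs from three hypothetical base classifiers on five protein pairs.
clf_a = [1, 0, 1, 1, 0]
clf_b = [1, 1, 0, 1, 0]
clf_c = [0, 0, 1, 1, 1]
y_true = [1, 0, 1, 0, 0]

consensus = majority_vote([clf_a, clf_b, clf_c])
print(consensus, binary_metrics(y_true, consensus))
```

An odd number of base classifiers is used so that a binary vote cannot tie; with an even number, a tie-breaking rule would have to be chosen.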
Human Adenovirus Core Protein V Is Targeted by the Host SUMOylation Machinery To Limit Essential Viral Functions.

PubMed

Freudenberger, Nora; Meyer, Tina; Groitl, Peter; Dobner, Thomas; Schreiner, Sabrina

2018-02-15

Human adenoviruses (HAdV) are nonenveloped viruses containing a linear, double-stranded DNA genome surrounded by an icosahedral capsid. To allow proper viral replication, the genome is imported through the nuclear pore complex associated with viral core proteins. Until now, the role of these incoming virion proteins during the early phase of infection was poorly understood. The core protein V is speculated to bridge the core and the surrounding capsid. It binds the genome in a sequence-independent manner and localizes in the nucleus of infected cells, accumulating at nucleoli. Here, we show that protein V contains conserved SUMO conjugation motifs (SCMs). Mutation of these consensus motifs resulted in reduced SUMOylation of the protein; thus, protein V represents a novel target of the host SUMOylation machinery. To understand the role of protein V SUMO posttranslational modification during productive HAdV infection, we generated a replication-competent HAdV with SCM mutations within the protein V coding sequence. Phenotypic analyses revealed that these SCM mutations are beneficial for adenoviral replication. Blocking protein V SUMOylation at specific sites shifts the onset of viral DNA replication to earlier time points during infection and promotes viral gene expression. Simultaneously, the altered kinetics within the viral life cycle are accompanied by more efficient proteasomal degradation of host determinants and increased virus progeny production than that observed during wild-type infection. Taken together, our studies show that protein V SUMOylation reduces virus growth; hence, protein V SUMOylation represents an important novel aspect of the host antiviral strategy to limit virus replication and thereby points to potential intervention strategies. IMPORTANCE: Many decades of research have revealed that HAdV structural proteins promote viral entry and mainly physical stability of the viral genome in the capsid. Our work over the last years showed that this concept needs expansion as the functions are more diverse. We showed that capsid protein VI regulates the antiviral response by modulation of the transcription factor Daxx during infection. Moreover, core protein VII interacts with SPOC1 restriction factor, which is beneficial for efficient viral gene expression. Here, we were able to show that core protein V also represents a novel substrate of the host SUMOylation machinery and contains several conserved SCMs; mutation of these consensus motifs reduced SUMOylation of the protein. Unexpectedly, we observed that introducing these mutations into HAdV promotes adenoviral replication. In conclusion, we offer novel insights into adenovirus core proteins and provide evidence that SUMOylation of HAdV factors regulates replication efficiency. Copyright © 2018 American Society for Microbiology.
Identification of a GTP-binding protein α subunit that lacks an apparent ADP-ribosylation site for pertussis toxin

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fong, H.K.W.; Yoshimoto, K.K.; Eversole-Cire, P.

1988-05-01

Recent molecular cloning of cDNA for the α subunit of bovine transducin (a guanine nucleotide-binding regulatory protein, or G protein) has revealed the presence of two retinal-specific transducins, called T_r and T_c, which are expressed in rod or cone photoreceptor cells. In a further study of G-protein diversity and signal transduction in the retina, the authors have identified a G-protein α subunit, which they refer to as G_zα, by isolating a human retinal cDNA clone that cross-hybridizes at reduced stringency with bovine T_r α-subunit cDNA. The deduced amino acid sequence of G_zα is 41-67% identical with those of other known G-protein α subunits. However, the 355-residue G_zα lacks a consensus site for ADP-ribosylation by pertussis toxin, and its amino acid sequence varies within a number of regions that are strongly conserved among all of the other G-protein α subunits. They suggest that G_zα, which appears to be highly expressed in neural tissues, represents a member of a subfamily of G proteins that mediate signal transduction in pertussis toxin-insensitive systems.
Molecular cloning of a murine homologue of membrane cofactor protein (CD46): preferential expression in testicular germ cells.

PubMed Central

Tsujimura, A; Shida, K; Kitamura, M; Nomura, M; Takeda, J; Tanaka, H; Matsumoto, M; Matsumiya, K; Okuyama, A; Nishimune, Y; Okabe, M; Seya, T

1998-01-01

Human membrane cofactor protein (MCP, CD46) has been suggested, although no convincing evidence has been proposed, to be a fertilization-associated protein, in addition to its primary functions as a complement regulator and a measles virus receptor. We have cloned a cDNA encoding the murine homologue of MCP. This cDNA showed 45% identity in deduced protein sequence and 62% identity in nucleotide sequence with human MCP. Its ectodomains were four short consensus repeats and a serine/threonine-rich domain, and it appeared to be a type 1 membrane protein with a 23-amino acid transmembrane domain and a short cytoplasmic tail. The protein expressed on Chinese hamster ovary cell transfectants was 47 kDa on SDS/PAGE immunoblotting, approximately 6 kDa larger than the murine testis MCP. It served as a cofactor for factor I-mediated inactivation of the complement protein C3b in a homologous system and, to a lesser extent, in a human system. Strikingly, the major message of murine MCP was 1.5 kb and was expressed predominantly in the testis. It was not detected in mice defective in spermatogenesis or with immature germ cells (until 23 days old). Thus, murine MCP may be a sperm-dominant protein the message of which is expressed selectively in spermatids during germ-cell differentiation. PMID:9461505
Splicing factor SFRS1 recognizes a functionally diverse landscape of RNA transcripts.

PubMed

Sanford, Jeremy R; Wang, Xin; Mort, Matthew; Vanduyn, Natalia; Cooper, David N; Mooney, Sean D; Edenberg, Howard J; Liu, Yunlong

2009-03-01

Metazoan genes are encrypted with at least two superimposed codes: the genetic code to specify the primary structure of proteins and the splicing code to expand their proteomic output via alternative splicing. Here, we define the specificity of a central regulator of pre-mRNA splicing, the conserved, essential splicing factor SFRS1. Cross-linking immunoprecipitation and high-throughput sequencing (CLIP-seq) identified 23,632 binding sites for SFRS1 in the transcriptome of cultured human embryonic kidney cells. SFRS1 was found to engage many different classes of functionally distinct transcripts including mRNA, miRNA, snoRNAs, ncRNAs, and conserved intergenic transcripts of unknown function. The majority of these diverse transcripts share a purine-rich consensus motif corresponding to the canonical SFRS1 binding site. The consensus site was not only enriched in exons cross-linked to SFRS1 in vivo, but was also enriched in close proximity to splice sites. mRNAs encoding RNA processing factors were significantly overrepresented, suggesting that SFRS1 may broadly influence the post-transcriptional control of gene expression in vivo. Finally, a search for the SFRS1 consensus motif within the Human Gene Mutation Database identified 181 mutations in 82 different genes that disrupt predicted SFRS1 binding sites. This comprehensive analysis substantially expands the known roles of human SR proteins in the regulation of a diverse array of RNA transcripts.
Cloning and Characterization of an Outer Membrane Protein of Vibrio vulnificus Required for Heme Utilization: Regulation of Expression and Determination of the Gene Sequence

PubMed Central

Litwin, Christine M.; Byrne, Burke L.

1998-01-01

Vibrio vulnificus is a halophilic, marine pathogen that has been associated with septicemia and serious wound infections in patients with iron overload and preexisting liver disease. For V. vulnificus, the ability to acquire iron from the host has been shown to correlate with virulence. V. vulnificus is able to use host iron sources such as hemoglobin and heme. We previously constructed a fur mutant of V. vulnificus which constitutively expresses at least two iron-regulated outer membrane proteins, of 72 and 77 kDa. The N-terminal amino acid sequence of the 77-kDa protein purified from the V. vulnificus fur mutant had 67% homology with the first 15 amino acids of the mature protein of the Vibrio cholerae heme receptor, HutA. In this report, we describe the cloning, DNA sequence, mutagenesis, and analysis of transcriptional regulation of the structural gene for HupA, the heme receptor of V. vulnificus. DNA sequencing of hupA demonstrated a single open reading frame of 712 amino acids that was 50% identical and 66% similar to the sequence of V. cholerae HutA and similar to those of other TonB-dependent outer membrane receptors. Primer extension analysis localized one promoter for the V. vulnificus hupA gene. Analysis of the promoter region of V. vulnificus hupA showed a sequence homologous to the consensus Fur box. Northern blot analysis showed that the transcript was strongly regulated by iron. An internal deletion in the V. vulnificus hupA gene, done by using marker exchange, resulted in the loss of expression of the 77-kDa protein and the loss of the ability to use hemin or hemoglobin as a source of iron. The hupA deletion mutant of V. vulnificus will be helpful in future studies of the role of heme iron in V. vulnificus pathogenesis. PMID:9632577
The GENCODE exome: sequencing the complete human exome

PubMed Central

Coffey, Alison J; Kokocinski, Felix; Calafato, Maria S; Scott, Carol E; Palta, Priit; Drury, Eleanor; Joyce, Christopher J; LeProust, Emily M; Harrow, Jen; Hunt, Sarah; Lehesjoki, Anna-Elina; Turner, Daniel J; Hubbard, Tim J; Palotie, Aarno

2011-01-01

Sequencing the coding regions, the exome, of the human genome is one of the major current strategies to identify low frequency and rare variants associated with human disease traits. So far, the most widely used commercial exome capture reagents have mainly targeted the consensus coding sequence (CCDS) database. We report the design of an extended set of targets for capturing the complete human exome, based on annotation from the GENCODE consortium. The extended set covers an additional 5594 genes and 10.3 Mb compared with the current CCDS-based sets. The additional regions include potential disease genes previously inaccessible to exome resequencing studies, such as 43 genes linked to ion channel activity and 70 genes linked to protein kinase activity. In total, the new GENCODE exome set developed here covers 47.9 Mb and performed well in sequence capture experiments. In the sample set used in this study, we identified over 5000 SNP variants more in the GENCODE exome target (24%) than in the CCDS-based exome sequencing. PMID:21364695
Bph32, a novel gene encoding an unknown SCR domain-containing protein, confers resistance against the brown planthopper in rice.

PubMed

Ren, Juansheng; Gao, Fangyuan; Wu, Xianting; Lu, Xianjun; Zeng, Lihua; Lv, Jianqun; Su, Xiangwen; Luo, Hong; Ren, Guangjun

2016-11-23

An urgent need exists to identify more brown planthopper (Nilaparvata lugens Stål, BPH) resistance genes, which will allow the development of rice varieties with resistance to BPH to counteract the increased incidence of this pest species. Here, using bioinformatics and DNA sequencing approaches, we identified a novel BPH resistance gene, LOC_Os06g03240 (MSU LOCUS ID), from the rice variety Ptb33 in the interval between the markers RM19291 and RM8072 on the short arm of chromosome 6, where a gene for resistance to BPH was mapped by Jirapong Jairin et al. and renamed as "Bph32". This gene encodes a unique short consensus repeat (SCR) domain protein. Sequence comparison revealed that the Bph32 gene shares 100% sequence identity with its allele in Oryza latifolia. The transgenic introgression of Bph32 into a susceptible rice variety significantly improved resistance to BPH. Expression analysis revealed that Bph32 was highly expressed in the leaf sheaths, where BPH primarily settles and feeds, at 2 and 24 h after BPH infestation, suggesting that Bph32 may inhibit feeding in BPH. Western blotting revealed the presence of Pph (Ptb33) and Tph (TN1) proteins using a Penta-His antibody, and both proteins were insoluble. This study provides information regarding a valuable gene for rice defence against insect pests.
Bph32, a novel gene encoding an unknown SCR domain-containing protein, confers resistance against the brown planthopper in rice

PubMed Central

Ren, Juansheng; Gao, Fangyuan; Wu, Xianting; Lu, Xianjun; Zeng, Lihua; Lv, Jianqun; Su, Xiangwen; Luo, Hong; Ren, Guangjun

2016-01-01

An urgent need exists to identify more brown planthopper (Nilaparvata lugens Stål, BPH) resistance genes, which will allow the development of rice varieties with resistance to BPH to counteract the increased incidence of this pest species. Here, using bioinformatics and DNA sequencing approaches, we identified a novel BPH resistance gene, LOC_Os06g03240 (MSU LOCUS ID), from the rice variety Ptb33 in the interval between the markers RM19291 and RM8072 on the short arm of chromosome 6, where a gene for resistance to BPH was mapped by Jirapong Jairin et al. and renamed as “Bph32”. This gene encodes a unique short consensus repeat (SCR) domain protein. Sequence comparison revealed that the Bph32 gene shares 100% sequence identity with its allele in Oryza latifolia. The transgenic introgression of Bph32 into a susceptible rice variety significantly improved resistance to BPH. Expression analysis revealed that Bph32 was highly expressed in the leaf sheaths, where BPH primarily settles and feeds, at 2 and 24 h after BPH infestation, suggesting that Bph32 may inhibit feeding in BPH. Western blotting revealed the presence of Pph (Ptb33) and Tph (TN1) proteins using a Penta-His antibody, and both proteins were insoluble. This study provides information regarding a valuable gene for rice defence against insect pests. PMID:27876888
DNA sequence analysis of ARS elements from chromosome III of Saccharomyces cerevisiae: identification of a new conserved sequence.

PubMed Central

Palzkill, T G; Oliver, S G; Newlon, C S

1986-01-01

Four fragments of Saccharomyces cerevisiae chromosome III DNA which carry ARS elements have been sequenced. Each fragment contains multiple copies of sequences that have at least 10 out of 11 bases of homology to a previously reported 11 bp core consensus sequence. A survey of these new ARS sequences and previously reported sequences revealed the presence of an additional 11 bp conserved element located on the 3' side of the T-rich strand of the core consensus. Subcloning analysis as well as deletion and transposon insertion mutagenesis of ARS fragments support a role for 3' conserved sequence in promoting ARS activity. PMID:3529036
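The "at least 10 out of 11 bases" criterion in the abstract above can be illustrated with a short scan. This is only a sketch: the abstract does not spell out the 11 bp core consensus, so the form `WTTTAYRTTTW` used here is an assumption, the function names are hypothetical, and only the given strand is scanned.

```python
# IUPAC nucleotide codes (only the ones needed for this consensus).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "AT", "R": "AG", "Y": "CT", "N": "ACGT",
}

# Assumed form of the 11 bp ARS core consensus; not stated in the abstract.
ARS_CORE = "WTTTAYRTTTW"


def count_matches(window, consensus):
    """Count positions where the window base is allowed by the consensus code."""
    return sum(base in IUPAC[code] for base, code in zip(window, consensus))


def find_near_matches(seq, consensus=ARS_CORE, min_matches=10):
    """Return (start, window, matches) for each window meeting the threshold.

    Only the given strand is scanned; pass the reverse complement separately
    to cover the other strand.
    """
    seq = seq.upper()
    k = len(consensus)
    hits = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        matches = count_matches(window, consensus)
        if matches >= min_matches:
            hits.append((start, window, matches))
    return hits


if __name__ == "__main__":
    toy = "CCC" + "ATTTATGTTTA" + "CCC"  # made-up sequence with one exact match
    for hit in find_near_matches(toy):
        print(hit)
```

Lowering `min_matches` makes the scan more permissive; the threshold of 10 simply mirrors the 10-of-11 wording of the abstract.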
ApiEST-DB: analyzing clustered EST data of the apicomplexan parasites.

PubMed

Li, Li; Crabtree, Jonathan; Fischer, Steve; Pinney, Deborah; Stoeckert, Christian J; Sibley, L David; Roos, David S

2004-01-01

ApiEST-DB (http://www.cbil.upenn.edu/paradbs-servlet/) provides integrated access to publicly available EST data from protozoan parasites in the phylum Apicomplexa. The database currently incorporates a total of nearly 100,000 ESTs from several parasite species of clinical and/or veterinary interest, including Eimeria tenella, Neospora caninum, Plasmodium falciparum, Sarcocystis neurona and Toxoplasma gondii. To facilitate analysis of these data, EST sequences were clustered and assembled to form consensus sequences for each organism, and these assemblies were then subjected to automated annotation via similarity searches against protein and domain databases. The underlying relational database infrastructure, Genomics Unified Schema (GUS), enables complex biologically based queries, facilitating validation of gene models, identification of alternative splicing, detection of single nucleotide polymorphisms, identification of stage-specific genes and recognition of phylogenetically conserved and phylogenetically restricted sequences.

Vaccination potential of B and T epitope-enriched NP and M2 against Influenza A viruses from different clades and hosts

PubMed Central

Esmagambetov, Ilias; Bagaev, Alexander; Pichugin, Alexey; Lysenko, Andrey; Shcherbinin, Dmitry; Sedova, Elena; Logunov, Denis; Shmarov, Maxim; Ataullakhanov, Ravshan; Naroditsky, Boris; Gintsburg, Alexander

2018-01-01

To avoid outbreaks of influenza virus epidemics and pandemics among human populations, modern medicine requires the development of new universal vaccines that are able to provide protection from a wide range of influenza A virus strains. In the course of development of a universal vaccine, it is necessary to consider that immunity must be generated even against viruses from different hosts because new human epidemic virus strains have their origins in viruses of birds and other animals. We have enriched conserved viral proteins, nucleoprotein (NP) and matrix protein 2 (M2), with B- and T-cell epitopes of not only human but also swine and avian origin. For this purpose, we analyzed M2 and NP sequences with respect to changes in the sequences of known T- and B-cell epitopes and chose conserved and evolutionarily significant epitopes. Eventually, we found consensus sequences of M2 and NP that have the maximum quantity of epitopes that are 100% coincident with them. Consensus epitope-enriched amino acid sequences of M2 and NP proteins were included in a recombinant adenoviral vector. Immunization with Ad5-tet-M2NP induced strong CD8 and CD4 T-cell responses, specific to each of the encoded antigens, i.e. M2 and NP. Eight months after immunization with Ad5-tet-M2NP, high numbers of M2- and NP-responding “effector memory” CD44posCD62neg T cells were found in the mouse spleens, which revealed a long-term T-cell immune memory conferred by the immunization. In all, the challenge experiments showed an extraordinarily wide-ranging efficacy of protection by the Ad5-tet-M2NP vaccine, covering 5 different heterosubtypes of influenza A virus (2 human, 2 avian and 1 swine). PMID:29377916
InterEvDock2: an expanded server for protein docking using evolutionary and biological information from homology models and multimeric inputs.

PubMed

Quignot, Chloé; Rey, Julien; Yu, Jinchao; Tufféry, Pierre; Guerois, Raphaël; Andreani, Jessica

2018-05-08

Computational protein docking is a powerful strategy to predict structures of protein-protein interactions and provides crucial insights for the functional characterization of macromolecular cross-talks. We previously developed InterEvDock, a server for ab initio protein docking based on rigid-body sampling followed by consensus scoring using physics-based and statistical potentials, including the InterEvScore function specifically developed to incorporate co-evolutionary information in docking. InterEvDock2 is a major evolution of InterEvDock which allows users to submit input sequences - not only structures - and multimeric inputs and to specify constraints for the pairwise docking process based on previous knowledge about the interaction. For this purpose, we added modules in InterEvDock2 for automatic template search and comparative modeling of the input proteins. The InterEvDock2 pipeline was benchmarked on 812 complexes for which unbound homology models of the two partners and co-evolutionary information are available in the PPI4DOCK database. InterEvDock2 identified a correct model among the top 10 consensus in 29% of these cases (compared to 15-24% for individual scoring functions) and at least one correct interface residue among 10 predicted in 91% of these cases. InterEvDock2 is thus a unique protein docking server, designed to be useful for the experimental biology community. The InterEvDock2 web interface is available at http://bioserv.rpbs.univ-paris-diderot.fr/services/InterEvDock2/.
Structure, replication efficiency and fragility of yeast ARS elements.

PubMed

Dhar, Manoj K; Sehgal, Shelly; Kaul, Sanjana

2012-05-01

DNA replication in eukaryotes initiates at specific sites known as origins of replication, or replicators. These replication origins occur throughout the genome, though the propensity of their occurrence depends on the type of organism. In eukaryotes, zones of initiation of replication spanning from about 100 to 50,000 base pairs have been reported. The characteristics of eukaryotic replication origins are best understood in the budding yeast Saccharomyces cerevisiae, where some autonomously replicating sequences, or ARS elements, confer origin activity. ARS elements are short DNA sequences of a few hundred base pairs, identified by their efficiency at initiating a replication event when cloned in a plasmid. ARS elements, although structurally diverse, maintain a basic structure composed of three domains, A, B and C. Domain A is comprised of a consensus sequence designated ACS (ARS consensus sequence), while the B domain has the DNA unwinding element and the C domain is important for DNA-protein interactions. Although there are ∼400 ARS elements in the yeast genome, not all of them are active origins of replication. Different groups within the genus Saccharomyces have ARS elements as components of replication origin. The present paper provides a comprehensive review of various aspects of ARSs, starting from their structural conservation to sequence thermodynamics. All significant and conserved functional sequence motifs within different types of ARS elements have been extensively described. Issues like silencing at ARSs, their inherent fragility and factors governing their replication efficiency have also been addressed. Progress in understanding crucial components associated with the replication machinery and timing at these ARS elements is discussed in the section entitled "The replicon revisited". Copyright © 2012 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.
New approaches to high-throughput structure characterization of SH3 complexes: the example of Myosin-3 and Myosin-5 SH3 domains from S. cerevisiae.

PubMed

Musi, Valeria; Birdsall, Berry; Fernandez-Ballester, Gregorio; Guerrini, Remo; Salvatori, Severo; Serrano, Luis; Pastore, Annalisa

2006-04-01

SH3 domains are small protein modules that are involved in protein-protein interactions in several essential metabolic pathways. The availability of the complete genome and the limited number of clearly identifiable SH3 domains make the yeast Saccharomyces cerevisiae an ideal proteomic-based model system to investigate the structural rules dictating the SH3-mediated protein interactions and to develop new tools to assist these studies. In the present work, we have determined the solution structure of the SH3 domain from Myo3 and modeled by homology that of the highly homologous Myo5, two myosins implicated in actin polymerization. We have then implemented an integrated approach that makes use of experimental and computational methods to characterize their binding properties. While accommodating their targets in the classical groove, the two domains have selectivity in both orientation and sequence specificity of the target peptides. From our study, we propose a consensus sequence that may provide a useful guideline to identify new natural partners and suggest a strategy of more general applicability that may be of use in other structural proteomic studies.
An A257V Mutation in the Bacillus subtilis Response Regulator Spo0A Prevents Regulated Expression of Promoters with Low-Consensus Binding Sites

PubMed Central

Seredick, Steve D.; Seredick, Barbara M.; Baker, David; Spiegelman, George B.

2009-01-01

In Bacillus species, the master regulator of sporulation is Spo0A. Spo0A functions by both activating and repressing transcription initiation from target promoters that contain 0A boxes, the binding sites for Spo0A. Several classes of spo0A mutants have been isolated, and the molecular basis for their phenotypes has been determined. However, the molecular basis of the Spo0A(A257V) substitution, representative of an unusual phenotypic class, is not understood. Spo0A(A257V) is unusual in that it abolishes sporulation; in vivo, it fails to activate transcription from key stage II promoters yet retains the ability to repress the abrB promoter. To determine how Spo0A(A257V) retains the ability to repress but not stimulate transcription, we performed a series of in vitro and in vivo assays. We found unexpectedly that the mutant protein both stimulated transcription from the spoIIG promoter and repressed transcription from the abrB promoter, albeit twofold less than the wild type. A DNA binding analysis of Spo0A(A257V) showed that the mutant protein was less able to tolerate alterations in the sequence and arrangement of its DNA binding sites than the wild-type protein. In addition, we found that Spo0A(A257V) could stimulate transcription of a mutant spoIIG promoter in vivo in which low-consensus binding sites were replaced by high-consensus binding sites. We conclude that Spo0A(A257V) is able to bind to and regulate the expression of only genes whose promoters contain high-consensus binding sites and that this effect is sufficient to explain the observed sporulation defect. PMID:19581368
Transcriptome Analysis and Comparison of Marmota monax and Marmota himalayana.

PubMed

Liu, Yanan; Wang, Baoju; Wang, Lu; Vikash, Vikash; Wang, Qin; Roggendorf, Michael; Lu, Mengji; Yang, Dongliang; Liu, Jia

2016-01-01

The Eastern woodchuck (Marmota monax) is a classical animal model for studying hepatitis B virus (HBV) infection and hepatocellular carcinoma (HCC) in humans. Recently, we found that Marmota himalayana, an Asian animal species closely related to Marmota monax, is susceptible to woodchuck hepatitis virus (WHV) infection and can be used as a new mammalian model for HBV infection. However, the lack of genomic sequence information of both Marmota models strongly limited their application breadth and depth. To address this major obstacle of the Marmota models, we utilized Illumina RNA-Seq technology to sequence the cDNA libraries of liver and spleen samples of two Marmota monax and four Marmota himalayana. In total, over 13 billion nucleotide bases were sequenced and approximately 1.5 billion clean reads were obtained. Following assembly, 106,496 consensus sequences of Marmota monax and 78,483 consensus sequences of Marmota himalayana were detected. For functional annotation, in total 73,603 Unigenes of Marmota monax and 78,483 Unigenes of Marmota himalayana were identified using different databases (NR, NT, Swiss-Prot, KEGG, COG, GO). The Unigenes were aligned by blastx to protein databases to decide the coding DNA sequences (CDS) and in total 41,247 CDS of Marmota monax and 34,033 CDS of Marmota himalayana were predicted. The single nucleotide polymorphisms (SNPs) and the simple sequence repeats (SSRs) were also analyzed for all Unigenes obtained. Moreover, a large-scale transcriptome comparison was performed and revealed a high similarity in transcriptome sequences between the two marmota species. Our study provides an extensive amount of novel sequence information for Marmota monax and Marmota himalayana. 
This information may serve as a valuable genomics resource for further molecular, developmental and comparative evolutionary studies, as well as for the identification and characterization of functional genes that are involved in WHV infection and HCC development in the woodchuck model.
Transcriptome Analysis and Comparison of Marmota monax and Marmota himalayana

PubMed Central

Wang, Lu; Vikash, Vikash; Wang, Qin; Roggendorf, Michael; Lu, Mengji; Yang, Dongliang; Liu, Jia

2016-01-01

The Eastern woodchuck (Marmota monax) is a classical animal model for studying hepatitis B virus (HBV) infection and hepatocellular carcinoma (HCC) in humans. Recently, we found that Marmota himalayana, an Asian animal species closely related to Marmota monax, is susceptible to woodchuck hepatitis virus (WHV) infection and can be used as a new mammalian model for HBV infection. However, the lack of genomic sequence information of both Marmota models strongly limited their application breadth and depth. To address this major obstacle of the Marmota models, we utilized Illumina RNA-Seq technology to sequence the cDNA libraries of liver and spleen samples of two Marmota monax and four Marmota himalayana. In total, over 13 billion nucleotide bases were sequenced and approximately 1.5 billion clean reads were obtained. Following assembly, 106,496 consensus sequences of Marmota monax and 78,483 consensus sequences of Marmota himalayana were detected. For functional annotation, in total 73,603 Unigenes of Marmota monax and 78,483 Unigenes of Marmota himalayana were identified using different databases (NR, NT, Swiss-Prot, KEGG, COG, GO). The Unigenes were aligned by blastx to protein databases to decide the coding DNA sequences (CDS) and in total 41,247 CDS of Marmota monax and 34,033 CDS of Marmota himalayana were predicted. The single nucleotide polymorphisms (SNPs) and the simple sequence repeats (SSRs) were also analyzed for all Unigenes obtained. Moreover, a large-scale transcriptome comparison was performed and revealed a high similarity in transcriptome sequences between the two marmota species. Our study provides an extensive amount of novel sequence information for Marmota monax and Marmota himalayana. 
This information may serve as a valuable genomics resource for further molecular, developmental and comparative evolutionary studies, as well as for the identification and characterization of functional genes that are involved in WHV infection and HCC development in the woodchuck model. PMID:27806133
Mapping and Sequencing of the Canine NRAMP1 Gene and Identification of Mutations in Leishmaniasis-Susceptible Dogs

PubMed Central

Altet, Laura; Francino, Olga; Solano-Gallego, Laia; Renier, Corinne; Sánchez, Armand

2002-01-01

The NRAMP1 gene (Slc11a1) encodes an ion transporter protein involved in the control of intraphagosomal replication of parasites and in macrophage activation. It has been described in mice as the determinant of natural resistance or susceptibility to infection with antigenically unrelated pathogens, including Leishmania. Our aims were to sequence and map the canine Slc11a1 gene and to identify mutations that may be associated with resistance or susceptibility to Leishmania infection. The canine Slc11a1 gene has been mapped to dog chromosome CFA37 and covers 9 kb, including a 700-bp promoter region, 15 exons, and a polymorphic microsatellite in intron 1. It encodes a 547-amino-acid protein that has over 87% identity with the Slc11a1 proteins of different mammalian species. A case-control study with 33 resistant and 84 susceptible dogs showed an association between allele 145 of the microsatellite and susceptible dogs. Sequence variant analysis was performed by direct sequencing of the cDNA and the promoter region of four unrelated beagles experimentally infected with Leishmania infantum to search for possible functional mutations. Two of the dogs were classified as susceptible and the other two were classified as resistant based on their immune responses. Two important mutations were found in susceptible dogs: a G-rich region in the promoter that was common to both animals and a complete deletion of exon 11, which encodes the consensus transport motif of the protein, in the unique susceptible dog that needed an additional and prolonged treatment to avoid continuous relapses. A study with a larger dog population would be required to prove the association of these sequence variants with disease susceptibility. PMID:12010961
Benchmark analysis of native and artificial NAD+-dependent enzymes generated by a sequence based design method with or without phylogenetic data.

PubMed

Nakano, Shogo; Motoyama, Tomoharu; Miyashita, Yurina; Ishizuka, Yuki; Matsuo, Naoya; Tokiwa, Hiroaki; Shinoda, Suguru; Asano, Yasuhisa; Ito, Sohei

2018-05-22

The expansion of protein sequence databases has enabled us to design artificial proteins by sequence-based design methods, such as full consensus design (FCD) and ancestral sequence reconstruction (ASR). Artificial proteins with enhanced activity levels compared with native ones can potentially be generated by such methods, but successful design is rare because preparing a sequence library by curating the database and selecting a method is difficult. Utilizing a curated library prepared by reducing conservation energies, we successfully designed two artificial L-threonine 3-dehydrogenases (SDR-TDHs) with higher activity levels than native SDR-TDH, FcTDH-N1 and AncTDH, using FCD and ASR, respectively. The artificial SDR-TDHs had excellent thermal stability and NAD+ recognition compared to native SDR-TDH from Cupriavidus necator (CnTDH): the melting temperatures of FcTDH-N1 and AncTDH were about 10 and 5°C higher than that of CnTDH, respectively, and the dissociation constants toward NAD+ of FcTDH-N1 and AncTDH were two- and seven-fold lower than that of CnTDH, respectively. The enzymatic efficiency of the artificial SDR-TDHs was comparable to that of CnTDH. Crystal structures of FcTDH-N1 and AncTDH were determined at 2.8 and 2.1 Å resolution, respectively. Structural and MD simulation analysis of the SDR-TDHs indicated that only the flexibility at specific regions was changed, suggesting that multiple mutations introduced in the artificial SDR-TDHs altered their flexibility and thereby affected their enzymatic properties. Benchmark analysis of the SDR-TDHs indicated that both FCD and ASR can generate highly functional proteins if a curated library is prepared appropriately.
Complete genome sequences of avian paramyxovirus type 8 strains goose/Delaware/1053/76 and pintail/Wakuya/20/78

PubMed Central

Paldurai, Anandan; Subbiah, Madhuri; Kumar, Sachin; Collins, Peter L.; Samal, Siba K.

2009-01-01

Complete consensus genome sequences were determined for avian paramyxovirus type 8 (APMV-8) strains goose/Delaware/1053/76 (prototype strain) and pintail/Wakuya/20/78. The genome of each strain is 15,342 nucleotides (nt) long, which follows the “rule of six”. The genome consists of six genes in the order of 3′-N-P/V/W-M-F-HN-L-5′. The genes are flanked on either side by conserved transcription start and stop signals, and have intergenic regions ranging from 1 to 30 nt. The genome contains a 55 nt leader region at the 3′-end and a 171 nt trailer region at the 5′-end. Comparison of sequences of strains Delaware and Wakuya showed nucleotide identity of 96.8% at the genome level and amino acid identities of 99.3%, 96.5%, 98.6%, 99.4%, 98.6% and 99.1% for the predicted N, P, M, F, HN and L proteins, respectively. Both strains grew in embryonated chicken eggs, in primary chicken embryo kidney cells, and in 293T cells. Both strains contained only a single basic residue at the cleavage activation site of the F protein, and their efficiency of replication in vitro depended on, and was augmented by, the presence of exogenous protease in most cell lines. Sequence alignment and phylogenetic analysis of the predicted amino acid sequences of APMV-8 strain Delaware proteins with the cognate proteins of other available APMV serotypes showed that APMV-8 is more closely related to APMV-2 and -6 than to APMV-1, -3 and -4. PMID:19341613
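The "rule of six" mentioned above refers to paramyxovirus genome lengths being a multiple of six nucleotides. As a minimal arithmetic check on the reported length (the function name is hypothetical):

```python
def follows_rule_of_six(genome_length_nt):
    """True if the genome length is an exact multiple of six nucleotides."""
    return genome_length_nt % 6 == 0


APMV8_GENOME_NT = 15342  # genome length reported in the abstract

if __name__ == "__main__":
    # 15,342 = 6 x 2,557, so the reported length satisfies the rule.
    print(follows_rule_of_six(APMV8_GENOME_NT), APMV8_GENOME_NT // 6)
```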
Development of a EST dataset and characterization of EST-SSRs in a traditional Chinese medicinal plant, Epimedium sagittatum (Sieb. Et Zucc.) Maxim

PubMed Central

2010-01-01

Background: Epimedium sagittatum (Sieb. Et Zucc.) Maxim, a traditional Chinese medicinal plant species, has been used extensively as a genuine medicinal material. Certain Epimedium species are endangered due to commercial overexploitation, while sustainable application studies, conservation genetics, systematics, and marker-assisted selection (MAS) of Epimedium are little studied due to the lack of molecular markers. Here, we report a set of expressed sequence tags (ESTs) and simple sequence repeats (SSRs) identified in these ESTs for E. sagittatum. Results: cDNAs of E. sagittatum are sequenced using 454 GS-FLX pyrosequencing technology. The raw reads are cleaned and assembled into a total of 76,459 consensus sequences comprising 17,231 contigs and 59,228 singlets. About 38.5% (29,466) of the consensus sequences significantly match the non-redundant protein database (E-value < 1e-10), 22,295 of which are further annotated using Gene Ontology (GO) terms. A total of 2,810 EST-SSRs are identified from the Epimedium EST dataset. Trinucleotide SSR is the dominant repeat type (55.2%) followed by dinucleotide (30.4%), tetranucleotide (7.3%), hexanucleotide (4.9%), and pentanucleotide (2.2%) SSR. The dominant repeat motif is AAG/CTT (23.6%) followed by AG/CT (19.3%), ACC/GGT (11.1%), AT/AT (7.5%), and AAC/GTT (5.9%). Thirty-two SSR-ESTs are randomly selected and primer pairs are synthesized for testing the transferability across 52 Epimedium species. Eighteen primer pairs (85.7%) could be successfully transferred to Epimedium species and sixteen of those show high genetic diversity with 0.35 of observed heterozygosity (Ho) and 0.65 of expected heterozygosity (He) and a high number of alleles per locus (11.9). Conclusion: A large EST dataset with a total of 76,459 consensus sequences is generated, aiming to provide sequence information for deciphering secondary metabolism, especially for the flavonoid pathway in Epimedium. A total of 2,810 EST-SSRs are identified from the EST dataset and ~1580 EST-SSR markers are transferable. E. sagittatum EST-SSR transferability to the major Epimedium germplasm is up to 85.7%. Therefore, this EST dataset and EST-SSRs will be a powerful resource for further studies such as taxonomy, molecular breeding, genetics, genomics, and secondary metabolism in Epimedium species. PMID:20141623
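The kind of SSR mining summarized above (di- to hexanucleotide repeats in assembled ESTs) can be sketched with a regular-expression scan. The abstract does not name the tool or thresholds used, so the minimum repeat counts and function names below are assumptions chosen for illustration only.

```python
import re

# Minimum number of tandem repeats per motif length.
# Assumed thresholds; the abstract does not report the ones actually used.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def is_shorter_repeat(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. AGAG, AAA)."""
    n = len(motif)
    return any(
        n % d == 0 and motif == motif[:d] * (n // d)
        for d in range(1, n)
    )


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # One motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_shorter_repeat(motif):
                continue  # counted under the shorter motif class instead
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    toy_est = "GGC" + "AAG" * 6 + "TTC" + "AG" * 7 + "C"  # made-up sequence
    print(find_ssrs(toy_est))
```

Motif classes such as AAG/CTT in the abstract pool a motif with its reverse complement; this sketch reports the motif as found on the given strand and leaves that grouping to a later step.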
Identification of Common Epitopes on a Conserved Region of NSs Proteins Among Tospoviruses of Watermelon silver mottle virus Serogroup.

PubMed

Chen, Tsung-Chi; Huang, Ching-Wen; Kuo, Yan-Wen; Liu, Fang-Lin; Yuan, Chao-Hsiu Hsuan; Hsu, Hei-Ti; Yeh, Shyi-Dong

2006-12-01

The NSs protein of Watermelon silver mottle virus (WSMoV) was expressed by a Zucchini yellow mosaic virus (ZYMV) vector in squash. The expressed NSs protein with a histidine tag and an additional NIa protease cleavage sequence was isolated by Ni(2+)-NTA resins as a free-form protein and further eluted after sodium dodecyl sulfate-polyacrylamide gel electrophoresis for production of rabbit antiserum and mouse monoclonal antibodies (MAbs). The rabbit antiserum strongly reacted with the NSs crude antigen of WSMoV and weakly reacted with that of a high-temperature-recovered gloxinia isolate (HT-1) of Capsicum chlorosis virus (CaCV), but not with that of Calla lily chlorotic spot virus (CCSV). In contrast, the MAbs reacted strongly with all crude NSs antigens of WSMoV, CaCV, and CCSV. Various deletions of the NSs open reading frame were constructed and expressed by ZYMV vector. Results indicate that all three MAbs target the 89- to 125-amino-acid (aa) region of WSMoV NSs protein. Two residues, cysteine and lysine, were essential for MAb recognition. Sequence comparison of the deduced MAb-recognized region with the reported tospoviral NSs proteins revealed the presence of a consensus sequence VRKPGVKNTGCKFTMHNQIFNPN (denoted WNSscon), at the 98- to 120-aa position of NSs proteins, sharing 86 to 100% identities among those of WSMoV, CaCV, CCSV, and Peanut bud necrosis virus. A synthetic WNSscon peptide reacted with the MAbs and verified that the epitopes are present in the 98- to 120-aa region of WSMoV NSs protein. The WSMoV serogroup-specific NSs MAbs provide a means for reliable identification of tospoviruses in this large serogroup.
VarWalker: Personalized Mutation Network Analysis of Putative Cancer Genes from Next-Generation Sequencing Data

PubMed Central

Jia, Peilin; Zhao, Zhongming

2014-01-01

A major challenge in interpreting the large volume of mutation data identified by next-generation sequencing (NGS) is to distinguish driver mutations from neutral passenger mutations to facilitate the identification of targetable genes and new drugs. Current approaches are primarily based on mutation frequencies of single-genes, which lack the power to detect infrequently mutated driver genes and ignore functional interconnection and regulation among cancer genes. We propose a novel mutation network method, VarWalker, to prioritize driver genes in large scale cancer mutation data. VarWalker fits generalized additive models for each sample based on sample-specific mutation profiles and builds on the joint frequency of both mutation genes and their close interactors. These interactors are selected and optimized using the Random Walk with Restart algorithm in a protein-protein interaction network. We applied the method in >300 tumor genomes in two large-scale NGS benchmark datasets: 183 lung adenocarcinoma samples and 121 melanoma samples. In each cancer, we derived a consensus mutation subnetwork containing significantly enriched consensus cancer genes and cancer-related functional pathways. These cancer-specific mutation networks were then validated using independent datasets for each cancer. Importantly, VarWalker prioritizes well-known, infrequently mutated genes, which are shown to interact with highly recurrently mutated genes yet have been ignored by conventional single-gene-based approaches. Utilizing VarWalker, we demonstrated that network-assisted approaches can be effectively adapted to facilitate the detection of cancer driver genes in NGS data. PMID:24516372
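The Random Walk with Restart step named above can be illustrated on a toy network. This is a generic sketch of the algorithm, not VarWalker's implementation: the restart probability, the gene names, and the assumption of an undirected network in which every node has at least one neighbour are all illustrative choices.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until convergence.

    adjacency: dict mapping each node to a list of its neighbours (undirected,
               every node assumed to have at least one neighbour).
    seeds:     set of start nodes sharing the restart probability equally.
    Returns a dict of steady-state visiting probabilities.
    """
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            # Each node spreads its walking mass equally over its neighbours.
            share = (1.0 - restart) * p[u] / len(adjacency[u])
            for v in adjacency[u]:
                nxt[v] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


if __name__ == "__main__":
    # Hypothetical four-gene path network: geneA - geneB - geneC - geneD.
    network = {
        "geneA": ["geneB"],
        "geneB": ["geneA", "geneC"],
        "geneC": ["geneB", "geneD"],
        "geneD": ["geneC"],
    }
    scores = random_walk_with_restart(network, seeds={"geneA"})
    for gene, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(gene, round(score, 4))
```

Nodes close to the seed receive higher scores, which is the property that lets such a walk rank the interactors of a mutated gene.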
VarWalker: personalized mutation network analysis of putative cancer genes from next-generation sequencing data.

PubMed

Jia, Peilin; Zhao, Zhongming

2014-02-01

A major challenge in interpreting the large volume of mutation data identified by next-generation sequencing (NGS) is to distinguish driver mutations from neutral passenger mutations to facilitate the identification of targetable genes and new drugs. Current approaches are primarily based on mutation frequencies of single-genes, which lack the power to detect infrequently mutated driver genes and ignore functional interconnection and regulation among cancer genes. We propose a novel mutation network method, VarWalker, to prioritize driver genes in large scale cancer mutation data. VarWalker fits generalized additive models for each sample based on sample-specific mutation profiles and builds on the joint frequency of both mutation genes and their close interactors. These interactors are selected and optimized using the Random Walk with Restart algorithm in a protein-protein interaction network. We applied the method in >300 tumor genomes in two large-scale NGS benchmark datasets: 183 lung adenocarcinoma samples and 121 melanoma samples. In each cancer, we derived a consensus mutation subnetwork containing significantly enriched consensus cancer genes and cancer-related functional pathways. These cancer-specific mutation networks were then validated using independent datasets for each cancer. Importantly, VarWalker prioritizes well-known, infrequently mutated genes, which are shown to interact with highly recurrently mutated genes yet have been ignored by conventional single-gene-based approaches. Utilizing VarWalker, we demonstrated that network-assisted approaches can be effectively adapted to facilitate the detection of cancer driver genes in NGS data.
Common and distinct DNA-binding and regulatory activities of the BEN-solo transcription factor family.

PubMed

Dai, Qi; Ren, Aiming; Westholm, Jakub O; Duan, Hong; Patel, Dinshaw J; Lai, Eric C

2015-01-01

Recently, the BEN (BANP, E5R, and NAC1) domain was recognized as a new class of conserved DNA-binding domain. The fly genome encodes three proteins that bear only a single BEN domain ("BEN-solo" factors); namely, Insensitive (Insv), Bsg25A (Elba1), and CG9883 (Elba2). Insv homodimers preferentially bind CCAATTGG palindromes throughout the genome to mediate transcriptional repression, whereas Bsg25A and Elba2 heterotrimerize with their obligate adaptor, Elba3 (i.e., the ELBA complex), to recognize a CCAATAAG motif in the Fab-7 insulator. While these data suggest distinct DNA-binding properties of BEN-solo proteins, we performed reporter assays that indicate that both Bsg25A and Elba2 can individually recognize Insv consensus sites efficiently. We confirmed this by solving the structure of Bsg25A complexed to the Insv site, which showed that key aspects of the BEN:DNA recognition strategy are similar between these proteins. We next show that both Insv and ELBA proteins are competent to mediate transcriptional repression via Insv consensus sequences but that the ELBA complex appears to be selective for the ELBA site. Reciprocally, genome-wide analysis reveals that Insv exhibits significant cobinding to class I insulator elements, indicating that it may also contribute to insulator function. Indeed, we observed abundant Insv binding within the Hox complexes with substantial overlaps with class I insulators, many of which bear Insv consensus sites. Moreover, Insv coimmunoprecipitates with the class I insulator factor CP190. Finally, we observed that Insv harbors exclusive activity among fly BEN-solo factors with respect to regulation of Notch-mediated cell fate choices in the peripheral nervous system. This in vivo activity is recapitulated by BEND6, a mammalian BEN-solo factor that conserves the Notch corepressor function of Insv but not its capacity to bind Insv consensus sites. 
Altogether, our data define an array of common and distinct biochemical and functional properties of this new family of transcription factors. © 2015 Dai et al.; Published by Cold Spring Harbor Laboratory Press.
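The distinction drawn above between the palindromic Insv site (CCAATTGG) and the ELBA site (CCAATAAG) can be checked with a reverse-complement comparison. The sketch below uses only the two motifs given in the abstract; the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase/lowercase DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(seq):
    """A DNA palindrome reads the same on both strands (5' to 3')."""
    return seq.upper() == reverse_complement(seq)


INSV_SITE = "CCAATTGG"  # Insv consensus, from the abstract
ELBA_SITE = "CCAATAAG"  # ELBA motif in Fab-7, from the abstract

if __name__ == "__main__":
    for name, site in (("Insv", INSV_SITE), ("ELBA", ELBA_SITE)):
        print(name, site, reverse_complement(site), is_palindromic(site))
```

A palindromic site presents the same sequence on both strands, consistent with the homodimer binding described for Insv, whereas the ELBA motif is asymmetric.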
Common and distinct DNA-binding and regulatory activities of the BEN-solo transcription factor family

PubMed Central

Dai, Qi; Ren, Aiming; Westholm, Jakub O.; Duan, Hong; Patel, Dinshaw J.

2015-01-01

Recently, the BEN (BANP, E5R, and NAC1) domain was recognized as a new class of conserved DNA-binding domain. The fly genome encodes three proteins that bear only a single BEN domain (“BEN-solo” factors); namely, Insensitive (Insv), Bsg25A (Elba1), and CG9883 (Elba2). Insv homodimers preferentially bind CCAATTGG palindromes throughout the genome to mediate transcriptional repression, whereas Bsg25A and Elba2 heterotrimerize with their obligate adaptor, Elba3 (i.e., the ELBA complex), to recognize a CCAATAAG motif in the Fab-7 insulator. While these data suggest distinct DNA-binding properties of BEN-solo proteins, we performed reporter assays that indicate that both Bsg25A and Elba2 can individually recognize Insv consensus sites efficiently. We confirmed this by solving the structure of Bsg25A complexed to the Insv site, which showed that key aspects of the BEN:DNA recognition strategy are similar between these proteins. We next show that both Insv and ELBA proteins are competent to mediate transcriptional repression via Insv consensus sequences but that the ELBA complex appears to be selective for the ELBA site. Reciprocally, genome-wide analysis reveals that Insv exhibits significant cobinding to class I insulator elements, indicating that it may also contribute to insulator function. Indeed, we observed abundant Insv binding within the Hox complexes with substantial overlaps with class I insulators, many of which bear Insv consensus sites. Moreover, Insv coimmunoprecipitates with the class I insulator factor CP190. Finally, we observed that Insv harbors exclusive activity among fly BEN-solo factors with respect to regulation of Notch-mediated cell fate choices in the peripheral nervous system. This in vivo activity is recapitulated by BEND6, a mammalian BEN-solo factor that conserves the Notch corepressor function of Insv but not its capacity to bind Insv consensus sites. 
Altogether, our data define an array of common and distinct biochemical and functional properties of this new family of transcription factors. PMID:25561495
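The contrast between the palindromic Insv site (CCAATTGG) and the non-palindromic ELBA site (CCAATAAG) can be checked mechanically. The following is a minimal Python sketch and is not taken from the paper; the helper names `reverse_complement` and `is_palindromic` are illustrative assumptions.

```python
# Illustrative helpers (hypothetical names, not from the paper).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def is_palindromic(site: str) -> bool:
    """A DNA site is palindromic when it equals its own reverse complement."""
    return site == reverse_complement(site)

# The two motifs quoted in the abstract.
for name, site in [("Insv", "CCAATTGG"), ("ELBA", "CCAATAAG")]:
    print(name, site, reverse_complement(site), is_palindromic(site))
```

A palindromic site reads identically on both strands, which fits binding by a homodimer; the ELBA motif lacks this symmetry.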
Bioinformatics and the allergy assessment of agricultural biotechnology products: industry practices and recommendations.

PubMed

Ladics, Gregory S; Cressman, Robert F; Herouet-Guicheney, Corinne; Herman, Rod A; Privalle, Laura; Song, Ping; Ward, Jason M; McClain, Scott

2011-06-01

Bioinformatic tools are being increasingly utilized to evaluate the degree of similarity between a novel protein and known allergens within the context of a larger allergy safety assessment process. Importantly, bioinformatics is not a predictive analysis that can determine if a novel protein will "become" an allergen, but rather a tool to assess whether the protein is a known allergen or is potentially cross-reactive with an existing allergen. Bioinformatic tools are key components of the 2009 Codex Alimentarius Commission's weight-of-evidence approach, which encompasses a variety of experimental approaches for an overall assessment of the allergenic potential of a novel protein. Bioinformatic search comparisons between novel protein sequences, as well as potential novel fusion sequences derived from the genome and transgene, and known allergens are required by all regulatory agencies that assess the safety of genetically modified (GM) products. The objective of this paper is to identify opportunities for consensus in the methods of applying bioinformatics and to outline differences that impact a consistent and reliable allergy safety assessment. The bioinformatic comparison process has some critical features, which are outlined in this paper. One of them is a curated, publicly available and well-managed database with known allergenic sequences. In this paper, the best practices, scientific value, and food safety implications of bioinformatic analyses, as they are applied to GM food crops are discussed. Recommendations for conducting bioinformatic analysis on novel food proteins for potential cross-reactivity to known allergens are also put forth. Copyright © 2011 Elsevier Inc. All rights reserved.
Screening for Protein-DNA Interactions by Automatable DNA-Protein Interaction ELISA

PubMed Central

Schüssler, Axel; Kolukisaoglu, H. Üner; Koch, Grit; Wallmeroth, Niklas; Hecker, Andreas; Thurow, Kerstin; Zell, Andreas; Harter, Klaus; Wanke, Dierk

2013-01-01

DNA-binding proteins (DBPs), such as transcription factors, constitute about 10% of the protein-coding genes in eukaryotic genomes and play pivotal roles in the regulation of chromatin structure and gene expression by binding to short stretches of DNA. Despite their number and importance, the binding sequence has been disclosed for only a minor portion of DBPs. Methods that allow the de novo identification of DNA-binding motifs of known DBPs, such as protein binding microarray technology or SELEX, are not yet suited for high-throughput use and automation. To close this gap, we report an automatable DNA-protein-interaction (DPI)-ELISA screen of an optimized double-stranded DNA (dsDNA) probe library that allows the high-throughput identification of hexanucleotide DNA-binding motifs. In contrast to other methods, this DPI-ELISA screen can be performed manually or with standard laboratory automation. Furthermore, output evaluation does not require extensive computational analysis to derive a binding consensus. We could show that the DPI-ELISA screen disclosed the full spectrum of binding preferences for a given DBP. As an example, AtWRKY11 was used to demonstrate that the automated DPI-ELISA screen revealed the entire range of in vitro binding preferences. In addition, protein extracts of AtbZIP63 and the DNA-binding domain of AtWRKY33 were analyzed, which led to a refinement of their known DNA-binding consensus sequences. Finally, we performed a DPI-ELISA screen to disclose the DNA-binding consensus of an as yet uncharacterized putative DBP, AtTIFY1. A palindromic TGATCA-consensus was uncovered and we could show that the GATC-core is compulsory for AtTIFY1 binding. This specific interaction between AtTIFY1 and its DNA-binding motif was confirmed by in vivo plant one-hybrid assays in protoplasts.
Thus, the DPI-ELISA screen, which can also be run under automated conditions, is a promising approach for de novo binding site identification of DBPs and for a deeper understanding of gene regulation in any organism of choice. PMID:24146751
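To give a sense of the sequence space behind a hexanucleotide screen, the sketch below enumerates all 4^6 hexamers, counts those that equal their own reverse complement, and counts the distinct double-stranded sites. It is an illustration only: the abstract does not describe how the optimized probe library was designed or how many probes it contains, and all names in the code are assumptions.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

# Every possible hexanucleotide.
hexamers = ["".join(p) for p in product("ACGT", repeat=6)]

# Hexamers that read the same on both strands (reverse-palindromic).
palindromes = [h for h in hexamers if h == reverse_complement(h)]

# On dsDNA a hexamer and its reverse complement describe the same site,
# so keep one representative per pair.
strand_independent = {min(h, reverse_complement(h)) for h in hexamers}

print(len(hexamers), len(palindromes), len(strand_independent))
print("TGATCA" in palindromes, "GATC" in "TGATCA")
```

The TGATCA consensus reported for AtTIFY1 falls in the small palindromic subset, and its GATC core is itself palindromic.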
Improved Thermostability of Clostridium thermocellum Endoglucanase Cel8A by Using Consensus-Guided Mutagenesis

PubMed Central

Anbar, Michael; Gul, Ozgur; Lamed, Raphael; Sezerman, Ugur O.

2012-01-01

The use of thermostable cellulases is advantageous for the breakdown of lignocellulosic biomass toward the commercial production of biofuels. Previously, we have demonstrated the engineering of an enhanced thermostable family 8 cellulosomal endoglucanase (EC 3.2.1.4), Cel8A, from Clostridium thermocellum, using random error-prone PCR and a combination of three beneficial mutations, dominated by an intriguing serine-to-glycine substitution (M. Anbar, R. Lamed, E. A. Bayer, ChemCatChem 2:997–1003, 2010). In the present study, we used a bioinformatics-based approach involving sequence alignment of homologous family 8 glycoside hydrolases to create a library of consensus mutations in which residues of the catalytic module are replaced at specific positions with the most prevalent amino acids in the family. One of the mutants (G283P) displayed a higher thermal stability than the wild-type enzyme. Introducing this mutation into the previously engineered Cel8A triple mutant resulted in an optimized enzyme, increasing the half-life of activity by 14-fold at 85°C. Remarkably, no loss of catalytic activity was observed compared to that of the wild-type endoglucanase. The structural changes were simulated by molecular dynamics analysis, and specific regions were identified that contributed to the observed thermostability. Intriguingly, most of the proteins used for sequence alignment in determining the consensus residues were derived from mesophilic bacteria, with optimal temperatures well below that of C. thermocellum Cel8A. PMID:22389377
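The consensus approach described above, replacing residues with the most prevalent amino acid at each aligned position, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name is hypothetical, and the five-residue alignment is made-up toy data chosen only so that the output has the same form as the G283P label in the abstract.

```python
from collections import Counter

def consensus_mutations(wild_type, homologs, first_position=1):
    """List substitutions that change wild-type residues to the column consensus.

    All sequences must be pre-aligned and of equal length. Gap characters
    ('-') are ignored when counting residues in a column.
    """
    mutations = []
    for i, wt in enumerate(wild_type):
        column = [h[i] for h in homologs if h[i] != "-"]
        if not column or wt == "-":
            continue
        residue, _count = Counter(column).most_common(1)[0]
        if residue != wt:
            mutations.append(f"{wt}{first_position + i}{residue}")
    return mutations

# Toy, made-up alignment (NOT real Cel8A or family 8 sequences).
wild_type = "ASGTL"
homologs = ["ASPTL", "ATPTL", "ASPSL", "ASGTL"]
print(consensus_mutations(wild_type, homologs, first_position=281))
```

Each proposed substitution would still need experimental testing; in the study only one library member (G283P) showed higher thermal stability than the wild type.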
ESTuber db: an online database for Tuber borchii EST sequences.

PubMed

Lazzari, Barbara; Caprera, Andrea; Cosentino, Cristian; Stella, Alessandra; Milanesi, Luciano; Viotti, Angelo

2007-03-08

The ESTuber database (http://www.itb.cnr.it/estuber) includes 3,271 Tuber borchii expressed sequence tags (EST). The dataset consists of 2,389 sequences from an in-house prepared cDNA library from truffle vegetative hyphae, and 882 sequences downloaded from GenBank and representing four libraries from white truffle mycelia and ascocarps at different developmental stages. An automated pipeline was prepared to process EST sequences using public software integrated by in-house developed Perl scripts. Data were collected in a MySQL database, which can be queried via a php-based web interface. Sequences included in the ESTuber db were clustered and annotated against three databases: the GenBank nr database, the UniProtKB database and a third in-house prepared database of fungi genomic sequences. An algorithm was implemented to infer statistical classification among Gene Ontology categories from the ontology occurrences deduced from the annotation procedure against the UniProtKB database. Ontologies were also deduced from the annotation of more than 130,000 EST sequences from five filamentous fungi, for intra-species comparison purposes. Further analyses were performed on the ESTuber db dataset, including tandem repeats search and comparison of the putative protein dataset inferred from the EST sequences to the PROSITE database for protein patterns identification. All the analyses were performed both on the complete sequence dataset and on the contig consensus sequences generated by the EST assembly procedure. The resulting web site is a resource of data and links related to truffle expressed genes. The Sequence Report and Contig Report pages are the web interface core structures which, together with the Text search utility and the Blast utility, allow easy access to the data stored in the database.

Identification of amino acid substitutions with compensational effects in the attachment protein of canine distemper virus.

PubMed

Sattler, Ursula; Khosravi, Mojtaba; Avila, Mislay; Pilo, Paola; Langedijk, Johannes P; Ader-Ebert, Nadine; Alves, Lisa A; Plattet, Philippe; Origgi, Francesco C

2014-07-01

The hemagglutinin (H) gene of canine distemper virus (CDV) encodes the receptor-binding protein. This protein, together with the fusion (F) protein, is pivotal for infectivity since it contributes to the fusion of the viral envelope with the host cell membrane. Of the two receptors currently known for CDV (nectin-4 and the signaling lymphocyte activation molecule [SLAM]), SLAM is considered the most relevant for host susceptibility. To investigate how evolution might have impacted the host-CDV interaction, we examined the functional properties of a series of missense single nucleotide polymorphisms (SNPs) naturally accumulating within the H-gene sequences during the transition between two distinct but related strains. The two strains, a wild-type strain and a consensus strain, were part of a single continental outbreak in European wildlife and occurred in distinct geographical areas 2 years apart. The deduced amino acid sequence of the two H genes differed at 5 residues. A panel of mutants carrying all the combinations of the SNPs was obtained by site-directed mutagenesis. The selected mutant, wild type, and consensus H proteins were functionally evaluated according to their surface expression, SLAM binding, fusion protein interaction, and cell fusion efficiencies. The results highlight that the most detrimental functional effects are associated with specific sets of SNPs. Strikingly, an efficient compensational system driven by additional SNPs appears to come into play, virtually neutralizing the negative functional effects. This system seems to contribute to the maintenance of the tightly regulated function of the H-gene-encoded attachment protein. Importance: To investigate how evolution might have impacted the host-canine distemper virus (CDV) interaction, we examined the functional properties of naturally occurring single nucleotide polymorphisms (SNPs) in the hemagglutinin gene of two related but distinct strains of CDV. 
The hemagglutinin gene encodes the attachment protein, which is pivotal for infection. Our results show that few SNPs have a relevant detrimental impact and they generally appear in specific combinations (molecular signatures). These drastic negative changes are neutralized by compensatory mutations, which contribute to maintenance of an overall constant bioactivity of the attachment protein. This compensational mechanism might reflect the reaction of the CDV machinery to the changes occurring in the virus following antigenic variations critical for virulence. Copyright © 2014, American Society for Microbiology. All Rights Reserved.
Hormonal regulation of metamorphosis and reproduction in ticks.

PubMed

Roe, R Michael; Donohue, Kevin V; Khalil, Sayed M S; Sonenshine, Daniel E

2008-05-01

The presence of a "status quo" hormone like JH has not been found in ticks. The most advanced understanding of tick endocrinology is associated with female reproduction, where the sequence of the first messages for storage proteins (vitellogenin (Vg) and carrier protein), the Vg receptor, and male peptidic pheromones were recently reported. The current consensus model suggests that ecdysteroids from the epidermis regulated by a putative peptidic ecdysiotrophic hormone from the synganglion initiate the expression of the Vg messages in fat body and midgut. Vg protein, secreted into the hemolymph, requires an ovary Vg receptor to be absorbed by oocytes. Male pheromones transferred into the female genital tract during mating initiate blood feeding to repletion and vitellogenesis. The work so far on tick endocrinology is limited by the paucity of identified hormones and the small number of studies on a few tick models. The role of storage proteins in the evolution of hematophagy is discussed.
RADH, a gene of Saccharomyces cerevisiae encoding a putative DNA helicase involved in DNA repair. Characteristics of radH mutants and sequence of the gene.

PubMed

Aboussekhra, A; Chanet, R; Zgaga, Z; Cassier-Chauvat, C; Heude, M; Fabre, F

1989-09-25

A new type of radiation-sensitive mutant of S. cerevisiae is described. The recessive radH mutation sensitizes haploids to the lethal effect of UV radiation in the G1 but not in the G2 mitotic phase. Homozygous diploids are as sensitive as G1 haploids. UV-induced mutagenesis is depressed, while the induction of gene conversion is increased. The mutation is believed to channel the repair of lesions engaged in the mutagenic pathway into a recombination process, successful if the events involve sister-chromatids but lethal if they involve homologous chromosomes. The sequence of the RADH gene reveals that it may code for a DNA helicase, with a Mr of 134 kDa. All the consensus domains of known DNA helicases are present. Besides these consensus regions, strong homologies with the Rep and UvrD helicases of E. coli were found. The RadH putative helicase appears to belong to the set of proteins involved in the error-prone repair mechanism, at least for UV-induced lesions, and could act in coordination with the Rev3 error-prone DNA polymerase.
Automated Sanger Analysis Pipeline (ASAP): A Tool for Rapidly Analyzing Sanger Sequencing Data with Minimum User Interference.

PubMed

Singh, Aditya; Bhatia, Prateek

2016-12-01

Sanger sequencing platforms, such as Applied Biosystems instruments, generate chromatogram files. Generally, a given region of a sequence is sequenced with both forward and reverse primers, yielding 2 sequences that need to be aligned and a consensus generated before mutation detection studies. This work is cumbersome and takes time, especially if the gene is large with many exons. Hence, we devised a rapid automated command system to filter, build, and align consensus sequences and also optionally extract exonic regions, translate them in all frames, and perform an amino acid alignment starting from raw sequence data within a very short time. At its full capabilities, the Automated Sanger Analysis Pipeline (ASAP) is able to read "*.ab1" chromatogram files through a command line interface, convert them to the FASTQ format, trim the low-quality regions, reverse-complement the reverse sequence, create a consensus sequence, extract the exonic regions using a reference exonic sequence, translate the sequence in all frames, and align the nucleic acid and amino acid sequences to reference nucleic acid and amino acid sequences, respectively. All files are created and can be used for further analysis. ASAP is available as a Python 3.x executable at https://github.com/aditya-88/ASAP. The version described in this paper is 0.28.
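Three of the steps listed above (reverse-complementing the reverse read, building a consensus, and translating in all frames) can be sketched in a few lines of Python. This is not ASAP's code and does not use its interface: every function name is hypothetical, the consensus step assumes the two reads are already trimmed to the same aligned region, and disagreements are simply masked with N.

```python
# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def naive_consensus(forward, reverse_read):
    """Merge a forward read with the reverse-complemented reverse read.

    Assumes both reads cover exactly the same region (equal length,
    no indels); positions where they disagree become 'N'.
    """
    rc = reverse_complement(reverse_read)
    if len(rc) != len(forward):
        raise ValueError("reads must cover the same region")
    return "".join(f if f == r else "N" for f, r in zip(forward, rc))

def translate(seq):
    """Translate complete codons; codons containing N become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )

def six_frame_translation(seq):
    """Translate three frames on each strand, keyed '+1'..'+3', '-1'..'-3'."""
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames

consensus = naive_consensus("ATGGCCTAA", "TTAGGCCAT")
print(consensus, six_frame_translation(consensus))
```

A real pipeline would also handle quality trimming and pairwise alignment of reads of unequal length, which this sketch deliberately leaves out.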
Soft hydrogel materials from elastomeric gluten-mimetic proteins

NASA Astrophysics Data System (ADS)

Bagheri, Mehran; Scott, Shane; Wan, Fan; Dick, Scott; Harden, James; Biomolecular Assemblies Team

2014-03-01

Elastomeric proteins are ubiquitous in both animal and plant tissues, where they are responsible for the elastic response and mechanical resilience of tissues. In addition to fundamental interest in the molecular origins of their elastic behaviour, this class of proteins has great potential for use in biomaterial applications. The structural and elastomeric properties of these proteins are thought to be controlled by a subtle balance between hydrophobic interactions and entropic effects, and in many cases their characteristic properties can be recapitulated by multi-block protein polymers formed from repeats of short, characteristic polypeptide motifs. We have developed biomimetic multi-block protein polymers based on variants of several elastomeric gluten consensus sequences. These proteins include constituents designed to maximize their solubility in aqueous solution and minimize the formation of extended secondary structure. Thus, they are examples of elastic intrinsically disordered proteins. In addition, the proteins have distributed tyrosine residues which allow for inter-molecular crosslinking to form hydrogel networks. In this talk, we present experimental and simulation studies of the molecular and materials properties of these proteins and their assemblies.
BetaTPred: prediction of beta-TURNS in a protein using statistical algorithms.

PubMed

Kaur, Harpreet; Raghava, G P S

2002-03-01

Beta-turns play an important role from a structural and functional point of view. Beta-turns are the most common type of non-repetitive structure in proteins and comprise, on average, 25% of the residues. In the past, numerous methods have been developed to predict beta-turns in a protein. Most of these prediction methods are based on statistical approaches. In order to utilize the full potential of these methods, there is a need to develop a web server. This paper describes a web server called BetaTPred, developed for predicting beta-TURNS in a protein from its amino acid sequence. BetaTPred allows the user to predict turns in a protein using existing statistical algorithms. It also allows the prediction of different types of beta-TURNS, e.g., type I, I', II, II', VI, VIII and non-specific. This server assists the users in predicting the consensus beta-TURNS in a protein. The server is accessible from http://imtech.res.in/raghava/betatpred/
Pro-Inflammatory Flagellin Proteins of Prevalent Motile Commensal Bacteria Are Variably Abundant in the Intestinal Microbiome of Elderly Humans

PubMed Central

Neville, B. Anne; Sheridan, Paul O.; Harris, Hugh M. B.; Coughlan, Simone; Flint, Harry J.; Duncan, Sylvia H.; Jeffery, Ian B.; Claesson, Marcus J.; Ross, R. Paul; Scott, Karen P.; O'Toole, Paul W.

2013-01-01

Some Eubacterium and Roseburia species are among the most prevalent motile bacteria present in the intestinal microbiota of healthy adults. These flagellate species contribute “cell motility” category genes to the intestinal microbiome and flagellin proteins to the intestinal proteome. We reviewed and revised the annotation of motility genes in the genomes of six Eubacterium and Roseburia species that occur in the human intestinal microbiota and examined their respective locus organization by comparative genomics. Motility gene order was generally conserved across these loci. Five of these species harbored multiple genes for predicted flagellins. Flagellin proteins were isolated from R. inulinivorans strain A2-194 and from E. rectale strains A1-86 and M104/1. The amino-termini sequences of the R. inulinivorans and E. rectale A1-86 proteins were almost identical. These protein preparations stimulated secretion of interleukin-8 (IL-8) from human intestinal epithelial cell lines, suggesting that these flagellins were pro-inflammatory. Flagellins from the other four species were predicted to be pro-inflammatory on the basis of alignment to the consensus sequence of pro-inflammatory flagellins from the β- and γ- proteobacteria. Many fliC genes were deduced to be under the control of σ28. The relative abundance of the target Eubacterium and Roseburia species varied across shotgun metagenomes from 27 elderly individuals. Genes involved in the flagellum biogenesis pathways of these species were variably abundant in these metagenomes, suggesting that the current depth of coverage used for metagenomic sequencing (3.13–4.79 Gb total sequence in our study) insufficiently captures the functional diversity of genomes present at low (≤1%) relative abundance. E. rectale and R. inulinivorans thus appear to synthesize complex flagella composed of flagellin proteins that stimulate IL-8 production. 
A greater depth of sequencing, improved evenness of sequencing and improved metagenome assembly from short reads will be required to facilitate in silico analyses of complete complex biochemical pathways for low-abundance target species from shotgun metagenomes. PMID:23935906
Genomic Heat Shock Element Sequences Drive Cooperative Human Heat Shock Factor 1 DNA Binding and Selectivity

PubMed Central

Jaeger, Alex M.; Makley, Leah N.; Gestwicki, Jason E.; Thiele, Dennis J.

2014-01-01

The heat shock transcription factor 1 (HSF1) activates expression of a variety of genes involved in cell survival, including protein chaperones, the protein degradation machinery, anti-apoptotic proteins, and transcription factors. Although HSF1 activation has been linked to amelioration of neurodegenerative disease, cancer cells exhibit a dependence on HSF1 for survival. Indeed, HSF1 drives a program of gene expression in cancer cells that is distinct from that activated in response to proteotoxic stress, and HSF1 DNA binding activity is elevated in cycling cells as compared with arrested cells. Active HSF1 homotrimerizes and binds to a DNA sequence consisting of inverted repeats of the pentameric sequence nGAAn, known as heat shock elements (HSEs). Recent comprehensive ChIP-seq experiments demonstrated that the architecture of HSEs is very diverse in the human genome, with deviations from the consensus sequence in the spacing, orientation, and extent of HSE repeats that could influence HSF1 DNA binding efficacy and the kinetics and magnitude of target gene expression. To understand the mechanisms that dictate binding specificity, HSF1 was purified as either a monomer or trimer and used to evaluate DNA-binding site preferences in vitro using fluorescence polarization and thermal denaturation profiling. These results were compared with quantitative chromatin immunoprecipitation assays in vivo. We demonstrate a role for specific orientations of extended HSE sequences in driving preferential HSF1 DNA binding to target loci in vivo. These studies provide a biochemical basis for understanding differential HSF1 target gene recognition and transcription in neurodegenerative disease and in cancer. PMID:25204655
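The description of heat shock elements as inverted repeats of the pentamer nGAAn translates directly into a pattern search. The sketch below is illustrative only: it looks for a perfect three-unit element (nGAAnnTTCnnGAAn or the opposite orientation), which is an assumption made for simplicity, since the abstract stresses that genomic HSEs deviate from the consensus in spacing, orientation, and number of repeats. The function name `find_hse` is hypothetical.

```python
import re

N = "[ACGT]"
HSE_PATTERNS = [
    f"{N}GAA{N}{N}TTC{N}{N}GAA{N}",  # nGAAn nTTCn nGAAn
    f"{N}TTC{N}{N}GAA{N}{N}TTC{N}",  # nTTCn nGAAn nTTCn (opposite orientation)
]

def find_hse(seq):
    """Return sorted (start, match) tuples for perfect three-unit HSEs.

    A lookahead is used so that overlapping elements are all reported.
    """
    hits = []
    for pattern in HSE_PATTERNS:
        for m in re.finditer(f"(?=({pattern}))", seq.upper()):
            hits.append((m.start(), m.group(1)))
    return sorted(hits)

# Made-up test sequence containing one perfect element.
print(find_hse("TTAGAACGTTCTAGAACTT"))
```

Relaxing the pattern (for example, allowing one mismatched unit or gapped repeats) would be needed to capture the diverse architectures observed by ChIP-seq.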
Recognition of Local DNA Structures by p53 Protein

PubMed Central

Brázda, Václav; Coufal, Jan

2017-01-01

p53 plays critical roles in regulating cell cycle, apoptosis, senescence and metabolism and is commonly mutated in human cancer. These roles are achieved by interaction with other proteins, but particularly by interaction with DNA. As a transcription factor, p53 is well known to bind consensus target sequences in linear B-DNA. Recent findings indicate that p53 binds with higher affinity to target sequences that form cruciform DNA structure. Moreover, p53 binds very tightly to non-B DNA structures and local DNA structures are increasingly recognized to influence the activity of wild-type and mutant p53. Apart from cruciform structures, p53 binds to quadruplex DNA, triplex DNA, DNA loops, bulged DNA and hemicatenane DNA. In this review, we describe local DNA structures and summarize information about interactions of p53 with these structural DNA motifs. These recent data provide important insights into the complexity of the p53 pathway and the functional consequences of wild-type and mutant p53 activation in normal and tumor cells. PMID:28208646
Cloning and expression of a nuclear encoded plastid specific 33 kDa ribonucleoprotein gene (33RNP) from pea that is light stimulated.

PubMed

Reddy, M K; Nair, S; Singh, B N; Mudgil, Y; Tewari, K K; Sopory, S K

2001-01-24

We report the cloning and sequencing of both cDNA and genomic DNA of a 33 kDa chloroplast ribonucleoprotein (33RNP) from pea. The analysis of the predicted amino acid sequence of the cDNA clone revealed that the encoded protein contains two RNA binding domains, including the conserved consensus ribonucleoprotein sequences CS-RNP1 and CS-RNP2, on the C-terminus half and the presence of a putative transit peptide sequence in the N-terminus region. The phylogenetic and multiple sequence alignment analysis of pea chloroplast RNP along with RNPs reported from the other plant sources revealed that the pea 33RNP is very closely related to Nicotiana sylvestris 31RNP and 28RNP and also to 31RNP and 28RNP of Arabidopsis and spinach, respectively. The pea 33RNP was expressed in Escherichia coli and purified to homogeneity. The in vitro import of precursor protein into chloroplasts confirmed that the N-terminus putative transit peptide is a bona fide transit peptide and 33RNP is localized in the chloroplast. The nucleic acid-binding properties of the recombinant protein, as revealed by South-Western analysis, showed that 33RNP has higher binding affinity for poly (U) and oligo dT than for ssDNA and dsDNA. The steady state transcript level was higher in leaves than in roots and the expression of this gene is light stimulated. Sequence analysis of the genomic clone revealed that the gene contains four exons and three introns. We have also isolated and analyzed the 5' flanking region of the pea 33RNP gene.
First full-length genome sequence of the polerovirus luffa aphid-borne yellows virus (LABYV) reveals the presence of at least two consensus sequences in an isolate from Thailand.

PubMed

Knierim, Dennis; Maiss, Edgar; Kenyon, Lawrence; Winter, Stephan; Menzel, Wulf

2015-10-01

Luffa aphid-borne yellows virus (LABYV) was proposed as the name for a previously undescribed polerovirus based on partial genome sequences obtained from samples of cucurbit plants collected in Thailand between 2008 and 2013. In this study, we determined the first full-length genome sequence of LABYV. Based on phylogenetic analysis and genome properties, it is clear that this virus represents a distinct species in the genus Polerovirus. Analysis of sequences from sample TH24, which was collected in 2010 from a luffa plant in Thailand, reveals the presence of two different full-length genome consensus sequences.
Consensus Prediction of Charged Single Alpha-Helices with CSAHserver.

PubMed

Dudola, Dániel; Tóth, Gábor; Nyitray, László; Gáspári, Zoltán

2017-01-01

Charged single alpha-helices (CSAHs) constitute a rare structural motif. CSAH is characterized by a high density of regularly alternating residues with positively and negatively charged side chains. Such segments exhibit unique structural properties; however, there are only a handful of proteins where its existence is experimentally verified. Therefore, establishing a pipeline that is capable of predicting the presence of CSAH segments with a low false positive rate is of considerable importance. Here we describe a consensus-based approach that relies on two conceptually different CSAH detection methods and a final filter based on the estimated helix-forming capabilities of the segments. This pipeline was shown to be capable of identifying previously uncharacterized CSAH segments that could be verified experimentally. The method is available as a web server at http://csahserver.itk.ppke.hu and also a downloadable standalone program suitable to scan larger sequence collections.
Unusually weak oxygen binding, physical properties, partial sequence, autoxidation rate and a potential phosphorylation site of beluga whale (Delphinapterus leucas) myoglobin.

PubMed

Stewart, J M; Blakely, J A; Karpowicz, P A; Kalanxhi, E; Thatcher, B J; Martin, B M

2004-03-01

We purified myoglobin from beluga whale (Delphinapterus leucas) muscle (longissimus dorsi) with size exclusion and cation exchange chromatographies. The molecular mass was determined by mass spectrometry (17,081 Da) and the isoelectric pH (9.4) by capillary isoelectric focusing. The near-complete amino acid sequence was determined and a phylogeny indicated that beluga was in the same clade as Dall's and harbor porpoises. There were consensus motifs for a phosphorylation site on the protein surface with the most likely site at serine-117. This motif was common to all cetacean myoglobins examined. Two oxygen-binding studies at 37 degrees C indicated dissociation constants (20.5 and 23.6 microM) 5.7-6.6 times larger than horse myoglobin (3.6 microM). The autoxidation rate of beluga myoglobin at 37 degrees C, pH 7.2 was 0.218+/-0.028 h(-1), 1/3 larger than reported for myoglobin of terrestrial mammals. There was no clear sequence change to explain the difference in oxygen binding or autoxidation although substitutions (N66 and T67) in an invariant rich sequence (HGNTV) distal to the heme may play a role. Structural models based on the protein sequence and constructed on topologies of known templates (horse and sperm whale crystal structures) were not adequate to assess perturbation of the heme pocket.
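The fold differences quoted above follow from the reported dissociation constants, as the short calculation below shows. The half-life line is an added illustration that assumes autoxidation follows simple first-order kinetics, which the abstract does not state; the variable names are arbitrary.

```python
import math

# Dissociation constants from the abstract, in microM.
kd_beluga_uM = (20.5, 23.6)
kd_horse_uM = 3.6

# Fold difference relative to horse myoglobin.
ratios = [round(kd / kd_horse_uM, 1) for kd in kd_beluga_uM]
print(ratios)

# Autoxidation rate constant from the abstract, in h^-1.
k_autox_per_h = 0.218

# Assumption: first-order decay, so half-life = ln(2) / k.
half_life_h = math.log(2) / k_autox_per_h
print(round(half_life_h, 2))
```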
Unusual Intron Conservation near Tissue-Regulated Exons Found by Splicing Microarrays

PubMed Central

Sugnet, Charles W; Srinivasan, Karpagam; Clark, Tyson A; O'Brien, Georgeann; Cline, Melissa S; Wang, Hui; Williams, Alan; Kulp, David; Blume, John E; Haussler, David; Ares, Manuel

2006-01-01

Alternative splicing contributes to both gene regulation and protein diversity. To discover broad relationships between regulation of alternative splicing and sequence conservation, we applied a systems approach, using oligonucleotide microarrays designed to capture splicing information across the mouse genome. In a set of 22 adult tissues, we observe differential expression of RNA containing at least two alternative splice junctions for about 40% of the 6,216 alternative events we could detect. Statistical comparisons identify 171 cassette exons whose inclusion or skipping is different in brain relative to other tissues and another 28 exons whose splicing is different in muscle. A subset of these exons is associated with unusual blocks of intron sequence whose conservation in vertebrates rivals that of protein-coding exons. By focusing on sets of exons with similar regulatory patterns, we have identified new sequence motifs implicated in brain and muscle splicing regulation. Of note is a motif that is strikingly similar to the branchpoint consensus but is located downstream of the 5′ splice site of exons included in muscle. Analysis of three paralogous membrane-associated guanylate kinase genes reveals that each contains a paralogous tissue-regulated exon with a similar tissue inclusion pattern. While the intron sequences flanking these exons remain highly conserved among mammalian orthologs, the paralogous flanking intron sequences have diverged considerably, suggesting unusually complex evolution of the regulation of alternative splicing in multigene families. PMID:16424921
"Multiple partial recognitions in dynamic equilibrium" in the binding sites of proteins form the molecular basis of promiscuous recognition of structurally diverse ligands.

PubMed

Kohda, Daisuke

2018-04-01

Promiscuous recognition of ligands by proteins is as important as strict recognition in numerous biological processes. In living cells, many short, linear amino acid motifs function as targeting signals in proteins to specify the final destination of the protein transport. In general, the target signal is defined by a consensus sequence containing wild-characters, and hence represented by diverse amino acid sequences. The classical lock-and-key or induced-fit/conformational selection mechanism may not cover all aspects of the promiscuous recognition. On the basis of our crystallographic and NMR studies on the mitochondrial Tom20 protein-presequence interaction, we proposed a new hypothetical mechanism based on "a rapid equilibrium of multiple states with partial recognitions". This dynamic, multiple recognition mode enables the Tom20 receptor to recognize diverse mitochondrial presequences with nearly equal affinities. The plant Tom20 is evolutionally unrelated to the animal Tom20 in our study, but is a functional homolog of the animal/fungal Tom20. NMR studies by another research group revealed that the presequence binding by the plant Tom20 was not fully explained by simple interaction modes, suggesting the presence of a similar dynamic, multiple recognition mode. Circumstantial evidence also suggested that similar dynamic mechanisms may be applicable to other promiscuous recognitions of signal peptides by the SRP54/Ffh and SecA proteins.
Phosphorylation and subcellular redistribution of high mobility group proteins 14 and 17, analyzed by mass spectrometry.

PubMed Central

Louie, D. F.; Gloor, K. K.; Galasinski, S. C.; Resing, K. A.; Ahn, N. G.

2000-01-01

High mobility group (HMG) proteins 14 and 17 are nonhistone nuclear proteins that have been implicated in control of transcription and chromatin structure. To examine the posttranslational modifications of HMG-14 and -17 in vivo, HMG proteins were prepared from nuclear vs. cytosolic fractions of human K562 cells treated with 12-O-tetradecanoylphorbol 13-acetate (TPA) or okadaic acid (OA) and examined by electrospray mass spectrometry. Analysis of full-length masses demonstrated mono-, di-, and triphosphorylation of HMG-14 and mono- and diphosphorylation of HMG-17 from OA treated cells, whereas HMG-14 and -17 from TPA treated cells were monophosphorylated. Peptide mass and sequence analysis showed major and minor phosphorylation sites, respectively, at Ser24 and Ser28 in HMG-17, and Ser20 and Ser24 in HMG-14. These sites were found in the consensus sequence RRSARLSAK, within the nucleosomal binding domain of each protein. A third phosphorylation site in HMG-14 was located at either Ser6 or Ser7. Interestingly, the proportion of HMG-14 and -17 found in cytosolic pools increased significantly after 1 h of treatment compared to control cells and showed preferential phosphorylation compared with proteins from nuclear fractions. These results suggest that phosphorylation of HMG-14 and -17 interferes with nuclear localization mechanisms in a manner favoring release from nuclei. PMID:10739259
Fine-tuning structural RNA alignments in the twilight zone.

PubMed

Bremges, Andreas; Schirmer, Stefanie; Giegerich, Robert

2010-04-30

A widely used method to find conserved secondary structure in RNA is to first construct a multiple sequence alignment, and then fold the alignment, optimizing a score based on thermodynamics and covariance. This method works best around 75% sequence similarity. However, in a "twilight zone" below 55% similarity, the sequence alignment tends to obscure the covariance signal used in the second phase. Therefore, while the overall shape of the consensus structure may still be found, the degree of conservation cannot be estimated reliably. Based on a combination of available methods, we present a method named planACstar for improving structure conservation in structural alignments in the twilight zone. After constructing a consensus structure by alignment folding, planACstar abandons the original sequence alignment, refolds the sequences individually but consistent with the consensus, aligns the structures, irrespective of sequence, by a pure structure alignment method, and derives an improved sequence alignment from the alignment of structures, to be re-submitted to alignment folding, and so on. This cycle may be iterated as long as structural conservation improves, but normally one step suffices. Employing the tools ClustalW, RNAalifold, and RNAforester, we find that for sequences with 30-55% sequence identity, structural conservation can be improved by 10% on average, with a large variation, measured in terms of RNAalifold's own criterion, the structure conservation index.
Distribution and Evolution of Yersinia Leucine-Rich Repeat Proteins

PubMed Central

Hu, Yueming; Huang, He; Hui, Xinjie; Cheng, Xi; White, Aaron P.

2016-01-01

Leucine-rich repeat (LRR) proteins are widely distributed in bacteria, playing important roles in various protein-protein interaction processes. In Yersinia, the well-characterized type III secreted effector YopM also belongs to the LRR protein family and is encoded by virulence plasmids. However, little has been known about other LRR members encoded by Yersinia genomes or their evolution. In this study, the Yersinia LRR proteins were comprehensively screened, categorized, and compared. The LRR proteins encoded by chromosomes (LRR1 proteins) appeared to be more similar to each other and different from those encoded by plasmids (LRR2 proteins) with regard to repeat-unit length, amino acid composition profile, and gene expression regulation circuits. LRR1 proteins were also different from LRR2 proteins in that the LRR1 proteins contained an E3 ligase domain (NEL domain) in the C-terminal region or an NEL domain-encoding nucleotide relic in flanking genomic sequences. The LRR1 protein-encoding genes (LRR1 genes) varied dramatically and were categorized into 4 subgroups (a to d), with the LRR1a to -c genes evolving from the same ancestor and LRR1d genes evolving from another ancestor. The consensus and ancestor repeat-unit sequences were inferred for different LRR1 protein subgroups by use of a maximum parsimony modeling strategy. Structural modeling disclosed very similar repeat-unit structures between LRR1 and LRR2 proteins despite the different unit lengths and amino acid compositions. Structural constraints may serve as the driving force to explain the observed mutations in the LRR regions. This study suggests that there may be functional variation and lays the foundation for future experiments investigating the functions of the chromosomally encoded LRR proteins of Yersinia. PMID:27217422
Conservation of CD44 exon v3 functional elements in mammals

PubMed Central

Vela, Elena; Hilari, Josep M; Delclaux, María; Fernández-Bellon, Hugo; Isamat, Marcos

2008-01-01

Background: The human CD44 gene contains 10 variable exons (v1 to v10) that can be alternatively spliced to generate hundreds of different CD44 protein isoforms. Human CD44 variable exon v3 inclusion in the final mRNA depends on a multisite bipartite splicing enhancer located within the exon itself, which we have recently described, and provides the protein domain responsible for growth factor binding to CD44. Findings: We have analyzed the sequence of CD44v3 in 95 mammalian species to report high conservation levels for both its splicing regulatory elements (the 3' splice site and the exonic splicing enhancer), and the functional glycosaminoglycan binding site coded by v3. We also report the functional expression of CD44v3 isoforms in peripheral blood cells of different mammalian taxa with both consensus and variant v3 sequences. Conclusion: CD44v3 mammalian sequences maintain all functional splicing regulatory elements as well as the GAG binding site with the same relative positions and sequence identity previously described during alternative splicing of human CD44. The sequence within the GAG attachment site, which in turn contains the Y motif of the exonic splicing enhancer, is more conserved than the rest of the exon. Amplification of the CD44v3 sequence from mammalian species, but not from birds, fish or reptiles, may support classifying CD44v3 as an exclusively mammalian gene trait. PMID:18710510

Amino acid sequence of the human fibronectin receptor

PubMed Central

1987-01-01

The amino acid sequence deduced from cDNA of the human placental fibronectin receptor is reported. The receptor is composed of two subunits: an alpha subunit of 1,008 amino acids which is processed into two polypeptides disulfide bonded to one another, and a beta subunit of 778 amino acids. Each subunit has near its COOH terminus a hydrophobic segment. This and other sequence features suggest a structure for the receptor in which the hydrophobic segments serve as transmembrane domains anchoring each subunit to the membrane and dividing each into a large ectodomain and a short cytoplasmic domain. The alpha subunit ectodomain has five sequence elements homologous to consensus Ca2+-binding sites of several calcium-binding proteins, and the beta subunit contains a fourfold repeat strikingly rich in cysteine. The alpha subunit sequence is 46% homologous to the alpha subunit of the vitronectin receptor. The beta subunit is 44% homologous to the human platelet adhesion receptor subunit IIIa and 47% homologous to a leukocyte adhesion receptor beta subunit. The high degree of homology (85%) of the beta subunit with one of the polypeptides of a chicken adhesion receptor complex referred to as integrin complex strongly suggests that the latter polypeptide is the chicken homologue of the fibronectin receptor beta subunit. These receptor subunit homologies define a superfamily of adhesion receptors. The availability of the entire protein sequence for the fibronectin receptor will facilitate studies on the functions of these receptors. PMID:2958481
In silico analysis of β-1,3-glucanase from a psychrophilic yeast, Glaciozyma antarctica PI12

NASA Astrophysics Data System (ADS)

Mohammadi, Salimeh; Bakar, Farah Diba Abu; Rabu, Amir; Murad, Abdul Munir Abdul

2014-09-01

1,3-beta-glucanase is an industrially important enzyme with a wide range of applications, especially in the food industry. It is crucial to understand the structural and functional aspects of the various beta-1,3-glucanases produced from diverse sources. In this study, a cDNA encoding β-1,3-glucanase (GaExg55) was isolated from a psychrophilic yeast, Glaciozyma antarctica PI12. The cDNA sequence has been submitted to GenBank under accession number KJ436377. Subsequently, the predicted protein was analyzed using various bioinformatics tools to explore its properties. The GaExg55 cDNA consists of 1,440 bp encoding 480 amino acid residues. Alignment of the deduced amino acid sequence of GaExg55 with other exo-β-1,3-glucanases available in the NCBI database indicates that they share a consensus motif, NEP, which is a signature pattern of GH5 hydrolases. The predicted molecular weight of GaExg55 is 53.66 kDa. The GaExg55 sequence possesses a signal peptide and is highly conserved with other fungal exo-beta-1,3-glucanases.
A Molecular Phylogeny of Hemiptera Inferred from Mitochondrial Genome Sequences

PubMed Central

Song, Nan; Liang, Ai-Ping; Bu, Cui-Ping

2012-01-01

Classically, Hemiptera is comprised of two suborders: Homoptera and Heteroptera. Homoptera includes Cicadomorpha, Fulgoromorpha and Sternorrhyncha. However, according to previous molecular phylogenetic studies based on 18S rDNA, Fulgoromorpha has a closer relationship to Heteroptera than to other hemipterans, leaving Homoptera as paraphyletic. Therefore, the position of Fulgoromorpha is important for studying phylogenetic structure of Hemiptera. We inferred the evolutionary affiliations of twenty-five superfamilies of Hemiptera using mitochondrial protein-coding genes and rRNAs. We sequenced three mitogenomes, from Pyrops candelaria, Lycorma delicatula and Ricania marginalis, representing two additional families in Fulgoromorpha. Pyrops and Lycorma are representatives of an additional major family Fulgoridae in Fulgoromorpha, whereas Ricania is a second representative of the highly derived clade Ricaniidae. The organization and size of these mitogenomes are similar to those of the sequenced fulgoroid species. Our consensus phylogeny of Hemiptera largely supported the relationships (((Fulgoromorpha,Sternorrhyncha),Cicadomorpha),Heteroptera), and thus supported the classic phylogeny of Hemiptera. Selection of optimal evolutionary models (exclusion and inclusion of two rRNA genes or of third codon positions of protein-coding genes) demonstrated that rapidly evolving and saturated sites should be removed from the analyses. PMID:23144967
Direct inhibition of the DNA-binding activity of POU transcription factors Pit-1 and Brn-3 by selective binding of a phenyl-furan-benzimidazole dication.

PubMed

Peixoto, Paul; Liu, Yang; Depauw, Sabine; Hildebrand, Marie-Paule; Boykin, David W; Bailly, Christian; Wilson, W David; David-Cordonnier, Marie-Hélène

2008-06-01

The development of small molecules to control gene expression could be the spearhead of future targeted therapeutic approaches in multiple pathologies. Among heterocyclic dications developed with this aim, the phenyl-furan-benzimidazole dication DB293 binds AT-rich sites as a monomer and the 5'-ATGA sequence as a stacked dimer, both in the minor groove. Here, we used a protein/DNA array approach to evaluate the ability of DB293 to specifically inhibit transcription factor DNA binding in a single-step, competitive mode. DB293 inhibits two POU-domain transcription factors, Pit-1 and Brn-3, but not IRF-1, despite the presence of an ATGA and AT-rich sites within all three consensus sequences. EMSA, DNase I footprinting and surface-plasmon-resonance experiments determined the precise binding site, affinity and stoichiometry of the DB293 interaction with the consensus targets. Binding of DB293 occurred as a cooperative dimer on the ATGA part of the Brn-3 site but as two monomers on AT-rich sites of the IRF-1 sequence. For the Pit-1 site, ATGA or AT-rich mutated sequences identified the contribution of both sites to DB293 recognition. In conclusion, DB293 is a strong inhibitor of two POU-domain transcription factors through cooperative binding to ATGA. These findings are the first to show that heterocyclic dications can inhibit major groove transcription factors, and they open the door to the control of transcription factor activity by those compounds.
Modeling leaderless transcription and atypical genes results in more accurate gene prediction in prokaryotes.

PubMed

Lomsadze, Alexandre; Gemayel, Karl; Tang, Shiyuyun; Borodovsky, Mark

2018-05-17

In a conventional view of the prokaryotic genome organization, promoters precede operons and ribosome binding sites (RBSs) with Shine-Dalgarno consensus precede genes. However, recent experimental research suggesting a more diverse view motivated us to develop an algorithm with improved gene-finding accuracy. We describe GeneMarkS-2, an ab initio algorithm that uses a model derived by self-training for finding species-specific (native) genes, along with an array of precomputed "heuristic" models designed to identify harder-to-detect genes (likely horizontally transferred). Importantly, we designed GeneMarkS-2 to identify several types of distinct sequence patterns (signals) involved in gene expression control, among them the patterns characteristic for leaderless transcription as well as noncanonical RBS patterns. To assess the accuracy of GeneMarkS-2, we used genes validated by COG (Clusters of Orthologous Groups) annotation, proteomics experiments, and N-terminal protein sequencing. We observed that GeneMarkS-2 performed better on average in all accuracy measures when compared with the current state-of-the-art gene prediction tools. Furthermore, the screening of ∼5000 representative prokaryotic genomes made by GeneMarkS-2 predicted frequent leaderless transcription in both archaea and bacteria. We also observed that the RBS sites in some species with leadered transcription did not necessarily exhibit the Shine-Dalgarno consensus. The modeling of different types of sequence motifs regulating gene expression prompted a division of prokaryotic genomes into five categories with distinct sequence patterns around the gene starts. © 2018 Lomsadze et al.; Published by Cold Spring Harbor Laboratory Press.
Trans splicing in Leishmania enriettii and identification of ribonucleoprotein complexes containing the spliced leader and U2 equivalent RNAs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Miller, S.I.; Wirth, D.F.

1988-06-01

The 5' ends of Leishmania mRNAs contain an identical 35-nucleotide sequence termed the spliced leader (SL) or 5' mini-exon. The SL sequence is at the 5' end of an 85-nucleotide primary transcript that contains a consensus eucaryotic 5' intron-exon splice junction immediately 3' to the SL. The SL is added to protein-coding genes immediately 3' to a consensus eucaryotic 3' intron-exon splice junction. The authors' previous work demonstrated possible intermediates in discontinuous mRNA processing that contain the 50 nucleotides of the SL primary transcript 3' to the SL, the SL intron sequence (SLIS). These RNAs have a 5' terminus at the splice junction of the SL and the SLIS. The authors examined a Leishmania nuclear extract for these RNAs in ribonucleoprotein (RNP) particles. Density centrifugation analysis showed that the SL RNA is predominantly in RNP complexes at 60S, while the SLIS-containing RNAs are in complexes at 40S. They also demonstrated that the SLIS can be released from polyadenylated RNA by incubation with a HeLa cell extract containing debranching enzymatic activity. These data suggested that Leishmania enriettii mRNAs are assembled by bimolecular or trans splicing as has been recently demonstrated for Trypanosoma brucei. Furthermore, they determined the partial sequence of the Leishmania U2 equivalent RNA and demonstrated that it cosediments with the SL RNA at 60S in a nuclear extract. These RNP particles may be analogous to so-called spliceosomes that have been demonstrated in other systems.
Intercalation of XR5944 with the estrogen response element is modulated by the tri-nucleotide spacer sequence between half-sites

PubMed Central

Sidell, Neil; Mathad, Raveendra I.; Shu, Feng-jue; Zhang, Zhenjiang; Kallen, Caleb B.; Yang, Danzhou

2011-01-01

DNA-intercalating molecules can impair DNA replication, DNA repair, and gene transcription. We previously demonstrated that XR5944, a DNA bis-intercalator, specifically blocks binding of estrogen receptor-α (ERα) to the consensus estrogen response element (ERE). The consensus ERE sequence is AGGTCAnnnTGACCT, where nnn is known as the tri-nucleotide spacer. Recent work has shown that the tri-nucleotide spacer can modulate ERα-ERE binding affinity and ligand-mediated transcriptional responses. To further understand the mechanism by which XR5944 inhibits ERα-ERE binding, we tested its ability to interact with consensus EREs with variable tri-nucleotide spacer sequences and with natural but non-consensus ERE sequences using one dimensional nuclear magnetic resonance (1D 1H NMR) titration studies. We found that the tri-nucleotide spacer sequence significantly modulates the binding of XR5944 to EREs. Of the sequences that were tested, EREs with CGG and AGG spacers showed the best binding specificity with XR5944, while those spaced with TTT demonstrated the least specific binding. The binding stoichiometry of XR5944 with EREs was 2:1, which can explain why the spacer influences the drug-DNA interaction; each XR5944 spans four nucleotides (including portions of the spacer) when intercalating with DNA. To validate our NMR results, we conducted functional studies using reporter constructs containing consensus EREs with tri-nucleotide spacers CGG, CTG, and TTT. Results of reporter assays in MCF-7 cells indicated that XR5944 was significantly more potent in inhibiting the activity of CGG- than TTT-spaced EREs, consistent with our NMR results. Taken together, these findings predict that the anti-estrogenic effects of XR5944 will depend not only on ERE half-site composition but also on the tri-nucleotide spacer sequence of EREs located in the promoters of estrogen-responsive genes. PMID:21333738
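The consensus ERE quoted above, AGGTCAnnnTGACCT, maps naturally onto a regular expression that captures the tri-nucleotide spacer. The following is a minimal sketch for locating such sites in a DNA string; the function name and the test sequence are hypothetical and are not taken from the study.

```python
import re

# Consensus ERE from the abstract: two half-sites around a 3-nt spacer.
ERE_PATTERN = re.compile(r"AGGTCA([ACGT]{3})TGACCT")


def find_consensus_eres(dna):
    """Return (start, spacer) for each consensus ERE in a DNA string."""
    return [(m.start(), m.group(1)) for m in ERE_PATTERN.finditer(dna.upper())]


# Hypothetical sequence carrying one CGG-spaced and one TTT-spaced ERE.
example = "TT" + "AGGTCA" + "CGG" + "TGACCT" + "A" + "AGGTCA" + "TTT" + "TGACCT" + "GG"
hits = find_consensus_eres(example)
print(hits)
```

Because the two half-sites are reverse complements of each other, scanning one strand is enough to find consensus sites on both; natural non-consensus EREs would need a looser pattern than the one assumed here.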
Binding of the cSH3 Domain of Grb2 Adaptor to Two Distinct RXXK Motifs within Gab1 Docker Employs Differential Mechanisms

PubMed Central

McDonald, Caleb B.; Seldeen, Kenneth L.; Deegan, Brian J.; Bhat, Vikas; Farooq, Amjad

2010-01-01

A ubiquitous component of cellular signaling machinery, Gab1 docker plays a pivotal role in routing extracellular information in the form of growth factors and cytokines to downstream targets such as transcription factors within the nucleus. Here, using isothermal titration calorimetry (ITC) in combination with macromolecular modeling (MM), we show that although Gab1 contains four distinct RXXK motifs, designated G1, G2, G3 and G4, only G1 and G2 motifs bind to the cSH3 domain of Grb2 adaptor and do so with distinct mechanisms. Thus, while the G1 motif strictly requires the PPRPPKP consensus sequence for high-affinity binding to the cSH3 domain, the G2 motif displays preference for the PXVXRXLKPXR consensus. Such sequential differences in the binding of G1 and G2 motifs arise from their ability to adopt distinct polyproline type II (PPII)- and 3₁₀-helical conformations upon binding to the cSH3 domain, respectively. Collectively, our study provides detailed biophysical insights into a key protein-protein interaction involved in a diverse array of signaling cascades central to health and disease. PMID:21472810
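An RXXK motif is a four-residue pattern with arginine and lysine fixed and two arbitrary residues between them, so candidate sites can be listed with a short scan. The sketch below uses a lookahead so that overlapping occurrences are all reported; the peptide is a made-up example, not the Gab1 sequence, and a sequence match alone says nothing about whether a site actually binds the cSH3 domain.

```python
import re

# Zero-width lookahead keeps overlapping R-x-x-K matches.
RXXK = re.compile(r"(?=(R..K))")


def find_rxxk(protein):
    """Return (start, motif) for every RXXK occurrence in a protein string."""
    return [(m.start(), m.group(1)) for m in RXXK.finditer(protein.upper())]


# Hypothetical peptide for illustration only.
peptide = "APPRPPKPGSRAAKRLLK"
sites = find_rxxk(peptide)
print(sites)
```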
Molecular typing of Vibrio parahaemolyticus isolated from seafood harvested along the south-west coast of India.

PubMed

Bhowmick, P P; Khushiramani, R; Raghunath, P; Karunasagar, I; Karunasagar, I

2008-02-01

Evaluation of protein profiling for typing Vibrio parahaemolyticus using 71 strains isolated from different seafood and comparison with other molecular typing techniques such as random amplified polymorphic DNA analysis (RAPD) and enterobacterial repetitive intergenic consensus sequence (ERIC)-PCR. Three molecular typing methods were used for the typing of 71 V. parahaemolyticus isolates from seafood. RAPD had a discriminatory index (DI) of 0.95, while ERIC-PCR showed a DI of 0.94. Though protein profiling had less discriminatory power, use of this method can be helpful in identifying new proteins which might have a role in establishment in the host or virulence of the organism. The use of protein profiling in combination with other established typing methods such as RAPD and ERIC-PCR generates useful information in the case of V. parahaemolyticus associated with seafood. The study demonstrates the usefulness of nucleic acid and protein-based studies in understanding the relationship between various isolates from seafood.
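The abstract reports discriminatory indices without stating the formula. A common choice for typing methods is the Hunter-Gaston index (Simpson's index of diversity), D = 1 - [1/(N(N-1))] Σ n_j(n_j-1), where N is the number of isolates and n_j the number assigned to type j. The sketch below assumes that definition; the type labels are invented for illustration.

```python
from collections import Counter


def discriminatory_index(type_labels):
    """Hunter-Gaston index (assumed here): chance that two isolates drawn
    without replacement are assigned to different types."""
    n = len(type_labels)
    if n < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(type_labels).values()
    return 1.0 - sum(c * (c - 1) for c in counts) / (n * (n - 1))


# Hypothetical typing result: four isolates, two of which share a type.
example_di = discriminatory_index(["A", "A", "B", "C"])
print(round(example_di, 3))
```

A value near 1 means almost every isolate receives its own type, which is the sense in which RAPD (0.95) and ERIC-PCR (0.94) are described as highly discriminatory.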
Distribution of a Nocardia brasiliensis catalase gene fragment in members of the genera Nocardia, Gordona, and Rhodococcus.

PubMed

Vera-Cabrera, L; Johnson, W M; Welsh, O; Resendiz-Uresti, F L; Salinas-Carmona, M C

1999-06-01

An immunodominant protein from Nocardia brasiliensis, P61, was subjected to amino-terminal and internal sequence analysis. Three sequences of 22, 17, and 38 residues, respectively, were obtained and compared with the protein database from GenBank by using the BLAST system. The sequences showed homology to some eukaryotic catalases and to a bromoperoxidase-catalase from Streptomyces violaceus. Its identity as a catalase was confirmed by analysis of its enzymatic activity on H2O2 and by a double-staining method on a nondenaturing polyacrylamide gel with 3,3'-diaminobenzidine and ferricyanide; the result showed only catalase activity, but no peroxidase. By using one of the internal amino acid sequences and a consensus catalase motif (VGNNTP), we were able to design a PCR assay that generated a 500-bp PCR product. The amplicon was analyzed, and the nucleotide sequence was compared to the GenBank database with the observation of high homology to other bacterial and eukaryotic catalases. A PCR assay based on this target sequence was performed with primers NB10 and NB11 to confirm the presence of the NB10-NB11 gene fragment in several N. brasiliensis strains isolated from mycetoma. The same assay was used to determine whether there were homologous sequences in several type strains from the genera Nocardia, Rhodococcus, Gordona, and Streptomyces. All of the N. brasiliensis strains presented a positive result but only some of the actinomycetes species tested were positive in the PCR assay. In order to confirm these findings, genomic DNA was subjected to Southern blot analysis. A 1.7-kbp band was observed in the N. brasiliensis strains, and bands of different molecular weight were observed in cross-reacting actinomycetes. Sequence analysis of the amplicons of selected actinomycetes showed high homology in this catalase fragment, thus demonstrating that this protein is highly conserved in this group of bacteria.
Sequence diversity of wheat mosaic virus isolates.

PubMed

Stewart, Lucy R

2016-02-02

Wheat mosaic virus (WMoV), transmitted by eriophyid wheat curl mites (Aceria tosichella) is the causal agent of High Plains disease in wheat and maize. WMoV and other members of the genus Emaravirus evaded thorough molecular characterization for many years due to the experimental challenges of mite transmission and manipulating multisegmented negative sense RNA genomes. Recently, the complete genome sequence of a Nebraska isolate of WMoV revealed eight segments, plus a variant sequence of the nucleocapsid protein-encoding segment. Here, near-complete and partial consensus sequences of five more WMoV isolates are reported and compared to the Nebraska isolate: an Ohio maize isolate (GG1), a Kansas barley isolate (KS7), and three Ohio wheat isolates (H1, K1, W1). Results show two distinct groups of WMoV isolates: Ohio wheat isolate RNA segments had 84% or lower nucleotide sequence identity to the NE isolate, whereas GG1 and KS7 had 98% or higher nucleotide sequence identity to the NE isolate. Knowledge of the sequence variability of WMoV isolates is a step toward understanding virus biology, and potentially explaining observed biological variation. Published by Elsevier B.V.
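Percent nucleotide identity figures such as those above come from comparing aligned sequences position by position. The sketch below shows one simple convention, assumed here for illustration, in which columns with a gap in either sequence are ignored; other conventions exist, and the abstract does not say which was used. The sequences are invented.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identical positions over ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != "-" and y != "-"
    ]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


# Hypothetical aligned fragments differing at one of ten positions.
identity = percent_identity("ACGUACGUAC", "ACGUACGAAC")
print(identity)
```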
Identification of Atg3 as an intrinsically disordered polypeptide yields insights into the molecular dynamics of autophagy-related proteins in yeast.

PubMed

Popelka, Hana; Uversky, Vladimir N; Klionsky, Daniel J

2014-06-01

The mechanism of autophagy relies on complex cell signaling and regulatory processes. Each cell contains many proteins that lack a rigid 3-dimensional structure under physiological conditions. These dynamic proteins, called intrinsically disordered proteins (IDPs) and protein regions (IDPRs), are predominantly involved in cell signaling and regulation. Yet, very little is known about their presence among proteins of the core autophagy machinery. In this work, we characterized the autophagy protein Atg3 from yeast and human along with 2 variants to show that Atg3 is an IDPRs-containing protein and that disorder/order predicted for these proteins from their amino acid sequence corresponds to their experimental characteristics. Based on this consensus, we applied the same prediction methods to all known Atg proteins from Saccharomyces cerevisiae. The data presented here provide an insight into the structural dynamics of each Atg protein. They also show that intrinsic disorder at various levels has to be taken into consideration for about half of the Atg proteins. This work should become a useful tool that will facilitate and encourage exploration of protein intrinsic disorder in autophagy.
Purification, molecular cloning, and expression of 2-hydroxyphytanoyl-CoA lyase, a peroxisomal thiamine pyrophosphate-dependent enzyme that catalyzes the carbon–carbon bond cleavage during α-oxidation of 3-methyl-branched fatty acids

PubMed Central

Foulon, Veerle; Antonenkov, Vasily D.; Croes, Kathleen; Waelkens, Etienne; Mannaerts, Guy P.; Van Veldhoven, Paul P.; Casteels, Minne

1999-01-01

In the third step of the α-oxidation of 3-methyl-branched fatty acids such as phytanic acid, a 2-hydroxy-3-methylacyl-CoA is cleaved into formyl-CoA and a 2-methyl-branched fatty aldehyde. The cleavage enzyme was purified from the matrix protein fraction of rat liver peroxisomes and identified as a protein made up of four identical subunits of 63 kDa. Its activity proved to depend on Mg2+ and thiamine pyrophosphate, a hitherto unrecognized cofactor of α-oxidation. Formyl-CoA and 2-methylpentadecanal were identified as reaction products when the purified enzyme was incubated with 2-hydroxy-3-methylhexadecanoyl-CoA as the substrate. Hence the enzyme catalyzes a carbon–carbon cleavage, and we propose calling it 2-hydroxyphytanoyl-CoA lyase. Sequences derived from tryptic peptides of the purified rat protein were used as queries to recover human expressed sequence tags from the databases. The composite cDNA sequence of the human lyase contained an ORF of 1,734 bases that encodes a polypeptide with a calculated molecular mass of 63,732 Da. Recombinant human protein, expressed in mammalian cells, exhibited lyase activity. The lyase displayed homology to a putative Caenorhabditis elegans protein that resembles bacterial oxalyl-CoA decarboxylases. Similarly to the decarboxylases, a thiamine pyrophosphate-binding consensus domain was present in the C-terminal part of the lyase. Although no peroxisome targeting signal, neither 1 nor 2, was apparent, transfection experiments with constructs encoding green fluorescent protein fused to the full-length lyase or its C-terminal pentapeptide indicated that the C terminus of the lyase represents a peroxisome targeting signal 1 variant. PMID:10468558
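The ORF length and calculated mass given above can be cross-checked with simple arithmetic. The sketch assumes that the 1,734-base ORF includes the stop codon, which the abstract does not state; the variable names are hypothetical.

```python
ORF_BASES = 1734       # ORF length from the abstract
CALC_MASS_DA = 63732   # calculated molecular mass from the abstract

codons = ORF_BASES // 3
# Assumption: the last codon is a stop codon and encodes no residue.
residues = codons - 1
mean_residue_mass = CALC_MASS_DA / residues

print(codons, residues, round(mean_residue_mass, 1))
```

A mean residue mass of roughly 110 Da is typical for proteins, so the two reported figures are consistent with each other under this assumption.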
Molecular phylogeny of 21 tropical bamboo species reconstructed by integrating non-coding internal transcribed spacer (ITS1 and 2) sequences and their consensus secondary structure.

PubMed

Ghosh, Jayadri Sekhar; Bhattacharya, Samik; Pal, Amita

2017-06-01

The unavailability of the reproductive structure and unpredictability of vegetative characters for the identification and phylogenetic study of bamboo prompted the application of molecular techniques for greater resolution and consensus. We first employed internal transcribed spacer (ITS1, 5.8S rRNA and ITS2) sequences to construct the phylogenetic tree of 21 tropical bamboo species. While the sequence alone could grossly reconstruct the traditional phylogeny amongst the 21 tropical species studied, some anomalies were encountered that prompted a further refinement of the phylogenetic analyses. Therefore, we integrated the secondary structure of the ITS sequences to derive an individual sequence-structure matrix to gain more resolution on the phylogenetic reconstruction. The results showed that ITS sequence-structure is a reliable alternative to the conventional phenotypic method for the identification of bamboo species. The best-fit topology obtained by the sequence-structure based phylogeny over the sole sequence based one underscores closer clustering of all the studied Bambusa species (Sub-tribe Bambusinae), while Melocanna baccifera, which belongs to Sub-Tribe Melocanneae, disjointedly clustered as an out-group within the consensus phylogenetic tree. In this study, we demonstrated the dependability of the combined (ITS sequence+structure-based) approach over the only sequence-based analysis for phylogenetic relationship assessment of bamboo.
Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring

PubMed Central

2012-01-01

Background Much progress has been made in understanding the 3D structure of proteins using methods such as NMR and X-ray crystallography. The resulting 3D structures are extremely informative, but do not always reveal which sites and residues within the structure are of special importance. Recently, there are indications that multiple-residue, sub-domain structural relationships within the larger 3D consensus structure of a protein can be inferred from the analysis of the multiple sequence alignment data of a protein family. These intra-dependent clusters of associated sites are used to indicate hierarchical inter-residue relationships within the 3D structure. To reveal the patterns of associations among individual amino acids or sub-domain components within the structure, we apply a k-modes attribute (aligned site) clustering algorithm to the ubiquitin and transthyretin families in order to discover associations among groups of sites within the multiple sequence alignment. We then observe what these associations imply within the 3D structure of these two protein families. Results The k-modes site clustering algorithm we developed maximizes the intra-group interdependencies based on a normalized mutual information measure. The clusters formed correspond to sub-structural components or binding and interface locations. Applying this data-directed method to the ubiquitin and transthyretin protein family multiple sequence alignments as a test bed, we located numerous interesting associations of interdependent sites. These clusters were then arranged into cluster tree diagrams which revealed four structural sub-domains within the single domain structure of ubiquitin and a single large sub-domain within transthyretin associated with the interface among transthyretin monomers. In addition, several clusters of mutually interdependent sites were discovered for each protein family, each of which appear to play an important role in the molecular structure and/or function. 
Conclusions Our results demonstrate that the method we present here using a k-modes site clustering algorithm based on interdependency evaluation among sites obtained from a sequence alignment of homologous proteins can provide significant insights into the complex, hierarchical inter-residue structural relationships within the 3D structure of a protein family. PMID:22793672
Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring.

PubMed

Durston, Kirk K; Chiu, David Ky; Wong, Andrew Kc; Li, Gary Cl

2012-07-13

Much progress has been made in understanding the 3D structure of proteins using methods such as NMR and X-ray crystallography. The resulting 3D structures are extremely informative, but do not always reveal which sites and residues within the structure are of special importance. Recently, there are indications that multiple-residue, sub-domain structural relationships within the larger 3D consensus structure of a protein can be inferred from the analysis of the multiple sequence alignment data of a protein family. These intra-dependent clusters of associated sites are used to indicate hierarchical inter-residue relationships within the 3D structure. To reveal the patterns of associations among individual amino acids or sub-domain components within the structure, we apply a k-modes attribute (aligned site) clustering algorithm to the ubiquitin and transthyretin families in order to discover associations among groups of sites within the multiple sequence alignment. We then observe what these associations imply within the 3D structure of these two protein families. The k-modes site clustering algorithm we developed maximizes the intra-group interdependencies based on a normalized mutual information measure. The clusters formed correspond to sub-structural components or binding and interface locations. Applying this data-directed method to the ubiquitin and transthyretin protein family multiple sequence alignments as a test bed, we located numerous interesting associations of interdependent sites. These clusters were then arranged into cluster tree diagrams which revealed four structural sub-domains within the single domain structure of ubiquitin and a single large sub-domain within transthyretin associated with the interface among transthyretin monomers. In addition, several clusters of mutually interdependent sites were discovered for each protein family, each of which appear to play an important role in the molecular structure and/or function. 
Our results demonstrate that the method we present here using a k-modes site clustering algorithm based on interdependency evaluation among sites obtained from a sequence alignment of homologous proteins can provide significant insights into the complex, hierarchical inter-residue structural relationships within the 3D structure of a protein family.
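The interdependency measure mentioned above can be illustrated on a toy alignment. This is a minimal sketch, not the authors' implementation: the abstract does not say how the mutual information is normalized, so division by the joint entropy is an assumption here, and the four-sequence alignment is invented for illustration.

```python
from collections import Counter
from math import log2

def entropy(counts, n):
    """Shannon entropy (bits) of a Counter of observations totalling n."""
    return -sum((c / n) * log2(c / n) for c in counts.values())

def normalized_mutual_information(col_x, col_y):
    """Mutual information of two alignment columns divided by their joint entropy.

    Assumption: normalizing by the joint entropy H(X, Y); other normalizations exist.
    """
    n = len(col_x)
    hx = entropy(Counter(col_x), n)
    hy = entropy(Counter(col_y), n)
    hxy = entropy(Counter(zip(col_x, col_y)), n)
    if hxy == 0:
        return 0.0  # both columns are invariant, so there is nothing to share
    return (hx + hy - hxy) / hxy

# Hypothetical toy alignment (rows are sequences, columns are aligned sites).
alignment = ["ATT", "ATC", "GCT", "GCC"]
columns = list(zip(*alignment))

print(normalized_mutual_information(columns[0], columns[1]))  # sites vary together
print(normalized_mutual_information(columns[0], columns[2]))  # sites vary independently
```

A site-clustering procedure of the kind described would group columns with high pairwise scores; the clustering step itself is not shown.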
Sequence polymorphism in an insect RNA virus field population: A snapshot from a single point in space and time reveals stochastic differences among and within individual hosts

DOE Office of Scientific and Technical Information (OSTI.GOV)

Stenger, Drake C.

Population structure of Homalodisca coagulata Virus-1 (HoCV-1) among and within field-collected insects sampled from a single point in space and time was examined. Polymorphism in complete consensus sequences among single-insect isolates was dominated by synonymous substitutions. The mutant spectrum of the C2 helicase region within each single-insect isolate was unique and dominated by nonsynonymous singletons. Bootstrapping was used to correct the within-isolate nonsynonymous:synonymous arithmetic ratio (N:S) for RT-PCR error, yielding an N:S value ~one log-unit greater than that of consensus sequences. Probability of all possible single-base substitutions for the C2 region predicted N:S values within 95% confidence limits of the corrected within-isolate N:S when the only constraint imposed was viral polymerase error bias for transitions over transversions. These results indicate that bottlenecks coupled with strong negative/purifying selection drive consensus sequences toward neutral sequence space, and that most polymorphism within single-insect isolates is composed of newly-minted mutations sampled prior to selection. Highlights: • Sampling protocol minimized differential selection/history among isolates. • Polymorphism among consensus sequences dominated by negative/purifying selection. • Within-isolate N:S ratio corrected for RT-PCR error by bootstrapping. • Within-isolate mutant spectrum dominated by new mutations yet to undergo selection.
ATtRACT-a database of RNA-binding proteins and associated motifs.

PubMed

Giudice, Girolamo; Sánchez-Cabo, Fátima; Torroja, Carlos; Lara-Pezzi, Enrique

2016-01-01

RNA-binding proteins (RBPs) play a crucial role in key cellular processes, including RNA transport, splicing, polyadenylation and stability. Understanding the interaction between RBPs and RNA is key to improve our knowledge of RNA processing, localization and regulation in a global manner. Despite advances in recent years, a unified non-redundant resource that includes information on experimentally validated motifs, RBPs and integrated tools to exploit this information is lacking. Here, we developed a database named ATtRACT (available at http://attract.cnic.es) that compiles information on 370 RBPs and 1583 RBP consensus binding motifs, 192 of which are not present in any other database. To populate ATtRACT we (i) extracted and hand-curated experimentally validated data from CISBP-RNA, SpliceAid-F, RBPDB databases, (ii) integrated and updated the unavailable ASD database and (iii) extracted information from Protein-RNA complexes present in Protein Data Bank database through computational analyses. ATtRACT also provides efficient algorithms to search a specific motif and scan one or more RNA sequences at a time. It also allows discovering de novo motifs enriched in a set of related sequences and comparing them with the motifs included in the database. Database URL: http://attract.cnic.es. © The Author(s) 2016. Published by Oxford University Press.
[Prediction of ETA oligopeptides antagonists from Glycine max based on in silico proteolysis].

PubMed

Qiao, Lian-Sheng; Jiang, Lu-di; Luo, Gang-Gang; Lu, Fang; Chen, Yan-Kun; Wang, Ling-Zhi; Li, Gong-Yu; Zhang, Yan-Ling

2017-02-01

Oligopeptides are one of the key pharmaceutical effective constituents of traditional Chinese medicine (TCM). Systematic study on composition and efficacy of TCM oligopeptides is essential for the analysis of material basis and mechanism of TCM. In this study, the potential anti-hypertensive oligopeptides from Glycine max and their endothelin receptor A (ETA) antagonistic activity were discovered and predicted based on in silico technologies. Main protein sequences of G. max were collected and oligopeptides were obtained using in silico gastrointestinal tract proteolysis. Then, the pharmacophore of ETA antagonistic peptides was constructed and included one hydrophobic feature, one ionizable negative feature, one ring aromatic feature and five excluded volumes. Meanwhile, three-dimensional structure of ETA was developed by homology modeling methods for further docking studies. According to docking analysis and consensus score, the key amino acid of GLN165 was identified for ETA antagonistic activity. And 27 oligopeptides from G. max were predicted as the potential ETA antagonists by pharmacophore and docking studies. In silico proteolysis could be used to analyze the protein sequences from TCM. According to combination of in silico proteolysis and molecular simulation, the biological activities of oligopeptides could be predicted rapidly based on the known TCM protein sequence. It might provide the methodology basis for rapidly and efficiently implementing the mechanism analysis of TCM oligopeptides. Copyright © by the Chinese Pharmaceutical Association.
Purification, cDNA cloning, and regulation of lysophospholipase from rat liver.

PubMed

Sugimoto, H; Hayashi, H; Yamashita, S

1996-03-29

A lysophospholipase was purified 506-fold from rat liver supernatant. The preparation gave a single 24-kDa protein band on SDS-polyacrylamide gel electrophoresis. The enzyme hydrolyzed lysophosphatidylcholine, lysophosphatidylethanolamine, lysophosphatidylinositol, lysophosphatidylserine, and 1-oleoyl-2-acetyl-sn-glycero-3-phosphocholine at pH 6-8. The purified enzyme was used for the preparation of antibody and peptide sequencing. A cDNA clone was isolated by screening a rat liver lambda gt11 cDNA library with the antibody, followed by the selection of further extended clones from a lambda gt10 library. The isolated cDNA was 2,362 base pairs in length and contained an open reading frame encoding 230 amino acids with a Mr of 24,708. The peptide sequences determined were found in the reading frame. When the cDNA was expressed in Escherichia coli cells as the beta-galactosidase fusion, lysophosphatidylcholine-hydrolyzing activity was markedly increased. The deduced amino acid sequence showed significant similarity to Pseudomonas fluorescens esterase A and Spirulina platensis esterase. The three sequences contained the GXSXG consensus at similar positions. The transcript was found in various tissues with the following order of abundance: spleen, heart, kidney, brain, lung, stomach, and testis = liver. In contrast, the enzyme protein was abundant in the following order: testis, liver, kidney, heart, stomach, lung, brain, and spleen. Thus the mRNA abundance disagreed with the level of the enzyme protein in liver, testis, and spleen. When HL-60 cells were induced to differentiate into granulocytes with dimethyl sulfoxide, the 24-kDa lysophospholipase protein increased significantly, but the mRNA abundance remained essentially unchanged. Thus a posttranscriptional control mechanism is present for the regulation of 24-kDa lysophospholipase.
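A GXSXG consensus of the kind mentioned above is straightforward to locate with a regular expression. The following is a minimal sketch; the toy sequence is invented and is not the rat lysophospholipase, and the helper name `find_gxsxg` is hypothetical.

```python
import re

# G-X-S-X-G, where X is any residue; the lookahead reports overlapping hits too.
GXSXG = re.compile(r"(?=(G.S.G))")

def find_gxsxg(protein):
    """Return (1-based position, matched pentapeptide) for every GXSXG hit."""
    return [(m.start() + 1, m.group(1)) for m in GXSXG.finditer(protein.upper())]

# Hypothetical toy sequence used only to exercise the pattern.
toy = "MKTLLGHSMGGAAVLK"
print(find_gxsxg(toy))
```

Comparing hit positions across aligned homologues would then show whether the motif sits "at similar positions", as the abstract reports for the three esterase-like sequences.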

An ensemble framework for clustering protein-protein interaction networks.

PubMed

Asur, Sitaram; Ucar, Duygu; Parthasarathy, Srinivasan

2007-07-01

Protein-Protein Interaction (PPI) networks are believed to be important sources of information related to biological processes and complex metabolic functions of the cell. The presence of biologically relevant functional modules in these networks has been theorized by many researchers. However, the application of traditional clustering algorithms for extracting these modules has not been successful, largely due to the presence of noisy false positive interactions as well as specific topological challenges in the network. In this article, we propose an ensemble clustering framework to address this problem. For base clustering, we introduce two topology-based distance metrics to counteract the effects of noise. We develop a PCA-based consensus clustering technique, designed to reduce the dimensionality of the consensus problem and yield informative clusters. We also develop a soft consensus clustering variant to assign multifaceted proteins to multiple functional groups. We conduct an empirical evaluation of different consensus techniques using topology-based, information theoretic and domain-specific validation metrics and show that our approaches can provide significant benefits over other state-of-the-art approaches. Our analysis of the consensus clusters obtained demonstrates that ensemble clustering can (a) produce improved biologically significant functional groupings; and (b) facilitate soft clustering by discovering multiple functional associations for proteins. Supplementary data are available at Bioinformatics online.
Enhanced vulnerability of human proteins towards disease-associated inactivation through divergent evolution.

PubMed

Medina-Carmona, Encarnación; Fuchs, Julian E; Gavira, Jose A; Mesa-Torres, Noel; Neira, Jose L; Salido, Eduardo; Palomino-Morales, Rogelio; Burgos, Miguel; Timson, David J; Pey, Angel L

2017-09-15

Human proteins are vulnerable towards disease-associated single amino acid replacements affecting protein stability and function. Interestingly, a few studies have shown that consensus amino acids from mammals or vertebrates can enhance protein stability when incorporated into human proteins. Here, we investigate yet unexplored relationships between the high vulnerability of human proteins towards disease-associated inactivation and recent evolutionary site-specific divergence of stabilizing amino acids. Using phylogenetic, structural and experimental analyses, we show that divergence from the consensus amino acids at several sites during mammalian evolution has caused local protein destabilization in two human proteins linked to disease: cancer-associated NQO1 and alanine:glyoxylate aminotransferase, mutated in primary hyperoxaluria type I. We demonstrate that a single consensus mutation (H80R) acts as a disease suppressor on the most common cancer-associated polymorphism in NQO1 (P187S). The H80R mutation reactivates P187S by enhancing FAD binding affinity through local and dynamic stabilization of its binding site. Furthermore, we show how a second suppressor mutation (E247Q) cooperates with H80R in protecting the P187S polymorphism towards inactivation through long-range allosteric communication within the structural ensemble of the protein. Our results support that recent divergence of consensus amino acids may have occurred with neutral effects on many functional and regulatory traits of wild-type human proteins. However, divergence at certain sites may have increased the propensity of some human proteins towards inactivation due to disease-associated mutations and polymorphisms. Consensus mutations also emerge as a potential strategy to identify structural hot-spots in proteins as targets for pharmacological rescue in loss-of-function genetic diseases. © The Author 2017. Published by Oxford University Press. All rights reserved. 
New insights into the targeting of a subset of tail-anchored proteins to the outer mitochondrial membrane

PubMed Central

Marty, Naomi J.; Teresinski, Howard J.; Hwang, Yeen Ting; Clendening, Eric A.; Gidda, Satinder K.; Sliwinska, Elwira; Zhang, Daiyuan; Miernyk, Ján A.; Brito, Glauber C.; Andrews, David W.; Dyer, John M.; Mullen, Robert T.

2014-01-01

Tail-anchored (TA) proteins are a unique class of functionally diverse membrane proteins defined by their single C-terminal membrane-spanning domain and their ability to insert post-translationally into specific organelles with an N(cytoplasm)-C(organelle interior) orientation. The molecular mechanisms by which TA proteins are sorted to the proper organelles are not well-understood. Herein we present results indicating that a dibasic targeting motif (i.e., -R-R/K/H-X{X≠E}) identified previously in the C terminus of the mitochondrial isoform of the TA protein cytochrome b5, also exists in many other A. thaliana outer mitochondrial membrane (OMM)-TA proteins. This motif is conspicuously absent, however, in all but one of the TA protein subunits of the translocon at the outer membrane of mitochondria (TOM), suggesting that these two groups of proteins utilize distinct biogenetic pathways. Consistent with this premise, we show that the TA sequences of the dibasic-containing proteins are both necessary and sufficient for targeting to mitochondria, and are interchangeable, while the TA regions of TOM proteins lacking a dibasic motif are necessary, but not sufficient for localization, and cannot be functionally exchanged. We also present results from a comprehensive mutational analysis of the dibasic motif and surrounding sequences that not only greatly expands the functional definition and context-dependent properties of this targeting signal, but also led to the identification of other novel putative OMM-TA proteins. Collectively, these results provide important insight into the complexity of the targeting pathways involved in the biogenesis of OMM-TA proteins and help define a consensus targeting motif that is utilized by at least a subset of these proteins. PMID:25237314
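The dibasic motif -R-R/K/H-X{X≠E} can be written as a simple pattern test. The sketch below rests on an assumption the abstract does not confirm, namely that the motif occupies the last three residues of the protein; the example tails are invented and `has_dibasic_tail` is a hypothetical helper name.

```python
import re

# The 19 standard residues other than glutamate (E), for the X{X != E} position.
NOT_GLU = "ACDFGHIKLMNPQRSTVWY"

# Assumption: the motif must sit at the extreme C terminus, hence the "$" anchor.
DIBASIC_TAIL = re.compile(rf"R[RKH][{NOT_GLU}]$")

def has_dibasic_tail(protein):
    """True if the sequence ends with R, then R/K/H, then any residue except E."""
    return DIBASIC_TAIL.search(protein.upper()) is not None

# Hypothetical C-terminal tails used only to exercise the pattern.
for tail in ("LLVAFYRKS", "LLVAFYRKE", "LLVAFYRAS"):
    print(tail, has_dibasic_tail(tail))
```

Dropping the `$` anchor would instead report the motif anywhere in the sequence, which may suit a less strict reading of the motif definition.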
Prediction of protein-protein interactions from amino acid sequences with ensemble extreme learning machines and principal component analysis

PubMed Central

2013-01-01

Background Protein-protein interactions (PPIs) play crucial roles in the execution of various cellular processes and form the basis of biological mechanisms. Although a large amount of PPI data for different species has been generated by high-throughput experimental techniques, current PPI pairs obtained with experimental methods cover only a fraction of the complete PPI networks, and further, the experimental methods for identifying PPIs are both time-consuming and expensive. Hence, it is urgent and challenging to develop automated computational methods to efficiently and accurately predict PPIs. Results We present here a novel hierarchical PCA-EELM (principal component analysis-ensemble extreme learning machine) model to predict protein-protein interactions using only protein sequence information. In the proposed method, 11188 protein pairs retrieved from the DIP database were encoded into feature vectors by using four kinds of protein sequence information. Focusing on dimension reduction, PCA, an effective feature extraction method, was then employed to construct the most discriminative new feature set. Finally, multiple extreme learning machines were trained and then aggregated into a consensus classifier by majority voting. Ensembling the extreme learning machines removes the dependence of results on initial random weights and improves the prediction performance. Conclusions When performed on the PPI data of Saccharomyces cerevisiae, the proposed method achieved 87.00% prediction accuracy with 86.15% sensitivity at the precision of 87.59%. Extensive experiments compare our method with the state-of-the-art support vector machine (SVM) technique. Experimental results demonstrate that the proposed PCA-EELM outperforms the SVM method under 5-fold cross-validation. Besides, PCA-EELM runs faster than the PCA-SVM based method. Consequently, the proposed approach can be considered a promising and powerful new tool for predicting PPIs with excellent performance in less time. PMID:23815620
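The majority-voting step and the three reported metrics can be sketched as follows. This illustrates only the aggregation and scoring, not the extreme learning machines or the PCA step; the predictions and labels are invented toy data, the function names are hypothetical, and an odd number of base classifiers is assumed so that ties cannot occur.

```python
def majority_vote(predictions):
    """Combine 0/1 predictions from several classifiers, one list per classifier."""
    n_models = len(predictions)
    # A sample is called positive when more than half of the models say so.
    return [int(2 * sum(votes) > n_models) for votes in zip(*predictions)]

def evaluate(truth, predicted):
    """Return accuracy, sensitivity (recall) and precision for 0/1 labels."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "precision": tp / (tp + fp),
    }

# Hypothetical outputs of three base classifiers on four protein pairs.
base_predictions = [[1, 0, 1, 1], [1, 1, 0, 1], [0, 0, 1, 1]]
truth = [1, 0, 0, 1]

consensus = majority_vote(base_predictions)
print(consensus, evaluate(truth, consensus))
```

Because each base model starts from different random weights, voting across them is what damps the run-to-run variation that the abstract attributes to a single extreme learning machine.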
Enhancement of the efficacy of therapeutic proteins by formulation with PEGylated liposomes; a case of FVIII, FVIIa and G-CSF.

PubMed

Yatuv, Rivka; Robinson, Micah; Dayan, Inbal; Baru, Moshe

2010-02-01

Improving the pharmacodynamics of protein drugs has the potential to improve the care and the quality of life of patients suffering from a variety of diseases. Four approaches to improve protein drugs are described: PEGylation, amino acid substitution, fusion to carrier proteins and encapsulation. A new platform technology based on the binding of proteins/peptides to the outer surface of PEGylated liposomes (PEGLip) is then presented. Binding of proteins to PEGLip is non-covalent, highly specific and dependent on an amino acid consensus sequence within the proteins. Association of proteins with PEGLip results in substantial enhancement of the pharmacodynamic properties of proteins following administration. This has been demonstrated in preclinical studies and clinical trials with coagulation factors VIII and VIIa. It has also been demonstrated in preclinical studies with granulocyte colony-stimulating factor. A mechanism is presented that explains the improvements in hemostatic efficacy of PEGLip-formulated coagulation factors VIII and VIIa. The reader will gain an understanding of the advantages and disadvantages of each of the approaches discussed. PEGLip formulation is an important new approach to improve the pharmacodynamics of protein drugs. This approach may be applied to further therapeutic proteins in the future.
Fine-tuning structural RNA alignments in the twilight zone

PubMed Central

2010-01-01

Background A widely used method to find conserved secondary structure in RNA is to first construct a multiple sequence alignment, and then fold the alignment, optimizing a score based on thermodynamics and covariance. This method works best around 75% sequence similarity. However, in a "twilight zone" below 55% similarity, the sequence alignment tends to obscure the covariance signal used in the second phase. Therefore, while the overall shape of the consensus structure may still be found, the degree of conservation cannot be estimated reliably. Results Based on a combination of available methods, we present a method named planACstar for improving structure conservation in structural alignments in the twilight zone. After constructing a consensus structure by alignment folding, planACstar abandons the original sequence alignment, refolds the sequences individually, but consistent with the consensus, aligns the structures, irrespective of sequence, by a pure structure alignment method, and derives an improved sequence alignment from the alignment of structures, to be re-submitted to alignment folding, and so on. This cycle may be iterated as long as structural conservation improves, but normally, one step suffices. Conclusions Employing the tools ClustalW, RNAalifold, and RNAforester, we find that for sequences with 30-55% sequence identity, structural conservation can be improved by 10% on average, with a large variation, measured in terms of RNAalifold's own criterion, the structure conservation index. PMID:20433706
Burkholderia sp. induces functional nodules on the South African invasive legume Dipogon lignosus (Phaseoleae) in New Zealand soils.

PubMed

Liu, Wendy Y Y; Ridgway, Hayley J; James, Trevor K; James, Euan K; Chen, Wen-Ming; Sprent, Janet I; Young, J Peter W; Andrews, Mitchell

2014-10-01

The South African invasive legume Dipogon lignosus (Phaseoleae) produces nodules with both determinate and indeterminate characteristics in New Zealand (NZ) soils. Ten bacterial isolates produced functional nodules on D. lignosus. The 16S ribosomal RNA (rRNA) gene sequences identified one isolate as Bradyrhizobium sp., one isolate as Rhizobium sp. and eight isolates as Burkholderia sp. The Bradyrhizobium sp. and Rhizobium sp. 16S rRNA sequences were identical to those of strains previously isolated from crop plants and may have originated from inocula used on crops. Both 16S rRNA and DNA recombinase A (recA) gene sequences placed the eight Burkholderia isolates separate from previously described Burkholderia rhizobial species. However, the isolates showed a very close relationship to Burkholderia rhizobial strains isolated from South African plants with respect to their nitrogenase iron protein (nifH), N-acyltransferase nodulation protein A (nodA) and N-acetylglucosaminyl transferase nodulation protein C (nodC) gene sequences. Gene sequences and enterobacterial repetitive intergenic consensus (ERIC) PCR and repetitive element palindromic PCR (rep-PCR) banding patterns indicated that the eight Burkholderia isolates separated into five clones of one strain and three of another. One strain was tested and shown to produce functional nodules on a range of South African plants previously reported to be nodulated by Burkholderia tuberum STM678(T) which was isolated from the Cape Region. Thus, evidence is strong that the Burkholderia strains isolated here originated in South Africa and were somehow transported with the plants from their native habitat to NZ. It is possible that the strains are of a new species capable of nodulating legumes.
Microfluidic affinity and ChIP-seq analyses converge on a conserved FOXP2-binding motif in chimp and human, which enables the detection of evolutionarily novel targets.

PubMed

Nelson, Christopher S; Fuller, Chris K; Fordyce, Polly M; Greninger, Alexander L; Li, Hao; DeRisi, Joseph L

2013-07-01

The transcription factor forkhead box P2 (FOXP2) is believed to be important in the evolution of human speech. A mutation in its DNA-binding domain causes severe speech impairment. Humans have acquired two coding changes relative to the conserved mammalian sequence. Despite intense interest in FOXP2, it has remained an open question whether the human protein's DNA-binding specificity and chromatin localization are conserved. Previous in vitro and ChIP-chip studies have provided conflicting consensus sequences for the FOXP2-binding site. Using MITOMI 2.0 microfluidic affinity assays, we describe the binding site of FOXP2 and its affinity profile in base-specific detail for all substitutions of the strongest binding site. We find that human and chimp FOXP2 have similar binding sites that are distinct from previously suggested consensus binding sites. Additionally, through analysis of FOXP2 ChIP-seq data from cultured neurons, we find strong overrepresentation of a motif that matches our in vitro results and identifies a set of genes with FOXP2 binding sites. The FOXP2-binding sites tend to be conserved, yet we identified 38 instances of evolutionarily novel sites in humans. Combined, these data present a comprehensive portrait of FOXP2's binding properties and imply that although its sequence specificity has been conserved, some of its genomic binding sites are newly evolved.
Structural organization of the porcine and human genes coding for a leydig cell-specific insulin-like peptide (LEY I-L) and chromosomal localization of the human gene (INSL3)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Burkhardt, E.; Adham, I.M.; Brosig, B.

1994-03-01

Leydig insulin-like protein (LEY I-L) is a member of the insulin-like hormone superfamily. The LEY I-L gene (designated INSL3) is expressed exclusively in prenatal and postnatal Leydig cells. The authors report here the cloning and nucleotide sequence of porcine and human LEY I-L genes including the 5′ regions. Both genes consist of two exons and one intron. The organization of the LEY I-L gene is similar to that of insulin and relaxin. The transcription start site in the porcine and human LEY I-L gene is localized 13 and 14 bp upstream of the translation start site, respectively. Alignment of the 5′ flanking regions of both genes reveals that the first 107 nucleotides upstream of the transcription start site exhibit an overall sequence similarity of 80%. This conserved region contains a consensus TATAA box, a CAAT-like element (GAAT), and a consensus SP1 sequence (GGGCGG) at equivalent positions in both genes and therefore may play a role in regulation of expression of the LEY I-L gene. The porcine and human genome contains a single copy of the LEY I-L gene. By in situ hybridization, the human gene was assigned to bands p13.2-p12 of the short arm of chromosome 19. 25 refs., 6 figs.
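Locating short promoter elements such as the TATAA box or the SP1 site (GGGCGG) in a flanking region can be sketched as a two-strand string search. The sequence below is an invented toy fragment, not the porcine or human INSL3 5′ region, and `find_sites` is a hypothetical helper name.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]

def find_sites(seq, motif):
    """Return sorted (1-based start, strand) hits of motif on both strands."""
    seq, motif = seq.upper(), motif.upper()
    targets = [(motif, "+")]
    rc = reverse_complement(motif)
    if rc != motif:  # avoid reporting palindromic motifs twice
        targets.append((rc, "-"))
    hits = []
    for pattern, strand in targets:
        start = seq.find(pattern)
        while start != -1:
            hits.append((start + 1, strand))
            start = seq.find(pattern, start + 1)  # step by one to keep overlaps
    return sorted(hits)

# Hypothetical toy flanking sequence used only for illustration.
toy_flank = "CCGCCCAGTATAAGG"
for name, motif in (("TATAA box", "TATAA"), ("SP1 site", "GGGCGG")):
    print(name, find_sites(toy_flank, motif))
```

Running the same search on two aligned flanking regions and comparing coordinates is one simple way to check whether elements fall "at equivalent positions" in both genes.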
Chimaeric Virus-Like Particles Derived from Consensus Genome Sequences of Human Rotavirus Strains Co-Circulating in Africa

PubMed Central

Jere, Khuzwayo C.; O'Neill, Hester G.; Potgieter, A. Christiaan; van Dijk, Alberdina A.

2014-01-01

Rotavirus virus-like particles (RV-VLPs) are potential alternative non-live vaccine candidates due to their high immunogenicity. They mimic the natural conformation of native viral proteins but cannot replicate because they do not contain genomic material, which makes them safe. To date, most RV-VLPs have been derived from cell culture adapted strains or common G1 and G3 rotaviruses that have been circulating in communities for some time. In this study, chimaeric RV-VLPs were generated from the consensus sequences of African rotaviruses (G2, G8, G9 or G12 strains associated with either P[4], P[6] or P[8] genotypes) characterised directly from human stool samples without prior adaptation of the wild type strains to cell culture. Codon-optimised sequences for insect cell expression of genome segments 2 (VP2), 4 (VP4), 6 (VP6) and 9 (VP7) were cloned into a modified pFASTBAC vector, which allowed simultaneous expression of up to four genes using the Bac-to-Bac Baculovirus Expression System (BEVS; Invitrogen). Several combinations of the genome segments originating from different field strains were cloned to produce double-layered RV-VLPs (dRV-VLP; VP2/6), triple-layered RV-VLPs (tRV-VLP; VP2/6/7 or VP2/6/7/4) and chimaeric tRV-VLPs. The RV-VLPs were produced by infecting Spodoptera frugiperda 9 and Trichoplusia ni cells with recombinant baculoviruses using multi-cistronic, dual co-infection and stepwise-infection expression strategies. The size and morphology of the RV-VLPs, as determined by transmission electron microscopy, revealed successful production of RV-VLPs. The novel approach of producing tRV-VLPs, by using the consensus insect cell codon-optimised nucleotide sequence derived from dsRNA extracted directly from clinical specimens, should speed up vaccine research and development by by-passing the need to adapt rotaviruses to cell culture. Other problems associated with cell culture adaptation, such as possible changes in epitopes, can also be circumvented. Thus, it is now possible to generate tRV-VLPs for evaluation as non-live vaccine candidates for any human or animal field rotavirus strain. PMID:25268783
The Sequence-specific Peptide-binding Activity of the Protein Sulfide Isomerase AGR2 Directs Its Stable Binding to the Oncogenic Receptor EpCAM.

PubMed

Mohtar, M Aiman; Hernychova, Lenka; O'Neill, J Robert; Lawrence, Melanie L; Murray, Euan; Vojtesek, Borek; Hupp, Ted R

2018-04-01

AGR2 is an oncogenic endoplasmic reticulum (ER)-resident protein disulfide isomerase. AGR2 protein has a relatively unique property for a chaperone in that it can bind sequence-specifically to a specific peptide motif (TTIYY). A synthetic TTIYY-containing peptide column was used to affinity-purify AGR2 from crude lysates, highlighting peptide selectivity in complex mixtures. Hydrogen-deuterium exchange mass spectrometry localized the dominant region in AGR2 that interacts with the TTIYY peptide to within a structural loop from amino acids 131-135 (VDPSL). A peptide binding site consensus of Tx[IL][YF][YF] was developed for AGR2 by measuring its activity against a mutant peptide library. Screening the human proteome for proteins harboring this motif revealed an enrichment in transmembrane proteins, and we focused on validating EpCAM as a potential AGR2-interacting protein. AGR2 and EpCAM proteins formed a dose-dependent protein-protein interaction in vitro. Proximity ligation assays demonstrated that endogenous AGR2 and EpCAM protein associate in cells. Introducing a single alanine mutation in EpCAM at Tyr251 attenuated its binding to AGR2 in vitro and in cells. Hydrogen-deuterium exchange mass spectrometry was used to identify a stable binding site for AGR2 on EpCAM, adjacent to the TLIYY motif and surrounding EpCAM's detergent binding site. These data define a dominant site on AGR2 that mediates its specific peptide-binding function. EpCAM forms a model client protein for AGR2 to study how an ER-resident chaperone can dock specifically to a peptide motif and regulate the trafficking of a protein destined for the secretory pathway. © 2018 by The American Society for Biochemistry and Molecular Biology, Inc.
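The Tx[IL][YF][YF] consensus in the abstract above maps naturally onto a regular expression, which gives a rough idea of how a sequence screen for the motif might look. This is a minimal sketch and not the authors' screening pipeline: the function name `find_agr2_motifs` and the toy sequence are hypothetical, and `x` is assumed to mean any single residue.

```python
import re

# Tx[IL][YF][YF]: T, any residue, I or L, then two positions that are Y or F.
# Assumption: "x" in the consensus stands for any one amino acid.
# The lookahead (?=...) lets overlapping occurrences be reported as well.
AGR2_MOTIF = re.compile(r"(?=(T.[IL][YF][YF]))")


def find_agr2_motifs(protein_seq):
    """Return (0-based start, matched 5-mer) for every motif occurrence."""
    seq = protein_seq.upper()
    return [(m.start(), m.group(1)) for m in AGR2_MOTIF.finditer(seq)]


if __name__ == "__main__":
    toy = "MKTTIYYAGSTLIYYK"  # made-up sequence, not a real protein
    for start, hit in find_agr2_motifs(toy):
        print(start, hit)
```

Both the TTIYY peptide and the TLIYY motif named in the abstract satisfy this pattern. A real proteome screen would also need sequence parsing and some statistical treatment of enrichment, which this sketch leaves out.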
A reference bacterial genome dataset generated on the MinION™ portable single-molecule nanopore sequencer.

PubMed

Quick, Joshua; Quinlan, Aaron R; Loman, Nicholas J

2014-01-01

The MinION™ is a new, portable single-molecule sequencer developed by Oxford Nanopore Technologies. It measures four inches in length and is powered from the USB 3.0 port of a laptop computer. The MinION™ measures the change in current resulting from DNA strands interacting with a charged protein nanopore. These measurements can then be used to deduce the underlying nucleotide sequence. We present a read dataset from whole-genome shotgun sequencing of the model organism Escherichia coli K-12 substr. MG1655 generated on a MinION™ device during the early-access MinION™ Access Program (MAP). Two sequencing runs of the MinION™ are presented, one generated using R7 chemistry (released in July 2014) and one using R7.3 (released in September 2014). Base-called sequence data are provided to demonstrate the nature of data produced by the MinION™ platform and to encourage the development of customised methods for alignment, consensus and variant calling, de novo assembly and scaffolding. FAST5 files containing event data within the HDF5 container format are provided to assist with the development of improved base-calling methods.
p53 Specifically Binds Triplex DNA In Vitro and in Cells

PubMed Central

Brázdová, Marie; Tichý, Vlastimil; Helma, Robert; Bažantová, Pavla; Polášková, Alena; Krejčí, Aneta; Petr, Marek; Navrátilová, Lucie; Tichá, Olga; Nejedlý, Karel; Bennink, Martin L.; Subramaniam, Vinod; Bábková, Zuzana; Martínek, Tomáš; Lexa, Matej; Adámik, Matej

2016-01-01

Triplex DNA is implicated in a wide range of biological activities, including regulation of gene expression and genomic instability leading to cancer. The tumor suppressor p53 is a central regulator of cell fate in response to different types of insults. Sequence- and structure-specific modes of DNA recognition are core attributes of the p53 protein. The focus of this work is the structure-specific binding of p53 to DNA containing triplex-forming sequences in vitro and in cells and the effect on p53-driven transcription. This is the first DNA binding study of full-length p53 and its deletion variants to both intermolecular and intramolecular T.A.T triplexes. We demonstrate that the interaction of p53 with an intermolecular T.A.T triplex is comparable to the recognition of the CTG-hairpin non-B DNA structure. Using deletion mutants, we determined the C-terminal DNA binding domain of p53 to be crucial for triplex recognition. Furthermore, strong p53 recognition of intramolecular T.A.T triplexes (H-DNA), stabilized by negative superhelicity in plasmid DNA, was detected by competition and immunoprecipitation experiments, and visualized by AFM. Moreover, chromatin immunoprecipitation revealed p53 binding to a T.A.T-forming sequence in vivo. Enhanced reporter transactivation by p53 on insertion of a triplex-forming sequence into a plasmid with the p53 consensus sequence was observed by luciferase reporter assays. An in silico scan of human regulatory regions for the simultaneous presence of both the consensus sequence and T.A.T motifs identified a set of candidate p53 target genes, and p53-dependent activation of several of them (ABCG5, ENOX1, INSR, MCC, NFAT5) was confirmed by RT-qPCR. Our results show that the T.A.T triplex comprises a new class of p53 binding sites targeted by p53 in a DNA structure-dependent mode in vitro and in cells. The contribution of p53 DNA structure-dependent binding to the regulation of transcription is discussed. PMID:27907175
Mammalian genome projects reveal new growth hormone (GH) sequences. Characterization of the GH-encoding genes of armadillo (Dasypus novemcinctus), hedgehog (Erinaceus europaeus), bat (Myotis lucifugus), hyrax (Procavia capensis), shrew (Sorex araneus), ground squirrel (Spermophilus tridecemlineatus), elephant (Loxodonta africana), cat (Felis catus) and opossum (Monodelphis domestica).

PubMed

Wallis, Michael

2008-01-15

Mammalian growth hormone (GH) sequences have been shown previously to display episodic evolution: the sequence is generally strongly conserved but on at least two occasions during mammalian evolution (on lineages leading to higher primates and ruminants) bursts of rapid evolution occurred. However, the number of mammalian orders studied previously has been relatively limited, and the availability of sequence data via mammalian genome projects provides the potential for extending the range of GH gene sequences examined. Complete or nearly complete GH gene sequences for six mammalian species for which no data were previously available have been extracted from the genome databases: Dasypus novemcinctus (nine-banded armadillo), Erinaceus europaeus (western European hedgehog), Myotis lucifugus (little brown bat), Procavia capensis (cape rock hyrax), Sorex araneus (European shrew), and Spermophilus tridecemlineatus (13-lined ground squirrel). In addition, incomplete data for several other species have been extended. Examination of the data in detail and comparison with previously available sequences has allowed assessment of the reliability of deduced sequences. Several of the new sequences differ substantially from the consensus sequence previously determined for eutherian GHs, indicating greater variability than previously recognised, and confirming the episodic pattern of evolution. The episodic pattern is not seen for signal sequences, 5' upstream sequence or synonymous substitutions; it is specific to the mature protein sequence, suggesting that it relates to the hormonal function. The substitutions accumulated during the course of GH evolution have occurred mainly on the side of the hormone facing away from the receptor, in a non-random fashion, and it is suggested that this may reflect interaction of the receptor-bound hormone with other proteins or small ligands.
Structure of genes for Hsp30 from the white-rot fungus Coriolus versicolor and the increase of their expression by heat shock and exposure to a hazardous chemical.

PubMed

Iimura, Yosuke; Tatsumi, Kenji

2002-07-01

We isolated and analysed two genomic DNAs that encode the heat-shock protein Hsp30 from Coriolus versicolor. The two encoded amino acid sequences differ by only three amino acid substitutions. The promoter regions contain the consensus heat-shock element, a xenobiotic-response element, a stress-response element, and a metal-response element. The levels of mRNAs for Hsp30 increased markedly after exposure of C. versicolor to pentachlorophenol, and these levels were higher than those after heat shock.
The alpha subunit of the Saccharomyces cerevisiae oligosaccharyltransferase complex is essential for vegetative growth of yeast and is homologous to mammalian ribophorin I

PubMed Central

1995-01-01

Oligosaccharyltransferase mediates the transfer of a preassembled high mannose oligosaccharide from a lipid-linked oligosaccharide donor to consensus glycosylation acceptor sites in newly synthesized proteins in the lumen of the rough endoplasmic reticulum. The Saccharomyces cerevisiae oligosaccharyltransferase is an oligomeric complex composed of six nonidentical subunits (alpha-zeta), two of which are glycoproteins (alpha and beta). The beta and delta subunits of the oligosaccharyltransferase are encoded by the WBP1 and SWP1 genes. Here we describe the functional characterization of the OST1 gene that encodes the alpha subunit of the oligosaccharyltransferase. Protein sequence analysis revealed a significant sequence identity between the Saccharomyces cerevisiae Ost1 protein and ribophorin I, a previously identified subunit of the mammalian oligosaccharyltransferase. A disruption of the OST1 locus was not tolerated in haploid yeast showing that expression of the Ost1 protein is essential for vegetative growth of yeast. An analysis of a series of conditional ost1 mutants demonstrated that defects in the Ost1 protein cause pleiotropic underglycosylation of soluble and membrane-bound glycoproteins at both the permissive and restrictive growth temperatures. Microsomal membranes isolated from ost1 mutant yeast showed marked reductions in the in vitro transfer of high mannose oligosaccharide from exogenous lipid-linked oligosaccharide to a glycosylation site acceptor tripeptide. Microsomal membranes isolated from the ost1 mutants contained elevated amounts of the Kar2 stress-response protein. PMID:7860628
Gene structure and functional characterization of growth hormone in dogfish, Squalus acanthias.

PubMed

Moriyama, Shunsuke; Oda, Mayumi; Yamazaki, Tomohide; Yamaguchi, Kiyoko; Amiya, Noriko; Takahashi, Akiyoshi; Amano, Masafumi; Goto, Tomoaki; Nozaki, Masumi; Meguro, Hiroshi; Kawauchi, Hiroshi

2008-06-01

Dogfish (Squalus acanthias) growth hormone (GH) was identified by cDNA cloning and protein purification from the pituitary gland. Dogfish GH cDNA encoded a prehormone of 210 amino acids (aa). Sequence analysis of purified GH revealed that the prehormone is composed of a signal peptide of 27 aa and a mature protein of 183 aa. Dogfish GH showed 94% sequence identity with blue shark GH, and also showed 37-66%, 26%, and 48-67% sequence identity with GH from osteichthyes, an agnathan, and tetrapods, respectively. The site of production was identified through immunocytochemistry to be cells of the proximal pars distalis of the pituitary gland. Dogfish GH stimulates both insulin-like growth factor-I and II mRNA levels in dogfish liver in vitro. The dogfish GH gene consisted of five exons and four introns, the same as in lamprey, teleosts such as cypriniforms and siluriforms, and tetrapods. The 5'-flanking region within 1082 bp of the transcription start site contained consensus sequences for the TATA box, Pit-1/GHF-1, CRE, TRE, and ERE. These results show that the endocrine mechanism for growth stimulation by the GH-IGF axis was established at an early stage of vertebrate evolution, and that the 5-exon-type gene organization might reflect the structure of the ancestral gene for the GH gene family.
The nucleoid protein Dps binds genomic DNA of Escherichia coli in a non-random manner

PubMed Central

Kondrashov, F. A.; Toshchakov, S. V.; Dominova, I.; Shvyreva, U. S.; Vrublevskaya, V. V.; Morenkov, O. S.; Panyukov, V. V.

2017-01-01

Dps is a multifunctional homododecameric protein that oxidizes Fe2+ ions, accumulating them in the form of Fe2O3 within its protein cavity, interacts with DNA, tightly condensing the bacterial nucleoid upon starvation, and performs some other functions. In the two decades since the discovery of this protein, its ferroxidase activity has become rather well studied, but the mechanism of Dps interaction with DNA still remains enigmatic. The crucial role of lysine residues in the unstructured N-terminal tails led to the conventional point of view that Dps binds DNA without sequence or structural specificity. However, deletion of dps changed the profile of proteins in starved cells, a SELEX screen revealed genomic regions preferentially bound in vitro, and a certain affinity of Dps for artificial branched molecules was detected by atomic force microscopy. Here we report a non-random distribution of Dps binding sites across the bacterial chromosome in exponentially growing cells and show their enrichment with inverted repeats prone to form secondary structures. We found that the Dps-bound regions overlap with sites occupied by other nucleoid proteins, and contain overrepresented motifs typical for their consensus sequences. Of the two types of genomic domains with extensive protein occupancy, which can be highly expressed or transcriptionally silent, only those that are enriched with RNA polymerase molecules were preferentially occupied by Dps. In the dps-null mutant we, therefore, observed a differentially altered expression of several targeted genes and found suppressed transcription from the dps promoter. In most cases this can be explained by the relieved interference with Dps for nucleoid proteins exploiting sequence-specific modes of DNA binding. Thus, while protecting bacterial cells from different stresses during exponential growth, Dps can modulate transcriptional integrity of the bacterial chromosome, hampering RNA biosynthesis from some genes via competition with RNA polymerase or, vice versa, competing with inhibitors to activate transcription. PMID:28800583
Structure of genes and an insertion element in the methane producing archaebacterium Methanobrevibacter smithii.

PubMed

Hamilton, P T; Reeve, J N

1985-01-01

DNA fragments cloned from the methanogenic archaebacterium Methanobrevibacter smithii which complement mutations in the purE and proC genes of E. coli have been sequenced. Sequence analyses, transposon mutagenesis and expression in E. coli minicells indicate that purE and proC complementations result from the synthesis of M. smithii polypeptides with molecular weights of 36,697 and 27,836, respectively. The encoding genes appear to be located in operons. The M. smithii genome contains 69% A/T basepairs (bp), which is reflected in unusual codon usages and intergenic regions containing approximately 85% A/T bp. An insertion element, designated ISM1, was found within the cloned M. smithii DNA located adjacent to the proC complementing region. ISM1 is 1381 bp in length, has 29 bp terminal inverted repeat sequences and contains one major ORF encoded in 87% of the ISM1 sequence. ISM1 is mobile, is present in approximately 10 copies per genome, and its integration duplicates 8 bp at the site of insertion. The duplicated sequences show homology with sequences within the 29 bp terminal repeat sequence of ISM1. Comparison of our data with sequences from halophilic archaebacteria suggests that 5'GAANTTTCA and 5'TTTTAATATAAA may be consensus promoter sequences for archaebacteria. These sequences closely resemble the consensus sequences which precede Drosophila heat-shock genes (Pelham 1982; Davidson et al. 1983). Methanogens appear to employ the eubacterial system of mRNA:16S rRNA hybridization to ensure initiation of translation; the consensus ribosome binding sequence is 5'AGGTGA.
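Consensus sequences such as 5'GAANTTTCA contain degenerate positions (here `N`, any base), so locating them in a DNA sequence means expanding the degenerate codes before matching. The sketch below illustrates one way this might be done with the standard IUPAC nucleotide codes. The function names and the toy sequences are hypothetical and are not taken from the study.

```python
import re

# Standard IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus into a regex that also finds overlapping hits."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    return re.compile("(?=(" + body + "))")


def scan_for_consensus(dna, consensus):
    """Return (0-based start, matched sequence) for each hit on the given strand."""
    pattern = consensus_to_regex(consensus)
    return [(m.start(), m.group(1)) for m in pattern.finditer(dna.upper())]


if __name__ == "__main__":
    toy = "CCGAAGTTTCATT"  # made-up sequence for illustration only
    print(scan_for_consensus(toy, "GAANTTTCA"))
    print(scan_for_consensus("TTAGGTGATT", "AGGTGA"))
```

This only scans the strand it is given and requires an exact match at every non-degenerate position. Real promoter searches usually tolerate mismatches, for example through position weight matrices, and scan the reverse complement as well.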
Comparison of ZP3 protein sequences among vertebrate species: to obtain a consensus sequence for immunocontraception.

PubMed

Zhu, X; Naz, R K

1999-03-01

The deduced ZP3 amino acid (aa) sequences of 13 vertebrate species, namely mouse, hamster, rabbit, pig, porcine, cow, dog, cat, human, bonnet, marmoset, carp, and frog, were compared using the PILEUP and PRETTY alignment programs (GCG, Wisconsin, USA). The published aa sequences obtained from 13 vertebrate species indicated overall evolutionary conservation in the N-terminus, central region, and C-terminus of the ZP3 polypeptide. More variations of ZP3 polypeptide sequences were seen in the alignments of carp and frog from the 11 mammalian species, making the leader sequence more prominent. The canonical furin proteolytic processing signal at the C-terminus was found in all the ZP3 polypeptide sequences except those of carp and frog. In the central region, the ZP3 deduced aa sequences of all the 13 vertebrate species aligned well, and six relatively conserved sequences were found. There are 11 conserved cysteine residues in the central region across all species including carp and frog, indicating that these residues have a longer evolutionary history. The ZP3 aa sequence similarities were examined using the GAP program (GCG). The highest aa similarities are observed between the members of the same order within the class Mammalia, and also (95.4%) between pig (ungulata) and rabbit (lagomorpha). The deduced ZP3 aa sequences per se may not be enough to build a phylogenetic tree.
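To make concrete the idea of deriving a consensus from aligned sequences, the following minimal sketch takes a majority vote in each alignment column. It is an illustration only and does not reproduce the PILEUP or PRETTY algorithms. The function name, the `min_fraction` threshold, the `x` placeholder for unresolved positions, and the toy alignment are all assumptions.

```python
from collections import Counter


def majority_consensus(aligned, min_fraction=0.5, unknown="x"):
    """Column-wise majority consensus of equal-length aligned sequences.

    A residue is emitted when it occupies more than `min_fraction` of the
    rows in a column (gaps count against it); otherwise `unknown` is
    emitted. Columns made up entirely of gaps yield "-".
    """
    if not aligned:
        raise ValueError("no sequences given")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")

    consensus = []
    for column in zip(*aligned):
        residues = [r for r in column if r != "-"]
        if not residues:
            consensus.append("-")
            continue
        residue, count = Counter(residues).most_common(1)[0]
        consensus.append(residue if count / len(column) > min_fraction else unknown)
    return "".join(consensus)


if __name__ == "__main__":
    toy_alignment = ["CKAC", "CRAC", "CK-C"]  # made-up rows, not ZP3 data
    print(majority_consensus(toy_alignment))
```

With the default threshold a strict majority is required, so ties cannot produce a residue. With a lower threshold, ties would be resolved by whichever residue appears first in the column, which is an arbitrary choice.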

Regulation of amyloid precursor protein processing by its KFERQ motif.

PubMed

Park, Ji-Seon; Kim, Dong-Hou; Yoon, Seung-Yong

2016-06-01

Understanding the trafficking, processing, and degradation mechanisms of amyloid precursor protein (APP) is important because APP can be processed to produce β-amyloid (Aβ), a key pathogenic molecule in Alzheimer's disease (AD). Here, we found that APP contains a KFERQ motif at its C-terminus, a consensus sequence for chaperone-mediated autophagy (CMA) or microautophagy, which are other types of autophagy for degradation of pathogenic molecules in neurodegenerative diseases. Deletion of KFERQ in APP increased C-terminal fragments (CTFs) and secreted N-terminal fragments of APP and kept it away from lysosomes. KFERQ deletion did not abolish the interaction of APP or its cleaved products with heat shock cognate protein 70 (Hsc70), a protein necessary for CMA or microautophagy. These findings suggest that the KFERQ motif is important for normal processing and degradation of APP to preclude the accumulation of APP-CTFs, although it may not be important for CMA or microautophagy. [BMB Reports 2016; 49(6): 337-342].
Inducing β-Sheets Formation in Synthetic Spider Silk Fibers by Aqueous Post-Spin Stretching

PubMed Central

Hinman, Michael B.; Holland, Gregory P.; Yarger, Jeffery L.; Lewis, Randolph V.

2012-01-01

As a promising biomaterial with numerous potential applications, various types of synthetic spider silk fibers have been produced and studied in an effort to produce manmade fibers with mechanical and physical properties comparable to those of native spider silk. In this study, two recombinant proteins based on Nephila clavipes Major ampullate Spidroin 1 (MaSp1) consensus repeat sequence were expressed and spun into fibers. Mechanical test results showed that fiber spun from the higher molecular weight protein had better overall mechanical properties (70 KD versus 46 KD), whereas postspin stretch treatment in water helped increase fiber tensile strength significantly. Carbon-13 solid-state NMR studies of those fibers further revealed that the postspin stretch in water promoted protein molecule rearrangement and the formation of β-sheets in the polyalanine region of the silk. The rearrangement correlated with improved fiber mechanical properties and indicated that postspin stretch is key to helping the spider silk proteins in the fiber form correct secondary structures, leading to better quality fibers. PMID:21574576
The novel extremely psychrophilic luciferase from Metridia longa: Properties of a high-purity protein produced in insect cells.

PubMed

Larionova, Marina D; Markova, Svetlana V; Vysotski, Eugene S

2017-01-29

The bright bioluminescence of the copepod Metridia longa is conditioned by a small secreted coelenterazine-dependent luciferase (MLuc). To date, three isoforms of MLuc differing in length, sequence, and some properties have been cloned and successfully applied as highly sensitive bioluminescent reporters. In this work, we report cloning of a novel group of genes from M. longa encoding extremely psychrophilic isoforms of MLuc (MLuc2-type). The novel isoforms share only ∼54-64% of protein sequence identity with the previously cloned isoforms and, consequently, are the product of a separate group of paralogous genes. The MLuc2 isoform with the consensus sequence was produced as a natively folded protein using a baculovirus/insect cell expression system, purified, and characterized. MLuc2 displays a very high bioluminescent activity and high thermostability similar to those of the previously characterized M. longa luciferase isoform MLuc7. However, in contrast to MLuc7, which shows the highest activity at 12-17 °C and 0.5 M NaCl, the bioluminescence optima of MLuc2 isoforms are at ∼5 °C and 1 M NaCl. The MLuc2 adaptation to cold is also accompanied by a decrease in melting temperature and in affinity for the substrate, suggesting greater conformational flexibility of the protein structure. The luciferase isoforms with different temperature optima may provide adaptability of the M. longa bioluminescence to the changes of water temperature during diurnal vertical migrations. Copyright © 2016 Elsevier Inc. All rights reserved.
Protein contact prediction by integrating deep multiple sequence alignments, coevolution and machine learning.

PubMed

Adhikari, Badri; Hou, Jie; Cheng, Jianlin

2018-03-01

In this study, we report the evaluation of the residue-residue contacts predicted by our three different methods in the CASP12 experiment, focusing on studying the impact of multiple sequence alignment, residue coevolution, and machine learning on contact prediction. The first method (MULTICOM-NOVEL) uses only traditional features (sequence profile, secondary structure, and solvent accessibility) with deep learning to predict contacts and serves as a baseline. The second method (MULTICOM-CONSTRUCT) uses our new alignment algorithm to generate deep multiple sequence alignment to derive coevolution-based features, which are integrated by a neural network method to predict contacts. The third method (MULTICOM-CLUSTER) is a consensus combination of the predictions of the first two methods. We evaluated our methods on 94 CASP12 domains. On a subset of 38 free-modeling domains, our methods achieved an average precision of up to 41.7% for top L/5 long-range contact predictions. The comparison of the three methods shows that the quality and effective depth of multiple sequence alignments, coevolution-based features, and machine learning integration of coevolution-based features and traditional features drive the quality of predicted protein contacts. On the full CASP12 dataset, the coevolution-based features alone can improve the average precision from 28.4% to 41.6%, and the machine learning integration of all the features further raises the precision to 56.3%, when top L/5 predicted long-range contacts are evaluated. And the correlation between the precision of contact prediction and the logarithm of the number of effective sequences in alignments is 0.66. © 2017 Wiley Periodicals, Inc.
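The "top L/5 long-range" precision quoted above can be made concrete with a short calculation: keep only residue pairs that are far apart in sequence, rank them by predicted score, take the top L/5 (L being the sequence length), and count how many are true contacts. The sketch below is an assumption-laden illustration, not the CASP12 or MULTICOM evaluation code. In particular, the sequence-separation cutoff of 24 residues is a commonly used convention that the abstract does not state, and the function name and inputs are hypothetical.

```python
def top_l5_long_range_precision(predicted, true_contacts, seq_len, min_separation=24):
    """Precision of the top L/5 long-range predicted contacts.

    predicted      -- iterable of (i, j, score) residue-index pairs with a score
    true_contacts  -- iterable of (i, j) pairs that are real contacts
    seq_len        -- sequence length L
    min_separation -- minimum |i - j| to count as long-range (assumed cutoff)
    """
    # Store pairs as (smaller index, larger index) so order does not matter.
    truth = {(min(i, j), max(i, j)) for i, j in true_contacts}
    long_range = [
        (min(i, j), max(i, j), score)
        for i, j, score in predicted
        if abs(i - j) >= min_separation
    ]
    long_range.sort(key=lambda item: item[2], reverse=True)

    top_n = max(1, seq_len // 5)
    selected = long_range[:top_n]
    if not selected:
        return 0.0
    hits = sum((i, j) in truth for i, j, _ in selected)
    # Denominator is the number of pairs actually evaluated; some evaluations
    # divide by L/5 even when fewer predictions are available.
    return hits / len(selected)


if __name__ == "__main__":
    # Toy data with a small cutoff so the example fits a 10-residue sequence.
    preds = [(0, 5, 0.9), (1, 8, 0.8), (2, 3, 0.99), (0, 9, 0.1)]
    truth = {(0, 5), (0, 9)}
    print(top_l5_long_range_precision(preds, truth, seq_len=10, min_separation=3))
```

In the toy example the pair (2, 3) is discarded as short-range despite its high score, which is the point of restricting the evaluation to long-range contacts.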
Biological assay using T cell response for Cry-consensus peptide designed for the peptide-based immunotherapy of Japanese cedar pollinosis.

PubMed

Kozutsumi, Daisuke; Tsunematsu, Masako; Yamaji, Taketo; Kino, Kohsuke

2007-01-01

Cry-consensus peptide is a linearly linked peptide of T-cell epitopes for the management of Japanese cedar (JC) pollinosis and is expected to become a new drug for immunotherapy. However, the mechanism of T-cell epitopes in allergic diseases is not well understood, and thus, a simple in vitro procedure for evaluation of its biological activity is desired. Peripheral blood mononuclear cells (PBMC) were isolated from 27 JC pollinosis patients and 10 healthy subjects, and cultured in vitro for 4 days in the presence of Cry-consensus peptide and (3)H-thymidine. The relationship between growth stimulation (stimulation index; SI) and antigen-specific IgE levels in serum was also investigated in JC pollinosis patients. Moreover, to confirm the importance of the primary sequence in Cry-consensus peptide, heat-treated Cry-consensus peptide and a mixture of the amino acids of which Cry-consensus peptide is composed were tested, and their (3)H-thymidine uptake was compared with that of Cry-consensus peptide. Finally, whether Cry-consensus peptide stimulates PBMCs from healthy subjects was investigated. The mean SI of JC patients showed a good correlation with Cry-consensus peptide concentration in the culture medium; however, the SI was independent of the anti-Cry j 1 IgE level. Heat-denatured Cry-consensus peptide retained a PBMC proliferation stimulatory effect comparable to the original Cry-consensus peptide, while the mixture of amino acids constituting Cry-consensus peptide did not stimulate PBMC proliferation. PBMCs from healthy subjects did not respond to Cry-consensus peptide at all. These data indicate that the PBMC response of patients suffering from JC pollinosis to Cry-consensus peptide is specific for the sequence of T cell epitopes thereof and may be useful for the evaluation of the efficacy of Cry-consensus peptide in vivo.
Structural basis for lack of ADP-ribosyltransferase activity in poly(ADP-ribose) polymerase-13/zinc finger antiviral protein.

PubMed

Karlberg, Tobias; Klepsch, Mirjam; Thorsell, Ann-Gerd; Andersson, C David; Linusson, Anna; Schüler, Herwig

2015-03-20

The mammalian poly(ADP-ribose) polymerase (PARP) family includes ADP-ribosyltransferases with diphtheria toxin homology (ARTD). Most members have mono-ADP-ribosyltransferase activity. PARP13/ARTD13, also called zinc finger antiviral protein, has roles in viral immunity and microRNA-mediated stress responses. PARP13 features a divergent PARP homology domain missing a PARP consensus sequence motif; the domain has enigmatic functions and apparently lacks catalytic activity. We used x-ray crystallography, molecular dynamics simulations, and biochemical analyses to investigate the structural requirements for ADP-ribosyltransferase activity in human PARP13 and two of its functional partners in stress granules: PARP12/ARTD12, and PARP15/BAL3/ARTD7. The crystal structure of the PARP homology domain of PARP13 shows obstruction of the canonical active site, precluding NAD(+) binding. Molecular dynamics simulations indicate that this closed cleft conformation is maintained in solution. Introducing consensus side chains in PARP13 did not result in 3-aminobenzamide binding, but in further closure of the site. Three-dimensional alignment of the PARP homology domains of PARP13, PARP12, and PARP15 illustrates placement of PARP13 residues that deviate from the PARP family consensus. Introducing either one of two of these side chains into the corresponding positions in PARP15 abolished PARP15 ADP-ribosyltransferase activity. Taken together, our results show that PARP13 lacks the structural requirements for ADP-ribosyltransferase activity. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Gacias, Mar; Perez-Marti, Albert; Pujol-Vidal, Magdalena

Highlights: The Cact gene is induced in mouse skeletal muscle after 24 h of fasting. The Cact gene contains a functional consensus sequence for ERR. This sequence binds ERRα both in vivo and in vitro. This ERRE is required for the activation of Cact expression by the PGC-1/ERR axis. Our results add Cact as a genuine gene target of these transcriptional regulators. Abstract: Carnitine/acylcarnitine translocase (CACT) is a mitochondrial-membrane carrier protein that mediates the transport of acylcarnitines into the mitochondrial matrix for their oxidation by the mitochondrial fatty acid-oxidation pathway. CACT deficiency causes a variety of pathological conditions, such as hypoketotic hypoglycemia, cardiac arrest, hepatomegaly, hepatic dysfunction and muscle weakness, and it can be fatal in newborns and infants. Here we report that expression of the Cact gene is induced in mouse skeletal muscle after 24 h of fasting. To gain insight into the control of Cact gene expression, we examined the transcriptional regulation of the mouse Cact gene. We show that the 5'-flanking region of this gene is transcriptionally active and contains a consensus sequence for the estrogen-related receptor (ERR), a member of the nuclear receptor family of transcription factors. This sequence binds ERRα in vivo and in vitro and is required for the activation of Cact expression by the peroxisome proliferator-activated receptor gamma coactivator (PGC)-1/ERR axis. We also demonstrate that XCT790, the inverse agonist of ERRα, specifically blocks Cact activation by PGC-1β in C2C12 cells.
The Use of Protein-DNA, Chromatin Immunoprecipitation, and Transcriptome Arrays to Describe Transcriptional Circuits in the Dehydrated Male Rat Hypothalamus

PubMed Central

Qiu, Jing; Kleineidam, Anna; Gouraud, Sabine; Yao, Song Tieng; Greenwood, Mingkwan; Hoe, See Ziau; Hindmarch, Charles

2014-01-01

The supraoptic nucleus (SON) of the hypothalamus is responsible for maintaining osmotic stability in mammals through its elaboration of the antidiuretic hormone arginine vasopressin. Upon dehydration, the SON undergoes a function-related plasticity, which includes remodeling of morphology, electrical properties, and biosynthetic activity. This process occurs alongside alterations in steady state transcript levels, which might be mediated by changes in the activity of transcription factors. In order to identify which transcription factors might be involved in changing patterns of gene expression, an Affymetrix protein-DNA array analysis was carried out. Nuclear extracts of SON from dehydrated and control male rats were analyzed for binding to the 345 consensus DNA transcription factor binding sequences of the array. Statistical analysis revealed significant changes in binding to 26 consensus elements, of which EMSA confirmed increased binding to signal transducer and activator of transcription (Stat) 1/Stat3, cellular Myelocytomatosis virus-like cellular proto-oncogene (c-Myc)-Myc-associated factor X (Max), and pre-B cell leukemia transcription factor 1 sequences after dehydration. Focusing on c-Myc and Max, we used quantitative PCR to confirm previous transcriptomic analysis that had suggested an increase in c-Myc, but not Max, mRNA levels in the SON after dehydration, and we demonstrated c-Myc- and Max-like immunoreactivities in SON arginine vasopressin-expressing cells. Finally, by comparing new data obtained from Roche-NimbleGen chromatin immunoprecipitation arrays with previously published transcriptomic data, we have identified putative c-Myc target genes whose expression changes in the SON after dehydration. These include known c-Myc targets, such as the Slc7a5 gene, which encodes the L-type amino acid transporter 1, ribosomal protein L24, histone deacetylase 2, and the Rat sarcoma proto-oncogene (Ras)-related nuclear GTPase. PMID:25144923
The human RNA-binding protein and E3 ligase MEX-3C binds the MEX-3-recognition element (MRE) motif with high affinity.

PubMed

Yang, Lingna; Wang, Chongyuan; Li, Fudong; Zhang, Jiahai; Nayab, Anam; Wu, Jihui; Shi, Yunyu; Gong, Qingguo

2017-09-29

MEX-3 is a K-homology (KH) domain-containing RNA-binding protein first identified as a translational repressor in Caenorhabditis elegans, and its four orthologs (MEX-3A-D) in human and mouse were subsequently found to have E3 ubiquitin ligase activity mediated by a RING domain and critical for RNA degradation. Current evidence implicates human MEX-3C in many essential biological processes and suggests a strong connection with immune diseases and carcinogenesis. The highly conserved dual KH domains in MEX-3 proteins enable RNA binding and are essential for the recognition of the 3'-UTR and post-transcriptional regulation of MEX-3 target transcripts. However, the molecular mechanisms of translational repression and the consensus RNA sequence recognized by the MEX-3C KH domain are unknown. Here, using X-ray crystallography and isothermal titration calorimetry, we investigated the RNA-binding activity and selectivity of human MEX-3C dual KH domains. Our high-resolution crystal structures of individual KH domains complexed with a noncanonical U-rich and a GA-rich RNA sequence revealed that the KH1/2 domains of human MEX-3C bound MRE10, a 10-mer RNA (5'-CAGAGUUUAG-3') consisting of an eight-nucleotide MEX-3-recognition element (MRE) motif, with high affinity. Of note, we also identified a consensus RNA motif recognized by human MEX-3C. The potential RNA-binding sites in the 3'-UTR of the human leukocyte antigen serotype (HLA-A2) mRNA were mapped with this RNA-binding motif and further confirmed by fluorescence polarization. The binding motif identified here will provide valuable information for future investigations of the functional pathways controlled by human MEX-3C and for predicting potential mRNAs regulated by this enzyme. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.
Catabolite repression in Lactobacillus casei ATCC 393 is mediated by CcpA.

PubMed Central

Monedero, V; Gosalbes, M J; Pérez-Martínez, G

1997-01-01

The chromosomal ccpA gene from Lactobacillus casei ATCC 393 has been cloned and sequenced. It encodes the CcpA protein, a central catabolite regulator belonging to the LacI-GalR family of bacterial repressors, and shows 54% identity with CcpA proteins from Bacillus subtilis and Bacillus megaterium. The L. casei ccpA gene was able to complement a B. subtilis ccpA mutant. An L. casei ccpA mutant showed increased doubling times and a relief of the catabolite repression of some enzymatic activities, such as N-acetylglucosaminidase and phospho-beta-galactosidase. Detailed analysis of CcpA activity was performed by using the promoter region of the L. casei chromosomal lacTEGF operon which is subject to catabolite repression and contains a catabolite responsive element (cre) consensus sequence. Deletion of this cre site or the presence of the ccpA mutation abolished the catabolite repression of a lacp::gusA fusion. These data support the role of CcpA as a common regulatory element mediating catabolite repression in low-GC-content gram-positive bacteria. PMID:9352913
Sel1-like repeat proteins in signal transduction.

PubMed

Mittl, Peer R E; Schneider-Brachert, Wulf

2007-01-01

Solenoid proteins, which are distinguished from general globular proteins by their modular architectures, are frequently involved in signal transduction pathways. Proteins from the tetratricopeptide repeat (TPR) and Sel1-like repeat (SLR) families share similar alpha-helical conformations but different consensus sequence lengths and superhelical topologies. Both families are characterized by low sequence similarity levels, rendering the identification of functional homologs difficult. Therefore current knowledge of the molecular and cellular functions of the SLR proteins Sel1, Hrd3, Chs4, Nif1, PodJ, ExoR, AlgK, HcpA, Hsp12, EnhC, LpnE, MotX, and MerG has been reviewed. Although SLR proteins possess different cellular functions, they all seem to serve as adaptor proteins for the assembly of macromolecular complexes. Sel1, Hrd3, Hsp12 and LpnE are activated under cellular stress. The eukaryotic Sel1 and Hrd3 proteins are involved in the ER-associated protein degradation, whereas the bacterial LpnE, EnhC, HcpA, ExoR, and AlgK proteins mediate the interactions between bacterial and eukaryotic host cells. LpnE and EnhC are responsible for the entry of L. pneumophila into epithelial cells and macrophages. ExoR from the symbiotic microorganism S. meliloti and AlgK from the pathogen P. aeruginosa regulate exopolysaccharide synthesis. Nif1 and Chs4 from yeast are responsible for the regulation of mitosis and septum formation during cell division, respectively, and PodJ guides the cellular differentiation during the cell cycle of the bacterium C. crescentus. Taken together, the SLR motif establishes a link between signal transduction pathways from eukaryotes and bacteria. The SLR motif is so far absent from archaea. Therefore the SLR could have developed in the last common ancestor between eukaryotes and bacteria.
R2R--software to speed the depiction of aesthetic consensus RNA secondary structures.

PubMed

Weinberg, Zasha; Breaker, Ronald R

2011-01-04

With continuing identification of novel structured noncoding RNAs, there is an increasing need to create schematic diagrams showing the consensus features of these molecules. RNA structural diagrams are typically made either with general-purpose drawing programs like Adobe Illustrator, or with automated or interactive programs specific to RNA. Unfortunately, the use of applications like Illustrator is extremely time consuming, while existing RNA-specific programs produce figures that are useful, but usually not of the same aesthetic quality as those produced at great cost in Illustrator. Additionally, most existing RNA-specific applications are designed for drawing single RNA molecules, not consensus diagrams. We created R2R, a computer program that facilitates the generation of aesthetic and readable drawings of RNA consensus diagrams in a fraction of the time required with general-purpose drawing programs. Since the inference of a consensus RNA structure typically requires a multiple-sequence alignment, the R2R user annotates the alignment with commands directing the layout and annotation of the RNA. R2R creates SVG or PDF output that can be imported into Adobe Illustrator, Inkscape or CorelDRAW. R2R can be used to create consensus sequence and secondary structure models for novel RNA structures or to revise models when new representatives for known RNA classes become available. Although R2R does not currently have a graphical user interface, it has proven useful in our efforts to create 100 schematic models of distinct noncoding RNA classes. R2R makes it possible to obtain high-quality drawings of the consensus sequence and structural models of many diverse RNA structures with a more practical amount of effort. R2R software is available at http://breaker.research.yale.edu/R2R and as an Additional file.
In vivo binding of PRDM9 reveals interactions with noncanonical genomic sites

PubMed Central

Grey, Corinne; Clément, Julie A.J.; Buard, Jérôme; Leblanc, Benjamin; Gut, Ivo; Gut, Marta; Duret, Laurent

2017-01-01

In mouse and human meiosis, DNA double-strand breaks (DSBs) initiate homologous recombination and occur at specific sites called hotspots. The localization of these sites is determined by the sequence-specific DNA binding domain of the PRDM9 histone methyl transferase. Here, we performed an extensive analysis of PRDM9 binding in mouse spermatocytes. Unexpectedly, we identified a noncanonical recruitment of PRDM9 to sites that lack recombination activity and the PRDM9 binding consensus motif. These sites include gene promoters, where PRDM9 is recruited in a DSB-dependent manner. Another subset reveals DSB-independent interactions between PRDM9 and genomic sites, such as the binding sites for the insulator protein CTCF. We propose that these DSB-independent sites result from interactions between hotspot-bound PRDM9 and genomic sequences located on the chromosome axis. PMID:28336543
The genome sequence of sweet cherry (Prunus avium) for use in genomics-assisted breeding.

PubMed

Shirasawa, Kenta; Isuzugawa, Kanji; Ikenaga, Mitsunobu; Saito, Yutaro; Yamamoto, Toshiya; Hirakawa, Hideki; Isobe, Sachiko

2017-10-01

We determined the genome sequence of sweet cherry (Prunus avium) using next-generation sequencing technology. The total length of the assembled sequences was 272.4 Mb, consisting of 10,148 scaffold sequences with an N50 length of 219.6 kb. The sequences covered 77.8% of the 352.9 Mb sweet cherry genome, as estimated by k-mer analysis, and included >96.0% of the core eukaryotic genes. We predicted 43,349 complete and partial protein-encoding genes. A high-density consensus map with 2,382 loci was constructed using double-digest restriction site-associated DNA sequencing. Comparing the genetic maps of sweet cherry and peach revealed high synteny between the two genomes; thus the scaffolds were integrated into pseudomolecules using map- and synteny-based strategies. Whole-genome resequencing of six modern cultivars found 1,016,866 SNPs and 162,402 insertions/deletions, out of which 0.7% were deleterious. The sequence variants, as well as simple sequence repeats, can be used as DNA markers. The genomic information helps us to identify agronomically important genes and will accelerate genetic studies and breeding programs for sweet cherries. Further information on the genomic sequences and DNA markers is available in DBcherry (http://cherry.kazusa.or.jp (8 May 2017, date last accessed)). © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
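For readers unfamiliar with the N50 statistic quoted in the abstract above, the following is a minimal sketch of how it is commonly computed from a list of scaffold lengths. The function name and the toy lengths are illustrative assumptions; they are not data from the sweet cherry assembly.

```python
def n50(lengths):
    """Return the N50 of a set of scaffold lengths.

    The N50 is the length L such that scaffolds of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (arbitrary units), not real data.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

In the toy example the two longest scaffolds (80 + 70) already hold half of the 300 total bases, so the N50 is the length of the second one.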
A retrotransposable element from the mosquito Anopheles gambiae.

PubMed Central

Besansky, N J

1990-01-01

A family of middle repetitive elements from the African malaria vector Anopheles gambiae is described. Approximately 100 copies of the element, designated T1Ag, are dispersed in the genome. Full-length elements are 4.6 kilobase pairs in length, but truncation of the 5' end is common. Nucleotide sequences of one full-length, two 5'-truncated, and two 5' ends of T1Ag elements were determined and aligned to define a consensus sequence. Sequence analysis revealed two long, overlapping open reading frames followed by a polyadenylation signal, AATAAA, and a tail consisting of tandem repetitions of the motif TGAAA. No direct or inverted long terminal repeats (LTRs) were detected. The first open reading frame, 442 amino acids in length, includes a domain resembling that of nucleic acid-binding proteins. The second open reading frame, 975 amino acids long, resembles the reverse transcriptases of a category of retrotransposable elements without LTRs, variously termed class II retrotransposons, class III elements or non-LTR retrotransposons. Similarity at the sequence and structural levels places T1Ag in this category. Images PMID:1689457
Low molecular weight squash trypsin inhibitors from Sechium edule seeds.

PubMed

Laure, Hélen J; Faça, Vítor M; Izumi, Clarice; Padovan, Júlio C; Greene, Lewis J

2006-02-01

Nine chromatographic components containing trypsin inhibitor activity were isolated from Sechium edule seeds by acetone fractionation, gel filtration, affinity chromatography and RP-HPLC in an overall yield of 46% of activity and 0.05% of protein. The components obtained with highest yield of total activity and highest specific activity were sequenced by Edman degradation and their molecular masses determined by mass spectrometry. The inhibitors contained 31, 32 and 27 residues per molecule and their sequences were: SETI-IIa, EDRKCPKILMRCKRDSDCLAKCTCQESGYCG; SETI-IIb, EEDRKCPKILMRCKRDSDCLAKCTCQESGYCG and SETI-V, CPRILMKCKLDTDCFPTCTCRPSGFCG. SETI-IIa and SETI-IIb, which differed by an amino-terminal E in the IIb form, were not separable under the conditions employed. The sequences are consistent with consensus sequences obtained from 37 other inhibitors: CPriI1meCk_DSDCla_C_C_G_CG, where capital letters are invariant amino acid residues and lower case letters are the most preserved in this position. SETI-II and SETI-V form complexes with trypsin with a 1:1 stoichiometry and have dissociation constants of 5.4 x 10^-11 M and 1.1 x 10^-9 M, respectively.
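The consensus notation described above (capitals for invariant residues, lower case for the most preserved residue at a position) can be illustrated with a short sketch. Everything below is an assumption made for illustration: the function name, the majority threshold, the use of "_" for positions with no clear majority, and the four toy sequences, which are not the 37 inhibitors referred to in the abstract.

```python
from collections import Counter


def consensus(aligned, majority=0.5):
    """Build a consensus string from equal-length aligned sequences.

    Upper case: residue is invariant in the column.
    Lower case: residue is present in more than `majority` of sequences.
    "_":        no residue passes the threshold (assumed convention).
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    out = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        if count == len(column):
            out.append(residue.upper())
        elif count / len(column) > majority:
            out.append(residue.lower())
        else:
            out.append("_")
    return "".join(out)


# Hypothetical toy alignment, not real inhibitor sequences.
toy_alignment = ["CPRIL", "CPKIL", "CPRVM", "CPRIA"]
print(consensus(toy_alignment))
```

A stricter or looser threshold changes which positions are reported in lower case, so published consensus strings depend on the rule their authors chose.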
Analysis by mutagenesis of the ATP binding site of the gamma subunit of skeletal muscle phosphorylase kinase expressed using a baculovirus system.

PubMed

Lee, J H; Maeda, S; Angelos, K L; Kamita, S G; Ramachandran, C; Walsh, D A

1992-11-03

Active gamma subunit of skeletal muscle phosphorylase kinase has been obtained by expression of the rat soleus cDNA in a baculovirus system. The protein exhibited the expected pH 6.8/8.2 activity ratio of 0.6, and its activity was insensitive to Ca2+ addition, indicating that it was free gamma subunit and not a gamma subunit-calmodulin complex. It was stimulated approximately 2-fold by Ca(2+)-calmodulin addition, demonstrating that it had retained high-affinity calmodulin binding. By site-directed mutagenesis, we have examined the role of six of the amino acids that constitute the consensus ATP binding site of the protein kinase, which in the gamma subunit is represented by the sequence 26Gly.Arg.Gly.Val.Ser.Ser.Val.Val33. Changes were evaluated by the kinetic determination of the dissociation constants of gamma-ATP, gamma-ADP, gamma-AMP.PCP, and gamma-phosphorylase and the maximum catalytic activity. The mutants Ser26-gamma, Ser29-gamma, Phe30-gamma, and Gly31-gamma each exhibited an essentially identical dissociation constant for gamma subunit phosphorylase, indicating that these mutations had not caused a global alteration in the protein structure but were limited to changes in the nucleotide binding site domain. Substitution of either Val33 (by Gly) or Gly28 (by Ser), two of the most conserved residues in all protein kinases, resulted in enzyme with marginally detectable activity. In noted contrast, the Ser26 mutant, which substituted the first glycine of the consensus glycine trio motif, and which is also very highly conserved, retained at least 25% of the enzymatic activity. The Gly31 substitution, which restored a glycine to a position characteristic for most protein kinases, had little overall effect upon the maximum rate of catalysis. Restoration of Ser30 to the more typical phenylalanine, which is present in most protein kinases, had minimal effect on catalysis. 
These data provide the first direct evaluation of the roles that different residues play within this consensus glycine trio/valine motif of the protein kinases, which up to now have only been surmised to be of importance because of their conservation. Two unexpected findings are that for one residue that is very conserved (Gly26) there is some flexibility of substitution not apparent from the evolutionary conservation and that a second quite conserved residue in protein kinases (equivalent to Gly at position 31) does not produce a protein optimized for nucleotide binding.
Characterization of a species-specific repetitive DNA from a highly endangered wild animal, Rhinoceros unicornis, and assessment of genetic polymorphism by microsatellite associated sequence amplification (MASA).

PubMed

Ali, S; Azfer, M A; Bashamboo, A; Mathur, P K; Malik, P K; Mathur, V B; Raha, A K; Ansari, S

1999-03-04

We have cloned and sequenced a 906bp EcoRI repeat DNA fraction from Rhinoceros unicornis genome. The contig pSS(R)2 is AT rich with 340 A (37.53%), 187 C (20.64%), 173 G (19.09%) and 206 T (22.74%). The sequence contains MALT box, NF-E1, Poly-A signal, lariat consensus sequences, TATA box, translational initiation sequences and several stop codons. Translation of the contig showed seven different types of protein motifs, among which, EGF-like domain cysteine pattern signatures and Bowman-Birk serine protease inhibitor family signatures were prominent. The presence of eukaryotic transcriptional elements, protein signatures and analysis of subset sequences in the 5' region from 1 to 165nt indicating coding potential (test code value=0.97) suggest possible regulatory and/or functional role(s) of these sequences in the rhino genome. Translation of the complementary strand from 906 to 706nt and 190 to 2nt showed proteins of more than 7kDa rich in non-polar residues. This suggests that pSS(R)2 is either a part of, or adjacent to, a functional gene. The contig contains mostly non-consecutive simple repeat units from 2 to 17nt with varying frequencies, of which four base motifs were found to be predominant. Zoo-blot hybridization revealed that pSS(R)2 sequences are unique to R. unicornis genome because they do not cross-hybridize, even with the genomic DNA of South African black rhino Diceros bicornis. Southern blot analysis of R. unicornis genomic DNA with pSS(R)2 and other synthetic oligo probes revealed a high level of genetic homogeneity, which was also substantiated by microsatellite associated sequence amplification (MASA). Owing to its uniqueness, the pSS(R)2 probe has a potential application in the area of conservation biology for unequivocal identification of horn or other body tissues of R. unicornis. The evolutionary aspect of this repeat fraction in the context of comparative genome analysis is discussed.
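The base composition quoted in the abstract above can be checked with a few lines of arithmetic. This is only a sketch that recomputes the percentages from the stated counts; the variable names are illustrative.

```python
# Base counts as stated in the abstract for the 906 bp contig.
counts = {"A": 340, "C": 187, "G": 173, "T": 206}

total = sum(counts.values())
# Percentage of each base, rounded to two decimals.
percent = {base: round(100 * n / total, 2) for base, n in counts.items()}
# Combined A+T content, the basis for calling the contig "AT rich".
at_percent = round(100 * (counts["A"] + counts["T"]) / total, 2)

print(total, percent, at_percent)
```

The counts sum to the stated 906 bp, and the combined A+T share comes out at roughly 60%.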
(S)-3-hydroxy-3-methylglutaryl coenzyme A reductase, a product of the mva operon of Pseudomonas mevalonii, is regulated at the transcriptional level.

PubMed Central

Wang, Y L; Beach, M J; Rodwell, V W

1989-01-01

We have cloned and sequenced a 505-base-pair (bp) segment of DNA situated upstream of mvaA, the structural gene for (S)-3-hydroxy-3-methylglutaryl coenzyme A reductase (EC 1.1.1.88) of Pseudomonas mevalonii. The DNA segment that we characterized includes the promoter region for the mva operon. Nuclease S1 mapping and primer extension analysis showed that mvaA is the promoter-proximal gene of the mva operon. Transcription initiates at -56 bp relative to the first A (+1) of the translation start site. Transcription in vivo was induced by mevalonate. Structural features of the mva promoter region include an 80-bp A + T-rich region, and -12, -24 consensus sequences that resemble sequences of sigma 54 promoters in enteric organisms. The relative amplitudes of catalytic activity, enzyme protein, and mvaA mRNA are consistent with a model of regulation of this operon at the transcriptional level. Images PMID:2477360
Direct repeat sequences are essential for function of the cis-acting locus of transfer (clt) of Streptomyces phaeochromogenes plasmid pJV1.

PubMed

Franco, Bernardo; González-Cerón, Gabriela; Servín-González, Luis

2003-11-01

The functionality of direct and inverted repeat sequences inside the cis acting locus of transfer (clt) of the Streptomyces plasmid pJV1 was determined by testing the effect of different deletions on plasmid transfer. The results show that the single most important element for pJV1 clt function is a series of evenly spaced 9 bp long direct repeats which match the consensus CCGCACA(C/G)(C/G), since their deletion caused a dramatic reduction in plasmid transfer. The presence of these repeats in the absence of any other clt sequences allowed plasmid transfer to occur at a frequency that was at least two orders of magnitude higher than that obtained in the complete absence of clt. A database search revealed regions with a similar organization, and in the same position, in Streptomyces plasmids pSN22 and pSLS, which have transfer proteins homologous to those of pJV1.
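As an illustration of how a degenerate consensus such as CCGCACA(C/G)(C/G) can be searched for, the sketch below translates it into a regular expression and scans a sequence for matches. The function name and the toy sequence are hypothetical; only the consensus itself comes from the abstract above, and only the given strand is scanned.

```python
import re

# CCGCACA(C/G)(C/G) written as a regex; the lookahead lets overlapping
# occurrences be reported as well.
CLT_REPEAT = re.compile(r"(?=(CCGCACA[CG][CG]))")


def find_repeats(seq):
    """Return (0-based start, matched 9-mer) for every consensus match."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in CLT_REPEAT.finditer(seq)]


# Hypothetical toy sequence: two matching repeats and one near miss
# (CCGCACATC has a T where the consensus requires C or G).
toy = "TT" + "CCGCACACG" + "AAAA" + "CCGCACAGG" + "TTTT" + "CCGCACATC"
print(find_repeats(toy))
```

Checking whether hits are evenly spaced, as described for the pJV1 clt locus, would then amount to comparing the differences between successive start positions.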

A novel atypical hemolytic uremic syndrome-associated hybrid CFHR1/CFH gene encoding a fusion protein that antagonizes factor H-dependent complement regulation.

PubMed

Valoti, Elisabetta; Alberti, Marta; Tortajada, Agustin; Garcia-Fernandez, Jesus; Gastoldi, Sara; Besso, Luca; Bresin, Elena; Remuzzi, Giuseppe; Rodriguez de Cordoba, Santiago; Noris, Marina

2015-01-01

Genomic aberrations affecting the genes encoding factor H (FH) and the five FH-related proteins (FHRs) have been described in patients with atypical hemolytic uremic syndrome (aHUS), a rare condition characterized by microangiopathic hemolytic anemia, thrombocytopenia, and ARF. These genomic rearrangements occur through nonallelic homologous recombinations caused by the presence of repeated homologous sequences in CFH and CFHR1-R5 genes. In this study, we found heterozygous genomic rearrangements among CFH and CFHR genes in 4.5% of patients with aHUS. CFH/CFHR rearrangements were associated with poor clinical prognosis and high risk of post-transplant recurrence. Five patients carried known CFH/CFHR1 genes, but we found a duplication leading to a novel CFHR1/CFH hybrid gene in a family with two affected subjects. The resulting fusion protein contains the first four short consensus repeats of FHR1 and the terminal short consensus repeat 20 of FH. In an FH-dependent hemolysis assay, we showed that the hybrid protein causes sheep erythrocyte lysis. Functional analysis of the FHR1 fraction purified from serum of heterozygous carriers of the CFHR1/CFH hybrid gene indicated that the FHR1/FH hybrid protein acts as a competitive antagonist of FH. Furthermore, sera from carriers of the hybrid CFHR1/CFH gene induced more C5b-9 deposition on endothelial cells than control serum. These results suggest that this novel genomic hybrid mediates disease pathogenesis through dysregulation of complement at the endothelial cell surface. We recommend that genetic screening of aHUS includes analysis of CFH and CFHR rearrangements, particularly before a kidney transplant. Copyright © 2015 by the American Society of Nephrology.
Isolation and characterization of the gene coding for Escherichia coli arginyl-tRNA synthetase.

PubMed Central

Eriani, G; Dirheimer, G; Gangloff, J

1989-01-01

The gene coding for Escherichia coli arginyl-tRNA synthetase (argS) was isolated as a fragment of 2.4 kb after analysis and subcloning of recombinant plasmids from the Clarke and Carbon library. The clone bearing the gene overproduces arginyl-tRNA synthetase by a factor of 100. This means that the enzyme represents more than 20% of the total cellular protein content. Sequencing revealed that the fragment contains a unique open reading frame of 1734 bp flanked at its 5' and 3' ends respectively by 247 bp and 397 bp. The length of the corresponding protein (577 aa) is consistent with earlier Mr determination (about 70 kd). Primer extension analysis of the ArgRS mRNA by reverse transcriptase located its 5' end respectively at 8 and 30 nucleotides downstream of a TATA and a TTGAC-like element (CTGAC) and 60 nucleotides upstream of the unusual translation initiation codon GUG; nuclease S1 analysis located the 3'-end at 48 bp downstream of the translation termination codon. argS has a codon usage pattern typical for highly expressed E. coli genes. With the exception of the presence of a HVGH sequence similar to the HIGH consensus element, ArgRS has no relevant sequence homologies with other aminoacyl-tRNA synthetases. Images PMID:2668891
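The figures in the abstract above can be cross-checked with simple arithmetic. The sketch below assumes that the 1734 bp open reading frame includes the stop codon, and it uses the common rule of thumb of about 110 Da per amino acid residue; both are assumptions for illustration, not statements from the paper.

```python
def protein_length(orf_bp):
    """Number of residues encoded by an ORF that includes its stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3 - 1  # subtract the stop codon


ORF_BP = 1734                 # from the abstract
FLANK_5, FLANK_3 = 247, 397   # flanking regions, from the abstract
AVG_RESIDUE_DA = 110          # rule-of-thumb average residue mass (assumption)

residues = protein_length(ORF_BP)
fragment_bp = FLANK_5 + ORF_BP + FLANK_3
approx_kda = residues * AVG_RESIDUE_DA / 1000

print(residues, fragment_bp, approx_kda)
```

Under these assumptions the open reading frame gives 577 residues, the three segments add up to 2378 bp (about 2.4 kb), and the rough mass estimate is in the same range as the earlier Mr determination.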
Cloning of cellobiose phosphoenolpyruvate-dependent phosphotransferase genes: Functional expression in recombinant Escherichia coli and identification of a putative binding region for disaccharides

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lai, Xiaokuang; Davis, F.C.; Ingram, L.O.

1997-02-01

Genomic libraries from nine cellobiose-metabolizing bacteria were screened for cellobiose utilization. Positive clones were recovered from six libraries, all of which encode phosphoenolpyruvate:carbohydrate phosphotransferase system (PTS) proteins. Clones from Bacillus subtilis, Butyrivibrio fibrisolvens, and Klebsiella oxytoca allowed the growth of recombinant Escherichia coli in cellobiose-M9 minimal medium. The K. oxytoca clone, pLOI1906, exhibited an unusually broad substrate range (cellobiose, arbutin, salicin, and methylumbelliferyl derivatives of glucose, cellobiose, mannose, and xylose) and was sequenced. The insert in this plasmid encoded the carboxy-terminal region of a putative regulatory protein, cellobiose permease (single polypeptide), and phospho-beta-glucosidase, which appear to form an operon (casRAB). Subclones allowed both casA and casB to be expressed independently, as evidenced by in vitro complementation. An analysis of the translated sequences from the EIIC domains of cellobiose, aryl-beta-glucoside, and other disaccharide permeases allowed the identification of a 50-amino-acid conserved region. A disaccharide consensus sequence is proposed for the most conserved segment (13 amino acids), which may represent part of the EIIC active site for binding and phosphorylation. 63 refs., 4 figs., 4 tabs.
Using a color-coded ambigraphic nucleic acid notation to visualize conserved palindromic motifs within and across genomes

PubMed Central

2014-01-01

Background Ambiscript is a graphically-designed nucleic acid notation that uses symbol symmetries to support sequence complementation, highlight biologically-relevant palindromes, and facilitate the analysis of consensus sequences. Although the original Ambiscript notation was designed to easily represent consensus sequences for multiple sequence alignments, the notation’s black-on-white ambiguity characters are unable to reflect the statistical distribution of nucleotides found at each position. We now propose a color-augmented ambigraphic notation to encode the frequency of positional polymorphisms in these consensus sequences. Results We have implemented this color-coding approach by creating an Adobe Flash® application (http://www.ambiscript.org) that shades and colors modified Ambiscript characters according to the prevalence of the encoded nucleotide at each position in the alignment. The resulting graphic helps viewers perceive biologically-relevant patterns in multiple sequence alignments by uniquely combining color, shading, and character symmetries to highlight palindromes and inverted repeats in conserved DNA motifs. Conclusion Juxtaposing an intuitive color scheme over the deliberate character symmetries of an ambigraphic nucleic acid notation yields a highly-functional nucleic acid notation that maximizes information content and successfully embodies key principles of graphic excellence put forth by the statistician and graphic design theorist, Edward Tufte. PMID:24447494
Novel Interactions of the TRTK12 Peptide with S100 Protein Family Members: Specificity and Thermodynamic Characterization

PubMed Central

Wafer, Lucas N.; Tzul, Franco O.; Pandharipande, Pranav P.; Makhatadze, George I.

2013-01-01

The S100 protein family consists of small, dimeric proteins that exert their biological functions in response to changing calcium concentrations. S100B is the best studied member and has been shown to interact with over 20 binding partners in a calcium-dependent manner. The TRTK12 peptide, derived from the consensus binding sequence for S100B, has previously been found to interact with S100A1 and has been proposed to be a general binding partner of the S100 family. To test this hypothesis and gain a better understanding of the specificity of binding for the S100 proteins, sixteen members of the human S100 family were screened against this peptide and its alanine variants. Novel interactions were only found with two family members: S100P and S100A2, indicating that TRTK12 selectively interacts with a small subset of the S100 proteins. Substantial promiscuity was observed in the binding site of S100B to accommodate variations in the peptide sequence, while S100A1, S100A2, and S100P exhibited larger differences in the binding constants for the TRTK12 alanine variants. This suggests that single-point substitutions can be used to selectively modulate the affinity of TRTK12 peptides for individual S100 proteins. This study has important implications for the rational drug design of inhibitors for the S100 proteins, which are involved in a variety of cancers and neurodegenerative diseases. PMID:23899389
Nucleotide sequence of the gag gene and gag-pol junction of feline leukemia virus.

PubMed Central

Laprevotte, I; Hampe, A; Sherr, C J; Galibert, F

1984-01-01

The nucleotide sequence of the gag gene of feline leukemia virus and its flanking sequences were determined and compared with the corresponding sequences of two strains of feline sarcoma virus and with that of the Moloney strain of murine leukemia virus. A high degree of nucleotide sequence homology between the feline leukemia virus and murine leukemia virus gag genes was observed, suggesting that retroviruses of domestic cats and laboratory mice have a common, proximal evolutionary progenitor. The predicted structure of the complete feline leukemia virus gag gene precursor suggests that the translation of nonglycosylated and glycosylated gag gene polypeptides is initiated at two different AUG codons. These initiator codons fall in the same reading frame and are separated by a 222-base-pair segment which encodes an amino terminal signal peptide. The nucleotide sequence predicts the order of amino acids in each of the individual gag-coded proteins (p15, p12, p30, p10), all of which derive from the gag gene precursor. Stable stem-and-loop secondary structures are proposed for two regions of viral RNA. The first falls within sequences at the 5' end of the viral genome, together with adjacent palindromic sequences which may play a role in dimer linkage of RNA subunits. The second includes coding sequences at the gag-pol junction and is proposed to be involved in translation of the pol gene product. Sequence analysis of the latter region shows that the gag and pol genes are translated in different reading frames. Classical consensus splice donor and acceptor sequences could not be localized to regions which would permit synthesis of the expected gag-pol precursor protein. Alternatively, we suggest that the pol gene product (RNA-dependent DNA polymerase) could be translated by a frameshift suppressing mechanism which could involve cleavage modification of stems and loops in a manner similar to that observed in tRNA processing. PMID:6328019
An Enhanced Synthetic Multiclade DNA Prime Induces Improved Cross-Clade-Reactive Functional Antibodies when Combined with an Adjuvanted Protein Boost in Nonhuman Primates

PubMed Central

Wise, Megan C.; Hutnick, Natalie A.; Pollara, Justin; Myles, Devin J. F.; Williams, Constance; Yan, Jian; LaBranche, Celia C.; Khan, Amir S.; Sardesai, Niranjan Y.; Montefiori, David; Barnett, Susan W.; Zolla-Pazner, Susan; Ferrari, Guido

2015-01-01

ABSTRACT The search for an efficacious human immunodeficiency virus type 1 (HIV-1) vaccine remains a pressing need. The moderate success of the RV144 Thai clinical vaccine trial suggested that vaccine-induced HIV-1-specific antibodies can reduce the risk of HIV-1 infection. We have made several improvements to the DNA platform and have previously shown that improved DNA vaccines alone are capable of inducing both binding and neutralizing antibodies in small-animal models. In this study, we explored how an improved DNA prime and recombinant protein boost would impact HIV-specific vaccine immunogenicity in rhesus macaques (RhM). After DNA immunization with either a single HIV Env consensus sequence or multiple constructs expressing HIV subtype-specific Env consensus sequences, we detected both CD4+ and CD8+ T-cell responses to all vaccine immunogens. These T-cell responses were further increased after protein boosting to levels exceeding those of DNA-only or protein-only immunization. In addition, we observed antibodies that exhibited robust cross-clade binding and neutralizing and antibody-dependent cellular cytotoxicity (ADCC) activity after immunization with the DNA prime-protein boost regimen, with the multiple-Env formulation inducing a more robust and broader response than the single-Env formulation. The magnitude and functionality of these responses emphasize the strong priming effect improved DNA immunogens can induce, which are further expanded upon protein boost. These results support further study of an improved synthetic DNA prime together with a protein boost for enhancing anti-HIV immune responses. IMPORTANCE Even with effective antiretroviral drugs, HIV remains an enormous global health burden. Vaccine development has been problematic in part due to the high degree of diversity and poor immunogenicity of the HIV Env protein. 
Studies suggest that a relevant HIV vaccine will likely need to induce broad cellular and humoral responses from a simple vaccine regimen due to the resource-limited setting in which the HIV pandemic is most rampant. DNA vaccination lends itself well to increasing the amount of diversity included in a vaccine due to the ease of manufacturing multiple plasmids and formulating them as a single immunization. By increasing the number of Envs within a formulation, we were able to show an increased breadth of responses as well as improved functionality induced in a nonhuman primate model. This increased breadth could be built upon, leading to better coverage against circulating strains with broader vaccine-induced protection. PMID:26085155
Structure of the human gene encoding the protein repair L-isoaspartyl (D-aspartyl) O-methyltransferase.

PubMed

DeVry, C G; Tsai, W; Clarke, S

1996-11-15

The protein L-isoaspartyl/D-aspartyl O-methyltransferase (EC 2.1.1.77) catalyzes the first step in the repair of proteins damaged in the aging process by isomerization or racemization reactions at aspartyl and asparaginyl residues. A single gene has been localized to human chromosome 6 and multiple transcripts arising through alternative splicing have been identified. Restriction enzyme mapping, subcloning, and DNA sequence analysis of three overlapping clones from a human genomic library in bacteriophage P1 indicate that the gene spans approximately 60 kb and is composed of 8 exons interrupted by 7 introns. Analysis of intron/exon splice junctions reveals that all of the donor and acceptor splice sites are in agreement with the mammalian consensus splicing sequence. Determination of transcription initiation sites by primer extension analysis of poly(A)+ mRNA from human brain identifies multiple start sites, with a major site 159 nucleotides upstream from the ATG start codon. Sequence analysis of the 5'-untranslated region demonstrates several potential cis-acting DNA elements including SP1, ETF, AP1, AP2, ARE, XRE, CREB, MED-1, and half-palindromic ERE motifs. The promoter of this methyltransferase gene lacks an identifiable TATA box but is characterized by a CpG island which begins approximately 723 nucleotides upstream of the major transcriptional start site and extends through exon 1 and into the first intron. These features are characteristic of housekeeping genes and are consistent with the wide tissue distribution observed for this methyltransferase activity.
Resolution of Site-Specific Conformational Heterogeneity in Proline-Rich Molecular Recognition by Src Homology 3 Domains.

PubMed

Horness, Rachel E; Basom, Edward J; Mayer, John P; Thielges, Megan C

2016-02-03

Conformational heterogeneity and dynamics are increasingly evoked in models of protein molecular recognition but are challenging to experimentally characterize. Here we combine the inherent temporal resolution of infrared (IR) spectroscopy with the spatial resolution afforded by selective incorporation of carbon-deuterium (C-D) bonds, which provide frequency-resolved absorptions within a protein IR spectrum, to characterize the molecular recognition of the Src homology 3 (SH3) domain of the yeast protein Sho1 with its cognate proline-rich (PR) sequence of Pbs2. The IR absorptions of C-D bonds introduced at residues along a peptide of the Pbs2 PR sequence report on the changes in the local environments upon binding to the SH3 domain. Interestingly, upon forming the complex the IR spectra of the peptides labeled with C-D bonds at either of the two conserved prolines of the PXXP consensus recognition sequence show more absorptions than there are C-D bonds, providing evidence for the population of multiple states. In contrast, the NMR spectra of the peptides labeled with (13)C at the same residues show only single resonances, indicating rapid interconversion on the NMR time scale. Thus, the data suggest that the SH3 domain recognizes its cognate peptide with a component of induced fit molecular recognition involving the adoption of multiple states, which have previously gone undetected due to interconversion between the populated states that is too fast to resolve using conventional methods.
A consensus-hemagglutinin-based vaccine delivered by an attenuated Salmonella mutant protects chickens against heterologous H7N1 influenza virus.

PubMed

Hyoung, Kim Je; Hajam, Irshad Ahmed; Lee, John Hwa

2017-06-13

H7N3 and H7N7 are highly pathogenic avian influenza (HPAI) viruses and have posed a great threat not only to the poultry industry but to human health as well. H7N9, a low pathogenic avian influenza (LPAI) virus, is also highly pathogenic to humans, and there is great concern that these H7 subtypes could acquire the ability to spread efficiently between humans, thereby becoming a pandemic threat. A vaccine candidate covering all three subtypes must, therefore, be an integral part of any pandemic preparedness plan. To address this need, we constructed a consensus hemagglutinin (HA) sequence of H7N3, H7N7, and H7N9 based on the data available in the NCBI in early 2012-2015. This artificial sequence was then optimized for protein expression before being transformed into an attenuated auxotrophic mutant of Salmonella Typhimurium, JOL1863 strain. Immunizing chickens with JOL1863, delivered intramuscularly, nasally or orally, elicited efficient humoral and cell mediated immune responses, independently of the route of vaccination. Our results also showed that JOL1863 delivers efficient maturation signals to chicken monocyte derived dendritic cells (MoDCs), which were characterized by upregulation of costimulatory molecules and higher cytokine induction. Moreover, immunization with JOL1863 in chickens conferred significant protection against the heterologous LPAI H7N1 virus challenge, as indicated by reduced viral shedding in the cloacal swabs. We conclude that this vaccine, based on a consensus HA, could induce a broader spectrum of protection against divergent H7 influenza viruses and thus warrants further study.
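The abstract above does not say how the consensus HA sequence was derived. As a minimal sketch of one common approach, assuming a simple per-column majority rule over pre-aligned sequences, the idea can be illustrated as follows. The function name `majority_consensus` and the toy five-residue sequences are hypothetical and are not taken from the study.

```python
from collections import Counter


def majority_consensus(aligned):
    """Return the most frequent residue at each column of an alignment.

    Assumes the input sequences are already aligned to equal length.
    Ties are broken by first occurrence, which is an arbitrary choice.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    consensus = []
    for column in zip(*aligned):
        residue, _count = Counter(column).most_common(1)[0]
        consensus.append(residue)
    return "".join(consensus)


# Toy alignment (hypothetical residues, not real HA sequences).
toy_alignment = ["MKTAL", "MKSAL", "MRTAL"]
print(majority_consensus(toy_alignment))
```

A real pipeline would first build a multiple sequence alignment and would need an explicit policy for gaps and ties; neither is handled here.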
The transcription factor CCAAT-binding factor CBF/NF-Y regulates the proximal promoter activity in the human alpha 1(XI) collagen gene (COL11A1).

PubMed

Matsuo, Noritaka; Yu-Hua, Wang; Sumiyoshi, Hideaki; Sakata-Takatani, Keiko; Nagato, Hitoshi; Sakai, Kumiko; Sakurai, Mami; Yoshioka, Hidekatsu

2003-08-29

We have characterized the proximal promoter region of the human COL11A1 gene. Transient transfection assays indicate that the segment from -199 to +1 is necessary for the activation of basal transcription. Electrophoretic mobility shift assays (EMSAs) demonstrated that the ATTGG sequence, within the -147 to -121 fragment, is critical to bind nuclear proteins in the proximal COL11A1 promoter. We demonstrated that the CCAAT binding factor (CBF/NF-Y) bound to this region using an interference assay with consensus oligonucleotides and a supershift assay with specific antibodies in an EMSA. In a chromatin immunoprecipitation assay and EMSA using DNA-affinity-purified proteins, CBF/NF-Y proteins directly bound this region in vitro and in vivo. We also showed that four tandem copies of the CBF/NF-Y-binding fragment produced higher transcriptional activity than one or two copies, whereas the absence of a CBF/NF-Y-binding fragment suppressed the COL11A1 promoter activity. Furthermore, overexpression of a dominant-negative CBF-B/NF-YA subunit significantly inhibited promoter activity in both transient and stable cells. These results indicate that the CBF/NF-Y proteins regulate the transcription of COL11A1 by directly binding to the ATTGG sequence in the proximal promoter region.
Putative Nonribosomal Peptide Synthetase and Cytochrome P450 Genes Responsible for Tentoxin Biosynthesis in Alternaria alternata ZJ33.

PubMed

Li, You-Hai; Han, Wen-Jin; Gui, Xi-Wu; Wei, Tao; Tang, Shuang-Yan; Jin, Jian-Ming

2016-08-02

Tentoxin, a cyclic tetrapeptide produced by several Alternaria species, inhibits the F₁-ATPase activity of chloroplasts, resulting in chlorosis in sensitive plants. In this study, we report two clustered genes, encoding a putative non-ribosome peptide synthetase (NRPS) TES and a cytochrome P450 protein TES1, that are required for tentoxin biosynthesis in Alternaria alternata strain ZJ33, which was isolated from blighted leaves of Eupatorium adenophorum. Using a pair of primers designed according to the consensus sequences of the adenylation domain of NRPSs, two fragments containing putative adenylation domains were amplified from A. alternata ZJ33, and subsequent PCR analyses demonstrated that these fragments belonged to the same NRPS coding sequence. With no introns, TES consists of a single 15,486 base pair open reading frame encoding a predicted 5161 amino acid protein. Meanwhile, the TES1 gene is predicted to contain five introns and encode a 506 amino acid protein. The TES protein is predicted to be comprised of four peptide synthase modules with two additional N-methylation domains, and the number and arrangement of the modules in TES were consistent with the number and arrangement of the amino acid residues of tentoxin, respectively. Notably, both TES and TES1 null mutants generated via homologous recombination failed to produce tentoxin. This study provides the first evidence concerning the biosynthesis of tentoxin in A. alternata.
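The reported TES figures are arithmetically consistent if the 15,486 bp open reading frame is assumed to include the stop codon. The short check below sketches that arithmetic; the helper name `protein_length` and the stop-codon assumption are illustrative and are not stated in the abstract.

```python
def protein_length(orf_bp, includes_stop=True):
    """Number of amino acids encoded by an intron-free ORF of orf_bp base pairs.

    Assumes the ORF length is a whole number of codons and, by default,
    that the final codon is a stop codon that encodes no residue.
    """
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


# 15,486 bp / 3 = 5,162 codons; minus the stop codon gives 5,161 residues.
print(protein_length(15486))
```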
DOE Office of Scientific and Technical Information (OSTI.GOV)

Schriner, J.E.; Yi, W.; Hofmann, S.L.

Palmitoyl-protein thioesterase (PPT) is a small glycoprotein that removes palmitate groups from cysteine residues in lipid-modified proteins. We recently reported mutations in PPT in patients with infantile neuronal ceroid lipofuscinosis (INCL), a severe neurodegenerative disorder. INCL is characterized by the accumulation of proteolipid storage material in brain and other tissues, suggesting that the disease is a consequence of abnormal catabolism of acylated proteins. In the current paper, we report the sequence of the human PPT cDNA and the structure of the human PPT gene. The cDNA predicts a protein of 306 amino acids that contains a 25-amino-acid signal peptide, three N-linked glycosylation sites, and consensus motifs characteristic of thioesterases. Northern analysis of a human tissue blot revealed ubiquitous expression of a single 2.5-kb mRNA, with highest expression in lung, brain, and heart. The human PPT gene spans 25 kb and is composed of seven coding exons and a large eighth exon, containing the entire 3'-untranslated region of 1388 bp. An Alu repeat and promoter elements corresponding to putative binding sites for several general transcription factors were identified in the 1060 nucleotides upstream of the transcription start site. The human PPT cDNA sequence and gene structure will provide the means for the identification of further causative mutations in INCL and facilitate genetic screening in selected high-risk populations. 31 refs., 5 figs., 1 tab.
Molecular cloning and expression of rat brain endopeptidase 3.4.24.16.

PubMed

Dauch, P; Vincent, J P; Checler, F

1995-11-10

We have isolated by immunological screening of a lambda ZAPII cDNA library constructed from rat brain mRNAs a cDNA clone encoding endopeptidase 3.4.24.16. The longest open reading frame encodes a 704-amino acid protein with a theoretical molecular mass of 80,202 daltons and bears the consensus sequence of the zinc metalloprotease family. The sequence exhibits a 60.2% homology with those of another zinc metallopeptidase, endopeptidase 3.4.24.15. Northern blot analysis reveals two mRNA species of about 3 and 5 kilobases in rat brain, ileum, kidney, and testis. We have transiently transfected COS-7 cells with pcDNA3 containing the cloned cDNA and established the overexpression of a 70-75-kDa immunoreactive protein. This protein hydrolyzes QFS, a quenched fluorimetric substrate of endopeptidase 3.4.24.16, and cleaves neurotensin at a single peptide bond, leading to the formation of neurotensin (1-10) and neurotensin (11-13). QFS and neurotensin hydrolysis are potently inhibited by the selective endopeptidase 3.4.24.16 dipeptide blocker Pro-Ile and by dithiothreitol, while the enzymatic activity remains unaffected by phosphoramidon and captopril, the specific inhibitors of endopeptidase 3.4.24.11 and angiotensin-converting enzyme, respectively. Altogether, these physicochemical, biochemical, and immunological properties unambiguously identify endopeptidase 3.4.24.16 as the protein encoded by the isolated cDNA clone.
Consensus guided mutagenesis of Renilla luciferase yields enhanced stability and light output.

PubMed

Loening, Andreas Markus; Fenn, Timothy David; Wu, Anna M; Gambhir, Sanjiv Sam

2006-09-01

Luciferases, which have seen expansive employment as reporter genes in biological research, could also be used in applications where the protein itself is conjugated to ligands to create probes that are appropriate for use in small animal imaging. As the bioluminescence activity of commonly used luciferases is too labile in serum to permit this application, specific mutations of Renilla luciferase, selected using a consensus sequence driven strategy, were screened for their ability to confer stability of activity in serum as well as their light output. Using this information, a total of eight favorable mutations were combined to generate a mutant Renilla luciferase (RLuc8) that, compared with the parental enzyme, is 200-fold more resistant to inactivation in murine serum and exhibits a 4-fold improvement in light output. Results of the mutational analysis were also used to generate a double mutant optimized for use as a reporter gene. The double mutant had half the resistance to inactivation in serum of the native enzyme while yielding a 5-fold improvement in light output. These variants of Renilla luciferase, which exhibit significantly improved properties compared with the native enzyme, will allow enhanced sensitivity in existing luciferase-based assays as well as enable the development of novel probes labeled with the luciferase protein.
Molecular Basis of the Binding of YAP Transcriptional Regulator to the ErbB4 Receptor Tyrosine Kinase

PubMed Central

Schuchardt, Brett J.; Bhat, Vikas; Mikles, David C.; McDonald, Caleb B.; Sudol, Marius; Farooq, Amjad

2014-01-01

The newly discovered transactivation function of ErbB4 receptor tyrosine kinase is believed to be mediated by virtue of the ability of its proteolytically-cleaved intracellular domain (ICD) to physically associate with YAP2 transcriptional regulator. In an effort to unearth the molecular basis of YAP2-ErbB4 interaction, we have conducted a detailed biophysical analysis of the binding of WW domains of YAP2 to PPXY motifs located within the ICD of ErbB4. Our data show that the WW1 domain of YAP2 binds to PPXY motifs within the ICD in a differential manner and that this behavior is by and large replicated by the WW2 domain. Remarkably, while both WW domains absolutely require the integrity of the PPXY consensus sequence, non-consensus residues within and flanking this motif do not appear to be critical for binding. In spite of this shared mode of binding, the WW domains of YAP2 display distinct conformational dynamics in complex with PPXY motifs derived from ErbB4. Collectively, our study lends new insights into the molecular basis of a key protein-protein interaction involved in a diverse array of cellular processes. PMID:24472438
Molecular basis of the binding of YAP transcriptional regulator to the ErbB4 receptor tyrosine kinase.

PubMed

Schuchardt, Brett J; Bhat, Vikas; Mikles, David C; McDonald, Caleb B; Sudol, Marius; Farooq, Amjad

2014-06-01

The newly discovered transactivation function of ErbB4 receptor tyrosine kinase is believed to be mediated by virtue of the ability of its proteolytically-cleaved intracellular domain (ICD) to physically associate with YAP2 transcriptional regulator. In an effort to unearth the molecular basis of YAP2-ErbB4 interaction, we have conducted a detailed biophysical analysis of the binding of WW domains of YAP2 to PPXY motifs located within the ICD of ErbB4. Our data show that the WW1 domain of YAP2 binds to PPXY motifs within the ICD in a differential manner and that this behavior is by and large replicated by the WW2 domain. Remarkably, while both WW domains absolutely require the integrity of the PPXY consensus sequence, non-consensus residues within and flanking this motif do not appear to be critical for binding. In spite of this shared mode of binding, the WW domains of YAP2 display distinct conformational dynamics in complex with PPXY motifs derived from ErbB4. Collectively, our study lends new insights into the molecular basis of a key protein-protein interaction involved in a diverse array of cellular processes. Copyright © 2014 Elsevier Masson SAS. All rights reserved.
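The PPXY consensus described above (two prolines, any residue, then tyrosine) maps directly onto a short regular expression. The following is a minimal sketch of scanning a peptide for such motifs; the function name `find_ppxy` and the example peptide are hypothetical and do not come from the ErbB4 sequence.

```python
import re

# PPXY: Pro-Pro-any residue-Tyr. The lookahead allows overlapping matches.
PPXY_PATTERN = re.compile(r"(?=(PP.Y))")


def find_ppxy(peptide):
    """Return (0-based start, matched motif) pairs for every PPXY occurrence."""
    return [(m.start(), m.group(1)) for m in PPXY_PATTERN.finditer(peptide.upper())]


# Hypothetical peptide used only to exercise the pattern.
print(find_ppxy("PPPYPPQY"))
```

A match of this kind only flags a candidate site. As the abstract notes, binding was characterized experimentally, and sequence matching alone says nothing about affinity.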
Binding of the cSH3 domain of Grb2 adaptor to two distinct RXXK motifs within Gab1 docker employs differential mechanisms.

PubMed

McDonald, Caleb B; Seldeen, Kenneth L; Deegan, Brian J; Bhat, Vikas; Farooq, Amjad

2011-01-01

A ubiquitous component of cellular signaling machinery, Gab1 docker plays a pivotal role in routing extracellular information in the form of growth factors and cytokines to downstream targets such as transcription factors within the nucleus. Here, using isothermal titration calorimetry (ITC) in combination with macromolecular modeling (MM), we show that although Gab1 contains four distinct RXXK motifs, designated G1, G2, G3, and G4, only G1 and G2 motifs bind to the cSH3 domain of Grb2 adaptor and do so with distinct mechanisms. Thus, while the G1 motif strictly requires the PPRPPKP consensus sequence for high-affinity binding to the cSH3 domain, the G2 motif displays preference for the PXVXRXLKPXR consensus. Such sequential differences in the binding of G1 and G2 motifs arise from their ability to adopt distinct polyproline type II (PPII)- and 3(10)-helical conformations upon binding to the cSH3 domain, respectively. Collectively, our study provides detailed biophysical insights into a key protein-protein interaction involved in a diverse array of signaling cascades central to health and disease. Copyright © 2010 John Wiley & Sons, Ltd.
The intron 1 of HPV 16 has a suboptimal branch point at a guanosine.

PubMed

De la Rosa-Rios, Marco Antonio; Martínez-Salazar, Martha; Martínez-Garcia, Martha; González-Bonilla, César; Villegas-Sepúlveda, Nicolás

2006-06-01

The branch point sequence (BPS) of intron 1 of the HPV-16 was determined via RT-PCR in a cell free system, using lariat intermediates obtained by in vitro splicing reactions. We used synthetic E6/E7 transcripts and HeLa nuclear protein extracts to obtain the splicing intermediates. Then, a divergent oligonucleotide primer set, pairing on the lariat RNA that encompassed the 2'-5' phosphodiester bond formed between the 5' end of the intron and the BPS, was used for cDNA synthesis and PCR amplification. Subsequent RT-PCR assays revealed five splicing intermediates, made up of a major intermediary corresponding to the BPS and four cryptic branched sequences. Only intermediates bound at the 5' end of the intron are probably the authentic branch point sequence, and all of them branch at guanosine 328 instead of the typical adenosine. Unusually, the BPS of intron 1 of HPV-16 is a suboptimal sequence (AGUGAGU) that differs from the eukaryotic consensus BPS, which correlates with the splicing profile observed for early transcripts of HPV-16 in tumors and tumor derived cell lines. The implications of this unusual branch point sequence for splicing of the HPV-16 pre-mRNA are discussed.
Mutations in the gene encoding the Sigma 2 subunit of the adaptor protein 1 complex, AP1S2, cause X-linked mental retardation.

PubMed

Tarpey, Patrick S; Stevens, Claire; Teague, Jon; Edkins, Sarah; O'Meara, Sarah; Avis, Tim; Barthorpe, Syd; Buck, Gemma; Butler, Adam; Cole, Jennifer; Dicks, Ed; Gray, Kristian; Halliday, Kelly; Harrison, Rachel; Hills, Katy; Hinton, Jonathon; Jones, David; Menzies, Andrew; Mironenko, Tatiana; Perry, Janet; Raine, Keiran; Richardson, David; Shepherd, Rebecca; Small, Alexandra; Tofts, Calli; Varian, Jennifer; West, Sofie; Widaa, Sara; Yates, Andy; Catford, Rachael; Butler, Julia; Mallya, Uma; Moon, Jenny; Luo, Ying; Dorkins, Huw; Thompson, Deborah; Easton, Douglas F; Wooster, Richard; Bobrow, Martin; Carpenter, Nancy; Simensen, Richard J; Schwartz, Charles E; Stevenson, Roger E; Turner, Gillian; Partington, Michael; Gecz, Jozef; Stratton, Michael R; Futreal, P Andrew; Raymond, F Lucy

2006-12-01

In a systematic sequencing screen of the coding exons of the X chromosome in 250 families with X-linked mental retardation (XLMR), we identified two nonsense mutations and one consensus splice-site mutation in the AP1S2 gene on Xp22 in three families. Affected individuals in these families showed mild-to-profound mental retardation. Other features included hypotonia early in life and delay in walking. AP1S2 encodes an adaptin protein that constitutes part of the adaptor protein complex found at the cytoplasmic face of coated vesicles located at the Golgi complex. The complex mediates the recruitment of clathrin to the vesicle membrane. Aberrant endocytic processing through disruption of adaptor protein complexes is likely to result from the AP1S2 mutations identified in the three XLMR-affected families, and such defects may plausibly cause abnormal synaptic development and function. AP1S2 is the first reported XLMR gene that encodes a protein directly involved in the assembly of endocytic vesicles.

A Legionella pneumophila collagen-like protein encoded by a gene with a variable number of tandem repeats is involved in the adherence and invasion of host cells.

PubMed

Vandersmissen, Liesbeth; De Buck, Emmy; Saels, Veerle; Coil, David A; Anné, Jozef

2010-05-01

Legionella pneumophila is a Gram-negative, facultative intracellular pathogen and the causative agent of Legionnaires' disease, a severe pneumonia in humans. Analysis of the Legionella sequenced genomes revealed a gene with a variable number of tandem repeats (VNTRs), whose number varies between strains. We examined the strain distribution of this gene among a collection of 108 clinical, environmental and hot spring serotype I strains. Twelve variants were identified, but no correlation was observed between the number of repeat units and clinical and environmental strains. The encoded protein contains the C-terminal consensus motif of outer membrane proteins and has a large region of collagen-like repeats that is encoded by the VNTR region. We have therefore annotated this protein Lcl for Legionella collagen-like protein. Lcl was shown to contribute to the adherence and invasion of host cells and it was demonstrated that the number of repeat units present in lcl had an influence on these adhesion characteristics.
Increasing Clinical Severity during a Dengue Virus Type 3 Cuban Epidemic: Deep Sequencing of Evolving Viral Populations

PubMed Central

Blanc, Hervé; Bordería, Antonio V.; Díaz, Gisell; Henningsson, Rasmus; Gonzalez, Daniel; Santana, Emidalys; Alvarez, Mayling; Castro, Osvaldo; Fontes, Magnus; Vignuzzi, Marco; Guzman, Maria G.

2016-01-01

ABSTRACT During the dengue virus type 3 (DENV-3) epidemic that occurred in Havana in 2001 to 2002, severe disease was associated with the infection sequence DENV-1 followed by DENV-3 (DENV-1/DENV-3), while the sequence DENV-2/DENV-3 was associated with mild/asymptomatic infections. To determine the role of the virus in the increasing severity demonstrated during the epidemic, serum samples collected at different time points were studied. A total of 22 full-length sequences were obtained using a deep-sequencing approach. Bayesian phylogenetic analysis of consensus sequences revealed that two DENV-3 lineages were circulating in Havana at that time, both grouped within genotype III. The predominant lineage is closely related to Peruvian and Ecuadorian strains, while the minor lineage is related to Venezuelan strains. According to consensus sequences, relatively few nonsynonymous mutations were observed; only one was fixed during the epidemic at position 4380 in the NS2B gene. Intrahost genetic analysis indicated that a significant minor population was selected and became predominant toward the end of the epidemic. In conclusion, greater variability was detected during the epidemic's progression in terms of significant minority variants, particularly in the nonstructural genes. An increasing trend of genetic diversity toward the end of the epidemic was observed only for synonymous variant allele rates, with higher variability in secondary cases. Remarkably, significant intrahost genetic variation was demonstrated within the same patient during the course of secondary infection with DENV-1/DENV-3, including changes in the structural proteins premembrane (PrM) and envelope (E). Therefore, the dynamic of evolving viral populations in the context of heterotypic antibodies could be related to the increasing clinical severity observed during the epidemic. 
IMPORTANCE Based on the evidence that DENV fitness is context dependent, our research has focused on the study of viral factors associated with intraepidemic increasing severity in a unique epidemiological setting. Here, we investigated the intrahost genetic diversity in acute human samples collected at different time points during the DENV-3 epidemic that occurred in Cuba in 2001 to 2002 using a deep-sequencing approach. We concluded that greater variability in significant minor populations occurred as the epidemic progressed, particularly in the nonstructural genes, with higher variability observed in secondary infection cases. Remarkably, for the first time significant intrahost genetic variation was demonstrated within the same patient during the course of secondary infection with DENV-1/DENV-3, including changes in structural proteins. These findings indicate that high-resolution approaches are needed to unravel molecular mechanisms involved in dengue pathogenesis. PMID:26889031
Identification of a cis-regulatory region of a gene in Arabidopsis thaliana whose induction by dehydration is mediated by abscisic acid and requires protein synthesis.

PubMed

Iwasaki, T; Yamaguchi-Shinozaki, K; Shinozaki, K

1995-05-20

In Arabidopsis thaliana, the induction of a dehydration-responsive gene, rd22, is mediated by abscisic acid (ABA) but the gene does not include any sequence corresponding to the consensus ABA-responsive element (ABRE), RYACGTGGYR, in its promoter region. The cis-regulatory region of the rd22 promoter was identified by monitoring the expression of beta-glucuronidase (GUS) activity in leaves of transgenic tobacco plants transformed with chimeric gene fusions constructed between 5'-deleted promoters of rd22 and the coding region of the GUS reporter gene. A 67-bp nucleotide fragment corresponding to positions -207 to -141 of the rd22 promoter conferred responsiveness to dehydration and ABA on a non-responsive promoter. The 67-bp fragment contains the sequences of the recognition sites for some transcription factors, such as MYC, MYB, and GT-1. The fact that accumulation of rd22 mRNA requires protein synthesis raises the possibility that the expression of rd22 might be regulated by one of these trans-acting protein factors whose de novo synthesis is induced by dehydration or ABA. Although the structure of the RD22 protein is very similar to that of a non-storage seed protein, USP, of Vicia faba, the expression of the GUS gene driven by the rd22 promoter in non-stressed transgenic Arabidopsis plants was found mainly in flowers and bolted stems rather than in seeds.
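The ABRE consensus RYACGTGGYR is written with IUPAC ambiguity codes, where R stands for a purine (A or G) and Y for a pyrimidine (C or T). Checking a promoter for such an element can be sketched by expanding the codes into a regular expression, as below. The function names and the test sequence are hypothetical; the rd22 promoter sequence itself is not reproduced here.

```python
import re

# Subset of IUPAC nucleotide codes needed for this motif.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}


def consensus_to_regex(consensus):
    """Expand an IUPAC consensus string into a regular-expression string."""
    return "".join(IUPAC[base] for base in consensus.upper())


def find_motif(consensus, sequence):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


# Hypothetical sequence containing one ABRE-like site (GCACGTGGCA).
print(find_motif("RYACGTGGYR", "TTGCACGTGGCATT"))
```

This sketch scans one strand only. A fuller search would also scan the reverse complement, and an empty result for a promoter would correspond to the situation the abstract reports for rd22.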
Cross-species inference of long non-coding RNAs greatly expands the ruminant transcriptome.

PubMed

Bush, Stephen J; Muriuki, Charity; McCulloch, Mary E B; Farquhar, Iseabail L; Clark, Emily L; Hume, David A

2018-04-24

mRNA-like long non-coding RNAs (lncRNAs) are a significant component of mammalian transcriptomes, although most are expressed only at low levels, with high tissue-specificity and/or at specific developmental stages. Thus, in many cases lncRNA detection by RNA-sequencing (RNA-seq) is compromised by stochastic sampling. To account for this and create a catalogue of ruminant lncRNAs, we compared de novo assembled lncRNAs derived from large RNA-seq datasets in transcriptional atlas projects for sheep and goats with previous lncRNAs assembled in cattle and human. We then combined the novel lncRNAs with the sheep transcriptional atlas to identify co-regulated sets of protein-coding and non-coding loci. Few lncRNAs could be reproducibly assembled from a single dataset, even with deep sequencing of the same tissues from multiple animals. Furthermore, there was little sequence overlap between lncRNAs that were assembled from pooled RNA-seq data. We combined positional conservation (synteny) with cross-species mapping of candidate lncRNAs to identify a consensus set of ruminant lncRNAs and then used the RNA-seq data to demonstrate detectable and reproducible expression in each species. In sheep, 20 to 30% of lncRNAs were located close to protein-coding genes with which they are strongly co-expressed, which is consistent with the evolutionary origin of some ncRNAs in enhancer sequences. Nevertheless, most of the lncRNAs are not co-expressed with neighbouring protein-coding genes. Alongside substantially expanding the ruminant lncRNA repertoire, the outcomes of our analysis demonstrate that stochastic sampling can be partly overcome by combining RNA-seq datasets from related species. This has practical implications for the future discovery of lncRNAs in other species.
Regulation of Lactobacillus casei Sorbitol Utilization Genes Requires DNA-Binding Transcriptional Activator GutR and the Conserved Protein GutM

PubMed Central

Alcántara, Cristina; Sarmiento-Rubiano, Luz Adriana; Monedero, Vicente; Deutscher, Josef; Pérez-Martínez, Gaspar; Yebra, María J.

2008-01-01

Sequence analysis of the five genes (gutRMCBA) downstream from the previously described sorbitol-6-phosphate dehydrogenase-encoding Lactobacillus casei gutF gene revealed that they constitute a sorbitol (glucitol) utilization operon. The gutRM genes encode putative regulators, while the gutCBA genes encode the EIIC, EIIBC, and EIIA proteins of a phosphoenolpyruvate-dependent sorbitol phosphotransferase system (PTSGut). The gut operon is transcribed as a polycistronic gutFRMCBA messenger, the expression of which is induced by sorbitol and repressed by glucose. gutR encodes a transcriptional regulator with two PTS-regulated domains, a galactitol-specific EIIB-like domain (EIIBGat domain) and a mannitol/fructose-specific EIIA-like domain (EIIAMtl domain). Its inactivation abolished gut operon transcription and sorbitol uptake, indicating that it acts as a transcriptional activator. In contrast, cells carrying a gutB mutation expressed the gut operon constitutively, but they failed to transport sorbitol, indicating that EIIBCGut negatively regulates GutR. A footprint analysis showed that GutR binds to a 35-bp sequence upstream from the gut promoter. A sequence comparison with the presumed promoter region of gut operons from various firmicutes revealed a GutR consensus motif that includes an inverted repeat. The regulation mechanism of the L. casei gut operon is therefore likely to be operative in other firmicutes. Finally, gutM codes for a conserved protein of unknown function present in all sequenced gut operons. A gutM mutant, the first constructed in a firmicute, showed drastically reduced gut operon expression and sorbitol uptake, indicating a regulatory role also for GutM. PMID:18676710
Molecular relationships between closely related strains and species of nematodes

NASA Technical Reports Server (NTRS)

Butler, M. H.; Wall, S. M.; Luehrsen, K. R.; Fox, G. E.; Hecht, R. M.

1981-01-01

Electrophoretic comparisons have been made for 24 enzymes in the Bergerac and Bristol strains of Caenorhabditis elegans and the related species, Caenorhabditis briggsae. No variation was detected between the two strains of C. elegans. In contrast, the two species, C. elegans and C. briggsae exhibited electrophoretic differences in 22 of 24 enzymes. A consensus 5S rRNA sequence was determined for C. elegans and found to be identical to that from C. briggsae. By analogy with other species with relatively well established fossil records it can be inferred that the time of divergence between the two nematode species is probably in the tens of millions of years. The limited anatomical evolution during a time period in which proteins undergo extensive changes supports the hypothesis that anatomical evolution is not dependent on overall protein changes.
The sigma factor SigD of Mycobacterium tuberculosis putatively enhances gene expression of the septum site determining protein under stressful environments.

PubMed

Ares, Miguel A; Rios-Sarabia, Nora; De la Cruz, Miguel A; Rivera-Gutiérrez, Sandra; García-Morales, Lázaro; León-Solís, Lizbel; Espitia, Clara; Pacheco, Sabino; Cerna-Cortés, Jorge F; Helguera-Repetto, Cecilia A; García, María Jesús; González-Y-Merchand, Jorge A

2017-07-01

This work examined the expression of the septum site determining gene (ssd) of Mycobacterium tuberculosis CDC1551 and its ∆sigD mutant under different growing conditions. The results showed an up-regulation of ssd during stationary phase and starvation conditions, but not during in vitro dormancy, suggesting a putative role for SigD in the control of ssd expression mainly under lack-of-nutrients environments. Furthermore, we elucidated a putative link between ssd expression and cell elongation of bacilli at stationary phase. In addition, a -35 sigD consensus sequence was found for the ssd promoter region, reinforcing the putative regulation of ssd by SigD, and in turn, supporting this protein role during the adaptation of M. tuberculosis to some stressful environments.
Distribution and sequence homogeneity of an abundant satellite DNA in the beetle, Tenebrio molitor.

PubMed Central

Davis, C A; Wyatt, G R

1989-01-01

The mealworm beetle, Tenebrio molitor, contains an unusually abundant and homogeneous satellite DNA which constitutes up to 60% of its genome. The satellite DNA is shown to be present in all of the chromosomes by in situ hybridization. 18 dimers of the repeat unit were cloned and sequenced. The consensus sequence is 142 nt long and lacks any internal repeat structure. Monomers of the sequence are very similar, showing on average a 2% divergence from the calculated consensus. Variant nucleotides are scattered randomly throughout the sequence although some variants are more common than others. Neighboring repeat units are no more alike than randomly chosen ones. The results suggest that some mechanism, perhaps gene conversion, is acting to maintain the homogeneity of the satellite DNA despite its abundance and distribution on all of the chromosomes. PMID:2762148
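A calculated consensus and the average divergence of monomers from it, as described above, can be sketched as follows. The majority-rule definition and the short toy monomers are assumptions for illustration; the abstract does not state how the authors derived their consensus.

```python
# Minimal sketch: majority-rule consensus of pre-aligned, equal-length repeat
# monomers, and each monomer's divergence from that consensus.
# The monomers below are toy data, not Tenebrio molitor sequences.
from collections import Counter


def consensus(aligned):
    """Most frequent base in every alignment column."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


def divergence(seq, cons):
    """Fraction of positions at which seq differs from the consensus."""
    mismatches = sum(a != b for a, b in zip(seq, cons))
    return mismatches / len(cons)


if __name__ == "__main__":
    monomers = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC", "ACGTACCTAC"]
    cons = consensus(monomers)
    mean_div = sum(divergence(m, cons) for m in monomers) / len(monomers)
    print(cons, f"{mean_div:.1%}")
```

Columns with tied base counts would need an explicit rule (for example an IUPAC ambiguity code), which this sketch does not implement.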
Proteus: a random forest classifier to predict disorder-to-order transitioning binding regions in intrinsically disordered proteins

NASA Astrophysics Data System (ADS)

Basu, Sankar; Söderquist, Fredrik; Wallner, Björn

2017-05-01

The focus of the computational structural biology community has taken a dramatic shift over the past one-and-a-half decades from the classical protein structure prediction problem to the possible understanding of intrinsically disordered proteins (IDP) or proteins containing regions of disorder (IDPR). The current interest lies in the unraveling of a disorder-to-order transitioning code embedded in the amino acid sequences of IDPs/IDPRs. Disordered proteins are characterized by an enormous amount of structural plasticity which makes them promiscuous in binding to different partners, multi-functional in cellular activity and atypical in folding energy landscapes resembling partially folded molten globules. Also, their involvement in several deadly human diseases (e.g. cancer, cardiovascular and neurodegenerative diseases) makes them attractive drug targets, and important for a biochemical understanding of the disease(s). The study of the structural ensemble of IDPs is rather difficult, in particular for transient interactions. When bound to a structured partner, an IDPR adopts an ordered conformation in the complex. The residues that undergo this disorder-to-order transition are called protean residues, generally found in short contiguous stretches and the first step in understanding the modus operandi of an IDP/IDPR would be to predict these residues. There are a few available methods which predict these protean segments from their amino acid sequences; however, their performance reported in the literature leaves clear room for improvement. With this background, the current study presents 'Proteus', a random forest classifier that predicts the likelihood of a residue undergoing a disorder-to-order transition upon binding to a potential partner protein. The prediction is based on features that can be calculated using the amino acid sequence alone. Proteus compares favorably with existing methods, predicting twice as many true positives as the second best method (55 vs. 27%) with a much higher precision on an independent data set. The current study also sheds some light on a possible 'disorder-to-order' transitioning consensus, untangled, yet embedded in the amino acid sequence of IDPs. Some guidelines have also been suggested for proceeding with a real-life structural modeling involving an IDPR using Proteus.
Detecting novel SNPs and breed-specific haplotypes at calpastatin gene in Iranian fat- and thin-tailed sheep breeds and their effects on protein structure.

PubMed

Aali, Mohsen; Moradi-Shahrbabak, Mohammad; Moradi-Shahrbabak, Hosein; Sadeghi, Mostafa

2014-03-01

Calpastatin has been introduced as a potential candidate gene for growth and meat quality traits. In this study, genetic variability was investigated in the exon 6 and its intron boundaries of ovine CAST gene by PCR-SSCP analysis and DNA sequencing. Also a protein sequence and structural analysis were performed to predict the possible impact of amino acid substitutions on physicochemical properties and structure of the CAST protein. A total of 487 animals belonging to four ancient Iranian sheep breeds with different fat metabolisms, Lori-Bakhtiari and Chall (fat-tailed), Zel-Atabay cross-bred (medium fat-tailed) and Zel (thin-tailed), were analyzed. Eight unique SSCP patterns, representing eight different sequences or haplotypes, CAST-1, CAST-2 and CAST-6 to CAST-11, were identified. Haplotypes CAST-1 and CAST-2 were most common with frequency of 0.365 and 0.295. The novel haplotype CAST-8 had considerable frequency in Iranian sheep breeds (0.129). All the consensus sequences showed 98-99%, 94-98%, 92-93% and 82-83% similarity to the published ovine, caprine, bovine and porcine CAST locus sequences, respectively. Sequence analysis revealed four SNPs in intron 5 (C24T, G62A, G65T and T69-) and three SNPs in exon 6 (c.197A>T, c.282G>T and c.296C>G). All three SNPs in exon 6 were missense mutations which would result in p.Gln 66 Leu, p.Glu 94 Asp and p.Pro 99 Arg substitutions, respectively, in CAST protein. All three amino acid substitutions affected the physicochemical properties of ovine CAST protein including hydrophobicity, amphiphilicity and net charge and subsequently might influence its structure and effect on the activity of Ca2+ channels; hence, they might regulate calpain activity and afterwards meat tenderness and growth rate. The Lori-Bakhtiari population showed the highest heterozygosity in the ovine CAST locus (0.802). Frequency difference of haplotypes CAST-10 and CAST-8 between Lori-Bakhtiari (fat-tailed) and Zel (thin-tailed) breeds was highly significant (P<0.001), indicating that these two haplotypes might be breed-specific haplotypes that distinguish between fat-tailed and thin-tailed sheep breeds. Copyright © 2013 Elsevier B.V. All rights reserved.
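The relation between the coding-sequence positions of the exon 6 SNPs and the affected codons can be checked with simple arithmetic. The sketch below assumes that the c. positions and the amino acid numbers in the abstract share the same reading frame, counted from 1; the function name is hypothetical.

```python
# Minimal sketch: map a 1-based coding-sequence position to its codon number
# and to the position within that codon (1, 2 or 3).
# Assumes c. numbering and residue numbering start at the same first codon.


def codon_of(c_pos):
    """Return (codon_number, position_in_codon) for a 1-based coding position."""
    codon = (c_pos - 1) // 3 + 1
    pos_in_codon = (c_pos - 1) % 3 + 1
    return codon, pos_in_codon


if __name__ == "__main__":
    for c_pos in (197, 282, 296):
        print(c_pos, codon_of(c_pos))
```

Under that assumption the three positions fall in codons 66, 94 and 99, which agrees with the substitutions reported in the abstract. The base changes are also consistent with the standard genetic code: a second-position A>T in a Gln codon (CAA/CAG) gives Leu, a third-position G>T in GAG (Glu) gives Asp, and a second-position C>G in a Pro codon (CCN) gives Arg.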
Characterization of the gene encoding component C3 of the complement system from the spider Loxosceles laeta venom glands: Phylogenetic implications.

PubMed

Myamoto, D T; Pidde-Queiroz, G; Pedroso, A; Gonçalves-de-Andrade, R M; van den Berg, C W; Tambourgi, D V

2016-09-01

A transcriptome analysis of the venom glands of the spider Loxosceles laeta, performed by our group in a previous study (Fernandes-Pedrosa et al., 2008), revealed a transcript with a sequence similar to the human complement component C3. Here we present the analysis of this transcript. cDNA fragments encoding the C3 homologue (Lox-C3) were amplified from total RNA isolated from the venom glands of L. laeta by RACE-PCR. Lox-C3 is a 5178-bp cDNA sequence encoding a 190-kDa protein, with a domain configuration similar to human C3. Multiple alignments of C3-like proteins revealed two processing sites, suggesting that Lox-C3 is composed of three chains. Furthermore, the amino acid consensus sequence for the thioester was found, in addition to putative sequences responsible for FB binding. The phylogenetic analysis showed that Lox-C3 belongs to the same group as two C3 isoforms from the spider Hasarius adansoni (Family Salticidae), showing 53% homology with these. This is the first characterization of a Loxosceles cDNA sequence encoding a human C3 homologue, and this finding, together with our previous finding of the expression of a FB-like molecule, suggests that this spider species also has a complement system. This work will help to improve our understanding of the innate immune system in these spiders and the ancestral structure of C3. Copyright © 2016 Elsevier GmbH. All rights reserved.
Isolation of a gene (pbsC) required for siderophore biosynthesis in fluorescent Pseudomonas sp. strain M114.

PubMed

Adams, C; Dowling, D N; O'Sullivan, D J; O'Gara, F

1994-06-03

An iron-regulated gene, pbsC, required for siderophore production in fluorescent Pseudomonas sp. strain M114 has been identified. A kanamycin-resistance cassette was inserted at specific restriction sites within a 7 kb genomic fragment of M114 DNA and by marker exchange two siderophore-negative mutants, designated M1 and M2, were isolated. The nucleotide sequence of approximately 4 kb of the region flanking the insertion sites was determined and a large open reading frame (ORF) extending for 2409 bp was identified. This gene was designated pbsC (pseudobactin synthesis C) and its putative protein product termed PbsC. PbsC was found to be homologous to a family of enzymes involved in the biosynthesis of secondary metabolites, including EntF of Escherichia coli. These enzymes are believed to act via ATP-dependent binding of AMP to their substrate. Several areas of high sequence homology between these proteins and PbsC were observed, including a conserved AMP-binding domain. The expression of pbsC is iron-regulated as revealed when a DNA fragment containing the upstream region was cloned in a promoter probe vector and conjugated into the wild-type strain, M114. The nucleotide sequence upstream of the putative translational start site contains a region homologous to previously defined -16 to -25 sequences of iron-regulated genes but did not contain an iron-box consensus sequence. It was noted that inactivation of the pbsC gene also affected other iron-regulated phenotypes of Pseudomonas M114.
A variant Tc4 transposable element in the nematode C. elegans could encode a novel protein.

PubMed Central

Li, W; Shaw, J E

1993-01-01

A variant C. elegans Tc4 transposable element, Tc4-rh1030, has been sequenced and is 3483 bp long. The Tc4 element that had been analyzed previously is 1605 bp long, consists of two 774-bp nearly perfect inverted terminal repeats connected by a 57-bp loop, and lacks significant open reading frames. In Tc4-rh1030, by comparison, a 2343-bp novel sequence is present in place of a 477-bp segment in one of the inverted repeats. The novel sequence of Tc4-rh1030 is present about five times per haploid genome and is invariably associated with Tc4 elements; we have used the designation Tc4v to denote this variant subfamily of Tc4 elements. Sequence analysis of three cDNA clones suggests that a Tc4v element contains at least five exons that could encode a novel basic protein of 537 amino acid residues. On northern blots, a 1.6-kb Tc4v-specific transcript was detected in the mutator strain TR679 but not in the wild-type strain N2; Tc4 elements are known to transpose in TR679 but appear to be quiescent in N2. We have analyzed transcripts produced by an unc-33 gene that has the Tc4-rh1030 insertional mutation in its transcribed region; all or almost all of the Tc4v sequence is frequently spliced out of the mutant unc-33 transcripts, sometimes by means of non-consensus splice acceptor sites. PMID:8382791
Mimtags: the use of phage display technology to produce novel protein-specific probes.

PubMed

Ahmed, Nayyar; Dhanapala, Pathum; Sadli, Nadia; Barrow, Colin J; Suphioglu, Cenk

2014-03-01

In recent times the use of protein-specific probes in the field of proteomics has undergone evolutionary changes leading to the discovery of new probing techniques. Protein-specific probes serve two main purposes: epitope mapping and detection assays. One such technique is the use of phage display in the random selection of peptide mimotopes (mimtags) that can tag epitopes of proteins, replacing the use of monoclonal antibodies in detection systems. In this study, phage display technology was used to screen a random peptide library with a biologically active purified human interleukin-4 receptor (IL-4R) and interleukin-13 (IL-13) to identify mimtag candidates that interacted with these proteins. Once identified, the mimtags were commercially synthesised, biotinylated and used for in vitro immunoassays. We have used phage display to identify M13 phage clones that demonstrated specific binding to IL-4R and IL-13 cytokine. A consensus in binding sequences was observed and phage clones characterised had identical peptide sequence motifs. Only one was synthesised for use in further immunoassays, demonstrating significant binding to either IL-4R or IL-13. We have successfully shown the use of phage display to identify and characterise mimtags that specifically bind to their target epitope. Thus, this new method of probing proteins can be used in the future as a novel tool for immunoassay and detection technique, which is cheaper and more rapidly produced and therefore a better alternative to the use of monoclonal antibodies. Copyright © 2014 Elsevier B.V. All rights reserved.
Interaction of the Sliding Clamp β-Subunit and Hda, a DnaA-Related Protein

PubMed Central

Kurz, Mareike; Dalrymple, Brian; Wijffels, Gene; Kongsuwan, Kritaya

2004-01-01

In Escherichia coli, interactions between the replication initiation protein DnaA, the β subunit of DNA polymerase III (the sliding clamp protein), and Hda, the recently identified DnaA-related protein, are required to convert the active ATP-bound form of DnaA to an inactive ADP-bound form through the accelerated hydrolysis of ATP. This rapid hydrolysis of ATP is proposed to be the main mechanism that blocks multiple initiations during cell cycle and acts as a molecular switch from initiation to replication. However, the biochemical mechanism for this crucial step in DNA synthesis has not been resolved. Using purified Hda and β proteins in a plate binding assay and Ni-nitrilotriacetic acid pulldown analysis, we show for the first time that Hda directly interacts with β in vitro. A new β-binding motif, a hexapeptide with the consensus sequence QL[SP]LPL, related to the previously identified β-binding pentapeptide motif (QL[SD]LF) was found in the amino terminus of the Hda protein. Mutants of Hda with amino acid changes in the hexapeptide motif are severely defective in their ability to bind β. A 10-amino-acid peptide containing the E. coli Hda β-binding motif was shown to compete with Hda for binding to β in an Hda-β interaction assay. These results establish that the interaction of Hda with β is mediated through the hexapeptide sequence. We propose that this interaction may be crucial to the events that lead to the inactivation of DnaA and the prevention of excess initiation of rounds of replication. PMID:15150238
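The two β-binding motifs quoted above are already written in a pattern notation that maps directly onto regular expressions. The following sketch scans a protein string for both; the example sequence is made up and is not the Hda sequence.

```python
# Minimal sketch: scan a protein sequence for the beta-binding hexapeptide
# (QL[SP]LPL) and pentapeptide (QL[SD]LF) consensus motifs from the abstract.
# The test sequence is hypothetical.
import re

MOTIFS = {
    "hexapeptide": re.compile(r"QL[SP]LPL"),
    "pentapeptide": re.compile(r"QL[SD]LF"),
}


def find_motifs(protein):
    """Return sorted (1-based position, matched text, motif name) tuples."""
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(protein):
            hits.append((match.start() + 1, match.group(), name))
    return sorted(hits)


if __name__ == "__main__":
    for hit in find_motifs("MAQLSLPLGGQLDLFK"):
        print(hit)
```

Note that `finditer` reports non-overlapping matches only, which is sufficient for motifs of this kind.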
Interaction of the sliding clamp beta-subunit and Hda, a DnaA-related protein.

PubMed

Kurz, Mareike; Dalrymple, Brian; Wijffels, Gene; Kongsuwan, Kritaya

2004-06-01

In Escherichia coli, interactions between the replication initiation protein DnaA, the beta subunit of DNA polymerase III (the sliding clamp protein), and Hda, the recently identified DnaA-related protein, are required to convert the active ATP-bound form of DnaA to an inactive ADP-bound form through the accelerated hydrolysis of ATP. This rapid hydrolysis of ATP is proposed to be the main mechanism that blocks multiple initiations during cell cycle and acts as a molecular switch from initiation to replication. However, the biochemical mechanism for this crucial step in DNA synthesis has not been resolved. Using purified Hda and beta proteins in a plate binding assay and Ni-nitrilotriacetic acid pulldown analysis, we show for the first time that Hda directly interacts with beta in vitro. A new beta-binding motif, a hexapeptide with the consensus sequence QL[SP]LPL, related to the previously identified beta-binding pentapeptide motif (QL[SD]LF) was found in the amino terminus of the Hda protein. Mutants of Hda with amino acid changes in the hexapeptide motif are severely defective in their ability to bind beta. A 10-amino-acid peptide containing the E. coli Hda beta-binding motif was shown to compete with Hda for binding to beta in an Hda-beta interaction assay. These results establish that the interaction of Hda with beta is mediated through the hexapeptide sequence. We propose that this interaction may be crucial to the events that lead to the inactivation of DnaA and the prevention of excess initiation of rounds of replication.
The protein structure prediction problem could be solved using the current PDB library

PubMed Central

Zhang, Yang; Skolnick, Jeffrey

2005-01-01

For single-domain proteins, we examine the completeness of the structures in the current Protein Data Bank (PDB) library for use in full-length model construction of unknown sequences. To address this issue, we employ a comprehensive benchmark set of 1,489 medium-size proteins that cover the PDB at the level of 35% sequence identity and identify templates by structure alignment. With homologous proteins excluded, we can always find similar folds to native with an average rms deviation (RMSD) from native of 2.5 Å with ≈82% alignment coverage. These template structures often contain a significant number of insertions/deletions. The tasser algorithm was applied to build full-length models, where continuous fragments are excised from the top-scoring templates and reassembled under the guide of an optimized force field, which includes consensus restraints taken from the templates and knowledge-based statistical potentials. For almost all targets (except for 2/1,489), the resultant full-length models have an RMSD to native below 6 Å (97% of them below 4 Å). On average, the RMSD of full-length models is 2.25 Å, with aligned regions improved from 2.5 Å to 1.88 Å, comparable with the accuracy of low-resolution experimental structures. Furthermore, starting from state-of-the-art structural alignments, we demonstrate a methodology that can consistently bring template-based alignments closer to native. These results are highly suggestive that the protein-folding problem can in principle be solved based on the current PDB library by developing efficient fold recognition algorithms that can recover such initial alignments. PMID:15653774
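The RMSD values quoted above measure the average distance between equivalent atoms of a model and the native structure. A minimal sketch of the calculation follows, assuming the two coordinate sets are already superimposed and listed in the same atom order; the superposition step used by the authors is not described in the abstract and is not reproduced here.

```python
# Minimal sketch: root-mean-square deviation between two equally long lists of
# (x, y, z) coordinates that are assumed to be already superimposed.
# The coordinates in the example are toy values in angstroms.
import math


def rmsd(coords_a, coords_b):
    """RMSD between paired 3D points; raises ValueError on empty or unequal input."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    native = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
    print(rmsd(model, native))
```

In practice an optimal rigid-body superposition is computed first, so the value printed for unaligned coordinates is an upper bound on the superimposed RMSD.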
Identification of cDNAs encoding viper venom hyaluronidases: cross-generic sequence conservation of full-length and unusually short variant transcripts.

PubMed

Harrison, Robert A; Ibison, Frances; Wilbraham, Davina; Wagstaff, Simon C

2007-05-01

The immobilisation of prey by snakes is most efficiently achieved by the rapid dissemination of venom from its site of injection into the blood stream. Hyaluronidase is a common component of snake venoms and has been termed the "venom spreading factor". In the absence of nucleotide or protein sequence data to confirm the functional identity of this venom component, we interrogated a venom gland EST database for the saw-scaled viper, Echis ocellatus (Nigeria), using the gene ontology (GO) term "carbohydrate metabolism". A single hyalurononglucosaminidase-activity matching sequence (EOC00242) was found and used to design PCR primers to acquire the full-length cDNA sequence. Although very different from the bee venom and mammalian hyaluronidase sequences, the E. ocellatus sequence retained all the catalytic, positional and structural residues that characterise this class of carbohydrate metabolising hydrolases. An extraordinarily high level of sequence identity (>95%) was observed in analogous venom gland cDNA sequences isolated (by PCR) from another saw-scaled viper species, E. pyramidum leakeyi (Kenya), and from the Sahara horned viper, Cerastes cerastes cerastes (Egypt) and the puff adder, Bitis arietans (Nigeria). Smaller amplicons, lacking hyaluronidase catalytic residues because of 768 bp or 855 bp central deletions, appear to encode either truncated peptides without hyaluronidase activity, or are non-translated transcripts because they lack consensus translation initiating motifs.
Cloning and characterization of an autonomous replication sequence from Coxiella burnetii.

PubMed Central

Suhan, M; Chen, S Y; Thompson, H A; Hoover, T A; Hill, A; Williams, J C

1994-01-01

A Coxiella burnetii chromosomal fragment capable of functioning as an origin for the replication of a kanamycin resistance (Kanr) plasmid was isolated by use of origin search methods utilizing an Escherichia coli host. The 5.8-kb fragment was subcloned into phagemid vectors and was deleted progressively by an exonuclease III-S1 technique. Plasmids containing progressively shorter DNA fragments were then tested for their capability to support replication by transformation of an E. coli polA strain. A minimal autonomous replication sequence (ARS) was delimited to 403 bp. Sequencing of the entire 5.8-kb region revealed that the minimal ARS contained two consensus DnaA boxes, three A + T-rich 21-mers, a transcriptional promoter leading rightwards, and potential integration host factor and factor of inversion stimulation binding sites. Database comparisons of deduced amino acid sequences revealed that open reading frames located around the ARS were homologous to genes often, but not always, found near bacterial chromosomal origins; these included identities with rpmH and rnpA in E. coli and identities with the 9K protein and 60K membrane protein in E. coli and Pseudomonas species. These and direct hybridization data suggested that the ARS was chromosomal and not associated with the resident plasmid QpH1. Two-dimensional agarose gel electrophoresis did not reveal the presence of initiating intermediates, indicating that the ARS did not initiate chromosome replication during laboratory growth of C. burnetii. PMID:8071197
Transcriptome analysis of the honey bee fungal pathogen, Ascosphaera apis: implications for host pathogenesis

PubMed Central

2012-01-01

Background: We present a comprehensive transcriptome analysis of the fungus Ascosphaera apis, an economically important pathogen of the Western honey bee (Apis mellifera) that causes chalkbrood disease. Our goals were to further annotate the A. apis reference genome and to identify genes that are candidates for being differentially expressed during host infection versus axenic culture. Results: We compared A. apis transcriptome sequence from mycelia grown on liquid or solid media with that dissected from host-infected tissue. 454 pyrosequencing provided 252 Mb of filtered sequence reads from both culture types that were assembled into 10,087 contigs. Transcript contigs, protein sequences from multiple fungal species, and ab initio gene predictions were included as evidence sources in the Maker gene prediction pipeline, resulting in 6,992 consensus gene models. A phylogeny based on 12 of these protein-coding loci further supported the taxonomic placement of Ascosphaera as sister to the core Onygenales. Several common protein domains were less abundant in A. apis compared with related ascomycete genomes, particularly cytochrome p450 and protein kinase domains. A novel gene family was identified that has expanded in some ascomycete lineages, but not others. We manually annotated genes with homologs in other fungal genomes that have known relevance to fungal virulence and life history. Functional categories of interest included genes involved in mating-type specification, intracellular signal transduction, and stress response. Computational and manual annotations have been made publicly available on the Bee Pests and Pathogens website. Conclusions: This comprehensive transcriptome analysis substantially enhances our understanding of the A. apis genome and its expression during infection of honey bee larvae. It also provides resources for future molecular studies of chalkbrood disease and ultimately improved disease management. PMID:22747707

Amino acid sequence motifs essential for P0-mediated suppression of RNA silencing in an isolate of potato leafroll virus from Inner Mongolia.

PubMed

Zhuo, Tao; Li, Yuan-Yuan; Xiang, Hai-Ying; Wu, Zhan-Yu; Wang, Xian-Bin; Wang, Ying; Zhang, Yong-Liang; Li, Da-Wei; Yu, Jia-Lin; Han, Cheng-Gui

2014-06-01

Polerovirus P0 suppressors of host gene silencing contain a consensus F-box-like motif with Leu/Pro (L/P) requirements for suppressor activity. The Inner Mongolian Potato leafroll virus (PLRV) P0 protein (P0(PL-IM)) has an unusual F-box-like motif that contains a Trp/Gly (W/G) sequence and an additional GW/WG-like motif (G139/W140/G141) that is lacking in other P0 proteins. We used Agrobacterium infiltration-mediated RNA silencing assays to establish that P0(PL-IM) has a strong suppressor activity. Mutagenesis experiments demonstrated that the P0(PL-IM) F-box-like motif encompasses amino acids 76-LPRHLHYECLEWGLLCGTHP-95, and that the suppressor activity is abolished by L76A, W87A, or G88A substitution. The suppressor activity is also weakened substantially by mutations within the G139/W140/G141 region and is eliminated by a mutation (F220R) in a C-terminal conserved sequence of P0(PL-IM). As has been observed with other P0 proteins, P0(PL-IM) suppression is correlated with reduced accumulation of the host AGO1-silencing complex protein. However, P0(PL-IM) fails to bind SKP1, which functions in a proteasome pathway that may be involved in AGO1 degradation. These results suggest that P0(PL-IM) may suppress RNA silencing by using an alternative pathway to target AGO1 for degradation. Our results help improve our understanding of the molecular mechanisms involved in PLRV infection.
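The substitution names in the abstract above (L76A, W87A, G88A) can be checked against the 20-residue motif spanning positions 76 to 95, written without internal whitespace. The helper below is a hypothetical illustration of that bookkeeping, not a tool used by the authors.

```python
# Minimal sketch: apply point substitutions written as <wild-type><position><new>
# to a motif whose first residue has a known position in the full protein.
# MOTIF and MOTIF_START come from the abstract; the function name is illustrative.

MOTIF_START = 76
MOTIF = "LPRHLHYECLEWGLLCGTHP"  # residues 76-95 of P0(PL-IM)


def apply_substitution(motif, start, substitution):
    """Return the motif carrying the substitution, after checking the wild-type residue."""
    wild_type = substitution[0]
    position = int(substitution[1:-1])
    new_residue = substitution[-1]
    index = position - start
    if not 0 <= index < len(motif):
        raise ValueError(f"position {position} lies outside the motif")
    if motif[index] != wild_type:
        raise ValueError(f"expected {wild_type} at {position}, found {motif[index]}")
    return motif[:index] + new_residue + motif[index + 1:]


if __name__ == "__main__":
    for sub in ("L76A", "W87A", "G88A"):
        print(sub, apply_substitution(MOTIF, MOTIF_START, sub))
```

All three substitutions pass the wild-type check, so the residue numbering in the abstract is internally consistent with the quoted motif.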
The Oenococcus oeni clpX Homologue Is a Heat Shock Gene Preferentially Expressed in Exponential Growth Phase

PubMed Central

Jobin, Michel-Philippe; Garmyn, Dominique; Diviès, Charles; Guzzo, Jean

1999-01-01

Using degenerated primers from conserved regions of previously studied clpX gene products, we cloned the clpX gene of the malolactic bacterium Oenococcus oeni. The clpX gene was sequenced, and the deduced protein of 413 amino acids (predicted molecular mass of 45,650 Da) was highly similar to previously analyzed clpX gene products from other organisms. An open reading frame located upstream of the clpX gene was identified as the tig gene by similarity of its predicted product to other bacterial trigger factors. ClpX was purified by using a maltose binding protein fusion system and was shown to possess an ATPase activity. Northern analyses indicated the presence of two independent 1.6-kb monocistronic clpX and tig mRNAs and also showed an increase in clpX mRNA amount after a temperature shift from 30 to 42°C. The clpX transcript is abundant in the early exponential growth phase and progressively declines to undetectable levels in the stationary phase. Thus, unlike hsp18, the gene encoding one of the major small heat shock proteins of Oenococcus oeni, clpX expression is related to the exponential growth phase and requires de novo protein synthesis. Primer extension analysis identified the 5′ end of clpX mRNA which is located 408 nucleotides upstream of a putative AUA start codon. The putative transcription start site allowed identification of a predicted promoter sequence with a high similarity to the consensus sequence found in the housekeeping gene promoter of gram-positive bacteria as well as Escherichia coli. PMID:10542163
Using multi-locus allelic sequence data to estimate genetic divergence among four Lilium (Liliaceae) cultivars

PubMed Central

Shahin, Arwa; Smulders, Marinus J. M.; van Tuyl, Jaap M.; Arens, Paul; Bakker, Freek T.

2014-01-01

Next Generation Sequencing (NGS) may enable estimating relationships among genotypes using allelic variation of multiple nuclear genes simultaneously. We explored the potential and caveats of this strategy in four genetically distant Lilium cultivars to estimate their genetic divergence from transcriptome sequences using three approaches: POFAD (Phylogeny of Organisms from Allelic Data, uses allelic information of sequence data), RAxML (Randomized Accelerated Maximum Likelihood, tree building based on concatenated consensus sequences) and Consensus Network (constructing a network summarizing among gene tree conflicts). Twenty six gene contigs were chosen based on the presence of orthologous sequences in all cultivars, seven of which also had an orthologous sequence in Tulipa, used as out-group. The three approaches generated the same topology. Although the resolution offered by these approaches is high, in this case there was no extra benefit in using allelic information. We conclude that these 26 genes can be widely applied to construct a species tree for the genus Lilium. PMID:25368628
On the Distribution of Protein Refractive Index Increments

PubMed Central

Zhao, Huaying; Brown, Patrick H.; Schuck, Peter

2011-01-01

The protein refractive index increment, dn/dc, is an important parameter underlying the concentration determination and the biophysical characterization of proteins and protein complexes in many techniques. In this study, we examine the widely used assumption that most proteins have dn/dc values in a very narrow range, and reappraise the prediction of dn/dc of unmodified proteins based on their amino acid composition. Applying this approach in large scale to the entire set of known and predicted human proteins, we obtain, for the first time, to our knowledge, an estimate of the full distribution of protein dn/dc values. The distribution is close to Gaussian with a mean of 0.190 ml/g (for unmodified proteins at 589 nm) and a standard deviation of 0.003 ml/g. However, small proteins <10 kDa exhibit a larger spread, and almost 3000 proteins have values deviating by more than two standard deviations from the mean. Due to the widespread availability of protein sequences and the potential for outliers, the compositional prediction should be convenient and provide greater accuracy than an average consensus value for all proteins. We discuss how this approach should be particularly valuable for certain protein classes where a high dn/dc is coincidental to structural features, or may be functionally relevant such as in proteins of the eye. PMID:21539801
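The "more than two standard deviations from the mean" criterion above can be made concrete using the reported mean (0.190 ml/g) and standard deviation (0.003 ml/g). This sketch only applies that threshold; the dn/dc values passed in are hypothetical, and the compositional prediction itself is not reproduced.

```python
# Minimal sketch: flag dn/dc values lying more than two standard deviations
# from the mean of the distribution reported in the abstract.
# The example input values are hypothetical.

MEAN_DNDC = 0.190  # ml/g, unmodified proteins at 589 nm (from the abstract)
SD_DNDC = 0.003    # ml/g (from the abstract)


def z_score(dndc):
    """Number of standard deviations between dndc and the reported mean."""
    return (dndc - MEAN_DNDC) / SD_DNDC


def is_outlier(dndc, n_sd=2.0):
    """True if dndc deviates from the mean by more than n_sd standard deviations."""
    return abs(z_score(dndc)) > n_sd


if __name__ == "__main__":
    for value in (0.183, 0.192, 0.197):
        print(value, round(z_score(value), 2), is_outlier(value))
```

With these parameters the two-standard-deviation band runs from 0.184 to 0.196 ml/g.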
On the distribution of protein refractive index increments.

PubMed

Zhao, Huaying; Brown, Patrick H; Schuck, Peter

2011-05-04

The protein refractive index increment, dn/dc, is an important parameter underlying the concentration determination and the biophysical characterization of proteins and protein complexes in many techniques. In this study, we examine the widely used assumption that most proteins have dn/dc values in a very narrow range, and reappraise the prediction of dn/dc of unmodified proteins based on their amino acid composition. Applying this approach in large scale to the entire set of known and predicted human proteins, we obtain, for the first time, to our knowledge, an estimate of the full distribution of protein dn/dc values. The distribution is close to Gaussian with a mean of 0.190 ml/g (for unmodified proteins at 589 nm) and a standard deviation of 0.003 ml/g. However, small proteins <10 kDa exhibit a larger spread, and almost 3000 proteins have values deviating by more than two standard deviations from the mean. Due to the widespread availability of protein sequences and the potential for outliers, the compositional prediction should be convenient and provide greater accuracy than an average consensus value for all proteins. We discuss how this approach should be particularly valuable for certain protein classes where a high dn/dc is coincidental to structural features, or may be functionally relevant such as in proteins of the eye. Copyright © 2011 Biophysical Society. Published by Elsevier Inc. All rights reserved.
Integrative View of the Diversity and Evolution of SWEET and SemiSWEET Sugar Transporters

PubMed Central

Jia, Baolei; Zhu, Xiao Feng; Pu, Zhong Ji; Duan, Yu Xi; Hao, Lu Jiang; Zhang, Jie; Chen, Li-Qing; Jeon, Che Ok; Xuan, Yuan Hu

2017-01-01

Sugars Will Eventually be Exported Transporter (SWEET) and SemiSWEET are recently characterized families of sugar transporters in eukaryotes and prokaryotes, respectively. SemiSWEETs contain 3 transmembrane helices (TMHs), while SWEETs contain 7. Here, we performed sequence-based comprehensive analyses for SWEETs and SemiSWEETs across the biosphere. In total, 3,249 proteins were identified and ≈60% proteins were found in green plants and Oomycota, which include a number of important plant pathogens. Protein sequence similarity networks indicate that proteins from different organisms are significantly clustered. Of note, SemiSWEETs with 3 or 4 TMHs that may fuse to SWEET were identified in plant genomes. 7-TMH SWEETs were found in bacteria, implying that SemiSWEET can be fused directly in prokaryote. 15-TMH extraSWEET and 25-TMH superSWEET were also observed in wild rice and oomycetes, respectively. The transporters can be classified into 4, 2, 2, and 2 clades in plants, Metazoa, unicellular eukaryotes, and prokaryotes, respectively. The consensus and coevolution of amino acids in SWEETs were identified by multiple sequence alignments. The functions of the highly conserved residues were analyzed by molecular dynamics analysis. The 19 most highly conserved residues in the SWEETs were further confirmed by point mutagenesis using SWEET1 from Arabidopsis thaliana. The results proved that the conserved residues located in the extrafacial gate (Y57, G58, G131, and P191), the substrate binding pocket (N73, N192, and W176), and the intrafacial gate (P43, Y83, F87, P145, M161, P162, and Q202) play important roles for substrate recognition and transport processes. Taken together, our analyses provide a foundation for understanding the diversity, classification, and evolution of SWEETs and SemiSWEETs using large-scale sequence analysis and further show that gene duplication and gene fusion are important factors driving the evolution of SWEETs. PMID:29326750
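The idea of reading a consensus from a multiple sequence alignment can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it takes the most frequent non-gap residue in each alignment column and reports the fraction of sequences carrying it. The function name and the toy alignment are hypothetical.

```python
from collections import Counter


def column_consensus(alignment, gap="-"):
    """Return (consensus string, per-column conservation) for equal-length aligned sequences.

    Conservation is the fraction of all sequences (gaps included in the
    denominator) that carry the most common non-gap residue in that column.
    """
    if not alignment:
        return "", []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    consensus = []
    conservation = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != gap)
        if not counts:
            # Column made only of gaps: no residue to report.
            consensus.append(gap)
            conservation.append(0.0)
            continue
        residue, count = counts.most_common(1)[0]
        consensus.append(residue)
        conservation.append(count / len(column))
    return "".join(consensus), conservation


# Toy alignment (hypothetical sequences, for illustration only).
toy = ["MYGNP", "MYGQP", "MFGNP", "M-GNP"]
print(column_consensus(toy))
```

Columns with conservation near 1.0 would be the candidates for "highly conserved residues"; real analyses typically also weight sequences to reduce redundancy, which this sketch omits.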
Integrative View of the Diversity and Evolution of SWEET and SemiSWEET Sugar Transporters.

PubMed

Jia, Baolei; Zhu, Xiao Feng; Pu, Zhong Ji; Duan, Yu Xi; Hao, Lu Jiang; Zhang, Jie; Chen, Li-Qing; Jeon, Che Ok; Xuan, Yuan Hu

2017-01-01

Sugars Will Eventually be Exported Transporter (SWEET) and SemiSWEET are recently characterized families of sugar transporters in eukaryotes and prokaryotes, respectively. SemiSWEETs contain 3 transmembrane helices (TMHs), while SWEETs contain 7. Here, we performed sequence-based comprehensive analyses for SWEETs and SemiSWEETs across the biosphere. In total, 3,249 proteins were identified and ≈60% proteins were found in green plants and Oomycota, which include a number of important plant pathogens. Protein sequence similarity networks indicate that proteins from different organisms are significantly clustered. Of note, SemiSWEETs with 3 or 4 TMHs that may fuse to SWEET were identified in plant genomes. 7-TMH SWEETs were found in bacteria, implying that SemiSWEET can be fused directly in prokaryote. 15-TMH extraSWEET and 25-TMH superSWEET were also observed in wild rice and oomycetes, respectively. The transporters can be classified into 4, 2, 2, and 2 clades in plants, Metazoa, unicellular eukaryotes, and prokaryotes, respectively. The consensus and coevolution of amino acids in SWEETs were identified by multiple sequence alignments. The functions of the highly conserved residues were analyzed by molecular dynamics analysis. The 19 most highly conserved residues in the SWEETs were further confirmed by point mutagenesis using SWEET1 from Arabidopsis thaliana. The results proved that the conserved residues located in the extrafacial gate (Y57, G58, G131, and P191), the substrate binding pocket (N73, N192, and W176), and the intrafacial gate (P43, Y83, F87, P145, M161, P162, and Q202) play important roles for substrate recognition and transport processes. Taken together, our analyses provide a foundation for understanding the diversity, classification, and evolution of SWEETs and SemiSWEETs using large-scale sequence analysis and further show that gene duplication and gene fusion are important factors driving the evolution of SWEETs.
ConsDock: A new program for the consensus analysis of protein-ligand interactions.

PubMed

Paul, Nicodème; Rognan, Didier

2002-06-01

Protein-based virtual screening of chemical libraries is a powerful technique for identifying new molecules that may interact with a macromolecular target of interest. Because of docking and scoring limitations, it is more difficult to apply as a lead optimization method because it requires that the docking/scoring tool is able to propose as few solutions as possible and all of them with a very good accuracy for both the protein-bound orientation and the conformation of the ligand. In the present study, we present a consensus docking approach (ConsDock) that takes advantage of three widely used docking tools (Dock, FlexX, and Gold). The consensus analysis of all possible poses generated by several docking tools is performed sequentially in four steps: (i) hierarchical clustering of all poses generated by a docking tool into families represented by a leader; (ii) definition of all consensus pairs from leaders generated by different docking programs; (iii) clustering of consensus pairs into classes, represented by a mean structure; and (iv) ranking the different means starting from the most populated class of consensus pairs. When applied to a test set of 100 protein-ligand complexes from the Protein Data Bank, ConsDock significantly outperforms single docking with respect to the docking accuracy of the top-ranked pose. In 60% of the cases investigated here, ConsDock was able to rank as top solution a pose within 2 Å RMSD of the X-ray structure. It can be applied as a postprocessing filter to either single- or multiple-docking programs to prioritize three-dimensional guided lead optimization from the most likely docking solution. Copyright 2002 Wiley-Liss, Inc.
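Step (ii) of the procedure above, pairing cluster leaders from different docking programs, can be sketched as below. This is only an illustration under stated assumptions: the abstract does not say how a consensus pair is defined, so the RMSD criterion and its 2.0 Å cutoff are assumed here, poses are assumed to list the same ligand atoms in the same order, and all names and coordinates are hypothetical.

```python
import math
from itertools import combinations


def rmsd(pose_a, pose_b):
    """Root-mean-square deviation between two poses given as lists of (x, y, z) tuples.

    Assumes both poses list the same atoms in the same order, with no superposition.
    """
    if not pose_a or len(pose_a) != len(pose_b):
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b)
    )
    return math.sqrt(squared / len(pose_a))


def consensus_pairs(leaders, cutoff=2.0):
    """Pair leaders from different programs whose RMSD is within the cutoff.

    leaders: dict mapping a program name to its list of leader poses.
    cutoff:  assumed RMSD threshold in angstroms (not taken from the abstract).
    Returns a list of (program_a, index_a, program_b, index_b, rmsd) tuples.
    """
    pairs = []
    for (prog_a, poses_a), (prog_b, poses_b) in combinations(sorted(leaders.items()), 2):
        for i, pose_a in enumerate(poses_a):
            for j, pose_b in enumerate(poses_b):
                distance = rmsd(pose_a, pose_b)
                if distance <= cutoff:
                    pairs.append((prog_a, i, prog_b, j, distance))
    return pairs


# Hypothetical two-atom poses, one leader per program.
leaders = {
    "Dock": [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]],
    "FlexX": [[(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]],
    "Gold": [[(0.0, 0.0, 5.0), (1.0, 0.0, 5.0)]],
}
print(consensus_pairs(leaders))
```

In this toy case only the Dock and FlexX leaders agree within the cutoff; the later steps (clustering pairs into classes and ranking by class population) would operate on the returned list.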
A Scalable Approach for Protein False Discovery Rate Estimation in Large Proteomic Data Sets.

PubMed

Savitski, Mikhail M; Wilhelm, Mathias; Hahne, Hannes; Kuster, Bernhard; Bantscheff, Marcus

2015-09-01

Calculating the number of confidently identified proteins and estimating false discovery rate (FDR) is a challenge when analyzing very large proteomic data sets such as entire human proteomes. Biological and technical heterogeneity in proteomic experiments further add to the challenge and there are strong differences in opinion regarding the conceptual validity of a protein FDR and no consensus regarding the methodology for protein FDR determination. There are also limitations inherent to the widely used classic target-decoy strategy that particularly show when analyzing very large data sets and that lead to a strong over-representation of decoy identifications. In this study, we investigated the merits of the classic, as well as a novel target-decoy-based protein FDR estimation approach, taking advantage of a heterogeneous data collection comprised of ∼19,000 LC-MS/MS runs deposited in ProteomicsDB (https://www.proteomicsdb.org). The "picked" protein FDR approach treats target and decoy sequences of the same protein as a pair rather than as individual entities and chooses either the target or the decoy sequence depending on which receives the highest score. We investigated the performance of this approach in combination with q-value based peptide scoring to normalize sample-, instrument-, and search engine-specific differences. The "picked" target-decoy strategy performed best when protein scoring was based on the best peptide q-value for each protein yielding a stable number of true positive protein identifications over a wide range of q-value thresholds. We show that this simple and unbiased strategy eliminates a conceptual issue in the commonly used "classic" protein FDR approach that causes overprediction of false-positive protein identification in large data sets. 
The approach scales from small to very large data sets without losing performance, consistently increases the number of true-positive protein identifications and is readily implemented in proteomics analysis software. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.
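The "picked" strategy described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each protein has one best score per target and decoy entry with higher meaning better (for example, a transformed best-peptide q-value), lets the target win ties, and estimates FDR as the running ratio of decoys to targets. Those choices and all names are assumptions.

```python
def picked_protein_fdr(target_scores, decoy_scores):
    """Apply a picked target-decoy selection and return running FDR estimates.

    target_scores, decoy_scores: dicts mapping protein id -> best score
    (higher is better). For each protein, only the higher-scoring member of
    the target/decoy pair is kept. Returns a list of
    (protein_id, is_decoy, score, fdr) tuples sorted by descending score.
    """
    picked = []
    for protein in set(target_scores) | set(decoy_scores):
        target = target_scores.get(protein)
        decoy = decoy_scores.get(protein)
        # Keep the target when it has no decoy partner or scores at least as high.
        if decoy is None or (target is not None and target >= decoy):
            picked.append((protein, False, target))
        else:
            picked.append((protein, True, decoy))

    # Best scores first; targets before decoys on ties, then by id for determinism.
    picked.sort(key=lambda item: (-item[2], item[1], item[0]))

    results = []
    targets = decoys = 0
    for protein, is_decoy, score in picked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdr = decoys / targets if targets else 1.0
        results.append((protein, is_decoy, score, fdr))
    return results


# Hypothetical scores for three proteins.
targets = {"P1": 10, "P2": 3, "P3": 7}
decoys = {"P1": 2, "P2": 5, "P3": 1}
for row in picked_protein_fdr(targets, decoys):
    print(row)
```

Because each target/decoy pair contributes a single entry, a low-scoring decoy paired with a confidently identified target never enters the list, which is the property the abstract credits with avoiding decoy over-representation in very large data sets.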
A Scalable Approach for Protein False Discovery Rate Estimation in Large Proteomic Data Sets

PubMed Central

Savitski, Mikhail M.; Wilhelm, Mathias; Hahne, Hannes; Kuster, Bernhard; Bantscheff, Marcus

2015-01-01

Calculating the number of confidently identified proteins and estimating false discovery rate (FDR) is a challenge when analyzing very large proteomic data sets such as entire human proteomes. Biological and technical heterogeneity in proteomic experiments further add to the challenge and there are strong differences in opinion regarding the conceptual validity of a protein FDR and no consensus regarding the methodology for protein FDR determination. There are also limitations inherent to the widely used classic target–decoy strategy that particularly show when analyzing very large data sets and that lead to a strong over-representation of decoy identifications. In this study, we investigated the merits of the classic, as well as a novel target–decoy-based protein FDR estimation approach, taking advantage of a heterogeneous data collection comprised of ∼19,000 LC-MS/MS runs deposited in ProteomicsDB (https://www.proteomicsdb.org). The “picked” protein FDR approach treats target and decoy sequences of the same protein as a pair rather than as individual entities and chooses either the target or the decoy sequence depending on which receives the highest score. We investigated the performance of this approach in combination with q-value based peptide scoring to normalize sample-, instrument-, and search engine-specific differences. The “picked” target–decoy strategy performed best when protein scoring was based on the best peptide q-value for each protein yielding a stable number of true positive protein identifications over a wide range of q-value thresholds. We show that this simple and unbiased strategy eliminates a conceptual issue in the commonly used “classic” protein FDR approach that causes overprediction of false-positive protein identification in large data sets. 
The approach scales from small to very large data sets without losing performance, consistently increases the number of true-positive protein identifications and is readily implemented in proteomics analysis software. PMID:25987413
Complete genome sequence of the phenanthrene-degrading soil bacterium Delftia acidovorans Cs1-4

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shetty, Ameesha R.; de Gannes, Vidya; Obi, Chioma C.

Polycyclic aromatic hydrocarbons (PAH) are ubiquitous environmental pollutants and microbial biodegradation is an important means of remediation of PAH-contaminated soil. Delftia acidovorans Cs1-4 (formerly Delftia sp. Cs1-4) was isolated by using phenanthrene as the sole carbon source from PAH contaminated soil in Wisconsin. Its full genome sequence was determined to gain insights into the mechanisms underlying biodegradation of PAH. Three genomic libraries were constructed and sequenced: an Illumina GAii shotgun library (916,416,493 reads), a 454 Titanium standard library (770,171 reads) and one paired-end 454 library (average insert size of 8 kb, 508,092 reads). The initial assembly contained 40 contigs in two scaffolds. The 454 Titanium standard data and the 454 paired end data were assembled together and the consensus sequences were computationally shredded into 2 kb overlapping shreds. Illumina sequencing data was assembled, and the consensus sequence was computationally shredded into 1.5 kb overlapping shreds. Gaps between contigs were closed by editing in Consed, by PCR and by Bubble PCR primer walks. A total of 182 additional reactions were needed to close gaps and to raise the quality of the finished sequence. The final assembly is based on 253.3 Mb of 454 draft data (averaging 38.4 X coverage) and 590.2 Mb of Illumina draft data (averaging 89.4 X coverage). The genome of strain Cs1-4 consists of a single circular chromosome of 6,685,842 bp (66.7 %G+C) containing 6,028 predicted genes; 5,931 of these genes were protein-encoding and 4,425 gene products were assigned to a putative function. Genes encoding phenanthrene degradation were localized to a 232 kb genomic island (termed the phn island), which contained near its 3’ end a bacteriophage P4-like integrase, an enzyme often associated with chromosomal integration of mobile genetic elements.
Other biodegradation pathways reconstructed from the genome sequence included: benzoate (by the acetyl-CoA pathway), styrene, nicotinic acid (by the maleamate pathway) and the pesticides Dicamba and Fenitrothion. Lastly, determination of the complete genome sequence of D. acidovorans Cs1-4 has provided new insights into the microbial mechanisms of PAH biodegradation that may shape the process in the environment.
Complete genome sequence of the phenanthrene-degrading soil bacterium Delftia acidovorans Cs1-4

DOE PAGES

Shetty, Ameesha R.; de Gannes, Vidya; Obi, Chioma C.; ...

2015-08-15

Polycyclic aromatic hydrocarbons (PAH) are ubiquitous environmental pollutants and microbial biodegradation is an important means of remediation of PAH-contaminated soil. Delftia acidovorans Cs1-4 (formerly Delftia sp. Cs1-4) was isolated by using phenanthrene as the sole carbon source from PAH contaminated soil in Wisconsin. Its full genome sequence was determined to gain insights into the mechanisms underlying biodegradation of PAH. Three genomic libraries were constructed and sequenced: an Illumina GAii shotgun library (916,416,493 reads), a 454 Titanium standard library (770,171 reads) and one paired-end 454 library (average insert size of 8 kb, 508,092 reads). The initial assembly contained 40 contigs in two scaffolds. The 454 Titanium standard data and the 454 paired end data were assembled together and the consensus sequences were computationally shredded into 2 kb overlapping shreds. Illumina sequencing data was assembled, and the consensus sequence was computationally shredded into 1.5 kb overlapping shreds. Gaps between contigs were closed by editing in Consed, by PCR and by Bubble PCR primer walks. A total of 182 additional reactions were needed to close gaps and to raise the quality of the finished sequence. The final assembly is based on 253.3 Mb of 454 draft data (averaging 38.4 X coverage) and 590.2 Mb of Illumina draft data (averaging 89.4 X coverage). The genome of strain Cs1-4 consists of a single circular chromosome of 6,685,842 bp (66.7 %G+C) containing 6,028 predicted genes; 5,931 of these genes were protein-encoding and 4,425 gene products were assigned to a putative function. Genes encoding phenanthrene degradation were localized to a 232 kb genomic island (termed the phn island), which contained near its 3’ end a bacteriophage P4-like integrase, an enzyme often associated with chromosomal integration of mobile genetic elements.
Other biodegradation pathways reconstructed from the genome sequence included: benzoate (by the acetyl-CoA pathway), styrene, nicotinic acid (by the maleamate pathway) and the pesticides Dicamba and Fenitrothion. Lastly, determination of the complete genome sequence of D. acidovorans Cs1-4 has provided new insights into the microbial mechanisms of PAH biodegradation that may shape the process in the environment.
Conserved features of eukaryotic hsp70 genes revealed by comparison with the nucleotide sequence of human hsp70.

PubMed Central

Hunt, C; Morimoto, R I

1985-01-01

We have determined the nucleotide sequence of the human hsp70 gene and 5' flanking region. The hsp70 gene is transcribed as an uninterrupted primary transcript of 2440 nucleotides composed of a 5' noncoding leader sequence of 212 nucleotides, a 3' noncoding region of 242 nucleotides, and a continuous open reading frame of 1986 nucleotides that encodes a protein with predicted molecular mass of 69,800 daltons. Upstream of the 5' terminus are the canonical TATAAA box, the sequence ATTGG that corresponds in the inverted orientation to the CCAAT motif, and the dyad sequence CTGGAAT/ATTCCCG that shares homology in 12 of 14 positions with the consensus transcription regulatory sequence common to Drosophila heat shock genes. Comparison of the predicted amino acid sequences of human hsp70 with the published sequences of Drosophila hsp70 and Escherichia coli dnaK reveals that human hsp70 is 73% identical to Drosophila hsp70 and 47% identical to E. coli dnaK. Surprisingly, the nucleotide sequences of the human and Drosophila genes are 72% identical and human and E. coli genes are 50% identical, which is more highly conserved than necessary given the degeneracy of the genetic code. The lack of accumulated silent nucleotide substitutions leads us to propose that there may be additional information in the nucleotide sequence of the hsp70 gene or the corresponding mRNA that precludes the maximum divergence allowed in the silent codon positions. PMID:3931075
Structural Basis of Arc Binding to Synaptic Proteins: Implications for Cognitive Disease

DOE PAGES

Zhang, Wenchi; Wu, Jing; Ward, Matthew D.; ...

2015-04-09

Arc is a cellular immediate-early gene (IEG) that functions at excitatory synapses and is required for learning and memory. Here we report crystal structures of Arc subdomains that form a bi-lobar architecture remarkably similar to the capsid domain of human immunodeficiency virus (HIV) gag protein. Analysis indicates Arc originated from the Ty3/Gypsy retrotransposon family and was “domesticated” in higher vertebrates for synaptic functions. The Arc N-terminal lobe evolved a unique hydrophobic pocket that mediates intermolecular binding with synaptic proteins as resolved in complexes with TARPγ2 (Stargazin) and CaMKII peptides and is essential for Arc’s synaptic function. A consensus sequence for Arc binding identifies several additional partners that include genes implicated in schizophrenia. Arc N-lobe binding is inhibited by small chemicals suggesting Arc’s synaptic action may be druggable. Finally, these studies reveal the remarkable evolutionary origin of Arc and provide a structural basis for understanding Arc’s contribution to neural plasticity and disease.
Structural Basis of Arc Binding to Synaptic Proteins: Implications for Cognitive Disease

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Wenchi; Wu, Jing; Ward, Matthew D.

Arc is a cellular immediate-early gene (IEG) that functions at excitatory synapses and is required for learning and memory. Here we report crystal structures of Arc subdomains that form a bi-lobar architecture remarkably similar to the capsid domain of human immunodeficiency virus (HIV) gag protein. Analysis indicates Arc originated from the Ty3/Gypsy retrotransposon family and was “domesticated” in higher vertebrates for synaptic functions. The Arc N-terminal lobe evolved a unique hydrophobic pocket that mediates intermolecular binding with synaptic proteins as resolved in complexes with TARPγ2 (Stargazin) and CaMKII peptides and is essential for Arc’s synaptic function. A consensus sequence for Arc binding identifies several additional partners that include genes implicated in schizophrenia. Arc N-lobe binding is inhibited by small chemicals suggesting Arc’s synaptic action may be druggable. Finally, these studies reveal the remarkable evolutionary origin of Arc and provide a structural basis for understanding Arc’s contribution to neural plasticity and disease.
A versatile nanobody-based toolkit to analyze retrograde transport from the cell surface.

PubMed

Buser, Dominik P; Schleicher, Kai D; Prescianotto-Baschong, Cristina; Spiess, Martin

2018-06-18

Retrograde transport of membranes and proteins from the cell surface to the Golgi and beyond is essential to maintain homeostasis, compartment identity, and physiological functions. To study retrograde traffic biochemically, by live-cell imaging or by electron microscopy, we engineered functionalized anti-GFP nanobodies (camelid VHH antibody domains) to be bacterially expressed and purified. Tyrosine sulfation consensus sequences were fused to the nanobody for biochemical detection of trans-Golgi arrival, fluorophores for fluorescence microscopy and live imaging, and APEX2 (ascorbate peroxidase 2) for electron microscopy and compartment ablation. These functionalized nanobodies are specifically captured by GFP-modified reporter proteins at the cell surface and transported piggyback to the reporters' homing compartments. As an application of this tool, we have used it to determine the contribution of adaptor protein-1/clathrin in retrograde transport kinetics of the mannose-6-phosphate receptors from endosomes back to the trans-Golgi network. Our experiments establish functionalized nanobodies as a powerful tool to demonstrate and quantify retrograde transport pathways.
Structural Basis of Arc Binding to Synaptic Proteins: Implications for Cognitive Disease

PubMed Central

Zhang, Wenchi; Wu, Jing; Ward, Matthew D.; Yang, Sunggu; Chuang, Yang-An; Xiao, Meifang; Li, Ruojing; Leahy, Daniel J.; Worley, Paul F.

2015-01-01

SUMMARY Arc is a cellular immediate early gene (IEG) that functions at excitatory synapses and is required for learning and memory. We report crystal structures of Arc subdomains that form a bi-lobar architecture remarkably similar to the capsid domain of human immunodeficiency virus (HIV) gag protein. Analysis indicates Arc originated from the Ty3/Gypsy retrotransposon family and was “domesticated” in higher vertebrates for synaptic functions. The Arc N-terminal lobe evolved a unique hydrophobic pocket that mediates intermolecular binding with synaptic proteins as resolved in complexes with TARPγ2 (Stargazin) and CaMKII peptides, and is essential for Arc’s synaptic function. A consensus sequence for Arc binding identifies several additional partners that include genes implicated in schizophrenia. Arc N-lobe binding is inhibited by small chemicals suggesting Arc’s synaptic action may be druggable. These studies reveal the remarkable evolutionary origin of Arc and provide a structural basis for understanding Arc’s contribution to neural plasticity and disease. PMID:25864631
Deletion of internal structured repeats increases the stability of a leucine-rich repeat protein, YopM

PubMed Central

Barrick, Doug

2011-01-01

Mapping the stability distributions of proteins in their native folded states provides a critical link between structure, thermodynamics, and function. Linear repeat proteins have proven more amenable to this kind of mapping than globular proteins. C-terminal deletion studies of YopM, a large, linear leucine-rich repeat (LRR) protein, show that stability is distributed quite heterogeneously, yet a high level of cooperativity is maintained [1]. Key components of this distribution are three interfaces that strongly stabilize adjacent sequences, thereby maintaining structural integrity and promoting cooperativity. To better understand the distribution of interaction energy around these critical interfaces, we studied internal (rather than terminal) deletions of three LRRs in this region, including one of these stabilizing interfaces. Contrary to our expectation that deletion of structured repeats should be destabilizing, we find that internal deletion of folded repeats can actually stabilize the native state, suggesting that these repeats are destabilizing, although paradoxically, they are folded in the native state. We identified two residues within this destabilizing segment that deviate from the consensus sequence at a position that normally forms a stacked leucine ladder in the hydrophobic core. Replacement of these nonconsensus residues with leucine is stabilizing. This stability enhancement can be reproduced in the context of nonnative interfaces, but it requires an extended hydrophobic core. Our results demonstrate that different LRRs vary widely in their contribution to stability, and that this variation is context-dependent. These two factors are likely to determine the types of rearrangements that lead to folded, functional proteins, and in turn, are likely to restrict the pathways available for the evolution of linear repeat proteins. PMID:21764506
Mutant HSPB1 causes loss of translational repression by binding to PCBP1, an RNA binding protein with a possible role in neurodegenerative disease.

PubMed

Geuens, Thomas; De Winter, Vicky; Rajan, Nicholas; Achsel, Tilmann; Mateiu, Ligia; Almeida-Souza, Leonardo; Asselbergh, Bob; Bouhy, Delphine; Auer-Grumbach, Michaela; Bagni, Claudia; Timmerman, Vincent

2017-01-11

The small heat shock protein HSPB1 (Hsp27) is a ubiquitously expressed molecular chaperone able to regulate various cellular functions like actin dynamics, oxidative stress regulation and anti-apoptosis. So far, disease-causing mutations in HSPB1 have been associated with neurodegenerative diseases such as distal hereditary motor neuropathy, Charcot-Marie-Tooth disease and amyotrophic lateral sclerosis. Most mutations in HSPB1 target its highly conserved α-crystallin domain, while other mutations affect the C- or N-terminal regions or its promoter. Mutations inside the α-crystallin domain have been shown to enhance the chaperone activity of HSPB1 and increase the binding to client proteins. However, the HSPB1-P182L mutation, located outside and downstream of the α-crystallin domain, behaves differently. This specific HSPB1 mutation results in a severe neuropathy phenotype affecting exclusively the motor neurons of the peripheral nervous system. We identified that the HSPB1-P182L mutant protein has a specifically increased interaction with the RNA binding protein poly(C)binding protein 1 (PCBP1) and results in a reduction of its translational repressive activity. RNA immunoprecipitation followed by RNA sequencing on mouse brain led to the identification of PCBP1 mRNA targets. These targets contain larger 3'- and 5'-UTRs than average and are enriched in an RNA motif consisting of the CTCCTCCTCCTCC consensus sequence. Interestingly, next to the clear presence of neuronal transcripts among the identified PCBP1 targets we identified known genes associated with hereditary peripheral neuropathies and hereditary spastic paraplegias. We therefore conclude that HSPB1 can mediate translational repression through interaction with an RNA binding protein further supporting its role in neurodegenerative disease.
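Locating the consensus motif quoted in this abstract within a UTR sequence can be sketched as below. This is an illustrative exact-match scan only (the study reports motif enrichment, which would need a background model not shown here); the function name and the test sequence are hypothetical, and converting U to T so that RNA and cDNA spellings are treated alike is an assumption.

```python
import re

# Consensus motif as written in the abstract (DNA alphabet).
MOTIF = "CTCCTCCTCCTCC"


def find_motif(sequence, motif=MOTIF):
    """Return 0-based start positions of all (possibly overlapping) exact motif matches.

    The sequence is upper-cased and U is converted to T before matching.
    """
    normalized = sequence.upper().replace("U", "T")
    # A zero-width lookahead lets overlapping occurrences be reported.
    pattern = re.compile(f"(?={re.escape(motif)})")
    return [match.start() for match in pattern.finditer(normalized)]


# Hypothetical UTR fragment containing an extended CTC repeat.
utr = "AAA" + "CTC" * 5 + "C" + "GG"
print(find_motif(utr))
```

For the toy fragment the motif is found at positions 3 and 6, because the extended repeat contains two overlapping copies.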
R2R - software to speed the depiction of aesthetic consensus RNA secondary structures

PubMed Central

2011-01-01

Background With continuing identification of novel structured noncoding RNAs, there is an increasing need to create schematic diagrams showing the consensus features of these molecules. RNA structural diagrams are typically made either with general-purpose drawing programs like Adobe Illustrator, or with automated or interactive programs specific to RNA. Unfortunately, the use of applications like Illustrator is extremely time consuming, while existing RNA-specific programs produce figures that are useful, but usually not of the same aesthetic quality as those produced at great cost in Illustrator. Additionally, most existing RNA-specific applications are designed for drawing single RNA molecules, not consensus diagrams. Results We created R2R, a computer program that facilitates the generation of aesthetic and readable drawings of RNA consensus diagrams in a fraction of the time required with general-purpose drawing programs. Since the inference of a consensus RNA structure typically requires a multiple-sequence alignment, the R2R user annotates the alignment with commands directing the layout and annotation of the RNA. R2R creates SVG or PDF output that can be imported into Adobe Illustrator, Inkscape or CorelDRAW. R2R can be used to create consensus sequence and secondary structure models for novel RNA structures or to revise models when new representatives for known RNA classes become available. Although R2R does not currently have a graphical user interface, it has proven useful in our efforts to create 100 schematic models of distinct noncoding RNA classes. Conclusions R2R makes it possible to obtain high-quality drawings of the consensus sequence and structural models of many diverse RNA structures with a more practical amount of effort. R2R software is available at http://breaker.research.yale.edu/R2R and as an Additional file. PMID:21205310

Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.

PubMed

Chin, Chen-Shan; Alexander, David H; Marks, Patrick; Klammer, Aaron A; Drake, James; Heiner, Cheryl; Clum, Alicia; Copeland, Alex; Huddleston, John; Eichler, Evan E; Turner, Stephen W; Korlach, Jonas

2013-06-01

We present a hierarchical genome-assembly process (HGAP) for high-quality de novo microbial genome assemblies using only a single, long-insert shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT) DNA sequencing. Our method uses the longest reads as seeds to recruit all other reads for construction of highly accurate preassembled reads through a directed acyclic graph-based consensus procedure, which we follow with assembly using off-the-shelf long-read assemblers. In contrast to hybrid approaches, HGAP does not require highly accurate raw reads for error correction. We demonstrate efficient genome assembly for several microorganisms using as few as three SMRT Cell zero-mode waveguide arrays of sequencing and for BACs using just one SMRT Cell. Long repeat regions can be successfully resolved with this workflow. We also describe a consensus algorithm that incorporates SMRT sequencing primary quality values to produce de novo genome sequence exceeding 99.999% accuracy.
Protein arginine methylation of Npl3 promotes splicing of the SUS1 intron harboring non-consensus 5' splice site and branch site.

PubMed

Muddukrishna, Bhavana; Jackson, Christopher A; Yu, Michael C

2017-06-01

Protein arginine methylation occurs on spliceosomal components and spliceosome-associated proteins, but how this modification contributes to their function in pre-mRNA splicing remains poorly understood. Here we provide evidence that protein arginine methylation of the yeast SR-/hnRNP-like protein Npl3 plays a role in facilitating efficient splicing of the SUS1 intron that harbors a non-consensus 5' splice site and branch site. In yeast cells lacking the major protein arginine methyltransferase HMT1, we observed a change in the co-transcriptional recruitment of the U1 snRNP subunit Snp1 and Npl3 to pre-mRNAs harboring both consensus (ECM33 and ASC1) and non-consensus (SUS1) 5' splice site and branch site. Using an Npl3 mutant that phenocopies wild-type Npl3 when expressed in Δhmt1 cells, we showed that the arginine methylation of Npl3 is responsible for this. Examination of pre-mRNA splicing efficiency in these mutants reveals the requirement of Npl3 methylation for the efficient splicing of SUS1 intron 1, but not of ECM33 or ASC1. Changing the 5' splice site and branch site in SUS1 intron 1 to the consensus form restored splicing efficiency in an Hmt1-independent manner. Results from biochemical studies show that methylation of Npl3 promotes its optimal association with the U1 snRNP through its association with the U1 snRNP subunit Mud1. Based on these data, we propose a model in which Hmt1, via arginine methylation of Npl3, facilitates U1 snRNP engagement with the pre-mRNA to promote usage of non-consensus splice sites by the splicing machinery. Published by Elsevier B.V.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Yanfeng; Zheng, Yi; Qin, Ling

Beta-hydroxyacid dehydrogenase (β-HAD) genes have been identified in all sequenced genomes of eukaryotes and prokaryotes. Their gene products catalyze the NAD+- or NADP+-dependent oxidation of various β-hydroxy acid substrates into their corresponding semialdehyde. In many fungal and bacterial genomes, multiple β-HAD genes are observed leading to the hypothesis that these gene products may have unique, uncharacterized metabolic roles specific to their species. The genomes of Geobacter sulfurreducens and Geobacter metallireducens each contain two potential β-HAD genes. The protein sequences of one pair of these genes, Gs-βHAD (Q74DE4) and Gm-βHAD (Q39R98), have 65% sequence identity and 77% sequence similarity with each other. Both proteins reduce succinic semialdehyde, a metabolite of the GABA shunt. To further explore the structural and functional characteristics of these two β-HADs with a potentially unique substrate specificity, crystal structures for Gs-βHAD and Gm-βHAD in complex with NADP+ were determined to a resolution of 1.89 Å and 2.07 Å, respectively. The structures of both proteins are similar, composed of 14 α-helices and nine β-strands organized into two domains. Domain One (1-165) adopts a typical Rossmann fold composed of two α/β units: a six-strand parallel β-sheet surrounded by six α-helices (α1 – α6) followed by a mixed three-strand β-sheet surrounded by two α-helices (α7 and α8). Domain Two (166-287) is composed of a bundle of seven α-helices (α9 – α14). Four functional regions conserved in all β-HADs are spatially located near each other at the interdomain cleft in both Gs-βHAD and Gm-βHAD with a buried molecule of NADP+. The structural features of Gs-βHAD and Gm-βHAD are described in relation to the four conserved consensus sequences characteristic of β-HADs and the potential biochemical importance of these enzymes as an alternative pathway for the degradation of succinic semialdehyde.
Consensus-Degenerate Hybrid Oligonucleotide Primers for Amplification of Priming Glycosyltransferase Genes of the Exopolysaccharide Locus in Strains of the Lactobacillus casei Group

PubMed Central

Provencher, Cathy; LaPointe, Gisèle; Sirois, Stéphane; Van Calsteren, Marie-Rose; Roy, Denis

2003-01-01

A primer design strategy named CODEHOP (consensus-degenerate hybrid oligonucleotide primer) for amplification of distantly related sequences was used to detect the priming glycosyltransferase (GT) gene in strains of the Lactobacillus casei group. Each hybrid primer consisted of a short 3′ degenerate core based on four highly conserved amino acids and a longer 5′ consensus clamp region based on six sequences of the priming GT gene products from exopolysaccharide (EPS)-producing bacteria. The hybrid primers were used to detect the priming GT gene of 44 commercial isolates and reference strains of Lactobacillus rhamnosus, L. casei, Lactobacillus zeae, and Streptococcus thermophilus. The priming GT gene was detected in the genome of both non-EPS-producing (EPS−) and EPS-producing (EPS+) strains of L. rhamnosus. The sequences of the cloned PCR products were similar to those of the priming GT gene of various gram-negative and gram-positive EPS+ bacteria. Specific primers designed from the L. rhamnosus RW-9595M GT gene were used to sequence the end of the priming GT gene in selected EPS+ strains of L. rhamnosus. Phylogenetic analysis revealed that Lactobacillus spp. form a distinctive group apart from other lactic acid bacteria for which GT genes have been characterized to date. Moreover, the sequences show a divergence existing among strains of L. rhamnosus with respect to the terminal region of the priming GT gene. Thus, the PCR approach with consensus-degenerate hybrid primers designed with CODEHOP is a practical approach for the detection of similar genes containing conserved motifs in different bacterial genomes. PMID:12788729
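One reason a degenerate core built on only a few conserved amino acids stays manageable is that degeneracy multiplies with every back-translated residue. The following is a minimal sketch of that arithmetic, assuming the standard genetic code; the name `core_degeneracy` and the example peptides are hypothetical illustrations and are not taken from the study or from the CODEHOP software.

```python
# Number of codons per amino acid in the standard genetic code (61 sense codons).
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def core_degeneracy(peptide: str) -> int:
    """Return how many distinct nucleotide sequences can encode `peptide`."""
    total = 1
    for residue in peptide.upper():
        total *= CODON_COUNTS[residue]  # each residue multiplies the pool size
    return total


if __name__ == "__main__":
    # Hypothetical four-residue cores: one low-degeneracy, one high-degeneracy.
    for core in ("WMND", "LSRG"):
        print(core, core_degeneracy(core))
```

Under these assumptions, a core rich in Trp, Met, Asn or Asp needs far fewer primer variants than one rich in Leu, Ser or Arg, which is why the choice of the conserved residues matters for such designs.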
Detection and Analysis of Six Lizard Adenoviruses by Consensus Primer PCR Provides Further Evidence of a Reptilian Origin for the Atadenoviruses

PubMed Central

Wellehan, James F. X.; Johnson, April J.; Harrach, Balázs; Benkö, Mária; Pessier, Allan P.; Johnson, Calvin M.; Garner, Michael M.; Childress, April; Jacobson, Elliott R.

2004-01-01

A consensus nested-PCR method was designed for investigation of the DNA polymerase gene of adenoviruses. Gene fragments were amplified and sequenced from six novel adenoviruses from seven lizard species, including four species from which adenoviruses had not previously been reported. Host species included Gila monster, leopard gecko, fat-tail gecko, blue-tongued skink, Tokay gecko, bearded dragon, and mountain chameleon. This is the first sequence information from lizard adenoviruses. Phylogenetic analysis indicated that these viruses belong to the genus Atadenovirus, supporting the reptilian origin of atadenoviruses. This PCR method may be useful for obtaining templates for initial sequencing of novel adenoviruses. PMID:15542689
Detection and analysis of six lizard adenoviruses by consensus primer PCR provides further evidence of a reptilian origin for the atadenoviruses.

PubMed

Wellehan, James F X; Johnson, April J; Harrach, Balázs; Benkö, Mária; Pessier, Allan P; Johnson, Calvin M; Garner, Michael M; Childress, April; Jacobson, Elliott R

2004-12-01

A consensus nested-PCR method was designed for investigation of the DNA polymerase gene of adenoviruses. Gene fragments were amplified and sequenced from six novel adenoviruses from seven lizard species, including four species from which adenoviruses had not previously been reported. Host species included Gila monster, leopard gecko, fat-tail gecko, blue-tongued skink, Tokay gecko, bearded dragon, and mountain chameleon. This is the first sequence information from lizard adenoviruses. Phylogenetic analysis indicated that these viruses belong to the genus Atadenovirus, supporting the reptilian origin of atadenoviruses. This PCR method may be useful for obtaining templates for initial sequencing of novel adenoviruses.
Rapid RNase L-driven arrest of protein synthesis in the dsRNA response without degradation of translation machinery.

PubMed

Donovan, Jesse; Rath, Sneha; Kolet-Mandrikov, David; Korennykh, Alexei

2017-11-01

Mammalian cells respond to double-stranded RNA (dsRNA) by activating a translation-inhibiting endoribonuclease, RNase L. Consensus in the field indicates that RNase L arrests protein synthesis by degrading ribosomal RNAs (rRNAs) and messenger RNAs (mRNAs). However, here we provide evidence for a different and far more efficient mechanism. By sequencing abundant RNA fragments generated by RNase L in human cells, we identify site-specific cleavage of two groups of noncoding RNAs: Y-RNAs, whose function is poorly understood, and cytosolic tRNAs, which are essential for translation. Quantitative analysis of human RNA cleavage versus nascent protein synthesis in lung carcinoma cells shows that RNase L stops global translation when tRNAs, as well as rRNAs and mRNAs, are still intact. Therefore, RNase L does not have to degrade the translation machinery to stop protein synthesis. Our data point to a rapid mechanism that transforms a subtle RNA cleavage into a cell-wide translation arrest. © 2017 Donovan et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Golgi enzymes do not cycle through the endoplasmic reticulum during protein secretion or mitosis

PubMed Central

Villeneuve, Julien; Duran, Juan; Scarpa, Margherita; Bassaganyas, Laia; Van Galen, Josse; Malhotra, Vivek

2017-01-01

Golgi-specific sialyltransferase (ST) expressed as a chimera with the rapamycin-binding domain of mTOR, FRB, relocates to the endoplasmic reticulum (ER) in cells exposed to rapamycin that also express invariant chain (Ii)-FKBP in the ER. This result has been taken to indicate that Golgi-resident enzymes cycle to the ER constitutively. We show that ST-FRB is trapped in the ER even without Ii-FKBP upon rapamycin addition. This is because ER-Golgi–cycling FKBP proteins contain a C-terminal KDEL-like sequence, bind ST-FRB in the Golgi, and are transported together back to the ER by KDEL receptor–mediated retrograde transport. Moreover, depletion of KDEL receptor prevents trapping of ST-FRB in the ER by rapamycin. Thus ST-FRB cycles artificially by binding to FKBP domain–containing proteins. In addition, Golgi-specific O-linked glycosylation of a resident ER protein occurs only upon artificial fusion of Golgi membranes with ER. Together these findings support the consensus view that there is no appreciable mixing of Golgi-resident enzymes with ER under normal conditions. PMID:27807044
Engineering an efficient and tight D-amino acid-inducible gene expression system in Rhodosporidium/Rhodotorula species.

PubMed

Liu, Yanbin; Koh, Chong Mei John; Ngoh, Si Te; Ji, Lianghui

2015-10-26

Rhodosporidium and Rhodotorula are two genera of oleaginous red yeast with great potential for industrial biotechnology. To date, there is no effective method for inducible expression of proteins and RNAs in these hosts. We have developed a luciferase gene reporter assay based on a new codon-optimized LUC2 reporter gene (RtLUC2), which is flanked with CAR2 homology arms and can be integrated into the CAR2 locus in the nuclear genome at >90 % efficiency. We characterized the upstream DNA sequence of a D-amino acid oxidase gene (DAO1) from R. toruloides ATCC 10657 by nested deletions. By comparing the upstream DNA sequences of several putative DAO1 homologs of Basidiomycetous fungi, we identified a conserved DNA motif with a consensus sequence of AGGXXGXAGX11GAXGAXGG within a 0.2 kb region from the mRNA translation initiation site. Deletion of this motif led to strong mRNA transcription under non-inducing conditions. Interestingly, DAO1 promoter activity was enhanced about fivefold when the 108 bp intron 1 was included in the reporter construct. We identified a conserved CT-rich motif in the intron with a consensus sequence of TYTCCCYCTCCYCCCCACWYCCGA, deletion or point mutations of which drastically reduced promoter strength under both inducing and non-inducing conditions. Additionally, we created a selection marker-free DAO1-null mutant (∆dao1e) which displayed greatly improved inducible gene expression, particularly when both glucose and nitrogen were present in high levels. To avoid adding an unwanted peptide to proteins to be expressed, we converted the original translation initiation codon to ATC and re-created a translation initiation codon at the start of exon 2. This promoter, named P DAO1-in1m1, showed very similar luciferase activity to the wild-type promoter upon induction with D-alanine. The inducible system was tunable by adjusting the levels of inducers, carbon source and nitrogen source.
The intron 1-containing DAO1 promoters coupled with a DAO1 null mutant make an efficient and tight D-amino acid-inducible gene expression system in Rhodosporidium and Rhodotorula genera. The system will be a valuable tool for metabolic engineering and enzyme expression in these yeast hosts.
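Consensus strings such as the two quoted in this abstract can be scanned for with an ordinary regular expression. The sketch below is only an illustration under two stated assumptions: that "X" means any nucleotide, and that a number following a symbol (as in "X11") is a repeat count that lost its subscript formatting. The helper names `consensus_to_regex` and `find_motif` are hypothetical and do not come from the paper.

```python
import re

# IUPAC nucleotide codes. "X" is treated as "any base" (like N): an assumption.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]", "X": "[ACGT]",
}


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Compile a consensus string; digits after a symbol are a repeat count."""
    parts = []
    for symbol, count in re.findall(r"([A-Z])(\d*)", consensus.upper()):
        parts.append(IUPAC[symbol] + (f"{{{count}}}" if count else ""))
    return re.compile("".join(parts))


def find_motif(consensus: str, sequence: str) -> list[int]:
    """Return 0-based start positions of non-overlapping matches."""
    pattern = consensus_to_regex(consensus)
    return [m.start() for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # Made-up test sequence that contains one instance of the intron motif.
    site = "TCTCCCTCT" + "C" * 7 + "ACATCCGA"
    print(find_motif("TYTCCCYCTCCYCCCCACWYCCGA", "GG" + site + "AA"))
```

A scan like this only reports exact matches to the degenerate pattern; tolerating mismatches would call for a position weight matrix or an alignment-based search instead.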
Cloning and characterization of an inulinase gene from the marine yeast Candida membranifaciens subsp. flavinogenie W14-3 and its expression in Saccharomyces sp. W0 for ethanol production.

PubMed

Zhang, Lin-Lin; Tan, Mei-Juan; Liu, Guang-Lei; Chi, Zhe; Wang, Guang-Yuan; Chi, Zhen-Ming

2015-04-01

The INU1 gene encoding an exo-inulinase from the marine-derived yeast Candida membranifaciens subsp. flavinogenie W14-3 was cloned and characterized. It had an open reading frame 1,536 bp long encoding an inulinase. Its coding region was not interrupted by any intron. The cloned gene encoded 512 amino acid residues of a protein with a putative signal peptide of 23 amino acids and a calculated molecular mass of 57.8 kDa. The protein sequence deduced from the inulinase gene contained the inulinase consensus sequences (WMNDPNGL), (RDP), ECP FS and Q. The protein also had six conserved putative N-glycosylation sites. The deduced inulinase from the yeast strain W14-3 was found to be closely related to that from Candida kutaonensis sp. nov. KRF1, Kluyveromyces marxianus, and Cryptococcus aureus G7a. The inulinase gene with its signal peptide encoding sequence was subcloned into the pMIRSC11 expression vector and expressed in Saccharomyces sp. W0. The recombinant yeast strain W14-3-INU-112 obtained could produce 16.8 U/ml of inulinase activity and 12.5 % (v/v) ethanol from 250 g/l of inulin within 168 h. The monosaccharides were detected after the hydrolysis of inulin with the crude inulinase (the yeast culture). All the results indicated that the cloned gene and the recombinant yeast strain W14-3-INU-112 had potential applications in biotechnology.
Comparative Analysis of V-Akt Murine Thymoma Viral Oncogene Homolog 3 (AKT3) Gene between Cow and Buffalo Reveals Substantial Differences for Mastitis.

PubMed

Ullah, Farman; Bhattarai, Dinesh; Cheng, Zhangrui; Liang, Xianwei; Deng, Tingxian; Rehman, Zia Ur; Talpur, Hira Sajjad; Worku, Tesfaye; Brohi, Rahim Dad; Safdar, Muhammad; Ahmad, Muhammad Jamil; Salim, Mohammad; Khan, Momen; Ahmad, Hafiz Ishfaq; Zhang, Shujun

2018-01-01

The AKT3 gene is a constituent of the serine/threonine protein kinase family and plays a crucial role in synthesis of milk fats and cholesterol by regulating activity of the sterol regulatory element binding protein (SREBP). AKT3 is highly conserved in mammals and its expression levels during the lactation periods of cattle are markedly increased. AKT3 is highly expressed in the intestine followed by mammary gland and it is also expressed in immune cells. It is involved in the TLR pathways as effectively as proinflammatory cytokines. The aim of this study was to investigate the sequence differences between buffalo and cow. Our results showed that there were substantial differences between buffalo and cow in some exons and noteworthy differences of the gene size in different regions. We also identified the important consensus sequence motifs, variation in 2000 upstream of ATG, substantial difference in the "3'UTR" region, and miRNA association in the buffalo sequences compared with the cow. In addition, genetic analyses, such as gene structure, phylogenetic tree, position of different motifs, and functional domains, were performed to establish their correlation with other species. This may indicate that a buffalo breed has potential resistance to disease, environment changes, and airborne microorganisms and some good production and reproductive traits.
Comparative Analysis of V-Akt Murine Thymoma Viral Oncogene Homolog 3 (AKT3) Gene between Cow and Buffalo Reveals Substantial Differences for Mastitis

PubMed Central

Bhattarai, Dinesh; Cheng, Zhangrui; Liang, Xianwei; Deng, Tingxian; Rehman, Zia Ur; Talpur, Hira Sajjad; Worku, Tesfaye; Brohi, Rahim Dad; Safdar, Muhammad; Ahmad, Muhammad Jamil; Salim, Mohammad; Khan, Momen; Ahmad, Hafiz Ishfaq

2018-01-01

The AKT3 gene is a constituent of the serine/threonine protein kinase family and plays a crucial role in synthesis of milk fats and cholesterol by regulating activity of the sterol regulatory element binding protein (SREBP). AKT3 is highly conserved in mammals and its expression levels during the lactation periods of cattle are markedly increased. AKT3 is highly expressed in the intestine followed by mammary gland and it is also expressed in immune cells. It is involved in the TLR pathways as effectively as proinflammatory cytokines. The aim of this study was to investigate the sequence differences between buffalo and cow. Our results showed that there were substantial differences between buffalo and cow in some exons and noteworthy differences of the gene size in different regions. We also identified the important consensus sequence motifs, variation in 2000 upstream of ATG, substantial difference in the “3′UTR” region, and miRNA association in the buffalo sequences compared with the cow. In addition, genetic analyses, such as gene structure, phylogenetic tree, position of different motifs, and functional domains, were performed to establish their correlation with other species. This may indicate that a buffalo breed has potential resistance to disease, environment changes, and airborne microorganisms and some good production and reproductive traits. PMID:29862252
The PR/SET Domain Zinc Finger Protein Prdm4 Regulates Gene Expression in Embryonic Stem Cells but Plays a Nonessential Role in the Developing Mouse Embryo

PubMed Central

Bogani, Debora; Morgan, Marc A. J.; Nelson, Andrew C.; Costello, Ita; McGouran, Joanna F.; Kessler, Benedikt M.

2013-01-01

Prdm4 is a highly conserved member of the Prdm family of PR/SET domain zinc finger proteins. Many well-studied Prdm family members play critical roles in development and display striking loss-of-function phenotypes. Prdm4 functional contributions have yet to be characterized. Here, we describe its widespread expression in the early embryo and adult tissues. We demonstrate that DNA binding is exclusively mediated by the Prdm4 zinc finger domain, and we characterize its tripartite consensus sequence via SELEX (systematic evolution of ligands by exponential enrichment) and ChIP-seq (chromatin immunoprecipitation-sequencing) experiments. In embryonic stem cells (ESCs), Prdm4 regulates key pluripotency and differentiation pathways. Two independent strategies, namely, targeted deletion of the zinc finger domain and generation of a EUCOMM LacZ reporter allele, resulted in functional null alleles. However, homozygous mutant embryos develop normally and adults are healthy and fertile. Collectively, these results strongly suggest that Prdm4 functions redundantly with other transcriptional partners to cooperatively regulate gene expression in the embryo and adult animal. PMID:23918801
[Polymorphism of KPI-A genes from plants of the subgenus Potatoe (sect. Petota, Estolonifera and Lycopersicum) and subgenus Solanum].

PubMed

Krinitsyna, A A; Mel'nikova, N V; Belenikin, M S; Poltronieri, P; Santino, A; Kudriavtseva, A V; Savilova, A M; Speranskaia, A S

2013-01-01

Kunitz-type proteinase inhibitor proteins of group A (KPI-A) are involved in the protection of potato plants from pathogens and pests. Although sequences of a large number of the KPI-A genes from different species of cultivated potato (Solanum tuberosum subsp. tuberosum) and a few genes from tomato (Solanum lycopersicum) are known to date, information about the allelic diversity of these genes in other species of the genus Solanum is lacking. In our work, the consensus sequences of the KPI-A genes were established in two species of subgenus Potatoe sect. Petota (Solanum tuberosum subsp. andigenum, 5 genes; Solanum stoloniferum, 2 genes) and in the subgenus Solanum (Solanum nigrum, 5 genes) by amplification, cloning, sequencing and subsequent analysis. The determined sequences of KPI-A genes were 97-100% identical to known sequences of the cultivated potato of sect. Petota (cultivated potato Solanum tuberosum subsp. tuberosum) and sect. Etuberosum (S. palustre). The interspecific variability of these genes did not exceed the intraspecific variability for all studied species except Solanum lycopersicum. The distribution of highly variable and conserved sequences in the mature protein-encoding regions was uniform for all investigated KPI-A genes. However, our attempts to amplify the homologous genes using the same primers and the genomes of Solanum dulcamarum, Solanum lycopersicum and Mandragora officinarum resulted in no product formation. Phylogenetic analysis of KPI-A diversity showed that the sequences of S. lycopersicum form an independent cluster, whereas KPI-A of S. nigrum and species of sect. Etuberosum and sect. Petota are closely related and do not form species-specific subclusters.
Although Solanum nigrum is resistant to all known races of the oomycete Phytophthora infestans, which causes one of the most economically important diseases of solanaceous plants, the amino acid sequences encoded by the KPI-A genes of its genome show few or no differences from those of cultivated potatoes that are affected by P. infestans.
Evidence for Horizontal Gene Transfer in Evolution of Elongation Factor Tu in Enterococci

PubMed Central

Ke, Danbing; Boissinot, Maurice; Huletsky, Ann; Picard, François J.; Frenette, Johanne; Ouellette, Marc; Roy, Paul H.; Bergeron, Michel G.

2000-01-01

The elongation factor Tu, encoded by tuf genes, is a GTP binding protein that plays a central role in protein synthesis. One to three tuf genes per genome are present, depending on the bacterial species. Most low-G+C-content gram-positive bacteria carry only one tuf gene. We have designed degenerate PCR primers derived from consensus sequences of the tuf gene to amplify partial tuf sequences from 17 enterococcal species and other phylogenetically related species. The amplified DNA fragments were sequenced either by direct sequencing or by sequencing cloned inserts containing putative amplicons. Two different tuf genes (tufA and tufB) were found in 11 enterococcal species, including Enterococcus avium, Enterococcus casseliflavus, Enterococcus dispar, Enterococcus durans, Enterococcus faecium, Enterococcus gallinarum, Enterococcus hirae, Enterococcus malodoratus, Enterococcus mundtii, Enterococcus pseudoavium, and Enterococcus raffinosus. For the other six enterococcal species (Enterococcus cecorum, Enterococcus columbae, Enterococcus faecalis, Enterococcus sulfureus, Enterococcus saccharolyticus, and Enterococcus solitarius), only the tufA gene was present. Based on 16S rRNA gene sequence analysis, the 11 species having two tuf genes all have a common ancestor, while the six species having only one copy diverged from the enterococcal lineage before that common ancestor. The presence of one or two copies of the tuf gene in enterococci was confirmed by Southern hybridization. Phylogenetic analysis of tuf sequences demonstrated that the enterococcal tufA gene branches with the Bacillus, Listeria, and Staphylococcus genera, while the enterococcal tufB gene clusters with the genera Streptococcus and Lactococcus. Primary structure analysis showed that four amino acid residues encoded within the sequenced regions are conserved and unique to the enterococcal tufB genes and the tuf genes of streptococci and Lactococcus lactis. 
The data suggest that an ancestral streptococcus or a streptococcus-related species may have horizontally transferred a tuf gene to the common ancestor of the 11 enterococcal species which now carry two tuf genes. PMID:11092850
Evidence of Divergent Amino Acid Usage in Comparative Analyses of R5- and X4-Associated HIV-1 Vpr Sequences

PubMed Central

Antell, Gregory C.; Zhong, Wen; Kercher, Katherine; Passic, Shendra; Williams, Jean; Liu, Yucheng; James, Tony; Jacobson, Jeffrey M.; Szep, Zsofia

2017-01-01

Vpr is an HIV-1 accessory protein that plays numerous roles during viral replication, some of which are cell type dependent. To test the hypothesis that HIV-1 tropism extends beyond the envelope into the vpr gene, studies were performed to identify the associations between coreceptor usage and Vpr variation in HIV-1-infected patients. Colinear HIV-1 Env-V3 and Vpr amino acid sequences were obtained from the LANL HIV-1 sequence database and from well-suppressed patients in the Drexel/Temple Medicine CNS AIDS Research and Eradication Study (CARES) Cohort. Genotypic classification of Env-V3 sequences as X4 (CXCR4-utilizing) or R5 (CCR5-utilizing) was used to group colinear Vpr sequences. To reveal the sequences associated with a specific coreceptor usage genotype, Vpr amino acid sequences were assessed for amino acid diversity and Jensen-Shannon divergence between the two groups. Five amino acid alphabets were used to comprehensively examine the impact of amino acid substitutions involving side chains with similar physicochemical properties. Positions 36, 37, 41, 89, and 96 of Vpr were characterized by statistically significant divergence across multiple alphabets when X4 and R5 sequence groups were compared. In addition, consensus amino acid switches were found at positions 37 and 41 in comparisons of the R5 and X4 sequence populations. These results suggest an evolutionary link between Vpr and gp120 in HIV-1-infected patients. PMID:28620613
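The per-position comparison of two sequence groups described above can be illustrated with a small Jensen-Shannon divergence calculation. This is a minimal sketch, assuming pre-aligned sequences of equal length and base-2 logarithms (so values fall between 0 and 1); the function names are hypothetical, and the code does not reproduce the study's reduced amino acid alphabets or its significance testing.

```python
import math
from collections import Counter


def column_distribution(column: str) -> dict[str, float]:
    """Relative frequency of each residue in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return {residue: n / total for residue, n in counts.items()}


def jensen_shannon(p: dict[str, float], q: dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits between two discrete distributions."""
    symbols = set(p) | set(q)
    mixture = {s: 0.5 * (p.get(s, 0.0) + q.get(s, 0.0)) for s in symbols}

    def kl_to_mixture(dist: dict[str, float]) -> float:
        return sum(v * math.log2(v / mixture[s]) for s, v in dist.items() if v > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


def per_position_jsd(group_a: list[str], group_b: list[str]) -> list[float]:
    """Divergence at every alignment position between two sequence groups."""
    length = len(group_a[0])
    return [
        jensen_shannon(
            column_distribution("".join(seq[i] for seq in group_a)),
            column_distribution("".join(seq[i] for seq in group_b)),
        )
        for i in range(length)
    ]


if __name__ == "__main__":
    # Toy two-residue "alignments" standing in for the two coreceptor groups.
    print(per_position_jsd(["AR", "AR", "AK"], ["AK", "AK", "AK"]))
```

Positions where the two groups share the same residue frequencies score 0, and positions with completely different residues score 1, so the output can be ranked to flag candidate group-specific sites.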
Diverse Dengue Type 2 Virus Populations Contain Recombinant and Both Parental Viruses in a Single Mosquito Host

PubMed Central

Craig, Scott; Thu, Hlaing Myat; Lowry, Kym; Wang, Xiao-fang; Holmes, Edward C.; Aaskov, John

2003-01-01

Envelope (E) protein genes sampled from populations of dengue 2 (DEN-2) virus in individual Aedes aegypti mosquitoes and in serum from dengue patients were copied to cDNA, cloned, and sequenced. The nucleotide sequences of the E genes in more than 70% of the clones differed from the consensus sequence for the corresponding virus population at up to 11 sites, and 24 of the 94 clones contained at least one stop codon. Virus populations recovered up to 2 years apart yielded clones with similar polymorphisms in the E gene. For one mosquito, the clones obtained fell into two genotypes. One group of sequences was closely related to those of viruses recovered from dengue patients in the same locality (Yangon, Myanmar) since 1995 and were classified as Asian 1 genotype. The second group were Cosmopolitan genotype viruses which were also circulating in Yangon in 2000 and which were related to DEN-2 viruses sampled from southern China in 1999. Finally, one clone was identified as a recombinant genome composed of portions of these two “parental” genotypes. This is the first report of recombinant and parental dengue viruses in a single host. PMID:12634407
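Comparing each clone with the consensus of its population, as described above, amounts to a column-wise majority vote followed by a mismatch count. The sketch below assumes the clone sequences are already aligned and of equal length; the helper names and the toy clones are hypothetical and are not the authors' pipeline or data.

```python
from collections import Counter


def majority_consensus(sequences: list[str]) -> str:
    """Most common character per column; ties go to the first one encountered."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*sequences)
    )


def differences_from_consensus(sequences: list[str]) -> list[int]:
    """Number of sites at which each sequence differs from the consensus."""
    consensus = majority_consensus(sequences)
    return [sum(a != b for a, b in zip(seq, consensus)) for seq in sequences]


if __name__ == "__main__":
    # Four made-up aligned clones from one hypothetical virus population.
    clones = ["ATGC", "ATGA", "ATGC", "TTGC"]
    print(majority_consensus(clones), differences_from_consensus(clones))
```

A simple majority rule like this ignores sequencing error and alignment gaps, so it should be read as an illustration of the idea rather than a method for real clone data.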
Comparative analysis on the structural features of the 5' flanking region of κ-casein genes from six different species

PubMed Central

Gerencsér, Ákos; Barta, Endre; Boa, Simon; Kastanis, Petros; Bösze, Zsuzsanna; Whitelaw, C Bruce A

2002-01-01

κ-casein plays an essential role in the formation, stabilisation and aggregation of milk micelles. Control of κ-casein expression reflects this essential role, although an understanding of the mechanisms involved lags behind that of the other milk protein genes. We determined the 5'-flanking sequences for the murine, rabbit and human κ-casein genes and compared them to the published ruminant sequences. The most conserved region was not the proximal promoter region but an approximately 400 bp long region centred 800 bp upstream of the TATA box. This region contained two highly conserved MGF/STAT5 sites with common spacing relative to each other. In this region, six conserved short stretches of similarity were also found which did not correspond to known transcription factor consensus sites. In contrast to ruminant and human 5' regulatory sequences, the rabbit and murine 5'-flanking regions did not harbour any kind of repetitive elements. We generated a phylogenetic tree of the six species based on multiple alignment of the κ-casein sequences. This study identified conserved candidate transcriptional regulatory elements within the κ-casein gene promoter. PMID:11929628
Recognition of p63 by the E3 ligase ITCH: Effect of an ectodermal dysplasia mutant.

PubMed

Bellomaria, A; Barbato, Gaetano; Melino, G; Paci, M; Melino, Sonia

2010-09-15

The E3 ubiquitin ligase Itch mediates the degradation of the p63 protein. Itch contains four WW domains which are pivotal for the substrate recognition process. Indeed, this domain is implicated in several signalling complexes crucially involved in human diseases including Muscular Dystrophy, Alzheimer's Disease and Huntington Disease. WW domains are highly compact protein-protein binding modules that interact with short proline-rich sequences. The four WW domains present in Itch belong to the Group I type, which binds polypeptides with a PY motif characterized by a PPxY consensus sequence, where x can be any residue. Accordingly, the Itch-p63 interaction results from a direct binding of Itch-WW2 domain with the PY motif of p63. Here, we report a structural analysis of the Itch-p63 interaction by fluorescence, CD and NMR spectroscopy. Indeed, we studied the in vitro interaction between Itch-WW2 domain and p63(534-551), an 18-mer peptide encompassing a fragment of the p63 protein including the PY motif. In addition, we evaluated the conformation and the interaction with Itch-WW2 of a site specific mutant of p63, I549T, that has been reported in both Hay-Wells syndrome and Rapp-Hodgkin syndrome. Based on our results, we propose an extended PPxY motif for the Itch recognition motif (P-P-P-Y-x(4)-[ST]-[ILV]), which extends the PPxY motif with these C-terminal residues.
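The extended recognition motif above is written in PROSITE-style notation, which maps directly onto a regular expression. The sketch below handles only the small subset of that syntax used here (single residues, `x`, bracketed alternatives, and `(n)` repeats); `prosite_to_regex` is a hypothetical helper, and the peptides in the usage lines are invented for illustration and are not the p63 sequence.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a small subset of PROSITE syntax into a regular expression."""
    parts = []
    for element in pattern.split("-"):
        match = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)\))?", element)
        if match is None:
            raise ValueError(f"unsupported element: {element!r}")
        token, repeat = match.groups()
        token = "." if token == "x" else token  # x means any residue
        parts.append(token + (f"{{{repeat}}}" if repeat else ""))
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex("P-P-P-Y-x(4)-[ST]-[ILV]")
    print(regex)
    # Invented peptides: the second replaces the final hydrophobic residue with T.
    for peptide in ("AAPPPYGGGGSIAA", "AAPPPYGGGGSTAA"):
        print(peptide, bool(re.search(regex, peptide)))
```

In this toy case the substitution at the last motif position is enough to abolish the match, which is the kind of effect a pattern-based scan can flag before any binding experiment.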
Activation of endothelial-leukocyte adhesion molecule 1 (ELAM-1) gene transcription

DOE Office of Scientific and Technical Information (OSTI.GOV)

Montgomery, K.F.; Tarr, P.I.; Bomsztyk, K.

1991-08-01

Leukocyte adherence to endothelium is in part mediated by the transient expression of endothelial-leukocyte adhesion molecule 1 (ELAM-1) on endothelial surfaces stimulated by tumor necrosis factor α (TNF), interleukin (IL) 1, or bacterial lipopolysaccharide (LPS). The intracellular factors controlling induction of ELAM-1 mRNA and protein are unknown. In nuclear runoff experiments with cultured human umbilical vein endothelial cells (HUVEC), the authors demonstrate that transcriptional activation of the ELAM-1 gene occurs following stimulation with TNF. Sequence analysis of the 5′ flanking region of the ELAM-1 gene reveals consensus DNA-binding sequences for two known transcription factors, NF-κB and AP-1. Gel mobility shift assays demonstrate that TNF, IL-1, or LPS induces activation of NF-κB-like DNA binding activity in HUVEC. Phorbol 12-myristate 13-acetate, a known activator of protein kinase C (PKC), weakly induces NF-κB-like activity, ELAM-1 mRNA, and ELAM-1 surface expression in HUVEC. However, TNF, IL-1, and LPS do not activate PKC in HUVEC at doses that strongly induce NF-κB-like protein activation and ELAM-1 gene expression. PKC blockade with H7 does not inhibit activation of these NF-κB-like proteins but does inhibit ELAM-1 gene transcription. They conclude that PKC-independent activation of NF-κB in HUVEC with TNF, IL-1, or LPS is associated with, but not sufficient for, activation of ELAM-1 gene transcription.

Putative Nonribosomal Peptide Synthetase and Cytochrome P450 Genes Responsible for Tentoxin Biosynthesis in Alternaria alternata ZJ33

PubMed Central

Li, You-Hai; Han, Wen-Jin; Gui, Xi-Wu; Wei, Tao; Tang, Shuang-Yan; Jin, Jian-Ming

2016-01-01

Tentoxin, a cyclic tetrapeptide produced by several Alternaria species, inhibits the F1-ATPase activity of chloroplasts, resulting in chlorosis in sensitive plants. In this study, we report two clustered genes, encoding a putative nonribosomal peptide synthetase (NRPS) TES and a cytochrome P450 protein TES1, that are required for tentoxin biosynthesis in Alternaria alternata strain ZJ33, which was isolated from blighted leaves of Eupatorium adenophorum. Using a pair of primers designed according to the consensus sequences of the adenylation domain of NRPSs, two fragments containing putative adenylation domains were amplified from A. alternata ZJ33, and subsequent PCR analyses demonstrated that these fragments belonged to the same NRPS coding sequence. With no introns, TES consists of a single 15,486 base pair open reading frame encoding a predicted 5161 amino acid protein. Meanwhile, the TES1 gene is predicted to contain five introns and encode a 506 amino acid protein. The TES protein is predicted to be comprised of four peptide synthase modules with two additional N-methylation domains, and the number and arrangement of the modules in TES were consistent with the number and arrangement of the amino acid residues of tentoxin. Notably, both TES and TES1 null mutants generated via homologous recombination failed to produce tentoxin. This study provides the first evidence concerning the biosynthesis of tentoxin in A. alternata. PMID:27490569
Identification of dehydrin-like proteins responsive to chilling in floral buds of blueberry (Vaccinium, section Cyanococcus).

PubMed

Muthalif, M M; Rowland, L J

1994-04-01

The level of three major polypeptides of 65, 60, and 14 kD increased in response to chilling unit accumulation in floral buds of a woody perennial, blueberry (Vaccinium, section Cyanococcus). The level of the polypeptides increased most dramatically within 300 h of chilling and decreased to the prechilling level with the initiation of budbreak. Cold-hardiness levels were assessed for dormant buds of Vaccinium corymbosum and Vaccinium ashei after different chilling treatments until the resumption of growth. These levels coincided with the level of the chilling-responsive polypeptides. Like some other previously described cold-induced proteins in annual plants, the level of the chilling-induced polypeptides also increased in leaves in response to cold treatment; the chilling-induced polypeptides were heat stable, resisting aggregation after incubation at 95 degrees C for 15 min. By fractionating bud proteins first by isoelectric point (pI) and then by molecular mass, the pI values of the 65- and 60-kD polypeptides were found to be 7.5 to 8.0 and the pI value of the 14-kD polypeptide was judged to be 8.5. Purification of the 65- and 60-kD polypeptides, followed by digestion with endoproteinase Lys-C and sequencing of selected fragments, revealed similarities in amino acid composition between the 65- and 60-kD polypeptides and dehydrins. Indeed, antiserum to the lysine-rich consensus sequence EKKGIMDKIKEKLPG of dehydrin proteins cross-reacted with all three of the major chilling-responsive polypeptides of blueberry, identifying these as dehydrins or dehydrin-like proteins.
Identification of dehydrin-like proteins responsive to chilling in floral buds of blueberry (Vaccinium, section Cyanococcus).

PubMed Central

Muthalif, M M; Rowland, L J

1994-01-01

The level of three major polypeptides of 65, 60, and 14 kD increased in response to chilling unit accumulation in floral buds of a woody perennial, blueberry (Vaccinium, section Cyanococcus). The level of the polypeptides increased most dramatically within 300 h of chilling and decreased to the prechilling level with the initiation of budbreak. Cold-hardiness levels were assessed for dormant buds of Vaccinium corymbosum and Vaccinium ashei after different chilling treatments until the resumption of growth. These levels coincided with the level of the chilling-responsive polypeptides. Like some other previously described cold-induced proteins in annual plants, the level of the chilling-induced polypeptides also increased in leaves in response to cold treatment; the chilling-induced polypeptides were heat stable, resisting aggregation after incubation at 95 degrees C for 15 min. By fractionating bud proteins first by isoelectric point (pI) and then by molecular mass, the pI values of the 65- and 60-kD polypeptides were found to be 7.5 to 8.0 and the pI value of the 14-kD polypeptide was judged to be 8.5. Purification of the 65- and 60-kD polypeptides, followed by digestion with endoproteinase Lys-C and sequencing of selected fragments, revealed similarities in amino acid composition between the 65- and 60-kD polypeptides and dehydrins. Indeed, antiserum to the lysine-rich consensus sequence EKKGIMDKIKEKLPG of dehydrin proteins cross-reacted with all three of the major chilling-responsive polypeptides of blueberry, identifying these as dehydrins or dehydrin-like proteins. PMID:8016270
Cellular roles of neuronal calcium sensor-1 and calcium/calmodulin-dependent kinases in fungi.

PubMed

Tamuli, Ranjan; Kumar, Ravi; Deka, Rekha

2011-04-01

The neuronal calcium sensor-1 (NCS-1) possesses a consensus signal for N-terminal myristoylation and four EF-hand Ca(2+)-binding sites, and mediates the effects of cytosolic Ca(2+). Minute changes in free intracellular Ca(2+) are quickly transformed into changes in the activity of several kinases including calcium/calmodulin-dependent protein kinases (Ca(2+)/CaMKs) that are involved in regulating many eukaryotic cell functions. However, our current knowledge of NCS-1 and Ca(2+)/CaMKs comes mostly from studies of the mammalian enzymes. Thus far very few fungal homologues of NCS-1 and Ca(2+)/CaMKs have been characterized and little is known about their cellular roles. In this minireview, we describe the known sequences, interactions with target proteins and cellular roles of NCS-1 and Ca(2+)/CaMKs in fungi. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Selection of specific interactors from phage display library based on sea lamprey variable lymphocyte receptor sequences.

PubMed

Wezner-Ptasinska, Magdalena; Otlewski, Jacek

2015-12-01

Variable lymphocyte receptors (VLRs) are non-immunoglobulin components of adaptive immunity in jawless vertebrates. These proteins composed of leucine-rich repeat modules offer some advantages over antibodies in target binding and therefore are attractive candidates for biotechnological applications. In this paper we report the design and characterization of a phage display library based on a previously proposed dVLR scaffold containing six LRR modules [Wezner-Ptasinska et al., 2011]. Our library was designed based on a consensus approach in which the randomization scheme reflects the frequencies of amino acids naturally occurring in respective positions responsible for antigen recognition. We demonstrate general applicability of the scaffold by selecting dVLRs specific for lysozyme and S100A7 protein with KD values in the micromolar range. The dVLR library could be used as a convenient alternative to antibodies for effective isolation of high affinity binders.
The Human Splicing Factor ASF/SF2 can Specifically Recognize Pre-mRNA 5' Splice Sites

NASA Astrophysics Data System (ADS)

Zuo, Ping; Manley, James L.

1994-04-01

ASF/SF2 is a human protein previously shown to function in in vitro pre-mRNA splicing as an essential factor necessary for all splices and also as an alternative splicing factor, capable of switching selection of 5' splice sites. To begin to study the protein's mechanism of action, we have investigated the RNA binding properties of purified recombinant ASF/SF2. Using UV crosslinking and gel shift assays, we demonstrate that the RNA binding region of ASF/SF2 can interact with RNA in a sequence-specific manner, recognizing the 5' splice site in each of two different pre-mRNAs. Point mutations in the 5' splice site consensus can reduce binding by as much as a factor of 100, with the largest effects observed in competition assays. These findings support a model in which ASF/SF2 aids in the recognition of pre-mRNA 5' splice sites.
Cholesterol Secosterol Aldehydes Induce Amyloidogenesis and Dysfunction of Wild Type Tumor Protein p53

PubMed Central

Nieva, Jorge; Song, Byeong-Doo; Rogel, Joseph K.; Kujawara, David; Altobel, Lawrence; Izharrudin, Alicia; Boldt, Grant E.; Grover, Rajesh K.; Wentworth, Anita D.; Wentworth, Paul

2011-01-01

SUMMARY Epidemiologic and clinical evidence points to an increased risk of cancer when coupled with chronic inflammation. However, the molecular mechanisms that underpin this interrelationship remain largely unresolved. Herein we show that the inflammation-derived cholesterol 5,6-secosterol aldehydes, atheronal-A (KA) and -B (ALD), but not the PUFA-derived aldehydes 4-hydroxynonenal (HNE) and 4-hydroxyhexenal (HHE), induce misfolding of wild-type p53 into an amyloidogenic form that binds thioflavin T and Congo Red dyes but cannot bind to a consensus DNA sequence. Treatment of lung carcinoma cells with KA and ALD leads to a loss of function of extracted p53, as determined by analysis of extracted nuclear protein and in activation of p21. Our results uncover a plausible chemical link between inflammation and cancer and expand the already pivotal role of p53 dysfunction and cancer risk. PMID:21802012
EBP1 is a novel E2F target gene regulated by transforming growth factor-β.

PubMed

Judah, David; Chang, Wing Y; Dagnino, Lina

2010-11-10

Regulation of gene expression requires transcription factor binding to specific DNA elements, and a large body of work has focused on the identification of such sequences. However, it is becoming increasingly clear that eukaryotic transcription factors can exhibit widespread, nonfunctional binding to genomic DNA sites. Conversely, some of these proteins, such as E2F, can also modulate gene expression by binding to non-consensus elements. E2F comprises a family of transcription factors that play key roles in a wide variety of cellular functions, including survival, differentiation, activation during tissue regeneration, metabolism, and proliferation. E2F factors bind to the ErbB3-binding protein 1 (EBP1) promoter in live cells. We now show that E2F binding to the EBP1 promoter occurs through two tandem DNA elements that do not conform to typical consensus E2F motifs. Exogenously expressed E2F1 activates EBP1 reporters lacking one, but not both sites, suggesting a degree of redundancy under certain conditions. E2F1 increases the levels of endogenous EBP1 mRNA in breast carcinoma and other transformed cell lines. In contrast, in non-transformed primary epidermal keratinocytes, E2F, together with the retinoblastoma family of proteins, appears to be involved in decreasing EBP1 mRNA abundance in response to growth inhibition by transforming growth factor-β1. Thus, E2F is likely a central coordinator of multiple responses that culminate in regulation of EBP1 gene expression, and which may vary depending on cell type and context.
Transcription factor profiling reveals molecular choreography and key regulators of human retrotransposon expression

PubMed Central

Sun, Xiaoji; Wang, Xuya; Tang, Zuojian; Grivainis, Mark; Kahler, David; Yun, Chi; Mita, Paolo; Fenyö, David

2018-01-01

Transposable elements (TEs) represent a substantial fraction of many eukaryotic genomes, and transcriptional regulation of these factors is important to determine TE activities in human cells. However, due to the repetitive nature of TEs, identifying transcription factor (TF)-binding sites from ChIP-sequencing (ChIP-seq) datasets is challenging. Current algorithms are focused on subtle differences between TE copies and thus bias the analysis to relatively old and inactive TEs. Here we describe an approach termed “MapRRCon” (mapping repeat reads to a consensus) which allows us to identify proteins binding to TE DNA sequences by mapping ChIP-seq reads to the TE consensus sequence after whole-genome alignment. Although this method does not assign binding sites to individual insertions in the genome, it provides a landscape of interacting TFs by capturing factors that bind to TEs under various conditions. We applied this method to screen TFs’ interaction with L1 in human cells/tissues using ENCODE ChIP-seq datasets and identified 178 of the 512 TFs tested as bound to L1 in at least one biological condition with most of them (138) localized to the promoter. Among these L1-binding factors, we focused on Myc and CTCF, as they play important roles in cancer progression and 3D chromatin structure formation. Furthermore, we explored the transcriptomes of The Cancer Genome Atlas breast and ovarian tumor samples in which a consistent anti-/correlation between L1 and Myc/CTCF expression was observed, suggesting that these two factors may play roles in regulating L1 transcription during the development of such tumors. PMID:29802231
Cloning, sequencing and characterization of lipase genes from a polyhydroxyalkanoate- (PHA-) synthesizing Pseudomonas resinovorans

USDA-ARS's Scientific Manuscript database

Lipase (lip) and lipase-specific foldase (lif) genes of a biodegradable polyhydroxyalkanoate- (PHA-) synthesizing Pseudomonas resinovorans NRRL B-2649 were cloned using primers based on consensus sequences, followed by PCR-based genome walking. Sequence analyses showed a putative Lip gene-product (...
The Bacillus subtilis yabG Gene Is Transcribed by SigK RNA Polymerase during Sporulation, and yabG Mutant Spores Have Altered Coat Protein Composition

PubMed Central

Takamatsu, Hiromu; Kodama, Takeko; Imamura, Atsuo; Asai, Kei; Kobayashi, Kazuo; Nakayama, Tatsuo; Ogasawara, Naotake; Watabe, Kazuhito

2000-01-01

The expression of six novel genes located in the region from abrB to spoVC of the Bacillus subtilis chromosome was analyzed, and one of the genes, yabG, had a predicted promoter sequence conserved among SigK-dependent genes. Northern blot analysis revealed that yabG mRNA was first detected from 4 h after the cessation of logarithmic growth (T4) in wild-type cells and in a gerE36 (GerE−) mutant but not in spoIIAC (SigF−), spoIIGAB (SigE−), spoIIIG (SigG−), and spoIVCB (SigK−) mutants. The transcription start point was determined by primer extension analysis; the −10 and −35 regions are very similar to the consensus sequences recognized by SigK-containing RNA polymerase. Inactivation of the yabG gene by insertion of an erythromycin resistance gene did not affect vegetative growth or spore resistance to heat, chloroform, and lysozyme. The germination of yabG spores in l-alanine and in a mixture of l-asparagine, d-glucose, d-fructose, and potassium chloride was also the same as that of wild-type spores. On the other hand, the protein preparation from yabG spores included 15-, 18-, 21-, 23-, 31-, 45-, and 55-kDa polypeptides which were low in or not extracted from wild-type spores under the same conditions. We determined their N-terminal amino acid sequence and found that these polypeptides were CotT, YeeK, YxeE, CotF, YrbA (31 and 45 kDa), and SpoIVA, respectively. The fluorescence of YabG-green fluorescent protein fusion produced in sporulating cells was detectable in the forespores but not in the mother cell compartment under fluorescence microscopy. These results indicate that yabG encodes a sporulation-specific protein which is involved in coat protein composition in B. subtilis. PMID:10714992
Evolution to pathogenicity of the parvovirus minute virus of mice in immunodeficient mice involves genetic heterogeneity at the capsid domain that determines tropism.

PubMed

López-Bueno, Alberto; Segovia, José C; Bueren, Juan A; O'Sullivan, M Gerard; Wang, Feng; Tattersall, Peter; Almendral, José M

2008-02-01

Very little is known about the role that evolutionary dynamics plays in diseases caused by mammalian DNA viruses. To address this issue in a natural host model, we compared the pathogenesis and genetics of the attenuated fibrotropic and the virulent lymphohematotropic strains of the parvovirus minute virus of mice (MVM), and of two invasive fibrotropic MVM (MVMp) variants carrying the I362S or K368R change in the VP2 major capsid protein, in the infection of severe combined immunodeficient (SCID) mice. By 14 to 18 weeks after oronasal inoculation, the I362S and K368R viruses caused lethal leukopenia characterized by tissue damage and inclusion bodies in hemopoietic organs, a pattern of disease found by 7 weeks postinfection with the lymphohematotropic MVM (MVMi) strain. The MVMp populations emerging in leukopenic mice showed consensus sequence changes in the MVMi genotype at residues G321E and A551V of VP2 in the I362S virus infections or A551V and V575A changes in the K368R virus infections, as well as a high level of genetic heterogeneity within a capsid domain at the twofold depression where these residues lay. Amino acids forming this capsid domain are important MVM tropism determinants, as exemplified by the switch in MVMi host range toward mouse fibroblasts conferred by coordinated changes of some of these residues and by the essential character of glutamate at residue 321 for maintaining MVMi tropism toward primary hemopoietic precursors. The few viruses within the spectrum of mutants from mice that maintained the respective parental 321G and 575V residues were infectious in a plaque assay, whereas the viruses with the main consensus sequences exhibited low levels of fitness in culture. 
Consistent with this finding, a recombinant MVMp virus carrying the consensus sequence mutations arising in the K368R virus background in mice failed to initiate infection in cell lines of different tissue origins, even though it caused rapid-course lethal leukopenia in SCID mice. The parental consensus genotype prevailed during leukopenia development, but plaque-forming viruses with the reversion of the 575A residue to valine emerged in affected organs. The disease caused by the DNA virus in mice, therefore, involves the generation of heterogeneous viral populations that may cooperatively interact for the hemopoietic syndrome. The evolutionary changes delineate a sector of the surface of the capsid that determines tropism and that surrounds the sialic acid receptor binding domain.
TAIL1: an isthmin-like gene, containing type 1 thrombospondin-repeat and AMOP domain, mapped to ARVD1 critical region.

PubMed

Rossi, Valeria; Beffagna, Giorgia; Rampazzo, Alessandra; Bauce, Barbara; Danieli, Gian Antonio

2004-06-23

Isthmins represent a novel family of vertebrate secreted proteins containing one copy of the thrombospondin type 1 repeat (TSR), which in mammals is shared by several proteins with diverse biological functions, including cell adhesion, angiogenesis, and patterning of developing nervous system. We have determined the genomic organization of human TAIL1 (thrombospondin and AMOP containing isthmin-like 1), a novel isthmin-like gene encoding a protein that contains a TSR and a C-terminal AMOP domain (adhesion-associated domain in MUC4 and other proteins), characteristic of extracellular proteins involved in adhesion processes. TAIL1 gene encompasses more than 24.4 kb. Analysis of the DNA sequence surrounding the putative transcriptional start region revealed a TATA-less promoter located in a CpG island. Several consensus binding sites for the transcription factors Sp1 and MZF-1 were identified in this promoter region. In humans, TAIL1 gene is located on chromosome 14q24.3 within ARVD1 (arrhythmogenic right ventricular dysplasia/cardiomyopathy, type 1) critical region; preliminary evidence suggests that it is expressed in several tissues, showing multiple alternative splicing.
In silico analysis of the polygalacturonase inhibiting protein 1 from apple, Malus domestica.

PubMed

Matsaunyane, Lerato Bt; Oelofse, Dean; Dubery, Ian A

2015-03-11

The Malus domestica polygalacturonase inhibiting protein 1 (MdPGIP1) gene, encoding the M. domestica polygalacturonase inhibiting protein 1 (MdPGIP1), was isolated from the Granny Smith apple cultivar (GenBank accession no. DQ185063). The gene was used to transform tobacco and potato for enhanced resistance against fungal diseases. Analysis of the MdPGIP1 nucleotide sequence revealed that the gene comprises 993 nucleotides that encode a 330 amino acid polypeptide. In silico characterization of the MdPGIP1 polypeptide revealed domains typical of PGIP proteins, which include a 24 amino acid putative signal peptide, a potential cleavage site [Alanine-Leucine-Serine (ALS)] for the signal peptide, a 238 amino acid leucine-rich repeat (LRR) domain, a 46 amino acid N-terminal domain and a 22 amino acid C-terminal domain. The hydropathic evaluation of MdPGIP1 indicated a repetitive hydrophobic motif in the LRR domain and a hydrophilic surface area consistent with a globular protein. The typical consensus glycosylation sequence of Asn-X-Ser/Thr was identified in MdPGIP1, indicating potential N-linked glycosylation of MdPGIP1. The molecular mass of non-glycosylated MdPGIP1 was calculated as 36.615 kDa and the theoretical isoelectric point as 6.98. Furthermore, the secondary and tertiary structure of MdPGIP1 was modelled, and revealed that MdPGIP1 is a curved and elongated molecule that contains sheet B1, sheet B2 and 310-helices on its LRR domain. The overall properties of the MdPGIP1 protein is similar to that of the prototypical Phaseolus vulgaris PGIP 2 (PvPGIP2), and the detected differences supported its use in biotechnological applications as an inhibitor of targeted fungal polygalacturonases (PGs).
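The Asn-X-Ser/Thr consensus mentioned above is simple enough to scan for with a regular expression. The sketch below is only an illustration of that kind of in silico check, not the authors' pipeline. The function name and the toy peptide are hypothetical, and excluding proline at the X position is a common convention assumed here rather than something the abstract states.

```python
import re

# Asn-X-Ser/Thr sequon. The lookahead lets overlapping sites be reported.
# Assumption: X may be any residue except proline (common convention).
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein: str) -> list:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


toy_peptide = "MKNLSAANPTQNGT"  # made-up sequence, not MdPGIP1
print(find_sequons(toy_peptide))  # [3, 12]; N8-P-T is skipped because X is proline
```

A match only marks a potential N-linked glycosylation site. Whether the site is actually glycosylated has to be shown experimentally.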
Generation of a consensus protein domain dictionary

PubMed Central

Schaeffer, R. Dustin; Jonsson, Amanda L.; Simms, Andrew M.; Daggett, Valerie

2011-01-01

Motivation: The discovery of new protein folds is a relatively rare occurrence even as the rate of protein structure determination increases. This rarity reinforces the concept of folds as reusable units of structure and function shared by diverse proteins. If the folding mechanism of proteins is largely determined by their topology, then the folding pathways of members of existing folds could encompass the full set used by globular protein domains. Results: We have used recent versions of three common protein domain dictionaries (SCOP, CATH and Dali) to generate a consensus domain dictionary (CDD). Surprisingly, 40% of the metafolds in the CDD are not composed of autonomous structural domains, i.e. they are not plausible independent folding units. This finding has serious ramifications for bioinformatics studies mining these domain dictionaries for globular protein properties. However, our main purpose in deriving this CDD was to generate an updated CDD to choose targets for MD simulation as part of our dynameomics effort, which aims to simulate the native and unfolding pathways of representatives of all globular protein consensus folds (metafolds). Consequently, we also compiled a list of representative protein targets of each metafold in the CDD. Availability and implementation: This domain dictionary is available at www.dynameomics.org. Contact: daggett@u.washington.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21068000
Comprehensive Genomic Characterization of Upper Tract Urothelial Carcinoma.

PubMed

Moss, Tyler J; Qi, Yuan; Xi, Liu; Peng, Bo; Kim, Tae-Beom; Ezzedine, Nader E; Mosqueda, Maribel E; Guo, Charles C; Czerniak, Bogdan A; Ittmann, Michael; Wheeler, David A; Lerner, Seth P; Matin, Surena F

2017-10-01

Upper urinary tract urothelial cancer (UTUC) may have unique etiologic and genomic factors compared to bladder cancer. To characterize the genomic landscape of UTUC and provide insights into its biology using comprehensive integrated genomic analyses. We collected 31 untreated snap-frozen UTUC samples from two institutions and carried out whole-exome sequencing (WES) of DNA, RNA sequencing (RNAseq), and protein analysis. Adjusting for batch effects, consensus mutation calls from independent pipelines identified DNA mutations, gene expression clusters using unsupervised consensus hierarchical clustering (UCHC), and protein expression levels that were correlated with relevant clinical variables, The Cancer Genome Atlas, and other published data. WES identified mutations in FGFR3 (74.1%; 92% low-grade, 60% high-grade), KMT2D (44.4%), PIK3CA (25.9%), and TP53 (22.2%). APOBEC and CpG were the most common mutational signatures. UCHC of RNAseq data segregated samples into four molecular subtypes with the following characteristics. Cluster 1: no PIK3CA mutations, nonsmokers, high-grade
Presence of a consensus DNA motif at nearby DNA sequence of the mutation susceptible CG nucleotides.

PubMed

Chowdhury, Kaushik; Kumar, Suresh; Sharma, Tanu; Sharma, Ankit; Bhagat, Meenakshi; Kamai, Asangla; Ford, Bridget M; Asthana, Shailendra; Mandal, Chandi C

2018-01-10

Complexity in tissues affected by cancer arises from somatic mutations and epigenetic modifications in the genome. The mutation susceptible hotspots present within the genome indicate a non-random nature and/or a position specific selection of mutation. An association exists between the occurrence of mutations and epigenetic DNA methylation. This study is primarily aimed at determining mutation status, and identifying a signature for predicting mutation prone zones of tumor suppressor (TS) genes. Nearby sequences from the top five positions having a higher mutation frequency in each gene of 42 TS genes were selected from the COSMIC database and were considered as mutation prone zones. The conserved motifs present in the mutation prone DNA fragments were identified. Molecular docking studies were done to determine putative interactions between the identified conserved motifs and enzyme methyltransferase DNMT1. Collective analysis of 42 TS genes found GC as the most commonly replaced and AT as the most commonly formed residues after mutation. Analysis of the top 5 mutated positions of each gene (210 DNA segments for 42 TS genes) identified that CG nucleotides of the amino acid codons (e.g., Arginine) are most susceptible to mutation, and found a consensus DNA "T/AGC/GAGGA/TG" sequence present in these mutation prone DNA segments. Similar to TS genes, analysis of 54 oncogenes not only found CG nucleotides of the amino acid Arg as the most susceptible to mutation, but also identified the presence of similar consensus DNA motifs in the mutation prone DNA fragments (270 DNA segments for 54 oncogenes) of oncogenes. Docking studies depicted that, upon binding of DNMT1 methylates to this consensus DNA motif (C residues of CpG islands), mutation was likely to occur. 
Thus, this study proposes that DNMT1 mediated methylation in chromosomal DNA may decrease if a foreign DNA segment containing this consensus sequence along with CG nucleotides is exogenously introduced to dividing cancer cells. Copyright © 2017 Elsevier B.V. All rights reserved.
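A degenerate consensus such as "T/AGC/GAGGA/TG" can be searched for by turning each slash-separated pair into a character class. The sketch below assumes the motif reads (T/A) G (C/G) A G G (A/T) G. That parsing is an interpretation of the notation, not something the abstract confirms. The function name and test sequence are hypothetical.

```python
import re

# Assumed reading of "T/AGC/GAGGA/TG": [TA] G [CG] A G G [AT] G
# The lookahead with a capture group reports overlapping hits as well.
MOTIF = re.compile(r"(?=([TA]G[CG]AGG[AT]G))")


def find_motif(dna: str) -> list:
    """Return (1-based start, matched 8-mer) for every motif occurrence."""
    return [(m.start() + 1, m.group(1)) for m in MOTIF.finditer(dna.upper())]


toy_dna = "CCTGCAGGAGTTAGGAGGTGCC"  # made-up sequence for illustration
print(find_motif(toy_dna))  # [(3, 'TGCAGGAG'), (13, 'AGGAGGTG')]
```

This scans one strand only. A fuller search would also scan the reverse complement.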
A first report and complete genome sequence of alfalfa enamovirus from Sudan

USDA-ARS's Scientific Manuscript database

A full genome sequence of a viral pathogen, provisionally named alfalfa enamovirus 2 (AEV-2), was reconstructed from short reads obtained by Illumina RNA sequencing of alfalfa sample originating from Sudan. Ambiguous nucleotides in the resultant consensus assembly and identity of the predicted virus...
A consensus linkage map of lentil based on DArT markers from three RIL mapping populations.

PubMed

Ates, Duygu; Aldemir, Secil; Alsaleh, Ahmad; Erdogmus, Semih; Nemli, Seda; Kahriman, Abdullah; Ozkan, Hakan; Vandenberg, Albert; Tanyolac, Bahattin

2018-01-01

Lentil (Lens culinaris ssp. culinaris Medikus) is a diploid (2n = 2x = 14), self-pollinating grain legume with a haploid genome size of about 4 Gbp and is grown throughout the world with current annual production of 4.9 million tonnes. A consensus map of lentil (Lens culinaris ssp. culinaris Medikus) was constructed using three different lentil recombinant inbred line (RIL) populations, including "CDC Redberry" x "ILL7502" (LR8), "ILL8006" x "CDC Milestone" (LR11) and "PI320937" x "Eston" (LR39). The lentil consensus map was composed of 9,793 DArT markers, covered a total of 977.47 cM with an average distance of 0.10 cM between adjacent markers and comprised 7 linkage groups representing 7 chromosomes of the lentil genome. The consensus map had no gap larger than 12.67 cM and only 5 gaps were found to be between 12.67 cM and 6.0 cM (on LG3 and LG4). The localization of the SNP markers on the lentil consensus map was in general consistent with their localization on the three individual genetic linkage maps and the lentil consensus map has longer map length, higher marker density and shorter average distance between the adjacent markers compared to the component linkage maps. This high-density consensus map could provide insight into the lentil genome. The consensus map could also help to construct a physical map using a Bacterial Artificial Chromosome library and map based cloning studies. Sequence information of DArT may help localization of orientation scaffolds from Next Generation Sequencing data.
A consensus linkage map of lentil based on DArT markers from three RIL mapping populations

PubMed Central

Ates, Duygu; Aldemir, Secil; Alsaleh, Ahmad; Erdogmus, Semih; Nemli, Seda; Kahriman, Abdullah; Ozkan, Hakan; Vandenberg, Albert

2018-01-01

Background Lentil (Lens culinaris ssp. culinaris Medikus) is a diploid (2n = 2x = 14), self-pollinating grain legume with a haploid genome size of about 4 Gbp and is grown throughout the world with current annual production of 4.9 million tonnes. Materials and methods A consensus map of lentil (Lens culinaris ssp. culinaris Medikus) was constructed using three different lentil recombinant inbred line (RIL) populations, including “CDC Redberry” x “ILL7502” (LR8), “ILL8006” x “CDC Milestone” (LR11) and “PI320937” x “Eston” (LR39). Results The lentil consensus map was composed of 9,793 DArT markers, covered a total of 977.47 cM with an average distance of 0.10 cM between adjacent markers and comprised 7 linkage groups representing 7 chromosomes of the lentil genome. The consensus map had no gap larger than 12.67 cM and only 5 gaps were found to be between 12.67 cM and 6.0 cM (on LG3 and LG4). The localization of the SNP markers on the lentil consensus map was in general consistent with their localization on the three individual genetic linkage maps and the lentil consensus map has longer map length, higher marker density and shorter average distance between the adjacent markers compared to the component linkage maps. Conclusion This high-density consensus map could provide insight into the lentil genome. The consensus map could also help to construct a physical map using a Bacterial Artificial Chromosome library and map based cloning studies. Sequence information of DArT may help localization of orientation scaffolds from Next Generation Sequencing data. PMID:29351563
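The reported average spacing follows directly from the map length and the marker count. A minimal check, assuming the average is taken over the intervals between adjacent markers and ignoring the split into 7 linkage groups (which changes the divisor only negligibly); the function name is hypothetical:

```python
def mean_marker_spacing(map_length_cm: float, n_markers: int) -> float:
    """Average distance in cM between adjacent markers on a linkage map.

    Assumption: all markers are treated as one ordered series, so there are
    n_markers - 1 intervals.
    """
    if n_markers < 2:
        raise ValueError("need at least two markers")
    return map_length_cm / (n_markers - 1)


spacing = mean_marker_spacing(977.47, 9793)
print(f"{spacing:.2f} cM")  # 0.10 cM
```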

Human Lineage-Specific Transcriptional Regulation through GA-Binding Protein Transcription Factor Alpha (GABPa)

PubMed Central

Perdomo-Sabogal, Alvaro; Nowick, Katja; Piccini, Ilaria; Sudbrak, Ralf; Lehrach, Hans; Yaspo, Marie-Laure; Warnatz, Hans-Jörg; Querfurth, Robert

2016-01-01

A substantial fraction of phenotypic differences between closely related species are likely caused by differences in gene regulation. While this has already been postulated over 30 years ago, only few examples of evolutionary changes in gene regulation have been verified. Here, we identified and investigated binding sites of the transcription factor GA-binding protein alpha (GABPa) aiming to discover cis-regulatory adaptations on the human lineage. By performing chromatin immunoprecipitation-sequencing experiments in a human cell line, we found 11,619 putative GABPa binding sites. Through sequence comparisons of the human GABPa binding regions with orthologous sequences from 34 mammals, we identified substitutions that have resulted in 224 putative human-specific GABPa binding sites. To experimentally assess the transcriptional impact of those substitutions, we selected four promoters for promoter-reporter gene assays using human and African green monkey cells. We compared the activities of wild-type promoters to mutated forms, where we have introduced one or more substitutions to mimic the ancestral state devoid of the GABPa consensus binding sequence. Similarly, we introduced the human-specific substitutions into chimpanzee and macaque promoter backgrounds. Our results demonstrate that the identified substitutions are functional, both in human and nonhuman promoters. In addition, we performed GABPa knock-down experiments and found 1,215 genes as strong candidates for primary targets. Further analyses of our data sets link GABPa to cognitive disorders, diabetes, KRAB zinc finger (KRAB-ZNF), and human-specific genes. Thus, we propose that differences in GABPa binding sites played important roles in the evolution of human-specific phenotypes. PMID:26814189
SWPhylo - A Novel Tool for Phylogenomic Inferences by Comparison of Oligonucleotide Patterns and Integration of Genome-Based and Gene-Based Phylogenetic Trees.

PubMed

Yu, Xiaoyu; Reva, Oleg N

2018-01-01

Modern phylogenetic studies may benefit from the analysis of complete genome sequences of various microorganisms. Evolutionary inferences based on genome-scale analysis are believed to be more accurate than the gene-based alternative. However, the computational complexity of current phylogenomic procedures, inappropriateness of standard phylogenetic tools to process genome-wide data, and lack of reliable substitution models which correlates with alignment-free phylogenomic approaches deter microbiologists from using these opportunities. For example, the super-matrix and super-tree approaches of phylogenomics use multiple integrated genomic loci or individual gene-based trees to infer an overall consensus tree. However, these approaches potentially multiply errors of gene annotation and sequence alignment, not to mention the computational complexity and laboriousness of the methods. In this article, we demonstrate that the annotation- and alignment-free comparison of genome-wide tetranucleotide frequencies, termed oligonucleotide usage patterns (OUPs), allowed a fast and reliable inference of phylogenetic trees. These were congruent to the corresponding whole genome super-matrix trees in terms of tree topology when compared with other known approaches including 16S ribosomal RNA and GyrA protein sequence comparison, complete genome-based MAUVE, and CVTree methods. A Web-based program to perform the alignment-free OUP-based phylogenomic inferences was implemented at http://swphylo.bi.up.ac.za/. Applicability of the tool was tested on different taxa from subspecies to intergeneric levels. Distinguishing between closely related taxonomic units may be enforced by providing the program with alignments of marker protein sequences, e.g., GyrA.
SWPhylo – A Novel Tool for Phylogenomic Inferences by Comparison of Oligonucleotide Patterns and Integration of Genome-Based and Gene-Based Phylogenetic Trees

PubMed Central

Yu, Xiaoyu; Reva, Oleg N

2018-01-01

Modern phylogenetic studies may benefit from the analysis of complete genome sequences of various microorganisms. Evolutionary inferences based on genome-scale analysis are believed to be more accurate than the gene-based alternative. However, the computational complexity of current phylogenomic procedures, the unsuitability of standard phylogenetic tools for processing genome-wide data, and the lack of reliable substitution models applicable to alignment-free phylogenomic approaches deter microbiologists from using these opportunities. For example, the super-matrix and super-tree approaches of phylogenomics use multiple integrated genomic loci or individual gene-based trees to infer an overall consensus tree. However, these approaches potentially multiply errors of gene annotation and sequence alignment, not to mention the computational complexity and laboriousness of the methods. In this article, we demonstrate that the annotation- and alignment-free comparison of genome-wide tetranucleotide frequencies, termed oligonucleotide usage patterns (OUPs), allowed a fast and reliable inference of phylogenetic trees. These were congruent to the corresponding whole genome super-matrix trees in terms of tree topology when compared with other known approaches including 16S ribosomal RNA and GyrA protein sequence comparison, complete genome-based MAUVE, and CVTree methods. A Web-based program to perform the alignment-free OUP-based phylogenomic inferences was implemented at http://swphylo.bi.up.ac.za/. Applicability of the tool was tested on different taxa from subspecies to intergeneric levels. Discrimination between closely related taxonomic units may be improved by providing the program with alignments of marker protein sequences, e.g., GyrA. PMID:29511354
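The alignment-free idea in the abstract above, comparing genomes by their tetranucleotide frequencies, can be illustrated with a minimal sketch. It assumes plain relative frequencies and a Euclidean distance; the function names are hypothetical, and the actual OUP statistics and distance measure used by SWPhylo are not given in the abstract and may differ.

```python
from collections import Counter
from itertools import product
import math

# All 256 possible tetranucleotides over the DNA alphabet.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(seq):
    """Return the relative frequency of each tetranucleotide in seq.

    Windows containing characters other than A, C, G, T are ignored.
    This is an illustrative stand-in, not SWPhylo's OUP definition.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in TETRAMERS)
    if total == 0:
        return {k: 0.0 for k in TETRAMERS}
    return {k: counts[k] / total for k in TETRAMERS}


def euclidean_distance(f1, f2):
    """Euclidean distance between two frequency profiles (assumed metric)."""
    return math.sqrt(sum((f1[k] - f2[k]) ** 2 for k in TETRAMERS))
```

Pairwise distances computed this way for a set of genomes could then be passed to any distance-based tree-building method; that step is outside this sketch.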
Streptomyces griseus streptomycin phosphotransferase: expression of its gene in Escherichia coli and sequence homology with other antibiotic phosphotransferases and with eukaryotic protein kinases.

PubMed

Lim, C K; Smith, M C; Petty, J; Baumberg, S; Wootton, J C

1989-12-01

The aphD gene of Streptomyces griseus, encoding a streptomycin 6-phosphotransferase (SPH), was sub-cloned in the pBR322-based expression vector pRK9 (which contains the Serratia marcescens trp promoter) with selection for expression of streptomycin resistance in Escherichia coli. Two hybrid plasmids, pCKL631 and pCKL711, were isolated which conferred resistance. Both contained an approximately 2 kbp fragment already suspected to include aphD. The properties of in vitro deletion derivatives of these plasmids were consistent with the presumed location of aphD. In vitro deletion of a sequence including most of the trp promoter largely, but not quite completely, abolished the ability of the plasmid to confer streptomycin resistance, confirming that expression was indeed principally from the trp promoter. A polypeptide of approximately 34.5 kDa was present in minicells containing plasmids that conferred streptomycin resistance, but was absent when the plasmids contained in vitro deletions removing streptomycin resistance. Part of the fragment was sequenced and an open reading frame corresponding to aphD identified. A computer-assisted comparison of the deduced SPH sequence with those of other antibiotic phosphotransferases suggested a common structure A-B-C-D-E, where B and D were conserved between all sequences compared while A, C and E divided between the streptomycin and hygromycin B phosphotransferases on one hand and kanamycin/neomycin ones on the other. A composite sequence database was searched for homologues to consensus matrices constructed from five approximately 12-residue subsequences within blocks B and D. For one subsequence, corresponding to the N-terminal portion of block D, those sequences from the database that yielded the highest homology scores comprised almost entirely either antibiotic phosphotransferases or eukaryotic protein kinases. Possible evolutionary implications of this homology, previously described by other groups, are discussed.
Consensus strategy in genes prioritization and combined bioinformatics analysis for preeclampsia pathogenesis.

PubMed

Tejera, Eduardo; Cruz-Monteagudo, Maykel; Burgos, Germán; Sánchez, María-Eugenia; Sánchez-Rodríguez, Aminael; Pérez-Castillo, Yunierkis; Borges, Fernanda; Cordeiro, Maria Natália Dias Soeiro; Paz-Y-Miño, César; Rebelo, Irene

2017-08-08

Preeclampsia is a multifactorial disease with unknown pathogenesis. Although recent studies have explored this disease using several bioinformatics tools, their main objective was not directed to pathogenesis. Additionally, consensus prioritization has proved to be highly efficient in the recognition of gene-disease associations. However, no information is available about the ability of consensus approaches to recognize early the genes directly involved in pathogenesis. Therefore, our aim in this study is to apply several theoretical approaches to explore preeclampsia, specifically those genes directly involved in its pathogenesis. We first evaluated the consensus between 12 prioritization strategies for the early recognition of pathogenic genes related to preeclampsia. A communality analysis of the protein-protein interaction network of the previously selected genes was performed, followed by enrichment analysis. The enrichment analysis included metabolic pathways as well as gene ontology. Microarray data were also collected and used to confirm our results or as a strategy to weight the previously enriched pathways. The consensus prioritized gene list was rationally filtered to 476 genes using several criteria. The communality analysis showed an enrichment of communities connected with the VEGF-signaling pathway. This pathway is also enriched when the microarray data are considered. Our results point to VEGF, FLT1 and KDR as relevant pathogenic genes, as well as those connected with NO metabolism. Our results revealed that the consensus strategy improves the detection and initial enrichment of pathogenic genes, at least in preeclampsia. Moreover, the combination of the first percent of the prioritized genes with the protein-protein interaction network, followed by communality analysis, reduces the gene space. This approach identifies well-known genes related to pathogenesis. However, genes like HSP90, PAK2, CD247 and others included in the first 1% of the prioritized list need to be further explored in preeclampsia pathogenesis through experimental approaches.
The nucleotide sequence of the intergenic region between the 5.8S and 26S rRNA genes of the yeast ribosomal RNA operon. Possible implications for the interaction between 5.8S and 26S rRNA and the processing of the primary transcript.

PubMed Central

Veldman, G M; Klootwijk, J; van Heerikhuizen, H; Planta, R J

1981-01-01

We have determined the nucleotide sequence of part of a cloned yeast ribosomal RNA operon extending from the 5.8S RNA gene downstream into the 5'-terminal region of the 26S RNA gene. We mapped the pertinent processing sites, viz. the 5' end of 26S rRNA and the 3' ends of 5.8S rRNA and its immediate precursor, 7S RNA. At the 3' end of 7S RNA we find the sequence UCGUUU which is very similar to the type I consensus sequence UCAUUA/U present at the 3' ends of 17S, 5.8S and 26S rRNA as well as 18S precursor rRNA in yeast. At the 5' end of the 26S RNA gene we find a sequence of thirteen nucleotides which is homologous to the type II sequence present at the 5' termini of both the 17S and the 5.8S RNA gene. These findings further support the suggestion put forward earlier (G.M. Veldman et al. (1980) Nucl. Acids Res. 8, 2907-2920) that both consensus sequences are involved in the recognition of precursor rRNA by the processing nuclease(s). We discuss a model for the processing of yeast rRNA in which a processing enzyme sequentially recognizes several combinations of a type I and a type II consensus sequence. We also describe the existence of a significant base complementarity between sequences in the 5'-terminal region of 26S rRNA and the 3'-terminal region of 5.8S rRNA. We suggest that base pairing between these sequences contributes to the binding between 5.8S and 26S rRNA. PMID:7312619
Robust signal peptides for protein secretion in Yarrowia lipolytica: identification and characterization of novel secretory tags.

PubMed

Celińska, Ewelina; Borkowska, Monika; Białas, Wojciech; Korpys, Paulina; Nicaud, Jean-Marc

2018-06-01

Upon expression of a given protein in an expression host, its secretion into the culture medium or cell-surface display is frequently advantageous in both research and industrial contexts. Hence, engineering strategies targeting folding, trafficking, and secretion of proteins are gaining considerable interest. Yarrowia lipolytica has emerged as an efficient protein expression platform, repeatedly proven to be a competitive secretor of proteins. Although the key role of signal peptides (SPs) in secretory overexpression of proteins and their direct effect on the final protein titers are widely known, reports on the manipulation of SPs in Y. lipolytica are rather scattered. In this study, we assessed the potential of ten different SPs for secretion of two heterologous proteins in Y. lipolytica. Genomic and transcriptomic data mining allowed us to select five novel, previously undescribed SPs for recombinant protein secretion in Y. lipolytica. Their secretory potential was assessed in comparison with known, widely exploited SPs. We took advantage of the Golden Gate approach for construction of expression cassettes and of micro-volume enzymatic assays for functional screening of large libraries of recombinant strains. Based on the adopted strategy, we identified novel secretory tags, characterized their secretory capacity, indicated the most potent SPs, and suggested a consensus sequence of a potentially robust synthetic SP to expand the molecular toolbox for engineering Y. lipolytica.
Consensus statement: Virus taxonomy in the age of metagenomics.

PubMed

Simmonds, Peter; Adams, Mike J; Benkő, Mária; Breitbart, Mya; Brister, J Rodney; Carstens, Eric B; Davison, Andrew J; Delwart, Eric; Gorbalenya, Alexander E; Harrach, Balázs; Hull, Roger; King, Andrew M Q; Koonin, Eugene V; Krupovic, Mart; Kuhn, Jens H; Lefkowitz, Elliot J; Nibert, Max L; Orton, Richard; Roossinck, Marilyn J; Sabanadzovic, Sead; Sullivan, Matthew B; Suttle, Curtis A; Tesh, Robert B; van der Vlugt, René A; Varsani, Arvind; Zerbini, F Murilo

2017-03-01

The number and diversity of viral sequences that are identified in metagenomic data far exceeds that of experimentally characterized virus isolates. In a recent workshop, a panel of experts discussed the proposal that, with appropriate quality control, viruses that are known only from metagenomic data can, and should be, incorporated into the official classification scheme of the International Committee on Taxonomy of Viruses (ICTV). Although a taxonomy that is based on metagenomic sequence data alone represents a substantial departure from the traditional reliance on phenotypic properties, the development of a robust framework for sequence-based virus taxonomy is indispensable for the comprehensive characterization of the global virome. In this Consensus Statement article, we consider the rationale for why metagenomic sequence data should, and how it can, be incorporated into the ICTV taxonomy, and present proposals that have been endorsed by the Executive Committee of the ICTV.
Characterization of human glucocorticoid receptor complexes formed with DNA fragments containing or lacking glucocorticoid response elements

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tully, D.B.; Cidlowski, J.A.

1989-03-07

Sucrose density gradient shift assays were used to study the interactions of human glucocorticoid receptors (GR) with small DNA fragments either containing or lacking glucocorticoid response element (GRE) DNA consensus sequences. When crude cytoplasmic extracts containing [3H]triamcinolone acetonide ([3H]TA) labeled GR were incubated with unlabeled DNA under conditions of DNA excess, a GRE-containing DNA fragment obtained from the 5' long terminal repeat of mouse mammary tumor virus (MMTV LTR) formed a stable 12-16S complex with activated, but not nonactivated, [3H]TA receptor. By contrast, if the cytosols were treated with calf thymus DNA-cellulose to deplete non-GR-DNA-binding proteins prior to heat activation, a smaller 7-10S complex was formed with the MMTV LTR DNA fragment. Activated [3H]TA receptor from DNA-cellulose pretreated cytosols also interacted with two similarly sized fragments from pBR322 DNA. Stability of the complexes formed between GR and these three DNA fragments was strongly affected by even moderate alterations in either the salt concentration or the pH of the gradient buffer. Under all conditions tested, the complex formed with the MMTV LTR DNA fragment was more stable than the complexes formed with either of the pBR322 DNA fragments. Together these observations indicate that the formation of stable complexes between activated GR and isolated DNA fragments requires the presence of GRE consensus sequences in the DNA.
Multiplex PCR method for MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples

PubMed Central

Quick, Josh; Grubaugh, Nathan D; Pullan, Steven T; Claro, Ingra M; Smith, Andrew D; Gangavarapu, Karthik; Oliveira, Glenn; Robles-Sikisaka, Refugio; Rogers, Thomas F; Beutler, Nathan A; Burton, Dennis R; Lewis-Ximenez, Lia Laura; de Jesus, Jaqueline Goes; Giovanetti, Marta; Hill, Sarah; Black, Allison; Bedford, Trevor; Carroll, Miles W; Nunes, Marcio; Alcantara, Luiz Carlos; Sabino, Ester C; Baylis, Sally A; Faria, Nuno; Loose, Matthew; Simpson, Jared T; Pybus, Oliver G; Andersen, Kristian G; Loman, Nicholas J

2018-01-01

Genome sequencing has become a powerful tool for studying emerging infectious diseases; however, genome sequencing directly from clinical samples without isolation remains challenging for viruses such as Zika, where metagenomic sequencing methods may generate insufficient numbers of viral reads. Here we present a protocol for generating coding-sequence complete genomes comprising an online primer design tool, a novel multiplex PCR enrichment protocol, optimised library preparation methods for the portable MinION sequencer (Oxford Nanopore Technologies) and the Illumina range of instruments, and a bioinformatics pipeline for generating consensus sequences. The MinION protocol does not require an internet connection for analysis, making it suitable for field applications with limited connectivity. Our method relies on multiplex PCR for targeted enrichment of viral genomes from samples containing as few as 50 genome copies per reaction. Viral consensus sequences can be achieved starting with clinical samples in 1-2 days following a simple laboratory workflow. This method has been successfully used by several groups studying Zika virus evolution and is facilitating an understanding of the spread of the virus in the Americas. PMID:28538739
A consensus view of fold space: Combining SCOP, CATH, and the Dali Domain Dictionary

PubMed Central

Day, Ryan; Beck, David A.C.; Armen, Roger S.; Daggett, Valerie

2003-01-01

We have determined consensus protein-fold classifications on the basis of three classification methods, SCOP, CATH, and Dali. These classifications make use of different methods of defining and categorizing protein folds that lead to different views of protein-fold space. Pairwise comparisons of domains on the basis of their fold classifications show that much of the disagreement between the classification systems is due to differing domain definitions rather than assigning the same domain to different folds. However, there are significant differences in the fold assignments between the three systems. These remaining differences can be explained primarily in terms of the breadth of the fold classifications. Many structures may be defined as having one fold in one system, whereas far fewer are defined as having the analogous fold in another system. By comparing these folds for a nonredundant set of proteins, the consensus method breaks up broad fold classifications and combines restrictive fold classifications into metafolds, creating, in effect, an averaged view of fold space. This averaged view requires that the structural similarities between proteins having the same metafold be recognized by multiple classification systems. Thus, the consensus map is useful for researchers looking for fold similarities that are relatively independent of the method used to compare proteins. The 30 most populated metafolds, representing the folds of about half of a nonredundant subset of the PDB, are presented here. The full list of metafolds is presented on the Web. PMID:14500873
Fast and accurate de novo genome assembly from long uncorrected reads

PubMed Central

Vaser, Robert; Sović, Ivan; Nagarajan, Niranjan

2017-01-01

The assembly of long reads from Pacific Biosciences and Oxford Nanopore Technologies typically requires resource-intensive error-correction and consensus-generation steps to obtain high-quality assemblies. We show that the error-correction step can be omitted and that high-quality consensus sequences can be generated efficiently with a SIMD-accelerated, partial-order alignment–based, stand-alone consensus module called Racon. Based on tests with PacBio and Oxford Nanopore data sets, we show that Racon coupled with miniasm enables consensus genomes with similar or better quality than state-of-the-art methods while being an order of magnitude faster. PMID:28100585
Synthesis and characterization of recombinant abductin-based proteins.

PubMed

Su, Renay S-C; Renner, Julie N; Liu, Julie C

2013-12-09

Recombinant proteins are promising tools for tissue engineering and drug delivery applications. Protein-based biomaterials have several advantages over natural and synthetic polymers, including precise control over amino acid composition and molecular weight, modular swapping of functional domains, and tunable mechanical and physical properties. In this work, we describe recombinant proteins based on abductin, an elastomeric protein that is found in the inner hinge of bivalves and functions as a coil spring to keep shells open. We illustrate, for the first time, the design, cloning, expression, and purification of a recombinant protein based on consensus abductin sequences derived from Argopecten irradians. The molecular weight of the protein was confirmed by mass spectrometry, and the protein was 94% pure. Circular dichroism studies showed that the dominant structures of abductin-based proteins were polyproline II helix structures in aqueous solution and type II β-turns in trifluoroethanol. Dynamic light scattering studies illustrated that the abductin-based proteins exhibit reversible upper critical solution temperature behavior and irreversible aggregation behavior at high temperatures. A LIVE/DEAD assay revealed that human umbilical vein endothelial cells had a viability of 98 ± 4% after being cultured for two days on the abductin-based protein. Initial cell spreading on the abductin-based protein was similar to that on bovine serum albumin. These studies thus demonstrate the potential of abductin-based proteins in tissue engineering and drug delivery applications due to the cytocompatibility and its response to temperature.
Sequence specificity of the human mRNA N6-adenosine methylase in vitro.

PubMed Central

Harper, J E; Miceli, S M; Roberts, R J; Manley, J L

1990-01-01

N6-adenosine methylation is a frequent modification of mRNAs and their precursors, but little is known about the mechanism of the reaction or the function of the modification. To explore these questions, we developed conditions to examine N6-adenosine methylase activity in HeLa cell nuclear extracts. Transfer of the methyl group from S-[3H methyl]-adenosylmethionine to unlabeled random copolymer RNA substrates of varying ribonucleotide composition revealed a substrate specificity consistent with a previously deduced consensus sequence, Pu[G>A]AC[A/C/U]. 32P-labeled RNA substrates of defined sequence were used to examine the minimum sequence requirements for methylation. Each RNA was 20 nucleotides long, and contained either the core consensus sequence GGACU, or some variation of this sequence. RNAs containing GGACU, either in single or multiple copies, were good substrates for methylation, whereas RNAs containing single base substitutions within the GGACU sequence gave dramatically reduced methylation. These results demonstrate that the N6-adenosine methylase has a strict sequence specificity, and that there is no requirement for extended sequences or secondary structures for methylation. Recognition of this sequence does not require an RNA component, as micrococcal nuclease pretreatment of nuclear extracts actually increased methylation efficiency. PMID:2216767
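The consensus quoted in the abstract above, a purine, then G or A, then A, C, and one of A, C, or U, can be written as a regular expression. The following is a minimal sketch for locating candidate sites in an RNA string; the function name is hypothetical, and the pattern only records which bases are allowed. It does not model the quantitative preference for G over A at the second position.

```python
import re

# Purine, then G or A, then A, C, then A, C, or U.
# The lookahead lets overlapping candidate sites be reported.
M6A_CONSENSUS = re.compile(r"(?=([AG][AG]AC[ACU]))")


def find_consensus_sites(rna):
    """Return (0-based start, matched 5-mer) for each candidate site.

    DNA-style input is accepted: T is converted to U before matching.
    """
    rna = rna.upper().replace("T", "U")
    return [(m.start(), m.group(1)) for m in M6A_CONSENSUS.finditer(rna)]
```

A sequence containing the core GGACU is reported as a candidate, whereas a single-base substitution inside the core that leaves the allowed set (for example GGUCU) is not, which mirrors the qualitative pattern the abstract describes.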
Pseudouridine and N6-methyladenosine modifications weaken PUF protein/RNA interactions

PubMed Central

AlSadhan, Ishraq; Merriman, Dawn K.; Al-Hashimi, Hashim M.; Herschlag, Daniel

2017-01-01

RNA modifications are ubiquitous in biology, with over 100 distinct modifications. While the vast majority were identified and characterized on abundant noncoding RNA such as tRNA and rRNA, the advent of sensitive sequencing-based approaches has led to the discovery of extensive and regulated modification of eukaryotic messenger RNAs as well. The two most abundant mRNA modifications—pseudouridine (Ψ) and N6-methyladenosine (m6A)—affect diverse cellular processes including mRNA splicing, localization, translation, and decay and modulate RNA structure. Here, we test the hypothesis that RNA modifications directly affect interactions between RNA-binding proteins and target RNA. We show that Ψ and m6A weaken the binding of the human single-stranded RNA binding protein Pumilio 2 (hPUM2) to its consensus motif, with individual modifications having effects up to approximately threefold and multiple modifications giving larger effects. While there are likely to be some cases where RNA modifications essentially fully ablate protein binding, here we see modest responses that may be more common. Such modest effects could nevertheless profoundly alter the complex landscape of RNA:protein interactions, and the quantitative rather than qualitative nature of these effects underscores the need for quantitative, systems-level accounting of RNA:protein interactions to understand post-transcriptional regulation. PMID:28138061
Site-Specific Nitrosoproteomic Identification of Endogenously S-Nitrosylated Proteins in Arabidopsis

PubMed Central

Hu, Jiliang; Huang, Xiahe; Chen, Lichao; Sun, Xuwu; Lu, Congming; Zhang, Lixin; Wang, Yingchun; Zuo, Jianru

2015-01-01

Nitric oxide (NO) regulates multiple developmental events and stress responses in plants. A major biologically active species of NO is S-nitrosoglutathione (GSNO), which is irreversibly degraded by GSNO reductase (GSNOR). The major physiological effect of NO is protein S-nitrosylation, a redox-based posttranslational modification in which an NO moiety is covalently linked to a cysteine thiol. However, little is known about the mechanisms of S-nitrosylation-regulated signaling, partly because few S-nitrosylated proteins have been identified. In this study, we identified 1,195 endogenously S-nitrosylated peptides in 926 proteins from Arabidopsis (Arabidopsis thaliana) by a site-specific nitrosoproteomic approach, which, to date, is the largest data set of S-nitrosylated proteins among all organisms. Consensus sequence analysis of these peptides identified several motifs that contain acidic, but not basic, amino acid residues flanking the S-nitrosylated cysteine residues. These S-nitrosylated proteins are involved in a wide range of biological processes and are significantly enriched in chlorophyll metabolism, photosynthesis, carbohydrate metabolism, and stress responses. Consistently, the gsnor1-3 mutant shows decreased chlorophyll content and altered photosynthetic properties, suggesting that S-nitrosylation is an important regulatory mechanism in these processes. These results provide valuable resources and new clues for studies of S-nitrosylation-regulated signaling in plants. PMID:25699590
CapZyme-Seq Comprehensively Defines Promoter-Sequence Determinants for RNA 5' Capping with NAD.

PubMed

Vvedenskaya, Irina O; Bird, Jeremy G; Zhang, Yuanchao; Zhang, Yu; Jiao, Xinfu; Barvík, Ivan; Krásný, Libor; Kiledjian, Megerditch; Taylor, Deanne M; Ebright, Richard H; Nickels, Bryce E

2018-05-03

Nucleoside-containing metabolites such as NAD+ can be incorporated as 5' caps on RNA by serving as non-canonical initiating nucleotides (NCINs) for transcription initiation by RNA polymerase (RNAP). Here, we report CapZyme-seq, a high-throughput-sequencing method that employs NCIN-decapping enzymes NudC and Rai1 to detect and quantify NCIN-capped RNA. By combining CapZyme-seq with multiplexed transcriptomics, we determine efficiencies of NAD+ capping by Escherichia coli RNAP for ∼16,000 promoter sequences. The results define preferred transcription start site (TSS) positions for NAD+ capping and define a consensus promoter sequence for NAD+ capping: HRRASWW (TSS underlined). By applying CapZyme-seq to E. coli total cellular RNA, we establish that sequence determinants for NCIN capping in vivo match the NAD+-capping consensus defined in vitro, and we identify and quantify NCIN-capped small RNAs (sRNAs). Our findings define the promoter-sequence determinants for NCIN capping with NAD+ and provide a general method for analysis of NCIN capping in vitro and in vivo. Copyright © 2018 Elsevier Inc. All rights reserved.
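The consensus HRRASWW in the abstract above is written in IUPAC nucleotide ambiguity codes (H = A, C, or T; R = A or G; S = C or G; W = A or T). As a minimal sketch, such a motif can be expanded into a regular expression and tested against a 7-nt DNA window. The function names are hypothetical, and because the underline marking the TSS position is lost in this text, the sketch makes no assumption about which position is the start site.

```python
import re

# Standard IUPAC nucleotide ambiguity codes, expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Compile an IUPAC-coded DNA motif into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif.upper()))


def matches_consensus(seq, motif="HRRASWW"):
    """Return True if seq (same length as motif) fits the motif at every position."""
    return iupac_to_regex(motif).fullmatch(seq.upper()) is not None
```

For example, `matches_consensus("AGGACAT")` checks each of the seven positions against the allowed bases; a G at the first position is rejected because H excludes G.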
LiveBench-1: continuous benchmarking of protein structure prediction servers.

PubMed

Bujnicki, J M; Elofsson, A; Fischer, D; Rychlewski, L

2001-02-01

We present a novel, continuous approach aimed at the large-scale assessment of the performance of available fold-recognition servers. Six popular servers were investigated: PDB-Blast, FFAS, T98-lib, GenTHREADER, 3D-PSSM, and INBGU. The assessment was conducted using as prediction targets a large number of selected protein structures released from October 1999 to April 2000. A target was selected if its sequence showed no significant similarity to any of the proteins previously available in the structural database. Overall, the servers were able to produce structurally similar models for one-half of the targets, but significantly accurate sequence-structure alignments were produced for only one-third of the targets. We further classified the targets into two sets: easy and hard. We found that all servers were able to find the correct answer for the vast majority of the easy targets if a structurally similar fold was present in the server's fold libraries. However, among the hard targets--where standard methods such as PSI-BLAST fail--the most sensitive fold-recognition servers were able to produce similar models for only 40% of the cases, half of which had a significantly accurate sequence-structure alignment. Among the hard targets, the presence of updated libraries appeared to be less critical for the ranking. An "ideally combined consensus" prediction, where the results of all servers are considered, would increase the percentage of correct assignments by 50%. Each server had a number of cases with a correct assignment, where the assignments of all the other servers were wrong. This emphasizes the benefits of considering more than one server in difficult prediction tasks. The LiveBench program (http://BioInfo.PL/LiveBench) is being continued, and all interested developers are cordially invited to join.
Asparagine-linked oligosaccharides present on a non-consensus amino acid sequence in the CH1 domain of human antibodies.

PubMed

Valliere-Douglass, John F; Kodama, Paul; Mujacic, Mirna; Brady, Lowell J; Wang, Wes; Wallace, Alison; Yan, Boxu; Reddy, Pranhitha; Treuheit, Michael J; Balland, Alain

2009-11-20

We report that N-linked oligosaccharide structures can be present on an asparagine residue not adhering to the consensus site motif NX(S/T), where X is not proline, described in the literature. We have observed oligosaccharides on a non-consensus asparaginyl residue in the CH1 constant domain of IgG1 and IgG2 antibodies. The initial findings were obtained from characterization of charge variant populations evident in a recombinant human antibody of the IgG2 subclass. HPLC-MS results indicated that cation-exchange chromatography acidic variant populations were enriched in antibody with a second glycosylation site, in addition to the well documented canonical glycosylation site located in the CH2 domain. Subsequent tryptic and chymotryptic peptide map data indicated that the second glycosylation site was associated with the amino acid sequence TVSWN(162)SGAL in the CH1 domain of the antibody. This highly atypical modification is present at levels of 0.5-2.0% on most of the recombinant antibodies that have been tested and has also been observed in IgG1 antibodies derived from human donors. Site-directed mutagenesis of the CH1 domain sequence in a recombinant human IgG1 antibody resulted in an increase in non-consensus glycosylation to 3.15%, a greater than 4-fold increase over the level observed in the wild type, by changing the -1 and +1 amino acids relative to the asparagine residue at position 162. We believe that further understanding of the phenomenon of non-consensus glycosylation can be used to gain fundamental insights into the fidelity of the cellular glycosylation machinery.
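The consensus motif NX(S/T), where X is not proline, that the abstract above refers to can be expressed as a short pattern. The sketch below is illustrative only (the function name is hypothetical); it shows why the reported site is "non-consensus": in the peptide TVSWNSGAL the asparagine is followed by S and then G, so the third position is neither S nor T.

```python
import re

# N, then any residue except proline, then S or T.
# The lookahead allows overlapping sequons to be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(protein):
    """Return (1-based position of N, matched tripeptide) for each N-X-(S/T) sequon."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(protein.upper())]
```

Calling `find_sequons("TVSWNSGAL")` returns an empty list, consistent with the abstract's description of this site as lying outside the canonical motif.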

Magnetic resonance imaging for the detection, localisation, and characterisation of prostate cancer: recommendations from a European consensus meeting.

PubMed

Dickinson, Louise; Ahmed, Hashim U; Allen, Clare; Barentsz, Jelle O; Carey, Brendan; Futterer, Jurgen J; Heijmink, Stijn W; Hoskin, Peter J; Kirkham, Alex; Padhani, Anwar R; Persad, Raj; Puech, Philippe; Punwani, Shonit; Sohaib, Aslam S; Tombal, Bertrand; Villers, Arnauld; van der Meulen, Jan; Emberton, Mark

2011-04-01

Multiparametric magnetic resonance imaging (mpMRI) may have a role in detecting clinically significant prostate cancer in men with raised serum prostate-specific antigen levels. Variations in technique and the interpretation of images have contributed to inconsistency in its reported performance characteristics. Our aim was to make recommendations on a standardised method for the conduct, interpretation, and reporting of prostate mpMRI for prostate cancer detection and localisation. A consensus meeting of 16 European prostate cancer experts was held; it followed the UCLA-RAND Appropriateness Method and was facilitated by an independent chair. Before the meeting, 520 items were scored for "appropriateness" by panel members, discussed face to face, and rescored. Agreement was reached in 67% of 260 items related to imaging sequence parameters. T2-weighted, dynamic contrast-enhanced, and diffusion-weighted MRI were the key sequences incorporated into the minimum requirements. Consensus was also reached on 54% of 260 items related to image interpretation and reporting, including features of malignancy on individual sequences. A 5-point scale was agreed on for communicating the probability of malignancy, with a minimum of 16 prostatic regions of interest, to include a pictorial representation of suspicious foci. Limitations relate to consensus methodology. Dominant personalities are known to affect the opinions of the group and were countered by a neutral chairperson. Consensus was reached on a number of areas related to the conduct, interpretation, and reporting of mpMRI for the detection, localisation, and characterisation of prostate cancer. Before optimal dissemination of this technology, these outcomes will require formal validation in prospective trials. Copyright © 2010 European Association of Urology. Published by Elsevier B.V. All rights reserved.
Directed evolution of an extremely stable fluorescent protein.

PubMed

Kiss, Csaba; Temirov, Jamshid; Chasteen, Leslie; Waldo, Geoffrey S; Bradbury, Andrew R M

2009-05-01

In this paper we describe the evolution of eCGP123, an extremely stable green fluorescent protein based on a previously described fluorescent protein created by consensus engineering (CGP: consensus green protein). eCGP123 could not be denatured by a standard thermal melt, preserved almost full fluorescence after overnight incubation at 80 degrees C and possessed a free energy of denaturation of 12.4 kcal/mol. It was created from CGP by a recursive process involving the sequential introduction of three destabilizing heterologous inserts, evolution to overcome the destabilization and finally 'removal' of the destabilizing insert by gene synthesis. We believe that this approach may be generally applicable to the stabilization of other proteins.
Tumour suppressor protein p53 regulates the stress activated bilirubin oxidase cytochrome P450 2A6

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hu, Hao, E-mail: hao.hu1@uqconnect.edu.au; Yu, Ting, E-mail: t.yu2@uq.edu.au; Arpiainen, Satu, E-mail: Satu.Juhila@orion.fi

2015-11-15

Human cytochrome P450 (CYP) 2A6 enzyme has been proposed to play a role in cellular defence against chemical-induced oxidative stress. The encoding gene is regulated by various stress activated transcription factors. This paper demonstrates that p53 is a novel transcriptional regulator of the gene. Sequence analysis of the CYP2A6 promoter revealed six putative p53 binding sites in a 3 kb proximate promoter region. The site closest to transcription start site (TSS) is highly homologous with the p53 consensus sequence. Transfection with various stepwise deletions of CYP2A6-5′-Luc constructs – down to −160 bp from the TSS – showed p53 responsiveness in p53 overexpressed C3A cells. However, a further deletion from −160 to −74 bp, including the putative p53 binding site, totally abolished the p53 responsiveness. Electrophoretic mobility shift assay with a probe containing the putative binding site showed specific binding of p53. A point mutation at the binding site abolished both the binding and responsiveness of the recombinant gene to p53. Up-regulation of the endogenous p53 with benzo[α]pyrene – a well-known p53 activator – increased the expression of the p53 responsive positive control and the CYP2A6-5′-Luc construct containing the intact p53 binding site but not the mutated CYP2A6-5′-Luc construct. Finally, inducibility of the native CYP2A6 gene by benzo[α]pyrene was demonstrated by dose-dependent increases in CYP2A6 mRNA and protein levels along with increased p53 levels in the nucleus. Collectively, the results indicate that p53 protein is a regulator of the CYP2A6 gene in C3A cells and further support the putative cytoprotective role of CYP2A6. - Highlights: • CYP2A6 is an immediate target gene of p53. • Six putative p53REs located on 3 kb proximate CYP2A6 promoter region. • The region −160 bp from TSS is highly homologous with the p53 consensus sequence. • p53 specifically binds to the p53RE on the −160 bp region. • HNF4α may interact with p53 in regulating CYP2A6 expression.
A filtering method to generate high quality short reads using illumina paired-end technology.

PubMed

Eren, A Murat; Vineis, Joseph H; Morrison, Hilary G; Sogin, Mitchell L

2013-01-01

Consensus between independent reads improves the accuracy of genome and transcriptome analyses; however, lack of consensus between very similar sequences in metagenomic studies can and often does represent natural variation of biological significance. The common use of machine-assigned quality scores on next generation platforms does not necessarily correlate with accuracy. Here, we describe using the overlap of paired-end, short sequence reads to identify error-prone reads in marker gene analyses and their contribution to spurious OTUs following clustering analysis using QIIME. Our approach can also reduce error in shotgun sequencing data generated from libraries with small, tightly constrained insert sizes. The open-source implementation of this algorithm in the Python programming language with user instructions can be obtained from https://github.com/meren/illumina-utils.
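
The idea of using the overlap between paired-end reads as an error check can be illustrated with a toy example. The sketch below is not the illumina-utils implementation: the function names, the `max_mismatches` threshold, and the assumption that the overlap length is already known are all hypothetical simplifications (real tools have to locate the overlap by alignment).

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string made of A, C, G, T, N."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap_mismatches(read1: str, read2: str, overlap: int) -> int:
    """Count disagreements between the two reads inside their overlap.

    read1 is the forward read; read2 comes from the opposite strand, so it is
    reverse-complemented before the last `overlap` bases of read1 are compared
    with its first `overlap` bases.
    """
    if overlap <= 0:
        raise ValueError("overlap must be positive")
    tail = read1[-overlap:]
    head = reverse_complement(read2)[:overlap]
    return sum(a != b for a, b in zip(tail, head))


def keep_pair(read1: str, read2: str, overlap: int, max_mismatches: int = 0) -> bool:
    """Keep a pair only if the overlapping bases agree (hypothetical threshold)."""
    return overlap_mismatches(read1, read2, overlap) <= max_mismatches


# Toy 14-bp fragment sequenced with 10-bp reads from each end: overlap is 6 bp.
fragment = "ACGTTGCAAGGCTA"
read1 = fragment[:10]
read2 = reverse_complement(fragment[-10:])
print(overlap_mismatches(read1, read2, 6))  # 0

# The same pair with one sequencing error introduced at the last base of read1.
bad_read1 = read1[:-1] + "T"
print(keep_pair(bad_read1, read2, 6))       # False
```

Requiring agreement in the overlap uses the second read as an independent check on the first, rather than trusting the machine-assigned quality scores.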
Identification of a Membrane Targeting and Degradation Signal in the p42 Protein of Influenza C Virus

PubMed Central

Pekosz, Andrew; Lamb, Robert A.

2000-01-01

Two mRNA species are derived from the influenza C virus RNA segment six, (i) a colinear transcript containing a 374-amino-acid residue open reading frame (referred to herein as the seg 6 ORF) which is translated to yield the p42 protein, and (ii) a spliced mRNA which encodes the influenza C virus matrix (CM1) protein consisting of the first 242 amino acids of p42. The p42 protein undergoes proteolytic cleavage at a consensus signal peptidase cleavage site after residue 259, yielding the p31 and CM2 proteins. Translocation of p42 into the endoplasmic reticulum membrane occurs cotranslationally and requires the hydrophobic internal signal peptide (residues 239 to 259), as well as the predicted transmembrane domain of CM2 (residues 285 to 308). The p31 protein was found to undergo rapid degradation after cleavage from p42. Addition of the 26S proteasome inhibitor lactacystin to influenza C virus-infected or seg 6 ORF cDNA-transfected cells drastically reduced p31 degradation. Transfer of the 17-residue C-terminal region of p31 to heterologous proteins resulted in their rapid turnover. The hydrophobic nature, but not the specific amino acid sequence of the 17-amino-acid C terminus of p31 appears to act as the signal for targeting the protein to membranes and for degradation. PMID:11044092
Cloning and pharmacological characterization of the rabbit bradykinin B2 receptor.

PubMed

Bachvarov, D R; Saint-Jacques, E; Larrivée, J F; Levesque, L; Rioux, F; Drapeau, G; Marceau, F

1995-12-01

Degenerate primers, corresponding to consensus sequences of third and sixth transmembrane domains of G protein-coupled receptor superfamily, were used for the polymerase chain reaction amplification and consecutive characterization of G protein-coupled receptors present in cultured rabbit aortic smooth muscle cells. One of the isolated resulting fragments was highly homologous to the corresponding region of the bradykinin (BK) B2 receptor cloned in other species. The polymerase chain reaction fragment was used to screen a rabbit genomic library, which allowed the identification of an intronless 1101-nucleotide open reading frame which codes for a 367-amino acid receptor protein. The rabbit B2 receptor sequence is more than 80% identical to the ones determined in three other species and retains putative glycosylation, palmitoylation and phosphorylation sites. In the rabbit genomic sequence, an acceptor splice sequence was found 8 base pairs upstream of the start codon. Northern blot analysis showed a high expression of a major transcript (4.2 kilobases) in the rabbit kidney and duodenum, and a less abundant expression in other tissues. Southern blot experiments suggest that a single copy of this gene exists in the rabbit genome. The cloned rabbit B2 receptor expressed in COS-1 cells binds [3H]BK in a saturable manner (KD 2.1 nM) and this ligand competes with a series of kinin agonists and antagonists with a rank order consistent with the B2 receptor identity. The insurmountable character of the antagonism exerted by Hoe 140 against BK on the rabbit B2 receptor, previously shown in pharmacological experiments, was confirmed in binding experiments with the cloned receptor expressed in a controlled manner. By contrast, Hoe 140 competed with [3H]BK in a surmountable manner for the human B2 receptor expressed in COS-1 cells. The cloning of the rabbit B2 receptor will be useful notably for the study of the structural basis of antagonist binding and for studies on receptor regulation in a relatively large animal.
The Kell protein of the common K2 phenotype is a catalytically active metalloprotease, whereas the rare Kell K1 antigen is inactive. Identification of novel substrates for the Kell protein.

PubMed

Clapéron, Audrey; Rose, Christiane; Gane, Pierre; Collec, Emmanuel; Bertrand, Olivier; Ouimet, Tanja

2005-06-03

The Kell blood group is a highly polymorphic system containing over 20 different antigens borne by the protein Kell, a 93-kDa type II glycoprotein that displays high sequence homology with members of the M13 family of zinc-dependent metalloproteases whose prototypical member is neprilysin. Kell K1 is an antigen expressed in 9% of the Caucasian population, characterized by a point mutation (T193M) of the Kell K2 antigen, and located within a putative N-glycosylation consensus sequence. Recently, a recombinant, non-physiological, soluble form of Kell was shown to cleave Big ET-3 to produce the mature vasoconstrictive peptide. To better characterize the enzymatic activity of the Kell protein and the possible differences introduced by antigenic point mutations affecting post-translational processing, the membrane-bound forms of the Kell K1 and Kell K2 antigens were expressed either in K562 cells, an erythroid cell line, or in HEK293 cells, a non-erythroid system, and their pharmacological profiles and enzymatic specificities toward synthetic and natural peptides were evaluated. Results presented herein reveal that the two antigens possess considerable differences in their enzymatic activities, although not in their trafficking pattern. Indeed, although both antigens are expressed at the cell surface, Kell K1 protein is shown to be inactive, whereas the Kell K2 antigen binds neprilysin inhibitory compounds such as phosphoramidon and thiorphan with high affinity, cleaves the precursors of the endothelin peptides, and inactivates members of the tachykinin family with enzymatic properties resembling those of other members of the M13 family of metalloproteases to which it belongs.
The influence of specific binding of collagen-silk chimeras to silk biomaterials on hMSC behavior

PubMed Central

An, Bo; DesRochers, Teresa M.; Qin, Guokui; Xia, Xiaoxia; Thiagarajan, Geetha; Brodsky, Barbara; Kaplan, David

2012-01-01

Collagen-like proteins in the bacteria Streptococcus pyogenes adopt a triple-helix structure with a thermal stability similar to that of animal collagens, can be expressed in high yield in E. coli and can be easily modified through molecular biology techniques. However, potential applications for such recombinant collagens are limited by their lack of higher order structure to achieve the physical properties needed for most biomaterials. To overcome this problem, the S. pyogenes collagen domain was fused to a repetitive Bombyx mori silk consensus sequence, as a strategy to direct specific non-covalent binding onto solid silk materials whose superior stability, mechanical and material properties have been previously established. This approach resulted in the successful binding of these new collagen-silk chimeric proteins to silk films and porous scaffolds, and the binding affinity could be controlled by varying the number of repeats in the silk sequence. To explore the potential of collagen-silk chimera for regulating biological activity, integrin (Int) and fibronectin (Fn) binding sequences from mammalian collagens were introduced into the bacterial collagen domain. The attachment of bioactive collagen-silk chimeras to solid silk biomaterials promoted hMSC spreading and proliferation substantially in comparison to the controls. The ability to combine the biomaterial features of silk with the biological activities of collagen allowed more rapid cell interactions with silk-based biomaterials, improved regulation of stem cell growth and differentiation, as well as the formation of artificial extracellular matrices useful for tissue engineering applications. PMID:23088839
Wheat CBF gene family: identification of polymorphisms in the CBF coding sequence.

PubMed

Mohseni, Sara; Che, Hua; Djillali, Zakia; Dumont, Estelle; Nankeu, Joseph; Danyluk, Jean

2012-12-01

Expression of cold-regulated genes needed for protection against freezing stress is mediated, in part, by the CBF transcription factor family. Previous studies with temperate cereals suggested that the CBF gene family in wheat was large, and that CBF genes were at the base of an important low temperature tolerance trait. Therefore, the goal of our study was to identify the CBF repertoire in the freezing-tolerant hexaploid wheat cultivar Norstar, and then to examine if the coding region of CBF genes in two spring cultivars contain polymorphisms that could affect the protein sequence and structure. Our analyses reveal that hexaploid wheat contains a complex CBF family consisting of at least 65 CBF genes of which 60 are known to be expressed in the cultivar Norstar. They represent 27 paralogous genes with 1-3 homeologous copies for the A, B, and D genomes. The cultivar Norstar contains two pseudogenes and at least 24 additional proteins having sequences and (or) structures that deviate from the consensus in the conserved AP2 DNA-binding and (or) C-terminal activation-domains. This suggests that in cultivars such as Norstar, low temperature tolerance may be increased through breeding of additional optimal alleles. The examination of the CBF repertoire present in the two spring cultivars, Chinese Spring and Manitou, reveals that they have additional polymorphisms affecting conserved positions in these domains. Understanding the effects of these polymorphisms will provide additional information for the selection of optimum CBF alleles in Triticeae breeding programs.
Identification and Characterization of the Insecticidal Toxin “Makes Caterpillars Floppy” in Photorhabdus temperata M1021 Using a Cosmid Library

PubMed Central

Ullah, Ihsan; Jang, Eun-Kyung; Kim, Min-Sung; Shin, Jin-Ho; Park, Gun-Seok; Khan, Abdur Rahim; Hong, Sung-Jun; Jung, Byung-Kwon; Choi, JungBae; Park, YeongJun; Kwak, Yunyoung; Shin, Jae-Ho

2014-01-01

Photorhabdus temperata is an entomopathogenic enterobacterium; it is a nematode symbiont that possesses pathogenicity islands involved in insect virulence. Herein, we constructed a P. temperata M1021 cosmid library in Escherichia coli XL1-Blue MRF′ and obtained 7.14 × 10^5 clones. However, only 1020 physiologically active clones were screened for insect virulence factors by injection of each E. coli cosmid clone into Galleria mellonella and Tenebrio molitor larvae. A single cosmid clone, PtC1015, was consequently selected due to its characteristic virulent properties, e.g., loss of body turgor followed by death of larvae when the clone was injected into the hemocoel. The sequence alignment against the available sequences in Swiss-Prot and NCBI databases, confirmed the presence of the mcf gene homolog in the genome of P. temperata M1021 showing 85% homology and 98% query coverage with the P. luminescens counterpart. Furthermore, a 2932 amino acid long Mcf protein revealed limited similarity with three protein domains. The N-terminus of the Mcf encompassed consensus sequence for a BH3 domain, the central region revealed similarity to toxin B, and the C-terminus of Mcf revealed similarity to the bacterial export domain of ApxIVA, an RTX-like toxin. In short, the Mcf toxin is likely to play a role in the elimination of insect pests, making it a promising model for use in the agricultural field. PMID:25014195
A tree of life based on ninety-eight expressed genes conserved across diverse eukaryotic species

PubMed Central

Jayaswal, Pawan Kumar; Dogra, Vivek; Shanker, Asheesh; Sharma, Tilak Raj

2017-01-01

Rapid advances in DNA sequencing technologies have resulted in the accumulation of large data sets in the public domain, facilitating comparative studies to provide novel insights into the evolution of life. Phylogenetic studies across the eukaryotic taxa have been reported but on the basis of a limited number of genes. Here we present a genome-wide analysis across different plant, fungal, protist, and animal species, with reference to the 36,002 expressed genes of the rice genome. Our analysis revealed 9831 genes unique to rice and 98 genes conserved across all 49 eukaryotic species analysed. The 98 genes conserved across diverse eukaryotes mostly exhibited binding and catalytic activities and shared common sequence motifs; and hence appeared to have a common origin. The 98 conserved genes belonged to 22 functional gene families including 26S protease, actin, ADP–ribosylation factor, ATP synthase, casein kinase, DEAD-box protein, DnaK, elongation factor 2, glyceraldehyde 3-phosphate, phosphatase 2A, ras-related protein, Ser/Thr protein phosphatase family protein, tubulin, ubiquitin and others. The consensus Bayesian eukaryotic tree of life developed in this study demonstrated widely separated clades of plants, fungi, and animals. Musa acuminata provided an evolutionary link between monocotyledons and dicotyledons, and Salpingoeca rosetta provided an evolutionary link between fungi and animals, indicating that protozoan species are close relatives of fungi and animals. The divergence times for 1176 species pairs were estimated accurately by integrating fossil information with synonymous substitution rates in the comprehensive set of 98 genes. The present study provides valuable insight into the evolution of eukaryotes. PMID:28922368
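
The figure of 1176 species pairs coincides with the number of unordered pairs that can be formed from 49 species. The abstract does not state how the pairs were chosen, so the check below rests on the assumption that every pair of the 49 species was compared.

```python
from math import comb

# Number of unordered pairs among 49 species: 49 * 48 / 2.
species = 49
pairs = comb(species, 2)
print(pairs)  # 1176
```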
The rpoE operon regulates heat stress response in Burkholderia pseudomallei.

PubMed

Vanaporn, Muthita; Vattanaviboon, Paiboon; Thongboonkerd, Visith; Korbsrisate, Sunee

2008-07-01

Burkholderia pseudomallei is a gram-negative bacterium and the causative agent of melioidosis, one of the important lethal diseases in tropical regions. In this article, we demonstrate the crucial role of the B. pseudomallei rpoE locus in the response to heat stress. The rpoE operon knockout mutant exhibited growth retardation and reduced survival when exposed to a high temperature. Expression analysis using rpoH promoter-lacZ fusion revealed that heat stress induction of rpoH, which encodes heat shock sigma factor (sigma(H)), was abolished in the B. pseudomallei rpoE mutant. Analysis of the rpoH promoter region revealed sequences sharing high homology to the consensus sequence of sigma(E)-dependent promoters. Moreover, the putative heat-induced sigma(H)-regulated heat shock proteins (i.e. GroEL and HtpG) were also absent in the rpoE operon mutant. Altogether, our data suggest that the rpoE operon regulates B. pseudomallei heat stress response through the function of rpoH.
Genome-wide chromatin footprinting reveals changes in replication origin architecture induced by pre-RC assembly

PubMed Central

MacAlpine, Heather K.; Lubelsky, Yoav; Hartemink, Alexander J.

2015-01-01

Start sites of DNA replication are marked by the origin recognition complex (ORC), which coordinates Mcm2–7 helicase loading to form the prereplicative complex (pre-RC). Although pre-RC assembly is well characterized in vitro, the process is poorly understood within the local chromatin environment surrounding replication origins. To reveal how the chromatin architecture modulates origin selection and activation, we “footprinted” nucleosomes, transcription factors, and replication proteins at multiple points during the Saccharomyces cerevisiae cell cycle. Our nucleotide-resolution protein occupancy profiles resolved a precise ORC-dependent footprint at 269 origins in G2. A separate class of inefficient origins exhibited protein occupancy only in G1, suggesting that stable ORC chromatin association in G2 is a determinant of origin efficiency. G1 nucleosome remodeling concomitant with pre-RC assembly expanded the origin nucleosome-free region and enhanced activation efficiency. Finally, the local chromatin environment restricts the loading of the Mcm2–7 double hexamer either upstream of or downstream from the ARS consensus sequence (ACS). PMID:25593310
A proactive role of water molecules in acceptor recognition by protein O-fucosyltransferase 2.

PubMed

Valero-González, Jessika; Leonhard-Melief, Christina; Lira-Navarrete, Erandi; Jiménez-Osés, Gonzalo; Hernández-Ruiz, Cristina; Pallarés, María Carmen; Yruela, Inmaculada; Vasudevan, Deepika; Lostao, Anabel; Corzana, Francisco; Takeuchi, Hideyuki; Haltiwanger, Robert S; Hurtado-Guerrero, Ramon

2016-04-01

Protein O-fucosyltransferase 2 (POFUT2) is an essential enzyme that fucosylates serine and threonine residues of folded thrombospondin type 1 repeats (TSRs). To date, the mechanism by which this enzyme recognizes very dissimilar TSRs has been unclear. By engineering a fusion protein, we report the crystal structure of Caenorhabditis elegans POFUT2 (CePOFUT2) in complex with GDP and human TSR1 that suggests an inverting mechanism for fucose transfer assisted by a catalytic base and shows that nearly half of the TSR1 is embraced by CePOFUT2. A small number of direct interactions and a large network of water molecules maintain the complex. Site-directed mutagenesis demonstrates that POFUT2 fucosylates threonine preferentially over serine and relies on folded TSRs containing the minimal consensus sequence C-X-X-S/T-C. Crystallographic and mutagenesis data, together with atomic-level simulations, uncover a binding mechanism by which POFUT2 promiscuously recognizes the structural fingerprint of poorly homologous TSRs through a dynamic network of water-mediated interactions.
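
The minimal consensus C-X-X-S/T-C named in this abstract can be located in a sequence with a simple pattern. The sketch below is illustrative and not from the paper; the function name and both test peptides are hypothetical. Since the abstract stresses that POFUT2 acts on folded TSRs, a sequence match of this kind is at most a necessary condition, not a prediction of fucosylation.

```python
import re

# C, two arbitrary residues, S or T (the candidate acceptor), then C.
# The lookahead allows overlapping motifs to be reported.
MOTIF = re.compile(r"(?=C..[ST]C)")


def candidate_fucosylation_sites(protein: str) -> list[int]:
    """Return 0-based positions of the S/T residue in each C-X-X-(S/T)-C match."""
    return [m.start() + 3 for m in MOTIF.finditer(protein.upper())]


# Hypothetical peptides: the first contains C-S-V-T-C, the second has no S/T at +3.
print(candidate_fucosylation_sites("AACSVTCGA"))  # [5]
print(candidate_fucosylation_sites("AACSVGCGA"))  # []
```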
Predictive Structure and Topology of Peroxisomal ATP-Binding Cassette (ABC) Transporters

PubMed Central

Andreoletti, Pierre; Raas, Quentin; Gondcaille, Catherine; Cherkaoui-Malki, Mustapha; Trompier, Doriane; Savary, Stéphane

2017-01-01

The peroxisomal ATP-binding Cassette (ABC) transporters, which are called ABCD1, ABCD2 and ABCD3, are transmembrane proteins involved in the transport of various lipids that allow their degradation inside the organelle. Defective ABCD1 leads to the accumulation of very long-chain fatty acids and is associated with a complex and severe neurodegenerative disorder called X-linked adrenoleukodystrophy (X-ALD). Although the nucleotide-binding domain is highly conserved and characterized within the ABC transporters family, solid data are missing for the transmembrane domain (TMD) of ABCD proteins. The lack of a clear consensus on the secondary and tertiary structure of the TMDs weakens any structure-function hypothesis based on the very diverse ABCD1 mutations found in X-ALD patients. Therefore, we first reinvestigated thoroughly the structure-function data available and performed refined alignments of ABCD protein sequences. Based on the 2.85 Å resolution crystal structure of the mitochondrial ABC transporter ABCB10, here we propose a structural model of peroxisomal ABCD proteins that specifies the position of the transmembrane and coupling helices, and highlight functional motifs and putative important amino acid residues. PMID:28737695
The yeast genome may harbor hypoxia response elements (HRE).

PubMed

Ferreira, Túlio César; Hertzberg, Libi; Gassmann, Max; Campos, Elida Geralda

2007-01-01

The hypoxia-inducible factor-1 (HIF-1) is a heterodimeric transcription factor activated when cells are submitted to hypoxia. The heterodimer is composed of two subunits, HIF-1alpha and the constitutively expressed HIF-1beta. During normoxia, HIF-1alpha is degraded by the 26S proteasome, but hypoxia causes HIF-1alpha to be stabilized, enter the nucleus and bind to HIF-1beta, thus forming the active complex. The complex then binds to the regulatory sequences of various genes involved in physiological and pathological processes. The specific regulatory sequence recognized by HIF-1 is the hypoxia response element (HRE) that has the consensus sequence 5'BRCGTGVBBB3'. Although the basic transcriptional regulation machinery is conserved between yeast and mammals, Saccharomyces cerevisiae does not express HIF-1 subunits. However, we hypothesized that baker's yeast has a protein analogous to HIF-1 which participates in the response to changes in oxygen levels by binding to HRE sequences. In this study we screened the yeast genome for HREs using probabilistic motif search tools. We described 24 yeast genes containing motifs with high probability of being HREs (p-value<0.1) and classified them according to biological function. Our results show that S. cerevisiae may harbor HREs and indicate that a transcription factor analogous to HIF-1 may exist in this organism.
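
The HRE consensus 5'BRCGTGVBBB3' is written in IUPAC degenerate-base notation (B = C, G or T; R = A or G; V = A, C or G). A minimal sketch of how such a consensus can be expanded and matched exactly against a DNA string follows. This is a plain deterministic match, not the probabilistic motif search used in the study; the function names and test sequences are assumptions for illustration, and only the given strand is scanned.

```python
import re

# Standard IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Compile an IUPAC consensus into a regex that also finds overlapping hits."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    return re.compile(f"(?={body})")


def find_sites(consensus: str, dna: str) -> list[int]:
    """Return 0-based start positions of exact consensus matches on the given strand."""
    return [m.start() for m in consensus_to_regex(consensus).finditer(dna.upper())]


HRE = "BRCGTGVBBB"
# Hypothetical sequences: the first carries TACGTGCTGT, which fits the consensus;
# the second has T at the V position, which V (A, C or G) does not allow.
print(find_sites(HRE, "AATACGTGCTGTAA"))  # [2]
print(find_sites(HRE, "AATACGTGTTGTAA"))  # []
```

A genome-wide screen would also need to scan the reverse-complement strand and account for the many chance matches a short degenerate motif produces, which is one reason the authors report p-values.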
The complete sequence and promoter activity of the human A-raf-1 gene (ARAF1)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lee, J.E.; Beck, T.W.; Brennscheidt, U.

1994-03-01

The raf proto-oncogenes encode cytoplasmic protein serine/threonine kinases, which play a critical role in cell growth and development. One of these, A-raf-1 (human gene symbol, ARAF1), which is predominantly expressed in mouse urogenital tissues, has been mapped to an evolutionarily conserved linkage group composed of ARAF1, SYN1, TIMP, and properdin located at human chromosome Xp11.2. The authors have isolated human genomic DNA clones containing the expressed gene (ARAF1) on the X chromosome and a pseudogene (ARAF2) on chromosome 7p12-q11.21. Analysis of the nucleotide sequence from the ARAF1 genomic clones demonstrated that it consists of 16 exons encoded by minimally 10,776 nucleotides. The major transcriptional start site (+1) was determined by RNase protection and primer extension assays. Promoter activity was confirmed by functional assays using DNA fragments fused to a CAT reporter gene. The ARAF1 minimal promoter, located between nucleotides -59 and +93, has a low G + C content and lacks consensus TATA and Inr sequences but shows sequence similarity at position -1 to the E box that is known to interact with USF and TFII-I transcription factors. 65 refs., 7 figs., 1 tab.
Molecular Control of Polyene Macrolide Biosynthesis

PubMed Central

Santos-Aberturas, Javier; Vicente, Cláudia M.; Guerra, Susana M.; Payero, Tamara D.; Martín, Juan F.; Aparicio, Jesús F.

2011-01-01

Control of polyene macrolide production in Streptomyces natalensis is mediated by the transcriptional activator PimM. This regulator, which combines an N-terminal PAS domain with a C-terminal helix-turn-helix motif, is highly conserved among polyene biosynthetic gene clusters. PimM, truncated forms of the protein without the PAS domain (PimMΔPAS), and forms containing just the DNA-binding domain (DBD) (PimMDBD) were overexpressed in Escherichia coli as GST-fused proteins. GST-PimM binds directly to eight promoters of the pimaricin cluster, as demonstrated by electrophoretic mobility shift assays. Assays with truncated forms of the protein revealed that the PAS domain does not mediate specificity or the distinct recognition of target genes, which rely on the DBD domain, but significantly reduces binding affinity up to 500-fold. Transcription start points were identified by 5′-rapid amplification of cDNA ends, and the binding regions of PimMDBD were investigated by DNase I protection studies. In all cases, binding took place covering the −35 hexamer box of each promoter, suggesting an interaction of PimM and RNA polymerase to cause transcription activation. Information content analysis of the 16 sequences protected in target promoters was used to deduce the structure of the PimM-binding site. This site displays dyad symmetry, spans 14 nucleotides, and adjusts to the consensus TVGGGAWWTCCCBA. Experimental validation of this binding site was performed by using synthetic DNA duplexes. Binding of PimM to the promoter region of one of the polyketide synthase genes from the Streptomyces nodosus amphotericin cluster containing the consensus binding site was also observed, thus proving the applicability of the findings reported here to other antifungal polyketides. PMID:21187288
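
The dyad symmetry of the 14-nucleotide consensus TVGGGAWWTCCCBA can be checked directly: a site with perfect dyad symmetry equals its own reverse complement, where degenerate IUPAC codes are complemented as well (V pairs with B, W with W). The sketch below is an illustration under that definition, not code from the paper, and the helper name is an assumption.

```python
# IUPAC complement table: each code maps to the code for the complementary bases.
IUPAC_COMPLEMENT = str.maketrans(
    "ACGTRYSWKMBDHVN",
    "TGCAYRSWMKVHDBN",
)


def reverse_complement(consensus: str) -> str:
    """Reverse-complement a consensus written in IUPAC nucleotide codes."""
    return consensus.upper().translate(IUPAC_COMPLEMENT)[::-1]


site = "TVGGGAWWTCCCBA"
print(reverse_complement(site))          # TVGGGAWWTCCCBA
print(reverse_complement(site) == site)  # True
```

Because the consensus reads the same on both strands, each half-site can in principle be contacted in the same way, which is the usual interpretation of dyad symmetry for regulators with a helix-turn-helix motif.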
Unusual glycosylation of proteins: Beyond the universal sequon and other amino acids.

PubMed

Dutta, Devawati; Mandal, Chhabinath; Mandal, Chitra

2017-12-01

Glycosylation of proteins is the most common, multifaceted co- and post-translational modification responsible for many biological processes and cellular functions. Significant alterations and aberrations of these processes are related to various pathological conditions, and often turn out to be disease biomarkers. Conventional N-glycosylation occurs through the recognition of the consensus sequon, asparagine (Asn)-X-serine (Ser)/threonine (Thr), where X is any amino acid except for proline, with N-acetylglucosamine (GlcNAc) as the first glycosidic linkage. Usually, O-glycosylation adds a glycan to the hydroxyl group of Ser or Thr beginning with N-acetylgalactosamine (GalNAc). Protein glycosylation is further governed by additional diversifications in sequon and structure, which are yet to be fully explored. This review mainly focuses on the occurrence of N-glycosylation in non-consensus motifs, where Ser/Thr at the +2 position is substituted by other amino acids. Additionally, N-glycosylation is also observed in other amide/amine group-containing amino acids. Similarly, O-glycosylation occurs at hydroxyl group-containing amino acids other than serine/threonine. The neighbouring amino acids and local structural features around the potential glycosylation site also play a significant role in determining the extent of glycosylation. All of these phenomena that yield glycosylation at the atypical sites are reported in a variety of biological systems, including different pathological conditions. Therefore, the discovery of more novel sequence patterns for N- and O-glycosylation may help in understanding the functions of complex biological processes and cellular functions. Taken together, the information provided in this review should be helpful for biological readers. Copyright © 2017 Elsevier B.V. All rights reserved.
A systems wide mass spectrometric based linear motif screen to identify dominant in-vivo interacting proteins for the ubiquitin ligase MDM2.

PubMed

Nicholson, Judith; Scherl, Alex; Way, Luke; Blackburn, Elizabeth A; Walkinshaw, Malcolm D; Ball, Kathryn L; Hupp, Ted R

2014-06-01

Linear motifs mediate protein-protein interactions (PPI) that allow expansion of a target protein interactome at a systems level. This study uses a proteomics approach and linear motif sub-stratifications to expand on PPIs of MDM2. MDM2 is a multi-functional protein with over one hundred known binding partners not stratified by hierarchy or function. A new linear motif based on a MDM2 interaction consensus is used to select novel MDM2 interactors based on Nutlin-3 responsiveness in a cell-based proteomics screen. MDM2 binds a subset of peptide motifs corresponding to real proteins with a range of allosteric responses to MDM2 ligands. We validate cyclophilin B as a novel protein with a consensus MDM2 binding motif that is stabilised by Nutlin-3 in vivo, thus identifying one of the few known interactors of MDM2 that is stabilised by Nutlin-3. These data invoke two modes of peptide binding at the MDM2 N-terminus that rely on a consensus core motif to control the equilibrium between MDM2 binding proteins. This approach stratifies MDM2 interacting proteins based on the linear motif feature and provides a new biomarker assay to define clinically relevant Nutlin-3 responsive MDM2 interactors. Copyright © 2014 Elsevier Inc. All rights reserved.
