Sample records for protein primary sequence

  1. Metamorphic Proteins: Emergence of Dual Protein Folds from One Primary Sequence.

    PubMed

    Lella, Muralikrishna; Mahalakshmi, Radhakrishnan

    2017-06-20

    Every amino acid exhibits a different propensity for distinct structural conformations. Hence, the transition of a primary amino acid sequence to a defined secondary structure and to its final three-dimensional fold is presently considered predictable with reasonable certainty. However, protein sequences that defy the first principles of secondary structure prediction, in that they attain two different folds, have recently been discovered. Such proteins, aptly named metamorphic proteins, relax conformational constraints by increasing flexibility in the secondary structure and thereby enable efficient function. In this review, we use illustrative examples to discuss the major factors driving the conformational switch, related both to protein sequence and to structure. We discuss the concept of an evolutionary transition in sequence and structure, the functional impact of the tertiary fold, and the pressure of intrinsic and external factors that give rise to metamorphic proteins. We mainly focus on the major components of protein architecture, namely the α-helix and β-sheet segments, which are involved in conformational switching within the same or highly similar sequences. These chameleonic sequences are widespread in both cytosolic and membrane proteins, and these folds are equally important for protein structure and function. We discuss the implications of metamorphic proteins and chameleonic peptide sequences for de novo peptide design.

  2. Fast computational methods for predicting protein structure from primary amino acid sequence

    DOEpatents

    Agarwal, Pratul Kumar [Knoxville, TN]

    2011-07-19

    The present invention provides a method utilizing primary amino acid sequence of a protein, energy minimization, molecular dynamics and protein vibrational modes to predict three-dimensional structure of a protein. The present invention also determines possible intermediates in the protein folding pathway. The present invention has important applications to the design of novel drugs as well as protein engineering. The present invention predicts the three-dimensional structure of a protein independent of size of the protein, overcoming a significant limitation in the prior art.

  3. Predicted secondary structure similarity in the absence of primary amino acid sequence homology: hepatitis B virus open reading frames.

    PubMed Central

    Schaeffer, E; Sninsky, J J

    1984-01-01

    Proteins that are related evolutionarily may have diverged at the level of primary amino acid sequence while maintaining similar secondary structures. Computer analysis has been used to compare the open reading frames of the hepatitis B virus to those of the woodchuck hepatitis virus at the level of amino acid sequence, and to predict the relative hydrophilic character and the secondary structure of putative polypeptides. Similarity is seen at the levels of relative hydrophilicity and secondary structure, in the absence of sequence homology. These data reinforce the proposal that these open reading frames encode viral proteins. Computer analysis of this type can be more generally used to establish structural similarities between proteins that do not share obvious sequence homology as well as to assess whether an open reading frame is fortuitous or codes for a protein. PMID:6585835
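The kind of comparison described above, matching ORFs by hydrophilic character rather than by sequence identity, can be sketched as a sliding-window hydropathy profile plus a correlation between two profiles. The abstract does not name the scale or window used, so the Kyte-Doolittle scale, the 7-residue default window and the function names are assumptions; the two profiles are also assumed to come from already-aligned regions of equal length.

```python
import math

# Kyte-Doolittle hydropathy values (assumed scale; positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=7):
    """Mean hydropathy of every full window along seq (len(seq) - window + 1 values)."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    if window < 1 or len(values) < window:
        return []
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a profile with zero variance has no defined correlation")
    return sxy / math.sqrt(sxx * syy)
```

A call such as `pearson(hydropathy_profile(orf_a), hydropathy_profile(orf_b))` would return a value near 1 when two ORFs share a hydropathy pattern even if their residues differ, which is the signal the abstract relies on.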

  4. Transcriptomic Analysis of the Salivary Glands of an Invasive Whitefly

    PubMed Central

    Su, Yun-Lin; Li, Jun-Min; Li, Meng; Luan, Jun-Bo; Ye, Xiao-Dong; Wang, Xiao-Wei; Liu, Shu-Sheng

    2012-01-01

    Background: Some species of the whitefly Bemisia tabaci complex cause tremendous losses to crops worldwide, directly through feeding and indirectly through virus transmission. The primary salivary glands of whiteflies are critical for their feeding and virus transmission. However, partly due to their tiny size, research on whitefly salivary glands is limited and our knowledge of these glands is scarce. Methodology/Principal Findings: We sequenced the transcriptome of the primary salivary glands of the Mediterranean species of the B. tabaci complex using an effective cDNA amplification method in combination with short-read sequencing (Illumina). In a single run, we obtained 13,615 unigenes. The number of unigenes obtained from the salivary glands of the whitefly is at least fourfold that of the salivary gland genes from other plant-sucking insects. To reveal the functions of the primary glands, sequence similarity searches and comparisons with the whole transcriptome of the whitefly were performed. The results demonstrated that genes related to metabolism and transport were significantly enriched in the primary salivary glands. Furthermore, we found that a number of highly expressed genes in the salivary glands might be involved in secretory protein processing, secretion and virus transmission. To identify potential proteins of whitefly saliva, the translated unigenes were subjected to secretory protein prediction. Finally, 295 genes were predicted to encode secretory proteins, and some of them might play important roles in whitefly feeding. Conclusions/Significance: The combined method of cDNA amplification, Illumina sequencing and de novo assembly is suitable for transcriptomic analysis of tiny organs in insects. Through analysis of the transcriptome, genomic features of the primary salivary glands were dissected and biologically important proteins, especially secreted proteins, were predicted. Our findings provide substantial sequence information for the primary salivary glands of whiteflies and will be the basis for future studies on whitefly-plant interactions and virus transmission. PMID:22745728

  5. Predicting disulfide connectivity from protein sequence using multiple sequence feature vectors and secondary structure.

    PubMed

    Song, Jiangning; Yuan, Zheng; Tan, Hao; Huber, Thomas; Burrage, Kevin

    2007-12-01

    Disulfide bonds are primary covalent crosslinks between two cysteine residues in proteins; they play critical roles in stabilizing protein structures and are commonly found in extracytoplasmic or secreted proteins. In protein folding prediction, the localization of disulfide bonds can greatly reduce the search in conformational space. Therefore, there is a great need for computational methods capable of accurately predicting disulfide connectivity patterns in proteins, which could have important applications. We have developed a novel method to predict disulfide connectivity patterns from protein primary sequence, using a support vector regression (SVR) approach based on multiple sequence feature vectors and secondary structure predicted by the PSIPRED program. Averaged over proteins with two to five disulfide bridges in a well-defined non-homologous dataset with 4-fold cross-validation, our method achieves prediction accuracies of 74.4% and 77.9%, measured at the protein level and the cysteine-pair level, respectively. We assessed the effects of different sequence encoding schemes on the prediction performance of disulfide connectivity. The sequence encoding scheme based on multiple sequence feature vectors coupled with predicted secondary structure significantly improves the prediction accuracy, enabling our method to outperform most other currently available predictors. Our work provides a complementary approach to the current algorithms that should be useful in computationally assigning disulfide connectivity patterns and in annotating protein sequences generated by large-scale whole-genome projects. The prediction web server and Supplementary Material are accessible at http://foo.maths.uq.edu.au/~huber/disulfide
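To make the search-space argument concrete: if all 2n cysteines in a protein are bonded, the number of possible connectivity patterns is (2n - 1)!! = 1 × 3 × 5 × … × (2n - 1). The sketch below enumerates those candidate patterns and checks them against the closed form. It is not the SVR method described in the abstract, which scores candidates; it only shows the candidate set a predictor has to choose from. The function names are hypothetical.

```python
def connectivity_patterns(cysteines):
    """Yield every way to pair cysteine positions into disulfide bonds.

    Assumes all listed cysteines are bonded; an odd count yields nothing
    because no complete pairing exists.
    """
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for pattern in connectivity_patterns(remaining):
            yield [(first, partner)] + pattern


def pattern_count(n_bonds):
    """Closed form (2n - 1)!! = 1 * 3 * 5 * ... * (2n - 1)."""
    count = 1
    for k in range(1, 2 * n_bonds, 2):
        count *= k
    return count


for n in range(2, 6):
    print(n, pattern_count(n))  # prints 2 3, 3 15, 4 105, 5 945 (one pair per line)
```

For the two-to-five-bridge proteins in the dataset, the candidate set therefore grows from 3 to 945 patterns, which is why restricting disulfide connectivity prunes the conformational search so strongly.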

  6. Coiled-coil length: Size does matter.

    PubMed

    Surkont, Jaroslaw; Diekmann, Yoan; Ryder, Pearl V; Pereira-Leal, Jose B

    2015-12-01

    Protein evolution is governed by processes that alter not only primary sequence but also the length of proteins. Protein length may change in different ways, but insertions, deletions and duplications are the most common. An optimal protein size is a trade-off between sequence extension, which may change protein stability or lead to acquisition of a new function, and shrinkage, which decreases the metabolic cost of protein synthesis. Despite the general tendency for length conservation across orthologous proteins, the propensity to accept insertions and deletions is heterogeneous along the sequence. For example, protein regions rich in repetitive peptide motifs are well known to vary extensively in length across species. Here, we analyze length conservation of coiled-coils, domains formed by a ubiquitous, repetitive peptide motif present in all domains of life that frequently plays a structural role in the cell. We observed that, despite their repetitive nature, the length of coiled-coil domains is generally highly conserved throughout the tree of life, even when the remaining parts of the protein change, including globular domains. Length conservation is independent of primary amino acid sequence variation and represents a conservation of domain physical size. This suggests that the conservation of domain size is due to functional constraints. © 2015 Wiley Periodicals, Inc.

  7. Protein Interaction Profile Sequencing (PIP-seq).

    PubMed

    Foley, Shawn W; Gregory, Brian D

    2016-10-10

    Every eukaryotic RNA transcript undergoes extensive post-transcriptional processing from the moment of transcription up through degradation. This regulation is performed by a distinct cohort of RNA-binding proteins which recognize their target transcript by both its primary sequence and secondary structure. Here, we describe protein interaction profile sequencing (PIP-seq), a technique that uses ribonuclease-based footprinting followed by high-throughput sequencing to globally assess both protein-bound RNA sequences and RNA secondary structure. PIP-seq utilizes single- and double-stranded RNA-specific nucleases in the absence of proteins to infer RNA secondary structure. These libraries are also compared to samples that undergo nuclease digestion in the presence of proteins in order to find enriched protein-bound sequences. Combined, these four libraries provide a comprehensive, transcriptome-wide view of RNA secondary structure and RNA protein interaction sites from a single experimental technique. © 2016 by John Wiley & Sons, Inc.
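The comparison step in the record above, finding sequences enriched when digestion happens with proteins present, can be sketched as a per-position log2 ratio of library-size-normalized coverage. This is a simplified illustration, not the published PIP-seq pipeline: the per-nucleotide coverage input, the pseudocount, the fixed threshold and the function names are all assumptions, and a real analysis would apply a statistical test rather than a hard cutoff.

```python
import math


def enrichment_scores(footprint, structure, pseudocount=1.0):
    """Per-position log2 ratio of normalized coverage, footprint vs structure-only.

    footprint: read counts from nuclease digestion with proteins present.
    structure: read counts from the matched digestion with proteins removed.
    Positive scores mark positions over-represented when proteins are present.
    """
    if len(footprint) != len(structure):
        raise ValueError("coverage vectors must have equal length")
    n = len(footprint)
    # Normalize each library by its total so sequencing depth cancels out.
    fp_total = sum(footprint) + pseudocount * n
    st_total = sum(structure) + pseudocount * n
    return [math.log2(((f + pseudocount) / fp_total) / ((s + pseudocount) / st_total))
            for f, s in zip(footprint, structure)]


def enriched_positions(scores, threshold=1.0):
    """Indices whose log2 enrichment reaches the (hypothetical) threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]
```

The pseudocount keeps positions with zero reads in one library from producing infinite ratios; its value trades sensitivity at low coverage against noise.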

  8. A Linked Series of Laboratory Exercises in Molecular Biology Utilizing Bioinformatics and GFP

    ERIC Educational Resources Information Center

    Medin, Carey L.; Nolin, Katie L.

    2011-01-01

    Molecular biologists commonly use bioinformatics to map and analyze DNA and protein sequences and to align different DNA and protein sequences for comparison. Additionally, biologists can create and view 3D models of protein structures to further understand intramolecular interactions. The primary goal of this 10-week laboratory was to introduce…

  9. Nucleotide sequence of the gene for the Mr 32,000 thylakoid membrane protein from Spinacia oleracea and Nicotiana debneyi predicts a totally conserved primary translation product of Mr 38,950

    PubMed Central

    Zurawski, Gerard; Bohnert, Hans J.; Whitfeld, Paul R.; Bottomley, Warwick

    1982-01-01

    The gene for the so-called Mr 32,000 rapidly labeled photosystem II thylakoid membrane protein (here designated psbA) of spinach (Spinacia oleracea) chloroplasts is located on the chloroplast DNA in the large single-copy region immediately adjacent to one of the inverted repeat sequences. In this paper we show that the size of the mRNA for this protein is ≈ 1.25 kilobases and that the direction of transcription is towards the inverted repeat unit. The nucleotide sequence of the gene and its flanking regions is presented. The only large open reading frame in the sequence codes for a protein of Mr 38,950. The nucleotide sequence of psbA from Nicotiana debneyi also has been determined, and comparison of the sequences from the two species shows them to be highly conserved (>95% homology) throughout the entire reading frame. Conservation of the amino acid sequence is absolute, there being no changes in a total of 353 residues. This leads us to conclude that the primary translation product of psbA must be a protein of Mr 38,950. The protein is characterized by the complete absence of lysine residues and is relatively rich in hydrophobic amino acids, which tend to be clustered. Transcription of spinach psbA starts about 86 base pairs before the first ATG codon. Immediately upstream from this point there is a sequence typical of that found in E. coli promoters. An almost identical sequence occurs in the equivalent region of N. debneyi DNA. PMID:16593262
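Identifying "the only large open reading frame" amounts to scanning each reading frame for an ATG followed in frame by a stop codon and keeping the longest span. A minimal sketch, assuming the forward strand only, the standard stop codons TAA, TAG and TGA, and 0-based end-exclusive coordinates; the function names are hypothetical and a real search would also scan the reverse complement.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(dna: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the longest ATG-to-stop ORF on the forward strand.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    Returns None if no complete ORF exists.
    """
    dna = dna.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                if best is None or i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None  # look for the next ATG after this stop
    return best


def residues_encoded(orf: Tuple[int, int]) -> int:
    """Number of amino acids encoded, excluding the stop codon."""
    start, end = orf
    return (end - start) // 3 - 1
```

For example, `longest_orf("CCATGAAATTTTAGCC")` gives `(2, 14)`, a 3-residue ORF. By the same arithmetic, the 353-residue psbA product corresponds to a 1,062-nt reading frame including its stop codon.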

  10. Controlling Self-Assembly of Engineered Peptides on Graphite by Rational Mutation

    PubMed Central

    So, Christopher R.; Hayamizu, Yuhei; Yazici, Hilal; Gresswell, Carolyn; Khatayevich, Dmitriy; Tamerler, Candan; Sarikaya, Mehmet

    2012-01-01

    Self-assembly of proteins on surfaces is utilized in many fields to integrate intricate biological structures and diverse functions with engineered materials. Controlling proteins at bio-solid interfaces relies on establishing key correlations between their primary sequences and resulting spatial organizations on substrates. Protein self-assembly, however, remains an engineering challenge. As a novel approach, we demonstrate here that short dodecapeptides selected by phage display are capable of self-assembly on graphite and form long-range ordered biomolecular nanostructures. Using atomic force microscopy and contact angle studies, we identify three amino-acid domains along the primary sequence that steer peptide ordering and lead to nanostructures with uniformly displayed residues. The peptides are further engineered via simple mutations to control fundamental interfacial processes, including initial binding, surface aggregation and growth kinetics, and intermolecular interactions. Tailoring short peptides via their primary sequence offers versatile control over molecular self-assembly, resulting in well-defined surface properties essential in building engineered, chemically rich, bio-solid interfaces. PMID:22233341

  11. Adaptive compressive learning for prediction of protein-protein interactions from primary sequence.

    PubMed

    Zhang, Ya-Nan; Pan, Xiao-Yong; Huang, Yan; Shen, Hong-Bin

    2011-08-21

    Protein-protein interactions (PPIs) play an important role in biological processes. Although much effort has been devoted to the identification of novel PPIs by integrating experimental biological knowledge, many difficulties remain because of insufficient protein structural and functional information. It is therefore highly desirable to develop methods that predict PPIs based only on amino acid sequences. However, sequence-based predictors often struggle with high dimensionality, which causes over-fitting and high computational complexity, as well as with redundancy in sequential feature vectors. In this paper, a novel computational approach based on compressed sensing theory is proposed to predict yeast Saccharomyces cerevisiae PPIs from primary sequence, and it has achieved promising results. The key advantage of the proposed compressed sensing algorithm is that it can compress the original high-dimensional protein sequential feature vector into a much lower-dimensional but more condensed space, taking the sparsity of the original signal into account. What makes compressed sensing particularly attractive in protein sequence analysis is that the compressed signal can be reconstructed from far fewer measurements than are usually considered necessary under traditional Nyquist sampling theory. Experimental results demonstrate that the proposed compressed sensing method is powerful for analyzing noisy biological data and reducing redundancy in feature vectors. The proposed method represents a new strategy for dealing with high-dimensional discrete protein models and has great potential to be extended to many other complicated biological systems. Copyright © 2011 Elsevier Ltd. All rights reserved.
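Only the measurement step of compressed sensing is sketched here: a sparse, high-dimensional feature vector x is multiplied by a random matrix Φ with far fewer rows than columns, giving the compressed vector y = Φx. The abstract does not specify the feature encoding, matrix type or dimensions, so the Gaussian matrix, the 343-feature length and the 40 measurements below are assumptions, and the function names are hypothetical. Recovering x from y (typically by L1 minimization) and the downstream PPI classifier are not shown.

```python
import math
import random


def sensing_matrix(m, n, seed=0):
    """m x n random Gaussian matrix with entries drawn from N(0, 1/m)."""
    rng = random.Random(seed)
    sigma = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, sigma) for _ in range(n)] for _ in range(m)]


def compress(phi, x):
    """Measurement step y = Phi x: n features -> m measurements (m << n)."""
    if any(len(row) != len(x) for row in phi):
        raise ValueError("matrix width must equal feature length")
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


# Hypothetical sparse 343-dimensional feature vector with 3 non-zero entries.
n_features, n_measurements = 343, 40
x = [0.0] * n_features
x[5], x[100], x[300] = 1.0, -0.5, 2.0
y = compress(sensing_matrix(n_measurements, n_features), x)
print(len(y))  # 40
```

Scaling the entries by 1/sqrt(m) keeps the expected squared length of y equal to that of x, a common choice so that distances between compressed vectors remain comparable.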

  12. Correlation of fitness landscapes from three orthologous TIM barrels originates from sequence and structure constraints

    PubMed Central

    Chan, Yvonne H.; Venev, Sergey V.; Zeldovich, Konstantin B.; Matthews, C. Robert

    2017-01-01

    Sequence divergence of orthologous proteins enables adaptation to environmental stresses and promotes evolution of novel functions. Limits on evolution imposed by constraints on sequence and structure were explored using a model TIM barrel protein, indole-3-glycerol phosphate synthase (IGPS). Fitness effects of point mutations in three phylogenetically divergent IGPS proteins during adaptation to temperature stress were probed by auxotrophic complementation of yeast with prokaryotic, thermophilic IGPS. Analysis of beneficial mutations pointed to an unexpected, long-range allosteric pathway towards the active site of the protein. Significant correlations between the fitness landscapes of distant orthologues implicate both sequence and structure as primary forces in defining the TIM barrel fitness landscape and suggest that fitness landscapes can be translocated in sequence space. Exploration of fitness landscapes in the context of a protein fold provides a strategy for elucidating the sequence-structure-fitness relationships in other common motifs. PMID:28262665

  13. The primary structure of fatty-acid-binding protein from nurse shark liver. Structural and evolutionary relationship to the mammalian fatty-acid-binding protein family.

    PubMed

    Medzihradszky, K F; Gibson, B W; Kaur, S; Yu, Z H; Medzihradszky, D; Burlingame, A L; Bass, N M

    1992-02-01

    The primary structure of a fatty-acid-binding protein (FABP) isolated from the liver of the nurse shark (Ginglymostoma cirratum) was determined by high-performance tandem mass spectrometry (employing multichannel array detection) and Edman degradation. Shark liver FABP consists of 132 amino acids with an acetylated N-terminal valine. The chemical molecular mass of the intact protein determined by electrospray ionization mass spectrometry (Mr = 15124 +/- 2.5) was in good agreement with that calculated from the amino acid sequence (Mr = 15121.3). The amino acid sequence of shark liver FABP displays significantly greater similarity to the FABPs expressed in mammalian heart, peripheral nerve myelin and adipose tissue (53-61% sequence similarity) than to the FABP expressed in mammalian liver (22% similarity). Phylogenetic trees derived from the comparison of the shark liver FABP amino acid sequence with the members of the mammalian fatty-acid/retinoid-binding protein gene family indicate the initial divergence of an ancestral gene into two major subfamilies: one comprising the genes for mammalian liver FABP and gastrotropin, the other comprising the genes for mammalian cellular retinol-binding proteins I and II, cellular retinoic-acid-binding protein, myelin P2 protein, adipocyte FABP, heart FABP and shark liver FABP, the latter having diverged from the ancestral gene that ultimately gave rise to the present-day mammalian heart FABP, adipocyte FABP and myelin P2 protein sequences. The sequence for intestinal FABP from the rat could be assigned to either subfamily, depending on the approach used for phylogenetic tree construction, but clearly diverged at a relatively early evolutionary time point. Indeed, sequences proximately ancestral or closely related to mammalian intestinal FABP, liver FABP, gastrotropin and the retinoid-binding group of proteins appear to have arisen prior to the divergence of shark liver FABP and should therefore also be present in elasmobranchs. The presence in shark liver of an FABP which differs substantially in primary structure from mammalian liver FABP, while being closely related to the FABPs expressed in mammalian heart muscle, peripheral nerve myelin and adipocytes, opens a further dimension regarding the question of the existence of structure-dependent and tissue-specific specialization of FABP function in lipid metabolism.
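The check reported above, a measured intact mass agreeing with the mass calculated from the sequence, rests on simple arithmetic: sum the average residue masses, add one water for the free termini, and add any modifications such as N-terminal acetylation. A minimal sketch using commonly tabulated average residue masses; the function name is hypothetical, disulfides are ignored (each would subtract about 2.016 Da), and because the shark FABP sequence is not given here the sketch does not reproduce the reported 15121.3.

```python
# Average residue masses in Da (free amino acid minus H2O), commonly tabulated values.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528   # one H2O for the free N- and C-termini
ACETYL = 42.0367   # C2H2O added by N-terminal acetylation (average mass)


def average_mass(seq, n_acetylated=False):
    """Average molecular mass (Da) of a linear polypeptide without disulfides."""
    mass = WATER + sum(RESIDUE_MASS[aa] for aa in seq.upper())
    if n_acetylated:
        mass += ACETYL
    return mass
```

For instance, `average_mass("GG")` gives about 132.12 Da, the mass of glycylglycine. The 2.7 Da gap between 15124 and 15121.3 in the abstract lies within roughly one standard deviation of the stated ±2.5 Da measurement error.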

  14. Primary structures of ribosomal proteins from the archaebacterium Halobacterium marismortui and the eubacterium Bacillus stearothermophilus.

    PubMed

    Arndt, E; Scholzen, T; Krömer, W; Hatakeyama, T; Kimura, M

    1991-06-01

    Approximately 40 ribosomal proteins from each of Halobacterium marismortui and Bacillus stearothermophilus have been sequenced, either by direct protein sequence analysis or by DNA sequence analysis of the corresponding genes. Comparison of the amino acid sequences from the archaebacterium H. marismortui with the available ribosomal proteins from the eubacterial and eukaryotic kingdoms revealed four different groups of proteins: 24 proteins are related to both eubacterial and eukaryotic proteins; eleven proteins are exclusively related to eukaryotic counterparts; three proteins have only eubacterial relatives; and for another three proteins no counterpart could be found. In general, the halobacterial ribosomal proteins are somewhat more similar to their eukaryotic than to their eubacterial counterparts. Comparison of B. stearothermophilus proteins with their E. coli homologues showed that the proteins evolved at different rates: some are highly conserved, with 64-76% identity, while others are poorly conserved, with only 25-34% identical amino acid residues.
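Identity figures such as "64-76%" depend on the denominator chosen. A minimal sketch, assuming the two sequences are already aligned (gaps written as "-") and using only columns without a gap in either sequence as the denominator; the abstract does not state its convention, and dividing by the shorter sequence length, for example, would give different numbers. The function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Identical residues as a percentage of aligned columns without a gap in either sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)
```

For example, `percent_identity("ACDE-", "ACDF-")` returns 75.0: three of the four gap-free columns match.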

  15. SubCellProt: predicting protein subcellular localization using machine learning approaches.

    PubMed

    Garg, Prabha; Sharma, Virag; Chaudhari, Pradeep; Roy, Nilanjan

    2009-01-01

    High-throughput genome sequencing projects continue to churn out enormous amounts of raw sequence data. However, most of this raw sequence data is unannotated and, hence, not very useful. Among the various approaches to decipher the function of a protein, one is to determine its localization. Experimental approaches for proteome annotation including determination of a protein's subcellular localizations are very costly and labor intensive. Besides the available experimental methods, in silico methods present alternative approaches to accomplish this task. Here, we present two machine learning approaches for prediction of the subcellular localization of a protein from the primary sequence information. Two machine learning algorithms, k Nearest Neighbor (k-NN) and Probabilistic Neural Network (PNN) were used to classify an unknown protein into one of the 11 subcellular localizations. The final prediction is made on the basis of a consensus of the predictions made by two algorithms and a probability is assigned to it. The results indicate that the primary sequence derived features like amino acid composition, sequence order and physicochemical properties can be used to assign subcellular localization with a fair degree of accuracy. Moreover, with the enhanced accuracy of our approach and the definition of a prediction domain, this method can be used for proteome annotation in a high throughput manner. SubCellProt is available at www.databases.niper.ac.in/SubCellProt.
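One of the two classifiers named above, k-nearest-neighbor, can be sketched on the simplest primary-sequence feature, amino acid composition. SubCellProt also uses sequence-order and physicochemical features, a probabilistic neural network and a consensus over 11 classes; none of that is reproduced here. The Euclidean distance, the default k = 3, the function names and the toy training sequences are all hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """20-dimensional amino acid composition (fractions summing to 1)."""
    counts = Counter(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def knn_predict(query, training, k=3):
    """Majority label among the k training sequences nearest in composition space."""
    q = composition(query)
    ranked = sorted(
        (math.dist(q, composition(seq)), label) for seq, label in training
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy training set of (sequence, localization) pairs.
training = [
    ("KKKKRRRR", "nucleus"), ("KKRKRKRK", "nucleus"),
    ("LLLLIIII", "membrane"), ("LILILVVV", "membrane"),
]
```

With these toy data, `knn_predict("KKKRRRKK", training)` returns "nucleus", because the two K/R-rich sequences are the nearest neighbours in composition space.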

  16. The primary structures of ribosomal proteins L16, L23 and L33 from the archaebacterium Halobacterium marismortui.

    PubMed

    Hatakeyama, T; Hatakeyama, T; Kimura, M

    1988-11-21

    The complete amino acid sequences of ribosomal proteins L16, L23 and L33 from the archaebacterium Halobacterium marismortui were determined. The sequences were established by manual sequencing of peptides produced with several proteases as well as by cleavage with dilute HCl. Proteins L16, L23 and L33 consist of 119, 154 and 69 amino acid residues, and their molecular masses are 13,538, 16,812 and 7620 Da, respectively. The comparison of their sequences with those of ribosomal proteins from other organisms revealed that L23 and L33 are related to eubacterial ribosomal proteins from Escherichia coli and Bacillus stearothermophilus, while protein L16 was found to be homologous to a eukaryotic ribosomal protein from yeast. These results provide information about the special phylogenetic position of archaebacteria.

  17. Single-molecule Protein Unfolding in Solid State Nanopores

    PubMed Central

    Talaga, David S.; Li, Jiali

    2009-01-01

    We use single silicon nitride nanopores to study folded, partially folded and unfolded single proteins by measuring their excluded volumes. The DNA-calibrated translocation signals of β-lactoglobulin and histidine-containing phosphocarrier protein quantitatively match the values predicted by a simple sum of the partial volumes of the amino acids in the polypeptide segment inside the pore when translocation stalls due to the primary charge sequence. Our analysis suggests that the majority of the protein molecules were linear or looped during translocation and that the electrical forces present under physiologically relevant potentials can unfold proteins. Our results show that nanopore translocation signals are sensitive enough to distinguish the folding state of a protein and to distinguish between proteins based on the excluded volume of a local segment of the polypeptide chain that transiently stalls in the nanopore due to the primary sequence of charges. PMID:19530678

  18. System in biology leading to cell pathology: stable protein-protein interactions after covalent modifications by small molecules or in transgenic cells.

    PubMed

    Malina, Halina Z

    2011-01-19

    The physiological processes in the cell are regulated by reversible, electrostatic protein-protein interactions. Apoptosis is such a regulated process, which is critically important in tissue homeostasis and development and leads to complete disintegration of the cell. Pathological apoptosis, a process similar to apoptosis, is associated with aging and infection. The current study shows that pathological apoptosis is a process caused by the covalent interactions between the signaling proteins, and a characteristic of this pathological network is the covalent binding of calmodulin to regulatory sequences. Small molecules able to bind covalently to the amino group of lysine, histidine, arginine, or glutamine modify the regulatory sequences of the proteins. The present study analyzed the interaction of calmodulin with the BH3 sequence of Bax, and with the calmodulin-binding sequence of myristoylated alanine-rich C-kinase substrate, in the presence of xanthurenic acid in primary retinal epithelium cell cultures and murine epithelial fibroblast cell lines transformed with SV40 (wild type [WT], Bid knockout [Bid-/-], and Bax-/-/Bak-/- double knockout [DKO]). Cell death was observed to be associated with the covalent binding of calmodulin, in parallel, to the regulatory sequences of proteins. Xanthurenic acid is known to activate caspase-3 in primary cell cultures, and the results showed that this activation is also observed in WT and Bid-/- cells, but not in DKO cells. However, DKO cells were not protected against death; instead, high rates of cell death occurred by detachment. The results showed that small molecules modify the basic amino acids in the regulatory sequences of proteins, leading to covalent interactions between the modified sequences (e.g., calmodulin to calmodulin-binding sites). The formation of these polymers (aggregates) leads to an unregulated and, consequently, pathological protein network. The results suggest a mechanism for the involvement of small molecules in disease development. In the knockout cells, incorrect interactions between proteins were observed without the protein modification by small molecules, indicating the abnormality of the protein network in the transgenic system. The irreversible protein-protein interactions lead to protein aggregation and cell degeneration, which are observed in all aging-associated diseases.

  19. System in biology leading to cell pathology: stable protein-protein interactions after covalent modifications by small molecules or in transgenic cells

    PubMed Central

    2011-01-01

    Background: The physiological processes in the cell are regulated by reversible, electrostatic protein-protein interactions. Apoptosis is such a regulated process, which is critically important in tissue homeostasis and development and leads to complete disintegration of the cell. Pathological apoptosis, a process similar to apoptosis, is associated with aging and infection. The current study shows that pathological apoptosis is a process caused by the covalent interactions between the signaling proteins, and a characteristic of this pathological network is the covalent binding of calmodulin to regulatory sequences. Results: Small molecules able to bind covalently to the amino group of lysine, histidine, arginine, or glutamine modify the regulatory sequences of the proteins. The present study analyzed the interaction of calmodulin with the BH3 sequence of Bax, and with the calmodulin-binding sequence of myristoylated alanine-rich C-kinase substrate, in the presence of xanthurenic acid in primary retinal epithelium cell cultures and murine epithelial fibroblast cell lines transformed with SV40 (wild type [WT], Bid knockout [Bid-/-], and Bax-/-/Bak-/- double knockout [DKO]). Cell death was observed to be associated with the covalent binding of calmodulin, in parallel, to the regulatory sequences of proteins. Xanthurenic acid is known to activate caspase-3 in primary cell cultures, and the results showed that this activation is also observed in WT and Bid-/- cells, but not in DKO cells. However, DKO cells were not protected against death; instead, high rates of cell death occurred by detachment. Conclusions: The results showed that small molecules modify the basic amino acids in the regulatory sequences of proteins, leading to covalent interactions between the modified sequences (e.g., calmodulin to calmodulin-binding sites). The formation of these polymers (aggregates) leads to an unregulated and, consequently, pathological protein network. The results suggest a mechanism for the involvement of small molecules in disease development. In the knockout cells, incorrect interactions between proteins were observed without the protein modification by small molecules, indicating the abnormality of the protein network in the transgenic system. The irreversible protein-protein interactions lead to protein aggregation and cell degeneration, which are observed in all aging-associated diseases. PMID:21247434

  20. Coarse-grained sequences for protein folding and design.

    PubMed

    Brown, Scott; Fawzi, Nicolas J; Head-Gordon, Teresa

    2003-09-16

    We present the results of sequence design on our off-lattice minimalist model in which no specification of native-state tertiary contacts is needed. We start with a sequence that adopts a target topology and build on it through sequence mutation to produce new sequences that comprise distinct members within a target fold class. In this work, we use the alpha/beta ubiquitin fold class and design two new sequences that, when characterized through folding simulations, reproduce the differences in folding mechanism seen experimentally for proteins L and G. The primary implication of this work is that patterning of hydrophobic and hydrophilic residues is the physical origin for the success of relative contact-order descriptions of folding, and that these physics-based potentials provide a predictive connection between free energy landscapes and amino acid sequence (the original protein folding problem). We present results of the sequence mapping from the 20-letter to a three-letter code for determining a sequence that folds into the WW domain topology to illustrate future extensions to protein design.
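The mapping from a 20-letter to a three-letter code can be illustrated with a simple residue grouping. The labels below follow the hydrophobic (B), hydrophilic (L) and neutral (N) bead types of BLN-style minimalist models; the particular assignment of each amino acid to a group is an illustrative assumption, not the mapping published by the authors, and the names are hypothetical.

```python
# Illustrative grouping (an assumption, not the authors' published mapping):
# B = hydrophobic, L = hydrophilic, N = neutral bead types.
GROUPS = {"B": "ACFILMVWY", "L": "DEHKNQR", "N": "GPST"}
THREE_LETTER = {aa: code for code, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Map a 20-letter amino acid sequence onto the B/L/N alphabet."""
    return "".join(THREE_LETTER[aa] for aa in seq.upper())


def hydrophobic_positions(seq):
    """0-based indices of residues mapped to the hydrophobic B type."""
    return [i for i, code in enumerate(reduce_sequence(seq)) if code == "B"]
```

For example, `reduce_sequence("LKGA")` gives "BLNB". Note that the output letter L means "hydrophilic bead", not leucine; the reduced string carries only the hydrophobic/hydrophilic patterning that the abstract identifies as the physical origin of contact-order behavior.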

  1. Coarse-grained sequences for protein folding and design

    PubMed Central

    Brown, Scott; Fawzi, Nicolas J.; Head-Gordon, Teresa

    2003-01-01

    We present the results of sequence design on our off-lattice minimalist model, in which no specification of native-state tertiary contacts is needed. We start with a sequence that adopts a target topology and build on it through sequence mutation to produce new sequences that are distinct members of the target fold class. In this work, we use the α/β ubiquitin fold class and design two new sequences that, when characterized through folding simulations, reproduce the differences in folding mechanism seen experimentally for proteins L and G. The primary implication of this work is that the patterning of hydrophobic and hydrophilic residues is the physical origin of the success of relative contact-order descriptions of folding, and that these physics-based potentials provide a predictive connection between free energy landscapes and amino acid sequence (the original protein folding problem). To illustrate future extensions to protein design, we present results of mapping sequences from the 20-letter amino acid code to the three-letter code in order to determine a sequence that folds into the WW domain topology. PMID:12963815

  2. Improving the genome annotation of the acarbose producer Actinoplanes sp. SE50/110 by sequencing enriched 5'-ends of primary transcripts.

    PubMed

    Schwientek, Patrick; Neshat, Armin; Kalinowski, Jörn; Klein, Andreas; Rückert, Christian; Schneiker-Bekel, Susanne; Wendler, Sergej; Stoye, Jens; Pühler, Alfred

    2014-11-20

    Actinoplanes sp. SE50/110 is the producer of the alpha-glucosidase inhibitor acarbose, an economically relevant and potent drug for the treatment of type 2 diabetes mellitus. In this study, we present the detection of transcription start sites in this genome by sequencing enriched 5'-ends of primary transcripts. Altogether, 1427 putative transcription start sites were initially identified. With the help of the annotated genome sequence, 661 transcription start sites were found to lie in the leader regions of protein-coding genes, with the surprising result that roughly 20% of these genes belong to the class of leaderless transcripts. Next, conserved promoter motifs were identified for protein-coding genes with and without leader sequences. The mapped transcription start sites were finally used to improve the annotation of the Actinoplanes sp. SE50/110 genome sequence. For protein-coding genes, 41 translation start sites were corrected and 9 novel protein-coding genes were identified. In addition, 122 previously undetermined non-coding RNA (ncRNA) genes of Actinoplanes sp. SE50/110 were defined. Focusing on antisense transcription start sites located within coding genes or their leader sequences, we found that 96 of these ncRNA genes belong to the class of antisense RNA (asRNA) genes. The remaining 26 ncRNA genes lie outside known protein-coding genes. Four prominent ncRNA genes, namely the transfer-messenger RNA gene ssrA, the ribonuclease P class A RNA gene rnpB, the cobalamin riboswitch RNA gene cobRS, and the selenocysteine-specific tRNA gene selC, are presented in more detail. This study demonstrates that sequencing enriched 5'-ends of primary transcripts and identifying transcription start sites are valuable tools for advanced genome annotation of Actinoplanes sp. SE50/110 and most probably also of other bacteria. Copyright © 2014 Elsevier B.V. All rights reserved.
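Classifying a transcription start site (TSS) relative to its downstream gene comes down to measuring the 5' leader length. The sketch below assumes 1-based genome coordinates and a TSS already paired with a gene on the same strand. The `Gene` type, the cutoffs `max_leaderless` and `max_leader`, and the class labels are hypothetical choices, not the criteria used in the study.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int   # 1-based coordinate of the first base of the start codon
    strand: str  # "+" or "-"


def leader_length(tss: int, gene: Gene) -> int:
    """Length of the 5' leader between a TSS and the start codon on the same strand."""
    if gene.strand == "+":
        return gene.start - tss
    return tss - gene.start  # minus strand: coordinates decrease in the 5'->3' direction


def classify(tss: int, gene: Gene, max_leaderless: int = 0, max_leader: int = 500) -> str:
    """Hypothetical TSS classes; both cutoffs are illustrative assumptions."""
    n = leader_length(tss, gene)
    if n < 0:
        return "internal"    # TSS lies downstream of the start codon
    if n <= max_leaderless:
        return "leaderless"  # transcript begins at the start codon
    if n <= max_leader:
        return "leadered"
    return "orphan"          # too far upstream to assign to this gene


print(classify(100, Gene("geneA", 100, "+")))  # leaderless
print(classify(70, Gene("geneA", 100, "+")))   # leadered
```

Some studies allow a few nucleotides of leader before calling a transcript leaderless, which is why the cutoff is exposed as a parameter.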

  3. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wemmer, D.E.; Kumar, N.V.; Metrione, R.M.

    Toxin II from Radianthus paumotensis (RpII) has been investigated by high-resolution NMR and chemical sequencing methods. Resonance assignments have been obtained for this protein by the sequential approach. The NMR assignments could not be made consistent with the previously reported primary sequence for this protein, so chemical methods were used to determine a sequence with which the NMR data are consistent. Analysis of the 2D NOE spectra shows that the secondary structure of the protein consists of two stretches of β-sheet, probably joined into a distorted continuous sheet and connected by turns and extended loops, without any regular α-helical segments. The residues previously implicated in activity in this class of proteins, D8 and R13, occur in a loop region.

  4. Comparison of intrinsic dynamics of cytochrome p450 proteins using normal mode analysis

    PubMed Central

    Dorner, Mariah E; McMunn, Ryan D; Bartholow, Thomas G; Calhoon, Brecken E; Conlon, Michelle R; Dulli, Jessica M; Fehling, Samuel C; Fisher, Cody R; Hodgson, Shane W; Keenan, Shawn W; Kruger, Alyssa N; Mabin, Justin W; Mazula, Daniel L; Monte, Christopher A; Olthafer, Augustus; Sexton, Ashley E; Soderholm, Beatrice R; Strom, Alexander M; Hati, Sanchita

    2015-01-01

    Cytochrome P450 enzymes are hemeproteins that catalyze the monooxygenation of a wide range of structurally diverse substrates of endogenous and exogenous origin. These heme monooxygenases receive electrons from NADH/NADPH via electron transfer proteins. The cytochrome P450 enzymes, which constitute a diverse superfamily of more than 8,700 proteins, share a common tertiary fold but < 25% sequence identity. Based on their electron transfer protein partner, cytochrome P450 proteins are classified into six broad classes. Traditional methods of pro are based on the canonical paradigm that attributes a protein's function to its three-dimensional structure, which is determined by its primary structure, that is, its amino acid sequence. It is increasingly recognized that protein dynamics play an important role in molecular recognition and catalytic activity. As the mobility of a protein is an intrinsic property encoded in its primary structure, we examined whether different classes of cytochrome P450 enzymes display unique patterns of intrinsic mobility. Normal mode analysis was performed to characterize the intrinsic dynamics of five classes of cytochrome P450 proteins. The present study revealed that cytochrome P450 enzymes share a strong dynamic similarity (root mean squared inner product > 55% and Bhattacharyya coefficient > 80%), despite the low sequence identity (< 25%) and sequence similarity (< 50%) across the cytochrome P450 superfamily. Notable differences were observed in the Cα atom fluctuations of structural elements responsible for substrate binding. These differences in residue fluctuations might be crucial for substrate selectivity in these enzymes. PMID:26130403
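The two similarity measures quoted above can be sketched with NumPy. The root mean squared inner product (RMSIP) compares two sets of k orthonormal normal modes defined over the same aligned atoms. For the Bhattacharyya coefficient, the sketch uses the textbook form for two zero-mean Gaussians defined by covariance matrices. This is an assumption, since the abstract does not say which variant the study used. Both matrices must be positive definite, so in practice they would be built from a subset of non-trivial modes. Computing the modes themselves, by building and diagonalizing a Hessian, is not shown.

```python
import numpy as np


def rmsip(U: np.ndarray, V: np.ndarray) -> float:
    """Root mean squared inner product of two sets of k orthonormal modes.

    U and V have shape (3N, k); each column is one normal mode over the same
    N aligned C-alpha atoms. Returns a value between 0 and 1.
    """
    k = U.shape[1]
    overlap = U.T @ V  # (k, k) matrix of dot products u_i . v_j
    return float(np.sqrt(np.sum(overlap ** 2) / k))


def bhattacharyya_gaussian(S1: np.ndarray, S2: np.ndarray) -> float:
    """Bhattacharyya coefficient of two zero-mean Gaussians with covariances S1 and S2."""
    S = 0.5 * (S1 + S2)
    _, logdet = np.linalg.slogdet(S)
    _, logdet1 = np.linalg.slogdet(S1)
    _, logdet2 = np.linalg.slogdet(S2)
    distance = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))  # Bhattacharyya distance
    return float(np.exp(-distance))


# Toy check with random orthonormal vectors standing in for normal modes.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(30, 10)))
print(round(rmsip(Q[:, :5], Q[:, :5]), 6))  # 1.0
```

An RMSIP of 1 means the two mode subspaces coincide, and 0 means they are orthogonal. This is why the reported values above 55% indicate substantial shared motion despite low sequence identity.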

  5. Cloning and expression of cDNA coding for bouganin.

    PubMed

    den Hartog, Marcel T; Lubelli, Chiara; Boon, Louis; Heerkens, Sijmie; Ortiz Buijsse, Antonio P; de Boer, Mark; Stirpe, Fiorenzo

    2002-03-01

    Bouganin is a ribosome-inactivating protein that was recently isolated from Bougainvillea spectabilis Willd. In this work, the cloning and expression of the cDNA encoding bouganin are described. The amino acid sequence deduced from the cDNA agreed with the primary sequence data obtained by amino acid sequencing of the native protein. Bouganin is synthesized as a pro-peptide of 305 amino acids, the first 26 of which act as a leader signal, while the 29 C-terminal amino acids are cleaved off during processing of the molecule. The mature protein consists of 250 amino acids. Using the cDNA sequence encoding the mature protein of 250 amino acids, a recombinant protein was expressed, purified and characterized. The recombinant molecule had activity in a cell-free protein synthesis assay and toxicity on living cells similar to those of isolated native bouganin.
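The processing arithmetic (305 - 26 - 29 = 250 residues) can be expressed as a sequence slice. The sketch uses a placeholder sequence rather than the real bouganin sequence, and `mature_region` is a hypothetical helper.

```python
PRO_LENGTH = 305  # residues in the bouganin pro-peptide (from the abstract)
LEADER = 26       # N-terminal leader signal
C_TERMINAL = 29   # C-terminal residues removed during processing


def mature_region(pro_seq: str, leader: int = LEADER, c_term: int = C_TERMINAL) -> str:
    """Drop an N-terminal leader and a C-terminal extension from a pro-peptide sequence."""
    if leader + c_term >= len(pro_seq):
        raise ValueError("processing would remove the entire sequence")
    return pro_seq[leader:len(pro_seq) - c_term]


mature_len = PRO_LENGTH - LEADER - C_TERMINAL
print(mature_len)  # 250
print(len(mature_region("X" * PRO_LENGTH)))  # 250 (placeholder sequence, not bouganin)
```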

  6. Hyperdiversity of Genes Encoding Integral Light-Harvesting Proteins in the Dinoflagellate Symbiodinium sp

    PubMed Central

    Boldt, Lynda; Yellowlees, David; Leggat, William

    2012-01-01

    The superfamily of light-harvesting complex (LHC) proteins comprises proteins with diverse functions in light harvesting and photoprotection. LHC proteins bind chlorophyll (Chl) and carotenoids and include a family of LHCs that bind Chl a and c. Dinophytes (dinoflagellates) are predominantly Chl c-binding algal taxa, bind peridinin or fucoxanthin as the primary carotenoid, and can possess a number of LHC subfamilies. Here we report 11 LHC sequences for the chlorophyll a-chlorophyll c2-peridinin protein complex (acpPC) subfamily isolated from Symbiodinium sp. C3, an ecologically important peridinin-binding dinoflagellate taxon. Phylogenetic analysis of these proteins suggests that the acpPC subfamily forms at least three clades within the Chl a/c-binding LHC family: Clade 1 clusters with rhodophyte, cryptophyte and peridinin-binding dinoflagellate sequences; Clade 2 with peridinin-binding dinoflagellate sequences only; and Clade 3 with heterokontophyte and with fucoxanthin- and peridinin-binding dinoflagellate sequences. PMID:23112815

  7. Predicting Human Protein Subcellular Locations by the Ensemble of Multiple Predictors via Protein-Protein Interaction Network with Edge Clustering Coefficients

    PubMed Central

    Du, Pufeng; Wang, Lusheng

    2014-01-01

    One of the fundamental tasks in biology is to identify the functions of all proteins in order to reveal the primary machinery of a cell. Knowledge of the subcellular locations of proteins provides key hints about their functions and helps in understanding the intricate pathways that regulate biological processes at the cellular level. Protein subcellular location prediction has been studied extensively over the past two decades, and many methods have been developed based on protein primary sequences as well as on the protein-protein interaction network. In this paper, we propose to use the protein-protein interaction network as an infrastructure to integrate existing sequence-based predictors. When predicting the subcellular locations of a given protein, we consider not only the protein itself but also all of its interacting partners. Unlike existing methods, our method requires neither comprehensive knowledge of the protein-protein interaction network nor experimentally annotated subcellular locations for most proteins in the network. In addition, our method can be used as a framework to integrate multiple predictors. Our method achieved an absolute-true rate of 56% on the human proteome, which is higher than that of state-of-the-art methods. PMID:24466278
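The sketch below shows two ingredients in simplified form. The edge clustering coefficient follows the common Radicchi-style definition: the number of triangles through an edge plus one, divided by the smaller of the two endpoint degrees minus one. The combination step, which adds partners' predictions weighted by that coefficient, is a hypothetical scheme and not necessarily the authors' method. The absolute-true rate counts a protein as correct only when its predicted set of locations exactly equals its true set.

```python
from collections import defaultdict


def edge_clustering_coefficient(adj: dict[str, set[str]], u: str, v: str) -> float:
    """(triangles through edge u-v + 1) / min(deg(u) - 1, deg(v) - 1)."""
    z = len(adj[u] & adj[v])  # each common neighbour closes a triangle on edge (u, v)
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return (z + 1) / denom if denom > 0 else 0.0  # assumption: 0.0 when undefined


def combined_scores(protein: str, adj: dict[str, set[str]],
                    predictions: dict[str, dict[str, float]]) -> dict[str, float]:
    """Hypothetical integration: own scores plus ECC-weighted scores of interaction partners."""
    scores: dict[str, float] = defaultdict(float)
    for loc, s in predictions.get(protein, {}).items():
        scores[loc] += s
    for partner in adj[protein]:
        weight = edge_clustering_coefficient(adj, protein, partner)
        for loc, s in predictions.get(partner, {}).items():
            scores[loc] += weight * s
    return dict(scores)


def absolute_true_rate(predicted: list[set[str]], actual: list[set[str]]) -> float:
    """Fraction of proteins whose predicted location set exactly equals the true set."""
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)
```

The exact-match requirement makes the absolute-true rate a strict measure for multi-location proteins. A partially correct prediction such as {cytoplasm, nucleus} for a purely nuclear protein scores zero.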

  8. Direct Calculation of Protein Fitness Landscapes through Computational Protein Design

    PubMed Central

    Au, Loretta; Green, David F.

    2016-01-01

    Naturally selected or experimentally derived amino acid sequences are often the basis for understanding how protein three-dimensional conformation and function are determined by primary structure. Such sequences for a protein family, however, comprise only a small fraction of all possible variants and therefore represent the fitness landscape with limited scope. Explicitly sampling and characterizing alternative, unexplored protein sequences would directly identify fundamental reasons for sequence robustness (or variability), and we demonstrate that computational methods offer an efficient, large-scale mechanism toward this end. The dead-end elimination and A* search algorithms were used here to find all low-energy single-mutant variants of a G-protein heterotrimer, together with their corresponding structures, in order to measure changes in structural stability and binding interactions and thereby define a protein fitness landscape. We established that the results of these algorithms are consistent with known biophysical and evolutionary trends for amino acid substitutions, and could thus recapitulate known protein side-chain interactions and predict novel ones. PMID:26745411
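Dead-end elimination prunes rotamers that provably cannot belong to the global minimum-energy conformation. The following is a minimal sketch of one pass of the original (Desmet) criterion. Rotamer r at position i is removed if some alternative t at the same position is still better when r meets its most favourable partners and t meets its least favourable ones. The data layout, with `self_E` lists and a `pair_E` dictionary keyed by position pairs, is an assumption for illustration. The A* search over the surviving rotamers is not shown.

```python
def pair_energy(pair_E, i, r, j, s):
    """E(i_r, j_s); pair_E stores each position pair once, keyed (i, j) with i < j."""
    return pair_E[(i, j)][r][s] if i < j else pair_E[(j, i)][s][r]


def dee_prune(self_E, pair_E):
    """One pass of the original dead-end elimination criterion.

    self_E[i][r] is the self energy of rotamer r at position i.
    Returns the set of surviving rotamer indices for each position.
    """
    n = len(self_E)
    alive = [set(range(len(self_E[i]))) for i in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in sorted(alive[i]):
            # Lowest energy r could possibly reach.
            best_r = self_E[i][r] + sum(
                min(pair_energy(pair_E, i, r, j, s) for s in alive[j]) for j in others)
            for t in sorted(alive[i]):
                if t == r:
                    continue
                # Highest energy t could possibly reach.
                worst_t = self_E[i][t] + sum(
                    max(pair_energy(pair_E, i, t, j, s) for s in alive[j]) for j in others)
                if best_r > worst_t:  # r is dominated by t: dead end
                    alive[i].discard(r)
                    break
    return alive


self_E = [[0.0, 10.0], [0.0, 0.0]]
pair_E = {(0, 1): [[0.0, 1.0], [0.0, 1.0]]}
print(dee_prune(self_E, pair_E))  # [{0}, {0}]
```

Because the inequality is strict, the rotamer of the global minimum is never removed, and the pass can be repeated until no further rotamers are eliminated.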

  9. Chemical property based sequence characterization of PpcA and its homolog proteins PpcB-E: A mathematical approach

    PubMed Central

    Pal Choudhury, Pabitra

    2017-01-01

    Periplasmic c7-type cytochrome A (PpcA) is found in Geobacter sulfurreducens together with four homologs (PpcB-E). Crystal structures show that PpcA can bind deoxycholate (DXCA), whereas its homologs do not. However, the reason for this difference has yet to be established with certainty from primary protein sequence information. This study is based primarily on analysis of primary protein sequences through the chemical properties of their constituent amino acids. First, we determine chemical-group-specific scores for the amino acids. Along with this, we develop a new methodology for phylogenetic analysis based on the chemical-group dissimilarities of amino acids. This methodology is applied to the cytochrome c7 family members to pinpoint how a particular sequence differs from the others. Second, we build a graph-theoretic model from the amino acid sequences, which is also applied to the cytochrome c7 family members, and highlight some unique characteristics and their domains. Third, we search for unique patterns, as subsequences, that are common to the group or specific to an individual member. In all cases, we are able to show distinct features of PpcA that set it apart from its homologs and may account for its binding of deoxycholate. Similarly, some notable features of the structurally dissimilar protein PpcD relative to the other homologs are brought out. Further, because the five members of the cytochrome family are homologous proteins, they are expected to share some significant common features, which are also enumerated in this study. PMID:28362850
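A chemical-group representation of a sequence can be sketched as a composition profile over residue classes, compared using a distance. The seven groups and the Euclidean distance below are hypothetical choices made for illustration. The paper's own chemical group scores and dissimilarity measure may differ.

```python
import math

# Hypothetical chemical groups covering all 20 amino acids.
GROUPS = {
    "aliphatic": "AGILPV",
    "aromatic": "FWY",
    "sulfur": "CM",
    "hydroxyl": "ST",
    "basic": "RHK",
    "acidic": "DE",
    "amide": "NQ",
}
GROUP_OF = {aa: group for group, members in GROUPS.items() for aa in members}


def group_profile(seq: str) -> list[float]:
    """Fraction of residues in each chemical group, in the order of GROUPS."""
    counts = dict.fromkeys(GROUPS, 0)
    for aa in seq.upper():
        counts[GROUP_OF[aa]] += 1
    return [counts[g] / len(seq) for g in GROUPS]


def group_dissimilarity(a: str, b: str) -> float:
    """Euclidean distance between the chemical-group profiles of two sequences."""
    return math.dist(group_profile(a), group_profile(b))
```

A pairwise matrix of such dissimilarities over PpcA-E could then feed a standard distance-based tree method such as neighbour joining.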

  10. Topology and cellular localization of the small hydrophobic protein of avian metapneumovirus

    USDA-ARS's Scientific Manuscript database

    The small hydrophobic protein (SH) is a type II integral membrane protein that is packaged into virions and is only present in certain paramyxoviruses including metapneumovirus. In addition to a highly divergent primary sequence, SH proteins vary significantly in size among the different viruses. Hu...

  11. Designing pH induced fold switch in proteins

    NASA Astrophysics Data System (ADS)

    Baruah, Anupaul; Biswas, Parbati

    2015-05-01

    This work investigates the computational design of a pH-induced protein fold switch using a self-consistent mean-field approach, by identifying the ensemble-averaged characteristics of sequences that encode a fold switch. The primary challenge of balancing the alternative sets of interactions present in the two target structures is overcome by simultaneously optimizing two foldability criteria, one for each target structure. The change in pH is modeled by altering the residual charge on the amino acids. The energy landscape of the fold-switch protein is found to be double-funneled. The fold-switch sequences stabilize the interactions of sites with similar relative surface accessibility in both target structures. Fold-switch sequences have low sequence complexity and hence lower sequence entropy. The pH-induced fold switch is mediated by attractive electrostatic interactions rather than by hydrophobic-hydrophobic contacts. This study may provide valuable insights for the design of fold-switch proteins.
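Low sequence complexity can be illustrated with the Shannon entropy of a sequence's residue composition. This is a simple proxy chosen for the sketch. The mean-field sequence entropy in the study is defined over an ensemble of designed sequences and is not reproduced here.

```python
import math
from collections import Counter


def composition_entropy(seq: str) -> float:
    """Shannon entropy (bits) of the residue composition of a sequence."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


print(composition_entropy("EKEKEKEK"))              # 1.0: two residue types, equal shares
print(round(composition_entropy("ACDEFGHIKLMNPQRSTVWY"), 3))  # 4.322, the 20-letter maximum
```

A sequence built from few residue types sits near the low end of this scale, while one using all 20 amino acids equally reaches log2(20) bits.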

  12. Isolation and determination of the primary structure of a lectin protein from the serum of the American alligator (Alligator mississippiensis).

    PubMed

    Darville, Lancia N F; Merchant, Mark E; Maccha, Venkata; Siddavarapu, Vivekananda Reddy; Hasan, Azeem; Murray, Kermit K

    2012-02-01

    Mass spectrometry in conjunction with de novo sequencing was used to determine the amino acid sequence of a 35 kDa mannose-binding lectin protein isolated from the serum of the American alligator. The N-terminal sequence of the protein was determined by Edman degradation, and enzymatic digestion with different proteases was used to generate peptide fragments for analysis by liquid chromatography tandem mass spectrometry (LC-MS/MS). Separate analysis of the digests from multiple enzymes enhanced the protein sequence coverage. De novo sequencing was accomplished using MASCOT Distiller and PEAKS software, and the sequences were searched against the NCBI database using MASCOT and BLAST to identify homologous peptides. MS analysis of the intact protein indicated that it is present primarily as a monomer and dimer in vitro. The isolated 35 kDa protein was ~98% sequenced, found to have 313 amino acids and nine cysteine residues, and identified as an alligator lectin. The alligator lectin sequence was aligned with other lectin sequences using DIALIGN and ClustalW software and was found to exhibit 58% and 59% similarity to human and mouse intelectin-1, respectively. The alligator lectin exhibited strong binding affinity toward mannan and mannose compared with the other carbohydrates tested. Copyright © 2011 Elsevier Inc. All rights reserved.
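Sequence coverage from multiple digests can be sketched as the fraction of residues covered by at least one identified peptide, with peptides from all enzymes pooled. The sketch assumes exact string matches between peptides and the protein sequence. Real pipelines also handle isobaric residues (such as Leu/Ile) and modifications, which this sketch ignores.

```python
def sequence_coverage(protein: str, peptides: list[str]) -> float:
    """Fraction of residues covered by at least one peptide (all occurrences counted)."""
    covered = [False] * len(protein)
    for pep in peptides:
        if not pep:
            continue
        start = protein.find(pep)
        while start != -1:
            for k in range(start, start + len(pep)):
                covered[k] = True
            start = protein.find(pep, start + 1)
    return sum(covered) / len(protein)


# Peptides from two hypothetical digests pooled together (toy sequence).
print(sequence_coverage("MKTAYIAKQR", ["MKTA", "AKQR"]))  # 0.8
```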

  13. Abseq: Ultrahigh-throughput single cell protein profiling with droplet microfluidic barcoding.

    PubMed

    Shahi, Payam; Kim, Samuel C; Haliburton, John R; Gartner, Zev J; Abate, Adam R

    2017-03-14

    Proteins are the primary effectors of cellular function, including cellular metabolism, structural dynamics, and information processing. However, quantitative characterization of proteins at the single-cell level is challenging due to the tiny amount of protein available. Here, we present Abseq, a method to detect and quantitate proteins in single cells at ultrahigh throughput. Like flow and mass cytometry, Abseq uses specific antibodies to detect epitopes of interest; however, unlike these methods, antibodies are labeled with sequence tags that can be read out with microfluidic barcoding and DNA sequencing. We demonstrate this novel approach by characterizing surface proteins of different cell types at the single-cell level and distinguishing between the cells by their protein expression profiles. DNA-tagged antibodies provide multiple advantages for profiling proteins in single cells, including the ability to amplify low-abundance tags to make them detectable with sequencing, to use molecular indices for quantitative results, and essentially limitless multiplexing.
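Quantification with molecular indices can be sketched as counting distinct unique molecular identifiers (UMIs) per cell barcode and antibody tag. This way, PCR duplicates of one tagged molecule are counted only once. The input format below is an assumption, and barcode and UMI error correction are omitted.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct UMIs per (cell barcode, antibody tag).

    `reads` yields (cell_barcode, antibody_tag, umi) tuples; reads that share all
    three fields are treated as PCR copies of a single tagged antibody molecule.
    """
    umis = defaultdict(set)
    for cell, tag, umi in reads:
        umis[(cell, tag)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}


reads = [("c1", "CD3", "AAA"), ("c1", "CD3", "AAA"), ("c1", "CD3", "CCC"), ("c2", "CD19", "GGG")]
print(count_molecules(reads))  # {('c1', 'CD3'): 2, ('c2', 'CD19'): 1}
```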

  14. Abseq: Ultrahigh-throughput single cell protein profiling with droplet microfluidic barcoding

    NASA Astrophysics Data System (ADS)

    Shahi, Payam; Kim, Samuel C.; Haliburton, John R.; Gartner, Zev J.; Abate, Adam R.

    2017-03-01

    Proteins are the primary effectors of cellular function, including cellular metabolism, structural dynamics, and information processing. However, quantitative characterization of proteins at the single-cell level is challenging due to the tiny amount of protein available. Here, we present Abseq, a method to detect and quantitate proteins in single cells at ultrahigh throughput. Like flow and mass cytometry, Abseq uses specific antibodies to detect epitopes of interest; however, unlike these methods, antibodies are labeled with sequence tags that can be read out with microfluidic barcoding and DNA sequencing. We demonstrate this novel approach by characterizing surface proteins of different cell types at the single-cell level and distinguishing between the cells by their protein expression profiles. DNA-tagged antibodies provide multiple advantages for profiling proteins in single cells, including the ability to amplify low-abundance tags to make them detectable with sequencing, to use molecular indices for quantitative results, and essentially limitless multiplexing.

  15. Abseq: Ultrahigh-throughput single cell protein profiling with droplet microfluidic barcoding

    PubMed Central

    Shahi, Payam; Kim, Samuel C.; Haliburton, John R.; Gartner, Zev J.; Abate, Adam R.

    2017-01-01

    Proteins are the primary effectors of cellular function, including cellular metabolism, structural dynamics, and information processing. However, quantitative characterization of proteins at the single-cell level is challenging due to the tiny amount of protein available. Here, we present Abseq, a method to detect and quantitate proteins in single cells at ultrahigh throughput. Like flow and mass cytometry, Abseq uses specific antibodies to detect epitopes of interest; however, unlike these methods, antibodies are labeled with sequence tags that can be read out with microfluidic barcoding and DNA sequencing. We demonstrate this novel approach by characterizing surface proteins of different cell types at the single-cell level and distinguishing between the cells by their protein expression profiles. DNA-tagged antibodies provide multiple advantages for profiling proteins in single cells, including the ability to amplify low-abundance tags to make them detectable with sequencing, to use molecular indices for quantitative results, and essentially limitless multiplexing. PMID:28290550

  16. Signal peptide discrimination and cleavage site identification using SVM and NN.

    PubMed

    Kazemian, H B; Yusuf, S A; White, K

    2014-02-01

    About 15% of all proteins in a genome contain an N-terminal signal peptide (SP) sequence that targets the protein to intracellular secretory pathways. Once the protein is correctly targeted in the cell, the SP is cleaved, releasing the mature protein. Accurate prediction of these short amino acid SP chains is crucial for modelling the topology of membrane proteins, since SP sequences can be confused with transmembrane domains owing to their similar composition of hydrophobic amino acids. This paper presents a cascaded Support Vector Machine (SVM)-Neural Network (NN) classification methodology for SP discrimination and cleavage site identification. The proposed method uses a two-phase classification approach, with an SVM as the primary classifier to discriminate SP from non-SP sequences and NNs to predict the most suitable cleavage site candidates. In phase one, SVM classification uses hydrophobic propensities, extracted with a symmetric sliding window over the amino acid sequence, as the primary feature vector for discriminating SP from non-SP. In phase two, NN classification uses asymmetric sliding window sequence analysis to predict the cleavage site. The proposed SVM-NN method was tested using UniProt non-redundant datasets of eukaryotic and prokaryotic proteins with SP and non-SP N-termini. Computer simulation results demonstrate an overall accuracy of 0.90 for SP and non-SP discrimination based on Matthews Correlation Coefficient (MCC) tests using the SVM. For SP cleavage site prediction, the overall accuracy is 91.5% based on cross-validation tests using the novel SVM-NN model. © 2013 Published by Elsevier Ltd.
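Two numeric ingredients of this approach can be sketched in Python. The first is a symmetric sliding-window hydrophobicity profile used as a feature vector. The second is the Matthews correlation coefficient used for evaluation. The Kyte-Doolittle scale and the window half-width are assumptions, since the abstract does not name the propensity scale. The SVM and NN stages themselves are not shown.

```python
import math

# Kyte-Doolittle hydropathy scale (assumed here; the paper's scale is not stated).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def window_hydropathy(seq: str, half_width: int = 3) -> list[float]:
    """Mean hydropathy in a symmetric window centred on each residue (truncated at the ends)."""
    values = [KD[aa] for aa in seq.upper()]
    profile = []
    for i in range(len(values)):
        lo, hi = max(0, i - half_width), min(len(values), i + half_width + 1)
        profile.append(sum(values[lo:hi]) / (hi - lo))
    return profile


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient in [-1, 1]; 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

A fixed-length N-terminal slice of the profile (for example, the first 30 positions) could serve as the input vector to a classifier. Unlike plain accuracy, MCC stays informative when SP and non-SP classes are imbalanced.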

  17. Structure and Function of Lipopolysaccharide Binding Protein

    NASA Astrophysics Data System (ADS)

    Schumann, Ralf R.; Leong, Steven R.; Flaggs, Gail W.; Gray, Patrick W.; Wright, Samuel D.; Mathison, John C.; Tobias, Peter S.; Ulevitch, Richard J.

    1990-09-01

    The primary structure of lipopolysaccharide binding protein (LBP), a trace plasma protein that binds to the lipid A moiety of bacterial lipopolysaccharides (LPSs), was deduced by sequencing cloned complementary DNA. LBP shares sequence identity with another LPS binding protein found in granulocytes, bactericidal/permeability-increasing protein, and with cholesterol ester transport protein of the plasma. LBP may control the response to LPS under physiologic conditions by forming high-affinity complexes with LPS that bind to monocytes and macrophages, which then secrete tumor necrosis factor. The identification of this pathway for LPS-induced monocyte stimulation may aid in the development of treatments for diseases in which Gram-negative sepsis or endotoxemia are involved.

  18. An approach to large scale identification of non-obvious structural similarities between proteins

    PubMed Central

    Cherkasov, Artem; Jones, Steven JM

    2004-01-01

    Background A new sequence-independent bioinformatics approach allowing a genome-wide search for proteins with similar three-dimensional structures has been developed. By utilizing the numerical output of sequence threading, it establishes putative non-obvious structural similarities between proteins. When applied to a test set of proteins with known three-dimensional structures, the approach was able to recognize structurally similar proteins with high accuracy. Results The method has been developed to identify pathogenic proteins with low sequence identity but high structural similarity to host analogues. Such protein structure relationships are hypothesized to arise through convergent evolution or through ancient horizontal gene transfer events that are now undetectable by current sequence alignment techniques. Pathogen proteins that could mimic or interfere with host activities would represent candidate virulence factors. The approach utilizes the numerical outputs of sequence-structure threading: it identifies potential structural similarity between a pair of proteins by correlating the threading scores of the two corresponding primary sequences against a library of standard folds. This approach achieved up to 64% sensitivity and 99.9% specificity in distinguishing protein pairs with high structural similarity. Conclusion Preliminary results obtained by comparing the genomes of Homo sapiens and several strains of Chlamydia trachomatis have demonstrated the potential usefulness of the method for identifying bacterial proteins with known or potential roles in virulence. PMID:15147578
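Correlating threading-score profiles can be sketched as follows. Each protein is represented by its vector of threading scores against the same fold library, and two proteins are compared by the Pearson correlation of those vectors. Pearson correlation is an assumption here, because the abstract does not name the statistic. The threading step itself is not shown.

```python
import numpy as np


def profile_similarity(scores_a, scores_b) -> float:
    """Pearson correlation between two threading-score profiles.

    scores_a[k] and scores_b[k] are the threading scores of two query sequences
    against the same k-th fold of a shared fold library.
    """
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    return float(np.corrcoef(a, b)[0, 1])
```

Pairs whose profiles correlate above a chosen threshold would be flagged as putative structural relatives even when their sequences align poorly. The threshold would have to be calibrated on proteins of known structure.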

  19. Sequence and structural implications of a bovine corneal keratan sulfate proteoglycan core protein. Protein 37B represents bovine lumican and proteins 37A and 25 are unique

    NASA Technical Reports Server (NTRS)

    Funderburgh, J. L.; Funderburgh, M. L.; Brown, S. J.; Vergnes, J. P.; Hassell, J. R.; Mann, M. M.; Conrad, G. W.; Spooner, B. S. (Principal Investigator)

    1993-01-01

    Amino acid sequences from tryptic peptides of three different bovine corneal keratan sulfate proteoglycan (KSPG) core proteins (designated 37A, 37B, and 25) showed similarities to the sequence of the chicken KSPG core protein lumican. Bovine lumican cDNA was isolated from a bovine corneal expression library by screening with chicken lumican cDNA. The bovine cDNA codes for a 342-amino-acid protein (Mr 38,712) containing amino acid sequences identified in the 37B KSPG core protein. Bovine lumican is 68% identical to chicken lumican, and 83% identical when the N-terminal 40 amino acids are excluded. The locations of the 6 cysteine residues and 4 consensus N-glycosylation sites in the bovine sequence were identical to those in chicken lumican. Bovine lumican had about 50% identity to bovine fibromodulin and 20% identity to bovine decorin and biglycan. About two-thirds of the lumican protein consists of a series of 10 amino acid leucine-rich repeats that occur in regions of calculated high beta-hydrophobic moment, suggesting that the leucine-rich repeats contribute to beta-sheet formation in these proteins. Sequences obtained from the 37A and 25 core proteins were absent from bovine lumican, thus predicting a unique primary structure and a separate mRNA for each of the three bovine KSPG core proteins.
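Consensus N-glycosylation sites are conventionally located by the sequon N-X-S/T, where X is any residue except proline. The sketch scans for sequons with a regular expression, using a lookahead so that overlapping sites are reported. It also computes a simple percent identity over gap-free aligned columns. Percent identity has several definitions, so this one is an assumption.

```python
import re

# N-X-S/T with X any residue except proline; the lookahead also reports overlapping sites.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def nglyc_sites(seq: str) -> list[int]:
    """1-based positions of Asn residues that start an N-X-S/T consensus sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


def percent_identity(a: str, b: str) -> float:
    """Identity over aligned columns, skipping columns where either sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


print(nglyc_sites("ANASNPTNNST"))  # [2, 8, 9]; N5 is skipped because X is proline
```

Comparing sequon positions between two aligned orthologs, as done above for bovine and chicken lumican, requires mapping each list through the alignment first.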

  20. Identification and sequence analyses of novel lipase encoding novel thermophilic bacilli isolated from Armenian geothermal springs.

    PubMed

    Shahinyan, Grigor; Margaryan, Armine; Panosyan, Hovik; Trchounian, Armen

    2017-05-02

    Among the huge diversity of thermophilic bacteria, mainly bacilli have been reported as active producers of thermostable lipases. Geothermal springs serve as the main source for isolating thermostable lipase-producing bacilli. Thermostable lipolytic enzymes, which function under harsh conditions, have promising applications in the processing of organic chemicals, detergent formulation, synthesis of biosurfactants, pharmaceutical processing, etc. To study the distribution of lipase-producing thermophilic bacilli and the primary structures of their lipase proteins, three lipase producers from different genera were isolated from mesothermal (27.5-70 °C) springs on the territory of Armenia and Nagorno Karabakh. Based on phenotypic characteristics and 16S rRNA gene sequencing, the isolates were identified as Geobacillus sp., Bacillus licheniformis and Anoxibacillus flavithermus strains. The lipase genes of the isolates were sequenced using initially designed primer sets. Multiple alignments generated from the primary structures of these lipase proteins and of annotated lipase protein sequences, together with analysis of conserved regions and amino acid composition, showed the similarity (98-99%) of the lipases to true lipases (family I) and to the GDSL esterase family (family II). A conserved sequence block that determines thermostability was identified in the multiple alignments of the lipase proteins. The results shed light on the distribution of lipase-producing bacilli in geothermal springs in Armenia and Nagorno Karabakh. The newly isolated bacilli strains could be a prospective source of thermostable lipases and their genes.
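A conserved block in a multiple alignment can be sketched as a run of columns in which all sequences carry the same residue. The strict-identity criterion and the `min_len` threshold are assumptions for illustration. The abstract does not state how the thermostability block was defined, and the example rows are toy sequences.

```python
def conserved_blocks(alignment: list[str], min_len: int = 4) -> list[tuple[int, int]]:
    """Runs of alignment columns where every sequence has the same residue (no gaps).

    Returns 0-based half-open (start, end) column ranges of length >= min_len.
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("aligned sequences must have equal length")
    conserved = [
        len({row[c] for row in alignment}) == 1 and alignment[0][c] != "-"
        for c in range(width)
    ]
    blocks, start = [], None
    for c, ok in enumerate(conserved + [False]):  # sentinel closes a trailing run
        if ok and start is None:
            start = c
        elif not ok and start is not None:
            if c - start >= min_len:
                blocks.append((start, c))
            start = None
    return blocks


toy = ["MAGHSMGG-K", "MTGHSMGGAK", "MSGHSMGGAR"]
print(conserved_blocks(toy))  # [(2, 8)]
```

A more tolerant variant would score each column by the frequency of its most common residue and accept columns above a cutoff, rather than requiring strict identity.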

  1. SSMART: Sequence-structure motif identification for RNA-binding proteins.

    PubMed

    Munteanu, Alina; Mukherjee, Neelanjan; Ohler, Uwe

    2018-06-11

    RNA-binding proteins (RBPs) regulate every aspect of RNA metabolism and function. Hundreds of RBPs are encoded in eukaryotic genomes, and each recognizes its RNA targets through a specific mixture of RNA sequence and structure properties. For most RBPs, however, only a primary sequence motif has been determined, while the structure of the binding sites is uncharacterized. We developed SSMART, an RNA motif finder that simultaneously models the primary sequence and the structural properties of the RNA target sites. The sequence-structure motifs are represented as consensus strings over a degenerate alphabet, extending the IUPAC codes for nucleotides to account for secondary structure preferences. Evaluation on synthetic data showed that SSMART is able to recover both sequence and structure motifs implanted into 3'UTR-like sequences, for various degrees of structured/unstructured binding sites. In addition, we successfully applied SSMART to high-throughput in vivo and in vitro data, showing that it not only recovers the known sequence motif but also gives insight into the structural preferences of the RBP. Availability: SSMART is freely available at https://ohlerlab.mdc-berlin.de/software/SSMART_137/. Supplementary data are available at Bioinformatics online.
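Matching a degenerate IUPAC consensus string against an RNA sequence can be sketched as below. Only the standard nucleotide codes (with U in place of T) are covered. SSMART's extension of the alphabet to encode secondary structure preferences is not reproduced, and `motif_matches` is a hypothetical helper.

```python
# Standard IUPAC nucleotide codes, written for RNA (U instead of T).
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}


def motif_matches(motif: str, seq: str) -> list[int]:
    """0-based start positions where a degenerate IUPAC motif matches an RNA sequence."""
    motif, seq = motif.upper(), seq.upper().replace("T", "U")
    allowed = [set(IUPAC_RNA[ch]) for ch in motif]
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        if all(seq[i + k] in allowed[k] for k in range(len(motif))):
            hits.append(i)
    return hits


print(motif_matches("URU", "UAUGU"))  # [0, 2]: R matches both A and G
```

A sequence-structure alphabet would attach a structural state to each position in addition to the allowed nucleotides, so a match would also require a compatible predicted pairing status.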

  2. MIPS: analysis and annotation of proteins from whole genomes.

    PubMed

    Mewes, H W; Amid, C; Arnold, R; Frishman, D; Güldener, U; Mannhaupt, G; Münsterkötter, M; Pagel, P; Strack, N; Stümpflen, V; Warfsmann, J; Ruepp, A

    2004-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is the systematic organization of sequence-related attributes gathered by a variety of algorithms, together with primary experimental data and information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein-protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes), are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects, can be accessed through the MIPS web server (http://mips.gsf.de).

  3. Predicting protein crystallization propensity from protein sequence

    PubMed Central

    2011-01-01

    The high-throughput structure determination pipelines developed by structural genomics programs offer a unique opportunity for data mining. One important question is how protein properties derived from a primary sequence correlate with the protein’s propensity to yield X-ray quality crystals (crystallizability) and 3D X-ray structures. A set of protein properties was computed for over 1,300 proteins that expressed well but were insoluble, and for ~720 unique proteins that resulted in X-ray structures. The correlation of the protein’s iso-electric point and grand average of hydropathy (GRAVY) with crystallizability was analyzed for full-length and domain constructs of protein targets. In a second step, several additional properties that can be calculated from the protein sequence were added and evaluated. Using statistical analyses, we identified a set of attributes correlating with a protein’s propensity to crystallize and implemented a Support Vector Machine (SVM) classifier based on them. We have created applications to analyze and provide optimal boundary information for query sequences and to visualize the data. These tools are available via the web site http://bioinformatics.anl.gov/cgi-bin/tools/pdpredictor. PMID:20177794
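    GRAVY, one of the two properties examined first, is the mean Kyte-Doolittle hydropathy value over all residues of a sequence. A minimal Python sketch follows; the iso-electric point would additionally require choosing a pKa table and solving for zero net charge, which is omitted here. The function name is illustrative.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = seq.upper()
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if not seq or unknown:
        raise ValueError(f"empty sequence or unknown residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[a] for a in seq) / len(seq)


# Positive values indicate an overall hydrophobic sequence, negative values
# an overall hydrophilic one.
print(gravy("MKTAYIAKQR"))
```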

  4. The C-Terminal Sequence of RhoB Directs Protein Degradation through an Endo-Lysosomal Pathway

    PubMed Central

    Ramos, Irene; Herrera, Mónica; Stamatakis, Konstantinos

    2009-01-01

    Background: Protein degradation is essential for cell homeostasis. Targeting of proteins for degradation is often achieved by specific protein sequences or posttranslational modifications such as ubiquitination. Methodology/Principal Findings: By using biochemical and genetic tools, we have monitored the localization and degradation of endogenous and chimeric proteins in live primary cells by confocal microscopy and ultrastructural analysis. Here we identify an eight amino acid sequence from the C-terminus of the short-lived GTPase RhoB that directs the rapid degradation of both RhoB and chimeric proteins bearing this sequence through a lysosomal pathway. Elucidation of the RhoB degradation pathway unveils a mechanism dependent on protein isoprenylation and palmitoylation that involves sorting of the protein into multivesicular bodies, mediated by the ESCRT machinery. Moreover, RhoB sorting is regulated by late endosome-specific lipid dynamics and is altered in human genetic lipid traffic disease. Conclusions/Significance: Our findings characterize a short-lived cytosolic protein that is degraded through a lysosomal pathway. In addition, we define a novel motif for protein sorting and rapid degradation, which allows protein levels to be controlled by means of clinically used drugs. PMID:19956591

  5. Prediction of Protein Structural Classes for Low-Similarity Sequences Based on Consensus Sequence and Segmented PSSM.

    PubMed

    Liang, Yunyun; Liu, Sanyang; Zhang, Shengli

    2015-01-01

    Prediction of protein structural classes for low-similarity sequences is useful for understanding fold patterns, regulation, functions, and interactions of proteins. Feature extraction is well known to be important for predicting protein structural class, and it mainly uses the protein primary sequence, the predicted secondary structure sequence, and the position-specific scoring matrix (PSSM). Currently, prediction based solely on the PSSM has played a key role in improving prediction accuracy. In this paper, we propose a novel method called CSP-SegPseP-SegACP that fuses the consensus sequence (CS), segmented PsePSSM, and segmented autocovariance transformation (ACT) based on the PSSM. Three widely used low-similarity datasets (1189, 25PDB, and 640) are adopted in this paper. A 700-dimensional (700D) feature vector is constructed, and its dimension is reduced to 224D using principal component analysis (PCA). To verify the performance of our method, rigorous jackknife cross-validation tests are performed on the 1189, 25PDB, and 640 datasets. Comparison of our results with existing PSSM-based methods demonstrates that our method achieves favorable and competitive performance. It offers an important complement to other PSSM-based methods for predicting the structural classes of low-similarity protein sequences.
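    Of the three feature groups, the autocovariance transformation (ACT) is the most formulaic. A common form, for a PSSM P with L rows and lag g, is AC(j, g) = 1/(L - g) * sum over i = 1..L-g of (P[i][j] - mean_j) * (P[i+g][j] - mean_j), computed for each of the 20 columns j. The Python sketch below computes this per column and concatenates the values over roughly equal row segments. The segment count, lag range, and any normalization of the PSSM scores used by CSP-SegPseP-SegACP are not given in the abstract, so those parameters and the function names are assumptions.

```python
def autocovariance(pssm, lag):
    """ACT features for one lag: for each PSSM column, the mean product of
    mean-centred scores that lie `lag` rows apart."""
    length = len(pssm)
    if lag >= length:
        raise ValueError("lag must be smaller than the number of rows")
    ncol = len(pssm[0])
    means = [sum(row[j] for row in pssm) / length for j in range(ncol)]
    feats = []
    for j in range(ncol):
        total = sum(
            (pssm[i][j] - means[j]) * (pssm[i + lag][j] - means[j])
            for i in range(length - lag)
        )
        feats.append(total / (length - lag))
    return feats


def segmented_act(pssm, n_segments, max_lag):
    """Split the PSSM rows into n_segments roughly equal blocks and
    concatenate ACT features for lags 1..max_lag from every block."""
    length = len(pssm)
    bounds = [round(k * length / n_segments) for k in range(n_segments + 1)]
    feats = []
    for start, end in zip(bounds, bounds[1:]):
        segment = pssm[start:end]
        for lag in range(1, max_lag + 1):
            feats.extend(autocovariance(segment, lag))
    return feats


# A 20-column PSSM split into 3 segments with lags 1..2 gives 3 * 2 * 20 features.
toy = [[float((i * j) % 7) for j in range(20)] for i in range(30)]
print(len(segmented_act(toy, 3, 2)))  # 120
```

    Segmenting preserves some positional information (N-terminal versus C-terminal behavior) that a single whole-sequence ACT would average away.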

  6. Tracking the origin of simultaneous endometrial and ovarian cancer by next-generation sequencing - a case report.

    PubMed

    Valtcheva, Nadejda; Lang, Franziska M; Noske, Aurelia; Samartzis, Eleftherios P; Schmidt, Anna-Maria; Bellini, Elisa; Fink, Daniel; Moch, Holger; Rechsteiner, Markus; Dedes, Konstantin J; Wild, Peter J

    2017-01-19

    Endometrioid adenocarcinoma of the uterus and ovarian endometrioid carcinoma share many morphological and molecular features. Differentiation between simultaneous primary carcinomas and ovarian metastases of an endometrial cancer may be very challenging but is essential for prognostic and therapeutic considerations. In the present case study of a 33-year-old patient, we used targeted amplicon next-generation re-sequencing to clarify the origin of synchronous endometrioid cancer of the corpus uteri and the left ovary. The patient developed a metachronous lung metastasis of an endometrioid adenocarcinoma four years after hysterectomy and adnexectomy, vaginal brachytherapy, and treatment with the synthetic steroid tibolone. Removal of the metastasis and megestrol treatment for seven years led to a complete remission. A total of 409 genes from the Ampliseq Comprehensive Cancer Panel (Ion Torrent, Thermo Fisher) were analysed by next-generation sequencing, and mutations in 10 genes, including ARID1A, CTNNB1, PIK3CA and PTEN, were identified and confirmed by Sanger sequencing. The primary endometrial and ovarian cancers showed an identical mutational profile, suggesting an ovarian metastasis of the endometrial cancer rather than simultaneous endometrial and ovarian cancers. The metachronous lung metastasis showed a different mutational profile from the primary cancer. Immunohistochemical staining of the corresponding proteins suggested that tumour development was driven by alterations in protein function rather than by changes in protein abundance in the cell. Our results demonstrate next-generation sequencing to be a valuable tool for differentiating synchronous primary tumours from metastases, which has an important impact on clinical decision making. Similar to breast cancer, targeted therapies based on mutational tumour profiling will become increasingly important in endometrial and ovarian cancer. 
In summary, our results support the usage of next generation sequencing as a supplementary diagnostic tool, assisting in personalized precision medicine.

  7. Isolation and sequence of partial cDNA clones of human L1: homology of human and rodent L1 in the cytoplasmic region.

    PubMed

    Harper, J R; Prince, J T; Healy, P A; Stuart, J K; Nauman, S J; Stallcup, W B

    1991-03-01

    We have isolated cDNA clones coding for the human homologue of the neuronal cell adhesion molecule L1. The nucleotide sequence of the cDNA clones and the deduced primary amino acid sequence of the carboxy-terminal portion of human L1 are homologous to the corresponding sequences of mouse L1 and rat NILE glycoprotein, with especially high sequence identity in the cytoplasmic regions of the proteins. There is also protein sequence homology with the cytoplasmic region of the Drosophila cell adhesion molecule neuroglian. The conservation of the cytoplasmic domain argues for an important functional role for this portion of the molecule.

  8. An additional function of the rough endoplasmic reticulum protein complex prolyl 3-hydroxylase 1·cartilage-associated protein·cyclophilin B: the CXXXC motif reveals disulfide isomerase activity in vitro.

    PubMed

    Ishikawa, Yoshihiro; Bächinger, Hans Peter

    2013-11-01

    Collagen biosynthesis occurs in the rough endoplasmic reticulum, and many molecular chaperones and folding enzymes are involved in this process. The folding mechanism of type I procollagen has been well characterized, and protein disulfide isomerase (PDI) has been suggested as a key player in the formation of the correct disulfide bonds in the noncollagenous carboxyl-terminal and amino-terminal propeptides. Prolyl 3-hydroxylase 1 (P3H1) forms a hetero-trimeric complex with cartilage-associated protein and cyclophilin B (CypB). This complex is a multifunctional complex acting as a prolyl 3-hydroxylase, a peptidyl prolyl cis-trans isomerase, and a molecular chaperone. Two major domains are predicted from the primary sequence of P3H1: an amino-terminal domain and a carboxyl-terminal domain corresponding to the 2-oxoglutarate- and iron-dependent dioxygenase domains similar to the α-subunit of prolyl 4-hydroxylase and lysyl hydroxylases. The amino-terminal domain contains four CXXXC sequence repeats. The primary sequence of cartilage-associated protein is homologous to the amino-terminal domain of P3H1 and also contains four CXXXC sequence repeats. However, the function of the CXXXC sequence repeats is not known. Several publications have reported that short peptides containing a CXC or a CXXC sequence show oxido-reductase activity similar to PDI in vitro. We hypothesize that CXXXC motifs have oxido-reductase activity similar to the CXXC motif in PDI. We have tested the enzyme activities on model substrates in vitro using a GCRALCG peptide and the P3H1 complex. Our results suggest that this complex could function as a disulfide isomerase in the rough endoplasmic reticulum.
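    The CXC, CXXC, and CXXXC patterns discussed above are easy to locate in a primary sequence with a regular expression. In this Python sketch the zero-width lookahead lets overlapping motifs be reported, and X matches any residue letter (including C); the function name is hypothetical. Applied to the GCRALCG test peptide, the CXXXC search finds the single motif starting at the first cysteine (0-based index 1).

```python
import re


def find_cys_motifs(seq, gap):
    """Return 0-based start positions of C-X(gap)-C motifs in a protein
    sequence: gap=1 -> CXC, gap=2 -> CXXC, gap=3 -> CXXXC.
    The lookahead makes overlapping occurrences visible."""
    pattern = re.compile(r"(?=C[A-Z]{%d}C)" % gap)
    return [m.start() for m in pattern.finditer(seq.upper())]


print(find_cys_motifs("GCRALCG", 3))  # [1]  (CXXXC)
print(find_cys_motifs("GCRALCG", 2))  # []   (no CXXC)
```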

  9. MIPS: analysis and annotation of proteins from whole genomes

    PubMed Central

    Mewes, H. W.; Amid, C.; Arnold, R.; Frishman, D.; Güldener, U.; Mannhaupt, G.; Münsterkötter, M.; Pagel, P.; Strack, N.; Stümpflen, V.; Warfsmann, J.; Ruepp, A.

    2004-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein–protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de). PMID:14681354

  10. Improving protein complex classification accuracy using amino acid composition profile.

    PubMed

    Huang, Chien-Hung; Chou, Szu-Yu; Ng, Ka-Lok

    2013-09-01

    Protein complex prediction approaches are based on the assumptions that complexes have dense protein-protein interactions and high functional similarity between their subunits. We investigated those assumptions by studying the subunits' interaction topology, sequence similarity and molecular function for human and yeast protein complexes. Inclusion of the physicochemical properties of amino acids can provide a better understanding of protein complex properties. Principal component analysis is carried out to determine the major features. Adopting amino acid composition profile information with the SVM classifier serves as an effective post-processing step for complex classification. The improvement is based on primary sequence information only, which is easy to obtain. Copyright © 2013 Elsevier Ltd. All rights reserved.
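    An amino acid composition vector is simply the fraction of each of the 20 standard residues in a sequence. The Python sketch below computes it and, purely as an assumption for illustration, summarizes a complex by averaging its subunits' vectors; the paper's exact profile construction, its physicochemical features, and the SVM step are not reproduced, and the function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """20-dimensional vector of residue fractions (non-standard letters ignored)."""
    residues = [c for c in seq.upper() if c in AMINO_ACIDS]
    if not residues:
        raise ValueError("no standard residues in sequence")
    n = len(residues)
    return [residues.count(a) / n for a in AMINO_ACIDS]


def complex_profile(subunit_seqs):
    """Hypothetical complex-level profile: mean of the subunit composition vectors."""
    vectors = [aa_composition(s) for s in subunit_seqs]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Two toy subunits; the result is a 20-dimensional feature vector suitable
# as input to any standard classifier.
print(complex_profile(["MKAL", "GCAA"])[:3])
```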

  11. Structure-related statistical singularities along protein sequences: a correlation study.

    PubMed

    Colafranceschi, Mauro; Colosimo, Alfredo; Zbilut, Joseph P; Uversky, Vladimir N; Giuliani, Alessandro

    2005-01-01

    A data set composed of 1141 proteins representative of all eukaryotic protein sequences in the Swiss-Prot Protein Knowledge base was coded by seven physicochemical properties of amino acid residues. The resulting numerical profiles were submitted to correlation analysis after the application of a linear (simple mean) and a nonlinear (Recurrence Quantification Analysis, RQA) filter. The main RQA variables, Recurrence and Determinism, were subsequently analyzed by Principal Component Analysis. The RQA descriptors showed that (i) protein sequences embed specific information that is present neither in the codes nor in the amino acid composition and (ii) the most sensitive code for detecting ordered recurrent (deterministic) patterns of residues in protein sequences is the Miyazawa-Jernigan hydrophobicity scale. The most deterministic proteins in terms of autocorrelation properties of primary structures were found (i) to be involved in protein-protein and protein-DNA interactions and (ii) to display a significantly higher proportion of structural disorder than the data set average. A study of how average determinism scales with the RQA setting parameters (embedding dimension and radius) allows the identification of patterns of minimal length (six residues) as possible markers of zones specifically prone to inter- and intramolecular interactions.
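    The two main RQA variables can be sketched for a single numeric residue profile. The input is assumed to be a list of numbers obtained by coding each residue with some physicochemical scale (the study used seven codes, including the Miyazawa-Jernigan hydrophobicity scale, whose values are not reproduced here). The Python sketch uses unit-delay embedding, Euclidean distance, excludes the line of identity, and counts diagonal lines of length at least lmin = 2; these are common RQA conventions, not necessarily the study's exact settings.

```python
import math


def embed(series, dim):
    """Time-delay embedding with delay 1: overlapping windows of length dim."""
    return [tuple(series[i:i + dim]) for i in range(len(series) - dim + 1)]


def rqa(series, dim, radius, lmin=2):
    """Return (%REC, %DET) for a numeric profile.

    %REC: percentage of off-diagonal point pairs within `radius`.
    %DET: percentage of those recurrent points lying on diagonal lines
    of length >= lmin. The line of identity is excluded."""
    vecs = embed(series, dim)
    n = len(vecs)
    if n < 2:
        raise ValueError("series too short for this embedding dimension")
    recurrent = 0
    on_lines = 0
    # The recurrence plot is symmetric, so scanning the upper diagonals suffices.
    for k in range(1, n):
        run = 0
        for i in range(n - k):
            if math.dist(vecs[i], vecs[i + k]) <= radius:
                recurrent += 1
                run += 1
            else:
                if run >= lmin:
                    on_lines += run
                run = 0
        if run >= lmin:  # close a line that reaches the plot edge
            on_lines += run
    rec = 100.0 * recurrent / (n * (n - 1) / 2)
    det = 100.0 * on_lines / recurrent if recurrent else 0.0
    return rec, det


# A strictly alternating profile is highly deterministic.
print(rqa([0, 1, 0, 1, 0, 1, 0, 1], dim=2, radius=0.1))
```

    Raising the embedding dimension or shrinking the radius makes recurrence stricter, which is why the study examined how determinism scales with both parameters.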

  12. Isolation and in silico analysis of a novel H+-pyrophosphatase gene orthologue from the halophytic grass Leptochloa fusca

    NASA Astrophysics Data System (ADS)

    Rauf, Muhammad; Saeed, Nasir A.; Habib, Imran; Ahmed, Moddassir; Shahzad, Khurram; Mansoor, Shahid; Ali, Rashid

    2017-02-01

    Structure prediction can provide information about the function and active sites of a protein, which helps in designing new functional proteins. H+-pyrophosphatase is a transmembrane protein involved in establishing the proton motive force for active transport of Na+ across the membrane by Na+/H+ antiporters. A full-length novel H+-pyrophosphatase gene was isolated from the halophytic grass Leptochloa fusca using RT-PCR and the RACE method. The full-length LfVP1 gene sequence of 2292 nucleotides encodes a protein of 764 amino acids. DNA and protein sequences were characterized using bioinformatics tools. Various important potential sites were predicted by the PROSITE webserver. Primary structural analysis showed LfVP1 to be a stable protein, and the grand average of hydropathy (GRAVY) indicated that the LfVP1 protein has good hydrosolubility. Secondary structure analysis showed that the LfVP1 protein sequence contains a significant proportion of alpha helix and random coil. Protein membrane topology suggested the presence of 14 transmembrane domains and of a catalytic domain in TM3. The three-dimensional structure predicted from the LfVP1 protein sequence also indicated the presence of 14 transmembrane domains, and the hydrophobicity surface model showed amino acid hydrophobicity. The Ramachandran plot showed that 98% of amino acid residues were predicted in the favored region.
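    Membrane-topology tools differ, but a classic first-pass heuristic is the Kyte-Doolittle hydropathy plot: average hydropathy over a sliding window of 19 residues, with windows averaging above about 1.6 taken as candidate transmembrane segments. The Python sketch below implements that heuristic and merges overlapping hits. It is not the topology predictor used for LfVP1, and the function names are illustrative; a real analysis would rely on a dedicated tool.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean Kyte-Doolittle value of each full window (raises KeyError on
    non-standard residues)."""
    values = [KD[a] for a in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Half-open 0-based (start, end) residue ranges covered by windows whose
    mean hydropathy exceeds `threshold`; overlapping windows are merged."""
    segments = []
    for i, h in enumerate(hydropathy_profile(seq, window)):
        if h > threshold:
            if segments and i <= segments[-1][1]:
                segments[-1] = (segments[-1][0], i + window)
            else:
                segments.append((i, i + window))
    return segments


# A 19-residue leucine stretch flanked by charged aspartates.
print(candidate_tm_segments("D" * 10 + "L" * 19 + "D" * 10))  # [(5, 34)]
```

    The reported range extends somewhat beyond the hydrophobic core because windows that straddle the edge still average above the threshold, one reason window-based boundaries are only approximate.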

  13. The primary structure of L37--a rat ribosomal protein with a zinc finger-like motif.

    PubMed

    Chan, Y L; Paz, V; Olvera, J; Wool, I G

    1993-04-30

    The amino acid sequence of the rat 60S ribosomal subunit protein L37 was deduced from the sequence of nucleotides in a recombinant cDNA. Ribosomal protein L37 has 96 amino acids (the NH2-terminal methionine is removed after translation of the mRNA) and a molecular weight of 10,939. It has a single zinc finger-like motif of the C2-C2 type. Hybridization of the cDNA to digests of nuclear DNA suggests that there are 13 or 14 copies of the L37 gene. The mRNA for the protein is about 500 nucleotides in length. Rat L37 is related to Saccharomyces cerevisiae ribosomal protein YL35 and to Caenorhabditis elegans L37. We have identified in the database a DNA sequence that encodes the chicken homolog of rat L37.

  14. LymPHOS 2.0: an update of a phosphosite database of primary human T cells

    PubMed Central

    Nguyen, Tien Dung; Vidal-Cortes, Oriol; Gallardo, Oscar; Abian, Joaquin; Carrascal, Montserrat

    2015-01-01

    LymPHOS is a web-oriented database containing peptide and protein sequences and spectrometric information on the phosphoproteome of primary human T-Lymphocytes. Current release 2.0 contains 15 566 phosphorylation sites from 8273 unique phosphopeptides and 4937 proteins, which correspond to a 45-fold increase over the original database description. It now includes quantitative data on phosphorylation changes after time-dependent treatment with activators of the TCR-mediated signal transduction pathway. Sequence data quality has also been improved with the use of multiple search engines for database searching. LymPHOS can be publicly accessed at http://www.lymphos.org. Database URL: http://www.lymphos.org. PMID:26708986

  15. Homology of aspartyl- and lysyl-tRNA synthetases.

    PubMed Central

    Gampel, A; Tzagoloff, A

    1989-01-01

    The yeast nuclear gene MSD1 coding for mitochondrial aspartyl-tRNA synthetase has been cloned and sequenced. The identity of the gene is confirmed by the following evidence. (i) The primary structure of the protein derived from the gene sequence is similar to that of the yeast cytoplasmic aspartyl-tRNA synthetase. (ii) In situ disruption of MSD1 in a respiratory-competent haploid strain of yeast induces a pleiotropic phenotype consistent with a lesion in mitochondrial protein synthesis. (iii) Mitochondria from a mutant with a disrupted chromosomal copy of MSD1 are unable to acylate mitochondrial aspartyl-tRNA. The primary structures of the cytoplasmic and mitochondrial aspartyl-tRNA synthetases are similar to the yeast cytoplasmic lysyl-tRNA synthetase, suggesting that the two types of synthetases may have a common evolutionary origin. Searches of the current protein banks also have revealed a high degree of sequence similarity of the lysyl-tRNA synthetase to the product of the Escherichia coli herC gene and to the partial sequence of a protein encoded by an unidentified reading frame located adjacent to the E. coli frdA gene. Based on the sequence similarities and the map positions of the herC and frdA loci, we propose herC to be the structural gene of the constitutively expressed lysyl-tRNA synthetase of E. coli and the unidentified reading frame to be the structural gene of the heat-inducible lysyl-tRNA synthetase. PMID:2668951

  16. Protein classification using modified n-grams and skip-grams.

    PubMed

    Islam, S M Ashiqul; Heil, Benjamin J; Kearney, Christopher Michel; Baker, Erich J

    2018-05-01

    Classification by supervised machine learning greatly facilitates the annotation of protein characteristics from their primary sequence. However, the feature generation step in this process requires detailed knowledge of the attributes used to classify the proteins. Lack of this knowledge risks the selection of irrelevant features, resulting in a faulty model. In this study, we introduce a supervised protein classification method with a novel means of automating the work-intensive feature generation step via a Natural Language Processing (NLP)-dependent model, using a modified combination of n-grams and skip-grams (m-NGSG). A meta-comparison of cross-validation accuracy with twelve training datasets from nine different published studies demonstrates a consistent increase in accuracy of m-NGSG when compared to contemporary classification and feature generation models. We expect this model to accelerate the classification of proteins from primary sequence data and increase the accessibility of protein characteristic prediction to a broader range of scientists. m-NGSG is freely available at Bitbucket: https://bitbucket.org/sm_islam/mngsg/src. A web server is available at watson.ecs.baylor.edu/ngsg. Contact: erich_baker@baylor.edu. Supplementary data are available at Bioinformatics online.
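    For readers unfamiliar with the terms, an n-gram is a run of n consecutive residues, and a skip-gram (in the sense used here) is a pair of residues separated by a fixed number k of intervening positions. The Python sketch below counts both kinds of features for one sequence. It is a generic illustration, not the m-NGSG feature generator, whose modifications the abstract does not detail; the "A_k_B" feature naming and the default maxima are assumptions.

```python
from collections import Counter


def ngrams(seq, n):
    """All runs of n consecutive residues."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


def skipgrams(seq, k):
    """Residue pairs with exactly k residues between them, named 'A_k_B'."""
    return [f"{seq[i]}_{k}_{seq[i + k + 1]}" for i in range(len(seq) - k - 1)]


def ngsg_features(seq, max_n=3, max_skip=2):
    """Bag of n-gram (n = 1..max_n) and skip-gram (k = 1..max_skip) counts."""
    feats = Counter()
    for n in range(1, max_n + 1):
        feats.update(ngrams(seq, n))
    for k in range(1, max_skip + 1):
        feats.update(skipgrams(seq, k))
    return feats


print(ngsg_features("MKVLAAG").most_common(3))
```

    The resulting sparse counts can be vectorized against a shared vocabulary and passed to any standard supervised classifier.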

  17. The primary structures of ribosomal proteins S14 and S16 from the archaebacterium Halobacterium marismortui. Comparison with eubacterial and eukaryotic ribosomal proteins.

    PubMed

    Kimura, J; Kimura, M

    1987-09-05

    The amino acid sequences of two ribosomal proteins, S14 and S16, from the archaebacterium Halobacterium marismortui have been determined. Sequence data were obtained by manual and solid-phase sequencing of peptides derived from enzymatic digestions with trypsin, chymotrypsin, pepsin, and Staphylococcus aureus protease, as well as by chemical cleavage with cyanogen bromide. Proteins S14 and S16 contain 109 and 126 amino acid residues and have Mr values of 11,964 and 13,515, respectively. Comparison of the sequences with those of ribosomal proteins from other organisms demonstrates that S14 has significant homology with the rat liver ribosomal protein S11 (36% identity) as well as with the Escherichia coli ribosomal protein S17 (37%), and that S16 is related to the yeast ribosomal protein YS22 (40%) and to proteins S8 from E. coli (28%) and Bacillus stearothermophilus (30%). A comparison of the amino acid residues in the homologous regions of halophilic and nonhalophilic ribosomal proteins reveals that halophilic proteins have more glutamic acids, aspartic acids, prolines, and alanines, and fewer lysines, arginines, and isoleucines than their nonhalophilic counterparts. These amino acid substitutions probably contribute to the structural stability of halophilic ribosomal proteins.

  18. GeneSilico protein structure prediction meta-server.

    PubMed

    Kurowski, Michal A; Bujnicki, Janusz M

    2003-07-01

    Rigorous assessments of protein structure prediction have demonstrated that fold recognition methods can identify remote similarities between proteins when standard sequence search methods fail. It has been shown that the accuracy of predictions is improved when refined multiple sequence alignments are used instead of single sequences and if different methods are combined to generate a consensus model. There are several meta-servers available that integrate protein structure predictions performed by various methods, but they do not allow for submission of user-defined multiple sequence alignments and they seldom offer confidentiality of the results. We developed a novel WWW gateway for protein structure prediction, which combines the useful features of other meta-servers available, but with much greater flexibility of the input. The user may submit an amino acid sequence or a multiple sequence alignment to a set of methods for primary, secondary and tertiary structure prediction. Fold-recognition results (target-template alignments) are converted into full-atom 3D models and the quality of these models is uniformly assessed. A consensus between different FR methods is also inferred. The results are conveniently presented on-line on a single web page over a secure, password-protected connection. The GeneSilico protein structure prediction meta-server is freely available for academic users at http://genesilico.pl/meta.

  19. GeneSilico protein structure prediction meta-server

    PubMed Central

    Kurowski, Michal A.; Bujnicki, Janusz M.

    2003-01-01

    Rigorous assessments of protein structure prediction have demonstrated that fold recognition methods can identify remote similarities between proteins when standard sequence search methods fail. It has been shown that the accuracy of predictions is improved when refined multiple sequence alignments are used instead of single sequences and if different methods are combined to generate a consensus model. There are several meta-servers available that integrate protein structure predictions performed by various methods, but they do not allow for submission of user-defined multiple sequence alignments and they seldom offer confidentiality of the results. We developed a novel WWW gateway for protein structure prediction, which combines the useful features of other meta-servers available, but with much greater flexibility of the input. The user may submit an amino acid sequence or a multiple sequence alignment to a set of methods for primary, secondary and tertiary structure prediction. Fold-recognition results (target-template alignments) are converted into full-atom 3D models and the quality of these models is uniformly assessed. A consensus between different FR methods is also inferred. The results are conveniently presented on-line on a single web page over a secure, password-protected connection. The GeneSilico protein structure prediction meta-server is freely available for academic users at http://genesilico.pl/meta. PMID:12824313

  20. Using random forests for assistance in the curation of G-protein coupled receptor databases.

    PubMed

    Shkurin, Aleksei; Vellido, Alfredo

    2017-08-18

    Biology is experiencing a gradual but fast transformation from a laboratory-centred science towards a data-centred one. As such, it requires robust data engineering and the use of quantitative data analysis methods as part of database curation. This paper focuses on G protein-coupled receptors, a large and heterogeneous super-family of cell membrane proteins of interest to biology in general. One of its families, Class C, is of particular interest to pharmacology and drug design. This family is quite heterogeneous on its own, and the discrimination of its several sub-families is a challenging problem. In the absence of known crystal structure, such discrimination must rely on their primary amino acid sequences. We are interested not as much in achieving maximum sub-family discrimination accuracy using quantitative methods, but in exploring sequence misclassification behavior. Specifically, we are interested in isolating those sequences showing consistent misclassification, that is, sequences that are very often misclassified and almost always to the same wrong sub-family. Random forests are used for this analysis due to their ensemble nature, which makes them naturally suited to gauge the consistency of misclassification. This consistency is here defined through the voting scheme of their base tree classifiers. Detailed consistency results for the random forest ensemble classification were obtained for all receptors and for all data transformations of their unaligned primary sequences. Shortlists of the most consistently misclassified receptors for each subfamily and transformation, as well as an overall shortlist including those cases that were consistently misclassified across transformations, were obtained. The latter should be referred to experts for further investigation as a data curation task. The automatic discrimination of the Class C sub-families of G protein-coupled receptors from their unaligned primary sequences shows clear limits. 
This study has investigated in some detail the consistency of their misclassification using random forest ensemble classifiers. Different sub-families have been shown to display very different discrimination consistency behaviors. The individual identification of consistently misclassified sequences should provide a tool for quality control to GPCR database curators.
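    The notion of consistent misclassification can be sketched from per-sequence vote lists: for each receptor, the sub-family labels assigned by the individual trees (or by repeated runs), together with its database label. A sequence is flagged when most votes are wrong and most of the wrong votes go to a single sub-family. The thresholds min_wrong and min_share, the labels, and the function names below are hypothetical; the paper's own definition of consistency may differ in detail.

```python
from collections import Counter


def misclassification_consistency(votes, true_label):
    """Return (wrong_rate, top_wrong_label, top_wrong_share) for one sequence.

    wrong_rate: fraction of votes that disagree with the database label.
    top_wrong_share: fraction of the wrong votes given to the most frequent
    wrong label (ties resolved by first occurrence)."""
    if not votes:
        raise ValueError("no votes")
    wrong = Counter(v for v in votes if v != true_label)
    n_wrong = sum(wrong.values())
    wrong_rate = n_wrong / len(votes)
    if n_wrong == 0:
        return wrong_rate, None, 0.0
    top_label, top_count = wrong.most_common(1)[0]
    return wrong_rate, top_label, top_count / n_wrong


def consistently_misclassified(records, min_wrong=0.8, min_share=0.9):
    """records: iterable of (seq_id, true_label, votes). Returns a shortlist of
    (seq_id, wrong_label) pairs to refer to curators."""
    shortlist = []
    for seq_id, true_label, votes in records:
        rate, label, share = misclassification_consistency(votes, true_label)
        if rate >= min_wrong and share >= min_share:
            shortlist.append((seq_id, label))
    return shortlist


records = [
    ("s1", "A", ["A"] + ["B"] * 9),       # almost always voted into B
    ("s2", "C", ["A"] * 5 + ["B"] * 5),   # often wrong, but not consistently
    ("s3", "A", ["A"] * 10),              # correctly classified
]
print(consistently_misclassified(records))  # [('s1', 'B')]
```

    Separating "how often wrong" from "how concentrated the wrong votes are" is what distinguishes a likely mislabeled or borderline sequence from one the classifier is merely uncertain about.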

  1. Novel numerical and graphical representation of DNA sequences and proteins.

    PubMed

    Randić, M; Novic, M; Vikić-Topić, D; Plavsić, D

    2006-12-01

    We have introduced novel numerical and graphical representations of DNA, which offer a simple and unique characterization of DNA sequences. The numerical representation of a DNA sequence is given as a sequence of real numbers derived from a unique graphical representation of the standard genetic code. There is no loss of information on the primary structure of a DNA sequence associated with this numerical representation. The novel representations are illustrated with the coding sequences of the first exon of beta-globin gene of half a dozen species in addition to human. The method can be extended to proteins as is exemplified by humanin, a 24-aa peptide that has recently been identified as a specific inhibitor of neuronal cell death induced by familial Alzheimer's disease mutant genes.

  2. Isolation and primary structural analysis of two conjugated polyketone reductases from Candida parapsilosis.

    PubMed

    Hidalgo, A R; Akond, M A; Kita, K; Kataoka, M; Shimizu, S

    2001-12-01

    Two conjugated polyketone reductases (CPRs) were isolated from Candida parapsilosis IFO 0708. The primary structures of CPRs (C1 and C2) were analyzed by amino acid sequencing. The amino acid sequences of both enzymes had high similarity to those of several proteins of the aldo-keto-reductase (AKR) superfamily. However, several amino acid residues in the putative active sites of AKRs were not conserved in CPRs-C1 and -C2.

  3. Sequence Alignment to Predict Across Species Susceptibility ...

    EPA Pesticide Factsheets

    Conservation of a molecular target across species can be used as a line-of-evidence to predict the likelihood of chemical susceptibility. The web-based Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool was developed to simplify, streamline, and quantitatively assess protein sequence/structural similarity across taxonomic groups as a means to predict relative intrinsic susceptibility. The intent of the tool is to allow for evaluation of any potential protein target, so it is amenable to variable degrees of protein characterization, depending on available information about the chemical/protein interaction and the molecular target itself. To allow for flexibility in the analysis, a layered strategy was adopted for the tool. The first level of the SeqAPASS analysis compares primary amino acid sequences to a query sequence, calculating a metric for sequence similarity (including detection of candidate orthologs), the second level evaluates sequence similarity within selected domains (e.g., ligand-binding domain, DNA binding domain), and the third level of analysis compares individual amino acid residue positions identified as being of importance for protein conformation and/or ligand binding upon chemical perturbation. Each level of the SeqAPASS analysis provides increasing evidence to apply toward rapid, screening-level assessments of probable cross species susceptibility. Such analyses can support prioritization of chemicals for further ev
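    The first and third levels described above rest on two simple operations over a pairwise alignment: a similarity score and a lookup of specific residue positions. The Python sketch below assumes the alignment is already available as two equal-length gapped strings. It computes percent identity over columns where neither sequence has a gap (one of several denominators in use) and maps 1-based query positions to the aligned target residue. It illustrates the idea only and is not SeqAPASS's actual scoring, which this summary does not describe; the function names are hypothetical.

```python
def percent_identity(aln_query, aln_target):
    """Percent identical residues over alignment columns without gaps."""
    if len(aln_query) != len(aln_target):
        raise ValueError("aligned strings must have equal length")
    columns = [(q, t) for q, t in zip(aln_query, aln_target)
               if q != "-" and t != "-"]
    if not columns:
        return 0.0
    same = sum(1 for q, t in columns if q.upper() == t.upper())
    return 100.0 * same / len(columns)


def residues_at(aln_query, aln_target, query_positions):
    """Map 1-based query residue numbers to the aligned target residue
    ('-' where the target has a gap)."""
    result = {}
    qpos = 0
    for q, t in zip(aln_query, aln_target):
        if q != "-":
            qpos += 1
            if qpos in query_positions:
                result[qpos] = t
    return result


print(percent_identity("ACD-E", "ACFGE"))        # 75.0
print(residues_at("AC-DE", "AGKD-", {2, 4}))     # {2: 'G', 4: '-'}
```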

  4. Developmental and Subcellular Organization of Single-Cell C₄ Photosynthesis in Bienertia sinuspersici Determined by Large-Scale Proteomics and cDNA Assembly from 454 DNA Sequencing.

    PubMed

    Offermann, Sascha; Friso, Giulia; Doroshenk, Kelly A; Sun, Qi; Sharpe, Richard M; Okita, Thomas W; Wimmer, Diana; Edwards, Gerald E; van Wijk, Klaas J

    2015-05-01

    Kranz C4 species strictly depend on separation of primary and secondary carbon fixation reactions in different cell types. In contrast, the single-cell C4 (SCC4) species Bienertia sinuspersici utilizes intracellular compartmentation including two physiologically and biochemically different chloroplast types; however, information on identity, localization, and induction of proteins required for this SCC4 system is currently very limited. In this study, we determined the distribution of photosynthesis-related proteins and the induction of the C4 system during development by label-free proteomics of subcellular fractions and leaves of different developmental stages. This was enabled by inferring a protein sequence database from 454 sequencing of Bienertia cDNAs. Large-scale proteome rearrangements were observed as C4 photosynthesis developed during leaf maturation. The proteomes of the two chloroplasts are different with differential accumulation of linear and cyclic electron transport components, primary and secondary carbon fixation reactions, and a triose-phosphate shuttle that is shared between the two chloroplast types. This differential protein distribution pattern suggests the presence of a mRNA or protein-sorting mechanism for nuclear-encoded, chloroplast-targeted proteins in SCC4 species. The combined information was used to provide a comprehensive model for NAD-ME type carbon fixation in SCC4 species.

  5. Profiles of Brain Metastases: Prioritization of Therapeutic Targets.

    PubMed

    Ferguson, Sherise D; Zheng, Siyuan; Xiu, Joanne; Zhou, Shouhao; Khasraw, Mustafa; Brastianos, Priscilla K; Kesari, Santosh; Hu, Jethro; Rudnick, Jeremy; Salacz, Michael E; Piccioni, David; Huang, Suyun; Davies, Michael A; Glitza, Isabella C; Heymach, John V; Zhang, Jianjun; Ibrahim, Nuhad K; DeGroot, John F; McCarty, Joseph; O'Brien, Barbara J; Sawaya, Raymond; Verhaak, Roeland G W; Reddy, Sandeep K; Priebe, Waldemar; Gatalica, Zoran; Spetzler, David; Heimberger, Amy B

    2018-06-19

    We sought to compare the tumor profiles of brain metastases from common cancers with those of primary tumors and extracranial metastases in order to identify potential targets and prioritize rational treatment strategies. Tumor samples were collected from both the primary and metastatic sites of non-small cell lung cancer, breast cancer, and melanoma from patients in locations worldwide, and these were submitted to Caris Life Sciences for tumor multiplatform analysis, including gene sequencing (Sanger and next-generation sequencing with a targeted 47-gene panel), protein expression (assayed by immunohistochemistry), and gene amplification (assayed by in situ hybridization). The data analysis considered differential protein expression, gene amplification, and mutations among brain metastases, extracranial metastases, and primary tumors. The analyzed population comprised 16,999 unmatched primary tumor and/or metastasis samples: 8178 non-small cell lung cancers (5098 primaries; 2787 systemic metastases; 293 brain metastases), 7064 breast cancers (3496 primaries; 3469 systemic metastases; 99 brain metastases), and 1757 melanomas (660 primaries; 996 systemic metastases; 101 brain metastases). TOP2A expression was increased in brain metastases from all 3 cancers, and brain metastases overexpressed multiple proteins clustering around functions critical to DNA synthesis and repair and implicated in chemotherapy resistance, including RRM1, TS, ERCC1, and TOPO1. cMET was overexpressed in melanoma brain metastases relative to primary skin specimens. Brain metastasis patients may particularly benefit from therapeutic targeting of enzymes associated with DNA synthesis, replication, and/or repair. This article is protected by copyright. All rights reserved. © 2018 UICC.

  6. Primary structure of Lep d I, the main Lepidoglyphus destructor allergen.

    PubMed

    Varela, J; Ventas, P; Carreira, J; Barbas, J A; Gimenez-Gallego, G; Polo, F

    1994-10-01

    The most relevant allergen of the storage mite Lepidoglyphus destructor (Lep d I) has been characterized. Lep d I is a monomeric protein of 13,273 Da. The primary structure of Lep d I was determined by N-terminal Edman degradation and partially confirmed by cDNA sequencing. Sequence polymorphism was observed at six positions, with non-conservative substitutions in three of them. No potential N-glycosylation site was revealed by peptide sequencing. The 125-residue sequence of Lep d I shows approximately 40% identity (including the six cysteines) with the overlapping regions of group II allergens from the genus Dermatophagoides, which, however, do not share common allergenic epitopes with Lep d I.

  7. The primary structure of rat liver ribosomal protein L37. Homology with yeast and bacterial ribosomal proteins.

    PubMed

    Lin, A; McNally, J; Wool, I G

    1983-09-10

    The covalent structure of the rat liver 60 S ribosomal subunit protein L37 was determined. Twenty-four tryptic peptides were purified and the sequence of each was established; they accounted for all 111 residues of L37. The sequence of the first 30 residues of L37, obtained previously by automated Edman degradation of the intact protein, provided the alignment of the first 9 tryptic peptides. Three peptides (CN1, CN2, and CN3) were produced by cleavage of protein L37 with cyanogen bromide. The sequence of CN1 (65 residues) was established from the sequence of secondary peptides resulting from cleavage with trypsin and chymotrypsin. The sequence of CN1 in turn served to order tryptic peptides 1 through 14. The sequence of CN2 (15 residues) was determined entirely by a micromanual procedure and allowed the alignment of tryptic peptides 14 through 18. The sequence of the NH2-terminal 28 amino acids of CN3 (31 residues) was determined; in addition, the complete sequences of the secondary tryptic and chymotryptic peptides were determined. The sequence of CN3 provided the order of tryptic peptides 18 through 24. Thus the sequences of the three cyanogen bromide peptides also accounted for the 111 residues of protein L37. The carboxyl-terminal amino acids were identified after carboxypeptidase A treatment. There is a disulfide bridge between half-cystinyl residues at positions 40 and 69. Rat liver ribosomal protein L37 is homologous with yeast YP55 and with Escherichia coli L34. Moreover, there is a segment of 17 residues in rat L37 that occurs, albeit with modifications, in yeast YP55 and in E. coli S4, L20, and L34.
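
The overlap strategy described above relies on predictable cleavage rules. Trypsin cuts after Lys or Arg, cyanogen bromide cuts after Met, and fragments from one reagent are used to order the peptides from the other. A minimal in silico digest shows which peptides each reagent should yield. The function name and toy sequence are hypothetical. The "no cut before Pro" option is a common simplification of trypsin specificity and is not taken from the abstract.

```python
def digest(seq: str, sites: set[str], skip_before_pro: bool = False) -> list[str]:
    """Cleave C-terminal to any residue in `sites`.

    skip_before_pro=True applies the common trypsin rule of not cutting before proline.
    Chemical changes at the cut (e.g. CNBr converting Met to homoserine lactone) are ignored.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(seq[:-1]):  # a cut after the last residue would add nothing
        if residue not in sites:
            continue
        if skip_before_pro and seq[i + 1] == "P":
            continue
        peptides.append(seq[start:i + 1])
        start = i + 1
    peptides.append(seq[start:])
    return peptides


toy = "MAKPGRSMLK"  # hypothetical sequence, not rat L37
print(digest(toy, {"K", "R"}, skip_before_pro=True))  # ['MAKPGR', 'SMLK']
print(digest(toy, {"M"}))                             # ['M', 'AKPGRSM', 'LK']
```

Peptides from the two digests overlap across each other's cut sites (here 'AKPGRSM' spans the trypsin cut after R), and that overlap is what allows the peptides to be placed in order.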

  8. Systematic Analysis of Primary Sequence Domain Segments for the Discrimination Between Class C GPCR Subtypes.

    PubMed

    König, Caroline; Alquézar, René; Vellido, Alfredo; Giraldo, Jesús

    2018-03-01

    G-protein-coupled receptors (GPCRs) are a large and diverse superfamily of eukaryotic cell membrane proteins that play an important physiological role as transmitters of extracellular signals. In this paper, we investigate Class C, a member of this superfamily that has attracted much attention in pharmacology. Because knowledge of the complete 3D crystal structures of Class C receptors is limited, their primary amino acid sequences must be used for analysis. Here, we provide a systematic analysis of distinct receptor sequence segments, defined by their topological location in the extracellular, transmembrane, or intracellular domains, with regard to their ability to differentiate between seven Class C GPCR subtypes. We build on previous research that provided preliminary evidence that separate domains of complete Class C GPCR sequences could serve as the basis for subtype classification. Using the extracellular N-terminal domain alone was shown to cause only a minor decrease in subtype discrimination compared with the complete sequence, despite discarding much of the sequence information. In this paper, we describe the use of Support Vector Machine-based classification models to evaluate the subtype-discriminating capacity of the specific topological sequence segments.

  9. Deletion mapping of the Aequorea victoria green fluorescent protein.

    PubMed

    Dopf, J; Horiagon, T M

    1996-01-01

    Aequorea victoria green fluorescent protein (GFP) is a promising fluorescent marker which is active in a diverse array of prokaryotic and eukaryotic organisms. A key feature underlying the versatility of GFP is its capacity to undergo heterocyclic chromophore formation by cyclization of a tripeptide present in its primary sequence and thereby acquiring fluorescent activity in a variety of intracellular environments. In order to define further the primary structure requirements for chromophore formation and fluorescence in GFP, a series of N- and C-terminal GFP deletion variant expression vectors were created using the polymerase chain reaction. Scanning spectrofluorometric analyses of crude soluble protein extracts derived from eleven GFP expression constructs revealed that amino acid (aa) residues 2-232, of a total of 238 aa in the native protein, were required for the characteristic emission and absorption spectra of native GFP. Heterocyclic chromophore formation was assayed by comparing the absorption spectrum of GFP deletion variants over the 300-500-nm range to the absorption spectra of full-length GFP and GFP deletion variants missing the chromophore substrate domain from the primary sequence. GFP deletion variants lacking fluorescent activity showed no evidence of heterocyclic ring structure formation when the soluble extracts of their bacterial expression hosts were studied at pH 7.9. These observations suggest that the primary structure requirements for the fluorescent activity of GFP are relatively extensive and are compatible with the view that much of the primary structure serves an autocatalytic function.

  10. A Score of the Ability of a Three-Dimensional Protein Model to Retrieve Its Own Sequence as a Quantitative Measure of Its Quality and Appropriateness

    PubMed Central

    Martínez-Castilla, León P.; Rodríguez-Sotres, Rogelio

    2010-01-01

    Background: Despite the remarkable progress of bioinformatics, how the primary structure of a protein leads to a three-dimensional fold, and in turn determines its function, remains an elusive question. Alignments of sequences with known function can be used to identify proteins with the same or similar function with high success. However, identifying function-related and structure-related amino acid positions is only possible after a detailed study of every protein. Folding pattern diversity seems to be much narrower than sequence diversity, and the amino acid sequences of natural proteins have evolved under a selective pressure comprising structural and functional requirements acting in parallel. Principal Findings: The approach described in this work begins by generating a large number of amino acid sequences using ROSETTA [Dantas G et al. (2003) J Mol Biol 332:449–460], a program with notable robustness in the assignment of amino acids to a known three-dimensional structure. The resulting sequence sets showed no conservation of amino acids at active sites or protein-protein interfaces. Hidden Markov models built from these sequence sets were used to search sequence databases. Surprisingly, the database sequences retrieved by the models belonged to proteins with the same or a very similar function. Given an appropriate cutoff, the rate of false positives was zero. According to our results, this protocol, here referred to as Rd.HMM, detects fine structural details of the folding patterns that seem to be tightly linked to the fitness of a structural framework for a specific biological function. Conclusion: Because the sequence of the native protein used to create the Rd.HMM model was always among the top hits, the procedure is a reliable tool for very accurately scoring the quality and appropriateness of computer-modeled 3D structures, without the need for spectroscopy data. However, Rd.HMM is very sensitive to the conformational features of the models' backbone. PMID:20830209

  11. Engineering and Evolution of Molecular Chaperones and Protein Disaggregases with Enhanced Activity

    PubMed Central

    Mack, Korrie L.; Shorter, James

    2016-01-01

    Cells have evolved a sophisticated proteostasis network to ensure that proteins acquire and retain their native structure and function. Critical components of this network include molecular chaperones and protein disaggregases, which function to prevent and reverse deleterious protein misfolding. Nevertheless, proteostasis networks have limits which, when exceeded, can have fatal consequences, as in various neurodegenerative disorders, including Parkinson's disease and amyotrophic lateral sclerosis. A promising strategy is to engineer proteostasis networks to counter challenges presented by specific diseases or specific proteins. Here, we review efforts to enhance the activity of individual molecular chaperones or protein disaggregases via engineering and directed evolution. Remarkably, enhanced global activity or altered substrate specificity of various molecular chaperones, including GroEL, Hsp70, ClpX, and Spy, can be achieved by minor changes in primary sequence and often a single missense mutation. Likewise, small changes in the primary sequence of Hsp104 yield potentiated protein disaggregases that reverse the aggregation and buffer toxicity of various neurodegenerative disease proteins, including α-synuclein, TDP-43, and FUS. Collectively, these advances have revealed key mechanistic and functional insights into chaperone and disaggregase biology. They also suggest that enhanced chaperones and disaggregases could have important applications in treating human disease as well as in the purification of valuable proteins in the pharmaceutical sector. PMID:27014702

  12. A Fast Alignment-Free Approach for De Novo Detection of Protein Conserved Regions

    PubMed Central

    Abnousi, Armen; Broschat, Shira L.; Kalyanaraman, Ananth

    2016-01-01

    Background: Identifying conserved regions in protein sequences is a fundamental operation, occurring in numerous sequence-driven analysis pipelines. It is used as a way to decode domain-rich regions within proteins, to compute protein clusters, to annotate sequence function, and to compute evolutionary relationships among protein sequences. A number of approaches exist for identifying and characterizing protein families based on their domains, and because domains represent conserved portions of a protein sequence, the primary computation involved in protein family characterization is identification of such conserved regions. However, identifying conserved regions from large collections (millions) of protein sequences presents significant challenges. Methods: In this paper we present a new, alignment-free method for detecting conserved regions in protein sequences called NADDA (No-Alignment Domain Detection Algorithm). Our method exploits the abundance of exact matching short subsequences (k-mers) to quickly detect conserved regions, and machine learning is used to improve the prediction accuracy of detection. We present a parallel implementation of NADDA using the MapReduce framework and show that our method is highly scalable. Results: We have compared NADDA with the Pfam and InterPro databases. For known domains annotated by Pfam, accuracy is 83%, sensitivity 96%, and specificity 44%. For sequences with new domains not present in the training set, an average accuracy of 63% is achieved when compared to Pfam. Better results in comparison with InterPro demonstrate the ability of NADDA to capture conserved regions beyond those present in Pfam. We have also compared NADDA with ADDA and MKDOM2, assuming Pfam as ground truth. On average NADDA shows comparable accuracy, more balanced sensitivity and specificity, and, being alignment-free, is significantly faster. Excluding the one-time cost of training, runtimes on a single processor were 49 s, 10,566 s, and 456 s for NADDA, ADDA, and MKDOM2, respectively, for a data set comprising approximately 2500 sequences. PMID:27552220
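
NADDA's core idea is that conserved regions show up as short exact k-mers shared by many sequences. That idea can be illustrated without the machine-learning and MapReduce layers. The sketch below counts how many sequences contain each k-mer, then marks the positions of one sequence that are covered by well-supported k-mers. It is a simplified illustration, not NADDA itself. The function names, k, the support threshold and the toy family are hypothetical.

```python
from collections import Counter


def kmer_support(seqs: list[str], k: int) -> Counter:
    """Count, for each k-mer, the number of sequences containing it at least once."""
    support = Counter()
    for s in seqs:
        # A set gives each sequence at most one vote per k-mer.
        support.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return support


def conserved_mask(seq: str, support: Counter, k: int, min_support: int) -> list[bool]:
    """Mark every position covered by a k-mer found in at least `min_support` sequences."""
    mask = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        if support[seq[i:i + k]] >= min_support:
            for j in range(i, i + k):
                mask[j] = True
    return mask


seqs = ["AAGWCDEKK", "PPGWCDEQQ", "LLGWCDERR"]  # hypothetical toy family
support = kmer_support(seqs, k=3)
mask = conserved_mask(seqs[0], support, k=3, min_support=3)
print("".join(aa for aa, hit in zip(seqs[0], mask) if hit))  # GWCDE
```

Only hashing and counting are involved, with no pairwise alignment. This is the property the abstract credits for the runtime gap over ADDA and MKDOM2.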

  13. Synaptotagmin 1 causes phosphatidyl inositol lipid-dependent actin remodeling in cultured non-neuronal and neuronal cells

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Johnsson, Anna-Karin; Karlsson, Roger, E-mail: roger.karlsson@wgi.su.se

    2012-01-15

    Here we demonstrate that a dramatic actin-polymerizing activity, caused by ectopic expression of the synaptic vesicle protein synaptotagmin 1 and resulting in extensive filopodia formation, is due to the presence of a lysine-rich sequence motif immediately at the cytoplasmic side of the transmembrane domain of the protein. This polybasic sequence interacts with anionic phospholipids in vitro, and, consequently, the actin remodeling caused by this sequence is disrupted by expression of a phosphatidyl inositol (4,5)-bisphosphate (PIP2)-targeted phosphatase, suggesting that it interferes with the function of PIP2-binding actin control proteins. The activity drastically alters the behavior of a range of cultured cells including the neuroblastoma cell line SH-SY5Y and primary cortical mouse neurons, and, since the sequence is also conserved in synaptotagmin 2, it may reflect an important fine-tuning role for these two proteins during synaptic vesicle fusion and neurotransmitter release.

  14. Classifying Membrane Proteins in the Proteome by Using Artificial Neural Networks Based on the Preferential Parameters of Amino Acids

    NASA Astrophysics Data System (ADS)

    Bose, Subrata K.; Browne, Antony; Kazemian, Hassan; White, Kenneth

    Membrane proteins (MPs) are a large set of biological macromolecules that play a fundamental role in physiology, pathophysiology, and survival. From a pharmacoeconomic perspective, although MPs constitute ~75% of possible targets for novel drugs, they are one of the most understudied groups of proteins in biochemical research. This is mainly because of the technical difficulty of obtaining structural information about transmembrane regions (short sequence segments that cross the lipid bilayer). It is therefore useful to predict the location of transmembrane segments along the sequence, since these segments are the elementary structural building blocks that define MP topology. There have been several attempts over the last 20 years to develop tools for predicting membrane-spanning regions, but current tools are still far from achieving reliable prediction. This study aims to exploit current knowledge of artificial neural networks (ANNs), in particular data representation, by developing a system that identifies and predicts membrane-spanning regions from the primary amino acid sequence. In this paper we present a novel neural network (NN) architecture and algorithms for predicting membrane-spanning regions from primary amino acid sequences using the amino acids' preference parameters.
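
The paper's neural-network predictor is not reproduced here. For orientation, a classical and much simpler approach to the same task is a Kyte-Doolittle hydropathy scan. It averages a per-residue hydrophobicity value over a sliding window and flags windows whose mean exceeds a cutoff. The sketch uses the published Kyte-Doolittle values with the commonly quoted 19-residue window and 1.6 cutoff. This is a swapped-in technique for illustration, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (a classical scale; not the paper's preference parameters).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full-length window along the sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def candidate_tm_segments(seq: str, window: int = 19, threshold: float = 1.6) -> list[tuple[int, int]]:
    """1-based (start, end) spans covered by windows above threshold, overlapping hits merged."""
    spans: list[tuple[int, int]] = []
    for i, h in enumerate(hydropathy_windows(seq, window)):
        if h > threshold:
            start, end = i + 1, i + window
            if spans and start <= spans[-1][1] + 1:
                spans[-1] = (spans[-1][0], end)  # extend the previous segment
            else:
                spans.append((start, end))
    return spans


toy = "K" * 10 + "L" * 20 + "K" * 10  # hypothetical: one hydrophobic stretch flanked by lysines
print(candidate_tm_segments(toy))  # [(6, 35)]
```

The flagged span (6 to 35) runs wider than the 20 leucines (11 to 30) because a window only needs 14 of its 19 residues to be leucine to clear the cutoff. This boundary blurring is one limitation that motivates learned predictors such as the one described above.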

  15. Complete genomic sequence of Powassan virus: evaluation of genetic elements in tick-borne versus mosquito-borne flaviviruses.

    PubMed

    Mandl, C W; Holzmann, H; Kunz, C; Heinz, F X

    1993-05-01

    The complete nucleotide sequence of the positive-stranded RNA genome of the tick-borne flavivirus Powassan (10,839 nucleotides) was elucidated and the amino acid sequence of all viral proteins was derived. Based on this sequence as well as serological data, Powassan virus represents the most divergent member of the tick-borne serocomplex within the genus Flavivirus, family Flaviviridae. The primary nucleotide sequence and potential RNA secondary structures of the Powassan virus genome as well as the protein sequences and the reactivities of the virion with a panel of monoclonal antibodies were compared to other tick-borne and mosquito-borne flaviviruses. These analyses corroborated significant differences between tick-borne and mosquito-borne flaviviruses, but also emphasized structural elements that are conserved among both vector groups. The comparisons among tick-borne flaviviruses revealed conserved sequence elements that might represent important determinants of the tick-borne flavivirus phenotype.

  16. Improved purification, crystallization and primary structure of pyruvate:ferredoxin oxidoreductase from Halobacterium halobium.

    PubMed

    Plaga, W; Lottspeich, F; Oesterhelt, D

    1992-04-01

    An improved purification procedure, including nickel chelate affinity chromatography, is reported which resulted in a crystallizable pyruvate:ferredoxin oxidoreductase preparation from Halobacterium halobium. Crystals of the enzyme were obtained using potassium citrate as the precipitant. The genes coding for pyruvate:ferredoxin oxidoreductase were cloned and their nucleotide sequences determined. The genes of both subunits were adjacent to one another on the halobacterial genome. The derived amino acid sequences were confirmed by partial primary structure analysis of the purified protein. The structural motif of thiamin-diphosphate-binding enzymes was unequivocally located in the deduced amino acid sequence of the small subunit.

  17. CROSS-DISCIPLINARY PHYSICS AND RELATED AREAS OF SCIENCE AND TECHNOLOGY: Statistical interior properties of globular proteins

    NASA Astrophysics Data System (ADS)

    Jiang, Zhou-Ting; Zhang, Lin-Xi; Sun, Ting-Ting; Wu, Tai-Quan

    2009-10-01

    The tendency to form long-range contacts strongly affects the three-dimensional structure of globular proteins. Because the 20 amino acid types and the 4 classes of globular proteins differ in their ability to form long-range contacts, their statistical properties are discussed thoroughly in this paper. Two parameters, NC and ND, are defined to specify which residues count as valid. The relationship between hydrophobicity scales and the valid-residue percentage of each amino acid is examined, and our statistical results show linear relationships. We conclude that the hydrophobicity scale defined by chemical derivatives of the amino acids and the nonpolar phase of large unilamellar vesicle membranes is the most effective way to characterise the hydrophobic behavior of amino acid residues. In addition, the residue percentage Pi and the sequential residue length Li of each protein i are calculated under different conditions. The statistical results show that the average values of Pi and Li are lowest for all-α proteins among the 4 classes of globular proteins, indicating that all-α proteins are hardly capable of forming long-range contacts one by one along their linear amino acid sequences. All-β proteins have a higher tendency to form long-range contacts along their primary sequences, related to their secondary configurations, i.e. parallel and antiparallel β sheets. This investigation of the interior properties of globular proteins connects the three-dimensional structure with the primary sequence and secondary configurations, and helps us better understand protein structure and the folding process.

  18. A frequency-based linguistic approach to protein decoding and design: Simple concepts, diverse applications, and the SCS Package

    PubMed Central

    Motomura, Kenta; Nakamura, Morikazu; Otaki, Joji M.

    2013-01-01

    Protein structure and function information is coded in amino acid sequences. However, the relationship between primary sequences and three-dimensional structures and functions remains enigmatic. Our approach to this fundamental biochemistry problem is based on the frequencies of short constituent sequences (SCSs) or words. A protein amino acid sequence is considered analogous to an English sentence, where SCSs are equivalent to words. Availability scores, which are defined as real SCS frequencies in the non-redundant amino acid database relative to their probabilistically expected frequencies, demonstrate the biological usage bias of SCSs. As a result, this frequency-based linguistic approach is expected to have diverse applications, such as secondary structure specifications by structure-specific SCSs and immunological adjuvants with rare or non-existent SCSs. Linguistic similarities (e.g., wide ranges of scale-free distributions) and dissimilarities (e.g., behaviors of low-rank samples) between proteins and the natural English language have been revealed in the rank-frequency relationships of SCSs or words. We have developed a web server, the SCS Package, which contains five applications for analyzing protein sequences based on the linguistic concept. These tools have the potential to assist researchers in deciphering structurally and functionally important protein sites, species-specific sequences, and functional relationships between SCSs. The SCS Package also provides researchers with a tool to construct amino acid sequences de novo based on the idiomatic usage of SCSs. PMID:24688703
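
An availability score is a ratio of observed to expected frequency. The abstract does not spell out the expectation model. The sketch therefore assumes the simplest one: residues occur independently at their background frequencies, so the expected count of a word is the number of k-length windows times the product of its residues' frequencies. The function name and toy input are hypothetical.

```python
from collections import Counter
from math import prod


def availability_scores(seqs: list[str], k: int) -> dict[str, float]:
    """Observed count of each k-letter word divided by its count expected under independence."""
    residues = Counter("".join(seqs))
    total = sum(residues.values())
    background = {aa: n / total for aa, n in residues.items()}

    observed = Counter()
    n_windows = 0
    for s in seqs:
        for i in range(len(s) - k + 1):
            observed[s[i:i + k]] += 1
            n_windows += 1

    # Assumed expectation: n_windows * product of background frequencies of the word's residues.
    return {
        word: count / (n_windows * prod(background[aa] for aa in word))
        for word, count in observed.items()
    }


scores = availability_scores(["MKMK"], k=2)  # toy input
print(round(scores["MK"], 3), round(scores["KM"], 3))  # 2.667 1.333
```

A score above 1 marks a word that is used more often than chance predicts. A word that never occurs gets no entry here. Such words correspond to the "rare or non-existent" SCSs that the abstract proposes for adjuvant design.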

  19. A frequency-based linguistic approach to protein decoding and design: Simple concepts, diverse applications, and the SCS Package.

    PubMed

    Motomura, Kenta; Nakamura, Morikazu; Otaki, Joji M

    2013-01-01

    Protein structure and function information is coded in amino acid sequences. However, the relationship between primary sequences and three-dimensional structures and functions remains enigmatic. Our approach to this fundamental biochemistry problem is based on the frequencies of short constituent sequences (SCSs) or words. A protein amino acid sequence is considered analogous to an English sentence, where SCSs are equivalent to words. Availability scores, which are defined as real SCS frequencies in the non-redundant amino acid database relative to their probabilistically expected frequencies, demonstrate the biological usage bias of SCSs. As a result, this frequency-based linguistic approach is expected to have diverse applications, such as secondary structure specifications by structure-specific SCSs and immunological adjuvants with rare or non-existent SCSs. Linguistic similarities (e.g., wide ranges of scale-free distributions) and dissimilarities (e.g., behaviors of low-rank samples) between proteins and the natural English language have been revealed in the rank-frequency relationships of SCSs or words. We have developed a web server, the SCS Package, which contains five applications for analyzing protein sequences based on the linguistic concept. These tools have the potential to assist researchers in deciphering structurally and functionally important protein sites, species-specific sequences, and functional relationships between SCSs. The SCS Package also provides researchers with a tool to construct amino acid sequences de novo based on the idiomatic usage of SCSs.

  20. Structural Analysis of HMGD-DNA Complexes Reveal Influence of Intercalation on Sequence Selectivity and DNA Bending

    PubMed Central

    Churchill, Mair E.A.; Klass, Janet; Zoetewey, David L.

    2010-01-01

    The ubiquitous eukaryotic High-Mobility-Group-Box (HMGB) chromosomal proteins promote many chromatin-mediated cellular activities through their non-sequence-specific binding and bending of DNA. Minor groove DNA binding by the HMG box results in substantial DNA bending toward the major groove owing to electrostatic interactions, shape complementarity and DNA intercalation that occurs at two sites. Here, the structures of the complexes formed with DNA by a partially DNA intercalation-deficient mutant of Drosophila melanogaster HMGD have been determined by X-ray crystallography at a resolution of 2.85 Å. The six proteins and fifty base pairs of DNA in the crystal structure revealed a variety of bound conformations. All of the proteins bound in the minor groove, bridging DNA molecules, presumably because these DNA regions are easily deformed. The loss of the primary site of DNA intercalation decreased overall DNA bending and shape complementarity. However, DNA bending at the secondary site of intercalation was retained and most protein-DNA contacts were preserved. The mode of binding resembles the HMGB1-boxA-cisplatin-DNA complex, which also lacks a primary intercalating residue. This study provides new insights into the binding mechanisms used by HMG boxes to recognize varied DNA structures and sequences as well as modulate DNA structure and DNA bending. PMID:20800069

  1. Free Energy Landscape - Settlements of Key Residues.

    NASA Astrophysics Data System (ADS)

    Aroutiounian, Svetlana

    2007-03-01

    The free energy landscape (FEL) perspective in studies of protein folding transitions reflects the notion that, since there are ~10^N conformations to scan in search of the lowest free energy state, a random search is beyond biological timescales. Protein folding must therefore follow certain FEL pathways, and the folding kinetics of evolutionarily selected proteins prevail over kinetic traps. A good model for the functional robustness of natural proteins is a coarse-grained model protein: it is not very accurate but brings simulations closer to the biological realm; a Go-like potential secures the funnel shape of the FEL, and biochemical contacts signify the funnel bottleneck. A Boltzmann-weighted ensemble of protein conformations and the histogram method are used to obtain the approximate probability distribution from Monte Carlo (MC) sampling of protein conformational space. The FEL is F(rmsd) = -(1/β) ln[Hist(rmsd)], where β = 1/(k_B T) and rmsd is the root-mean-square deviation from the native conformation. Sperm whale myoglobin has rich dynamic behavior, is small yet large on a computational scale, has a symmetric architecture, and has an unusual sextet of residue pairs. Main idea: there is a mathematical relation between the protein FEL and a set of key residues that provides stability to the folding transition. Is this set evolutionarily conserved also for functional reasons? Hypothesis: the primary sequence determines the positions of the key residues conserved as stabilizers, and the FEL is the battlefield for folding stability. Preliminary results: the primary sequence, not the architecture, is indeed the rule settler.
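
The histogram route to the landscape can be written out directly. Bin the sampled rmsd values, convert bin counts to probabilities, and take F = -k_B T ln P, which is the same as -(1/β) ln P with β = 1/(k_B T). This gives a free energy only if the samples are Boltzmann-weighted, as in the MC sampling described above. The sketch works in units of k_B T by default and reports F relative to its minimum, because the absolute offset depends on normalization. Names and toy data are hypothetical.

```python
import math


def free_energy_profile(rmsd_values: list[float], bin_width: float,
                        kT: float = 1.0) -> list[tuple[float, float]]:
    """Return (bin centre, F - F_min) for each non-empty rmsd bin, with F = -kT * ln(P)."""
    if not rmsd_values:
        return []
    counts: dict[int, int] = {}
    for r in rmsd_values:
        b = int(r // bin_width)  # rmsd is non-negative, so floor division gives the bin index
        counts[b] = counts.get(b, 0) + 1

    total = len(rmsd_values)
    profile = [((b + 0.5) * bin_width, -kT * math.log(n / total))
               for b, n in sorted(counts.items())]
    f_min = min(f for _, f in profile)
    return [(centre, f - f_min) for centre, f in profile]


samples = [0.1, 0.2, 0.3, 1.1]  # toy rmsd values from a hypothetical Boltzmann-weighted run
for centre, f in free_energy_profile(samples, bin_width=1.0):
    print(centre, round(f, 3))
# 0.5 0.0
# 1.5 1.099
```

Empty bins are skipped rather than assigned infinite free energy. With finite sampling, an unvisited rmsd range means "not sampled", not "forbidden".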

  2. Cellular Transfection to Deliver Alanine-Glyoxylate Aminotransferase to Hepatocytes: A Rational Gene Therapy for Primary Hyperoxaluria-1 (PH-1)

    PubMed Central

    Koul, Sweaty; Johnson, Thomas; Pramanik, Saroj; Koul, Hari

    2005-01-01

    Background: Primary hyperoxaluria type 1 (PH-1) is a rare autosomal recessive disorder of glyoxylate metabolism caused by deficiency in the liver-specific peroxisomal enzyme alanine-glyoxylate transaminase 1 (AGT), resulting in increased oxidation of glyoxylate to oxalate. Accumulation of oxalate in the kidney and other soft tissues results in loss of renal function and significant morbidity. The present treatment options offer some relief in the short term, but they are not completely successful. In the present study, we tested the feasibility of corrective gene therapy for this metabolic disorder. Methods: A cDNA library was made from HepG2 cells. PCR primers were designed for the AGT sequence with modifications to preclude mistargeting during gene delivery. Amplified AGT cDNA was cloned as a fusion protein with green fluorescent protein (GFP) using the vector EGFP-C1 (Clontech) for monitoring subcellular distribution. The sequence and expression of the fusion protein were verified. Fusion protein vectors were transfected into hepatocytes by liposomal transfection. AGT expression and subcellular distribution were monitored by GFP fluorescence. Results: HepG2 cells express full-length mRNA coding for AGT, as confirmed by insert size as well as sequence determination. Selective primers allowed us to generate a modified recombinant GFP-AGT fusion protein. Cellular transfections with Lipofectamine resulted in transfection efficiencies of 60–90%. The recombinant AGT localized to peroxisomes, as monitored by GFP fluorescence. Conclusions: The results provide preliminary in vitro feasibility data for AGT transfection into hepatocytes. To the best of our knowledge, this is the first study to attempt recombinant AGT gene therapy for treatment of primary hyperoxaluria type 1. PMID:15849465

  3. Properties of the intracellular transient receptor potential (TRP) channel in yeast, Yvc1.

    PubMed

    Chang, Yiming; Schlenstedt, Gabriel; Flockerzi, Veit; Beck, Andreas

    2010-05-17

    Transient receptor potential (TRP) channels are found among mammals, flies, worms, ciliates, Chlamydomonas, and yeast but are absent in plants. These channels are believed to be tetramers of proteins containing six transmembrane domains (TMs). Their primary structures are diverse with sequence similarities only in some short amino acid sequence motifs mainly within sequences covering TM5, TM6, and adjacent domains. In the yeast genome, there is one gene encoding a TRP-like sequence. This protein forms an ion channel in the vacuolar membrane and is therefore called Yvc1 for yeast vacuolar conductance 1. In the following we summarize its prominent features. Copyright 2009 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.

  4. Primary structure and subcellular localization of two fimbrial subunit-like proteins involved in the biosynthesis of K99 fibrillae.

    PubMed

    Roosendaal, E; Jacobs, A A; Rathman, P; Sondermeyer, C; Stegehuis, F; Oudega, B; de Graaf, F K

    1987-09-01

    Analysis of the nucleotide sequence of the distal part of the fan gene cluster encoding the proteins involved in the biosynthesis of the fibrillar adhesin K99 revealed the presence of two structural genes, fanG and fanH. The amino acid sequences of the gene products (FanG and FanH) showed significant homology to that of the fibrillar subunit protein (FanC). Introduction of a site-specific frameshift mutation in fanG or fanH resulted in a simultaneous decrease in fibrillae production and adhesive capacity. Analysis of subcellular fractions showed that, in contrast to the K99 fibrillar subunit (FanC), both FanH and FanG were loosely associated with the outer membrane, possibly on the periplasmic side, but were not components of the fimbriae themselves.

  5. Complete amino acid sequence of the myoglobin from the Pacific sei whale, Balaenoptera borealis.

    PubMed

    Jones, B N; Rothgeb, T M; England, R D; Gurd, F R

    1979-04-25

    The complete amino acid sequence of the major component myoglobin from the Pacific sei whale, Balaenoptera borealis, was determined by specific cleavage of the protein into large peptides that are readily degraded by the automatic sequencer. The acetimidated apomyoglobin was selectively cleaved at its two methionyl residues with cyanogen bromide and at its three arginyl residues by trypsin. From the sequence analysis of four of these peptides and the apomyoglobin, over 75% of the covalent structure of the protein was obtained. The remainder of the primary structure was determined by sequence analysis of peptides that resulted from further digestion of the amino-terminal and central cyanogen bromide fragments. The amino-terminal fragment was specifically cleaved at its two tryptophanyl residues with N-chlorosuccinimide, and the central cyanogen bromide fragment was cleaved at its glutamyl residues with staphylococcal protease and at its single tyrosyl residue with N-bromosuccinimide. The primary structure of this myoglobin proved identical to that of the gray whale but differs from that of the finback whale at four positions, from that of the minke whale at three positions, and from that of the humpback whale at one position. These sequence identities and differences reflect the close taxonomic relationship of these five species of Cetacea.
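A minimal in-silico digest can illustrate the cleavage strategy described above. It assumes that acetimidation blocks tryptic cleavage at lysine, so trypsin is modeled as cutting only after Arg, and that cyanogen bromide cuts after Met (the chemical conversion of Met to homoserine lactone is ignored). The helper `cleave_after` and the toy sequence are hypothetical; this is not the sei whale myoglobin sequence.

```python
def cleave_after(seq: str, sites: set, block_before_pro: bool = False) -> list:
    """Split `seq` on the C-terminal side of every residue listed in `sites`.

    If `block_before_pro` is True, a site followed by Pro is not cut
    (a common simplification of trypsin specificity).
    """
    fragments = []
    start = 0
    for i, aa in enumerate(seq):
        # Never cut after the final residue, so no empty fragment is produced.
        if aa in sites and i < len(seq) - 1:
            if block_before_pro and seq[i + 1] == "P":
                continue
            fragments.append(seq[start:i + 1])
            start = i + 1
    fragments.append(seq[start:])
    return fragments


toy = "AMRKMGRP"  # hypothetical sequence, for illustration only
cnbr_fragments = cleave_after(toy, {"M"})     # cyanogen bromide: after Met
tryptic_fragments = cleave_after(toy, {"R"})  # trypsin on acetimidated protein: after Arg only
print(cnbr_fragments)     # ['AM', 'RKM', 'GRP']
print(tryptic_fragments)  # ['AMR', 'KMGR', 'P']
```

Modeling each reagent as a set of cleavage residues keeps the digest logic separate from reagent specificity, so the same helper could mimic the other site-specific cleavages mentioned (for example, at tryptophanyl or glutamyl residues) under the same simplifying assumptions.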

  6. Predicting DNA binding proteins using support vector machine with hybrid fractal features.

    PubMed

    Niu, Xiao-Hui; Hu, Xue-Hai; Shi, Feng; Xia, Jing-Bo

    2014-02-21

    DNA-binding proteins play a vital role in many biological processes. Predicting DNA-binding proteins from amino acid sequence is an important but not yet fully resolved problem. Chaos game representation (CGR) investigates the patterns hidden in protein sequences and visually reveals previously unknown structure. Fractal dimensions (FD) are useful tools for measuring complex, highly irregular geometric objects. To extract the intrinsic correlation between protein sequences and DNA-binding properties, this paper applies the CGR algorithm, fractal dimension, and amino acid composition to formulate numerical features of protein samples. Seven groups of features, all computable directly from the primary sequence, are extracted, and each group is evaluated by 10-fold cross-validation and the jackknife test. In these numerical experiments, the group combining amino acid composition and fractal dimension (a 21-dimensional vector) gives the best result, with an average accuracy of 81.82% and an average Matthews correlation coefficient (MCC) of 0.6017. The resulting predictor also outperforms the existing method DNA-Prot. © 2013 The Authors. Published by Elsevier Ltd. All rights reserved.
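The amino acid composition part of the feature vector and the MCC metric can be sketched as follows. This is a minimal illustration, not the authors' code: the CGR-derived fractal dimension is not computed here and is stood in for by a hypothetical precomputed value `fd`, and nonstandard residue letters are simply ignored.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_composition(seq: str) -> list:
    """Return the 20-dimensional amino acid composition (fractions summing to 1)."""
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)  # only standard residues are counted
    return [c / total if total else 0.0 for c in counts]


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


fd = 1.5  # hypothetical fractal dimension, e.g. from a box-counting estimate on a CGR plot
features = aa_composition("MKRLLAGKSV") + [fd]
print(len(features))  # 21
```

Returning 0.0 when a confusion-matrix margin is empty is one common convention for the otherwise undefined MCC; other conventions exist.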

  7. Efficient use of unlabeled data for protein sequence classification: a comparative study.

    PubMed

    Kuksa, Pavel; Huang, Pai-Hsi; Pavlovic, Vladimir

    2009-04-29

    Recent studies in computational primary protein sequence analysis have leveraged the power of unlabeled data. For example, predictive models based on string kernels trained on sequences known to belong to particular folds or superfamilies (the so-called labeled data set) can attain significantly improved accuracy if these data are supplemented with protein sequences that lack any class labels (the unlabeled data). In this study, we present a principled and biologically motivated computational framework that exploits the unlabeled data more effectively by using only the sequence regions that are more likely to be biologically relevant, improving prediction accuracy. Because over-represented sequences in large uncurated databases may bias the estimation of computational models that rely on unlabeled data, we also propose a method to remove this bias and improve the performance of the resulting classifiers. Combined with state-of-the-art string kernels, our proposed framework achieves very accurate semi-supervised protein remote fold and homology detection on three large unlabeled databases. It outperforms current state-of-the-art methods and exhibits a significant reduction in running time. The unlabeled sequences used in the semi-supervised setting resemble unpolished gemstones: used as-is, they may carry unnecessary features and hence compromise classification accuracy, but once cut and polished, they improve the accuracy of the classifiers considerably.
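As one concrete example of a string kernel (an assumption: the abstract does not say which kernels were used), the k-mer spectrum kernel scores two sequences by the inner product of their k-mer count vectors. The sketch below does not reproduce the paper's region selection or its bias-removal step.

```python
import math
from collections import Counter


def spectrum(seq: str, k: int) -> Counter:
    """Count every contiguous substring of length k (the k-mer spectrum)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a: str, b: str, k: int = 3) -> int:
    """Inner product of the k-mer count vectors of a and b."""
    sa, sb = spectrum(a, k), spectrum(b, k)
    # Counter returns 0 for k-mers absent from b, so only shared k-mers contribute.
    return sum(count * sb[kmer] for kmer, count in sa.items())


def normalized_kernel(a: str, b: str, k: int = 3) -> float:
    """Cosine-normalized kernel in [0, 1], so long sequences do not dominate."""
    denom = math.sqrt(spectrum_kernel(a, a, k) * spectrum_kernel(b, b, k))
    return spectrum_kernel(a, b, k) / denom if denom else 0.0


print(spectrum_kernel("ABAB", "ABAB", k=2))  # 5
```

Because the kernel needs no class labels, it can be computed between any pair of sequences, labeled or unlabeled.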

  8. Specific binding of a HeLa cell nuclear protein to RNA sequences in the human immunodeficiency virus transactivating region.

    PubMed Central

    Gaynor, R; Soultanakis, E; Kuwabara, M; Garcia, J; Sigman, D S

    1989-01-01

    The transactivator protein, tat, encoded by the human immunodeficiency virus is a key regulator of viral transcription. Activation by the tat protein requires sequences downstream of the transcription initiation site called the transactivating region (TAR). RNA derived from the TAR is capable of forming a stable stem-loop structure and the maintenance of both the stem structure and the loop sequences located between +19 and +44 is required for complete in vivo activation by tat. Gel retardation assays with RNA from both wild-type and mutant TAR constructs generated in vitro with SP6 polymerase indicated specific binding of HeLa nuclear proteins to the TAR. To characterize this RNA-protein interaction, a method of chemical "imprinting" has been developed using photoactivated uranyl acetate as the nucleolytic agent. This reagent nicks RNA under physiological conditions at all four nucleotides in a reaction that is independent of sequence and secondary structure. Specific interaction of cellular proteins with TAR RNA could be detected by enhanced cleavages or imprints surrounding the loop region. Mutations that either disrupted stem base-pairing or extensively changed the primary sequence resulted in alterations in the cleavage pattern of the TAR RNA. Structural features of the TAR RNA stem-loop essential for tat activation are also required for specific binding of the HeLa cell nuclear protein. PMID:2544877

  9. Tiered Human Integrated Sequence Search Databases for Shotgun Proteomics.

    PubMed

    Deutsch, Eric W; Sun, Zhi; Campbell, David S; Binz, Pierre-Alain; Farrah, Terry; Shteynberg, David; Mendoza, Luis; Omenn, Gilbert S; Moritz, Robert L

    2016-11-04

    The results of analysis of shotgun proteomics mass spectrometry data can be greatly affected by the selection of the reference protein sequence database against which the spectra are matched. For many species there are multiple sources from which somewhat different sequence sets can be obtained. This can lead to confusion about which database is best in which circumstances, a problem especially acute in human sample analysis. All sequence databases are genome-based, compiling the sequences of predicted genes and their protein translation products. Our goal is to create a set of primary sequence databases that comprise the union of sequences from many of the available sources and to make the result easily available to the community. We have compiled a set of four sequence databases of varying sizes, from a small database consisting of only the ∼20,000 primary isoforms plus contaminants to a very large database that includes almost all nonredundant protein sequences from several sources. This set of tiered, increasingly complete human protein sequence databases suitable for mass spectrometry proteomics sequence database searching is called the Tiered Human Integrated Search Proteome set. To evaluate the utility of these databases, we have analyzed two different data sets, one from the HeLa cell line and the other from normal human liver tissue, with each of the four tiers of database complexity. Approximately 0.8%, 1.1%, and 1.5% additional peptides can be identified for Tiers 2, 3, and 4, respectively, compared with the Tier 1 database, at substantially increasing computational cost. This increase in computational cost may be worth bearing if the identification of sequence variants or the discovery of sequences that are not present in the reviewed knowledge base entries is an important goal of the study.
We find that it is useful to search a data set against a simpler database, and then check the uniqueness of the discovered peptides against a more complex database. We have set up an automated system that downloads all the source databases on the first of each month and automatically generates a new set of search databases and makes them available for download at http://www.peptideatlas.org/thisp/ .
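The search-then-check workflow recommended above can be sketched with plain dictionaries. In this minimal illustration the accessions and sequences are hypothetical, identical sequences from different sources are collapsed into one entry (a simple form of nonredundancy), and leucine and isoleucine are optionally treated as equivalent because they have the same mass; a real pipeline would parse FASTA files and use an index rather than a linear scan.

```python
def merge_sources(*sources: dict) -> dict:
    """Union of {accession: sequence} dicts; identical sequences are merged.

    Returns {sequence: [accessions...]}.
    """
    merged = {}
    for source in sources:
        for accession, seq in source.items():
            merged.setdefault(seq, []).append(accession)
    return merged


def matching_sequences(peptide: str, db: dict, il_equivalent: bool = True) -> list:
    """Return the distinct database sequences that contain `peptide`."""
    def norm(s: str) -> str:
        return s.replace("I", "L") if il_equivalent else s

    target = norm(peptide)
    return [seq for seq in db if target in norm(seq)]


# Hypothetical tiers: a small curated set and a larger set with extra sources.
tier1 = {"P1": "MKTLLVAGR", "P2": "MSEEKLQR"}
extra = {"X9": "MKTLLVAGRSEEK", "P1-copy": "MKTLLVAGR"}

small_db = merge_sources(tier1)
large_db = merge_sources(tier1, extra)

peptide = "TLIVAGR"
print(len(matching_sequences(peptide, small_db)))  # 1: unique in the small database
print(len(matching_sequences(peptide, large_db)))  # 2: shared in the larger database
```

A peptide identified against the small database that matches more than one distinct sequence in the larger one would then be flagged as not uniquely assignable.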

  10. Tiered Human Integrated Sequence Search Databases for Shotgun Proteomics

    PubMed Central

    Deutsch, Eric W.; Sun, Zhi; Campbell, David S.; Binz, Pierre-Alain; Farrah, Terry; Shteynberg, David; Mendoza, Luis; Omenn, Gilbert S.; Moritz, Robert L.

    2016-01-01

    The results of analysis of shotgun proteomics mass spectrometry data can be greatly affected by the selection of the reference protein sequence database against which the spectra are matched. For many species there are multiple sources from which somewhat different sequence sets can be obtained. This can lead to confusion about which database is best in which circumstances, a problem especially acute in human sample analysis. All sequence databases are genome-based, compiling the sequences of predicted genes and their protein translation products. Our goal is to create a set of primary sequence databases that comprise the union of sequences from many of the available sources and to make the result easily available to the community. We have compiled a set of four sequence databases of varying sizes, from a small database consisting of only the ~20,000 primary isoforms plus contaminants to a very large database that includes almost all non-redundant protein sequences from several sources. This set of tiered, increasingly complete human protein sequence databases suitable for mass spectrometry proteomics sequence database searching is called the Tiered Human Integrated Search Proteome set. To evaluate the utility of these databases, we have analyzed two different data sets, one from the HeLa cell line and the other from normal human liver tissue, with each of the four tiers of database complexity. Approximately 0.8%, 1.1%, and 1.5% additional peptides can be identified for Tiers 2, 3, and 4, respectively, compared with the Tier 1 database, at substantially increasing computational cost. This increase in computational cost may be worth bearing if the identification of sequence variants or the discovery of sequences that are not present in the reviewed knowledge base entries is an important goal of the study.
We find that it is useful to search a data set against a simpler database, and then check the uniqueness of the discovered peptides against a more complex database. We have set up an automated system that downloads all the source databases on the first of each month and automatically generates a new set of search databases and makes them available for download at http://www.peptideatlas.org/thisp/. PMID:27577934

  11. Definition of the complete Schistosoma mansoni hemoglobinase mRNA sequence and gene expression in developing parasites.

    PubMed

    el Meanawy, M A; Aji, T; Phillips, N F; Davis, R E; Salata, R A; Malhotra, I; McClain, D; Aikawa, M; Davis, A H

    1990-07-01

    Schistosoma mansoni uses a variety of proteases termed hemoglobinases to obtain nutrition from host globin. Previous reports have characterized cDNAs encoding one of these enzymes. However, these sequences did not define the complete primary structures of the mRNA and protein. The complete sequence of the 1390-base mRNA has now been determined; it encodes a 50 kDa primary translation product. In vitro translations coupled with immunoprecipitations and Western blots of parasite lysates allowed visualization of the 50 kDa form. Production of the 31 kDa mature hemoglobinase from the 50 kDa species involves removal of both NH2- and COOH-terminal residues from the primary translation product. Expression of hemoglobinase mRNA and protein was examined during larval parasite development. Low levels were observed in young schistosomula. After 6-9 days in culture, high hemoglobinase levels were seen, which correlated with the onset of red blood cell feeding. Immunoelectron microscopy was employed to examine hemoglobinase location and function. In adult worms, the enzyme was associated with the gut lumen and gut epithelium. In cercariae, the protease was observed in the head gland, suggesting new roles for it.

  12. Cloning and characterization of the novel D-aspartyl endopeptidase, paenidase, from Paenibacillus sp. B38.

    PubMed

    Nirasawa, Satoru; Nakahara, Kazuhiko; Takahashi, Saori

    2018-02-27

    Paenidase is the first microorganism-derived D-aspartyl endopeptidase that specifically recognizes an internal D-Asp residue to cleave [D-Asp]-X peptide bonds. Using peptide sequences obtained from the protein, we performed PCR with degenerate primers to amplify the paenidase I-encoding gene. Nucleotide sequencing revealed that mature paenidase I consists of 322 amino acid residues and that the protein is encoded as a pro-protein with a 197-amino-acid N-terminal extension compared to the mature protein. Paenidase I exhibits amino acid sequence similarity to several penicillin-binding proteins. In addition, paenidase I was classified into peptidase family S12 based on a MEROPS database search. Family S12 contains serine-type D-Ala-D-Ala carboxypeptidases that have three active site residues (Ser, Lys, and Tyr) in the conserved motifs Ser-Xaa-Thr-Lys and Tyr-Xaa-Asn. These motifs were conserved in the primary structure of paenidase I, and the role of these residues was confirmed by site-directed mutagenesis.
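The conserved motifs named above can be located with regular expressions, treating Xaa as any residue. This is a minimal sketch; the example sequence is hypothetical (not paenidase I), and positions are reported 1-based.

```python
import re

# Xaa (any residue) is written as "." in the regular expressions.
MOTIFS = {
    "Ser-Xaa-Thr-Lys": "S.TK",
    "Tyr-Xaa-Asn": "Y.N",
}


def find_motifs(seq: str, motifs: dict = MOTIFS) -> dict:
    """Return 1-based start positions of every (possibly overlapping) motif match."""
    hits = {}
    for name, pattern in motifs.items():
        # A zero-width lookahead lets overlapping occurrences all be reported.
        hits[name] = [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq)]
    return hits


print(find_motifs("GASATKLLYANW"))
# {'Ser-Xaa-Thr-Lys': [3], 'Tyr-Xaa-Asn': [9]}
```

A motif match only flags candidate active-site residues; as the abstract notes, their roles were confirmed experimentally by site-directed mutagenesis.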

  13. Sequence swapping does not result in conformation swapping for the beta4/beta5 and beta8/beta9 beta-hairpin turns in human acidic fibroblast growth factor.

    PubMed

    Kim, Jaewon; Lee, Jihun; Brych, Stephen R; Logan, Timothy M; Blaber, Michael

    2005-02-01

    The beta-turn is the most common type of nonrepetitive structure in globular proteins, comprising ~25% of all residues; however, a detailed understanding of the effects of specific residues on beta-turn stability and conformation is lacking. Human acidic fibroblast growth factor (FGF-1) is a member of the beta-trefoil superfold and contains a total of five beta-hairpin structures (antiparallel beta-sheets connected by a reverse turn). beta-Turns related by the characteristic threefold structural symmetry of this superfold exhibit different primary structures and, in some cases, different secondary structures. As such, they represent a useful system with which to study the role that turn sequences play in determining the structure, stability, and folding of the protein. Two turns related by the threefold structural symmetry, the beta4/beta5 and beta8/beta9 turns, were subjected to both sequence-swapping and poly-glycine substitution mutations, and the effects on stability, folding, and structure were investigated. In the wild-type protein these turns are of identical length but exhibit different conformations. These conformations were retained during sequence-swapping and glycine substitution mutagenesis. The results indicate that the beta-turn structure at these positions is not determined by the turn sequence. Structural analysis suggests that residues flanking the turn are a primary structural determinant of the conformation within the turn.

  14. The primary structure of stinging nettle (Urtica dioica) agglutinin. A two-domain member of the hevein family.

    PubMed

    Beintema, J J; Peumans, W J

    1992-03-09

    The primary structure of stinging nettle (Urtica dioica) agglutinin has been determined by sequence analysis of peptides obtained from three overlapping proteolytic digests. The sequence of 80 residues consists of two hevein-like domains with the same spacing of half-cystine residues and several other conserved residues as observed earlier in other proteins with hevein-like domains. The hinge region between the two domains is four residues longer than those between the four domains in cereal lectins like wheat germ agglutinin.

  15. Streptococcal phosphoenolpyruvate-sugar phosphotransferase system: amino acid sequence and site of ATP-dependent phosphorylation of HPr

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Deutscher, J.; Pevec, B.; Beyreuther, K.

    1986-10-21

    The amino acid sequence of histidine-containing protein (HPr) from Streptococcus faecalis has been determined by direct Edman degradation of intact HPr and by amino acid sequence analysis of tryptic peptides, V8 proteolytic peptides, thermolytic peptides, and cyanogen bromide cleavage products. HPr from S. faecalis was found to contain 89 amino acid residues, corresponding to a molecular weight of 9438. The amino acid sequence of HPr from S. faecalis shows extended homology to the primary structure of HPr proteins from other bacteria. Besides the phosphoenolpyruvate-dependent phosphorylation of a histidyl residue in HPr, catalyzed by enzyme I of the bacterial phosphotransferase system, HPr was also found to be phosphorylated at a seryl residue in an ATP-dependent, protein kinase-catalyzed reaction. The site of ATP-dependent phosphorylation in HPr of S. faecalis has now been determined. [32P]P-Ser-HPr was digested with three different proteases, and in each case a single labeled peptide was isolated. Following digestion with subtilisin, they obtained a peptide with the sequence -(P)Ser-Ile-Met-. Using chymotrypsin, they isolated a peptide with the sequence -Ser-Val-Asn-Leu-Lys-(P)Ser-Ile-Met-Gly-Val-Met-. The longest labeled peptide was obtained with V8 staphylococcal protease. According to amino acid analysis, this peptide contained 36 of the 89 amino acid residues of HPr. The following sequence of 12 amino acid residues of the V8 peptide was determined: -Tyr-Lys-Gly-Lys-Ser-Val-Asn-Leu-Lys-(P)Ser-Ile-Met-. Thus, the site of ATP-dependent phosphorylation was determined to be Ser-46 within the primary structure of HPr.

  16. Automated quantitative assessment of proteins' biological function in protein knowledge bases.

    PubMed

    Mayr, Gabriele; Lepperdinger, Günter; Lackner, Peter

    2008-01-01

    Primary protein sequence data are archived in databases together with information on the corresponding biological functions. In this respect, UniProt/Swiss-Prot is currently the most comprehensive collection, and it is routinely consulted when trying to unravel the biological role of hypothetical proteins. Bioscientists frequently extract single entries and evaluate them subjectively. In the absence of a standardized procedure for scoring the existing knowledge about individual proteins, we report a computer-assisted method, which we applied to score the present knowledge about any given Swiss-Prot entry. This quantitative score allows proteins to be compared not only by sequence but also by how well their function is understood. pfs analysis may also be applied for quality control of individual entries or for database management, to rank entry listings.

  17. Crystal structure of Bacillus subtilis YdaF protein: a putative ribosomal N-acetyltransferase.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brunzelle, J. S.; Wu, R.; Korolev, S. V.

    2004-12-01

    Comparative sequence analysis suggests that the ydaF gene encodes a protein (YdaF) that functions as an N-acetyltransferase, more specifically a ribosomal N-acetyltransferase. Sequence analysis using the Basic Local Alignment Search Tool (BLAST) suggests that YdaF belongs to a large family of proteins (199 proteins found in 88 unique species of bacteria, archaea, and eukaryotes). YdaF also belongs to COG1670, which includes the Escherichia coli RimL protein that is known to acetylate ribosomal protein L12. N-acetylation has been found in all kingdoms. N-acetyltransferases (NATs) catalyze the transfer of an acetyl group from acetyl-CoA (AcCoA) to a primary amino group. For example, NATs can acetylate the N-terminal α-amino group, the ε-amino group of lysine residues, aminoglycoside antibiotics, spermine/spermidine, or arylalkylamines such as serotonin. The crystal structure of the putative ribosomal NAT protein YdaF from Bacillus subtilis presented here was determined as part of the Midwest Center for Structural Genomics. The structure maintains the conserved tertiary structure of other known NATs and shows high sequence similarity in the presumed AcCoA-binding pocket, despite a very low overall level of sequence identity to other NATs of known structure.

  18. Molecular characterisation of Atlantic salmon paramyxovirus (ASPV): A novel paramyxovirus associated with proliferative gill inflammation

    USGS Publications Warehouse

    Falk, K.; Batts, W.N.; Kvellestad, A.; Kurath, G.; Wiik-Nielsen, J.; Winton, J.R.

    2008-01-01

    Atlantic salmon paramyxovirus (ASPV) was isolated in 1995 from gills of farmed Atlantic salmon suffering from proliferative gill inflammation. The complete genome sequence of ASPV was determined, revealing a genome 16,968 nucleotides in length consisting of six non-overlapping genes coding for the nucleo- (N), phospho- (P), matrix- (M), fusion- (F), haemagglutinin-neuraminidase- (HN) and large polymerase (L) proteins in the order 3'-N-P-M-F-HN-L-5'. The various conserved features related to virus replication found in most paramyxoviruses were also found in ASPV. These include: conserved and complementary leader and trailer sequences, tri-nucleotide intergenic regions and highly conserved transcription start and stop signal sequences. The P gene expression strategy of ASPV was like that of the respiro-, morbilli- and henipaviruses, which express the P and C proteins from the primary transcript and edit a portion of the mRNA to encode V and W proteins. Sequence similarities among various features related to virus replication, pairwise comparisons of all deduced ASPV protein sequences with homologous regions from other members of the family Paramyxoviridae, and phylogenetic analyses of these amino acid sequences suggested that ASPV was a novel member of the sub-family Paramyxovirinae, most closely related to the respiroviruses. © 2008 Elsevier B.V. All rights reserved.

  19. The Genome of the Obligately Intracellular Bacterium Ehrlichia canis Reveals Themes of Complex Membrane Structure and Immune Evasion Strategies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mavromatis, K; Doyle, C Kuyler; Lykidis, A

    2006-01-01

    Ehrlichia canis, a small obligately intracellular, tick-transmitted, gram-negative α-proteobacterium, is the primary etiologic agent of globally distributed canine monocytic ehrlichiosis. Complete genome sequencing revealed that the E. canis genome consists of a single circular chromosome of 1,315,030 bp, predicted to encode 925 proteins, 40 stable RNA species and 17 putative pseudogenes, with a substantial proportion of noncoding sequence (27%). Interesting genome features include a large set of proteins with transmembrane helices and/or signal sequences and a unique serine-threonine bias associated with the potential for O-glycosylation that was prominent in proteins associated with pathogen-host interactions. Furthermore, two paralogous protein families associated with immune evasion were identified, one of which contains poly(G-C) tracts, suggesting that they may play a role in phase variation and facilitation of persistent infections. Genes associated with pathogen-host interactions were identified, including a small group encoding proteins (n = 12) with tandem repeats and another group encoding proteins with eukaryote-like ankyrin domains (n = 7).

  20. The genome of obligately intracellular Ehrlichia canis reveals themes of complex membrane structure and immune evasion strategies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mavromatis, K.; Kuyler Doyle, C.; Lykidis, A.

    2005-09-01

    Ehrlichia canis, a small obligately intracellular, tick-transmitted, gram-negative α-proteobacterium, is the primary etiologic agent of globally distributed canine monocytic ehrlichiosis. Complete genome sequencing revealed that the E. canis genome consists of a single circular chromosome of 1,315,030 bp, predicted to encode 925 proteins, 40 stable RNA species and 17 putative pseudogenes, with a substantial proportion of non-coding sequence (27 percent). Interesting genome features include a large set of proteins with transmembrane helices and/or signal sequences, and a unique serine-threonine bias associated with the potential for O-glycosylation that was prominent in proteins associated with pathogen-host interactions. Furthermore, two paralogous protein families associated with immune evasion were identified, one of which contains poly G:C tracts, suggesting that they may play a role in phase variation and facilitation of persistent infections. Proteins associated with pathogen-host interactions were identified, including a small group of proteins (12) with tandem repeats and another with eukaryotic-like ankyrin domains (7).

  1. Co-opting sulphur-carrier proteins from primary metabolic pathways for 2-thiosugar biosynthesis.

    PubMed

    Sasaki, Eita; Zhang, Xuan; Sun, He G; Lu, Mei-yeh Jade; Liu, Tsung-lin; Ou, Albert; Li, Jeng-yi; Chen, Yu-hsiang; Ealick, Steven E; Liu, Hung-wen

    2014-06-19

    Sulphur is an essential element for life and is ubiquitous in living systems. Yet how the sulphur atom is incorporated into many sulphur-containing secondary metabolites is poorly understood. For bond formation between carbon and sulphur in primary metabolites, the major ionic sulphur sources are the persulphide and thiocarboxylate groups on sulphur-carrier (donor) proteins. Each group is post-translationally generated through the action of a specific activating enzyme. In all reported bacterial cases, the gene encoding the enzyme that catalyses the carbon-sulphur bond formation reaction and that encoding the cognate sulphur-carrier protein exist in the same gene cluster. To study the production of the 2-thiosugar moiety in BE-7585A, an antibiotic from Amycolatopsis orientalis, we identified a putative 2-thioglucose synthase, BexX, whose protein sequence and mode of action seem similar to those of ThiG, the enzyme that catalyses thiazole formation in thiamine biosynthesis. However, no gene encoding a sulphur-carrier protein could be located in the BE-7585A cluster. Subsequent genome sequencing uncovered a few genes encoding sulphur-carrier proteins that are probably involved in the biosynthesis of primary metabolites but only one activating enzyme gene in the A. orientalis genome. Further experiments showed that this activating enzyme can adenylate each of these sulphur-carrier proteins and probably also catalyses the subsequent thiolation, through its rhodanese domain. A proper combination of these sulphur-delivery systems is effective for BexX-catalysed 2-thioglucose production. The ability of BexX to selectively distinguish sulphur-carrier proteins is given a structural basis using X-ray crystallography. This study is, to our knowledge, the first complete characterization of thiosugar formation in nature and also demonstrates the receptor promiscuity of the A. orientalis sulphur-delivery system. 
Our results also show that co-opting the sulphur-delivery machinery of primary metabolism for the biosynthesis of sulphur-containing natural products is probably a general strategy found in nature.

  2. The complete genome sequence of the Atlantic salmon paramyxovirus (ASPV)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nylund, Stian; Karlsen, Marius; Nylund, Are

    2008-03-30

    The complete RNA genome of the Atlantic salmon paramyxovirus (ASPV), isolated from Atlantic salmon suffering from proliferative gill inflammation (PGI), has been determined. The genome is 16,965 nucleotides in length and consists of six nonoverlapping genes in the order 3'-N-P/C/V-M-F-HN-L-5', coding for the nucleocapsid, phospho-, matrix, fusion, hemagglutinin-neuraminidase and large polymerase proteins, respectively. The gene junctions contain highly conserved transcription start and stop signal sequences and trinucleotide intergenic regions similar to those of other Paramyxoviridae. The ASPV P-gene expression strategy is like that of the respiro- and morbilliviruses, which express the phosphoprotein from the primary transcript and edit a portion of the mRNA to encode the accessory proteins V and W. It also encodes the C protein by ribosomal choice of translation initiation. Pairwise comparisons of amino acid identities, and phylogenetic analysis of deduced ASPV protein sequences with homologous sequences from other Paramyxoviridae, show that ASPV has an affinity for the genus Respirovirus, but may represent a new genus within the subfamily Paramyxovirinae.

  3. Predicting residue-wise contact orders in proteins by support vector regression.

    PubMed

    Song, Jiangning; Burrage, Kevin

    2006-10-03

    The residue-wise contact order (RWCO) describes the sequence separations between a residue of interest and the residues it contacts in a protein. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides important information for reconstructing the three-dimensional structure of a protein from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and could give deep insight into protein sequence-structure relationships. We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on prediction performance: local sequence in the form of PSI-BLAST profiles; local sequence plus amino acid composition; local sequence plus molecular weight; local sequence plus secondary structure predicted by PSIPRED; local sequence plus molecular weight and amino acid composition; local sequence plus molecular weight and predicted secondary structure; and local sequence plus molecular weight, amino acid composition and predicted secondary structure. Using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we predicted the RWCO distribution with a Pearson correlation coefficient (CC) of 0.55 between predicted and observed RWCO values and a root mean square error (RMSE) of 0.82, based on a well-defined dataset of 680 protein sequences. Incorporating global features such as molecular weight and amino acid composition further improved performance, raising the CC to 0.57 and lowering the RMSE to 0.79. Adding the secondary structure predicted by PSIPRED improved performance significantly and yielded the best accuracy, with a CC of 0.60 and an RMSE of 0.78, at least comparable to other existing methods. The SVR method is competitive with, or at least comparable to, the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is well suited to estimating the raw value profiles of the samples. The successful application of SVR in this study reinforces that support vector regression is a powerful tool for extracting the protein sequence-structure relationship and for estimating protein structural profiles from amino acid sequences.
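
    To make the quantities above concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a hard distance cutoff on C-alpha coordinates (the `cutoff` of 12 and `min_sep` of 3 are illustrative choices; published RWCO definitions use a smooth contact function) and normalizes each residue's summed sequence separation by chain length. It also shows how the Pearson CC and RMSE quoted above are computed from predicted and observed profiles. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rwco(coords: Sequence[Coord], cutoff: float = 12.0, min_sep: int = 3) -> List[float]:
    """Residue-wise contact order with a hard distance cutoff (illustrative only).

    For residue i, sum |i - j| over residues j within `cutoff` in space and at
    least `min_sep` apart in sequence, then divide by the chain length.
    """
    n = len(coords)
    profile = []
    for i in range(n):
        total = 0
        for j in range(n):
            sep = abs(i - j)
            if sep >= min_sep and math.dist(coords[i], coords[j]) < cutoff:
                total += sep
        profile.append(total / n)
    return profile


def pearson_cc(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Pearson correlation coefficient between predicted and observed values."""
    n = len(pred)
    mp, mo = sum(pred) / n, sum(obs) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    return cov / (sp * so)


def rmse(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Root mean square error between predicted and observed values."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))


# Toy example: a straight chain with 3.8 spacing between consecutive residues
chain = [(3.8 * k, 0.0, 0.0) for k in range(5)]
print(rwco(chain))  # [0.6, 0.6, 0.0, 0.6, 0.6]
```

    In a real pipeline the observed profile would come from solved structures and the predicted profile from the regression model; the two metric functions are independent of how the predictions were made.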

  4. The leukocyte common antigen (CD45): a putative receptor-linked protein tyrosine phosphatase.

    PubMed Central

    Charbonneau, H; Tonks, N K; Walsh, K A; Fischer, E H

    1988-01-01

    A major protein tyrosine phosphatase (PTPase 1B) has been isolated in essentially homogeneous form from the soluble and particulate fractions of human placenta. Unexpectedly, partial amino acid sequences displayed no homology with the primary structures of the protein Ser/Thr phosphatases deduced from cDNA clones. However, the sequence is strikingly similar to the tandem C-terminal homologous domains of the leukocyte common antigen (CD45). A 157-residue segment of PTPase 1B displayed 40% and 33% sequence identity with corresponding regions from cytoplasmic domains I and II of human CD45. Similar degrees of identity have been observed among the catalytic domains of families of regulatory proteins such as protein kinases and cyclic nucleotide phosphodiesterases. On this basis, it is proposed that the CD45 family has protein tyrosine phosphatase activity and may represent a set of cell-surface receptors involved in signal transduction. This suggests that the repertoire of signal transduction mechanisms may include the direct control of an intracellular protein tyrosine phosphatase, offering the possibility of a regulatory balance with those protein tyrosine kinases that act at the internal surface of the membrane. PMID:2845400
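
    Percent identity figures such as the 40% and 33% above depend on how the denominator is chosen. A minimal Python sketch, assuming two sequences that are already aligned with "-" for gaps and counting identities over gap-free columns only; this is one of several conventions, and the abstract does not say which one was used. The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical residues over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns with a residue in both sequences
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


print(percent_identity("ACDE-G", "ACDFHG"))  # 80.0
```

    Dividing by the full alignment length or by the shorter sequence instead would give a lower figure for the same alignment, so identities are only comparable when computed the same way.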

  5. Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing.

    PubMed

    Kanda, Kojun; Pflug, James M; Sproul, John S; Dasenko, Mark A; Maddison, David R

    2015-01-01

    In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles being more successfully sequenced.
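
    Recovery statistics like "63% of approximately 41,900 nucleotides" and "at least 50% of the target length for 34 of 67 fragments" amount to measuring how much of each target fragment is covered by assembled sequence. A minimal sketch, assuming hits are given as hypothetical half-open [start, end) intervals on each target fragment (for example, from aligning contigs to the reference); overlapping hits are merged so no position is counted twice. Names and inputs are illustrative, not the authors' pipeline.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a target fragment


def covered_length(intervals: List[Interval]) -> int:
    """Number of target positions covered by the union of possibly overlapping intervals."""
    total = 0
    cur_start, cur_end = None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new one
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current run
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def recovery_summary(targets: Dict[str, int], hits: Dict[str, List[Interval]],
                     min_fraction: float = 0.5) -> Tuple[float, int]:
    """(overall fraction of target nucleotides recovered, fragments with >= min_fraction covered)."""
    covered = {name: covered_length(hits.get(name, [])) for name in targets}
    overall = sum(covered.values()) / sum(targets.values())
    n_good = sum(1 for name, length in targets.items() if covered[name] / length >= min_fraction)
    return overall, n_good


print(recovery_summary({"g1": 100, "g2": 200}, {"g1": [(0, 40), (30, 70)], "g2": [(0, 50)]}))
```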

  6. Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing

    PubMed Central

    Dasenko, Mark A.

    2015-01-01

    In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles being more successfully sequenced. PMID:26716693

  7. Specific Increase of Protein Levels by Enhancing Translation Using Antisense Oligonucleotides Targeting Upstream Open Frames.

    PubMed

    Liang, Xue-Hai; Shen, Wen; Crooke, Stanley T

    2017-01-01

    A number of diseases are caused by low levels of key proteins; therefore, increasing the amount of specific proteins in the human body is of therapeutic interest. Protein expression is downregulated by some structural or sequence elements present in the 5' UTR of mRNAs, such as upstream open reading frames (uORFs). Translation initiation from uORF(s) reduces translation from the downstream primary ORF encoding the main protein product in the same mRNA, leading to less efficient protein expression. It is therefore possible to use antisense oligonucleotides (ASOs) to specifically inhibit translation of the uORF by base-pairing with the uAUG region of the mRNA, redirecting the translation machinery to initiate from the primary AUG site. Here we review recent findings that translation of specific mRNAs can be enhanced using ASOs targeting uORF regions. Appropriately designed and optimized ASOs are highly specific; they act in a sequence- and position-dependent manner, with very minor off-target effects. Protein levels can be increased using this approach in different types of human and mouse cells and, importantly, also in mice. Since uORFs are present in around half of human mRNAs, uORF-targeting ASOs may have valuable potential as research tools and as therapeutics to increase the levels of proteins for a variety of genes.
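
    The targeting logic described above (find an upstream AUG before the main start codon, then design an oligonucleotide that base-pairs with the uAUG region) can be sketched on a toy sequence. This is a hypothetical illustration, not the reviewed authors' design procedure: the helper names and the `flank` length are assumptions, and real ASO design also weighs chemistry, length and target accessibility, none of which is modeled here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_dna(mrna: str) -> str:
    """Work in the DNA alphabet so U- or T-containing input is handled the same way."""
    return mrna.upper().replace("U", "T")


def upstream_augs(seq: str, main_aug: int) -> list:
    """0-based positions of AUG (ATG) codons that start before the main AUG."""
    return [i for i in range(main_aug) if seq.startswith("ATG", i)]


def uorf_end(seq: str, start: int):
    """End (exclusive) of the ORF beginning at `start`, or None if there is no in-frame stop."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i + 3
    return None


def antisense_to(seq: str, aug: int, flank: int = 8) -> str:
    """Reverse complement of the AUG plus `flank` nucleotides on each side (hypothetical design rule)."""
    lo, hi = max(0, aug - flank), min(len(seq), aug + 3 + flank)
    return seq[lo:hi].translate(COMPLEMENT)[::-1]


# Toy 5' region: one uAUG at position 2, main AUG at position 13
seq = to_dna("GGAUGCCCUAAGGAUGAAA")
uaugs = upstream_augs(seq, main_aug=13)
print(uaugs, uorf_end(seq, uaugs[0]), antisense_to(seq, uaugs[0], flank=2))
# [2] 11 GGCATCC
```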

  8. Efficient use of unlabeled data for protein sequence classification: a comparative study

    PubMed Central

    Kuksa, Pavel; Huang, Pai-Hsi; Pavlovic, Vladimir

    2009-01-01

    Background Recent studies in computational primary protein sequence analysis have leveraged the power of unlabeled data. For example, predictive models based on string kernels trained on sequences known to belong to particular folds or superfamilies (the so-called labeled data set) can attain significantly improved accuracy if these data are supplemented with protein sequences that lack any class tags (the unlabeled data). In this study, we present a principled and biologically motivated computational framework that exploits the unlabeled data more effectively by using only the sequence regions that are more likely to be biologically relevant, for better prediction accuracy. Because over-represented sequences in large uncurated databases may bias the estimation of computational models that rely on unlabeled data, we also propose a method to remove this bias and improve the performance of the resulting classifiers. Results Combined with state-of-the-art string kernels, our proposed computational framework achieves very accurate semi-supervised protein remote fold and homology detection on three large unlabeled databases. It outperforms current state-of-the-art methods and exhibits a significant reduction in running time. Conclusion The unlabeled sequences used in the semi-supervised setting resemble unpolished gemstones: when used as-is, they may carry unnecessary features and hence compromise classification accuracy, but once cut and polished, they improve the accuracy of the classifiers considerably. PMID:19426450
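
    String kernels compare sequences through the substrings they share. The simplest is the k-mer spectrum kernel, sketched below as an illustration of the idea; it is not the specific kernel or semi-supervised framework used in this study. Function names and the default `k` are assumptions.

```python
import math
from collections import Counter


def spectrum(seq: str, k: int = 3) -> Counter:
    """Counts of every contiguous k-mer in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a: str, b: str, k: int = 3) -> int:
    """Inner product of the two k-mer count vectors."""
    sa, sb = spectrum(a, k), spectrum(b, k)
    return sum(count * sb[kmer] for kmer, count in sa.items())  # missing k-mers count as 0


def normalized_spectrum_kernel(a: str, b: str, k: int = 3) -> float:
    """Cosine-normalized kernel, so that every sequence has similarity 1 with itself."""
    return spectrum_kernel(a, b, k) / math.sqrt(spectrum_kernel(a, a, k) * spectrum_kernel(b, b, k))


print(spectrum_kernel("GAGA", "AGAG", k=2), normalized_spectrum_kernel("GAGA", "AGAG", k=2))  # 4 0.8
```

    A kernel like this plugs into an SVM or nearest-neighbour classifier; semi-supervised variants typically enrich each sequence's representation with information from similar unlabeled sequences.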

  9. Predicting Flavonoid UGT Regioselectivity

    PubMed Central

    Jackson, Rhydon; Knisley, Debra; McIntosh, Cecilia; Pfeiffer, Phillip

    2011-01-01

    Machine learning was applied to a challenging and biologically significant protein classification problem: the prediction of flavonoid UGT acceptor regioselectivity from primary sequence. Novel indices characterizing graphical models of residues were proposed and found to be widely distributed among existing amino acid indices and to cluster residues appropriately. UGT subsequences biochemically linked to regioselectivity were modeled as sets of index sequences. Several learning techniques incorporating these UGT models were compared with classifications based on standard sequence alignment scores. These techniques included an application of time series distance functions to protein classification. Time series distances defined on the index sequences were used in nearest neighbor and support vector machine classifiers. Additionally, Bayesian neural network classifiers were applied to the index sequences. The experiments identified improvements over the nearest neighbor and support vector machine classifications relying on standard alignment similarity scores, as well as strong correlations between specific subsequences and regioselectivities. PMID:21747849
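
    The idea of treating residue index sequences as time series can be sketched as follows. This assumes the Kyte-Doolittle hydropathy scale as a stand-in index (the study proposed its own graph-based indices) and uses dynamic time warping (DTW) as one example of a time series distance inside a 1-nearest-neighbour classifier. Function names and the toy training set are hypothetical.

```python
from typing import List, Sequence, Tuple

# Kyte-Doolittle hydropathy values, used here only as an example amino acid index
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def index_sequence(seq: str) -> List[float]:
    """Map each residue to its index value, giving a numeric 'time series'."""
    return [KYTE_DOOLITTLE[aa] for aa in seq.upper()]


def dtw(x: Sequence[float], y: Sequence[float]) -> float:
    """Classic O(n*m) dynamic time warping distance with absolute-difference cost."""
    inf = float("inf")
    d = [[inf] * (len(y) + 1) for _ in range(len(x) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            cost = abs(x[i - 1] - y[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(x)][len(y)]


def nearest_neighbour_label(query: str, training: List[Tuple[str, str]]) -> str:
    """Label of the training subsequence with the smallest DTW distance to the query."""
    q = index_sequence(query)
    return min(training, key=lambda item: dtw(q, index_sequence(item[0])))[1]


training = [("IVVL", "A"), ("DDEK", "B")]
print(nearest_neighbour_label("IIVL", training))  # A
```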

  10. PredPPCrys: accurate prediction of sequence cloning, protein production, purification and crystallization propensity from protein sequences using multi-step heterogeneous feature fusion and selection.

    PubMed

    Wang, Huilin; Wang, Mingjun; Tan, Hao; Li, Yuan; Zhang, Ziding; Song, Jiangning

    2014-01-01

    X-ray crystallography is the primary approach to solving the three-dimensional structure of a protein. However, a major bottleneck of this method is the failure of multi-step experimental procedures to yield diffraction-quality crystals, including sequence cloning, protein material production, purification, crystallization and, ultimately, structural determination. Accordingly, predicting from its sequence the propensity of a protein to successfully undergo these experimental procedures may help narrow down laborious experimental efforts and facilitate target selection. A number of bioinformatics methods based on protein sequence information have been developed for this purpose. However, our knowledge of the important determinants of the propensity of a protein sequence to produce high diffraction-quality crystals remains largely incomplete. In practice, most existing methods perform worse when evaluated on larger and updated datasets. To address this problem, we constructed an up-to-date dataset as the benchmark and subsequently developed a new approach, termed 'PredPPCrys', using the support vector machine (SVM). Using a comprehensive set of multifaceted sequence-derived features in combination with a novel multi-step feature selection strategy, we identified and characterized the relative importance and contribution of each feature type to the prediction performance of five individual experimental steps required for successful crystallization. The resulting optimal candidate features were used as inputs to build the first-level SVM predictor (PredPPCrys I). Next, the prediction outputs of PredPPCrys I were used as the input to build second-level SVM classifiers (PredPPCrys II), which led to significantly enhanced prediction performance. Benchmarking experiments indicated that our PredPPCrys method outperforms most existing procedures on both up-to-date and previous datasets. In addition, the predicted crystallization targets of currently non-crystallizable proteins are provided as compendium data, which are anticipated to facilitate target selection and design for the worldwide structural genomics consortium. PredPPCrys is freely available at http://www.structbioinfor.org/PredPPCrys.
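
    Sequence-derived features of the kind fed to such predictors can be as simple as amino acid composition, length, and mean hydropathy (GRAVY). The sketch below computes those three as an example feature vector; it is an assumption for illustration and not PredPPCrys's actual feature set, which the abstract describes only as multifaceted. The function name is hypothetical.

```python
from collections import Counter
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values; their mean over a sequence is the GRAVY score
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_features(seq: str) -> Dict[str, float]:
    """Composition (20 fractions), length and GRAVY for a standard-alphabet protein sequence."""
    seq = seq.upper()
    unknown = set(seq) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    n = len(seq)
    counts = Counter(seq)
    features = {f"comp_{aa}": counts[aa] / n for aa in AMINO_ACIDS}
    features["length"] = n
    features["gravy"] = sum(KYTE_DOOLITTLE[aa] for aa in seq) / n
    return features


print(sequence_features("AAGG")["comp_A"])  # 0.5
```

    Vectors like this would form the input to a first-level classifier; in a two-level scheme such as the one described, the first-level outputs then become inputs to the second level.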

  11. Label noise in subtype discrimination of class C G protein-coupled receptors: A systematic approach to the analysis of classification errors.

    PubMed

    König, Caroline; Cárdenas, Martha I; Giraldo, Jesús; Alquézar, René; Vellido, Alfredo

    2015-09-29

    The characterization of proteins in families and subfamilies, at different levels, entails the definition and use of class labels. When the assignment of a protein to a family is uncertain, or even wrong, this becomes an instance of what has come to be known as a label noise problem. Label noise has a potentially negative effect on any quantitative analysis of proteins that depends on label information. This study investigates class C of G protein-coupled receptors, which are cell membrane proteins of relevance both to biology in general and to pharmacology in particular. Their supervised classification into different known subtypes, based on primary sequence data, is hampered by label noise. The latter may stem from a combination of expert knowledge limitations and the lack of a clear correspondence between labels that mostly reflect GPCR functionality and the different representations of the protein primary sequences. In this study, we describe a systematic approach, using Support Vector Machine classifiers, to the analysis of G protein-coupled receptor misclassifications. As a proof of concept, this approach is used to assist the discovery of labeling quality problems in a curated, publicly accessible database of this type of protein. We also investigate the extent to which physico-chemical transformations of the protein sequences reflect G protein-coupled receptor subtype labeling. The candidate mislabeled cases detected with this approach are externally validated with phylogenetic trees and against further trusted sources such as the National Center for Biotechnology Information, Universal Protein Resource, European Bioinformatics Institute and Ensembl Genome Browser information repositories. In quantitative classification problems, class labels are often assumed by default to be correct. Label noise, though, is bound to be a pervasive problem in bioinformatics, where labels may be obtained indirectly through complex, many-step similarity modelling processes. In the case of G protein-coupled receptors, methods capable of singling out and characterizing those sequences with consistent misclassification behaviour are required to minimize this problem. A systematic, Support Vector Machine-based method has been proposed in this study for this purpose. The proposed method enables a filtering approach to the label noise problem and might become a support tool for database curators in proteomics.
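
    The filtering idea (flag sequences whose label consistently disagrees with what a classifier predicts for them) can be illustrated with a simpler, swapped-in technique: a k-nearest-neighbour consistency check. This is not the SVM-based method of the study. The vectors stand for hypothetical numeric representations of sequences (for example, physico-chemical transformations), and the function name and `k` are assumptions.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def flag_suspect_labels(vectors: Sequence[Sequence[float]], labels: Sequence[Hashable],
                        k: int = 3) -> List[int]:
    """Indices whose label disagrees with a strict majority of their k nearest neighbours."""
    flagged = []
    for i, (v, y) in enumerate(zip(vectors, labels)):
        # Distances to every other sample, nearest first
        neighbours = sorted(
            ((math.dist(v, w), labels[j]) for j, w in enumerate(vectors) if j != i),
            key=lambda pair: pair[0],
        )[:k]
        majority, count = Counter(lab for _, lab in neighbours).most_common(1)[0]
        if majority != y and count > k // 2:
            flagged.append(i)
    return flagged


vectors = [(0, 0), (0, 1), (1, 0), (0.5, 0.5), (10, 10), (10, 11), (11, 10)]
labels = ["a", "a", "a", "b", "b", "b", "b"]
print(flag_suspect_labels(vectors, labels))  # [3]
```

    Flagged indices are candidates for curator review, not automatic relabeling, which matches the filtering role described in the abstract.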

  12. Exploring the Sequence-based Prediction of Folding Initiation Sites in Proteins.

    PubMed

    Raimondi, Daniele; Orlando, Gabriele; Pancsa, Rita; Khan, Taushif; Vranken, Wim F

    2017-08-18

    Protein folding is a complex process that can lead to disease when it fails. Especially poorly understood are the very early stages of protein folding, which are likely defined by intrinsic local interactions between amino acids close to each other in the protein sequence. We here present EFoldMine, a method that predicts, from the primary amino acid sequence of a protein, which amino acids are likely involved in early folding events. The method is based on early folding data from hydrogen-deuterium exchange (HDX) NMR pulsed labelling experiments, and uses backbone and side-chain dynamics as well as secondary structure propensities as features. The EFoldMine predictions give insights into the folding process, as illustrated by a qualitative comparison with independent experimental observations. Furthermore, on a quantitative proteome scale, the predicted early folding residues tend to become the residues that interact the most in the folded structure, and they are often residues that display evolutionary covariation. The connection of the EFoldMine predictions with both folding pathway data and the folded protein structure suggests that the initial statistical behavior of the protein chain with respect to local structure formation has a lasting effect on its subsequent states.
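
    Per-residue predictors that use features such as dynamics and secondary structure propensities commonly describe each residue by the values in a sequence window around it, so that local context enters the model. A minimal sketch of that encoding, assuming zero padding at the chain ends and a hypothetical `half_window`; the abstract does not specify EFoldMine's exact encoding, and the input tracks here are placeholders for per-residue values.

```python
from typing import List, Sequence


def window_features(values: Sequence[float], half_window: int = 3, pad: float = 0.0) -> List[List[float]]:
    """One feature vector per residue: the values in a window centred on it, padded at the ends."""
    n = len(values)
    return [
        [values[j] if 0 <= j < n else pad for j in range(i - half_window, i + half_window + 1)]
        for i in range(n)
    ]


def combine_tracks(tracks: Sequence[Sequence[float]], half_window: int = 3) -> List[List[float]]:
    """Concatenate the windowed features of several per-residue tracks (e.g. dynamics, propensities)."""
    per_track = [window_features(t, half_window) for t in tracks]
    return [sum((feats[i] for feats in per_track), []) for i in range(len(tracks[0]))]


print(window_features([1, 2, 3], half_window=1))  # [[0.0, 1, 2], [1, 2, 3], [2, 3, 0.0]]
```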

  13. The primary structure of the thymidine kinase gene of fish lymphocystis disease virus.

    PubMed

    Schnitzler, P; Handermann, M; Szépe, O; Darai, G

    1991-06-01

    The DNA nucleotide sequence of the thymidine kinase (TK) gene of fish lymphocystis disease virus (FLDV), which has been localized between coordinates 0.678 and 0.688 of the viral genome, was determined. Analysis of the DNA nucleotide sequence located between the recognition sites of HindIII (0.669 map unit; nucleotide position 1) and AccI (nucleotide position 2032) revealed an open reading frame of 954 bp on the lower strand of this region between nucleotide positions 1868 (ATG) and 915 (TAA). It encodes a protein of 318 amino acid residues. The evolutionary relationships of the FLDV TK gene to the other known TK genes were investigated using progressive sequence alignment. These analyses revealed a high degree of diversity between the protein sequence of the FLDV TK gene and the amino acid composition of the other TKs tested. However, significant conservation was detected in several regions of the FLDV TK protein when compared with the amino acid sequences of the TKs of African swine fever virus, fowlpox virus, Shope fibroma virus, and vaccinia virus and with the amino acid sequences of the cellular cytoplasmic TKs of chicken, mouse, and man.
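
    Reporting a lower-strand ORF in forward-strand coordinates, as in "1868 (ATG) and 915 (TAA)", is a matter of scanning the reverse complement and mapping positions back. The span 1868 to 915 inclusive is 1868 - 915 + 1 = 954 bp, or 318 codons, consistent with the stated ORF length. The sketch below is a hypothetical helper, not the authors' software; it reports 1-based (start, end) pairs with start > end and includes the stop codon in the span.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def lower_strand_orfs(seq: str, min_codons: int = 2):
    """ATG-to-stop ORFs on the lower strand as 1-based forward-strand (start, end), start > end."""
    seq = seq.upper()
    n = len(seq)
    rc = reverse_complement(seq)
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= n:
            if rc[i:i + 3] == "ATG":
                j = i
                while j + 3 <= n and rc[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 <= n and (j + 3 - i) // 3 >= min_codons:
                    # rc index k (0-based) sits at forward-strand position n - k (1-based)
                    orfs.append((n - i, n - (j + 2)))
                    i = j + 3  # resume scanning after the stop codon
                    continue
            i += 3
    return orfs


print(lower_strand_orfs("CCTATTTCATGG"))  # [(10, 2)]
```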

  14. Exploiting sulphur-carrier proteins from primary metabolism for 2-thiosugar biosynthesis

    PubMed Central

    Sasaki, Eita; Zhang, Xuan; Sun, He G.; Lu, Mei-Yeh Jade; Liu, Tsung-lin; Ou, Albert; Li, Jeng-yi; Chen, Yu-hsiang; Ealick, Steven E.; Liu, Hung-wen

    2014-01-01

    Sulphur is an essential element for life and exists ubiquitously in living systems [1,2]. Yet how the sulphur atom is incorporated into many sulphur-containing secondary metabolites remains poorly understood. For C-S bond formation in primary metabolites, the major ionic sulphur sources are the protein-persulphide and protein-thiocarboxylate [3,4]. In each case, the persulphide and thiocarboxylate groups on these sulphur-carrier (donor) proteins are post-translationally generated through the action of a specific activating enzyme. In all bacterial cases reported thus far, the genes encoding the enzyme that catalyzes the actual C-S bond formation reaction and its cognate sulphur-carrier protein co-exist in the same gene cluster [5]. To study 2-thiosugar production in BE-7585A, an antibiotic from Amycolatopsis orientalis, we identified a putative 2-thioglucose synthase, BexX, whose protein sequence and mode of action appear similar to those of ThiG, the enzyme catalyzing thiazole formation in thiamin biosynthesis [6,7]. However, no sulphur-carrier protein gene could be located in the BE-7585A cluster. Subsequent genome sequencing revealed the presence of a few sulphur-carrier proteins likely involved in the biosynthesis of primary metabolites but, surprisingly, only a single activating enzyme gene in the entire genome of A. orientalis. Further experiments showed that this activating enzyme is capable of adenylating each of these sulphur-carrier proteins and likely also catalyzes the subsequent thiolation, taking advantage of its rhodanese activity. A proper combination of these sulphur delivery systems is effective for BexX-catalyzed 2-thioglucose production. The ability of BexX to selectively distinguish sulphur-carrier proteins is given a structural basis using X-ray crystallography. These studies represent the first complete characterization of thiosugar formation in nature and also demonstrate the receptor promiscuity of the sulphur-delivery system in A. orientalis. Our results also provide evidence that exploiting the sulphur-delivery machineries of primary metabolism for the biosynthesis of sulphur-containing natural products is likely a general strategy found in nature. PMID:24814342

  15. AllergenOnline: A peer-reviewed, curated allergen database to assess novel food proteins for potential cross-reactivity.

    PubMed

    Goodman, Richard E; Ebisawa, Motohiro; Ferreira, Fatima; Sampson, Hugh A; van Ree, Ronald; Vieths, Stefan; Baumert, Joseph L; Bohle, Barbara; Lalithambika, Sreedevi; Wise, John; Taylor, Steve L

    2016-05-01

    Regulators increasingly demand evaluation of the potential allergenicity of foods prior to marketing. Primary risks are the transfer of allergens or potentially cross-reactive proteins into new foods. AllergenOnline was developed in 2005 as a peer-reviewed bioinformatics platform to evaluate the risks of new dietary proteins in genetically modified organisms (GMO) and novel foods. The process used to identify suspected allergens and evaluate the evidence of allergenicity was refined between 2010 and 2015. Candidate proteins are identified from the NCBI database using keyword searches, the WHO/IUIS nomenclature database and peer-reviewed publications. Criteria to classify proteins as allergens are described. Characteristics of the protein, the source and human subjects, test methods and results are evaluated by our expert panel and archived. Food, inhalant, salivary, venom, and contact allergens are included. Users access allergen sequences through links to the NCBI database, and relevant references are listed online. Version 16 includes 1956 sequences from 778 taxonomic-protein groups that are accepted with evidence of allergic serum IgE-binding and/or biological activity. AllergenOnline provides a useful peer-reviewed tool for identifying the primary potential risks of allergy for GMOs and novel foods based on criteria described by the Codex Alimentarius Commission (2003). © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
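
    The sequence criterion most often cited from the Codex Alimentarius (2003) guideline is more than 35% identity between a new protein and a known allergen over a segment of 80 or more amino acids. The sketch below approximates that check with ungapped 80-residue windows. This is a simplification under stated assumptions: real evaluations, including those run against AllergenOnline, use gapped alignments (for example FASTA), and the function names are hypothetical.

```python
def max_window_identity(query: str, allergen: str, window: int = 80) -> float:
    """Highest ungapped identity between any `window`-residue segment of query and of allergen."""
    best = 0.0
    for i in range(len(query) - window + 1):
        q = query[i:i + window]
        for j in range(len(allergen) - window + 1):
            matches = sum(a == b for a, b in zip(q, allergen[j:j + window]))
            best = max(best, matches / window)
    return best


def exceeds_codex_window(query: str, allergen: str, window: int = 80, threshold: float = 0.35) -> bool:
    """True if some pair of windows shares more than `threshold` identity (ungapped approximation)."""
    return max_window_identity(query, allergen, window) > threshold


print(exceeds_codex_window("A" * 80, "A" * 30 + "C" * 50))  # True (30/80 = 37.5%)
```

    The brute-force double loop is quadratic in sequence length times the window size; that is acceptable for a sketch but not for screening whole databases.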

  16. Evidence for a vast peptide overlap between West Nile virus and human proteomes.

    PubMed

    Capone, Giovanni; Pagoni, Maria; Delfino, Antonella Pesce; Kanduc, Darja

    2013-10-01

    The primary amino acid sequence of the West Nile virus (WNV) polyprotein, GenBank accession number M12294, was analyzed computationally. WNV is a mosquito-borne neurotropic flavivirus that has emerged globally as a significant cause of viral encephalitis in humans. Using pentapeptides as scanning units and the perfect peptide match program from the PIR International Protein Sequence Database, we compared the WNV polyprotein with the human proteome. The WNV polyprotein showed significant sequence similarities to a number of human proteins. Several of these proteins are involved in embryogenesis, neurite outgrowth, cortical neuron branching, formation of mature synapses, semaphorin interactions, and voltage-dependent L-type calcium channel subunits. The biocomputational study suggests that common amino acid segments might represent a potential platform for further studies on the neurological pathophysiology of WNV infections. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
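
    Scanning with pentapeptides amounts to listing every 5-residue window of the viral polyprotein and looking for exact matches in the host proteome. A minimal sketch, assuming the proteome is a hypothetical dictionary of protein names to sequences; it stands in for, and is not, the PIR perfect-match program used in the study.

```python
from typing import Dict, Set


def kmers(seq: str, k: int = 5) -> Set[str]:
    """All distinct contiguous k-residue peptides in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_peptides(viral: str, proteome: Dict[str, str], k: int = 5) -> Dict[str, Set[str]]:
    """Map each viral k-mer that occurs exactly in the proteome to the proteins containing it."""
    index: Dict[str, Set[str]] = {}
    for name, seq in proteome.items():
        for peptide in kmers(seq, k):
            index.setdefault(peptide, set()).add(name)
    return {peptide: index[peptide] for peptide in kmers(viral, k) if peptide in index}


toy_proteome = {"p1": "GGMKTAYGG", "p2": "TAYIAKQRR"}
print(sorted(shared_peptides("MKTAYIAKQ", toy_proteome)))  # ['AYIAK', 'MKTAY', 'TAYIA', 'YIAKQ']
```

    Building the index over the proteome once makes each viral lookup constant time, which matters when the "proteome" is tens of thousands of proteins.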

  17. Tobacco chloroplast tRNALys(UUU) gene contains a 2.5-kilobase-pair intron: An open reading frame and a conserved boundary sequence in the intron

    PubMed Central

    Sugita, Mamoru; Shinozaki, Kazuo; Sugiura, Masahiro

    1985-01-01

    The nucleotide sequence of a tRNALys(UUU) gene on tobacco (Nicotiana tabacum) chloroplast DNA has been determined. This gene is located 215 base pairs upstream from the gene for the 32,000-dalton thylakoid membrane protein on the same DNA strand and has a 2526-base-pair intron in the anticodon loop. The intron boundary sequence does not follow the G-U/A-G rule but is similar to those of tobacco chloroplast split genes for tRNAGly(UCC) and ribosomal proteins L2 and S12. The intron contains one major open reading frame of 509 codons. The codon usage in the open reading frame resembles that observed in the genes for tobacco chloroplast proteins analyzed so far. The primary transcript of this tRNA gene is 2.7 kilobases long. PMID:16593561

  18. Tobacco chloroplast tRNA(UUU) gene contains a 2.5-kilobase-pair intron: An open reading frame and a conserved boundary sequence in the intron.

    PubMed

    Sugita, M; Shinozaki, K; Sugiura, M

    1985-06-01

    The nucleotide sequence of a tRNA(Lys)(UUU) gene on tobacco (Nicotiana tabacum) chloroplast DNA has been determined. This gene is located 215 base pairs upstream from the gene for the 32,000-dalton thylakoid membrane protein on the same DNA strand and has a 2526-base-pair intron in the anticodon loop. The intron boundary sequence does not follow the G-U/A-G rule but is similar to those of tobacco chloroplast split genes for tRNA(Gly)(UCC) and ribosomal proteins L2 and S12. The intron contains one major open reading frame of 509 codons. The codon usage in the open reading frame resembles that observed in the genes for tobacco chloroplast proteins analyzed so far. The primary transcript of this tRNA gene is 2.7 kilobases long.

  19. Cross-Specificities between cII-like Proteins and pRE-like Promoters of Lambdoid Bacteriophages

    PubMed Central

    Wulff, Daniel L.; Mahoney, Michael E.

    1987-01-01

    We have investigated the activation of transcription from the pRE promoters of phages λ, 21 and P22 by the λ and 21 cII proteins and the P22 c1 (cII-like) protein, using an in vivo system in which cII protein from a derepressed prophage activates transcription from a pRE DNA fragment on a multicopy plasmid. We find that each protein is highly specific for its own cognate pRE promoter, although measurable cross-reactions are observed. The primary recognition sequence for cII protein on λ pRE is a pair of TTGC repeats in the sequence 5'-TTGCN6TTGC-3' at the -35 region of the promoter. This same sequence is found in 21 pRE, while P22 pRE has the sequence 5'-TTGCN6TTGT-3', which is the same as that of λctr1, a pRE+ variant of λ. λctr1 pRE is half as active as λ+ pRE when assayed with either the λ cII or the P22 c1 protein. Therefore, the single base change in the P22 repeat sequence cannot explain why the P22 c1 protein is much more active with P22 pRE than with λ pRE. The dya5 mutation, a G→A change at position -43 of pRE, makes pRE a stronger promoter when assayed with the λ or 21 cII proteins or the P22 c1 protein. We conclude that efficient activation of a cII-dependent promoter by a cII protein requires sequence information in addition to the TTGC repeat sequences. We do not know which characteristics of the proteins are responsible for the specificity of each protein for its own cognate promoter. However, λdya8, which has a Glu27→Lys alteration in the λ cII protein and a cII+ phenotype, results in a mutant cII protein that is much more highly specific than wild-type cII protein for its own cognate λ pRE promoter. This is especially remarkable because the dya8 amino acid alteration makes the helix-2 region (the region of the protein predicted to make contact with the phosphodiester backbone of the DNA) of λ cII protein conform exactly to the helix-2 region of the P22 c1 protein in both charge and charge distribution. PMID:2953649
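
    The repeat patterns named above can be written as a regular expression: TTGC, any six bases, then TTGC (λ, 21) or TTGT (P22, λctr1). A minimal scanning sketch with hypothetical names; since the abstract concludes that efficient activation needs sequence information beyond the repeats, matches are only candidate sites.

```python
import re
from typing import List, Tuple

# TTGC, six arbitrary bases, then TTGC or TTGT; the lookahead allows overlapping hits
CII_REPEAT = re.compile(r"(?=(TTGC[ACGT]{6}TTG[CT]))")


def find_cii_repeats(seq: str) -> List[Tuple[int, str]]:
    """0-based start positions and matched text of TTGC-N6-TTG[C/T] repeats on the given strand."""
    return [(m.start(), m.group(1)) for m in CII_REPEAT.finditer(seq.upper())]


print(find_cii_repeats("AATTGCAAAAAATTGCAA"))  # [(2, 'TTGCAAAAAATTGC')]
```

    Only the given strand is scanned; to search both strands, run the function on the reverse complement as well.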

  20. Aromatic claw: A new fold with high aromatic content that evades structural prediction

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sachleben, Joseph R.; Adhikari, Aashish N.; Gawlak, Grzegorz

    2016-11-10

    We determined the NMR structure of a highly aromatic (13%) protein of unknown function, Aq1974 from Aquifex aeolicus (PDB ID: 5SYQ). The unusual sequence of this protein has a tryptophan content five times the normal (six tryptophan residues of 114, or 5.2%, while the average tryptophan content is 1.0%), with the tryptophans occurring in a WXW motif. It has no detectable sequence homology with known protein structures. Although its NMR spectrum suggested that the protein was rich in β-sheet, upon resonance assignment and solution structure determination the protein was found to be primarily α-helical with a small two-stranded β-sheet, with a novel fold that we have termed an Aromatic Claw. As this fold was previously unknown and the sequence unique, we submitted the sequence to CASP10 as a target for blind structural prediction. At the end of the competition, the sequence was classified as a hard template-based model; the structural relationship between the template and the experimental structure was small, and all predictions failed to predict the structure. CSRosetta was found to predict the secondary structure and its packing; however, there was little correlation between the CSRosetta score and the RMSD between the CSRosetta structure and the NMR-determined one. This work demonstrates that even for relatively small proteins, we do not yet have the capacity to accurately predict the fold for all primary sequences. The experimental discovery of new folds helps guide the improvement of structural prediction methods.
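
    Composition figures like the aromatic and tryptophan percentages above, and the positions of WXW motifs, can be computed directly from a sequence. A minimal sketch; which residues count as aromatic is an assumption here (F, W, Y by default; some analyses also include His), and the function names are hypothetical.

```python
import re
from typing import List


def residue_fraction(seq: str, residues: str = "FWY") -> float:
    """Fraction of positions occupied by the given residues (default: F, W, Y as 'aromatic')."""
    seq = seq.upper()
    return sum(seq.count(r) for r in residues) / len(seq)


def wxw_positions(seq: str) -> List[int]:
    """0-based start positions of W-X-W motifs, including overlapping ones."""
    return [m.start() for m in re.finditer(r"(?=W.W)", seq.upper())]


toy = "AWGWKFY"
print(residue_fraction(toy, "W"), residue_fraction(toy), wxw_positions(toy))
```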

  1. The TGA codons are present in the open reading frame of selenoprotein P cDNA

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hill, K.E.; Lloyd, R.S.; Read, R.

    1991-03-11

    The TGA codon in DNA has been shown to direct incorporation of selenocysteine into protein. Several proteins from bacteria and animals contain selenocysteine in their primary structures. Each of the cDNA clones of these selenoproteins contains one TGA codon in the open reading frame, which corresponds to the selenocysteine in the protein. A cDNA clone for selenoprotein P (SeP), obtained from a λZAP rat liver library, was sequenced by the dideoxy termination method. The correct reading frame was determined by comparing the deduced amino acid sequence with the amino acid sequences of several peptides from SeP. Using SeP labelled with ⁷⁵Se in vivo, the selenocysteine content of the peptides was verified by collecting carboxymethylated ⁷⁷Se-selenocysteine as it eluted from the amino acid analyzer and determining the radioactivity contained in the collected samples. Ten TGA codons are present in the open reading frame of the cDNA. Peptide fragmentation studies and the deduced sequence indicate that selenium-rich regions are located close to the carboxy terminus. Nine of the 10 selenocysteines are located in the terminal 26% of the sequence, with four in the terminal 15 amino acids. The deduced sequence codes for a protein of 385 amino acids. Cleavage of the signal peptide gives the mature protein with 366 amino acids and a calculated molecular weight of 41,052 Da. Searches of the PIR and SWISSPROT protein databases revealed no similarity with glutathione peroxidase or other selenoproteins.
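
    Counting TGA codons "in the open reading frame" means scanning codon by codon in the established reading frame, not searching for the string TGA anywhere. A minimal sketch with a hypothetical helper name; the optional `skip_last` flag, an assumption for illustration, excludes a TGA that serves as the terminating stop codon.

```python
from typing import List


def inframe_codon_positions(orf: str, codon: str = "TGA", skip_last: bool = True) -> List[int]:
    """0-based positions of `codon` read in frame from the first base of the ORF."""
    orf = orf.upper()
    last = len(orf) - len(orf) % 3  # ignore a trailing partial codon
    positions = [i for i in range(0, last, 3) if orf[i:i + 3] == codon]
    if skip_last and positions and positions[-1] == last - 3:
        positions.pop()  # treat a final TGA as the stop codon, not an internal one
    return positions


print(inframe_codon_positions("ATGTGACCCTGATAA"))  # [3, 9]
```

    Note that the out-of-frame TGA overlapping positions 4 to 6 ("GAC") is not counted, which is the point of scanning in frame.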

  2. Amino acid sequence and the cellular location of the Na(+)-dependent D-glucose symporters (SGLT1) in the ovine enterocyte and the parotid acinar cell.

    PubMed Central

    Tarpey, P S; Wood, I S; Shirazi-Beechey, S P; Beechey, R B

    1995-01-01

    The Na(+)-dependent D-glucose symporter has been shown to be located on the basolateral domain of the plasma membrane of ovine parotid acinar cells. This is in contrast to the apical location of this transporter in the ovine enterocyte. The amino acid sequences of these two proteins have been determined. They are identical. The results indicated that the signals responsible for the differential targeting of these two proteins to the apical and the basal domains of the plasma membrane are not contained within the primary amino acid sequence. PMID:7492327

  3. Sequence basis of Barnacle Cement Nanostructure is Defined by Proteins with Silk Homology

    NASA Astrophysics Data System (ADS)

    So, Christopher R.; Fears, Kenan P.; Leary, Dagmar H.; Scancella, Jenifer M.; Wang, Zheng; Liu, Jinny L.; Orihuela, Beatriz; Rittschof, Dan; Spillmann, Christopher M.; Wahl, Kathryn J.

    2016-11-01

    Barnacles adhere by producing a mixture of cement proteins (CPs) that organize into a permanently bonded layer displayed as nanoscale fibers. These cement proteins share no homology with any other marine adhesives, and a common sequence basis that defines how nanostructures function as adhesives remains undiscovered. Here we demonstrate that a significant unidentified portion of acorn barnacle cement is composed of low-complexity proteins; they are organized into repetitive sequence blocks and maintain homology to silk motifs. Proteomic analysis of aggregate bands from PAGE gels reveals an abundance of Gly/Ala/Ser/Thr repeats, exemplified by a prominent, previously unidentified 43 kDa protein in the solubilized adhesive. Low-complexity regions found throughout the cement proteome, as well as multiple lysyl oxidases and peroxidases, establish homology with silk-associated materials such as fibroin, silk gum sericin, and pyriform spidroins from spider silk. Distinct primary structures defined by homologous domains shed light on how barnacles use low complexity in nanofibers to enable adhesion, and serve as a starting point for unraveling the molecular architecture of a robust and unique class of adhesive nanostructures.
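Low-complexity regions, such as the Gly/Ala/Ser/Thr repeats described above, are often flagged by measuring compositional entropy in a sliding window. The following minimal sketch uses Shannon entropy in bits. It is a simplified stand-in rather than the SEG algorithm (which uses its own complexity measure and two-pass thresholds), and the window size and cutoff are hypothetical defaults. The abstract does not say how low complexity was assessed.

```python
import math
from collections import Counter

def shannon_entropy(window):
    """Shannon entropy (bits) of the residue composition of a window."""
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in Counter(window).values())

def low_complexity_windows(seq, window=12, max_bits=2.2):
    """Start positions of windows whose entropy is at or below max_bits.

    A window of 12 distinct residues has entropy log2(12), about 3.58 bits;
    a strict two-letter repeat such as GAGA... has exactly 1 bit.
    """
    seq = seq.upper()
    return [i for i in range(len(seq) - window + 1)
            if shannon_entropy(seq[i:i + window]) <= max_bits]
```

Overlapping hit windows can then be merged into regions. This step is omitted here for brevity.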

  4. Molecular evolution of FtsZ protein sequences encoded within the genomes of archaea, bacteria, and eukaryota.

    PubMed

    Vaughan, Sue; Wickstead, Bill; Gull, Keith; Addinall, Stephen G

    2004-01-01

    The FtsZ protein is a polymer-forming GTPase which drives bacterial cell division and is structurally and functionally related to eukaryotic tubulins. We have searched for FtsZ-related sequences in all freely accessible databases, then used strict criteria based on the tertiary structure of FtsZ and its well-characterized in vitro and in vivo properties to determine which sequences represent genuine homologues of FtsZ. We have identified 225 full-length FtsZ homologues, which we have used to document, phylum by phylum, the primary sequence characteristics of FtsZ homologues from the Bacteria, Archaea, and Eukaryota. We provide evidence for at least five independent ftsZ gene-duplication events in the bacterial kingdom and suggest the existence of three ancestral euryarchaeal FtsZ paralogues. In addition, we identify "FtsZ-like" sequences from Bacteria and Archaea that, while showing significant sequence similarity to FtsZs, are unlikely to bind and hydrolyze GTP.

  5. TANGLE: Two-Level Support Vector Regression Approach for Protein Backbone Torsion Angle Prediction from Primary Sequences

    PubMed Central

    Song, Jiangning; Tan, Hao; Wang, Mingjun; Webb, Geoffrey I.; Akutsu, Tatsuya

    2012-01-01

    The protein backbone torsion angles Phi and Psi are the rotation angles about the Cα-N bond (Phi) and the Cα-C bond (Psi). Because the linked peptide bonds are planar and rigid, these two angles essentially determine the backbone geometry of a protein. Accordingly, accurate prediction of backbone torsion angles from sequence information can assist protein structure prediction. In this study, we develop a new approach called TANGLE (Torsion ANGLE predictor) to predict protein backbone torsion angles from amino acid sequences. TANGLE uses a two-level support vector regression approach to perform real-valued torsion angle prediction. It uses a variety of features derived from amino acid sequences, including evolutionary profiles in the form of position-specific scoring matrices, predicted secondary structure, solvent accessibility, natively disordered regions, and other global sequence features. When evaluated on a large benchmark dataset of 1,526 non-homologous proteins, the mean absolute errors (MAEs) of Phi and Psi angle prediction are 27.8° and 44.6°, respectively. These errors are 1% and 3% lower, respectively, than those of ANGLOR, one of the state-of-the-art prediction tools. Moreover, TANGLE's predictions are significantly better than those of a random predictor built on an amino acid-specific basis, with p-values < 1.46e-147 and 7.97e-150, respectively, by the Wilcoxon signed-rank test. As a complementary approach to current torsion angle prediction algorithms, TANGLE should prove useful in predicting protein structural properties and in assisting protein fold recognition by applying the predicted torsion angles as restraints. TANGLE is freely accessible at http://sunflower.kuicr.kyoto-u.ac.jp/~sjn/TANGLE/. PMID:22319565
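Torsion angles are periodic, so an MAE for Phi/Psi predictions is normally computed on the circle. For example, a prediction of 170° against a true value of -170° is 20° off, not 340°. The following sketch shows that convention. The abstract does not state how TANGLE's MAE treats wrap-around, so the circular difference used here is an assumption, and the function names are hypothetical.

```python
def angular_abs_error(pred_deg, true_deg):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = abs(pred_deg - true_deg) % 360.0
    return min(d, 360.0 - d)

def mean_absolute_angular_error(preds, trues):
    """MAE over paired predicted and true angles, respecting periodicity."""
    if len(preds) != len(trues) or not preds:
        raise ValueError("need two non-empty lists of equal length")
    return sum(angular_abs_error(p, t) for p, t in zip(preds, trues)) / len(preds)
```

Taking a plain `abs(p - t)` instead would inflate errors for angles near ±180°, which are common for Psi in β-strands.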

  6. Functional genetic selection of Helix 66 in Escherichia coli 23S rRNA identified the eukaryotic-binding sequence for ribosomal protein L2

    PubMed Central

    Kitahara, Kei; Kajiura, Akimasa; Sato, Neuza Satomi; Suzuki, Tsutomu

    2007-01-01

    Ribosomal protein L2 is a highly conserved primary 23S rRNA-binding protein. L2 specifically recognizes the internal bulge sequence in Helix 66 (H66) of 23S rRNA and is localized to the intersubunit space through formation of bridge B7b with 16S rRNA. The L2-binding site in H66 is highly conserved in prokaryotic ribosomes, whereas the corresponding site in eukaryotic ribosomes has evolved into distinct classes of sequences. We performed a systematic genetic selection of randomized rRNA sequences in Escherichia coli, and isolated 20 functional variants of the L2-binding site. The isolated variants included eukaryotic as well as prokaryotic sequences. These results suggest that L2/L8e does not recognize a specific base sequence of H66, but rather a characteristic architecture of H66. The growth phenotype of the isolated variants correlated well with their capacity for subunit association. Upon continuous cultivation of a deleterious variant, we isolated two spontaneous mutations within domain IV of 23S rRNA. These mutations compensated for the variant's weak subunit association and alleviated its growth defect, implying that functional interactions between intersubunit bridges help maintain ribosomal function. PMID:17553838

  7. Structural details (kinks and non-α conformations) in transmembrane helices are intrahelically determined and can be predicted by sequence pattern descriptors

    PubMed Central

    Rigoutsos, Isidore; Riek, Peter; Graham, Robert M.; Novotny, Jiri

    2003-01-01

    One of the promising methods of protein structure prediction involves the use of amino acid sequence-derived patterns. Here we report on the creation of non-degenerate motif descriptors derived through data mining of training sets of residues taken from the transmembrane-spanning segments of polytopic proteins. These residues correspond to short regions in which there is a deviation from the regular α-helical character (i.e. π-helices, 3₁₀-helices and kinks). A ‘search engine’ derived from these motif descriptors correctly identifies, and discriminates amongst instances of the above ‘non-canonical’ helical motifs contained in the SwissProt/TrEMBL database of protein primary structures. Our results suggest that deviations from α-helicity are encoded locally in sequence patterns only about 7–9 residues long and can be determined in silico directly from the amino acid sequence. Delineation of such variations in helical habit is critical to understanding the complex structure–function relationships of polytopic proteins and for drug discovery. The success of our current methodology foretells development of similar prediction tools capable of identifying other structural motifs from sequence alone. The method described here has been implemented and is available on the World Wide Web at http://cbcsrv.watson.ibm.com/Ttkw.html. PMID:12888523
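The core idea is that short, fixed-length sequence patterns (about 7 to 9 residues) can flag non-α helical deviations. This idea can be illustrated by scanning a transmembrane segment for wildcard patterns. The following sketch represents each descriptor as a regular expression in which '.' is a wildcard position. The descriptor shown is a made-up placeholder, not one of the published motifs, and the scanner is not the authors' search engine.

```python
import re

def scan_helix_motifs(segment, descriptors):
    """Report every (possibly overlapping) match of each descriptor.

    descriptors: mapping of name -> regex with '.' as a wildcard position,
    e.g. a hypothetical 7-residue pattern "L..P..L".
    Returns sorted (start, name, matched_text) tuples, 0-based starts.
    """
    hits = []
    for name, pattern in descriptors.items():
        # The lookahead makes the match zero-width, so overlaps are all found.
        for m in re.finditer(f"(?=({pattern}))", segment.upper()):
            hits.append((m.start(), name, m.group(1)))
    return sorted(hits)
```

In practice, one would restrict the scan to predicted transmembrane spans. That step is outside this sketch.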

  8. Structural details (kinks and non-alpha conformations) in transmembrane helices are intrahelically determined and can be predicted by sequence pattern descriptors.

    PubMed

    Rigoutsos, Isidore; Riek, Peter; Graham, Robert M; Novotny, Jiri

    2003-08-01

    One of the promising methods of protein structure prediction involves the use of amino acid sequence-derived patterns. Here we report on the creation of non-degenerate motif descriptors derived through data mining of training sets of residues taken from the transmembrane-spanning segments of polytopic proteins. These residues correspond to short regions in which there is a deviation from the regular alpha-helical character (i.e. pi-helices, 3(10)-helices and kinks). A 'search engine' derived from these motif descriptors correctly identifies, and discriminates amongst instances of the above 'non-canonical' helical motifs contained in the SwissProt/TrEMBL database of protein primary structures. Our results suggest that deviations from alpha-helicity are encoded locally in sequence patterns only about 7-9 residues long and can be determined in silico directly from the amino acid sequence. Delineation of such variations in helical habit is critical to understanding the complex structure-function relationships of polytopic proteins and for drug discovery. The success of our current methodology foretells development of similar prediction tools capable of identifying other structural motifs from sequence alone. The method described here has been implemented and is available on the World Wide Web at http://cbcsrv.watson.ibm.com/Ttkw.html.

  9. Predicting protein subcellular locations using hierarchical ensemble of Bayesian classifiers based on Markov chains.

    PubMed

    Bulashevska, Alla; Eils, Roland

    2006-06-14

    The subcellular location of a protein is closely related to its function. It would be worthwhile to develop a method to predict the subcellular location of a protein when only its amino acid sequence is known. Although many efforts have been made to predict subcellular location from sequence information alone, further research is needed to improve prediction accuracy. A novel method called HensBC is introduced to predict protein subcellular location. HensBC is a recursive algorithm which constructs a hierarchical ensemble of classifiers. The classifiers used are Bayesian classifiers based on Markov chain models. We tested our method on six different datasets, among them a Gram-negative bacteria dataset, a dataset for discriminating outer membrane proteins, and an apoptosis protein dataset. We observed that our method can predict the subcellular location with high accuracy. Another advantage of the proposed method is that it can improve the prediction accuracy for some classes with few training sequences. It is therefore useful for datasets with an imbalanced distribution of classes. This study introduces an algorithm which uses only the primary sequence of a protein to predict its subcellular location. The proposed recursive scheme represents an interesting methodology for learning and combining classifiers. As empirical results indicate, the method is computationally efficient and competitive with previously reported approaches in terms of prediction accuracy. The code for the software is available upon request.
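The building block named above is a Bayesian classifier based on Markov chains. It can be sketched as one first-order transition model per class, with a sequence assigned to the class with the highest log prior plus log likelihood. This sketch is a simplified, flat classifier; it does not reproduce HensBC's recursive hierarchical ensemble. It also ignores the probability of the first residue, and the pseudocount and function names are assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def train_markov_chain(sequences, pseudocount=1.0):
    """First-order Markov chain over amino acids: log P(next | current).
    Pseudocounts keep unseen transitions from getting zero probability."""
    counts = {a: {b: pseudocount for b in AMINO_ACIDS} for a in AMINO_ACIDS}
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            if cur in counts and nxt in counts:  # skip non-standard residues
                counts[cur][nxt] += 1
    model = {}
    for cur, row in counts.items():
        total = sum(row.values())
        model[cur] = {nxt: math.log(c / total) for nxt, c in row.items()}
    return model

def log_likelihood(model, seq):
    """Sum of transition log-probabilities (initial residue ignored)."""
    return sum(model[cur][nxt] for cur, nxt in zip(seq, seq[1:])
               if cur in model and nxt in model)

def classify(models, priors, seq):
    """Class with the highest log prior + log likelihood."""
    return max(models, key=lambda c: math.log(priors[c]) + log_likelihood(models[c], seq))
```

The models could be trained per location class (for example cytoplasmic vs. outer membrane), with priors set to the class frequencies in the training data.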

  10. bfr1+, a novel gene of Schizosaccharomyces pombe which confers brefeldin A resistance, is structurally related to the ATP-binding cassette superfamily.

    PubMed Central

    Nagao, K; Taguchi, Y; Arioka, M; Kadokura, H; Takatsuki, A; Yoda, K; Yamasaki, M

    1995-01-01

    We have isolated a Schizosaccharomyces pombe gene, bfr1+, which on a multicopy plasmid vector, pDB248', confers resistance to brefeldin A (BFA), an inhibitor of intracellular protein transport. This gene encodes a novel protein of 1,531 amino acids with an intramolecular duplicated structure, each half containing a single ATP-binding consensus sequence and a set of six transmembrane sequences. This structural characteristic of the bfr1+ protein resembles that of mammalian P-glycoprotein, which, by exporting a variety of anticancer drugs, has been shown to be responsible for multidrug resistance in tumor cells. Consistent with this, S. pombe cells harboring bfr1+ on pDB248' are resistant to actinomycin D, cerulenin, and cytochalasin B, as well as to BFA. The relative positions of the ATP-binding sequences and the clusters of transmembrane sequences within the bfr1+ protein are, however, transposed in comparison with those in P-glycoprotein; the bfr1+ protein has an N-terminal ATP-binding sequence followed by transmembrane segments in each half of the molecule. The bfr1+ protein exhibited significant homology in primary and secondary structures with two recently identified multidrug resistance gene products of Saccharomyces cerevisiae, Snq2 and Sts1/Pdr5/Ydr1. The bfr1+ gene is not essential for cell growth or mating, but a delta bfr1 mutant exhibited hypersensitivity to BFA. We propose that the bfr1+ protein is another member of the ATP-binding cassette superfamily and serves as an efflux pump for various antibiotics. PMID:7883711

  11. BAYESIAN PROTEIN STRUCTURE ALIGNMENT.

    PubMed

    Rodriguez, Abel; Schmidler, Scott C

    The analysis of the three-dimensional structure of proteins is an important topic in molecular biochemistry. Structure plays a critical role in defining the function of proteins and is more strongly conserved than amino acid sequence over evolutionary timescales. A key challenge is the identification and evaluation of structural similarity between proteins; such analysis can aid in understanding the role of newly discovered proteins and help elucidate evolutionary relationships between organisms. Computational biologists have developed many clever algorithmic techniques for comparing protein structures; however, all are based on heuristic optimization criteria, which makes statistical interpretation difficult. Here we present a fully probabilistic framework for pairwise structural alignment of proteins. Our approach has several advantages, including the ability to capture alignment uncertainty and to estimate key "gap" parameters which critically affect the quality of the alignment. We show that several existing alignment methods arise as maximum a posteriori estimates under specific choices of prior distributions and error models. Our probabilistic framework is also easily extended to incorporate additional information. We demonstrate this by including primary sequence information to generate simultaneous sequence-structure alignments that can resolve ambiguities arising from structure alone. This combined model also provides a natural approach for the difficult task of estimating evolutionary distance based on structural alignments. The model is illustrated by comparison with well-established methods on several challenging protein alignment examples.

  12. Acyl carrier protein structural classification and normal mode analysis

    PubMed Central

    Cantu, David C; Forrester, Michael J; Charov, Katherine; Reilly, Peter J

    2012-01-01

    All acyl carrier protein primary and tertiary structures were gathered into the ThYme database. They are classified into 16 families by amino acid sequence similarity, with members of different families having sequences with statistically highly significant differences. These classifications are supported by tertiary structure superposition analysis. Tertiary structures from a number of families are very similar, suggesting that these families may descend from a single distant ancestor. Normal vibrational mode analysis was conducted on experimentally determined freestanding structures, showing greater fluctuations at chain termini and loops than in most helices. Their modes overlap more within families than between families. The tertiary structures of three acyl carrier protein families that lacked any known structures were also predicted. PMID:22374859
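The abstract reports that normal modes show larger fluctuations at termini and loops and that mode overlap is higher within families. One common way to obtain such per-residue fluctuations is a coarse-grained Gaussian network model (GNM) built on Cα coordinates. The abstract does not say which normal-mode method the authors used, so GNM here is a swapped-in, widely used technique. The 7 Å cutoff is a typical GNM choice, not a value from the source.

```python
import numpy as np

def gnm_fluctuations(ca_coords, cutoff=7.0):
    """Relative mean-square fluctuation per residue from a Gaussian network model.

    ca_coords: (N, 3) C-alpha coordinates in angstroms.
    Returns values proportional to <dR_i^2> (the spring constant is unknown).
    """
    xyz = np.asarray(ca_coords, dtype=float)
    n = len(xyz)
    dist = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
    contact = (dist <= cutoff) & ~np.eye(n, dtype=bool)
    # Kirchhoff matrix: -1 for each contact, residue degree on the diagonal.
    kirchhoff = -contact.astype(float)
    np.fill_diagonal(kirchhoff, contact.sum(axis=1))
    evals, evecs = np.linalg.eigh(kirchhoff)  # ascending eigenvalues
    # Drop the single zero (rigid-body) mode; assumes a connected network.
    evals, evecs = evals[1:], evecs[:, 1:]
    # Diagonal of the pseudo-inverse: sum over modes of u_ik^2 / lambda_k.
    return (evecs ** 2 / evals).sum(axis=1)

def mode_overlap(u, v):
    """|cos| of the angle between two mode vectors of equal length."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return abs(np.dot(u, v)) / (np.linalg.norm(u) * np.linalg.norm(v))
```

Even for a straight toy chain, the end residues have fewer contacts and come out with the largest fluctuations. This behavior mirrors the termini effect described in the abstract. Comparing modes across proteins with `mode_overlap` assumes the residues have already been aligned.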

  13. Cloning and characterization of the gene for an additional extracellular serine protease of Bacillus subtilis.

    PubMed

    Sloma, A; Rufo, G A; Theriault, K A; Dwyer, M; Wilson, S W; Pero, J

    1991-11-01

    We have purified a minor extracellular serine protease from a strain of Bacillus subtilis bearing null mutations in five extracellular protease genes: apr, npr, epr, bpr, and mpr (A. Sloma, C. Rudolph, G. Rufo, Jr., B. Sullivan, K. Theriault, D. Ally, and J. Pero, J. Bacteriol. 172:1024-1029, 1990). During purification, this novel protease (Vpr) was found bound in a complex in the void volume after gel filtration chromatography. The amino-terminal sequence of the purified protein was determined, and an oligonucleotide probe was constructed on the basis of the amino acid sequence. This probe was used to clone the structural gene (vpr) for this protease. The gene encodes a primary product of 806 amino acids. The amino acid sequence of the mature protein was preceded by a signal sequence of approximately 28 amino acids and a prosequence of approximately 132 amino acids. The mature protein has a predicted molecular weight of 68,197; however, the isolated protein has an apparent molecular weight of 28,500, suggesting that Vpr undergoes C-terminal processing or proteolysis. The vpr gene maps in the ctrA-sacA-epr region of the chromosome and is not required for growth or sporulation.
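A rough arithmetic check shows why the gap between the predicted and apparent molecular weights suggests substantial processing. The check uses only the approximate figures given above and assumes that mass scales linearly with chain length. Apparent molecular weights from gels are themselves approximate, so the results are order-of-magnitude estimates only.

```latex
806 - 28 - 132 \approx 646 \ \text{residues (mature chain)}, \qquad
\frac{68{,}197\ \text{Da}}{646} \approx 105.6\ \text{Da per residue}

\frac{28{,}500\ \text{Da}}{68{,}197\ \text{Da}} \approx 0.42, \qquad
\frac{28{,}500\ \text{Da}}{105.6\ \text{Da per residue}} \approx 270\ \text{residues}
```

On these assumptions, the isolated protein corresponds to only about 42% of the predicted mature chain.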

  14. Domain architecture conservation in orthologs

    PubMed Central

    2011-01-01

    Background As orthologous proteins are expected to retain function more often than other homologs, they are often used for functional annotation transfer between species. However, ortholog identification methods do not take into account changes in domain architecture, which are likely to modify a protein's function. By domain architecture we refer to the sequential arrangement of domains along a protein sequence. To assess the level of domain architecture conservation among orthologs, we carried out a large-scale study of such events between human and 40 other species spanning the entire evolutionary range. We designed a score to measure domain architecture similarity and used it to analyze differences in domain architecture conservation between orthologs and paralogs relative to the conservation of primary sequence. We also statistically characterized the extent of different types of domain swapping events across pairs of orthologs and paralogs. Results The analysis shows that, for homologs that have diverged beyond a certain threshold, orthologs exhibit greater domain architecture conservation than paralogous homologs, even when differences in average sequence divergence are compensated for. We interpret this as an indication of stronger selective pressure on orthologs than on paralogs to retain the domain architecture required for the proteins to perform a specific function. In general, orthologs as well as the closest paralogous homologs have very similar domain architectures, even at large evolutionary separation. The most common domain architecture changes observed in both ortholog and paralog pairs involved insertion/deletion of new domains, while domain shuffling and segment duplication/deletion were very infrequent. Conclusions On the whole, our results support the hypothesis that function conservation between orthologs demands higher domain architecture conservation than other types of homologs, relative to primary sequence conservation. This supports the notion that orthologs are functionally more similar than other types of homologs at the same evolutionary distance. PMID:21819573
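The abstract mentions a domain architecture similarity score but does not define it. The two hypothetical scores below illustrate the design choice involved:

* A multiset Jaccard index ignores domain order.
* A longest-common-subsequence ratio respects the "sequential arrangement of domains" definition given above.

Neither score is claimed to be the authors' score, and the domain labels in the example are placeholders.

```python
from collections import Counter

def domain_jaccard(arch_a, arch_b):
    """Multiset Jaccard index of two domain architectures (order ignored)."""
    a, b = Counter(arch_a), Counter(arch_b)
    union = sum((a | b).values())
    return sum((a & b).values()) / union if union else 1.0

def ordered_similarity(arch_a, arch_b):
    """Longest common subsequence of domains, normalised by the longer
    architecture, so that shuffled domain order lowers the score."""
    m, n = len(arch_a), len(arch_b)
    if max(m, n) == 0:
        return 1.0
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if arch_a[i] == arch_b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    return dp[m][n] / max(m, n)
```

For the two architectures `["SH3", "SH2", "Pkinase"]` and `["Pkinase", "SH3", "SH2"]`, the scores diverge:

* `domain_jaccard` returns 1.0, because both contain the same domains.
* `ordered_similarity` returns 2/3, because one domain has been shuffled.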

  15. Protein classification using sequential pattern mining.

    PubMed

    Exarchos, Themis P; Papaloukas, Costas; Lampros, Christos; Fotiadis, Dimitrios I

    2006-01-01

    Protein classification in terms of fold recognition can be employed to determine the structural and functional properties of a newly discovered protein. In this work sequential pattern mining (SPM) is utilized for sequence-based fold recognition. One of the most efficient SPM algorithms, cSPADE, is employed for protein primary structure analysis. Then a classifier uses the extracted sequential patterns for classifying proteins of unknown structure in the appropriate fold category. The proposed methodology exhibited an overall accuracy of 36% in a multi-class problem of 17 candidate categories. The classification performance reaches up to 65% when the three most probable protein folds are considered.
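Sequential pattern mining rests on one test: whether a pattern occurs in a sequence as an ordered subsequence, possibly with gaps. A pattern's support is the fraction of sequences that contain it. The following sketch shows both the plain test and a variant with a maximum-gap constraint. Constraints of this kind are what cSPADE adds, but the exact constraint definitions in cSPADE may differ from the hypothetical `max_gap` used here.

```python
def contains_pattern(sequence, pattern):
    """True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    # `x in iterator` consumes the iterator up to the match, enforcing order.
    return all(symbol in remaining for symbol in pattern)

def contains_with_max_gap(sequence, pattern, max_gap):
    """Like contains_pattern, but at most `max_gap` symbols may separate
    consecutive pattern elements (simple backtracking search)."""
    def match(prev, k):
        if k == len(pattern):
            return True
        start = 0 if k == 0 else prev + 1
        stop = len(sequence) if k == 0 else min(len(sequence), prev + max_gap + 2)
        return any(sequence[i] == pattern[k] and match(i, k + 1)
                   for i in range(start, stop))
    return match(-1, 0)

def support(sequences, pattern):
    """Fraction of sequences that contain `pattern`."""
    return sum(contains_pattern(s, pattern) for s in sequences) / len(sequences)
```

A fold classifier can then use the presence or absence of frequent patterns as features. This classification step is not shown here.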

  16. Coherent Somatic Mutation in Autoimmune Disease

    PubMed Central

    Ross, Kenneth Andrew

    2014-01-01

    Background Many aspects of autoimmune disease are not well understood, including the specificities of autoimmune targets and patterns of co-morbidity and cross-heritability across diseases. Prior work has provided evidence that somatic mutation caused by gene conversion and deletion at segmentally duplicated loci is relevant to several diseases. Simple tandem repeat (STR) sequence is highly mutable, both somatically and in the germ line, and somatic STR mutations are observed under inflammation. Results Protein-coding genes spanning STRs with markers of mutability are evaluated in the context of autoimmunity. These markers include germ-line variability, high total length, repeat count and/or repeat similarity. For the initiation of autoimmune disease, antigens whose autoantibodies are the first observed in a disease, termed primary autoantigens, are informative. Three primary autoantigens, thyroid peroxidase (TPO), phogrin (PTPRN2) and filaggrin (FLG), include STRs that are among the eleven longest STRs spanned by protein-coding genes. This association of primary autoantigens with long STR sequences is highly significant. Long STRs occur within twenty genes that are associated with sixteen common autoimmune diseases and atherosclerosis. The repeat within the TTC34 gene is an outlier in terms of length, and a link with systemic lupus erythematosus is proposed. Conclusions The results support the hypothesis that many autoimmune diseases are triggered by immune responses to proteins whose DNA sequence mutates somatically in a coherent, consistent fashion. Other autoimmune diseases may be caused by coherent somatic mutations in immune cells. The coherent somatic mutation hypothesis has the potential to be a comprehensive explanation for the initiation of many autoimmune diseases. PMID:24988487
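STR markers such as total length and repeat count can be illustrated with a brute-force search for the longest perfect tandem repeat in a DNA string. The following sketch is illustrative only. It finds exact repeats, whereas real STR callers tolerate mismatches and use statistical scoring. The unit-length and copy-number limits are hypothetical defaults, not the criteria used in the study.

```python
def longest_perfect_str(dna, max_unit=6, min_copies=3):
    """Longest exact tandem repeat with a unit of 1..max_unit bases.

    Returns (total_length, start, unit, copies) with a 0-based start,
    or None if no unit repeats at least min_copies times.
    """
    dna = dna.upper()
    n = len(dna)
    best = None
    for k in range(1, max_unit + 1):
        for start in range(n - k + 1):
            unit = dna[start:start + k]
            copies = 1
            # Extend while the next k bases repeat the unit exactly.
            while dna[start + copies * k:start + (copies + 1) * k] == unit:
                copies += 1
            if copies >= min_copies and (best is None or copies * k > best[0]):
                best = (copies * k, start, unit, copies)
    return best
```

This search runs in roughly quadratic time, which is fine for single genes. Genome-scale scans would call for a dedicated tool.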

  17. Mining new crystal protein genes from Bacillus thuringiensis on the basis of mixed plasmid-enriched genome sequencing and a computational pipeline.

    PubMed

    Ye, Weixing; Zhu, Lei; Liu, Yingying; Crickmore, Neil; Peng, Donghai; Ruan, Lifang; Sun, Ming

    2012-07-01

    We have designed a high-throughput system for the identification of novel crystal protein genes (cry) from Bacillus thuringiensis strains. The system was developed with two goals: (i) to acquire the mixed plasmid-enriched genomic sequence of B. thuringiensis using next-generation sequencing biotechnology, and (ii) to identify cry genes with a computational pipeline (using BtToxin_scanner). In our pipeline method, we employed three different kinds of well-developed prediction methods, BLAST, hidden Markov model (HMM), and support vector machine (SVM), to predict the presence of Cry toxin genes. The pipeline proved to be fast (average speed, 1.02 Mb/min for proteins and open reading frames [ORFs] and 1.80 Mb/min for nucleotide sequences), sensitive (it detected 40% more protein toxin genes than a keyword extraction method using genomic sequences downloaded from GenBank), and highly specific. Twenty-one strains from our laboratory's collection were selected based on their plasmid pattern and/or crystal morphology. The plasmid-enriched genomic DNA was extracted from these strains and mixed for Illumina sequencing. The sequencing data were de novo assembled, and a total of 113 candidate cry sequences were identified using the computational pipeline. Twenty-seven candidate sequences were selected on the basis of their low level of sequence identity to known cry genes, and eight full-length genes were obtained with PCR. Finally, three new cry-type genes (primary ranks) and five cry holotypes, which were designated cry8Ac1, cry7Ha1, cry21Ca1, cry32Fa1, and cry21Da1 by the B. thuringiensis Toxin Nomenclature Committee, were identified. The system described here is both efficient and cost-effective and can greatly accelerate the discovery of novel cry genes.
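Candidates were selected for their low identity to known cry genes, and new genes were assigned "primary ranks" or holotypes. Both steps rest on pairwise percent identity. The following sketch computes identity over gap-free columns of a pre-computed alignment. It then maps the best identity to the rank at which a new toxin would differ from its closest relative, using the boundaries of roughly 45%, 78% and 95% commonly cited for the B. thuringiensis toxin nomenclature. Both the identity definition and the boundaries are assumptions to check against the committee's current rules, and the function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Identity over alignment columns where neither sequence has a gap ('-').
    Other definitions (e.g. dividing by full alignment length) differ."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)

def nomenclature_rank(best_identity_pct):
    """Rank at which a new toxin would first differ from its closest known
    relative, assuming ~45/78/95% identity boundaries."""
    if best_identity_pct < 45:
        return "primary"
    if best_identity_pct < 78:
        return "secondary"
    if best_identity_pct < 95:
        return "tertiary"
    return "quaternary"
```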

  18. Purification, amino acid sequence and characterisation of kangaroo IGF-I.

    PubMed

    Yandell, C A; Francis, G L; Wheldrake, J F; Upton, Z

    1998-01-01

    Insulin-like growth factor-I (IGF-I) and IGF-II have been purified to homogeneity from kangaroo (Macropus fuliginosus) serum, thus this represents the first report of the purification, sequencing and characterisation of marsupial IGFs. N-Terminal protein sequencing reveals that there are six amino acid differences between kangaroo and human IGF-I. Kangaroo IGF-II has been partially sequenced and no differences were found between human and kangaroo IGF-II in the 53 residues identified. Thus the IGFs appear to be remarkably structurally conserved during mammalian radiation. In addition, in vitro characterisation of kangaroo IGF-I demonstrated that the functional properties of human, kangaroo and chicken IGF-I are very similar. In an assay measuring the ability of the proteins to stimulate protein synthesis in rat L6 myoblasts, all IGF-I proteins were found to be equally potent. The ability of all three proteins to compete for binding with radiolabelled human IGF-I to type-1 IGF receptors in L6 myoblasts and in Sminthopsis crassicaudata transformed lung fibroblasts, a marsupial cell line, was comparable. Furthermore, kangaroo and human IGF-I react equally in a human IGF-I RIA using a human reference standard, radiolabelled human IGF-I and a polyclonal antibody raised against recombinant human IGF-I. This study indicates that not only is the primary structure of eutherian and metatherian IGF-I conserved, but also the proteins appear to be functionally similar.

  19. Primary structure and glycosylation of the S-layer protein of Haloferax volcanii.

    PubMed Central

    Sumper, M; Berg, E; Mengele, R; Strobel, I

    1990-01-01

    The outer surface of the archaebacterium Haloferax volcanii (formerly named Halobacterium volcanii) is covered with a hexagonally packed surface (S) layer. The gene coding for the S-layer protein was cloned and sequenced. The mature polypeptide is composed of 794 amino acids and is preceded by a typical signal sequence of 34 amino acid residues. A highly hydrophobic stretch of 20 amino acids at the C-terminal end probably serves as a transmembrane domain. Clusters of threonine residues are located adjacent to this membrane anchor. The S-layer protein is a glycoprotein containing both N- and O-glycosidic bonds. Glucosyl-(1→2)-galactose disaccharides are linked to threonine residues. The primary structure and the glycosylation pattern of the S-layer glycoproteins from Haloferax volcanii and from Halobacterium halobium were compared and found to exhibit distinct differences, despite the fact that three-dimensional reconstructions from electron micrographs revealed no structural differences at least to the 2.5-nm level attained so far (M. Kessel, I. Wildhaber, S. Cohe, and W. Baumeister, EMBO J. 7:1549-1554, 1988). PMID:2123862

  20. Primary structure and glycosylation of the S-layer protein of Haloferax volcanii.

    PubMed

    Sumper, M; Berg, E; Mengele, R; Strobel, I

    1990-12-01

    The outer surface of the archaebacterium Haloferax volcanii (formerly named Halobacterium volcanii) is covered with a hexagonally packed surface (S) layer. The gene coding for the S-layer protein was cloned and sequenced. The mature polypeptide is composed of 794 amino acids and is preceded by a typical signal sequence of 34 amino acid residues. A highly hydrophobic stretch of 20 amino acids at the C-terminal end probably serves as a transmembrane domain. Clusters of threonine residues are located adjacent to this membrane anchor. The S-layer protein is a glycoprotein containing both N- and O-glycosidic bonds. Glucosyl-(1→2)-galactose disaccharides are linked to threonine residues. The primary structure and the glycosylation pattern of the S-layer glycoproteins from Haloferax volcanii and from Halobacterium halobium were compared and found to exhibit distinct differences, despite the fact that three-dimensional reconstructions from electron micrographs revealed no structural differences at least to the 2.5-nm level attained so far (M. Kessel, I. Wildhaber, S. Cohe, and W. Baumeister, EMBO J. 7:1549-1554, 1988).

  1. Solid-state mAbs and ADCs subjected to heat-stress stability conditions can be covalently modified with buffer and excipient molecules.

    PubMed

    Valliere-Douglass, John F; Lewis, Patsy; Salas-Solano, Oscar; Jiang, Shan

    2015-02-01

    We report that a unique type of chemical modification occurs on lyophilized proteins. Freeze-dried mAbs and antibody-drug conjugates (ADCs) can be covalently modified with buffer and excipient molecules on the side chains of Glu, Asp, Thr, and Ser amino acids when subjected to temperature stress. The reaction occurs primarily via condensation of common buffers and excipients such as histidine, tris, trehalose and sucrose with Glu and Asp carboxylates in the primary sequence of proteins. The reaction was also found to proceed through condensation of carboxylate-containing buffers such as citrate with Thr and Ser hydroxyls in the primary sequence of proteins. Based on the mass of the covalent adducts observed on mAbs and ADCs, it is apparent that the reaction produces water as a product. The reaction is therefore favored in low-moisture environments such as a lyophilized protein cake. Herein, we present the evidence for the covalent modification of proteins drawn from case studies of in-depth characterization of heat-stressed mAbs and ADCs in the solid state. We also demonstrate how common charge variant assays, such as imaged capillary isoelectric focusing, and mass spectrometry can be used to monitor this specific class of protein modification. © 2014 Wiley Periodicals, Inc. and the American Pharmacists Association.
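The inference that the reaction releases water rests on a simple mass balance. A condensation adduct should shift the protein mass by the mass of the excipient minus one water. The following sketch computes that expected monoisotopic shift from a molecular formula. It assumes a single condensation per adduct and uses neutral formulas, with citric acid standing in for citrate. The adduct masses actually observed in the study are not given in the abstract.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotopes.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
WATER = 2 * MONO["H"] + MONO["O"]

def monoisotopic_mass(formula):
    """Mass of a simple formula such as 'C4H11NO3' (no brackets or charges)."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONO[element] * (int(count) if count else 1)
    return total

def condensation_shift(formula):
    """Expected adduct mass shift if the molecule condenses with the protein
    and releases one water: M(molecule) - M(H2O)."""
    return monoisotopic_mass(formula) - WATER

# Neutral formulas for the buffers/excipients named in the abstract.
EXCIPIENTS = {
    "tris": "C4H11NO3",
    "histidine": "C6H9N3O2",
    "sucrose/trehalose": "C12H22O11",
    "citric acid": "C6H8O7",
}
```

Looping over `EXCIPIENTS` and printing `condensation_shift(f)` gives the expected shifts. For tris, the shift is about +103.06 Da, which can be compared with deconvoluted intact-mass or peptide-mapping data.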

  2. Molecular, biochemical, and functional characterization of a Nudix hydrolase protein that stimulates the activity of a nicotinoprotein alcohol dehydrogenase.

    PubMed

    Kloosterman, Harm; Vrijbloed, Jan W; Dijkhuizen, Lubbert

    2002-09-20

    The cytoplasmic coenzyme NAD(+)-dependent alcohol (methanol) dehydrogenase (MDH) employed by Bacillus methanolicus during growth on C(1)-C(4) primary alcohols is a decameric protein with 1 Zn(2+)-ion and 1-2 Mg(2+)-ions plus a tightly bound NAD(H) cofactor per subunit (a nicotinoprotein). Mg(2+)-ions are essential for binding of NAD(H) cofactor in MDH protein expressed in Escherichia coli. The low coenzyme NAD(+)-dependent activity of MDH with C(1)-C(4) primary alcohols is strongly stimulated by a second B. methanolicus protein (ACT), provided that MDH contains NAD(H) cofactor and Mg(2+)-ions are present in the assay mixture. Characterization of the act gene revealed the presence of the highly conserved amino acid sequence motif typical of Nudix hydrolase proteins in the deduced ACT amino acid sequence. The act gene was successfully expressed in E. coli allowing purification and characterization of active ACT protein. MDH activation by ACT involved hydrolytic removal of the nicotinamide mononucleotide NMN(H) moiety of the NAD(H) cofactor of MDH, changing its Ping-Pong type of reaction mechanism into a ternary complex reaction mechanism. Increased cellular NADH/NAD(+) ratios may reduce the ACT-mediated activation of MDH, thus preventing accumulation of toxic aldehydes. This represents a novel mechanism for alcohol dehydrogenase activity regulation.

  3. The primary structure of the Saccharomyces cerevisiae gene for 3-phosphoglycerate kinase.

    PubMed Central

    Hitzeman, R A; Hagie, F E; Hayflick, J S; Chen, C Y; Seeburg, P H; Derynck, R

    1982-01-01

    The DNA sequence of the gene for the yeast glycolytic enzyme, 3-phosphoglycerate kinase (PGK), has been obtained by sequencing part of a 3.1 kbp HindIII fragment obtained from the yeast genome. The structural gene sequence corresponds to a reading frame of 1251 bp coding for 416 amino acids with no intervening DNA sequences. The amino acid sequence is approximately 65 percent homologous with human and horse PGK protein sequences and is in general agreement with the published protein sequence for yeast PGK. As for other highly expressed structural genes in yeast, the coding sequence is highly codon biased with 95 percent of the amino acids coded for by a select 25 codons (out of 61 possible). Besides structural DNA sequence, 291 bp of 5'-flanking sequence and 286 bp of 3'-flanking sequence were determined. Transcription starts 36 nucleotides upstream from the translational start and stops 86-93 nucleotides downstream from the translational stop. These results suggest a non-polyadenylated mRNA length of 1373 to 1380 nucleotides, which is consistent with the observed length of 1500 nucleotides for polyadenylated PGK mRNA. A sequence TATATATAAA is found at 145 nucleotides upstream from the translational start. This sequence resembles the TATAAA box that is possibly associated with RNA polymerase II binding. PMID:6296791
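    The reported lengths are internally consistent, which a few lines of arithmetic confirm. This sketch assumes that the 1251-bp reading frame includes the stop codon (416 codons plus one stop) and that the 86-93 nt trailer is counted from the end of that stop codon; the function names are hypothetical:

```python
def orf_length_bp(n_amino_acids: int, include_stop: bool = True) -> int:
    """Length in bp of a reading frame encoding n amino acids, optionally plus a stop codon."""
    return 3 * (n_amino_acids + (1 if include_stop else 0))


def transcript_length(leader_nt: int, orf_bp: int, trailer_nt: int) -> int:
    """Non-polyadenylated transcript length = 5' leader + coding region + 3' trailer."""
    return leader_nt + orf_bp + trailer_nt


orf = orf_length_bp(416)                   # 416 sense codons + 1 stop codon
shortest = transcript_length(36, orf, 86)  # transcription stops 86 nt after the stop codon
longest = transcript_length(36, orf, 93)   # ... or 93 nt after it
print(orf, shortest, longest)              # prints: 1251 1373 1380
```

    The difference between these 1373-1380 nt and the observed 1500 nt mRNA would then be attributable to the poly(A) tail.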

  4. Update on Genomic Databases and Resources at the National Center for Biotechnology Information.

    PubMed

    Tatusova, Tatiana

    2016-01-01

    The National Center for Biotechnology Information (NCBI), as a primary public repository of genomic sequence data, collects and maintains enormous amounts of heterogeneous data. Data for genomes, genes, gene expression, gene variation, gene families, proteins, and protein domains are integrated with analytical, search, and retrieval resources through the NCBI website, whose text-based search and retrieval system provides a fast and easy way to navigate across diverse biological databases. Comparative genome analysis tools lead to a further understanding of evolutionary processes, quickening the pace of discovery. Recent technological innovations have ignited an explosion in genome sequencing that has fundamentally changed our understanding of the biology of living organisms. This huge increase in DNA sequence data presents new challenges for information management systems and visualization tools. New strategies have been designed to bring order to this genome sequence shockwave and to improve the usability of the associated data.

  5. Complete genome sequence of Fer-de-Lance Virus reveals a novel gene in reptilian Paramyxoviruses

    USGS Publications Warehouse

    Kurath, G.; Batts, W.N.; Ahne, W.; Winton, J.R.

    2004-01-01

    The complete RNA genome sequence of the archetype reptilian paramyxovirus, Fer-de-Lance virus (FDLV), has been determined. The genome is 15,378 nucleotides in length and consists of seven nonoverlapping genes in the order 3′ N-U-P-M-F-HN-L 5′, coding for the nucleocapsid, unknown, phospho-, matrix, fusion, hemagglutinin-neuraminidase, and large polymerase proteins, respectively. The gene junctions contain highly conserved transcription start and stop signal sequences and trinucleotide intergenic regions similar to those of other Paramyxoviridae. The FDLV P gene expression strategy is like that of rubulaviruses, which express the accessory V protein from the primary transcript and edit a portion of the mRNA to encode P and I proteins. There is also an overlapping open reading frame potentially encoding a small basic protein in the P gene. The gene designated U (unknown) encodes a deduced protein of 19.4 kDa that has no counterpart in other paramyxoviruses and has no similarity with sequences in the National Center for Biotechnology Information database. Active transcription of the U gene in infected cells was demonstrated by Northern blot analysis, and bicistronic N-U mRNA was also evident. The genomes of two other snake paramyxovirus genotypes were also found to have U genes, with 11 to 16% nucleotide divergence from the FDLV U gene. Pairwise comparisons of amino acid identities and phylogenetic analyses of all deduced FDLV protein sequences with homologous sequences from other Paramyxoviridae indicate that FDLV represents a new genus within the subfamily Paramyxovirinae. We suggest the name Ferlavirus for the new genus, with FDLV as the type species.

  6. Identification of cDNAs encoding HSP70 and HSP90 in the abalone Haliotis tuberculata: Transcriptional induction in response to thermal stress in hemocyte primary culture.

    PubMed

    Farcy, Emilie; Serpentini, Antoine; Fiévet, Bruno; Lebel, Jean-Marc

    2007-04-01

    Heat-shock proteins are a multigene family of proteins whose expression is induced by a variety of stress factors. This work reports the cloning and sequencing of HSP70 and HSP90 cDNAs in the gastropod Haliotis tuberculata. The deduced amino acid sequences of both HSP70 and HSP90 from H. tuberculata share a high degree of homology with their homologues in other species and include the typical eukaryotic HSP70 and HSP90 signature sequences. We examined their transcriptional expression pattern in abalone hemocytes exposed to thermal stress. Real-time PCR analysis indicated that both HSP70 and HSP90 mRNAs were expressed in control animals and that their levels increased rapidly after heat shock.

  7. Kinact: a computational approach for predicting activating missense mutations in protein kinases.

    PubMed

    Rodrigues, Carlos H M; Ascher, David B; Pires, Douglas E V

    2018-05-21

    Protein phosphorylation is tightly regulated due to its vital role in many cellular processes. While gain of function mutations leading to constitutive activation of protein kinases are known to be driver events of many cancers, the identification of these mutations has proven challenging. Here we present Kinact, a novel machine learning approach for predicting kinase activating missense mutations using information from sequence and structure. By adapting our graph-based signatures, Kinact represents both structural and sequence information, which are used as evidence to train predictive models. We show the combination of structural and sequence features significantly improved the overall accuracy compared to considering either primary or tertiary structure alone, highlighting their complementarity. Kinact achieved a precision of 87% and 94% and Area Under ROC Curve of 0.89 and 0.92 on 10-fold cross-validation, and on blind tests, respectively, outperforming well established tools (P < 0.01). We further show that Kinact performs equally well on homology models built using templates with sequence identity as low as 33%. Kinact is freely available as a user-friendly web server at http://biosig.unimelb.edu.au/kinact/.
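    The two headline metrics can be made concrete. The sketch below uses the textbook definitions of precision and of ROC AUC (the probability that a randomly chosen positive scores above a randomly chosen negative, with ties counted as one half); it is not Kinact's code, and the labels and scores in the example are invented:

```python
from typing import Sequence


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predicted activating mutations (label 1) that are truly activating."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example: model scores for five mutations, 1 = activating.
scores = [0.92, 0.81, 0.40, 0.35, 0.10]
labels = [1, 1, 0, 1, 0]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 threshold
print(precision(labels, predicted), roc_auc(scores, labels))
```

    Precision depends on the chosen threshold, whereas AUC summarizes ranking quality over all thresholds, which is why studies often report both.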

  8. Structural evolution of the 4/1 genes and proteins in non-vascular and lower vascular plants.

    PubMed

    Morozov, Sergey Y; Milyutina, Irina A; Bobrova, Vera K; Ryazantsev, Dmitry Y; Erokhina, Tatiana N; Zavriev, Sergey K; Agranovsky, Alexey A; Solovyev, Andrey G; Troitsky, Alexey V

    2015-12-01

    The 4/1 protein of unknown function is encoded by a single-copy gene in most higher plants. The 4/1 protein of Nicotiana tabacum (Nt-4/1 protein) has been shown to be alpha-helical and predominantly expressed in conductive tissues. Here, we report the analysis of 4/1 genes and the encoded proteins of lower land plants. Sequences of a number of 4/1 genes from liverworts, lycophytes, ferns and gymnosperms were determined and analyzed together with sequences available in databases. Most of the vascular plants were found to encode Magnoliophyta-like 4/1 proteins exhibiting the previously described gene structure and protein properties. Identification of 4/1-like proteins in hornworts, liverworts and charophyte algae (sister lineage to all land plants) but not in mosses suggests that 4/1 proteins are likely important for plant development but not required for a primary metabolic function of the plant cell. Copyright © 2015 Elsevier B.V. and Société Française de Biochimie et Biologie Moléculaire (SFBBM). All rights reserved.

  9. MUFOLD-SS: New deep inception-inside-inception networks for protein secondary structure prediction.

    PubMed

    Fang, Chao; Shang, Yi; Xu, Dong

    2018-05-01

    Protein secondary structure prediction can provide important information for protein 3D structure prediction and protein function. Deep learning offers a new opportunity to significantly improve prediction accuracy. In this article, a new deep neural network architecture, named the Deep inception-inside-inception (Deep3I) network, is proposed for protein secondary structure prediction and implemented as a software tool, MUFOLD-SS. The input to MUFOLD-SS is a carefully designed feature matrix corresponding to the primary amino acid sequence of a protein, which contains a rich set of information derived from individual amino acids as well as from the context of the protein sequence. Specifically, the feature matrix combines physico-chemical properties of amino acids, a PSI-BLAST profile, and an HHBlits profile. MUFOLD-SS is composed of a sequence of nested inception modules and maps the input matrix to either eight states or three states of secondary structure. The architecture of MUFOLD-SS enables effective processing of local and global interactions between amino acids in making accurate predictions. In extensive experiments on multiple datasets, MUFOLD-SS significantly outperformed the best existing methods and other deep neural networks. MUFold-SS can be downloaded from http://dslsrv8.cs.missouri.edu/~cf797/MUFoldSS/download.html. © 2018 Wiley Periodicals, Inc.
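    Eight-state predictions are often collapsed to three states (helix, strand, coil) and scored by Q3, the fraction of residues whose state is correct. The mapping below (H/G/I to helix, E/B to strand, everything else to coil) is one widely used DSSP reduction, assumed here; conventions vary between studies, and the abstract does not say which one MUFOLD-SS uses:

```python
# Assumed 8-to-3 reduction of DSSP codes; other papers treat G, I or B differently.
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",            # alpha, 3-10 and pi helices -> helix
    "E": "E", "B": "E",                      # extended strand, isolated bridge -> strand
    "T": "C", "S": "C", "-": "C", "C": "C",  # turn, bend, loop -> coil
}


def to_three_state(ss8: str) -> str:
    """Collapse an eight-state string to H/E/C; unknown symbols are treated as coil."""
    return "".join(EIGHT_TO_THREE.get(s, "C") for s in ss8)


def q3(predicted: str, observed: str) -> float:
    """Per-residue three-state accuracy after reducing both strings."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("strings must be non-empty and of equal length")
    pairs = zip(to_three_state(predicted), to_three_state(observed))
    return sum(p == o for p, o in pairs) / len(observed)


print(to_three_state("HGIEBTS-"))
print(q3("HHHC", "HHEC"))
```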

  10. PredPPCrys: Accurate Prediction of Sequence Cloning, Protein Production, Purification and Crystallization Propensity from Protein Sequences Using Multi-Step Heterogeneous Feature Fusion and Selection

    PubMed Central

    Wang, Huilin; Wang, Mingjun; Tan, Hao; Li, Yuan; Zhang, Ziding; Song, Jiangning

    2014-01-01

    X-ray crystallography is the primary approach to solving the three-dimensional structure of a protein. However, a major bottleneck of this method is the failure of multi-step experimental procedures to yield diffraction-quality crystals, including sequence cloning, protein material production, purification, crystallization and ultimately, structural determination. Accordingly, predicting the propensity of a protein to successfully undergo these experimental procedures based on its sequence may help narrow down laborious experimental efforts and facilitate target selection. A number of bioinformatics methods based on protein sequence information have been developed for this purpose. However, our knowledge of the important determinants of the propensity of a protein sequence to produce high diffraction-quality crystals remains largely incomplete. In practice, most existing methods perform worse when evaluated on larger and updated datasets. To address this problem, we constructed an up-to-date dataset as the benchmark, and subsequently developed a new approach termed ‘PredPPCrys’ using the support vector machine (SVM). Using a comprehensive set of multifaceted sequence-derived features in combination with a novel multi-step feature selection strategy, we identified and characterized the relative importance and contribution of each feature type to the prediction performance of five individual experimental steps required for successful crystallization. The resulting optimal candidate features were used as inputs to build the first-level SVM predictor (PredPPCrys I). Next, prediction outputs of PredPPCrys I were used as the input to build second-level SVM classifiers (PredPPCrys II), which led to significantly enhanced prediction performance. Benchmarking experiments indicated that our PredPPCrys method outperforms most existing procedures on both up-to-date and previous datasets. 
In addition, the predicted crystallization targets of currently non-crystallizable proteins were provided as compendium data, which are anticipated to facilitate target selection and design for the worldwide structural genomics consortium. PredPPCrys is freely available at http://www.structbioinfor.org/PredPPCrys. PMID:25148528

  11. Noncoding RNA. piRNA-guided transposon cleavage initiates Zucchini-dependent, phased piRNA production.

    PubMed

    Han, Bo W; Wang, Wei; Li, Chengjian; Weng, Zhiping; Zamore, Phillip D

    2015-05-15

    PIWI-interacting RNAs (piRNAs) protect the animal germ line by silencing transposons. Primary piRNAs, generated from transcripts of genomic transposon "junkyards" (piRNA clusters), are amplified by the "ping-pong" pathway, yielding secondary piRNAs. We report that secondary piRNAs, bound to the PIWI protein Ago3, can initiate primary piRNA production from cleaved transposon RNAs. The first ~26 nucleotides (nt) of each cleaved RNA become a secondary piRNA, but the subsequent ~26 nt become the first in a series of phased primary piRNAs that bind Piwi, allowing piRNAs to spread beyond the site of RNA cleavage. The ping-pong pathway increases only the abundance of piRNAs, whereas production of phased primary piRNAs from cleaved transposon RNAs adds sequence diversity to the piRNA pool, allowing adaptation to changes in transposon sequence. Copyright © 2015, American Association for the Advancement of Science.
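    The phasing pattern described above can be pictured as tiling a cleaved transcript into consecutive fixed-length pieces: the first piece is the secondary piRNA and each later piece is a phased primary piRNA. This is a deliberately simplified toy, assuming an exact 26-nt unit for the "~26 nt" in the abstract; the function name and test sequence are hypothetical:

```python
def phase_cleaved_rna(cleaved_rna: str, unit: int = 26):
    """Toy model: split a cleaved RNA into one 5' secondary piRNA and downstream phased primary piRNAs."""
    if unit <= 0:
        raise ValueError("unit must be positive")
    secondary = cleaved_rna[:unit]
    # Consecutive, non-overlapping unit-length pieces after the secondary piRNA;
    # a trailing remainder shorter than one unit is dropped.
    primary = [cleaved_rna[i:i + unit] for i in range(unit, len(cleaved_rna) - unit + 1, unit)]
    return secondary, primary


# Hypothetical 88-nt cleaved RNA built from blocks so the phasing is easy to see.
rna = "A" * 26 + "C" * 26 + "G" * 26 + "U" * 10
secondary, primary = phase_cleaved_rna(rna)
print(secondary, primary)
```

    Each primary piece carries new sequence, which is how, in this picture, phasing adds diversity beyond the cleavage site rather than just copying the secondary piRNA.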

  12. Human immunodeficiency virus type 1 LTR TATA and TAR region sequences required for transcriptional regulation.

    PubMed Central

    Garcia, J A; Harrich, D; Soultanakis, E; Wu, F; Mitsuyasu, R; Gaynor, R B

    1989-01-01

    The human immunodeficiency virus (HIV) type 1 LTR is regulated at the transcriptional level by both cellular and viral proteins. Using HeLa cell extracts, multiple regions of the HIV LTR were found to serve as binding sites for cellular proteins. An untranslated region binding protein UBP-1 has been purified and fractions containing this protein bind to both the TAR and TATA regions. To investigate the role of cellular proteins binding to both the TATA and TAR regions and their potential interaction with other HIV DNA binding proteins, oligonucleotide-directed mutagenesis of both these regions was performed followed by DNase I footprinting and transient expression assays. In the TATA region, two direct repeats TC/AAGC/AT/AGCTGC surround the TATA sequence. Mutagenesis of both of these direct repeats or of the TATA sequence interrupted binding over the TATA region on the coding strand, but only a mutation of the TATA sequence affected in vivo assays for tat-activation. In addition to TAR serving as the site of binding of cellular proteins, RNA transcribed from TAR is capable of forming a stable stem-loop structure. To determine the relative importance of DNA binding proteins as compared to secondary structure, oligonucleotide-directed mutations in the TAR region were studied. Local mutations that disrupted either the stem or loop structure were defective in gene expression. However, compensatory mutations which restored base pairing in the stem resulted in complete tat-activation. This indicated a significant role for the stem-loop structure in HIV gene expression. To determine the role of TAR binding proteins, mutations were constructed which extensively changed the primary structure of the TAR region, yet left stem base pairing, stem energy and the loop sequence intact. These mutations resulted in decreased protein binding to TAR DNA and defects in tat-activation, and revealed factor binding specifically to the loop DNA sequence. 
Further mutagenesis that inverted this stem and loop mutation relative to the HIV LTR mRNA start site resulted in even larger decreases in tat-activation. This suggests that multiple determinants, including protein binding, the loop sequence, and RNA or DNA secondary structure, are important in tat-activation, and that tat may interact with cellular proteins binding to DNA to increase HIV gene expression. PMID:2721501
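    The logic of the compensatory-mutation experiment (a mutation on one arm breaks the stem, and a matching change on the other arm restores it) can be sketched as a simple antiparallel pairing check. The 4-bp arms below are hypothetical and are not the HIV-1 TAR sequence; allowing G:U wobble pairs is an assumption of this sketch:

```python
# Watson-Crick pairs plus G:U wobble (assumed allowed in the stem).
ALLOWED_PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def stem_pairing(five_arm: str, three_arm: str) -> list:
    """Pair the 5' arm (read 5'->3') with the 3' arm read in reverse, i.e. antiparallel."""
    if len(five_arm) != len(three_arm):
        raise ValueError("arms must have equal length in this simple model")
    return [(a, b) in ALLOWED_PAIRS for a, b in zip(five_arm, reversed(three_arm))]


def stem_intact(five_arm: str, three_arm: str) -> bool:
    """True only if every position in the stem forms an allowed base pair."""
    return all(stem_pairing(five_arm, three_arm))


wild_type = ("GCAG", "CUGC")    # hypothetical intact stem
disrupted = ("GCUG", "CUGC")    # A->U on the 5' arm breaks one pair
compensated = ("GCUG", "CAGC")  # U->A on the 3' arm restores pairing

for name, arms in (("wild type", wild_type), ("disrupted", disrupted), ("compensated", compensated)):
    print(name, stem_intact(*arms))
```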

  13. Ubiquitous and gene-specific regulatory 5' sequences in a sea urchin histone DNA clone coding for histone protein variants.

    PubMed Central

    Busslinger, M; Portmann, R; Irminger, J C; Birnstiel, M L

    1980-01-01

    The DNA sequences of the entire structural H4, H3, H2A and H2B genes and of their 5' flanking regions have been determined in the histone DNA clone h19 of the sea urchin Psammechinus miliaris. In clone h19 the polarity of transcription and the relative arrangement of the histone genes are identical to those in clone h22 of the same species. The histone proteins encoded by h19 DNA differ in their primary structure from those encoded by clone h22 and have been compared to histone protein sequences of other sea urchin species as well as other eukaryotes. A comparative analysis of the 5' flanking DNA sequences of the structural histone genes in both clones revealed four ubiquitous sequence motifs: a pentameric element GATCC, followed at a short distance by the Hogness box GTATAAATAG; a conserved sequence PyCATTCPu, in or near which the 5' ends of the mRNAs map in h22 DNA; and lastly a sequence A, containing the initiation codon. These sequences are also found, sometimes in modified versions, in front of other eukaryotic genes transcribed by polymerase II. When prelude sequences of isocoding histone genes in clones h19 and h22 are compared, areas of homology are seen to extend beyond the ubiquitous sequence motifs towards the divergent AT-rich spacer and terminate between approximately 140 and 240 nucleotides away from the structural gene. These prelude regions contain quite large conserved sequence blocks that are specific for each type of histone gene. PMID:7443547
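    Scanning a flanking region for the three explicitly spelled-out motifs is a small regular-expression exercise once the degenerate symbols are expanded (Py = C or T, Pu = A or G). The sketch below is illustrative only: the demo sequence is invented, the motif names are labels chosen here, and the fourth motif ("sequence A") is omitted because the abstract does not spell it out:

```python
import re

# Degenerate symbols used in the abstract: Py = pyrimidine, Pu = purine.
DEGENERATE = {"Py": "[CT]", "Pu": "[AG]"}

MOTIFS = {
    "GATCC pentamer": "GATCC",
    "Hogness box": "GTATAAATAG",
    "PyCATTCPu": "PyCATTCPu",
}


def to_regex(motif: str) -> str:
    """Expand Py/Pu shorthand into regex character classes."""
    for symbol, char_class in DEGENERATE.items():
        motif = motif.replace(symbol, char_class)
    return motif


def find_motifs(seq: str):
    """Return (name, 0-based start, matched text) for every hit, including overlapping ones."""
    seq = seq.upper()
    hits = []
    for name, motif in MOTIFS.items():
        # A zero-width lookahead lets re.finditer report overlapping matches.
        for m in re.finditer(f"(?=({to_regex(motif)}))", seq):
            hits.append((name, m.start(), m.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


demo = "AAGATCCTTGTATAAATAGCCTCATTCATG"  # invented 5'-flanking-like sequence
for hit in find_motifs(demo):
    print(hit)
```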

  14. Genes encoding calmodulin-binding proteins in the Arabidopsis genome

    NASA Technical Reports Server (NTRS)

    Reddy, Vaka S.; Ali, Gul S.; Reddy, Anireddy S N.

    2002-01-01

    Analysis of the recently completed Arabidopsis genome sequence indicates that approximately 31% of the predicted genes could not be assigned to functional categories, as they do not show any sequence similarity with proteins of known function from other organisms. Calmodulin (CaM), a ubiquitous and multifunctional Ca(2+) sensor, interacts with a wide variety of cellular proteins and modulates their activity/function in regulating diverse cellular processes. However, the primary amino acid sequence of the CaM-binding domain in different CaM-binding proteins (CBPs) is not conserved. One way to identify most of the CBPs in the Arabidopsis genome is by protein-protein interaction-based screening of expression libraries with CaM. Here, using a mixture of radiolabeled CaM isoforms from Arabidopsis, we screened several expression libraries prepared from flower meristem, seedlings, or tissues treated with hormones, an elicitor, or a pathogen. Sequence analysis of 77 positive clones that interact with CaM in a Ca(2+)-dependent manner revealed 20 CBPs, including 14 previously unknown CBPs. In addition, by searching the Arabidopsis genome sequence with the newly identified and known plant or animal CBPs, we identified a total of 27 CBPs. Among these, 16 CBPs are represented by families with 2-20 members in each family. Gene expression analysis revealed that CBPs and CBP paralogs are expressed differentially. Our data suggest that Arabidopsis has a large number of CBPs including several plant-specific ones. Although CaM is highly conserved between plants and animals, only a few CBPs are common to both plants and animals. Analysis of Arabidopsis CBPs revealed the presence of a variety of interesting domains. Our analyses identified several hypothetical proteins in the Arabidopsis genome as CaM targets, suggesting their involvement in Ca(2+)-mediated signaling networks.

  15. RF-Phos: A Novel General Phosphorylation Site Prediction Tool Based on Random Forest.

    PubMed

    Ismail, Hamid D; Jones, Ahoi; Kim, Jung H; Newman, Robert H; Kc, Dukka B

    2016-01-01

    Protein phosphorylation is one of the most widespread regulatory mechanisms in eukaryotes. Over the past decade, phosphorylation site prediction has emerged as an important problem in the field of bioinformatics. Here, we report a new method, termed Random Forest-based Phosphosite predictor 2.0 (RF-Phos 2.0), to predict phosphorylation sites given only the primary amino acid sequence of a protein as input. RF-Phos 2.0, which uses random forest with sequence and structural features, is able to identify putative sites of phosphorylation across many protein families. In side-by-side comparisons based on 10-fold cross validation and an independent dataset, RF-Phos 2.0 compares favorably to other popular mammalian phosphosite prediction methods, such as PhosphoSVM, GPS2.1, and Musite.
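    Sequence-only phosphosite predictors typically start by cutting a fixed-length window centred on every candidate serine, threonine or tyrosine and computing features from it. The sketch below shows only that windowing step; the half-width of 7 (15-residue windows) and the padding character "X" are assumptions for illustration, not RF-Phos 2.0's documented settings, and no features or random forest are implemented:

```python
def candidate_windows(sequence: str, half_width: int = 7, residues: str = "STY", pad: str = "X"):
    """Return (1-based position, residue, window) for each candidate site, padding past the termini."""
    padded = pad * half_width + sequence + pad * half_width
    windows = []
    for i, aa in enumerate(sequence):
        if aa in residues:
            # Residue i sits at padded[i + half_width], so the centred window starts at padded[i].
            windows.append((i + 1, aa, padded[i : i + 2 * half_width + 1]))
    return windows


for site in candidate_windows("MAKSPEGTLY", half_width=3):
    print(site)
```

    Padding keeps every window the same length, so sites near either terminus can be fed to the same fixed-size feature extractor.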

  16. A germin-like protein with superoxide dismutase activity in pea nodules with high protein sequence identity to a putative rhicadhesin receptor.

    PubMed

    Gucciardo, Sébastian; Wisniewski, Jean-Pierre; Brewin, Nicholas J; Bornemann, Stephen

    2007-01-01

    The cDNAs encoding three germin-like proteins (PsGER1, PsGER2a, and PsGER2b) were isolated from Pisum sativum. The coding sequence of PsGER1 transiently expressed in tobacco leaves gave a protein with superoxide dismutase activity but no detectable oxalate oxidase activity according to in-gel activity stains. The transient expression of wheat germin gf-2.8 oxalate oxidase showed oxalate oxidase but no superoxide dismutase activity under the same conditions. The superoxide dismutase activity of PsGER1 was resistant to high temperature, denaturation by detergent, and high concentrations of hydrogen peroxide. In salt-stressed pea roots, a heat-resistant superoxide dismutase activity was observed with an electrophoretic mobility similar to that of the PsGER1 protein, but this activity was below the detection limit in non-stressed or H(2)O(2)-stressed pea roots. Oxalate oxidase activity was not detected in either pea roots or nodules. Following in situ hybridization in developing pea nodules, PsGER1 transcript was detected in expanding cells just proximal to the meristematic zone and also in the epidermis, but to a lesser extent. PsGER1 is the first known germin-like protein with superoxide dismutase activity to be associated with nodules. It shared protein sequence identity with the N-terminal sequence of a putative plant receptor for rhicadhesin, a bacterial attachment protein. However, its primary location in nodules suggests functional roles other than as a rhicadhesin receptor required for the first stage of bacterial attachment to root hairs.

  17. Gene encoding a novel extracellular metalloprotease in Bacillus subtilis.

    PubMed Central

    Sloma, A; Rudolph, C F; Rufo, G A; Sullivan, B J; Theriault, K A; Ally, D; Pero, J

    1990-01-01

    The gene for a novel extracellular metalloprotease was cloned, and its nucleotide sequence was determined. The gene (mpr) encodes a primary product of 313 amino acids that has little similarity to other known Bacillus proteases. The amino acid sequence of the mature protease was preceded by a signal sequence of approximately 34 amino acids and a pro sequence of 58 amino acids. Four cysteine residues were found in the deduced amino acid sequence of the mature protein, indicating the possible presence of disulfide bonds. The mpr gene mapped in the cysA-aroI region of the chromosome and was not required for growth or sporulation. PMID:2105291

  18. Quantitative analysis of RNA-protein interactions on a massively parallel array for mapping biophysical and evolutionary landscapes

    PubMed Central

    Buenrostro, Jason D.; Chircus, Lauren M.; Araya, Carlos L.; Layton, Curtis J.; Chang, Howard Y.; Snyder, Michael P.; Greenleaf, William J.

    2015-01-01

    RNA-protein interactions drive fundamental biological processes and are targets for molecular engineering, yet quantitative and comprehensive understanding of the sequence determinants of affinity remains limited. Here we repurpose a high-throughput sequencing instrument to quantitatively measure binding and dissociation of MS2 coat protein to >10⁷ RNA targets generated on a flow-cell surface by in situ transcription and inter-molecular tethering of RNA to DNA. We decompose the binding energy contributions from primary and secondary RNA structure, finding that differences in affinity are often driven by sequence-specific changes in association rates. By analyzing the biophysical constraints and modeling mutational paths describing the molecular evolution of MS2 from low- to high-affinity hairpins, we quantify widespread molecular epistasis, and a long-hypothesized structure-dependent preference for G:U base pairs over C:A intermediates in evolutionary trajectories. Our results suggest that quantitative analysis of RNA on a massively parallel array (RNAMaP) relationships across molecular variants. PMID:24727714
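    The quantities in this abstract are connected by standard relations, written out here as an assumption (textbook definitions; the paper's exact formulation may differ). The dissociation constant is the ratio of off- and on-rates, so a lower association rate alone raises K_d even when the dissociation rate is unchanged; binding free energy follows from K_d relative to a standard concentration c° (usually 1 M); and epistasis between two mutations A and B is the deviation of their combined effect from the sum of their individual effects:

```latex
K_d = \frac{k_\mathrm{off}}{k_\mathrm{on}}, \qquad
\Delta G_\mathrm{bind} = RT \ln\!\left(\frac{K_d}{c^\circ}\right), \qquad
\Delta\Delta G^{\mathrm{epi}}_{AB} = \Delta\Delta G_{AB} - \left(\Delta\Delta G_{A} + \Delta\Delta G_{B}\right)
```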

  19. Molecular recognition of PTS-1 cargo proteins by Pex5p: implications for protein mistargeting in primary hyperoxaluria.

    PubMed

    Mesa-Torres, Noel; Tomic, Nenad; Albert, Armando; Salido, Eduardo; Pey, Angel L

    2015-02-13

    Peroxisomal biogenesis and function critically depend on the import of cytosolic proteins carrying a PTS1 sequence into this organelle upon interaction with the peroxin Pex5p. Recent structural studies have provided important insights into the molecular recognition of cargo proteins by Pex5p. Peroxisomal import is a key feature in the pathogenesis of primary hyperoxaluria type 1 (PH1), in which alanine:glyoxylate aminotransferase (AGT) undergoes mitochondrial mistargeting in about a third of patients. Here, we study the molecular recognition of PTS1 cargo proteins by Pex5p using oligopeptides and AGT variants bearing different natural PTS1 sequences, employing an array of biophysical, computational and cell biology techniques. Changes in affinity for Pex5p (spanning 3-4 orders of magnitude) reflect different thermodynamic signatures, although the complexes bury similar amounts of molecular surface overall. Structural and energetic analyses provide information on the contribution of ancillary regions and on the conformational changes induced in Pex5p and the PTS1 cargo upon complex formation. Pex5p stability in vitro is enhanced upon cargo binding, in line with the respective binding affinities. Moreover, we provide evidence that rational modulation of the AGT:Pex5p binding affinity might be a useful tool to investigate mistargeting and misfolding in PH1 by pulling the folding equilibria towards the native, peroxisomal import-competent state.
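    To put "3-4 orders of magnitude" on an energy scale, a fold change f in dissociation constant corresponds to a free-energy difference of RT ln f. This worked conversion assumes 298.15 K and the standard gas constant; the helper name is hypothetical and the numbers are derived here, not reported in the abstract:

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def ddg_from_fold_change(fold_change: float, temperature_k: float = 298.15) -> float:
    """Free-energy difference (kcal/mol) equivalent to a given fold change in K_d."""
    if fold_change <= 0:
        raise ValueError("fold change must be positive")
    return R_KCAL * temperature_k * math.log(fold_change)


for orders in (3, 4):
    print(orders, round(ddg_from_fold_change(10 ** orders), 2))
```

    Under these assumptions, the reported affinity range corresponds to roughly 4.1 to 5.5 kcal/mol, since each factor of ten is worth about 1.36 kcal/mol at room temperature.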

  20. Hidden Markov models of biological primary sequence information.

    PubMed Central

    Baldi, P; Chauvin, Y; Hunkapiller, T; McClure, M A

    1994-01-01

    Hidden Markov model (HMM) techniques are used to model families of biological sequences. A smooth and convergent algorithm is introduced to iteratively adapt the transition and emission parameters of the models from the examples in a given family. The HMM approach is applied to three protein families: globins, immunoglobulins, and kinases. In all cases, the models derived capture the important statistical characteristics of the family and can be used for a number of tasks, including multiple alignments, motif detection, and classification. For K sequences of average length N, this approach yields an effective multiple-alignment algorithm which requires O(KN²) operations, linear in the number of sequences. PMID:8302831
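    Any scheme that adapts HMM parameters from family examples needs the likelihood of a sequence under the current model, and the forward recursion computes it in time proportional to sequence length times the square of the number of states. The sketch below shows that recursion on a toy two-state model with invented parameters; it is not the authors' learning algorithm, nor a profile-HMM architecture of the kind used for globins or kinases:

```python
from typing import Dict, Sequence

Prob = Dict[str, float]


def forward_likelihood(
    obs: Sequence[str],
    states: Sequence[str],
    start_p: Prob,
    trans_p: Dict[str, Prob],
    emit_p: Dict[str, Prob],
) -> float:
    """P(obs | model) via the forward recursion (no explicit end state)."""
    if not obs:
        return 1.0
    # alpha[s] = P(first t symbols, hidden state at position t == s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: sum(alpha[r] * trans_p[r][s] for r in states) * emit_p[s][symbol]
            for s in states
        }
    return sum(alpha.values())


# Toy model with hypothetical parameters over a two-letter alphabet {x, y}.
STATES = ("A", "B")
START = {"A": 0.6, "B": 0.4}
TRANS = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
EMIT = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.2, "y": 0.8}}

print(forward_likelihood("xyx", STATES, START, TRANS, EMIT))
```

    For long sequences the products underflow, so practical implementations rescale alpha at each step or work in log space.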

  1. Sequence charge decoration dictates coil-globule transition in intrinsically disordered proteins.

    PubMed

    Firman, Taylor; Ghosh, Kingshuk

    2018-03-28

    We present an analytical theory to compute conformations of heteropolymers, applicable to disordered proteins, as a function of temperature and charge sequence. The theory describes the coil-globule transition for a given protein sequence when temperature is varied and has been benchmarked against all-atom Monte Carlo simulations (using CAMPARI) of intrinsically disordered proteins (IDPs). In addition, the model quantitatively shows how subtle alterations of charge placement in the primary sequence, while maintaining the same charge composition, can lead to significant changes in conformation, even as drastic as a coil (swelled above a purely random coil) to globule (collapsed below a random coil) transition and vice versa. The theory provides insights into how to control (enhance or suppress) these changes by tuning the temperature (or solution condition) and charge decoration. As an application, we predict the distribution of conformations (at room temperature) of all naturally occurring IDPs in the DisProt database and notice significant size variation even among IDPs with a similar composition of positive and negative charges. Based on this, we provide a new diagram-of-states delineating the sequence-conformation relation for proteins in the DisProt database. Next, we study the effect of post-translational modification, e.g., phosphorylation, on IDP conformations. Modifications as small as two-site phosphorylation can significantly alter the size of an IDP with everything else held constant (temperature, salt concentration, etc.). However, not all possible modification sites have the same effect on protein conformations; there are certain "hot spots" that can cause maximal change in conformation. The location of these "hot spots" in the parent sequence can readily be identified by using a sequence charge decoration metric originally introduced by Sawle and Ghosh. 
The ability of our model to predict conformations (both expanded and collapsed states) of IDPs at a high-throughput level can provide valuable insights into the different mechanisms by which phosphorylation/charge mutation controls IDP function.
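    The sequence charge decoration (SCD) metric is usually written as SCD = (1/N) Σ_{m<n} q_m q_n |m − n|^{1/2}, summed over all residue pairs of an N-residue chain. The sketch below implements that sum under an assumed charge model (K and R = +1, D and E = −1, all other residues including His = 0, termini ignored); more negative values indicate stronger segregation of opposite charges, which in this framework is associated with more compact conformations:

```python
import math

# Assumed charge model: K/R = +1, D/E = -1, everything else (including His) = 0.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def scd(sequence: str) -> float:
    """Sequence charge decoration: (1/N) * sum over residue pairs of q_m * q_n * sqrt(|m - n|)."""
    charges = [CHARGE.get(aa, 0) for aa in sequence.upper()]
    n_res = len(charges)
    if n_res == 0:
        raise ValueError("empty sequence")
    total = 0.0
    for m in range(1, n_res):
        for n in range(m):
            if charges[m] and charges[n]:
                total += charges[m] * charges[n] * math.sqrt(m - n)
    return total / n_res


# Same composition (two K, two E), different charge patterning.
for seq in ("KEKE", "KKEE"):
    print(seq, round(scd(seq), 3))
```

    With identical composition, the blocky KKEE yields a more negative SCD than the alternating KEKE, mirroring the abstract's point that charge placement, not just charge content, shapes IDP conformations. Modelling phosphorylation would require an additional assumption about the charge assigned to modified residues.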

  2. A domain-centric solution to functional genomics via dcGO Predictor

    PubMed Central

    2013-01-01

    Background Computational/manual annotations of protein functions are one of the first routes to making sense of a newly sequenced genome. Protein domain predictions form an essential part of this annotation process. This is due to the natural modularity of proteins, with domains as structural, evolutionary and functional units. Sometimes two, three, or more adjacent domains (called supra-domains) are the operational unit responsible for a function, e.g. via a binding site at the interface. These supra-domains have contributed to functional diversification in higher organisms. Traditionally, functional ontologies have been applied to individual proteins rather than to families of related domains and supra-domains. We expect, however, that functional signals can to some extent be carried by protein domains and supra-domains, and consequently used in function prediction and functional genomics. Results Here we present a domain-centric Gene Ontology (dcGO) perspective. We generalize a framework for automatically inferring ontological terms associated with domains and supra-domains from full-length sequence annotations. This general framework has been applied specifically to primary protein-level annotations from UniProtKB-GOA, generating GO term associations with SCOP domains and supra-domains. The resulting 'dcGO Predictor' can be used to provide functional annotation to protein sequences. The functional annotation of sequences in the Critical Assessment of Function Annotation (CAFA) has provided a valuable opportunity to validate our method and to have it assessed by the community. The functional annotation of all completely sequenced genomes has demonstrated the potential for domain-centric GO enrichment analysis to yield functional insights into newly sequenced or yet-to-be-annotated genomes. 
This generalized framework we have presented has also been applied to other domain classifications such as InterPro and Pfam, and other ontologies such as mammalian phenotype and disease ontology. The dcGO and its predictor are available at http://supfam.org/SUPERFAMILY/dcGO including an enrichment analysis tool. Conclusions As functional units, domains offer a unique perspective on function prediction regardless of whether proteins are multi-domain or single-domain. The 'dcGO Predictor' holds great promise for contributing to a domain-centric functional understanding of genomes in the next generation sequencing era. PMID:23514627
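
    Domain-centric enrichment of the kind mentioned in the dcGO abstract can be illustrated with a one-sided hypergeometric over-representation test. This is a minimal sketch, not the dcGO implementation: the rule that a protein inherits every GO term attached to any of its domains, the SCOP-style identifiers, the toy mapping and the function names are all hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population M, n successes, N draws)."""
    upper = min(n, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / comb(M, N)


def protein_terms(domains, domain_to_terms):
    # Hypothetical propagation rule: a protein inherits every term of each of its domains.
    terms = set()
    for d in domains:
        terms |= domain_to_terms.get(d, set())
    return terms


def term_enrichment(query, background, domain_to_terms, term):
    """One-sided p-value that `term` is over-represented among the `query` proteins.

    background: dict mapping protein ID -> set of domain IDs (query is a subset of its keys).
    """
    annotated = {p for p, ds in background.items() if term in protein_terms(ds, domain_to_terms)}
    M, n, N = len(background), len(annotated), len(query)
    k = len(annotated & set(query))
    return hypergeom_sf(k, M, n, N)


# Toy data for illustration only.
background = {"p1": {"a.4.5"}, "p2": {"a.4.5"}, "p3": {"c.37.1"}, "p4": {"c.37.1"}}
domain_to_terms = {"a.4.5": {"GO:0003677"}}
p = term_enrichment(["p1", "p2"], background, domain_to_terms, "GO:0003677")
```

    With four proteins, both query proteins carrying the term gives p = 1/6; a real analysis would also correct for the many terms tested.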

  3. Design of a glutamine substrate tag enabling protein labelling mediated by Bacillus subtilis transglutaminase.

    PubMed

    Oteng-Pabi, Samuel K; Clouthier, Christopher M; Keillor, Jeffrey W

    2018-01-01

    Transglutaminases (TGases) are enzymes that catalyse protein cross-linking through a transamidation reaction between the side chain of a glutamine residue on one protein and the side chain of a lysine residue on another. Generally, TGases show low substrate specificity with respect to their amine substrate, such that a wide variety of primary amines can participate in the modification of a specific glutamine residue. Although a number of different TGases have been used to mediate these bioconjugation reactions, the TGase from Bacillus subtilis (bTG) may be particularly suited to this application. It is smaller than most TGases, can be expressed in a soluble active form, and lacks the calcium dependence of its mammalian counterparts. However, little is known regarding this enzyme and its glutamine substrate specificity, limiting the scope of its application. In this work, we designed a FRET-based ligation assay to monitor the bTG-mediated conjugation of the fluorescent proteins Clover and mRuby2. This assay allowed us to screen a library of random heptapeptide glutamine sequences for their reactivity with recombinant bTG in bacterial cells, using fluorescence-activated cell sorting. From this library, several reactive sequences were identified and kinetically characterized, with the most reactive sequence (YAHQAHY) having a kcat/KM value of 19 ± 3 μM⁻¹ min⁻¹. This sequence was then genetically appended onto a test protein as a reactive 'Q-tag' and fluorescently labelled with dansyl-cadaverine, in the first demonstration of protein labelling mediated by bTG.
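
    The specificity constant reported for YAHQAHY can be converted to the more common M⁻¹ s⁻¹ units with simple arithmetic (1 μM⁻¹ = 10⁶ M⁻¹ and 1 min⁻¹ = 1/60 s⁻¹). A small sketch; the function name is illustrative only.

```python
def per_uM_per_min_to_per_M_per_s(value: float) -> float:
    """Convert a second-order rate constant from μM⁻¹ min⁻¹ to M⁻¹ s⁻¹."""
    return value * 1e6 / 60.0


kcat_km = per_uM_per_min_to_per_M_per_s(19)  # value reported in the abstract
err = per_uM_per_min_to_per_M_per_s(3)       # reported uncertainty
print(f"kcat/KM ≈ {kcat_km:.2e} ± {err:.1e} M^-1 s^-1")  # kcat/KM ≈ 3.17e+05 ± 5.0e+04 M^-1 s^-1
```

    Because the conversion is a constant factor, the ± uncertainty scales the same way as the value.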

  4. Cloning and characterization of the gene for an additional extracellular serine protease of Bacillus subtilis.

    PubMed Central

    Sloma, A; Rufo, G A; Theriault, K A; Dwyer, M; Wilson, S W; Pero, J

    1991-01-01

    We have purified a minor extracellular serine protease from a strain of Bacillus subtilis bearing null mutations in five extracellular protease genes: apr, npr, epr, bpr, and mpr (A. Sloma, C. Rudolph, G. Rufo, Jr., B. Sullivan, K. Theriault, D. Ally, and J. Pero, J. Bacteriol. 172:1024-1029, 1990). During purification, this novel protease (Vpr) was found bound in a complex in the void volume after gel filtration chromatography. The amino-terminal sequence of the purified protein was determined, and an oligonucleotide probe was constructed on the basis of the amino acid sequence. This probe was used to clone the structural gene (vpr) for this protease. The gene encodes a primary product of 806 amino acids. The amino acid sequence of the mature protein was preceded by a signal sequence of approximately 28 amino acids and a prosequence of approximately 132 amino acids. The mature protein has a predicted molecular weight of 68,197; however, the isolated protein has an apparent molecular weight of 28,500, suggesting that Vpr undergoes C-terminal processing or proteolysis. The vpr gene maps in the ctrA-sacA-epr region of the chromosome and is not required for growth or sporulation. PMID:1938892
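
    The precursor arithmetic in the Vpr abstract (806 residues, a signal sequence of about 28 and a prosequence of about 132) implies a mature chain of roughly 646 residues. The sketch below makes that explicit and shows how a predicted molecular weight is typically computed from a sequence using standard average residue masses; the boundaries are the approximate values from the abstract, and the Vpr sequence itself is not reproduced here.

```python
# Average residue masses (Da) of the 20 standard amino acids (free amino acid minus water).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the free N- and C-termini


def average_mw(seq: str) -> float:
    """Predicted average molecular weight (Da) of an unmodified polypeptide."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER


def mature_region(length: int, signal: int, pro: int) -> tuple:
    """1-based inclusive residue range left after removing a signal peptide and prosequence."""
    return signal + pro + 1, length


start, end = mature_region(806, 28, 132)
print(start, end, end - start + 1)  # 161 806 646
```

    As a rough cross-check, 646 residues at a typical average of about 110 Da per residue gives roughly 71 kDa, the same order as the 68,197 predicted from the real sequence and far above the 28,500 observed.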

  5. Classification of viral zoonosis through receptor pattern analysis.

    PubMed

    Bae, Se-Eun; Son, Hyeon Seok

    2011-04-13

    Viral zoonosis, the transmission of a virus from its primary vertebrate reservoir species to humans, requires ubiquitous cellular proteins known as receptor proteins. Zoonosis can occur not only through direct transmission from vertebrates to humans, but also through intermediate reservoirs or other environmental factors. Viruses can be categorized according to genome type (ssDNA, dsDNA, ssRNA and dsRNA viruses). Among them, RNA viruses exhibit particularly high mutation rates and are especially problematic for this reason. Most zoonotic viruses are RNA viruses that change their envelope proteins to facilitate binding to the receptors of various host species. In this study, we sought to predict zoonotic propensity through the analysis of receptor characteristics. We hypothesized that the major barrier to interspecies virus transmission is that receptor sequences vary among species; in other words, that the specific amino acid sequence of the receptor determines the ability of the viral envelope protein to attach to the cell. We analysed host-cell receptor sequences for their hydrophobicity/hydrophilicity characteristics. We then analysed these properties for similarities among receptors of different species and used a statistical discriminant analysis to predict the likelihood of transmission among species. This study is an attempt to predict zoonosis through simple computational analysis of receptor sequence differences. Our method may be useful in predicting the zoonotic potential of newly discovered viral strains.
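
    The zoonosis abstract does not say which hydrophobicity scale or window the authors used. As an assumed example, the sketch below computes a Kyte-Doolittle sliding-window hydropathy profile and the grand average of hydropathy (GRAVY) for a receptor sequence, the kind of per-sequence feature that could then be passed to a discriminant analysis; the window size and the distance measure are arbitrary choices.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the study's scale is not stated).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over the sequence."""
    return sum(KD[aa] for aa in seq) / len(seq)


def hydropathy_profile(seq: str, window: int = 9) -> list:
    """Mean hydropathy of each full-length window, in sequence order."""
    return [gravy(seq[i:i + window]) for i in range(len(seq) - window + 1)]


def profile_distance(a: str, b: str, window: int = 9) -> float:
    """Euclidean distance between the profiles of two equal-length (pre-aligned) sequences."""
    pa, pb = hydropathy_profile(a, window), hydropathy_profile(b, window)
    if len(pa) != len(pb):
        raise ValueError("sequences must have the same length")
    return sum((x - y) ** 2 for x, y in zip(pa, pb)) ** 0.5
```

    Profiles for orthologous receptors from different species could be compared with `profile_distance` once the sequences are aligned and trimmed to equal length.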

  6. Whole Exome Sequencing of Pediatric Gastric Adenocarcinoma Reveals an Atypical Presentation of Li-Fraumeni Syndrome

    PubMed Central

    Chang, Vivian Y.; Federman, Noah; Martinez-Agosto, Julian; Tatishchev, Sergei F.; Nelson, Stanley F.

    2014-01-01

    Background Gastric adenocarcinoma is a rare diagnosis in childhood. A 14-year-old male patient presented with metastatic gastric adenocarcinoma and a strong family history of colon cancer. Clinical sequencing of CDH1 and APC was negative. Whole exome sequencing was therefore applied to capture the majority of protein-coding regions for the identification of single-nucleotide variants, small insertions/deletions, and copy number abnormalities in the patient’s germline as well as in the primary tumor. Materials and Methods DNA was extracted from the patient’s blood, primary tumor, and the unaffected mother’s blood. DNA libraries were constructed and sequenced on an Illumina HiSeq2000. Data were post-processed using Picard and Samtools, then analyzed with the Genome Analysis Toolkit. Variants were annotated using an in-house Ensembl-based program. Copy number was assessed using ExomeCNV. Results Each sample was sequenced to a mean depth of coverage greater than 120×. A rare non-synonymous coding SNV in TP53 was identified in the germline. There were 10 somatic protein-damaging cancer variants that were not observed in the unaffected mother’s genome. ExomeCNV, comparing tumor to the patient’s germline, identified abnormal copy number spanning 6,946 genes. Conclusion We present an unusual case of Li-Fraumeni syndrome detected by whole exome sequencing. There were also likely somatic driver mutations in the gastric adenocarcinoma. These results highlight the need for more thorough, broad-scale germline and cancer analyses to accurately inform patients of inherited cancer risk and to identify somatic mutations. PMID:23015295
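
    The tumor/germline/mother comparison in the exome abstract amounts to set operations on called variants. A minimal sketch, assuming variants have already been called and normalized to (chromosome, position, ref, alt) tuples; the coordinates below are invented, and the quality, depth and allele-fraction filters used by real GATK-based pipelines are omitted. The depth helper likewise assumes a list of per-base depths over the capture target.

```python
def somatic_candidates(tumor, germline, mother=()):
    """Tumor variants absent from both the patient's germline and the mother's calls."""
    return set(tumor) - set(germline) - set(mother)


def mean_depth(per_base_depths):
    """Mean depth of coverage over the targeted bases."""
    return sum(per_base_depths) / len(per_base_depths)


# Toy variant calls: (chromosome, position, ref, alt); coordinates are hypothetical.
germline = {("1", 1000, "C", "T")}
tumor = germline | {("2", 2000, "G", "A"), ("3", 3000, "A", "G")}
mother = {("3", 3000, "A", "G")}
somatic = somatic_candidates(tumor, germline, mother)  # {("2", 2000, "G", "A")}
```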

  7. N-terminal dual lipidation-coupled molecular targeting into the primary cilium.

    PubMed

    Kumeta, Masahiro; Panina, Yulia; Yamazaki, Hiroya; Takeyasu, Kunio; Yoshimura, Shige H

    2018-06-13

    The primary cilium functions as an "antenna" for cell signaling, studded with characteristic transmembrane receptors and soluble protein factors, raised above the cell surface. In contrast to the transmembrane proteins, targeting mechanisms of nontransmembrane ciliary proteins are poorly understood. We focused on a pathogenic mutation that abolishes ciliary localization of retinitis pigmentosa 2 protein and revealed a dual acylation-dependent ciliary targeting pathway. Short N-terminal sequences that contain myristoylation and palmitoylation sites are sufficient to target a marker protein into the cilium in a palmitoylation-dependent manner. A Golgi-localized palmitoyltransferase, DHHC-21, was identified as the key enzyme controlling this targeting pathway. Rapid turnover of the targeted protein was ensured by cholesterol-dependent membrane fluidity, which balances highly mobile and less mobile populations of the molecules within the cilium. This targeting signal was found in a set of signal transduction molecules, suggesting a general role for this pathway in proper ciliary organization and for its dysfunction in ciliary disorders. © 2018 Molecular Biology Society of Japan and John Wiley & Sons Australia, Ltd.

  8. Conserved thioredoxin fold is present in Pisum sativum L. sieve element occlusion-1 protein

    PubMed Central

    Umate, Pavan; Tuteja, Renu

    2010-01-01

    A homology-based three-dimensional model of the Pisum sativum sieve element occlusion 1 (Ps.SEO1) (forisome) protein was constructed. A stretch of amino acids (residues 320 to 456) that is well conserved in all known forisome proteins was used to model the 3D structure of Ps.SEO1. The structural prediction was done using the Protein Homology/analogY Recognition Engine (PHYRE) web server. Based on local sequence alignment, the thioredoxin-fold-containing protein [Structural Classification of Proteins (SCOP) code d1o73a_], a member of the glutathione peroxidase family, was selected as a template for modeling the spatial structure of Ps.SEO1. Selection was based on comparison of primary sequence, higher match quality and alignment accuracy. Motif 1 (EVF) is conserved in Ps.SEO1, Vicia faba (Vf.For1) and Medicago truncatula (MT.SEO3); motif 2 (KKED) is well conserved across all forisome proteins; and motif 3 (IGYIGNP) is conserved in Ps.SEO1 and Vf.For1. PMID:20404566
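
    Extracting the conserved stretch (residues 320 to 456) and locating the three motifs are simple string operations; the main pitfall is converting 1-based inclusive residue numbers into 0-based slice indices. The test sequence in the sketch is a synthetic placeholder, not Ps.SEO1.

```python
import re

MOTIFS = {"motif 1": "EVF", "motif 2": "KKED", "motif 3": "IGYIGNP"}


def residue_range(seq: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive) as a substring."""
    return seq[start - 1:end]


def find_motifs(seq: str) -> dict:
    """1-based start positions of each motif (overlapping matches included)."""
    return {
        name: [m.start() + 1 for m in re.finditer(f"(?={re.escape(motif)})", seq)]
        for name, motif in MOTIFS.items()
    }


print(find_motifs("AAEVFAAKKEDAAIGYIGNP"))  # {'motif 1': [3], 'motif 2': [8], 'motif 3': [14]}
```

    Residues 320 to 456 correspond to the slice `seq[319:456]`, which is 137 residues long.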

  9. Cloning and sequencing of the cDNA species for mammalian dimeric dihydrodiol dehydrogenases.

    PubMed Central

    Arimitsu, E; Aoki, S; Ishikura, S; Nakanishi, K; Matsuura, K; Hara, A

    1999-01-01

    Cynomolgus and Japanese monkey kidneys, dog and pig livers and rabbit lens contain dimeric dihydrodiol dehydrogenase (EC 1.3.1.20) associated with high carbonyl reductase activity. Here we have isolated cDNA species for the dimeric enzymes by reverse transcriptase-PCR from human intestine in addition to the above five animal tissues. The amino acid sequences deduced from the monkey, pig and dog cDNA species perfectly matched the partial sequences of peptides digested from the respective enzymes of these animal tissues, and active recombinant proteins were expressed in a bacterial system from the monkey and human cDNA species. Northern blot analysis revealed the existence of a single 1.3 kb mRNA species for the enzyme in these animal tissues. The human enzyme shared 94%, 85%, 84% and 82% amino acid identity with the enzymes of the two monkey strains (their sequences were identical), the dog, the pig and the rabbit, respectively. The sequences of the primate enzymes consisted of 335 amino acid residues and lacked one amino acid compared with the other animal enzymes. In contrast with previous reports that other types of dihydrodiol dehydrogenase, carbonyl reductases and enzymes with either activity belong to the aldo-keto reductase family or the short-chain dehydrogenase/reductase family, dimeric dihydrodiol dehydrogenase showed no sequence similarity with the members of these two protein families. The dimeric enzyme aligned with low degrees of identity (14-25%) with several prokaryotic proteins, in which 47 residues are strictly or highly conserved. Thus dimeric dihydrodiol dehydrogenase has a primary structure distinct from the previously known mammalian enzymes and is suggested to constitute a novel protein family with the prokaryotic proteins. PMID:10477285
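
    Percent-identity figures like those in the dihydrodiol dehydrogenase abstract depend on how gaps are treated. The sketch below uses one common convention (identical positions divided by aligned columns in which neither sequence has a gap); the alignment is assumed to come from an external aligner, and other conventions, such as dividing by the length of the shorter sequence, give different numbers.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences, ignoring gapped columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)
```

    A one-residue length difference, such as the 335-residue primate enzymes versus the others, appears as a single gap column and is excluded under this convention.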

  10. A recombinant isoform of the Ole e 7 olive pollen allergen assembled by de novo mass spectrometry retains the allergenic ability of the natural allergen.

    PubMed

    Oeo-Santos, Carmen; Mas, Salvador; Benedé, Sara; López-Lucendo, María; Quiralte, Joaquín; Blanca, Miguel; Mayorga, Cristobalina; Villalba, Mayte; Barderas, Rodrigo

    2018-06-05

    The allergenic non-specific lipid transfer protein Ole e 7 from olive pollen is a major allergen associated with severe symptoms in areas with high olive pollen levels. Despite its clinical importance, its cloning and recombinant production had not been achieved by classical approaches. This study aimed to determine its complete amino acid sequence by mass-spectrometry-based proteomics for its subsequent expression and characterization. To this end, the natural protein was separated by 2D gel electrophoresis and digested in-gel with trypsin, and the CID and HCD fragmentation spectra obtained by nLC-MS/MS were analyzed using PEAKS software. Thirteen of the 457 de novo sequenced peptides obtained allowed its full-length amino acid sequence to be assembled. Then, Ole e 7-encoding cDNA was synthesized and cloned into the pPICZαA vector for expression in Pichia pastoris yeast. Analyses by circular dichroism, western blotting (WB), ELISA and cell-based tests using sera and blood from olive pollen-sensitized patients showed that rOle e 7 mostly retained the structural, allergenic and antigenic properties of the natural allergen. In summary, rOle e 7 assembled by de novo peptide sequencing by MS behaved immunologically like the natural allergen, which can be isolated from pollen only in scarce amounts. Olive pollen is an important cause of allergy. The non-specific lipid binding protein Ole e 7 is a major allergen with a high incidence and a phenotype associated with severe clinical symptoms. Despite its relevance, its cloning and recombinant expression had not been achieved by classical techniques. Here, we have inferred the primary amino acid sequence of Ole e 7 by mass spectrometry. We separated Ole e 7 isolated from pollen by 2DE. After in-gel digestion with trypsin and direct analysis by nLC-MS/MS in an LTQ-Orbitrap Velos, we obtained the complete repertoire of de novo sequenced peptides, which allowed the primary sequence of Ole e 7 to be assembled. 
After expression, purification to homogeneity, and structural and immunological characterization using sera from olive pollen-allergic patients and cell-based assays, we observed that the recombinant allergen retained the antigenic and allergenic properties of the natural allergen. Collectively, we show that the recombinant protein assembled by proteomics could be suitable for improved in vitro diagnosis of olive pollen-allergic patients. Copyright © 2018. Published by Elsevier B.V.
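
    The bottom-up workflow in the Ole e 7 abstract starts from a tryptic digest. For reference, an in silico digest under the usual simplified rule (cleave after K or R unless the next residue is P) can be sketched as follows; this is a generic illustration, not the PEAKS de novo algorithm, and real trypsin specificity has further exceptions.

```python
def tryptic_digest(seq: str, missed_cleavages: int = 0) -> list:
    """Peptides from cleavage C-terminal to K/R (not before P), allowing missed cleavages."""
    sites = [i + 1 for i, aa in enumerate(seq[:-1]) if aa in "KR" and seq[i + 1] != "P"]
    bounds = [0] + sites + [len(seq)]
    peptides = []
    for i in range(len(bounds) - 1):
        # Join up to `missed_cleavages` extra fragments to the current one.
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            peptides.append(seq[bounds[i]:bounds[j]])
    return peptides


print(tryptic_digest("AKPRGKC"))     # ['AKPR', 'GK', 'C']
print(tryptic_digest("AKPRGKC", 1))  # ['AKPR', 'AKPRGK', 'GK', 'GKC', 'C']
```

    Peptides with missed cleavages overlap their neighbours, which is one reason overlapping de novo reads can be stitched into a longer sequence.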

  11. A Molecular Genetics Laboratory Course Applying Bioinformatics and Cell Biology in the Context of Original Research

    PubMed Central

    Pruitt, Wendy M.; Robinson, Lucy C.

    2008-01-01

    Research-based laboratory courses have been shown to stimulate student interest in science and to improve scientific skills. We describe here a project developed for a semester-long research-based laboratory course that accompanies a genetics lecture course. The project was designed to allow students to become familiar with the use of bioinformatics tools and molecular biology and genetic approaches while carrying out original research. Students were required to present their hypotheses, experiments, and results in a comprehensive lab report. The lab project concerned the yeast casein kinase 1 (CK1) protein kinase Yck2. CK1 protein kinases are present in all organisms and are well conserved in primary structure. These enzymes display sequence features that differ from those of other protein kinase subfamilies. Students identified such sequences within the CK1 subfamily, chose a sequence to analyze, used available structural data to determine possible functions for their sequences, and designed mutations within the sequences. After the mutant alleles were generated, they were expressed in yeast and tested for function using two growth assays. The student response to the project was positive, both in terms of gains in knowledge and skills and in interest in research, and several students are continuing the analysis of mutant alleles as summer projects. PMID:19047427

  12. Molecular cloning, gene expression analysis, and recombinant protein expression of novel silk proteins from larvae of a retreat-maker caddisfly, Stenopsyche marmorata.

    PubMed

    Bai, Xue; Sakaguchi, Mayo; Yamaguchi, Yuko; Ishihara, Shiori; Tsukada, Masuhiro; Hirabayashi, Kimio; Ohkawa, Kousaku; Nomura, Takaomi; Arai, Ryoichi

    2015-08-28

    Retreat-maker larvae of Stenopsyche marmorata, one of the major caddisfly species in Japan, produce silk threads and adhesives to build food capture nets and protective nests in water. Research on these underwater adhesive silk proteins could lead to the development of new functional biofiber materials. Recently, we identified four major S. marmorata silk proteins (Smsps), Smsp-1, Smsp-2, Smsp-3, and Smsp-4, from silk glands of S. marmorata larvae. In this study, we cloned full-length cDNAs of Smsp-2, Smsp-3, and Smsp-4 from the cDNA library of the S. marmorata silk glands to reveal the primary sequences of Smsps. Homology search results of the deduced amino acid sequences indicate that Smsp-2 and Smsp-4 are novel proteins. The Smsp-2 sequence [167 amino acids (aa)] has an array of GYD-rich repeat motifs and two (SX)4E motifs. The Smsp-4 sequence (132 aa) contains a number of GW-rich repeat motifs and three (SX)4E motifs. The Smsp-3 sequence (248 aa) exhibits high homology with the fibroin light chain of other caddisflies. Gene expression analysis of Smsps by real-time PCR suggested that expression of Smsp-1 and Smsp-3 was relatively stable throughout the year, whereas that of Smsp-2 and Smsp-4 varied seasonally. Furthermore, recombinant Smsps were successfully expressed in Escherichia coli. The study provides new molecular insights into caddisfly aquatic silk and its potential for future applications. Copyright © 2015 Elsevier Inc. All rights reserved.
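
    Motifs such as (SX)4E, that is, four serine-X pairs followed by glutamate, can be located with a regular expression. A sketch, assuming X means any residue and that overlapping occurrences should all be reported; the test strings are synthetic, not Smsp sequences, and the function names are illustrative.

```python
import re

SX4E = re.compile(r"(?=((?:S.){4}E))")  # lookahead so overlapping motifs are all reported


def find_sx4e(seq: str) -> list:
    """(1-based start, matched motif) for every (SX)4E occurrence."""
    return [(m.start() + 1, m.group(1)) for m in SX4E.finditer(seq)]


def count_repeat(seq: str, unit: str) -> int:
    """Count possibly overlapping occurrences of a short repeat unit such as 'GW' or 'GYD'."""
    return sum(seq.startswith(unit, i) for i in range(len(seq)))


print(find_sx4e("GSASASASAEG"))       # [(2, 'SASASASAE')]
print(count_repeat("GWGWAGW", "GW"))  # 3
```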

  13. Regulation of Bacteria-Induced Intercellular Adhesion Molecule-1 by CCAAT/Enhancer Binding Proteins

    PubMed Central

    Manzel, Lori J.; Chin, Cecilia L.; Behlke, Mark A.; Look, Dwight C.

    2009-01-01

    Direct interaction between bacteria and epithelial cells may initiate or amplify the airway response through induction of epithelial defense gene expression by nuclear factor-κB (NF-κB). However, multiple signaling pathways modify NF-κB effects to modulate gene expression. In this study, the effects of CCAAT/enhancer binding protein (C/EBP) family members on induction of the leukocyte adhesion glycoprotein intercellular adhesion molecule-1 (ICAM-1) were examined in primary cultures of human tracheobronchial epithelial cells incubated with nontypeable Haemophilus influenzae. Increased ICAM-1 gene transcription in response to H. influenzae required gene sequences located at −200 to −135 in the 5′-flanking region that contain a C/EBP-binding sequence immediately upstream of the NF-κB enhancer site. Constitutive C/EBPβ was found to have an important role in epithelial cell ICAM-1 regulation, while the adjacent NF-κB sequence binds the RelA/p65 and NF-κB1/p50 members of the NF-κB family to induce ICAM-1 expression in response to H. influenzae. The expression of C/EBP proteins is not regulated by p38 mitogen-activated protein kinase activation, but p38 affects gene transcription by increasing the binding of TATA-binding protein to TATA-box–containing gene sequences. Epithelial cell ICAM-1 expression in response to H. influenzae was decreased by expressing a dominant-negative protein or by RNA interference against C/EBPβ, confirming its role in ICAM-1 regulation. Although airway epithelial cells express multiple constitutive and inducible C/EBP family members that bind C/EBP sequences, the results indicate that C/EBPβ plays a central role in modulation of NF-κB–dependent defense gene expression in human airway epithelial cells after exposure to H. influenzae. PMID:18703796

  14. A constraint logic programming approach to associate 1D and 3D structural components for large protein complexes.

    PubMed

    Dal Palù, Alessandro; Pontelli, Enrico; He, Jing; Lu, Yonggang

    2007-01-01

    The paper describes a novel framework, constructed using Constraint Logic Programming (CLP) and parallelism, to determine the association between parts of the primary sequence of a protein and alpha-helices extracted from 3D low-resolution descriptions of large protein complexes. The association is determined by extracting constraints from the 3D information, regarding length, relative position and connectivity of helices, and solving these constraints with the guidance of a secondary structure prediction algorithm. Parallelism is employed to enhance performance on large proteins. The framework provides a fast, inexpensive alternative to determine the exact tertiary structure of unknown proteins.
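
    The constraint idea in the CLP abstract can be illustrated with plain backtracking in Python rather than a CLP system: each predicted helix segment in the sequence is assigned to a distinct density helix whose length matches within a tolerance, and consecutive assignments must be bridgeable by the intervening loop. The 1.5 Å helical rise and 3.8 Å Cα-Cα spacing are standard approximations; the tolerance, data layout and function names are assumptions, and the parallelism described in the paper is omitted.

```python
import math

RISE_PER_RESIDUE = 1.5  # approximate alpha-helix rise per residue (Å)
CA_SPACING = 3.8        # approximate maximum Ca-Ca distance between consecutive residues (Å)


def assign_helices(segments, helices, tol=2.0):
    """Backtracking search for a segment -> density-helix assignment.

    segments: (start, end) residue numbers of predicted helices, in sequence order.
    helices:  ((x, y, z), (x, y, z)) endpoints of helices seen in the density map.
    Returns one (helix_index, flipped) pair per segment, or None if unsatisfiable.
    """
    used, result = set(), []

    def helix_length(h):
        p, q = helices[h]
        return math.dist(p, q) / RISE_PER_RESIDUE + 1

    def entry_exit(h, flipped):
        p, q = helices[h]
        return (q, p) if flipped else (p, q)

    def search(k):
        if k == len(segments):
            return True
        start, end = segments[k]
        for h in range(len(helices)):
            if h in used or abs(helix_length(h) - (end - start + 1)) > tol:
                continue  # length constraint
            for flipped in (False, True):
                entry, _ = entry_exit(h, flipped)
                if k > 0:  # connectivity constraint: the loop must span the gap
                    prev_exit = entry_exit(*result[-1])[1]
                    loop_bonds = start - segments[k - 1][1]
                    if math.dist(prev_exit, entry) > loop_bonds * CA_SPACING:
                        continue
                used.add(h)
                result.append((h, flipped))
                if search(k + 1):
                    return True
                used.discard(h)
                result.pop()
        return False

    return list(result) if search(0) else None


segments = [(1, 11), (15, 25)]
helices = [((0, 0, 0), (15, 0, 0)), ((0, 10, 0), (15, 10, 0))]
print(assign_helices(segments, helices))  # [(0, False), (1, True)]
```

    In the toy example the second helix must be traversed in reverse, because a 4-bond loop (at most 15.2 Å) cannot reach its far end 18 Å away.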

  15. Serological profiling of the EBV immune response in Chronic Fatigue Syndrome using a peptide microarray.

    PubMed

    Loebel, Madlen; Eckey, Maren; Sotzny, Franziska; Hahn, Elisabeth; Bauer, Sandra; Grabowski, Patricia; Zerweck, Johannes; Holenya, Pavlo; Hanitsch, Leif G; Wittke, Kirsten; Borchmann, Peter; Rüffer, Jens-Ulrich; Hiepe, Falk; Ruprecht, Klemens; Behrends, Uta; Meindl, Carola; Volk, Hans-Dieter; Reimer, Ulf; Scheibenbogen, Carmen

    2017-01-01

    Epstein-Barr virus (EBV) plays an important role as a trigger or cofactor for various autoimmune diseases. In a subset of patients with Chronic Fatigue Syndrome (CFS), the disease starts with infectious mononucleosis as a late primary EBV infection, and altered levels of EBV-specific antibodies can be observed in another subset of patients. We performed a comprehensive mapping of the IgG response against EBV, comparing 50 healthy controls with 92 CFS patients using a microarray platform. Patients with multiple sclerosis (MS), systemic lupus erythematosus (SLE) and cancer-related fatigue served as controls. A total of 3054 overlapping peptides were synthesised as 15-mers from 14 different EBV proteins. Array data were validated by ELISA for selected peptides. The prevalence of EBV serotypes was determined by qPCR from throat washing samples. EBV type 1 infections were found in patients and controls. EBV seroarray profiles of healthy controls and CFS patients were less divergent than those observed for MS or SLE. We found significantly enhanced IgG responses to several EBNA-6 peptides containing a repeat sequence in CFS patients compared to controls. EBNA-6 peptide IgG responses correlated well with EBNA-6 protein responses. The EBNA-6 repeat region showed sequence homologies to various human proteins. Patients with CFS had an EBV IgG antibody response pattern quite similar to that of healthy controls. Enhanced IgG reactivity against an EBNA-6 repeat sequence and against EBNA-6 protein was found in CFS patients. Human proteins with sequences homologous to this EBNA-6 repeat might be potential targets of antigenic mimicry.
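
    A library of overlapping 15-mers like the one in the EBV study can be generated by tiling each protein with a fixed step. The abstract does not give the overlap, so the step of 4 residues (an 11-residue overlap) in this sketch is an assumption; the last window is shifted so that the C-terminus is always covered.

```python
def tile(seq: str, length: int = 15, step: int = 4) -> list:
    """Overlapping peptides of fixed length covering the whole sequence."""
    if len(seq) <= length:
        return [seq]
    starts = list(range(0, len(seq) - length + 1, step))
    if starts[-1] != len(seq) - length:
        starts.append(len(seq) - length)  # make sure the C-terminus is covered
    return [seq[s:s + length] for s in starts]


peptides = tile("ACDEFGHIKLMNPQRSTVWY")
print(len(peptides), peptides[-1])  # 3 GHIKLMNPQRSTVWY
```

    Summing `len(tile(p))` over the 14 protein sequences would give the library size under whatever step is actually chosen.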

  16. Self-organization of the protocell was a forward process

    NASA Technical Reports Server (NTRS)

    Fox, S. W.; Matsuno, K.

    1983-01-01

    Yockey's (1981) interpretation of information theory relative to concepts of self-organization in the origin of life is criticized on the grounds that it assumes that each amino acid residue type in a given sequence is an unaided information carrier throughout evolution. It is argued that more than one amino acid residue can act as a unit information carrier, and that this was the case in prebiotic protein evolution. Forward extrapolation, not backward extrapolation, should be used to study prebiotic evolution. Transposing the near-random internal order of modern proteins to primitive proteins, as Yockey has done, is an unsupported assumption and disagrees with the results of experimental models of the primordial type. Studies indicate that the early primary information carriers in evolution were mixtures of free alpha amino acids, which necessarily had the capability of sequencing themselves.

  17. Application of the MIDAS approach for analysis of lysine acetylation sites.

    PubMed

    Evans, Caroline A; Griffiths, John R; Unwin, Richard D; Whetton, Anthony D; Corfe, Bernard M

    2013-01-01

    Multiple Reaction Monitoring Initiated Detection and Sequencing (MIDAS™) is a mass spectrometry-based technique for the detection and characterization of specific post-translational modifications (Unwin et al. 4:1134-1144, 2005), for example acetylated lysine residues (Griffiths et al. 18:1423-1428, 2007). The MIDAS™ technique is applicable to the discovery and analysis of acetylation sites. It is a hypothesis-driven approach that requires a priori knowledge of the primary sequence of the target protein and a proteolytic digest of this protein. MIDAS essentially performs a targeted search for the presence of modified, for example acetylated, peptides. Detection is based on the combination of the predicted molecular weight (measured as mass-to-charge ratio) of the acetylated proteolytic peptide and a diagnostic fragment (product ion of m/z 126.1), which is generated by specific fragmentation of acetylated peptides during collision-induced dissociation in tandem mass spectrometry (MS) analysis. Sequence information is subsequently obtained, which enables acetylation site assignment. The MIDAS technique was later trademarked by ABSciex for targeted protein analysis in which an MRM scan is combined with a full MS/MS product ion scan to enable sequence confirmation.
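
    The precursor side of such a targeted search is a mass calculation: the predicted m/z of the acetylated proteolytic peptide, paired with the m/z 126.1 diagnostic product ion, defines the monitored transition. A minimal sketch using standard monoisotopic residue masses; the peptide is a toy example, only lysine acetylation (+42.0106 Da) is modeled, and the function names are illustrative rather than part of any vendor software.

```python
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276
ACETYL = 42.010565
DIAGNOSTIC_ION = 126.1  # product ion m/z cited in the abstract


def peptide_mass(seq: str, n_acetyl: int = 0) -> float:
    """Monoisotopic neutral mass with n_acetyl acetyl groups on lysines."""
    if n_acetyl > seq.count("K"):
        raise ValueError("more acetyl groups than lysines")
    return sum(MONO[aa] for aa in seq) + WATER + n_acetyl * ACETYL


def mz(mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (mass + charge * PROTON) / charge


def midas_transition(seq: str, charge: int = 2) -> tuple:
    """(precursor m/z of the singly acetylated peptide, diagnostic product m/z)."""
    return mz(peptide_mass(seq, n_acetyl=1), charge), DIAGNOSTIC_ION
```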

  18. Antipeptide antibodies that can distinguish specific subunit polypeptides of glutamine synthetase from bean (Phaseolus vulgaris L.)

    NASA Technical Reports Server (NTRS)

    Cai, X.; Henry, R. L.; Takemoto, L. J.; Guikema, J. A.; Wong, P. P.; Spooner, B. S. (Principal Investigator)

    1992-01-01

    The amino acid sequences of the beta and gamma subunit polypeptides of glutamine synthetase from bean (Phaseolus vulgaris L.) root nodules are very similar. However, there are small regions within the sequences that differ significantly between the two polypeptides; examples are the regions between amino acids 2 and 9 and between amino acids 264 and 274. Three peptides (gamma 2-9, gamma 264-274, and beta 264-274) corresponding to these sequences were synthesized. Antibodies against these peptides were raised in rabbits and purified by affinity chromatography on the corresponding peptide-Sepharose. Western blot analysis of bean nodule proteins separated by polyacrylamide gel electrophoresis demonstrated that the anti-beta 264-274 antibodies reacted specifically with the beta polypeptide, and the anti-gamma 264-274 and anti-gamma 2-9 antibodies reacted specifically with the gamma polypeptide, of both native and denatured glutamine synthetase. These results showed the feasibility of using synthetic peptides to develop antibodies that can distinguish proteins with similar primary structures.

  19. Primary structure and cellular localization of chicken brain myosin-V (p190), an unconventional myosin with calmodulin light chains

    PubMed Central

    1992-01-01

    Recent biochemical studies of p190, a calmodulin (CM)-binding protein purified from vertebrate brain, have demonstrated that this protein, purified as a complex with bound CM, shares a number of properties with myosins (Espindola, F. S., E. M. Espreafico, M. V. Coelho, A. R. Martins, F. R. C. Costa, M. S. Mooseker, and R. E. Larson. 1992. J. Cell Biol. 118:359-368). To determine whether or not p190 was a member of the myosin family of proteins, a set of overlapping cDNAs encoding the full-length protein sequence of chicken brain p190 was isolated and sequenced. Verification that the deduced primary structure was that of p190 was demonstrated through microsequence analysis of a cyanogen bromide peptide generated from chick brain p190. The deduced primary structure of chicken brain p190 revealed that this 1,830-amino acid (aa) protein (212,509 Da) is a member of a novel structural class of unconventional myosins that includes the gene products encoded by the dilute locus of mouse and the MYO2 gene of Saccharomyces cerevisiae. We have named the p190-CM complex "myosin-V" based on the results of a detailed sequence comparison of the head domains of 29 myosin heavy chains (hc), which has revealed that this myosin, based on head structure, is the fifth of six distinct structural classes of myosin to be described thus far. Like the presumed products of the mouse dilute and yeast MYO2 genes, the head domain of chicken myosin-V hc (aa 1-764) is linked to a "neck" domain (aa 765-909) consisting of six tandem repeats of an approximately 23-aa "IQ-motif." All known myosins contain at least one such motif at their head-tail junctions; these IQ-motifs may function as calmodulin or light chain binding sites. The tail domain of chicken myosin-V consists of an initial 511 aa predicted to form several segments of coiled-coil alpha helix followed by a terminal 410-aa globular domain (aa 1,421-1,830). 
Interestingly, a portion of the tail domain (aa 1,094-1,830) shares 58% amino acid sequence identity with a 723-aa protein from mouse brain reported to be a glutamic acid decarboxylase. The neck region of chicken myosin-V, which contains the IQ-motifs, was demonstrated to contain the binding sites for CM by analyzing CM binding to bacterially expressed fusion proteins containing the head, neck, and tail domains. Immunolocalization of myosin-V in brain and in cultured cells revealed an unusual distribution for this myosin in both neurons and nonneuronal cells.(ABSTRACT TRUNCATED AT 400 WORDS) PMID:1469047
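
    The myosin-V domain boundaries are internally consistent, which a few lines of arithmetic confirm: the tail begins at residue 910, and its 511-residue coiled-coil region plus the 410-residue globular domain end exactly at residue 1,830. The IQ-motif regular expression uses a commonly cited loose consensus (IQxxxRGxxxR, with K allowed in place of R and L in place of I) and is an assumption, not the authors' definition.

```python
import re

# 1-based inclusive boundaries taken from the abstract.
DOMAINS = {"head": (1, 764), "neck": (765, 909), "tail": (910, 1830)}
TAIL_COILED_COIL = 511
TAIL_GLOBULAR = (1421, 1830)

IQ_MOTIF = re.compile(r"(?=([IL]Q...[RK]G...[RK]))")  # assumed loose IQ consensus


def length(span):
    start, end = span
    return end - start + 1


def iq_positions(seq: str) -> list:
    """1-based start positions of candidate IQ motifs."""
    return [m.start() + 1 for m in IQ_MOTIF.finditer(seq)]


coiled_end = DOMAINS["tail"][0] + TAIL_COILED_COIL - 1
print(length(DOMAINS["neck"]), coiled_end + 1 == TAIL_GLOBULAR[0], length(TAIL_GLOBULAR))  # 145 True 410
```

    The 145-residue neck comfortably holds six repeats of about 23 residues (roughly 138 residues).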

  20. Ostertagia circumcincta: isolation of a partial cDNA encoding an unusual member of the mitochondrial processing peptidase subfamily of M16 metallopeptidases.

    PubMed

    Walker, J; Tait, A

    1997-11-01

    A reverse-transcriptase polymerase chain reaction (PCR) procedure was used to isolate an Ostertagia circumcincta partial cDNA encoding a protein with general primary sequence features characteristic of members of the mitochondrial processing peptidase (MPP) subfamily of M16 metallopeptidases. The structural relationships of the predicted protein (Oc MPPX) with MPP subfamily proteins from other species (including the model free-living nematode Caenorhabditis elegans) were examined, and Northern analysis confirmed the expression of the Oc mppx gene in adult nematodes.

  1. Protein sequence analysis, cloning, and expression of flammutoxin, a pore-forming cytolysin from Flammulina velutipes. Maturation of dimeric precursor to monomeric active form by carboxyl-terminal truncation.

    PubMed

    Tomita, Toshio; Mizumachi, Yoshihiro; Chong, Kang; Ogawa, Kanako; Konishi, Norihide; Sugawara-Tomita, Noriko; Dohmae, Naoshi; Hashimoto, Yohichi; Takio, Koji

    2004-12-24

    Flammutoxin (FTX), a 31-kDa pore-forming cytolysin from Flammulina velutipes, is specifically expressed during fruiting body formation. We cloned and expressed the cDNA encoding a 272-residue protein with an N-terminal sequence identical to that of FTX but failed to obtain hemolytically active protein. This, together with the presence of multiple FTX family proteins in the mushroom, prompted us to determine the complete primary structure of FTX by protein sequence analysis. The N-terminal 72 and C-terminal 107 residues were sequenced by Edman degradation of the fragments generated from the alkylated FTX by enzymatic digestions with Achromobacter protease I or Staphylococcus aureus V8 protease and by chemical cleavages with CNBr, hydroxylamine, or 1% formic acid. The central part of FTX was sequenced with a surface-adhesive 7-kDa fragment, which was generated by a tryptic digestion of FTX and recovered by rinsing the wall of a test tube with 6 M guanidine HCl. The 7-kDa peptide was cleaved with 12 M HCl, thermolysin, or S. aureus V8 protease to produce smaller peptides for sequence analysis. As a result, FTX was found to consist of 251 residues, and the protein and nucleotide sequences were in accord except for the absence of the initial Met and the C-terminal 20 residues in the protein. Recombinant FTX (rFTX) with or without the C-terminal 20 residues (rFTX271 or rFTX251, respectively) was prepared to study the maturation process of FTX. Like natural FTX, rFTX251 existed as a monomer in solution and assembled into an SDS-stable, ring-shaped pore complex on human erythrocytes, causing hemolysis. In contrast, rFTX271, existing as a dimer in solution, bound to the cells but failed to form a pore complex. The dimeric rFTX271 was converted to hemolytically active monomers upon cleavage between Lys(251) and Met(252) by trypsin.
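
    The FTX processing steps can be checked by bookkeeping on residue numbers. In this sketch, numbering starts after the initiator Met (as the Lys(251)/Met(252) cleavage site implies), so removing Met from the 272-residue precursor gives the rFTX271 chain, and trypsin cleavage after residue 251 yields the 251-residue mature form. The sequence used is a hypothetical placeholder with K and M placed at the cleavage site, not the real FTX sequence.

```python
def mature_ftx(precursor: str, cleave_after: int = 251) -> tuple:
    """Return (rFTX271-like chain, mature chain) from a precursor that starts with Met.

    Residue numbers are 1-based and counted after removal of the initiator Met.
    """
    if not precursor.startswith("M"):
        raise ValueError("expected an initiator Met")
    chain = precursor[1:]                 # residues 1..len-1 after Met removal
    if chain[cleave_after - 1] != "K":    # the abstract places Lys at the tryptic site
        raise ValueError("no Lys at the expected cleavage site")
    return chain, chain[:cleave_after]


placeholder = "M" + "A" * 250 + "K" + "M" + "G" * 19  # hypothetical 272-residue precursor
full, mature = mature_ftx(placeholder)
print(len(placeholder), len(full), len(mature), full[250:252])  # 272 271 251 KM
```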

  2. SMARTIV: combined sequence and structure de-novo motif discovery for in-vivo RNA binding data.

    PubMed

    Polishchuk, Maya; Paz, Inbal; Yakhini, Zohar; Mandel-Gutfreund, Yael

    2018-05-25

    Gene expression regulation is highly dependent on the binding of RNA-binding proteins (RBPs) to their RNA targets. Growing evidence supports the notion that both the RNA primary sequence and its local secondary structure play a role in specific protein-RNA recognition and binding. Despite great advances in high-throughput experimental methods for identifying the sequence targets of RBPs, predicting the specific sequence and structure binding preferences of RBPs remains a major challenge. We present a novel webserver, SMARTIV, designed for discovering and visualizing combined RNA sequence and structure motifs from high-throughput RNA-binding data generated by in-vivo experiments. SMARTIV is unique in that it predicts motifs from enriched k-mers that combine information from ranked RNA sequences and their predicted secondary structures, obtained using various folding methods. From these k-mers, SMARTIV generates Position Weight Matrices (PWMs) in a combined sequence and structure alphabet with assigned P-values. SMARTIV concisely represents the sequence and structure motif content as a single graphical logo that is informative and easy to interpret visually. SMARTIV was examined extensively on a variety of high-throughput binding experiments for RBPs from different families, generated with different technologies, and showed consistent and accurate results. Finally, SMARTIV is a user-friendly webserver that is highly efficient in run time and freely accessible via http://smartiv.technion.ac.il/.
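    The abstract describes building PWMs from enriched k-mers over a combined sequence and structure alphabet. The Python sketch below illustrates that general idea only and is not SMARTIV's implementation. The alphabet (a nucleotide paired with a hypothetical "P"/"U" paired/unpaired label), the uniform background, the pseudocount, and the names build_pwm and score are all assumptions for illustration; k-mer enrichment and P-value assignment are omitted.

```python
import math
from collections import Counter

# Hypothetical combined alphabet: a nucleotide followed by a structure label,
# "P" (paired) or "U" (unpaired). SMARTIV's real alphabet may differ.
NUCLEOTIDES = "ACGU"
STRUCTURES = "PU"
ALPHABET = [n + s for n in NUCLEOTIDES for s in STRUCTURES]


def build_pwm(kmers, pseudocount=1.0):
    """Build a log2-odds position weight matrix from equal-length k-mers.

    Each k-mer is a list of combined symbols such as ["AP", "CU", ...].
    The background is assumed uniform over the combined alphabet.
    """
    length = len(kmers[0])
    if any(len(k) != length for k in kmers):
        raise ValueError("all k-mers must have the same length")
    background = 1.0 / len(ALPHABET)
    total = len(kmers) + pseudocount * len(ALPHABET)
    pwm = []
    for pos in range(length):
        # Count each combined symbol observed at this position.
        counts = Counter(k[pos] for k in kmers)
        column = {}
        for symbol in ALPHABET:
            freq = (counts[symbol] + pseudocount) / total
            column[symbol] = math.log2(freq / background)
        pwm.append(column)
    return pwm


def score(pwm, kmer):
    """Sum the per-position log-odds weights for a candidate k-mer."""
    return sum(column[symbol] for column, symbol in zip(pwm, kmer))


if __name__ == "__main__":
    # Hypothetical enriched k-mers (not SMARTIV data).
    enriched = [
        ["UU", "GU", "UU", "AP"],
        ["UU", "GU", "UU", "AP"],
        ["UU", "GU", "CU", "AP"],
    ]
    pwm = build_pwm(enriched)
    print(round(score(pwm, ["UU", "GU", "UU", "AP"]), 3))
    print(round(score(pwm, ["AP", "AP", "AP", "AP"]), 3))
```

    A k-mer resembling the enriched set scores higher than a dissimilar one. A real tool would also need enrichment statistics over ranked sequences and structure predictions from a folding program, which this sketch does not attempt.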

  3. Characterization of Proteoforms with Unknown Post-translational Modifications Using the MIScore

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kou, Qiang; Zhu, Binhai; Wu, Si

    Various proteoforms may be generated from a single gene due to primary structure alterations (PSAs) such as genetic variations, alternative splicing, and post-translational modifications (PTMs). Top-down mass spectrometry is capable of analyzing intact proteins and identifying patterns of multiple PSAs, making it the method of choice for studying complex proteoforms. In top-down proteomics, proteoform identification is often performed by searching tandem mass spectra against a protein sequence database that contains only one reference protein sequence for each gene or transcript variant in a proteome. Because the protein database is incomplete, an identified proteoform may contain unknown PSAs compared with the reference sequence. Proteoform characterization is the identification and localization of PSAs in a proteoform. Although many software tools have been proposed for proteoform identification by top-down mass spectrometry, the characterization of proteoforms in identified proteoform-spectrum matches still relies mainly on manual annotation. We propose to use the Modification Identification Score (MIScore), which is based on Bayesian models, to automatically identify and localize PTMs in proteoforms. Experiments showed that the MIScore is accurate in identifying and localizing one or two modifications.

  4. Structure and genetic variability of envelope glycoproteins of two antigenic variants of caprine arthritis-encephalitis lentivirus.

    PubMed

    Knowles, D P; Cheevers, W P; McGuire, T C; Brassfield, A L; Harwood, W G; Stem, T A

    1991-11-01

    To define the structure of the caprine arthritis-encephalitis virus (CAEV) env gene and characterize genetic changes which occur during antigenic variation, we sequenced the env genes of CAEV-63 and CAEV-Co, two antigenic variants of CAEV defined by serum neutralization. The deduced primary translation product of the CAEV env gene consists of a 60- to 80-amino-acid signal peptide followed by an amino-terminal surface protein (SU) and a carboxy-terminal transmembrane protein (TM) separated by an Arg-Lys-Lys-Arg cleavage site. The signal peptide cleavage site was verified by amino-terminal amino acid sequencing of native CAEV-63 SU. In addition, immunoprecipitation of [35S]methionine-labeled CAEV-63 proteins by sera from goats immunized with recombinant vaccinia virus expressing the CAEV-63 env gene confirmed that antibodies induced by env-encoded recombinant proteins react specifically with native virion SU and TM. The env genes of CAEV-63 and CAEV-Co encode 28 conserved cysteines and 25 conserved potential N-linked glycosylation sites. Nucleotide sequence variability results in 62 amino acid changes and one deletion within the SU and 34 amino acid changes within the TM.

  5. Structure and genetic variability of envelope glycoproteins of two antigenic variants of caprine arthritis-encephalitis lentivirus.

    PubMed Central

    Knowles, D P; Cheevers, W P; McGuire, T C; Brassfield, A L; Harwood, W G; Stem, T A

    1991-01-01

    To define the structure of the caprine arthritis-encephalitis virus (CAEV) env gene and characterize genetic changes which occur during antigenic variation, we sequenced the env genes of CAEV-63 and CAEV-Co, two antigenic variants of CAEV defined by serum neutralization. The deduced primary translation product of the CAEV env gene consists of a 60- to 80-amino-acid signal peptide followed by an amino-terminal surface protein (SU) and a carboxy-terminal transmembrane protein (TM) separated by an Arg-Lys-Lys-Arg cleavage site. The signal peptide cleavage site was verified by amino-terminal amino acid sequencing of native CAEV-63 SU. In addition, immunoprecipitation of [35S]methionine-labeled CAEV-63 proteins by sera from goats immunized with recombinant vaccinia virus expressing the CAEV-63 env gene confirmed that antibodies induced by env-encoded recombinant proteins react specifically with native virion SU and TM. The env genes of CAEV-63 and CAEV-Co encode 28 conserved cysteines and 25 conserved potential N-linked glycosylation sites. Nucleotide sequence variability results in 62 amino acid changes and one deletion within the SU and 34 amino acid changes within the TM. PMID:1656067

  6. The cochaperone shutdown defines a group of biogenesis factors essential for all piRNA populations in Drosophila.

    PubMed

    Olivieri, Daniel; Senti, Kirsten-André; Subramanian, Sailakshmi; Sachidanandam, Ravi; Brennecke, Julius

    2012-09-28

    In animal gonads, PIWI proteins and their bound 23-30 nt piRNAs guard genome integrity through sequence-specific silencing of transposons. Two branches of piRNA biogenesis, namely primary processing and ping-pong amplification, have been proposed. Despite an overall conceptual understanding of piRNA biogenesis, the identities and/or functions of the players involved are largely unknown. Here, we demonstrate an essential role for the female sterility gene shutdown in piRNA biology. Shutdown, an evolutionarily conserved cochaperone, collaborates with Hsp90 during piRNA biogenesis, potentially at the step at which RNAs are loaded into PIWI proteins. We demonstrate that Shutdown is essential for both primary and secondary piRNA populations in Drosophila. Extending our study to previously described piRNA pathway members revealed three distinct groups of biogenesis factors. Together with data on how PIWI proteins are wired into primary and secondary processing, these results lead us to propose a unified model for piRNA biogenesis. Copyright © 2012 Elsevier Inc. All rights reserved.

  7. An Autonomous BMP2 Regulatory Element in Mesenchymal Cells

    PubMed Central

    Kruithof, Boudewijn P.T.; Fritz, David T.; Liu, Yijun; Garsetti, Diane E.; Frank, David B.; Pregizer, Steven K.; Gaussin, Vinciane; Mortlock, Douglas P.; Rogers, Melissa B.

    2014-01-01

    BMP2 is a morphogen that controls mesenchymal cell differentiation and behavior. For example, BMP2 concentration controls the differentiation of mesenchymal precursors into myocytes, adipocytes, chondrocytes, and osteoblasts. Sequences within the 3′ untranslated region (UTR) of the Bmp2 mRNA mediate a post-transcriptional block of protein synthesis. Interaction of cell- and developmental-stage-specific trans-regulatory factors with the 3′UTR is a nimble and versatile mechanism for modulating this potent morphogen in different cell types. We show here that an ultra-conserved sequence in the 3′UTR functions independently of promoter, coding region, and 3′UTR context in primary and immortalized tissue culture cells and in transgenic mice. Our findings indicate that the ultra-conserved sequence is an autonomously functioning post-transcriptional element that may be used to modulate the level of BMP2 and other proteins while retaining tissue-specific regulatory elements. PMID:21268088

  8. Grizzly bear corticosteroid binding globulin: Cloning and serum protein expression.

    PubMed

    Chow, Brian A; Hamilton, Jason; Alsop, Derek; Cattet, Marc R L; Stenhouse, Gordon; Vijayan, Mathilakath M

    2010-06-01

    Serum corticosteroid levels are routinely measured as markers of stress in wild animals. However, corticosteroid levels rise rapidly in response to the acute stress of capture and restraint for sampling, limiting their use as an indicator of chronic stress. We hypothesized that serum corticosteroid binding globulin (CBG), the primary transport protein for corticosteroids in circulation, may be a better marker of the stress status prior to capture in grizzly bears (Ursus arctos). To test this, a full-length CBG cDNA was cloned and sequenced from grizzly bear testis, and polyclonal antibodies were generated for detection of this protein in bear sera. The deduced nucleotide and protein sequences were 1218 bp and 405 amino acids long, respectively. Multiple sequence alignments showed that the grizzly bear CBG (gbCBG) nucleotide and amino acid sequences were 90% and 83% identical, respectively, to those of dog CBG. The affinity-purified rabbit gbCBG antiserum detected grizzly bear but not human CBG. There were no sex differences in serum total cortisol concentration, whereas CBG expression was significantly higher in adult females than in males. Serum cortisol levels were significantly higher in bears captured by leg-hold snare than in those captured by remote drug delivery from a helicopter. However, serum CBG expression did not differ significantly between these two groups. Overall, serum CBG levels may be a better marker of chronic stress, especially because this protein is not modulated by the stress of capture and restraint in grizzly bears. Copyright 2010 Elsevier Inc. All rights reserved.
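    Percent identity figures such as the 90% and 83% reported for gbCBG versus dog CBG depend on how identity is counted. The Python sketch below uses one common convention, identical residues divided by alignment columns in which neither sequence has a gap; other tools divide by the full alignment length or by the shorter sequence, so values can differ. The function name and the toy sequences are hypothetical and are not the bear or dog CBG sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Return percent identity over the ungapped columns of a pairwise alignment.

    Convention (an assumption): identical residues / columns where neither
    sequence contains the gap character.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Hypothetical toy alignment: 5 identical residues out of 6 ungapped columns.
    print(round(percent_identity("MKT-LLV", "MKTALIV"), 1))  # prints 83.3
```

    Excluding gapped columns keeps the denominator limited to positions where both residues are known; counting gaps as mismatches instead would lower the value for the same alignment.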

  9. Primary structure of the P450 lanosterol demethylase gene from Saccharomyces cerevisiae

    EPA Science Inventory

    We have sequenced the structural gene and flanking regions for lanosterol 14 alpha-demethylase (14DM) from Saccharomyces cerevisiae. An open reading frame of 530 codons encodes a 60.7-kDa protein. When this gene is disrupted by integrative transformation, the resulting strain req...

  10. Massively parallel sequencing analysis of mucinous ovarian carcinomas: genomic profiling and differential diagnoses.

    PubMed

    Mueller, Jennifer J; Schlappe, Brooke A; Kumar, Rahul; Olvera, Narciso; Dao, Fanny; Abu-Rustum, Nadeem; Aghajanian, Carol; DeLair, Deborah; Hussein, Yaser R; Soslow, Robert A; Levine, Douglas A; Weigelt, Britta

    2018-05-21

    Mucinous ovarian cancer (MOC) is a rare type of epithelial ovarian cancer resistant to standard chemotherapy regimens. We sought to characterize the repertoire of somatic mutations in MOCs and to define the contribution of massively parallel sequencing to the classification of tumors diagnosed as primary MOCs. Following gynecologic pathology and chart review, DNA samples obtained from primary MOCs and matched normal tissues/blood were subjected to whole-exome (n = 9) or massively parallel sequencing targeting 341 cancer genes (n = 15). Immunohistochemical analysis of estrogen receptor, progesterone receptor, PTEN, ARID1A/BAF250a, and the DNA mismatch repair (MMR) proteins MSH6 and PMS2 was performed for all cases. Mutational frequencies of MOCs were compared to those of high-grade serous ovarian cancers (HGSOCs) and mucinous tumors from other sites. MOCs were heterogeneous at the genetic level, frequently harboring TP53 (75%) mutations, KRAS (71%) mutations and/or CDKN2A/B homozygous deletions/mutations (33%). Although established diagnostic criteria were employed, four cases harbored mutational and immunohistochemical profiles similar to those of endometrioid carcinomas, and one case had a profile consistent with colorectal or endometrioid carcinoma. Significant differences in the frequencies of KRAS, TP53, CDKN2A, FBXW7, PIK3CA and/or APC mutations were found between the confirmed primary MOCs (n = 19) and HGSOCs, mucinous gastric and/or mucinous colorectal carcinomas, whereas no differences in the 341 genes studied were identified between MOCs and mucinous pancreatic carcinomas. Our findings suggest that the assessment of mutations affecting TP53, KRAS, PIK3CA, ARID1A and POLE, and DNA MMR protein expression may be used to further aid the diagnosis and treatment decision-making of primary MOC. Copyright © 2018 Elsevier Inc. All rights reserved.

  11. Tools to evaluate the conformation of protein products.

    PubMed

    Manta, Bruno; Obal, Gonzalo; Ricciardi, Alejandro; Pritsch, Otto; Denicola, Ana

    2011-06-01

    Production of recombinant proteins is a process widely used in research laboratories. In addition, the main biotechnology market products are recombinant proteins and monoclonal antibodies. The biological (and clinical) properties of the protein product strongly depend on the conformation of the polypeptide. Therefore, assessment of the correct conformation of the produced protein is crucial. There is no single method to assess every aspect of protein structure or function, and the methods of choice vary depending on the protein. There are general methods to evaluate not only the mass and primary sequence of the protein but also its higher-order structure. This review outlines the principal techniques for determining the conformation of a protein, from structural (biophysical methods) to functional (in vitro binding assays) analyses. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  12. Characterization of photosynthetic ferredoxin from the Antarctic alga Chlamydomonas sp. UWO241 reveals novel features of cold adaptation.

    PubMed

    Cvetkovska, Marina; Szyszka-Mroz, Beth; Possmayer, Marc; Pittock, Paula; Lajoie, Gilles; Smith, David R; Hüner, Norman P A

    2018-05-08

    The objective of this work was to characterize photosynthetic ferredoxin from the Antarctic green alga Chlamydomonas sp. UWO241, a key enzyme involved in distributing photosynthetic reducing power. We hypothesize that ferredoxin possesses characteristics typical of cold-adapted enzymes, namely increased structural flexibility and high activity at low temperatures, accompanied by low stability at moderate temperatures. To address this objective, we purified ferredoxin from UWO241 and characterized the temperature dependence of its enzymatic activity and protein conformation. The UWO241 ferredoxin protein, RNA, and DNA sequences were compared with homologous sequences from related organisms. We provide evidence for the duplication of the main ferredoxin gene in the UWO241 nuclear genome and the presence of two highly similar proteins. Ferredoxin from UWO241 has both high activity at low temperatures and high stability at moderate temperatures, representing a novel class of cold-adapted enzymes. Our study reveals novel insights into how photosynthesis functions in the cold. The presence of two distinct ferredoxin proteins in UWO241 could provide an adaptive advantage for survival at cold temperatures. The primary amino acid sequence of ferredoxin is highly conserved among photosynthetic species, and we suggest that subtle differences in sequence can lead to significant changes in activity at low temperatures. © 2018 The Authors. New Phytologist © 2018 New Phytologist Trust.

  13. Expression of human electron transfer flavoprotein-ubiquinone oxidoreductase from a baculovirus vector: kinetic and spectral characterization of the human protein.

    PubMed

    Simkovic, Martin; Degala, Gregory D; Eaton, Sandra S; Frerman, Frank E

    2002-06-15

    Electron transfer flavoprotein-ubiquinone oxidoreductase (ETF-QO) is an iron-sulphur flavoprotein and a component of an electron-transfer system that links 10 different mitochondrial flavoprotein dehydrogenases to the mitochondrial bc1 complex via electron transfer flavoprotein (ETF) and ubiquinone. ETF-QO is an integral membrane protein, and the primary sequences of human and porcine ETF-QO were deduced from the sequences of the cloned cDNAs. We have expressed human ETF-QO in Sf9 insect cells using a baculovirus vector. The cDNA encoding the entire protein, including the mitochondrial targeting sequence, was present in the vector. We isolated a membrane-bound form of the enzyme that has a molecular mass identical with that of the mature porcine protein as determined by SDS/PAGE and has an N-terminal sequence that is identical with that predicted for the mature holoenzyme. These data suggest that the heterologously expressed ETF-QO is targeted to mitochondria and processed to the mature, catalytically active form. The detergent-solubilized protein was purified by ion-exchange and hydroxyapatite chromatography. Absorption and EPR spectroscopy and redox titrations are consistent with the presence of flavin and iron-sulphur centres that are very similar to those in the equivalent porcine and bovine proteins. Additionally, the redox potentials of the two prosthetic groups appear similar to those of the other eukaryotic ETF-QO proteins. The steady-state kinetic constants of human ETF-QO were determined with ubiquinone homologues, a ubiquinone analogue, and with human wild-type ETF and a Paracoccus-human chimaeric ETF as varied substrates. The results demonstrate that this expression system provides sufficient amounts of human ETF-QO to enable crystallization and mechanistic investigations of the iron-sulphur flavoprotein.

  14. Using the Tools and Resources of the RCSB Protein Data Bank.

    PubMed

    Costanzo, Luigi Di; Ghosh, Sutapa; Zardecki, Christine; Burley, Stephen K

    2016-09-07

    The Protein Data Bank (PDB) archive is the worldwide repository of experimentally determined three-dimensional structures of large biological molecules found in all three kingdoms of life. Atomic-level structures of these proteins, nucleic acids, and complex assemblies thereof are central to research and education in molecular, cellular, and organismal biology, biochemistry, biophysics, materials science, bioengineering, ecology, and medicine. Several types of information are associated with each PDB archival entry, including atomic coordinates, primary experimental data, polymer sequence(s), and summary metadata. The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) serves as the U.S. data center for the PDB, distributing archival data and supporting both simple and complex queries that return results. These data can be freely downloaded, analyzed, and visualized using RCSB PDB tools and resources to gain a deeper understanding of fundamental biological processes, molecular evolution, human health and disease, and drug discovery. © 2016 by John Wiley & Sons, Inc. Copyright © 2016 John Wiley & Sons, Inc.

  15. A Novel Helicase-Type Protein in the Nucleolus: Protein NOH61

    PubMed Central

    Zirwes, Rudolf F.; Eilbracht, Jens; Kneissel, Sandra; Schmidt-Zachmann, Marion S.

    2000-01-01

    We report the identification, cDNA cloning, and molecular characterization of a novel, constitutive nucleolar protein. The cDNA-deduced amino acid sequence of the human protein defines a polypeptide of a calculated mass of 61.5 kDa and an isoelectric point of 9.9. Inspection of the primary sequence disclosed that the protein is a member of the family of “DEAD-box” proteins, representing a subgroup of putative ATP-dependent RNA helicases. ATPase activity of the recombinant protein is evident and stimulated by a variety of polynucleotides tested. Immunolocalization studies revealed that protein NOH61 (nucleolar helicase of 61 kDa) is highly conserved during evolution and shows a strong accumulation in nucleoli. Biochemical experiments have shown that protein NOH61 synthesized in vitro sediments with ∼11.5 S, i.e., apparently as homo-oligomeric structures. By contrast, sucrose gradient centrifugation analysis of cellular extracts obtained with buffers of elevated ionic strength (600 mM NaCl) revealed that the solubilized native protein sediments with ∼4 S, suggestive of the monomeric form. Interestingly, protein NOH61 has also been identified as a specific constituent of free nucleoplasmic 65S preribosomal particles but is absent from cytoplasmic ribosomes. Treatment of cultured cells with 1) the transcription inhibitor actinomycin D and 2) RNase A results in a complete dissociation of NOH61 from nucleolar structures. The specific intracellular localization and its striking sequence homology to other known RNA helicases lead to the hypothesis that protein NOH61 might be involved in ribosome synthesis, most likely during the assembly process of the large (60S) ribosomal subunit. PMID:10749921

  16. Human, vector and parasite Hsp90 proteins: A comparative bioinformatics analysis.

    PubMed

    Faya, Ngonidzashe; Penkler, David L; Tastan Bishop, Özlem

    2015-01-01

    The treatment of protozoan parasitic diseases is challenging, and thus identification and analysis of new drug targets is important. Parasites survive within host organisms, and some need intermediate hosts to complete their life cycle. Changing host environments put stress on parasites, and adaptation is often accompanied by the expression of large amounts of heat shock proteins (Hsps). Among Hsps, Hsp90 proteins play an important role in stressful environments. Yet there has been little computational research comparing Hsp90 proteins as potential parasitic drug targets. Here, an attempt was made to gain detailed insights into the differences between host, vector and parasitic Hsp90 proteins by large-scale bioinformatics analysis. A total of 104 Hsp90 sequences were divided into three groups based on their cellular localization, namely cytosolic, mitochondrial and endoplasmic reticulum (ER). Further, the parasitic proteins were divided according to the type of parasite (protozoa, helminth and ectoparasite). Primary sequence analysis, phylogenetic tree calculations, motif analysis and physicochemical properties of Hsp90 proteins suggested that despite the overall structural conservation of these proteins, parasitic Hsp90 proteins have unique features that differentiate them from human ones, thus encouraging the idea that protozoan Hsp90 proteins should be further analyzed as potential drug targets.

  17. Taxonomic distribution and origins of the extended LHC (light-harvesting complex) antenna protein superfamily

    PubMed Central

    2010-01-01

    Background The extended light-harvesting complex (LHC) protein superfamily is a centerpiece of eukaryotic photosynthesis, comprising the LHC family and several families involved in photoprotection, like the LHC-like and the photosystem II subunit S (PSBS). The evolution of this complex superfamily has long remained elusive, partially due to previously missing families. Results In this study we present a meticulous search for LHC-like sequences in public genome and expressed sequence tag databases covering twelve representative photosynthetic eukaryotes from the three primary lineages of plants (Plantae): glaucophytes, red algae and green plants (Viridiplantae). By introducing a coherent classification of the different protein families based on both hidden Markov model analyses and structural predictions, numerous new LHC-like sequences were identified and several new families were described, including the red lineage chlorophyll a/b-binding-like protein (RedCAP) family from red algae and diatoms. The test of alternative topologies of sequences of the highly conserved chlorophyll-binding core structure of LHC and PSBS proteins significantly supports the independent origins of LHC and PSBS families via two unrelated internal gene duplication events. This result was confirmed by the application of cluster likelihood mapping. Conclusions The independent evolution of LHC and PSBS families is supported by strong phylogenetic evidence. In addition, a possible origin of LHC and PSBS families from different homologous members of the stress-enhanced protein subfamily, a diverse and anciently paralogous group of two-helix proteins, seems likely. The new hypothesis for the evolution of the extended LHC protein superfamily proposed here is in agreement with the character evolution analysis that incorporates the distribution of families and subfamilies across taxonomic lineages. 
Intriguingly, stress-enhanced proteins, which are universally found in the genomes of green plants, red algae, glaucophytes and in diatoms with complex plastids, could represent an important and previously missing link in the evolution of the extended LHC protein superfamily. PMID:20673336

  18. High Epstein-Barr Virus Load and Genomic Diversity Are Associated with Generation of gp350-Specific Neutralizing Antibodies following Acute Infectious Mononucleosis

    PubMed Central

    Weiss, Eric R.; Alter, Galit; Ogembo, Javier Gordon; Henderson, Jennifer L.; Tabak, Barbara; Bakiş, Yasin; Somasundaran, Mohan; Garber, Manuel; Selin, Liisa

    2016-01-01

    ABSTRACT The Epstein-Barr virus (EBV) gp350 glycoprotein interacts with the cellular receptor to mediate viral entry and is thought to be the major target for neutralizing antibodies. To better understand the role of EBV-specific antibodies in the control of viral replication and the evolution of sequence diversity, we measured EBV gp350-specific antibody responses and sequenced the gp350 gene in samples obtained from individuals experiencing primary EBV infection (acute infectious mononucleosis [AIM]) and again 6 months later (during convalescence [CONV]). EBV gp350-specific IgG was detected in the sera of 17 (71%) of 24 individuals at the time of AIM and all 24 (100%) individuals during CONV; binding antibody titers increased from AIM through CONV, reaching levels equivalent to those in age-matched, chronically infected individuals. Antibody-dependent cell-mediated phagocytosis (ADCP) was rarely detected during AIM (4 of 24 individuals; 17%) but was commonly detected during CONV (19 of 24 individuals; 79%). The majority (83%) of samples taken during AIM neutralized infection of primary B cells; all samples obtained at 6 months postdiagnosis neutralized EBV infection of cultured and primary target cells. Deep sequencing revealed interpatient gp350 sequence variation but conservation of the CR2-binding site. The levels of gp350-specific neutralizing activity directly correlated with higher peripheral blood EBV DNA levels during AIM and a greater evolution of diversity in gp350 nucleotide sequences from AIM to CONV. In summary, we conclude that the viral load and EBV gp350 diversity during early infection are associated with the development of neutralizing antibody responses following AIM. IMPORTANCE Antibodies against viral surface proteins can blunt the spread of viral infection by coating viral particles, mediating uptake by immune cells, or blocking interaction with host cell receptors, making them a desirable component of a sterilizing vaccine. 
The EBV surface protein gp350 is a major target for antibodies. We report the detection of EBV gp350-specific antibodies capable of neutralizing EBV infection in vitro. The majority of gp350-directed vaccines focus on glycoproteins from lab-adapted strains, which may poorly reflect primary viral envelope diversity. We report some of the first primary gp350 sequences, noting that the gp350 host receptor binding site is remarkably stable across patients and time. However, changes in overall gene diversity were detectable during infection. Patients with higher peripheral blood viral loads in primary infection and greater changes in viral diversity generated more efficient antibodies. Our findings provide insight into the generation of functional antibodies, necessary for vaccine development. PMID:27733645

  19. High Epstein-Barr Virus Load and Genomic Diversity Are Associated with Generation of gp350-Specific Neutralizing Antibodies following Acute Infectious Mononucleosis.

    PubMed

    Weiss, Eric R; Alter, Galit; Ogembo, Javier Gordon; Henderson, Jennifer L; Tabak, Barbara; Bakiş, Yasin; Somasundaran, Mohan; Garber, Manuel; Selin, Liisa; Luzuriaga, Katherine

    2017-01-01

    The Epstein-Barr virus (EBV) gp350 glycoprotein interacts with the cellular receptor to mediate viral entry and is thought to be the major target for neutralizing antibodies. To better understand the role of EBV-specific antibodies in the control of viral replication and the evolution of sequence diversity, we measured EBV gp350-specific antibody responses and sequenced the gp350 gene in samples obtained from individuals experiencing primary EBV infection (acute infectious mononucleosis [AIM]) and again 6 months later (during convalescence [CONV]). EBV gp350-specific IgG was detected in the sera of 17 (71%) of 24 individuals at the time of AIM and all 24 (100%) individuals during CONV; binding antibody titers increased from AIM through CONV, reaching levels equivalent to those in age-matched, chronically infected individuals. Antibody-dependent cell-mediated phagocytosis (ADCP) was rarely detected during AIM (4 of 24 individuals; 17%) but was commonly detected during CONV (19 of 24 individuals; 79%). The majority (83%) of samples taken during AIM neutralized infection of primary B cells; all samples obtained at 6 months postdiagnosis neutralized EBV infection of cultured and primary target cells. Deep sequencing revealed interpatient gp350 sequence variation but conservation of the CR2-binding site. The levels of gp350-specific neutralizing activity directly correlated with higher peripheral blood EBV DNA levels during AIM and a greater evolution of diversity in gp350 nucleotide sequences from AIM to CONV. In summary, we conclude that the viral load and EBV gp350 diversity during early infection are associated with the development of neutralizing antibody responses following AIM. Antibodies against viral surface proteins can blunt the spread of viral infection by coating viral particles, mediating uptake by immune cells, or blocking interaction with host cell receptors, making them a desirable component of a sterilizing vaccine. 
The EBV surface protein gp350 is a major target for antibodies. We report the detection of EBV gp350-specific antibodies capable of neutralizing EBV infection in vitro. The majority of gp350-directed vaccines focus on glycoproteins from lab-adapted strains, which may poorly reflect primary viral envelope diversity. We report some of the first primary gp350 sequences, noting that the gp350 host receptor binding site is remarkably stable across patients and time. However, changes in overall gene diversity were detectable during infection. Patients with higher peripheral blood viral loads in primary infection and greater changes in viral diversity generated more efficient antibodies. Our findings provide insight into the generation of functional antibodies, necessary for vaccine development. Copyright © 2016 American Society for Microbiology.

  20. Adipogenic Effects and Gene Expression Profiling of Firemaster® 550 Components in Human Primary Preadipocytes

    PubMed Central

    Tung, Emily W.Y.; Peshdary, Vian; Gagné, Remi; Rowan-Carroll, Andrea; Yauk, Carole L.; Boudreau, Adéle

    2017-01-01

    Background: Exposure to flame retardants has been associated with negative health outcomes, including metabolic effects. As polybrominated diphenyl ether flame retardants were pulled from commerce, human exposure to new flame retardants such as Firemaster® 550 (FM550) has increased. Although previous studies in murine systems have shown that FM550 and its main components increase adipogenesis, the effects of FM550 in human models have not been elucidated. Objectives: The objectives of this study were to determine whether FM550 and its components are active in human preadipocytes and to further investigate their mode of action. Methods: Human primary preadipocytes were differentiated in the presence of FM550 and its components. Differentiation was assessed by lipid accumulation and by expression of peroxisome proliferator-activated receptor γ (PPARG), fatty acid binding protein (FABP) 4, and lipoprotein lipase (LPL). mRNA was collected for poly(A) RNA sequencing, which was used to identify differentially expressed genes (DEGs). Functional analysis of DEGs was undertaken in Ingenuity Pathway Analysis. Results: FM550 and its components triphenyl phosphate (TPP) and isopropylated triphenyl phosphates (IPTP) increased adipogenesis in human primary preadipocytes, as assessed by lipid accumulation and by mRNA expression of regulators of adipogenesis such as PPARγ, CCAAT enhancer binding protein (C/EBP) α, and sterol regulatory element binding protein (SREBP) 1, as well as the adipogenic markers FABP4, LPL, and perilipin. Poly(A) RNA sequencing analysis revealed potential modes of action including LXR/RXR (liver X receptor/retinoid X receptor) activation, TR/RXR (thyroid receptor/RXR) activation, protein kinase A, and activation of nuclear receptor subfamily 1 group H members. Conclusions: We found that FM550 and two of its components induced adipogenesis in human primary preadipocytes. 
Further, using global gene expression analysis, we showed that both TPP and IPTP likely exert their effects through PPARG to induce adipogenesis. In addition, IPTP perturbed signaling pathways that were not affected by TPP. https://doi.org/10.1289/EHP1318 PMID:28934090

  1. Weighted gene co-expression network analysis of colorectal cancer liver metastasis genome sequencing data and screening of anti-metastasis drugs.

    PubMed

    Gao, Bo; Shao, Qin; Choudhry, Hani; Marcus, Victoria; Dong, Kung; Ragoussis, Jiannis; Gao, Zu-Hua

    2016-09-01

    Approximately 9% of cancer-related deaths are caused by colorectal cancer (CRC). CRC patients are prone to liver metastasis, which is the most important cause of the high CRC mortality rate. Understanding the molecular mechanism of CRC liver metastasis could help us find novel targets for the effective treatment of this deadly disease. Using weighted gene co-expression network analysis on the sequencing data of CRC with and without metastasis, we identified 5 colorectal cancer liver metastasis-related modules, which were labeled brown, blue, grey, yellow and turquoise. In the brown module, which represents the metastatic tumor in the liver, gene ontology (GO) analysis revealed functions including the G-protein coupled receptor protein signaling pathway, epithelial cell differentiation and cell surface receptor linked signal transduction. In the blue module, which represents the primary CRC that has metastasized, GO analysis showed that the genes were mainly enriched in GO terms including the G-protein coupled receptor protein signaling pathway, cell surface receptor linked signal transduction, and negative regulation of cell differentiation. In the yellow and turquoise modules, which represent the primary non-metastatic CRC, 13 downregulated CRC liver metastasis-related candidate miRNAs were identified (e.g. hsa-miR-204, hsa-miR-455, etc.). Furthermore, analyzing the DrugBank database and mining the literature identified 25 and 12 candidate drugs that could potentially block the metastatic processes of the primary tumor and inhibit the progression of metastatic tumors in the liver, respectively. Data generated from this study not only further our understanding of the genetic alterations that drive the metastatic process, but also guide the development of molecular-targeted therapy of colorectal cancer liver metastasis.
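
    Weighted gene co-expression network analysis (WGCNA) starts by turning gene-gene correlations into a weighted network before modules such as "brown" or "blue" are cut from it. The sketch below shows only that first step, using the standard unsigned soft-threshold adjacency a_ij = |cor(x_i, x_j)|^β and the resulting connectivity k_i. The β = 6 default and the gene names are illustrative assumptions, not the settings used in this study; the later topological-overlap and hierarchical-clustering steps are omitted.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(profiles, beta=6):
    """Unsigned soft-threshold adjacency: a_ij = |cor(x_i, x_j)| ** beta."""
    genes = list(profiles)
    adj = {}
    for g in genes:
        for h in genes:
            if g == h:
                adj[(g, h)] = 1.0
            else:
                adj[(g, h)] = abs(pearson(profiles[g], profiles[h])) ** beta
    return adj


def connectivity(adj, genes):
    """Whole-network connectivity k_i = sum over j != i of a_ij."""
    return {g: sum(adj[(g, h)] for h in genes if h != g) for g in genes}


# Hypothetical expression profiles (4 samples each).
profiles = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],    # perfectly correlated with geneA
    "geneD": [1, -1, 1, -1],  # weakly anti-correlated with geneA
}
adj = soft_threshold_adjacency(profiles)
k = connectivity(adj, list(profiles))
```

    Raising |r| to a power keeps strong correlations near 1 while shrinking weak ones: here geneA and geneD have |r| ≈ 0.447, which the soft threshold reduces to about 0.008.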

  2. The Conformational Stability and Biophysical Properties of the Eukaryotic Thioredoxins of Pisum Sativum Are Not Family-Conserved

    PubMed Central

    Aguado-Llera, David; Martínez-Gómez, Ana Isabel; Prieto, Jesús; Marenchino, Marco; Traverso, José Angel; Gómez, Javier; Chueca, Ana; Neira, José L.

    2011-01-01

    Thioredoxins (TRXs) are ubiquitous proteins involved in redox processes. About forty genes encode TRX or TRX-related proteins in plants, grouped in different families according to their subcellular localization. For instance, the h-type TRXs are located in the cytoplasm or mitochondria, whereas f-type TRXs have a plastidial origin, although both types of proteins have a eukaryotic origin, as opposed to other TRXs. Herein, we study the conformational and biophysical features of TRXh1, TRXh2 and TRXf from Pisum sativum. The modelled structures of the three proteins show the well-known TRX fold. While they share similar pH-denaturation features, their chemical and thermal stabilities differ, with PsTRXh1 (Pisum sativum thioredoxin h1) being the most stable isoform; moreover, the three proteins follow a three-state model during chemical denaturation. These differences in thermal and chemical denaturation result, in a broad sense, from changes in the ASAs (accessible surface areas) of the proteins. Thus, although a strong relationship can be found between primary amino acid sequence and structure among TRXs, no such relationship exists between residue sequence and conformational stability or biophysical properties. We discuss how these differences in the biophysical properties of TRXs determine their unique functions in pea, and we show that residues involved in the biophysical features described (pH titrations, dimerization and chemical denaturation) belong to regions involved in interactions with other proteins. Our results suggest that the sequence demands of protein-protein function are relatively rigid, with different protein-binding pockets (some in common) for each of the three proteins, but the demands of structure and conformational stability per se (as long as a core is maintained) are less so. PMID:21364950

  3. NemaPath: online exploration of KEGG-based metabolic pathways for nematodes

    PubMed Central

    Wylie, Todd; Martin, John; Abubucker, Sahar; Yin, Yong; Messina, David; Wang, Zhengyuan; McCarter, James P; Mitreva, Makedonka

    2008-01-01

    Background Nematode.net is a web-accessible resource for investigating gene sequences from parasitic and free-living nematode genomes. Beyond the well-characterized model nematode C. elegans, over 500,000 expressed sequence tags (ESTs) and nearly 600,000 genome survey sequences (GSSs) have been generated from 36 nematode species as part of the Parasitic Nematode Genomics Program undertaken by the Genome Center at Washington University School of Medicine. However, these sequencing data are not present in most publicly available protein databases, which only include sequences in Swiss-Prot. Swiss-Prot, in turn, relies on GenBank/EMBL/DDBJ for predicted proteins from complete genomes or full-length proteins. Description Here we present the NemaPath pathway server, a web-based pathway-level visualization tool for navigating putative metabolic pathways for over 30 nematode species, including 27 parasites. The NemaPath approach consists of two parts: 1) a backend tool to align and evaluate nematode genomic sequences (curated EST contigs) against the annotated Kyoto Encyclopedia of Genes and Genomes (KEGG) protein database; 2) a web viewing application that displays annotated KEGG pathway maps based on user-defined confidence levels of primary sequence similarity. NemaPath also provides cross-referenced access to nematode genome information provided by other tools available on Nematode.net, including: detailed NemaGene EST cluster information; putative translations; GBrowse EST cluster views; links from nematode data to external databases for corresponding synonymous C. elegans counterparts, subject matches in KEGG's gene database, and also KEGG Ontology (KO) identification. Conclusion The NemaPath server hosts metabolic pathway mappings for 30 nematode species and is available on the World Wide Web at . 
The nematode source sequences used for the metabolic pathway mappings are available via FTP , as provided by the Genome Center at Washington University School of Medicine. PMID:18983679

  4. Prunus necrotic ringspot ilarvirus: nucleotide sequence of RNA3 and the relationship to other ilarviruses based on coat protein comparison.

    PubMed

    Guo, D; Maiss, E; Adam, G; Casper, R

    1995-05-01

    The RNA3 of prunus necrotic ringspot ilarvirus (PNRSV) has been cloned and its entire sequence determined. The RNA3 consists of 1943 nucleotides (nt) and possesses two large open reading frames (ORFs) separated by an intergenic region of 74 nt. The 5' proximal ORF is 855 nt in length and codes for a protein of molecular mass 31.4 kDa, which is homologous to the putative movement protein of other members of the Bromoviridae. The 3' proximal ORF of 675 nt is the cistron for the coat protein (CP), which has a predicted molecular mass of 24.9 kDa. The sequence of the 3' non-coding region (NCR) of PNRSV RNA3 showed a high degree of similarity with those of tobacco streak virus (TSV), prune dwarf virus (PDV), apple mosaic virus (ApMV) and also alfalfa mosaic virus (AlMV). In addition, it contained potential stem-loop structures with interspersed AUGC motifs characteristic of ilar- and alfamoviruses. This conserved primary and secondary structure in all 3' NCRs may be responsible for the interaction with homologous and heterologous CPs and subsequent activation of genome replication. The 657-nt CP gene of an ApMV isolate (ApMV-G) has also been cloned and sequenced. Although ApMV and PNRSV have a distant serological relationship, the deduced amino acid sequences of their CPs have an identity of only 51.8%. The N termini of PNRSV and ApMV CPs have in common a zinc-finger motif and the potential to form an amphipathic helix.
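
    A percent-identity figure such as the 51.8% quoted for the PNRSV and ApMV coat proteins depends on the alignment and on which columns form the denominator; the abstract does not state the convention used. The sketch below assumes the two sequences are already aligned (gaps written as "-") and counts identity over gap-free columns only, which is one common choice among several.

```python
def percent_identity(aln_a, aln_b):
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumes aln_a and aln_b come from a pairwise alignment and have equal length.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Hypothetical toy alignment: 6 gap-free columns, 5 identical.
pid = percent_identity("MKT-AYIA", "MKSLAY-A")
```

    Dividing instead by the full alignment length or by the shorter sequence gives a different number for the same pair, so identity values are only comparable when the convention is known.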

  5. A stoichiometry driven universal spatial organization of backbones of folded proteins: are there Chargaff's rules for protein folding?

    PubMed

    Mittal, A; Jayaram, B; Shenoy, Sandhya; Bawa, Tejdeep Singh

    2010-10-01

    Protein folding is a problem at least six decades old, dating back to the times of Pauling and Anfinsen. However, the rules of protein folding remain elusive to date. In this work, rigorous analyses of several thousand crystal structures of folded proteins reveal a surprisingly simple unifying principle of backbone organization in protein folding. We find that protein folding is a direct consequence of a narrow band of stoichiometric occurrences of amino acids in primary sequences, regardless of the size and fold of a protein. We observe that "preferential interactions" between amino acids do not drive protein folding, contrary to all prevalent views. We dedicate our discovery to the seminal contribution of Chargaff, which was one of the major keys to elucidating the stoichiometry-driven, spatially organized double-helical structure of DNA.
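
    The "stoichiometric occurrences of amino acids" in this abstract start from a simple quantity: the fraction of each residue type in a primary sequence. The sketch below computes that composition over the 20 standard amino acids; it does not reproduce the specific stoichiometric band the authors report, and ignoring non-standard letters is an assumption of this sketch.

```python
from collections import Counter

STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Fraction of each of the 20 standard amino acids in a protein sequence.

    Non-standard letters (e.g. X, B, Z) are skipped, so fractions sum to 1
    over standard residues only.
    """
    counts = Counter(c for c in seq.upper() if c in STANDARD_AA)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in STANDARD_AA}
    return {aa: counts[aa] / total for aa in STANDARD_AA}


comp = composition("AACG")  # hypothetical 4-residue sequence
```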

  6. Characterization of the Eimeria maxima sporozoite surface protein IMP1.

    PubMed

    Jenkins, M C; Fetterer, R; Miska, K; Tuo, W; Kwok, O; Dubey, J P

    2015-07-30

    The purpose of this study was to characterize Eimeria maxima immune-mapped protein 1 (IMP1), which is hypothesized to play a role in eliciting protective immunity against E. maxima infection in chickens. RT-PCR analysis of RNA from unsporulated and sporulating E. maxima oocysts revealed the highest transcription levels at 6-12 h of sporulation, with considerable downregulation thereafter. Alignment of the IMP1 coding sequence from the Houghton, Weybridge, and APU-1 strains of E. maxima revealed single nucleotide polymorphisms that in some instances led to amino acid changes in the encoded protein sequence. The E. maxima (APU-1) IMP1 cDNA sequence was cloned and expressed in 2 different polyHis Escherichia coli expression vectors. Regardless of expression vector, recombinant E. maxima IMP1 (rEmaxIMP1) was fairly unstable in non-denaturing buffer, which is consistent with stability analysis of the primary amino acid sequence. Antisera specific for rEmaxIMP1 identified a single 72 kDa protein or a 61 kDa protein by non-reducing or reducing SDS-PAGE/immunoblotting, respectively. Immunofluorescence staining with anti-rEmaxIMP1 revealed intense surface staining of E. maxima sporozoites, with negligible staining of merozoite stages. Immunohistochemical staining of E. maxima-infected chicken intestinal tissue revealed staining of E. maxima developmental stages in the lamina propria and crypts at both 24 and 48 h post-infection, and negligible staining thereafter. The expression of IMP1 during early stages of in vivo development and its location on the sporozoite surface may partly explain the immunoprotective effect of this protein against E. maxima infection. Published by Elsevier B.V.

  7. Sequence charge decoration dictates coil-globule transition in intrinsically disordered proteins

    NASA Astrophysics Data System (ADS)

    Firman, Taylor; Ghosh, Kingshuk

    2018-03-01

    We present an analytical theory to compute conformations of heteropolymers (applicable to describing disordered proteins) as a function of temperature and charge sequence. The theory describes the coil-globule transition for a given protein sequence when temperature is varied and has been benchmarked against all-atom Monte Carlo simulations (using CAMPARI) of intrinsically disordered proteins (IDPs). In addition, the model quantitatively shows how subtle alterations of charge placement in the primary sequence, while maintaining the same charge composition, can lead to significant changes in conformation, even as drastic as a switch from coil (swollen beyond a purely random coil) to globule (collapsed below a random coil) and vice versa. The theory provides insights into how to control (enhance or suppress) these changes by tuning the temperature (or solution condition) and charge decoration. As an application, we predict the distribution of conformations (at room temperature) of all naturally occurring IDPs in the DisProt database and notice significant size variation even among IDPs with a similar composition of positive and negative charges. Based on this, we provide a new diagram-of-states delineating the sequence-conformation relation for proteins in the DisProt database. Next, we study the effect of post-translational modification, e.g., phosphorylation, on IDP conformations. Modifications as small as two-site phosphorylation can significantly alter the size of an IDP with everything else held constant (temperature, salt concentration, etc.). However, not all possible modification sites have the same effect on protein conformations; there are certain "hot spots" that can cause maximal change in conformation. The location of these "hot spots" in the parent sequence can readily be identified by using a sequence charge decoration metric originally introduced by Sawle and Ghosh. 
The ability of our model to predict conformations (both expanded and collapsed states) of IDPs at a high-throughput level can provide valuable insights into the different mechanisms by which phosphorylation/charge mutation controls IDP function.

  8. Processing of the 5'-UTR and existence of protein factors that regulate translation of tobacco chloroplast psbN mRNA.

    PubMed

    Kuroda, Hiroshi; Sugiura, Masahiro

    2014-12-01

    The chloroplast psbB operon includes five genes encoding photosystem II and cytochrome b6/f complex components. The psbN gene is located on the opposite strand. PsbN is localized in the thylakoid and is present even in the dark, although its level increases upon illumination and then decreases. However, the translation mechanism of the psbN mRNA remains unclear. Using an in vitro translation system from tobacco chloroplasts and a green fluorescent protein as a reporter protein, we show that translation occurs from a tobacco primary psbN 5'-UTR of 47 nucleotides (nt). Unlike many other chloroplast 5'-UTRs, the psbN 5'-UTR has two processing sites, at -39 and -24 upstream from the initiation site. Processing at -39 enhanced the translation rate fivefold. In contrast, processing at -24 did not affect the translation rate. These observations suggest that the two distinct processing events regulate, at least in part, the level of PsbN during development. The psbN 5'-UTR has no Shine-Dalgarno (SD)-like sequence. In vitro translation assays with excess amounts of the psbN 5'-UTR or with deleted psbN 5'-UTR sequences demonstrated that protein factors are required for translation and that their binding site is an 18 nt sequence in the 5'-UTR. Mobility shift assays using 10 other chloroplast 5'-UTRs suggested that common or similar proteins are involved in translation of a set of mRNAs lacking SD-like sequences.

  9. A deep learning framework for improving long-range residue-residue contact prediction using a hierarchical strategy.

    PubMed

    Xiong, Dapeng; Zeng, Jianyang; Gong, Haipeng

    2017-09-01

    Residue-residue contacts are of great value for protein structure prediction, since contact information, especially for long-range residue pairs, can significantly reduce the complexity of conformational sampling in practice. Despite progress in the past decade on protein targets with abundant homologous sequences, accurate contact prediction for proteins with limited sequence information is still far from satisfactory. Methodologies for these hard targets still need further improvement. We present a computational program, DeepConPred, which includes a pipeline of two novel deep-learning-based methods (DeepCCon and DeepRCon) as well as a contact refinement step, to improve the prediction of long-range residue contacts from primary sequences. Compared with previous prediction approaches, our framework employs an effective scheme to identify optimal and important features for contact prediction, and was trained only with coevolutionary information derived from a limited number of homologous sequences to ensure robustness and usefulness for hard targets. Independent tests showed that 59.33%/49.97%, 64.39%/54.01% and 70.00%/59.81% of the top L/5, top L/10 and top 5 predictions were correct for CASP10/CASP11 proteins, respectively. In general, our algorithm ranked as one of the best methods for CASP targets. All source data and code are available at http://166.111.152.91/Downloads.html . hgong@tsinghua.edu.cn or zengjy321@tsinghua.edu.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
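
    Accuracy figures such as "top L/5" and "top L/10" are precisions over the highest-scoring predicted contacts, where L is the sequence length. The sketch below computes that precision for long-range pairs. It assumes the common CASP convention that long-range means a sequence separation of at least 24 residues (the abstract does not state the cutoff used), and the input format and function names are hypothetical.

```python
def long_range_precision(predictions, true_contacts, n_top, min_sep=24):
    """Fraction of the n_top highest-scoring long-range predictions that are true.

    predictions: iterable of (i, j, score) with 0-based residue indices.
    true_contacts: set of (i, j) pairs with i < j.
    min_sep: minimum |i - j| for a pair to count as long-range (assumed 24).
    """
    candidates = [
        (min(i, j), max(i, j), score)
        for i, j, score in predictions
        if abs(i - j) >= min_sep
    ]
    candidates.sort(key=lambda t: t[2], reverse=True)
    top = candidates[:n_top]
    if not top:
        return 0.0
    hits = sum(1 for i, j, _ in top if (i, j) in true_contacts)
    return hits / len(top)


def top_l_over_k(length, k):
    """Number of predictions evaluated for the top-L/k criterion."""
    return max(1, length // k)


# Hypothetical predictions for a toy protein.
preds = [(0, 30, 0.9), (5, 40, 0.8), (10, 12, 0.95), (2, 50, 0.7)]
truth = {(0, 30), (2, 50)}
p2 = long_range_precision(preds, truth, n_top=2)  # (10, 12) is short-range and ignored
```

    For a 100-residue protein, top L/5 evaluates 20 pairs and top L/10 evaluates 10; the "top 5" criterion simply fixes n_top = 5 regardless of length.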

  10. The molecular mechanism of the termination of insect diapause, part 1: A timer protein, TIME-EA4, in the diapause eggs of the silkworm Bombyx mori is a metallo-glycoprotein.

    PubMed

    Isobe, Minoru; Kai, Hidenori; Kurahashi, Takuya; Suwan, Sathorn; Pitchayawasin-Thapphasaraphong, Suthasinee; Franz, Thomas; Tani, Naoki; Higashi, Kenichiro; Nishida, Hideo

    2006-10-01

    TIME-EA4 is an ATPase that measures time intervals, functioning as a diapause duration clock in diapause eggs of the silkworm, Bombyx mori. Characterization of the primary and higher-order structures of TIME-EA4 would help clarify the mechanism by which the protein measures time intervals. In our current studies, the whole sequence of TIME-EA4 has been established as that of a metallo-glycoprotein by combined means involving peptide sequence analysis, nano-HPLC-ESI-Q-TOF-MS and MS/MS, and cDNA dictation. The amino acid sequence of TIME-EA4 showed 46-55% homology with the reported proteins of the Cu,Zn-SOD (superoxide dismutase) family; in particular, the SOD active site (core domain) includes metal-binding amino acid ligands and a disulfide bond, and these structures are completely identical in Bombyx SOD, bovine SOD, and TIME-EA4 proteins. We found, however, that TIME-EA4 contains one more copper ion than other SODs, as was proven under neutral nondenaturing conditions. ESI mass spectrometry revealed that the timer function was not in the SOD core domain. In addition, TIME-EA4 has an attached sugar chain, which is indispensable to its functioning as a timer protein.

  11. Venom characterization of the Amazonian scorpion Tityus metuendus.

    PubMed

    Batista, C V F; Martins, J G; Restano-Cassulini, R; Coronas, F I V; Zamudio, F Z; Procópio, R; Possani, L D

    2018-03-01

    The soluble venom from the scorpion Tityus metuendus was characterized by various methods. In vivo experiments with mice showed that it is lethal. Extended electrophysiological recordings using seven sub-types of human voltage gated sodium channels (hNav1.1 to 1.7) showed that it contains both α- and β-scorpion toxin types. Fingerprint analysis by mass spectrometry identified over 200 distinct molecular mass components. At least 60 sub-fractions were recovered from HPLC separation. Five purified peptides were sequenced by Edman degradation, and their complete primary structures were determined. Additionally, three other peptides have had their N-terminal amino acid sequences determined by Edman degradation and reported. Mass spectrometry analysis of tryptic digestion of the soluble venom permitted the identification of the amino acid sequence of 111 different peptides. Search for similarities of the sequences found indicated that they probably are: sodium and potassium channel toxins, metalloproteinases, hyaluronidases, endothelin and angiotensin-converting enzymes, bradykinin-potentiating peptide, hypothetical proteins, allergens, other enzymes, other proteins and peptides. Copyright © 2018 Elsevier Ltd. All rights reserved.

  12. Information Propagation in Developmental Enhancers

    NASA Astrophysics Data System (ADS)

    Jena, Siddhartha; Levine, Michael

    Rather than encoding information about protein sequence, certain lengths of noncoding DNA, called enhancers, interact with protein machinery such as transcription factors to precisely regulate gene expression. Enhancers have been studied extensively in the fruit fly Drosophila melanogaster, where they regulate the expression of developmental genes that establish the blueprint of the adult fly. It has been suggested that enhancer sequences possess a specific but unknown syntax with regard to the placement and strength of transcription factor binding sites. Moreover, studies in divergent fly species have shown that compensatory evolution allows enhancer functionality to be maintained despite considerable variation in primary DNA sequence. Here, the possible role of enhancers as signal processing modules is studied as a way of explaining these two findings. We first demonstrate how this framework can be used to explain the fine-tuned spatiotemporal dynamics of gene expression. We then explore the evolutionary pressure on enhancer sequences and the resulting emergence of enhancers that are linked by compensatory mutations. This study provides a possible mechanism for the function of multiple enhancers linked to a single gene.

  13. [The primary structure of a vaccine strain of tobacco mosaic virus V-69].

    PubMed

    Shiian, A N; Mil'shina, N V; Snegireva, P B; Pukhal'skiĭ, V A

    1994-12-01

    A random set of cDNA fragments was synthesized on genomic RNA of TMV vaccine strain V-69 using random primers and reverse transcriptase. Following synthesis of double-stranded cDNA, the fragments were cloned into the pUC-19 plasmid, and 28 clones were sequenced (insert size 100-500 bp). High nucleotide sequence homology of V-69 (more than 95%) was shown only with tomato strain TMV-L [1]. Sequenced clones represent 54% of the genome (50% of the replicase gene, 98% of the transport protein gene, and 60% of the coat protein gene). In this genome region, 24 base substitutions were revealed, as compared to the wild-type TMV-L sequence. Six base substitutions resulted in changes in the corresponding amino acid codons. No substitutions coincided with those discovered in the related TMV vaccine strain L11A [2], while two substitutions in the replicase gene were identical to those found in TMV strain Lta1 [3], which is capable of overcoming protection in tomatoes with the resistance gene Tm-1.

  14. Prediction of the aggregation propensity of proteins from the primary sequence: aggregation properties of proteomes.

    PubMed

    Castillo, Virginia; Graña-Montes, Ricardo; Sabate, Raimon; Ventura, Salvador

    2011-06-01

    In the cell, protein folding into stable globular conformations is in competition with aggregation into non-functional and usually toxic structures, since the biophysical properties that promote folding also tend to favor intermolecular contacts, leading to the formation of β-sheet-enriched insoluble assemblies. The formation of protein deposits is linked to at least 20 different human disorders, ranging from dementia to diabetes. Furthermore, protein deposition inside cells represents a major obstacle for the biotechnological production of polypeptides. Importantly, the aggregation behavior of polypeptides appears to be strongly influenced by the intrinsic properties encoded in their sequences and specifically by the presence of selective short regions with high aggregation propensity. This allows computational methods to be used to analyze the aggregation properties of proteins without the previous requirement for structural information. Applications range from the identification of individual amyloidogenic regions in disease-linked polypeptides to the analysis of the aggregation properties of complete proteomes. Herein, we review these theoretical approaches and illustrate how they have become important and useful tools in understanding the molecular mechanisms underlying protein aggregation. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
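
    Sequence-based aggregation predictors typically scan the primary sequence with a sliding window and flag short stretches whose average per-residue score exceeds a threshold. The sketch below illustrates only that scanning pattern. It swaps in the Kyte-Doolittle hydropathy scale as a stand-in, since hydrophobicity is one ingredient of aggregation propensity; it is not a published aggregation scale, and the window size of 5 and threshold of 1.5 are arbitrary illustrative choices.

```python
# Kyte-Doolittle hydropathy values, used here only as a stand-in scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_means(seq, scale=KD, window=5):
    """Yield (start, mean score) for every full-length window."""
    vals = [scale[c] for c in seq.upper()]
    for start in range(len(vals) - window + 1):
        yield start, sum(vals[start:start + window]) / window


def hot_spots(seq, scale=KD, window=5, threshold=1.5):
    """Merge overlapping above-threshold windows into (start, end) regions.

    Indices are 0-based and end is exclusive, so seq[start:end] is the region.
    """
    regions = []
    for start, mean in window_means(seq, scale, window):
        if mean <= threshold:
            continue
        end = start + window
        if regions and start <= regions[-1][1]:
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions


regions = hot_spots("DDDDDIVLIVDDDDD")  # hypothetical sequence with a hydrophobic core
```

    Replacing KD with an experimentally derived per-residue aggregation scale turns the same loop into the kind of proteome-wide scan the review describes, without needing any structural information.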

  15. A novel class of small RNAs bind to MILI protein in mouse testes.

    PubMed

    Aravin, Alexei; Gaidatzis, Dimos; Pfeffer, Sébastien; Lagos-Quintana, Mariana; Landgraf, Pablo; Iovino, Nicola; Morris, Patricia; Brownstein, Michael J; Kuramochi-Miyagawa, Satomi; Nakano, Toru; Chien, Minchen; Russo, James J; Ju, Jingyue; Sheridan, Robert; Sander, Chris; Zavolan, Mihaela; Tuschl, Thomas

    2006-07-13

    Small RNAs bound to Argonaute proteins recognize partially or fully complementary nucleic acid targets in diverse gene-silencing processes. A subgroup of the Argonaute proteins--known as the 'Piwi family'--is required for germ- and stem-cell development in invertebrates, and two Piwi members--MILI and MIWI--are essential for spermatogenesis in mouse. Here we describe a new class of small RNAs that bind to MILI in mouse male germ cells, where they accumulate at the onset of meiosis. The sequences of the over 1,000 identified unique molecules share a strong preference for a 5' uridine, but otherwise cannot be readily classified into sequence families. Genomic mapping of these small RNAs reveals a limited number of clusters, suggesting that these RNAs are processed from long primary transcripts. The small RNAs are 26-31 nucleotides (nt) in length--clearly distinct from the 21-23 nt of microRNAs (miRNAs) or short interfering RNAs (siRNAs)--and we refer to them as 'Piwi-interacting RNAs' or piRNAs. Orthologous human chromosomal regions also give rise to small RNAs with the characteristics of piRNAs, but the cloned sequences are distinct. The identification of this new class of small RNAs provides an important starting point to determine the molecular function of Piwi proteins in mammalian spermatogenesis.

  16. Mapping the primary structure of copper/topaquinone-containing methylamine oxidase from Aspergillus niger.

    PubMed

    Lenobel, R; Sebela, M; Frébort, I

    2005-01-01

    The amino acid sequence of methylamine oxidase (MeAO) from the fungus Aspergillus niger was analyzed using mass spectrometry (MS). First, MeAO was characterized by an accurate molar mass of 72.4 kDa for the monomer, measured using MALDI-TOF-MS, and by a pI value of 5.8, determined by isoelectric focusing. MALDI-TOF-MS revealed a clear peptide mass fingerprint after tryptic digestion, which did not provide any relevant hit when searched against a nonredundant protein database and was different from that of A. niger amine oxidase AO-I. Tandem mass spectrometry with electrospray ionization coupled to liquid chromatography allowed unambiguous reading of six peptide sequences (11-19 amino acids) and seven sequence tags (4-15 amino acids), which were used for MS BLAST homology searching. MeAO was found to be largely homologous to a hypothetical protein AN7641.2 (EMBL/GenBank protein accession code EAA61827) from Aspergillus nidulans FGSC A4, with a theoretical molar mass of 76.46 kDa and pI 6.14, which belongs to the superfamily of copper amine oxidases. The protein AN7641.2 is only weakly homologous to the amine oxidase AO-I (32% identity, 49% similarity).
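
    A peptide mass fingerprint is matched against databases by digesting candidate sequences in silico and comparing the predicted peptide masses with the observed spectrum. The sketch below shows those two steps under simplifying assumptions: the classic trypsin rule (cleave after K or R unless the next residue is P), no missed cleavages, unmodified cysteines, and standard monoisotopic residue masses. It is an illustration, not the search procedure used in this study.

```python
# Monoisotopic residue masses in Da (unmodified residues).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def tryptic_digest(seq):
    """Cleave C-terminal to K/R unless followed by P; no missed cleavages."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        if aa in "KR" and (i + 1 == len(seq) or seq[i + 1] != "P"):
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal peptide without K/R
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of a peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


peptides = tryptic_digest("AKPGRSTK")  # hypothetical sequence; K before P is not cleaved
masses = [peptide_mass(p) for p in peptides]
```

    MALDI-TOF spectra usually report singly protonated ions, so an observed [M+H]+ peak corresponds to the neutral mass above plus about 1.00728 Da.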

  17. Conservation of Three-Dimensional Helix-Loop-Helix Structure through the Vertebrate Lineage Reopens the Cold Case of Gonadotropin-Releasing Hormone-Associated Peptide.

    PubMed

    Pérez Sirkin, Daniela I; Lafont, Anne-Gaëlle; Kamech, Nédia; Somoza, Gustavo M; Vissio, Paula G; Dufour, Sylvie

    2017-01-01

    GnRH-associated peptide (GAP) is the C-terminal portion of the gonadotropin-releasing hormone (GnRH) preprohormone. Although it was reported in mammals that GAP may act as a prolactin-inhibiting factor and can be co-secreted with GnRH into the hypophyseal portal blood, GAP has been practically out of the research circuit for about 20 years. Comparative studies highlighted the low conservation of GAP primary amino acid sequences among vertebrates, contributing to the view that this peptide only participates in the folding or carrying process of GnRH. Considering that the three-dimensional (3D) structure of a protein may define its function, the aim of this study was to evaluate whether GAP sequences and 3D structures are conserved in the vertebrate lineage. GAP sequences from various vertebrates were retrieved from databases. Analysis of primary amino acid sequence identity and similarity, molecular phylogeny, and prediction of 3D structures were performed. Amino acid sequence comparison and phylogeny analyses confirmed the large variation of GAP sequences throughout vertebrate radiation. In contrast, prediction of the 3D structure revealed a striking conservation of the 3D structure of GAP1 (GAP associated with the hypophysiotropic type 1 GnRH), despite low amino acid sequence conservation. This GAP1 peptide presented a typical helix-loop-helix (HLH) structure in all the vertebrate species analyzed. This HLH structure could also be predicted for GAP2 in some but not all vertebrate species, and in none of the GAP3 sequences analyzed. These results allowed us to infer that selective pressures have maintained the GAP1 HLH structure throughout the vertebrate lineage. The conservation of the HLH motif, known to confer biological activity to various proteins, suggests that GAP1 peptides may exert some hypophysiotropic biological functions across vertebrate radiation.

  18. Conservation of Three-Dimensional Helix-Loop-Helix Structure through the Vertebrate Lineage Reopens the Cold Case of Gonadotropin-Releasing Hormone-Associated Peptide

    PubMed Central

    Pérez Sirkin, Daniela I.; Lafont, Anne-Gaëlle; Kamech, Nédia; Somoza, Gustavo M.; Vissio, Paula G.; Dufour, Sylvie

    2017-01-01

    GnRH-associated peptide (GAP) is the C-terminal portion of the gonadotropin-releasing hormone (GnRH) preprohormone. Although it was reported in mammals that GAP may act as a prolactin-inhibiting factor and can be co-secreted with GnRH into the hypophyseal portal blood, GAP has been practically out of the research circuit for about 20 years. Comparative studies highlighted the low conservation of GAP primary amino acid sequences among vertebrates, leading to the view that this peptide only participates in the folding or carrying process of GnRH. Considering that the three-dimensional (3D) structure of a protein may define its function, the aim of this study was to evaluate whether GAP sequences and 3D structures are conserved in the vertebrate lineage. GAP sequences from various vertebrates were retrieved from databases. Analyses of primary amino acid sequence identity and similarity, molecular phylogeny, and prediction of 3D structures were performed. Amino acid sequence comparison and phylogeny analyses confirmed the large variation of GAP sequences throughout vertebrate radiation. In contrast, prediction of the 3D structure revealed a striking conservation of the 3D structure of GAP1 (GAP associated with the hypophysiotropic type 1 GnRH), despite low amino acid sequence conservation. This GAP1 peptide presented a typical helix-loop-helix (HLH) structure in all the vertebrate species analyzed. This HLH structure could also be predicted for GAP2 in some but not all vertebrate species, and in none of the GAP3 sequences analyzed. These results allowed us to infer that selective pressures have maintained the GAP1 HLH structure throughout the vertebrate lineage. The conservation of the HLH motif, known to confer biological activity to various proteins, suggests that GAP1 peptides may exert some hypophysiotropic biological functions across vertebrate radiation. PMID:28878737

  19. Phosphorylation and nuclear localization of the varicella-zoster virus gene 63 protein.

    PubMed Central

    Stevenson, D; Xue, M; Hay, J; Ruyechan, W T

    1996-01-01

    The protein encoded by varicella-zoster virus open reading frame 63 and carboxy-terminal deletions of the same were expressed either as fusion proteins at the carboxy terminus of the maltose-binding protein in Escherichia coli or independently in transfected mammalian cells. The truncations contained amino acids 1 to 142 (63 delta N) or 1 to 210 (63 delta K) of the complete 278-amino-acid primary sequence. Recombinant casein kinase II phosphorylated the 63F and 63 delta KF fusion proteins in vitro but did not phosphorylate the 63 delta NF fusion protein, implying that phosphorylation occurred between amino acids 142 and 210. Immunoprecipitation of 35S- or 32P-labelled extracts of cells transfected with plasmids expressing 63, 63 delta N, or 63 delta K also indicated that in situ phosphorylation most likely occurred between amino acids 142 and 210. These combined results suggest that casein kinase II plays a significant role in the phosphorylation of the varicella-zoster virus 63 protein. Indirect immunofluorescence of transfected cells indicated nuclear localization of the 63 protein and cytoplasmic localization of 63 delta K and 63 delta N, implying a requirement for sequences between amino acids 210 and 278 for efficient nuclear localization. PMID:8523589
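    The casein kinase II result can be complemented by a quick sequence scan. A minimal sketch, assuming the commonly cited minimal CK2 consensus S/T-x-x-D/E (an acidic residue three positions C-terminal to the phosphoacceptor). The abstract does not state which residues between 142 and 210 are phosphorylated, and the toy sequence in the usage example is hypothetical, not the ORF63 sequence.

```python
import re
from typing import List, Optional

# Minimal CK2 consensus: Ser/Thr, any two residues, then Asp/Glu.
# The lookahead makes overlapping candidate sites visible.
CK2_MOTIF = re.compile(r"(?=[ST]..[DE])")


def ck2_candidate_sites(seq: str, start: int = 1, end: Optional[int] = None) -> List[int]:
    """1-based positions of S/T residues matching S/T-x-x-D/E within [start, end]."""
    seq = seq.upper()
    end = len(seq) if end is None else end
    hits = [m.start() + 1 for m in CK2_MOTIF.finditer(seq)]
    return [p for p in hits if start <= p <= end]
```

    For the made-up peptide "MASAAEGTPLDSSEED", ck2_candidate_sites returns [3, 8, 12, 13], and restricting the window with start=5, end=12 leaves [8, 12]. Such hits are only candidates; mapping the actual sites needs experiments like the deletion analysis described above.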

  20. Isolation and characterization of a carrot nucleolar protein with structural and sequence similarity to the vertebrate PESCADILLO protein.

    PubMed

    Ueda, Kenji; Xu, Zheng-Jun; Miyagi, Nobuaki; Ono, Michiyuki; Wabiko, Hiroetsu; Masuda, Kiyoshi; Inoue, Masayasu

    2013-07-01

    The nuclear matrix is involved in many nuclear events, but its protein architecture in plants is still not fully understood. A cDNA clone was isolated by immunoscreening with a monoclonal antibody raised against nuclear matrix proteins of Daucus carota L. Its deduced amino acid sequence showed about 40% identity with the PESCADILLO protein of zebrafish and humans. Primary structure analysis of the protein revealed a Pescadillo N-terminus domain, a single breast cancer C-terminal domain, two nuclear localization signals, and a potential coiled-coil region as also found in animal PESCADILLO proteins. Therefore, we designated this gene DcPES1. Although DcPES1 mRNA was detected in all tissues examined, its levels were highest in tissues with proliferating cells. Immunofluorescence using specific antiserum against the recombinant protein revealed that DcPES1 localized exclusively in the nucleolus. Examination of fusion proteins with green fluorescent protein revealed that the N-terminal portion was important for localization to the nucleoli of tobacco and onion cells. Moreover, when the nuclear matrix of carrot cells was immunostained with an anti-DcPES1 serum, the signal was detected in the nucleolus. Therefore, the DcPES1 protein appears to be a component of or tightly bound to components of the nuclear matrix. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  1. Plant centromeres: structure and control.

    PubMed

    Richards, E J; Dawe, R K

    1998-04-01

    Recent work has led to a better understanding of the molecular components of plant centromeres. Conservation of at least some centromere protein constituents between plant and non-plant systems has been demonstrated. The identity and organization of plant centromeric DNA sequences are also beginning to yield to analysis. While there is little primary DNA sequence conservation among the characterized plant centromeres and their non-plant counterparts, some parallels in centromere genomic organisation can be seen across species. Finally, the emerging idea that centromere activity is controlled epigenetically finds support in an examination of the plant centromere literature.

  2. Computational Modeling of Proteins based on Cellular Automata: A Method of HP Folding Approximation.

    PubMed

    Madain, Alia; Abu Dalhoum, Abdel Latif; Sleit, Azzam

    2018-06-01

    The design of a protein folding approximation algorithm is not straightforward even when a simplified model is used. The folding problem is a combinatorial problem, where approximation and heuristic algorithms are usually used to find near-optimal folds of protein primary structures. Approximation algorithms provide guarantees on the distance to the optimal solution. The folding approximation approach proposed here uses two-dimensional cellular automata (CA) to fold proteins represented in a well-studied simplified model called the hydrophobic-hydrophilic (HP) model. Cellular automata are discrete computational models that rely on local rules to produce global behavior. One-third and one-fourth approximation algorithms choose a subset of the hydrophobic amino acids to form H-H contacts. Those algorithms start by finding a point at which to fold the protein sequence into two sides, where one side ignores H's at even positions and the other side ignores H's at odd positions. In addition, blocks or groups of amino acids fold the same way according to a predefined normal form. We intend to improve approximation algorithms by considering all hydrophobic amino acids and folding based on the local neighborhood instead of using normal forms. The CA does not assume a fixed folding point. The proposed approach guarantees a one-half approximation minus the H-H endpoints. This lower-bound guarantee applies to short sequences only. This is proved by showing that the core and the folds of the protein have two identical sides for all short sequences.
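    In the HP model, a fold is a self-avoiding walk on a lattice, and its energy is minus the number of H-H contacts between residues that are lattice neighbours but not adjacent in the chain. The sketch below scores one given fold on the 2D square lattice, encoded as absolute moves U/D/L/R (an assumed encoding chosen for illustration). It does not implement the cellular-automaton approximation algorithm described above, only the objective such algorithms optimize.

```python
Coord = tuple[int, int]
STEPS: dict[str, Coord] = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def fold_coordinates(moves: str) -> list[Coord]:
    """Place residues on a 2D square lattice from absolute moves; reject collisions."""
    coords = [(0, 0)]
    for move in moves:
        dx, dy = STEPS[move]
        x, y = coords[-1]
        nxt = (x + dx, y + dy)
        if nxt in coords:
            raise ValueError("fold is not self-avoiding")
        coords.append(nxt)
    return coords


def hh_contacts(sequence: str, moves: str) -> int:
    """Count H-H topological contacts: lattice neighbours not adjacent in the chain."""
    if len(moves) != len(sequence) - 1:
        raise ValueError("need exactly len(sequence) - 1 moves")
    coords = fold_coordinates(moves)
    index = {c: i for i, c in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in STEPS.values():
            j = index.get((x + dx, y + dy))
            # j > i + 1 skips chain neighbours and counts each pair once
            if j is not None and j > i + 1 and sequence[j] == "H":
                contacts += 1
    return contacts
```

    For example, hh_contacts("HPPH", "RUL") is 1 (the two ends meet on a unit square), so that fold has energy -1. An algorithm with approximation ratio r guarantees at least r times the optimal number of H-H contacts.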

  3. Negatively Charged Lipid Membranes Promote a Disorder-Order Transition in the Yersinia YscU Protein

    PubMed Central

    Weise, Christoph F.; Login, Frédéric H.; Ho, Oanh; Gröbner, Gerhard; Wolf-Watz, Hans; Wolf-Watz, Magnus

    2014-01-01

    The inner membrane of Gram-negative bacteria is negatively charged, rendering positively charged cytoplasmic proteins in close proximity likely candidates for protein-membrane interactions. YscU is a Yersinia pseudotuberculosis type III secretion system protein crucial for bacterial pathogenesis. The protein contains a highly conserved positively charged linker sequence that separates membrane-spanning and cytoplasmic (YscUC) domains. Although disordered in solution, inspection of the primary sequence of the linker reveals that positively charged residues are spaced with a typical helical periodicity. Here, we demonstrate that the linker sequence of YscU undergoes a largely electrostatically driven coil-to-helix transition upon binding to negatively charged membrane interfaces. Using membrane-mimicking sodium dodecyl sulfate micelles, an NMR-derived structural model reveals the induction of three helical segments in the linker. The overall linker placement in sodium dodecyl sulfate micelles was identified by NMR experiments including paramagnetic relaxation enhancements. Partitioning of individual residues agrees with their hydrophobicity and supports an interfacial positioning of the helices. Replacement of positively charged linker residues with alanine resulted in YscUC variants displaying attenuated membrane-binding affinities, suggesting that the membrane interaction depends on positive charges within the linker. In vivo experiments with bacteria expressing these YscU replacements resulted in phenotypes displaying significantly reduced effector protein secretion levels. Taken together, our data identify a previously unknown membrane-interacting surface of YscUC that, when perturbed by mutations, disrupts the function of the pathogenic machinery in Yersinia. PMID:25418176
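    Helical periodicity of charged residues can be checked with a helical-wheel calculation: an ideal α-helix has about 3.6 residues per turn, roughly 100° of rotation per residue, so residues at i, i+3, i+4 and i+7 cluster on one face. The sketch below computes a normalized "charge moment" by analogy with the Eisenberg hydrophobic moment, using an assumed, simplified scale (+1 for K/R, 0 otherwise). It is an illustration, not the analysis used in the study, and the test peptides are hypothetical rather than the YscU linker.

```python
import math


def charge_moment(seq: str, degrees_per_residue: float = 100.0) -> float:
    """Normalized length of the vector sum of K/R positions on a helical wheel.

    1.0 means all positive charges point the same way; values near 0 mean
    they are spread around the helix axis.
    """
    angle = math.radians(degrees_per_residue)
    charges = [1.0 if aa in "KR" else 0.0 for aa in seq.upper()]
    total = sum(charges)
    if total == 0:
        return 0.0
    x = sum(q * math.cos(i * angle) for i, q in enumerate(charges))
    y = sum(q * math.sin(i * angle) for i, q in enumerate(charges))
    return math.hypot(x, y) / total
```

    With lysines at i, i+3, i+7, i+10 and i+14 ("KAAKAAAKAAKAAAK") the moment is about 0.88, while strictly alternating lysines ("KAKAKAKAK") give about 0.13. Passing about 180° instead of 100° gives a crude check for β-strand periodicity.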

  4. Negatively charged lipid membranes promote a disorder-order transition in the Yersinia YscU protein.

    PubMed

    Weise, Christoph F; Login, Frédéric H; Ho, Oanh; Gröbner, Gerhard; Wolf-Watz, Hans; Wolf-Watz, Magnus

    2014-10-21

    The inner membrane of Gram-negative bacteria is negatively charged, rendering positively charged cytoplasmic proteins in close proximity likely candidates for protein-membrane interactions. YscU is a Yersinia pseudotuberculosis type III secretion system protein crucial for bacterial pathogenesis. The protein contains a highly conserved positively charged linker sequence that separates membrane-spanning and cytoplasmic (YscUC) domains. Although disordered in solution, inspection of the primary sequence of the linker reveals that positively charged residues are spaced with a typical helical periodicity. Here, we demonstrate that the linker sequence of YscU undergoes a largely electrostatically driven coil-to-helix transition upon binding to negatively charged membrane interfaces. Using membrane-mimicking sodium dodecyl sulfate micelles, an NMR-derived structural model reveals the induction of three helical segments in the linker. The overall linker placement in sodium dodecyl sulfate micelles was identified by NMR experiments including paramagnetic relaxation enhancements. Partitioning of individual residues agrees with their hydrophobicity and supports an interfacial positioning of the helices. Replacement of positively charged linker residues with alanine resulted in YscUC variants displaying attenuated membrane-binding affinities, suggesting that the membrane interaction depends on positive charges within the linker. In vivo experiments with bacteria expressing these YscU replacements resulted in phenotypes displaying significantly reduced effector protein secretion levels. Taken together, our data identify a previously unknown membrane-interacting surface of YscUC that, when perturbed by mutations, disrupts the function of the pathogenic machinery in Yersinia.

  5. Protein sectors: evolutionary units of three-dimensional structure

    PubMed Central

    Halabi, Najeeb; Rivoire, Olivier; Leibler, Stanislas; Ranganathan, Rama

    2011-01-01

    Proteins display a hierarchy of structural features at primary, secondary, tertiary, and higher-order levels, an organization that guides our current understanding of their biological properties and evolutionary origins. Here, we reveal a structural organization distinct from this traditional hierarchy by statistical analysis of correlated evolution between amino acids. Applied to the S1A serine proteases, the analysis indicates a decomposition of the protein into three quasi-independent groups of correlated amino acids that we term “protein sectors”. Each sector is physically connected in the tertiary structure, has a distinct functional role, and constitutes an independent mode of sequence divergence in the protein family. Functionally relevant sectors are evident in other protein families as well, suggesting that they may be general features of proteins. We propose that sectors represent a structural organization of proteins that reflects their evolutionary histories. PMID:19703402

  6. Rapid removal of acetimidoyl groups from proteins and peptides. Applications to primary structure determination.

    PubMed Central

    Dubois, G C; Robinson, E A; Inman, J K; Perham, R N; Appella, E

    1981-01-01

    Methylamine buffers can be used for the rapid quantitative removal of acetimidoyl groups from proteins and peptides modified by treatment with ethyl or methyl acetimidate. The half-life for displacement of acetimidoyl groups from fully amidinated proteins incubated in 3.44 M methylamine/HCl buffer at pH 11.5 and 25 degrees C was approx. 26 min; this half-life is 29 times shorter than that observed in ammonia/HCl buffer under the same conditions of pH and amine concentration. Incubation of acetimidated proteins with methylamine for 4 h resulted in greater than 95% removal of acetimidoyl groups. No deleterious effects on primary structure were detected by amino acid analysis or by automated Edman degradation. Reversible amidination of lysine residues, in conjunction with tryptic digestion, has been successfully applied to the determination of the amino acid sequence of an acetimidated mouse immunoglobulin heavy chain peptide. The regeneration of amino groups in amidinated proteins and peptides by methylaminolysis makes amidination a valuable alternative to citraconoylation and maleoylation in structural studies. PMID:6803762
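    Assuming simple first-order kinetics (the abstract reports a half-life but not the rate law), the fraction of acetimidoyl groups remaining after time t is (1/2)^(t / t_half). The arithmetic below checks that a 26 min half-life is consistent with the reported >95% removal after 4 h, and derives the half-life in ammonia/HCl buffer implied by the 29-fold difference.

```python
def fraction_remaining(t_min: float, half_life_min: float) -> float:
    """Fraction of groups left after t_min minutes for a first-order process."""
    return 0.5 ** (t_min / half_life_min)


HALF_LIFE_METHYLAMINE = 26.0  # min, approximate value from the abstract
FOLD_SLOWER_IN_AMMONIA = 29   # ammonia/HCl half-life is 29 times longer

removed_after_4h = 1.0 - fraction_remaining(4 * 60, HALF_LIFE_METHYLAMINE)
ammonia_half_life_h = HALF_LIFE_METHYLAMINE * FOLD_SLOWER_IN_AMMONIA / 60

print(f"predicted removal after 4 h: {removed_after_4h:.1%}")       # 99.8%
print(f"implied ammonia/HCl half-life: {ammonia_half_life_h:.1f} h")  # 12.6 h
```

    Four hours is about 9.2 half-lives, so the idealized estimate (about 99.8%) is consistent with, and somewhat above, the reported lower bound of >95% removal.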

  7. Automated Antibody De Novo Sequencing and Its Utility in Biopharmaceutical Discovery

    NASA Astrophysics Data System (ADS)

    Sen, K. Ilker; Tang, Wilfred H.; Nayak, Shruti; Kil, Yong J.; Bern, Marshall; Ozoglu, Berk; Ueberheide, Beatrix; Davis, Darryl; Becker, Christopher

    2017-05-01

    Applications of antibody de novo sequencing in the biopharmaceutical industry range from the discovery of new antibody drug candidates to identifying reagents for research and determining the primary structure of innovator products for biosimilar development. When murine, phage display, or patient-derived monoclonal antibodies against a target of interest are available, but the cDNA or the original cell line is not, de novo protein sequencing is required to humanize and recombinantly express these antibodies, followed by in vitro and in vivo testing for functional validation. Availability of fully automated software tools for monoclonal antibody de novo sequencing enables efficient and routine analysis. Here, we present a novel method to automatically de novo sequence antibodies using mass spectrometry and the Supernovo software. The robustness of the algorithm is demonstrated through a series of stress tests.
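    De novo sequencing from MS/MS relies on reading mass differences between consecutive fragment ions. The sketch below is not the Supernovo algorithm: it computes singly protonated b-ion m/z values from standard monoisotopic residue masses, then recovers residues from the gaps of an ideal, complete b-ion ladder. Real spectra have missing peaks, noise and other ion types. Leu and Ile are isobaric, so both are reported as "L".

```python
# Monoisotopic residue masses (Da); Leu/Ile are isobaric and share "L".
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276


def b_ions(peptide: str) -> list[float]:
    """m/z of singly charged b1..b(n-1) ions ("I" is treated as "L")."""
    masses, total = [], PROTON
    for residue in peptide.upper().replace("I", "L")[:-1]:
        total += RESIDUE_MASS[residue]
        masses.append(total)
    return masses


def read_ladder(ladder: list[float], tol: float = 0.01) -> str:
    """Infer residues from gaps between consecutive b ions; '?' if no match."""
    sequence, previous = [], PROTON
    for mz in ladder:
        gap = mz - previous
        match = min(RESIDUE_MASS, key=lambda aa: abs(RESIDUE_MASS[aa] - gap))
        sequence.append(match if abs(RESIDUE_MASS[match] - gap) <= tol else "?")
        previous = mz
    return "".join(sequence)
```

    For example, read_ladder(b_ions("PEPTIDEK")) gives "PEPTLDE": the Ile is read as Leu, and the C-terminal residue is not covered by b ions and would be inferred from the precursor mass. Two residues can also match one residue's mass (Gly-Gly and Asn are both about 114.043 Da), one of several ambiguities that make real de novo sequencing harder than this ideal ladder.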

  8. Characterization and distribution of a maize cDNA encoding a peptide similar to the catalytic region of second messenger dependent protein kinases

    NASA Technical Reports Server (NTRS)

    Biermann, B.; Johnson, E. M.; Feldman, L. J.

    1990-01-01

    Maize (Zea mays) roots respond to a variety of environmental stimuli which are perceived by a specialized group of cells, the root cap. We are studying the transduction of extracellular signals by roots, particularly the role of protein kinases. Protein phosphorylation by kinases is an important step in many eukaryotic signal transduction pathways. As a first phase of this research we have isolated a cDNA encoding a maize protein similar to fungal and animal protein kinases known to be involved in the transduction of extracellular signals. The deduced sequence of this cDNA encodes a polypeptide containing amino acids corresponding to 33 out of 34 invariant or nearly invariant sequence features characteristic of protein kinase catalytic domains. The maize cDNA gene product is more closely related to the branch of serine/threonine protein kinase catalytic domains composed of the cyclic-nucleotide- and calcium-phospholipid-dependent subfamilies than to other protein kinases. Sequence identity is 35% or more between the deduced maize polypeptide and all members of this branch. The high structural similarity strongly suggests that catalytic activity of the encoded maize protein kinase may be regulated by second messengers, like that of all members of this branch whose regulation has been characterized. Northern hybridization with the maize cDNA clone shows a single 2400 base transcript at roughly similar levels in maize coleoptiles, root meristems, and the zone of root elongation, but the transcript is less abundant in mature leaves. In situ hybridization confirms the presence of the transcript in all regions of primary maize root tissue.

  9. STAG3 truncating variant as the cause of primary ovarian insufficiency

    PubMed Central

    Le Quesne Stabej, Polona; Williams, Hywel J; James, Chela; Tekman, Mehmet; Stanescu, Horia C; Kleta, Robert; Ocaka, Louise; Lescai, Francesco; Storr, Helen L; Bitner-Glindzicz, Maria; Bacchelli, Chiara; Conway, Gerard S

    2016-01-01

    Primary ovarian insufficiency (POI) is a distressing cause of infertility in young women. POI is heterogeneous with only a few causative genes having been discovered so far. Our objective was to determine the genetic cause of POI in a consanguineous Lebanese family with two affected sisters presenting with primary amenorrhoea and an absence of any pubertal development. Multipoint parametric linkage analysis was performed. Whole-exome sequencing was done on the proband. Linkage analysis identified a locus on chromosome 7 where exome sequencing successfully identified a homozygous two base pair duplication (c.1947_48dupCT), leading to a truncated protein p.(Y650Sfs*22) in the STAG3 gene, confirming it as the cause of POI in this family. Exome sequencing combined with linkage analyses offers a powerful tool to efficiently find novel genetic causes of rare, heterogeneous disorders, even in small single families. This is only the second report of a STAG3 variant; the first STAG3 variant was recently described in a phenotypically similar family with extreme POI. Identification of an additional family highlights the importance of STAG3 in POI pathogenesis and suggests it should be evaluated in families affected with POI. PMID:26059840
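    The variant notation can be checked with simple codon arithmetic. This assumes standard HGVS conventions: c.1 is the A of the ATG start codon, a duplicated copy is placed directly 3' of the original, and in "fs*22" the first changed residue is position 1 of the new frame and the stop codon is position 22. Under those assumptions, the extra CT is inserted after c.1948, the first base of codon 650, matching the reported Y650 position.

```python
def codon_number(c_position: int) -> int:
    """Codon containing coding-DNA position c.N (c.1 = A of the ATG)."""
    return (c_position - 1) // 3 + 1


def codon_phase(c_position: int) -> int:
    """Position (1, 2 or 3) of c.N within its codon."""
    return (c_position - 1) % 3 + 1


def truncated_length(first_changed: int, fs_stop: int) -> int:
    """Residues before the new stop for a p.(X{first_changed}Yfs*{fs_stop}) variant."""
    stop_codon = first_changed + fs_stop - 1
    return stop_codon - 1


print(codon_number(1947), codon_phase(1947))  # 649 3
print(codon_number(1948), codon_phase(1948))  # 650 1
print(truncated_length(650, 22))              # 670
```

    The dupCT notation implies c.1947 = C and c.1948 = T. If the reference codon 650 is TAT or TAC (Tyr), inserting CT after its first base makes the new codon 650 TCT (Ser), consistent with p.(Y650Sfs*22). The full-length STAG3 size is not given here, so the 670-residue product is not compared with the wild type.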

  10. Co-evolutionary constraints of globular proteins correlate with their folding rates.

    PubMed

    Mallik, Saurav; Kundu, Sudip

    2015-08-04

    Folding rates (ln kf) of globular proteins correlate with their biophysical properties, but the relationship between ln kf and patterns of sequence evolution remains elusive. We introduce 'relative co-evolution order' (rCEO), the length-normalized average primary chain separation of co-evolving pairs (CEPs), which correlates negatively with ln kf. In addition to pairs in native 3D contact, indirectly connected and structurally remote CEPs probably also play critical roles in protein folding. The correlation between rCEO and ln kf is stronger in multi-state proteins than in two-state proteins, in contrast to contact order (CO), which correlates more strongly in two-state proteins. Finally, rCEO, CO and ln kf are fitted to a 3D linear correlation. Copyright © 2015 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.
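    Read literally, the rCEO definition has the same form as relative contact order: rCEO = (1/(L·N)) Σ|i − j|, where L is the chain length, N the number of co-evolving pairs and i, j the residue indices of each pair. A minimal sketch under that reading; how the co-evolving pairs are detected (method, score cutoff) is not specified in the abstract, so the function takes the pairs as given.

```python
from typing import Iterable, Tuple


def relative_separation_order(pairs: Iterable[Tuple[int, int]], length: int) -> float:
    """Mean |i - j| over residue pairs, divided by chain length L.

    With native contacts as pairs this is relative contact order;
    with co-evolving pairs it follows the rCEO definition above.
    """
    pairs = list(pairs)
    if not pairs or length <= 0:
        raise ValueError("need at least one pair and a positive length")
    mean_separation = sum(abs(i - j) for i, j in pairs) / len(pairs)
    return mean_separation / length
```

    For hypothetical pairs (1, 5), (2, 20) and (10, 14) in a 50-residue chain, the separations are 4, 18 and 4, giving 26/150 ≈ 0.173. Using one function for both quantities makes explicit that only the set of residue pairs differs.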

  11. Memetic algorithms for de novo motif-finding in biomedical sequences.

    PubMed

    Bi, Chengpeng

    2012-09-01

    The objectives of this study are to design and implement a new memetic algorithm for de novo motif discovery, which is then applied to detect important signals hidden in various biomedical molecular sequences. In this paper, memetic algorithms are developed and tested on de novo motif-finding problems. Several strategies are employed in the algorithm design, not only to explore the multiple-sequence local alignment space efficiently but also to uncover the molecular signals effectively. As a result, the implementation of the memetic motif-finding algorithm (MaMotif) has a number of key features, including a chromosome replacement operator, a chromosome alteration-aware local search operator, a truncated local search strategy, and a stochastic operation of local search imposed on individual learning. To test the new algorithm, we compare MaMotif with a few other similar algorithms using simulated and experimental data including genomic DNA, primary microRNA sequences (let-7 family), and transmembrane protein sequences. The new memetic motif-finding algorithm is successfully implemented in C++ and exhaustively tested with various simulated and real biological sequences. In the simulation, MaMotif is the most time-efficient of the algorithms compared: it runs 2 times faster than the expectation maximization (EM) method and 16 times faster than the genetic algorithm-based EM hybrid. In both simulated and experimental testing, the new algorithm compares favorably with or is superior to the other algorithms. Notably, MaMotif successfully discovers the transcription factors' binding sites in chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-Seq) data, correctly uncovers the RNA splicing signals in gene expression, precisely finds the highly conserved helix motif in the transmembrane protein sequences, and correctly detects the palindromic segments in the primary microRNA sequences. The memetic motif-finding algorithm is effectively designed and implemented, and its applications demonstrate that it is not only time-efficient but also performs excellently when compared with other popular algorithms. Copyright © 2012 Elsevier B.V. All rights reserved.
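    A memetic algorithm couples population-based evolutionary search with local refinement of individuals. The sketch below is a much-simplified illustration of that idea, not MaMotif. An individual is one motif start position per sequence; fitness is the information content of the aligned sites against a uniform background (with an assumed pseudocount of 0.5); mutation re-draws one start; and local search re-optimizes each sequence's start in turn. The operators named in the abstract (chromosome replacement, alteration-aware and truncated local search) are not reproduced.

```python
import math
import random

ALPHABET = "ACGT"


def information_content(sites: list[str], pseudocount: float = 0.5) -> float:
    """Total relative entropy (bits) of the site columns vs. a uniform background."""
    total = 0.0
    for col in range(len(sites[0])):
        counts = {base: pseudocount for base in ALPHABET}  # pseudocount must be > 0
        for site in sites:
            counts[site[col]] += 1
        n = sum(counts.values())
        total += sum((c / n) * math.log2((c / n) / 0.25) for c in counts.values())
    return total


def fitness(seqs: list[str], starts: list[int], width: int) -> float:
    return information_content([s[p:p + width] for s, p in zip(seqs, starts)])


def local_search(seqs: list[str], starts: list[int], width: int) -> list[int]:
    """Coordinate ascent: re-place each sequence's site; fitness never decreases."""
    starts = list(starts)
    for k, seq in enumerate(seqs):
        starts[k] = max(
            range(len(seq) - width + 1),
            key=lambda p: fitness(seqs, starts[:k] + [p] + starts[k + 1:], width),
        )
    return starts


def memetic_motif_search(seqs, width, pop_size=8, generations=30, seed=0):
    """Steady-state evolution in which every new individual is refined by local search."""
    rng = random.Random(seed)

    def score(ind: list[int]) -> float:
        return fitness(seqs, ind, width)

    def random_individual() -> list[int]:
        return [rng.randrange(len(s) - width + 1) for s in seqs]

    population = [local_search(seqs, random_individual(), width) for _ in range(pop_size)]
    for _ in range(generations):
        parent = max(rng.sample(population, 2), key=score)  # binary tournament
        child = list(parent)
        k = rng.randrange(len(seqs))
        child[k] = rng.randrange(len(seqs[k]) - width + 1)  # mutation
        child = local_search(seqs, child, width)            # individual learning
        worst = min(range(pop_size), key=lambda i: score(population[i]))
        if score(child) > score(population[worst]):
            population[worst] = child                       # replace the weakest
    best = max(population, key=score)
    return best, score(best)
```

    The returned start positions can be inspected with [s[p:p + width] for s, p in zip(seqs, best)]. Information content is bounded by 2 bits per column for DNA, which gives a simple sanity check on the score.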

  12. PRIMARY STRUCTURE OF THE CYTOCHROME P450 LANOSTEROL 14α-DEMETHYLASE GENE FROM CANDIDA TROPICALIS

    EPA Science Inventory

    We report the nucleotide sequence of the gene and flanking DNA for the cytochrome P450 lanosterol 14 alpha-demethylase (14DM) from the yeast Candida tropicalis ATCC750. An open reading frame (ORF) of 528 codons encoding a 60.9-kD protein is identified. This ORF includes a charact...

  13. Mechanisms of molecular mimicry involving the microbiota in neurodegeneration.

    PubMed

    Friedland, Robert P

    2015-01-01

    The concept of molecular mimicry was established to explain commonalities of structure which developed in response to evolutionary pressures. Most examples of molecular mimicry in medicine have involved homologies of primary protein structure which cause disease. Molecular mimicry can be expanded beyond amino acid sequence to include microRNA and proteomic effects which are either pathogenic or salutogenic (beneficial) in regard to Parkinson's disease, Alzheimer's disease, and related disorders. Viruses of animal or plant origin may mimic nucleotide sequences of microRNAs and influence protein expression. Both Parkinson's and Alzheimer's diseases involve the formation of transmissible self-propagating prion-like proteins. However, the initiating factors responsible for creation of these misfolded nucleating factors are unknown. Amyloid patterns of protein folding are highly conserved through evolution and are widely distributed in the world. Similarities of tertiary protein structure may be involved in the creation of these prion-like agents through molecular mimicry. Cross-seeding of amyloid misfolding, altered proteostasis, and oxidative stress may be induced by amyloid proteins residing in bacteria in our microbiota in the gut and in the diet. Pathways of molecular mimicry processes induced by bacterial amyloid in neurodegeneration may involve TLR 2/1, CD14, and NFκB, among others. Furthermore, priming of the innate immune system by the microbiota may enhance the inflammatory response to cerebral amyloids (such as amyloid-β and α-synuclein). This paper describes the specific molecular pathways of these cross-seeding and neuroinflammatory processes. Evolutionary conservation of proteins provides the opportunity for conserved sequences and structures to influence neurological disease through molecular mimicry.

  14. The bean α-amylase inhibitor is encoded by a lectin gene

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Moreno, J.; Altabella, T.; Chrispeels, M.J.

    The common bean, Phaseolus vulgaris, contains an inhibitor of insect and mammalian α-amylases that does not inhibit plant α-amylase. This inhibitor functions as an anti-feedant or seed-defense protein. We purified this inhibitor by affinity chromatography and found that it consists of a series of glycoforms of two polypeptides (Mr 14,000-19,000). Partial amino acid sequencing was carried out, and the sequences obtained are identical with portions of the derived amino acid sequence of a lectin-like gene. This lectin gene encodes a polypeptide of MW 28,000, and the primary in vitro translation product identified by antibodies to the α-amylase inhibitor has the same size. Co- and posttranslational processing of this polypeptide results in glycosylated polypeptides of 14-19 kDa. Our interpretation of these results is that the bean lectins constitute a gene family that encodes diverse plant defense proteins, including phytohemagglutinin, arcelin and α-amylase inhibitor.

  15. Gaining knowledge from previously unexplained spectra-application of the PTM-Explorer software to detect PTM in HUPO BPP MS/MS data.

    PubMed

    Chamrad, Daniel C; Körting, Gerhard; Schäfer, Heike; Stephan, Christian; Thiele, Herbert; Apweiler, Rolf; Meyer, Helmut E; Marcus, Katrin; Blüggel, Martin

    2006-09-01

    A novel software tool named PTM-Explorer has been applied to LC-MS/MS datasets acquired within the Human Proteome Organisation (HUPO) Brain Proteome Project (BPP). PTM-Explorer enables automatic identification of peptide MS/MS spectra that were not explained in typical sequence database searches. The main focus was detection of PTMs, but PTM-Explorer also detects unspecific peptide cleavage, mass measurement errors, experimental modifications, amino acid substitutions, transpeptidation products and unknown mass shifts. To avoid a combinatorial explosion, the search is restricted to a set of selected protein sequences, which stem from previous protein identifications using a common sequence database search. Prior to application to the HUPO BPP data, PTM-Explorer was evaluated on carefully manually characterized and validated LC-MS/MS data sets from Alpha-A-Crystallin gel spots obtained from mouse eye lens. Besides various PTMs including phosphorylation, a wealth of experimental modifications and unspecific cleavage products were successfully detected, completing the primary structure information of the measured proteins. Our results indicate that a large number of MS/MS spectra that currently remain unidentified in standard database searches contain valuable information that can only be elucidated using suitable software tools.
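    A basic step in explaining spectra left unassigned by a standard search is comparing the observed mass shift of a peptide with known modification deltas. The sketch below is not the PTM-Explorer implementation: it matches a neutral-mass shift against a small, hand-picked table of monoisotopic deltas within an assumed tolerance. As the abstract notes, a real tool also has to consider substitutions, unspecific cleavage and combinations of modifications.

```python
# Monoisotopic mass deltas (Da) for a few common modifications; a real
# search would draw on a curated resource such as Unimod.
MOD_DELTAS = {
    "Phospho": 79.96633,
    "Oxidation": 15.99491,
    "Acetyl": 42.01057,
    "Methyl": 14.01565,
    "Deamidation": 0.98402,
    "Carbamidomethyl": 57.02146,
}


def explain_mass_shift(observed: float, theoretical: float, tol_da: float = 0.02) -> list[str]:
    """Single modifications whose delta matches observed - theoretical, closest first."""
    shift = observed - theoretical
    hits = [(abs(delta - shift), name) for name, delta in MOD_DELTAS.items()
            if abs(delta - shift) <= tol_da]
    return [name for _, name in sorted(hits)]
```

    For example, a peptide observed at 1079.9663 Da against a theoretical 1000.0 Da is explained as "Phospho". Both masses here are neutral masses; precursor m/z values would first need converting using the charge state. Carbamidomethyl has the same delta as a Gly residue, so a match is a hypothesis, not an identification.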

  16. Structural genomics reveals EVE as a new ASCH/PUA-related domain

    PubMed Central

    Bertonati, Claudia; Punta, Marco; Fischer, Markus; Yachdav, Guy; Forouhar, Farhad; Zhou, Weihong; Kuzin, Alexander P.; Seetharaman, Jayaraman; Abashidze, Mariam; Ramelot, Theresa A.; Kennedy, Michael A.; Cort, John R.; Belachew, Adam; Hunt, John F.; Tong, Liang; Montelione, Gaetano T.; Rost, Burkhard

    2014-01-01

    We report on several proteins recently solved by structural genomics consortia, in particular by the Northeast Structural Genomics consortium (NESG). The proteins considered in this study differ substantially in their sequences but they share a similar structural core, characterized by a pseudobarrel five-stranded beta sheet. This core corresponds to the PUA domain-like architecture in the SCOP database. By connecting sequence information with structural knowledge, we characterize a new subgroup of these proteins that we propose to be distinctly different from previously described PUA domain-like domains such as PUA proper or ASCH. We refer to these newly defined domains as EVE. Although EVE may have retained the ability of PUA domains to bind RNA, the available experimental and computational data suggest that both the details of its molecular function and its cellular function differ from those of other PUA domain-like domains. This study of EVE and its relatives illustrates how the combination of structure and genomics creates new insights by connecting a cornucopia of structures that map to the same evolutionary potential. Primary sequence information alone would not have been sufficient to reveal these evolutionary links. PMID:19191354

  17. Structural Genomics Reveals EVE as a New ASCH/PUA-Related Domain

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bertonati, C.; Punta, M; Fischer, M

    2008-01-01

    We report on several proteins recently solved by structural genomics consortia, in particular by the Northeast Structural Genomics consortium (NESG). The proteins considered in this study differ substantially in their sequences but they share a similar structural core, characterized by a pseudobarrel five-stranded beta sheet. This core corresponds to the PUA domain-like architecture in the SCOP database. By connecting sequence information with structural knowledge, we characterize a new subgroup of these proteins that we propose to be distinctly different from previously described PUA domain-like domains such as PUA proper or ASCH. We refer to these newly defined domains as EVE. Although EVE may have retained the ability of PUA domains to bind RNA, the available experimental and computational data suggest that both the details of its molecular function and its cellular function differ from those of other PUA domain-like domains. This study of EVE and its relatives illustrates how the combination of structure and genomics creates new insights by connecting a cornucopia of structures that map to the same evolutionary potential. Primary sequence information alone would not have been sufficient to reveal these evolutionary links.

  18. Primary structure of the Aequorea victoria green-fluorescent protein.

    PubMed

    Prasher, D C; Eckenrode, V K; Ward, W W; Prendergast, F G; Cormier, M J

    1992-02-15

    Many cnidarians utilize green-fluorescent proteins (GFPs) as energy-transfer acceptors in bioluminescence. GFPs fluoresce in vivo upon receiving energy from either a luciferase-oxyluciferin excited-state complex or a Ca(2+)-activated phosphoprotein. These highly fluorescent proteins are unique due to the chemical nature of their chromophore, which consists of modified amino acid (aa) residues within the polypeptide. This report describes the cloning and sequencing of both cDNA and genomic clones of GFP from the cnidarian, Aequorea victoria. The gfp10 cDNA encodes a 238-aa-residue polypeptide with a calculated Mr of 26,888. Comparison of A. victoria GFP genomic clones shows three different restriction enzyme patterns, which suggests that at least three different genes are present in the A. victoria population at Friday Harbor, Washington. The gfp gene encoded by the lambda GFP2 genomic clone comprises at least three exons spread over 2.6 kb. The nucleotide sequences of the cDNA and the gene will aid in the elucidation of structure-function relationships in this unique class of proteins.
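    A calculated Mr like the one quoted for the gfp10 product is typically obtained by summing average residue masses and adding one water for the free termini. The sketch below does that with standard average residue masses. It does not try to reproduce the 26,888 figure, since the sequence is not given in the abstract, and it ignores post-translational changes such as chromophore formation, which alter the mass of the mature protein.

```python
# Average residue masses (Da, rounded to 4 decimals) of the 20 standard amino acids.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.0153  # added once for the free N- and C-termini


def average_mass(seq: str) -> float:
    """Average molecular mass (Da) of an unmodified linear polypeptide."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq.upper()) + WATER
```

    As a sanity check, average_mass("G") is 75.0672 Da (free glycine) and average_mass("GG") is 132.1191 Da (glycylglycine).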

  19. Slowing Translation between Protein Domains by Increasing Affinity between mRNAs and the Ribosomal Anti-Shine-Dalgarno Sequence Improves Solubility.

    PubMed

    Vasquez, Kevin A; Hatridge, Taylor A; Curtis, Nicholas C; Contreras, Lydia M

    2016-02-19

    Recent studies have demonstrated that effective protein production requires coordination of multiple cotranslational cellular processes, which are heavily affected by translation timing. Until recently, protein engineering had focused on codon optimization to maximize protein production rates, mostly considering the effect of tRNA abundance. However, for complex multidomain proteins, it has been hypothesized that strategic translational pauses between domains and between distinct individual structural motifs can prevent interactions between nascent chain fragments that generate kinetically trapped misfolded peptides, and thereby enhance protein yields. In this study, we introduce synthetic transient pauses between structural domains in a heterologous model protein based on designed patterns of affinity between the mRNA and the anti-Shine-Dalgarno (aSD) sequence on the ribosome. We demonstrate that optimizing translation attenuation at domain boundaries can predictably affect solubility patterns in bacteria. Exploration of the affinity space showed that modifying fewer than 1% of the nucleotides (on a small 12-amino-acid linker) can vary soluble protein yields up to ∼7-fold without altering the primary sequence of the protein. In the context of longer linkers, where a larger number of distinct structural motifs can fold outside the ribosome, optimal synonymous codon variations resulted in an additional 2.1-fold increase in solubility relative to nonoptimized linkers of the same length. Across 54 rationally constructed linkers of various affinities, protein solubility correlated significantly with predicted affinity, whereas only weaker correlations were observed between tRNA abundance and protein solubility. We also demonstrate that naturally occurring high-affinity clusters are present between structural domains of β-galactosidase, one of Escherichia coli's largest native proteins. Interdomain ribosomal affinity is an important factor that has not previously been explored in the context of protein engineering.

  20. Traceless splicing enabled by substrate-induced activation of the Nostoc punctiforme Npu DnaE intein after mutation of a catalytic cysteine to serine.

    PubMed

    Cheriyan, Manoj; Chan, Siu-Hong; Perler, Francine

    2014-12-12

    Inteins self-catalytically cleave out of precursor proteins while ligating the surrounding extein fragments with a native peptide bond. Much attention has been lavished on these molecular marvels with the hope of understanding and harnessing their chemistry for novel biochemical transformations including coupling peptides from synthetic or biological origins and controlling protein function. Despite an abundance of powerful applications, the use of inteins is still hampered by limitations in our understanding of their specificity (defined as flanking sequences that permit splicing) and the challenge of inserting inteins into target proteins. We examined the frequently used Nostoc punctiforme Npu DnaE intein after the C-extein cysteine nucleophile (Cys+1) was mutated to serine or threonine. Previous studies demonstrated reduced rates and/or splicing yields with the Npu DnaE intein after mutation of Cys+1 to Ser+1. In this study, genetic selection identified extein sequences with Ser+1 that enabled the Npu DnaE intein to splice with only a 5-fold reduction in rate compared to the wild-type Cys+1 intein and without mutation of the intein itself to activate Ser+1 as a nucleophile. Three different proteins spliced efficiently after insertion of the intein flanked by the selected sequences. We then used this selected specificity to achieve traceless splicing in a targeted enzyme at a location predicted by primary sequence similarity to only the selected C-extein sequence. This study highlights the latent catalytic potential of the Npu DnaE intein to splice with an alternative nucleophile and enables broader intein utility by increasing insertion site choices. Copyright © 2014. Published by Elsevier Ltd.

  1. Complete genome sequences of avian paramyxovirus type 8 strains goose/Delaware/1053/76 and pintail/Wakuya/20/78

    PubMed Central

    Paldurai, Anandan; Subbiah, Madhuri; Kumar, Sachin; Collins, Peter L.; Samal, Siba K.

    2009-01-01

    Complete consensus genome sequences were determined for avian paramyxovirus type 8 (APMV-8) strains goose/Delaware/1053/76 (prototype strain) and pintail/Wakuya/20/78. The genome of each strain is 15,342 nucleotides (nt) long, which follows the “rule of six”. The genome consists of six genes in the order 3′-N-P/V/W-M-F-HN-L-5′. The genes are flanked on either side by conserved transcription start and stop signals and have intergenic regions ranging from 1 to 30 nt. The genome contains a 55-nt leader region at the 3′ end and a 171-nt trailer region at the 5′ end. Comparison of the Delaware and Wakuya sequences showed 96.8% nucleotide identity at the genome level and amino acid identities of 99.3%, 96.5%, 98.6%, 99.4%, 98.6% and 99.1% for the predicted N, P, M, F, HN and L proteins, respectively. Both strains grew in embryonated chicken eggs, primary chicken embryo kidney cells and 293T cells. Both strains contained only a single basic residue at the cleavage activation site of the F protein, and in most cell lines their efficiency of replication in vitro depended on, and was augmented by, the presence of exogenous protease. Sequence alignment and phylogenetic analysis of the predicted amino acid sequences of APMV-8 strain Delaware proteins with the cognate proteins of other available APMV serotypes showed that APMV-8 is more closely related to APMV-2 and -6 than to APMV-1, -3 and -4. PMID:19341613
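    The “rule of six” mentioned above is a divisibility condition: the genome length in nucleotides must be an exact multiple of six. A minimal Python check is sketched below; the function name is illustrative, and the only value taken from the abstract is the 15,342-nt genome length.

```python
def follows_rule_of_six(genome_length_nt: int) -> bool:
    """Return True if a genome length (in nucleotides) is an exact multiple of six."""
    return genome_length_nt % 6 == 0


apmv8_length_nt = 15_342  # genome length reported for both APMV-8 strains

print(follows_rule_of_six(apmv8_length_nt))  # True
print(apmv8_length_nt // 6)                  # 2557 hexamers
```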

  2. Mechanism of DNA binding enhancement by hepatitis B virus protein pX.

    PubMed

    Palmer, C R; Gegnas, L D; Schepartz, A

    1997-12-09

    At least three hundred million people worldwide are infected with the hepatitis B virus (HBV), and epidemiological studies show a clear correlation between chronic HBV infection and the development of hepatocellular carcinoma. HBV encodes a protein, pX, which co-opts the cellular transcriptional machinery in several ways, including direct interactions with bZIP transcription factors. These interactions increase the DNA affinities of target bZIP proteins in a DNA sequence-dependent manner. Here we use a series of bZIP peptide models to explore the mechanism by which pX interacts with bZIP proteins. Our results suggest that pX increases bZIP·DNA stability by increasing the stability of the bZIP dimer as well as the affinity of the dimer for DNA. Additional experiments provide evidence for a mechanism in which pX recognizes the composite structure of the peptide·DNA complex, not simply the primary peptide sequence. These experiments provide a framework for understanding how pX alters the patterns of transcription within the nucleus. The similarities between the mechanism proposed for pX and the mechanism previously proposed for the human T-cell leukemia virus protein Tax are discussed.

  3. A Generalized Michaelis-Menten Equation in Protein Synthesis: Effects of Mis-Charged Cognate tRNA and Mis-Reading of Codon.

    PubMed

    Dutta, Annwesha; Chowdhury, Debashish

    2017-05-01

    The sequence of amino acid monomers in the primary structure of a protein is determined by the corresponding sequence of codons (triplets of nucleotides) on the template messenger RNA (mRNA). The polymerization of a protein, by incorporation of successive amino acid monomers, is carried out by a molecular machine called the ribosome. We develop a stochastic kinetic model that captures the possibilities of mis-reading of an mRNA codon and prior mis-charging of a tRNA. By a combination of analytical and numerical methods, we obtain the distribution of the times taken for incorporation of successive amino acids into the growing protein in this mathematical model. The corresponding exact analytical expression for the average rate of elongation of a nascent protein is a 'biologically motivated' generalization of the Michaelis-Menten formula for the average rate of enzymatic reactions. This generalized Michaelis-Menten-like formula (and the exact analytical expressions for a few other quantities reported here) displays the interplay of four different branched pathways corresponding to selection of four different types of tRNA.
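    For reference, the classical Michaelis-Menten rate law that this model generalizes is given below for the scheme E + S ⇌ ES → E + P, with binding, unbinding and catalytic rate constants k_1, k_{-1} and k_cat and total enzyme concentration [E]_0. The generalized elongation-rate formula itself is not stated in the abstract, so only the standard form is shown.

```latex
v = \frac{V_{\max}\,[S]}{K_M + [S]},
\qquad
V_{\max} = k_{\mathrm{cat}}\,[E]_0,
\qquad
K_M = \frac{k_{-1} + k_{\mathrm{cat}}}{k_1}
```

    At low substrate concentration the rate is approximately linear, v ≈ (V_max/K_M)[S], and it saturates at V_max when [S] is much larger than K_M.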

  4. Structural organization of the genes for rat von Ebner's gland proteins 1 and 2 reveals their close relationship to lipocalins.

    PubMed

    Kock, K; Ahlers, C; Schmale, H

    1994-05-01

    The rat von Ebner's gland protein 1 (VEGP 1) is a secretory protein, which is abundantly expressed in the small acinar von Ebner's salivary glands of the tongue. Based on the primary structure of this protein we have previously suggested that it is a member of the lipocalin superfamily of lipophilic-ligand carrier proteins. Although the physiological role of VEGP 1 is not clear, it might be involved in sensory or protective functions in the taste epithelium. Here, we report the purification of VEGP 1 and of a closely related secretory polypeptide, VEGP 2, the isolation of a cDNA clone encoding VEGP 2, and the isolation and structural characterization of the genes for both proteins. Protein purification by gel-filtration and anion-exchange chromatography using Mono Q revealed the presence of two different immunoreactive VEGP species. N-terminal sequence determination of peptide fragments isolated after protease Asp-N digestion allowed the identification of a new VEGP, named VEGP 2, in addition to the previously characterized VEGP 1. The complete VEGP 2 sequence was deduced from a cDNA clone isolated from a von Ebner's gland cDNA library. The VEGP 2 cDNA encodes a protein of 177 amino acids and is 94% identical to VEGP 1. DNA sequence analysis of the rat VEGP 1 and 2 genes isolated from rat genomic libraries revealed that both span about 4.5 kb and contain seven exons. The VEGP 1 and 2 genes are non-allelic distinct genes in the rat genome and probably arose by gene duplication. The high degree of nucleotide sequence identity in introns A-C (94-100%) points to a recent gene conversion event that included the 5' part of the genes. The genomic organization of the rat VEGP genes closely resembles that found in other lipocalins such as beta-lactoglobulin, mouse urinary proteins (MUPs) and prostaglandin D synthase, and therefore provides clear evidence that VEGPs belong to this superfamily of proteins.

  5. Structure and function of the UV-B photoreceptor UVR8.

    PubMed

    Jenkins, Gareth I

    2014-12-01

    UVR8 is a UV-B photoreceptor that employs specific tryptophans in its primary sequence as chromophores in photoreception. UV-B absorption causes dissociation of the dimeric photoreceptor by neutralizing interactions between monomers. The monomeric form initiates signalling through interaction with the COP1 protein, leading to transcriptional responses. This article discusses the structural basis of UVR8 function, highlighting recent research on the mechanism of photoreception and on interactions with other proteins involved in signalling and regulation. Copyright © 2014 Elsevier Ltd. All rights reserved.

  6. Inadequate Reference Datasets Biased toward Short Non-epitopes Confound B-cell Epitope Prediction

    PubMed Central

    Rahman, Kh. Shamsur; Chowdhury, Erfan Ullah; Sachse, Konrad; Kaltenboeck, Bernhard

    2016-01-01

    X-ray crystallography has shown that an antibody paratope typically binds 15–22 amino acids (aa) of an epitope, of which 2–5 randomly distributed amino acids contribute most of the binding energy. In contrast, researchers typically choose short peptide antigens for B-cell epitope mapping in antibody binding assays. Furthermore, short 6–11-aa epitopes, and in particular non-epitopes, are over-represented in published B-cell epitope datasets that are commonly used to develop approaches for predicting B-cell epitopes from protein antigen sequences. We hypothesized that such suboptimal-length peptides result in weak antibody binding and cause false-negative results. We tested the influence of peptide antigen length on antibody binding by analyzing data on more than 900 peptides used for B-cell epitope mapping of immunodominant proteins of Chlamydia spp. We demonstrate that short 7–12-aa peptides of B-cell epitopes bind antibodies poorly; thus, epitope mapping with short peptide antigens falsely classifies many B-cell epitopes as non-epitopes. We also show, in published datasets of confirmed epitopes and non-epitopes, a direct correlation between the length of peptide antigens and antibody binding. Eliminating short (≤11-aa) epitope/non-epitope sequences improved the datasets for evaluating in silico B-cell epitope prediction. With up to 86% accuracy, protein disorder tendency is the best indicator of B-cell epitope regions in both the chlamydial and the published datasets. For B-cell epitope prediction, the most effective approach is to plot the disorder of protein sequences with the IUPred-L scale and then test the antibody reactivity of 16–30-aa peptides from the peak regions. This strategy overcomes the well-known inaccuracy of in silico B-cell epitope prediction from primary protein sequences. PMID:27189949
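    The second step of the recommended strategy (choosing 16–30-aa peptides from peak regions of a disorder profile) can be sketched as a sliding-window search. This is a minimal illustration, not the authors' procedure: it does not call IUPred and assumes per-residue disorder scores have already been computed by some predictor; the function name, the 20-aa window and the 0.5 cutoff are hypothetical choices.

```python
from typing import List, Tuple


def candidate_peptides(
    sequence: str,
    disorder: List[float],
    window: int = 20,        # hypothetical length within the 16-30 aa range
    threshold: float = 0.5,  # hypothetical cutoff on the window's mean score
) -> List[Tuple[int, str, float]]:
    """Pick non-overlapping high-disorder windows as candidate test peptides.

    `disorder` holds one precomputed score per residue. Windows are ranked by
    mean score; a window is kept if its mean exceeds `threshold` and it does
    not overlap a window already kept.
    Returns (0-based start, peptide, mean score) tuples sorted by position.
    """
    if len(sequence) != len(disorder):
        raise ValueError("need exactly one disorder score per residue")
    ranked = sorted(
        ((sum(disorder[i:i + window]) / window, i)
         for i in range(len(sequence) - window + 1)),
        reverse=True,
    )
    kept: List[Tuple[int, str, float]] = []
    for mean, start in ranked:
        if mean <= threshold:
            break  # every remaining window scores lower
        if all(abs(start - s) >= window for s, _, _ in kept):
            kept.append((start, sequence[start:start + window], mean))
    return sorted(kept)


# Toy profile (invented): a single disordered stretch at residues 20-39.
toy_scores = [0.1] * 20 + [0.9] * 20 + [0.1] * 20
print([start for start, _, _ in candidate_peptides("M" * 60, toy_scores)])  # [20]
```

    Ranking windows by mean score and rejecting overlaps is a simple greedy way to find distinct peaks; the selected peptides would then go to antibody reactivity testing.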

  7. MALDI Top-Down sequencing: calling N- and C-terminal protein sequences with high confidence and speed.

    PubMed

    Suckau, Detlev; Resemann, Anja

    2009-12-01

    This work evaluated the ability to match MALDI-TOF Top-Down protein sequencing (TDS) results to protein sequences by classical protein database searching. These analyses yielded the protein identity, the simultaneous assignment of the N- and C-termini, and protein sequences of up to 70 residues from either terminus. In combination with de novo sequencing of the MALDI-TDS data, even fusion proteins were assigned, and the detailed sequence around the fusion site was elucidated. MALDI-TDS allowed protein sequences to be matched quickly and efficiently, and recombinant protein structures, in particular protein termini, to be validated at the level of undigested proteins.

  8. Beta-haemolytic group A streptococci emm75 carrying altered pyrogenic exotoxin A linked to scarlet fever in adults.

    PubMed

    Dong, Hongjun; Xu, Guozhang; Li, Shuhua; Song, Qifa; Liu, Shijian; Lin, Hui; Chai, Yibiao; Zhou, Aimin; Fang, Ting; Zhang, Hongwei; Jin, Chunguang; Lu, Wei; Cao, Guangwen

    2008-04-01

    This study aimed to determine the etiological cause of a food-borne outbreak of scarlet fever in adults. Throat swabs from the patients and the asymptomatic control were cultured individually on blood agar plates. All isolates were identified biochemically with a VITEK automated system. Antibiotic susceptibility was examined by the Kirby-Bauer disc diffusion method. The emm gene and the extracellular pyrogenic exotoxin genes of each isolate were amplified by polymerase chain reaction and subjected to DNA sequencing. The isolate sequences were compared with highly similar reference sequences using BLAST, and bioinformatic tools were used to predict protein structures. Beta-haemolytic group A streptococci (GAS) emm75 were identified in 10 of 13 available patients. The isolates were susceptible to penicillin, ampicillin, vancomycin, ceftriaxone, ofloxacin, linezolid and quinupristin. All of the isolates carried pyrogenic exotoxin A (speA) and cysteine protease (speB). The isolated speA was phylogenetically different from 30 highly similar reference sequences on BLAST. The primary sequence of the deduced protein differed by 14.37-20.12% between the speA and each of 11 references. The secondary structure of the speA protein differed from that of the references at the N-terminus. GAS emm75 encoding an altered speA was responsible for the food-borne outbreak of scarlet fever in adults.

  9. A glycine-to-glutamate substitution abolishes alanine:glyoxylate aminotransferase catalytic activity in a subset of patients with primary hyperoxaluria type 1.

    PubMed

    Purdue, P E; Lumb, M J; Allsop, J; Minatogawa, Y; Danpure, C J

    1992-05-01

    We have synthesized and sequenced alanine:glyoxylate aminotransferase (AGT; HGMW-approved gene symbol AGXT) cDNA from the liver of a primary hyperoxaluria type 1 (PH1) patient who had normal levels of hepatic peroxisomal immunoreactive AGT protein but no AGT catalytic activity. This revealed a single point mutation (G→A at cDNA nucleotide 367), which is predicted to cause a glycine-to-glutamate substitution at residue 82 of the AGT protein. This mutation is located in exon 2 of the AGT gene and abolishes an AvaI restriction site. Exon 2-specific PCR followed by AvaI digestion showed that this patient was homozygous for the mutation. In addition, three other PH1 patients with an enzymological phenotype similar to that of the first patient, one related to that patient and two unrelated, were also shown to be homozygous for the mutation. However, one other phenotypically similar PH1 patient lacked this mutation. The mechanism by which the glycine-to-glutamate substitution at residue 82 causes loss of catalytic activity remains to be resolved. However, the protein sequence in this region is highly conserved among mammals, and the substitution at residue 82 is predicted to cause significant local structural alterations.

  10. Intrinsically disordered proteins aggregate at fungal cell-to-cell channels and regulate intercellular connectivity.

    PubMed

    Lai, Julian; Koh, Chuan Hock; Tjota, Monika; Pieuchot, Laurent; Raman, Vignesh; Chandrababu, Karthik Balakrishna; Yang, Daiwen; Wong, Limsoon; Jedd, Gregory

    2012-09-25

    Like animals and plants, multicellular fungi possess cell-to-cell channels (septal pores) that allow intercellular communication and transport. Here, using a combination of MS of Woronin body-associated proteins and a bioinformatics approach that identifies related proteins based on composition and character, we identify 17 septal pore-associated (SPA) proteins that localize to the septal pore in rings and pore-centered foci. SPA proteins are not homologous at the primary sequence level but share overall physical properties with intrinsically disordered proteins. Some SPA proteins form aggregates at the septal pore, and in vitro assembly assays suggest aggregation through a nonamyloidal mechanism involving mainly α-helical and disordered structures. SPA loss-of-function phenotypes include excessive septation, septal pore degeneration, and uncontrolled Woronin body activation. Together, our data identify the septal pore as a complex subcellular compartment and focal point for the assembly of unstructured proteins controlling diverse aspects of intercellular connectivity.

  11. Identification of the gene encoding a 38-kilodalton immunogenic and protective antigen of Streptococcus suis.

    PubMed

    Okwumabua, Ogi; Chinnapapakkagari, Sharmila

    2005-04-01

    In our continued effort to find a Streptococcus suis protein(s) that can serve as a vaccine candidate or a diagnostic reagent, we constructed a gene library and screened it with a polyclonal antibody raised against the whole-cell protein of S. suis type 2. A clone that reacted with the antibody was identified and characterized. Analysis revealed that the gene encoding the protein is localized within a 2.0-kbp EcoRI DNA fragment. The nucleotide sequence contained an open reading frame that encoded a polypeptide of 445 amino acid residues with a calculated molecular mass of 46.4 kDa. In in vitro protein synthesis and Western blot experiments, the protein exhibited an electrophoretic mobility of approximately 38 kDa. At the amino acid level, the deduced primary sequence shared homology with sequences of unknown function from Streptococcus pneumoniae (89%), Streptococcus mutans (86%), Lactococcus lactis (80%), Listeria monocytogenes (74%), and Clostridium perfringens (64%). Southern hybridization analysis revealed the presence of the gene in strains of S. suis serotypes other than 20, 26, 32, and 33, and demonstrated restriction fragment length differences caused by a point mutation in the EcoRI recognition sequence. We confirmed expression of the 38-kDa protein in the hybridization-positive isolates using specific antiserum against the purified protein. The recombinant protein was reactive with serum from pigs experimentally infected with virulent strains of S. suis type 2, suggesting that the protein is immunogenic and may serve as an antigen of diagnostic importance for the detection of most S. suis infections. Pigs immunized with the recombinant 38-kDa protein mounted antibody responses to the protein and were completely protected against challenge with a strain of a homologous serotype, the wild-type virulent strain of S. suis type 2, suggesting that the protein may be a good candidate for the development of a vaccine against S. suis infection. Western blot analysis of cellular fractions of the bacterium revealed that the protein was present in the surface and cell wall extracts. The functional role of the protein in pathogenesis, and whether antibodies against the antigen confer protective immunity against diseases caused by strains of other pathogenic S. suis capsular types, remain to be determined.

  12. On-Line Electrochemical Reduction of Disulfide Bonds: Improved FTICR-CID and -ETD Coverage of Oxytocin and Hepcidin

    NASA Astrophysics Data System (ADS)

    Nicolardi, Simone; Giera, Martin; Kooijman, Pieter; Kraj, Agnieszka; Chervet, Jean-Pierre; Deelder, André M.; van der Burgt, Yuri E. M.

    2013-12-01

    Particularly in middle- and top-down peptide and protein analysis, disulfide bridges can severely hinder fragmentation and thus impede sequence analysis (coverage). Here we present an on-line electrochemistry/ESI-FTICR-MS approach, which was applied to the analysis of the primary structure of oxytocin, containing one disulfide bridge, and of hepcidin, containing four disulfide bridges. The presented workflow provided up to 80% (on-line) conversion of disulfide bonds in both peptides. With minimal sample preparation, this reduction resulted in a higher number of peptide backbone cleavages upon CID or ETD fragmentation, and thus improved sequence coverage. The cycle times, including electrode recovery, were short, so the approach could readily be coupled with liquid chromatography for protein or peptide separation, which has great potential for high-throughput analysis.

  13. Protein expression and genetic variability of canine Can f 1 in golden and Labrador retriever service dogs.

    PubMed

    Breitenbuecher, Christina; Belanger, Janelle M; Levy, Kerinne; Mundell, Paul; Fates, Valerie; Gershony, Liza; Famula, Thomas R; Oberbauer, Anita M

    2016-01-01

    Valued for trainability in diverse tasks, dogs are the primary service animal used to assist individuals with disabilities. Despite their utility, many people in need of service dogs are sensitive to the primary dog allergen, Can f 1, encoded by the Lipocalin 1 gene (LCN1). Several organizations specifically breed service dogs to meet special needs and would like to reduce allergenic potential if possible. In this study, we evaluated the expression of Can f 1 protein and the inherent variability of LCN1 in two breeds used extensively as service dogs. Saliva samples from equal numbers of male and female Labrador retrievers (n = 12), golden retrievers (n = 12), and Labrador-golden crosses (n = 12) were collected 1 h after the morning meal. Can f 1 protein concentrations in the saliva were measured by ELISA, and the LCN1 5' and 3' UTRs and exons were sequenced. There was no sex effect (p > 0.2) or time-of-day effect; however, Can f 1 protein levels varied by breed, with Labrador retrievers having lower levels than golden retrievers (3.18 ± 0.51 and 5.35 ± 0.52 μg/ml, respectively; p < 0.0075) and Labrador-golden crosses having intermediate levels (3.77 ± 0.48 μg/ml). Although several novel SNPs were identified in LCN1, there were no significant breed-specific sequence differences in the gene and no association of LCN1 genotypes with Can f 1 expression. As service dogs, Labrador retrievers likely have lower allergenic potential, and although no DNA sequence differences were identified, classical genetic selection on estimated breeding values for salivary Can f 1 expression may further reduce that potential.

  14. Systems Biology of Recombinant Protein Production in Bacillus megaterium

    NASA Astrophysics Data System (ADS)

    Biedendieck, Rebekka; Bunk, Boyke; Fürch, Tobias; Franco-Lara, Ezequiel; Jahn, Martina; Jahn, Dieter

    Over the last two decades, the Gram-positive bacterium Bacillus megaterium has been systematically developed into a useful alternative protein production host. Multiple vector systems for high-yield intra- and extracellular protein production were constructed. Strong inducible promoters were combined with DNA sequences for optimised ribosome binding sites, various leader peptides for protein export, and N- as well as C-terminal tags for affinity chromatographic purification of the desired protein. High-cell-density cultivation and recombinant protein production were successfully tested. For further systems-biology-based control and optimisation of the production process, the genomes of two B. megaterium strains were completely elucidated, DNA arrays designed, proteome, fluxome and metabolome analyses performed, and all data integrated using the bioinformatics platform MEGABAC. Solid theoretical and experimental bases for first modeling attempts of the production process are now available.

  15. Purification and cDNA cloning of a protein derived from Flammulina velutipes that increases the permeability of the intestinal Caco-2 cell monolayer.

    PubMed

    Watanabe, H; Narai, A; Shimizu, M

    1999-06-01

    A new protein that decreases transepithelial electrical resistance (TEER) in the human intestinal Caco-2 cell monolayer was found in a water-soluble fraction of the mushroom Flammulina velutipes. This protein, termed TEER-decreasing protein (TDP), is not cytotoxic and does not induce cell detachment, but rapidly increases tight-junctional permeability to water-soluble marker substances such as Lucifer Yellow CH (Mr 457) through the paracellular pathway. TDP was isolated and purified from the aqueous extract of F. velutipes by chromatography. Purified TDP was found to be a simple, nonglycosylated protein without intermolecular disulfide bonds, with an apparent molecular mass of 30 kDa as estimated by SDS/PAGE and gel filtration. The N-terminal amino-acid sequence of purified TDP proved identical to the recently reported N-terminal sequence of flammutoxin, a membrane-perturbing hemolytic protein whose complete primary structure has not yet been reported [Tomita, T., Ishikawa, D., Noguchi, T., Katayama, E., and Hashimoto, Y. (1998) Biochem. J. 333, 24794-24799]. The cDNA coding for TDP was cloned by 5' and 3' rapid amplification of cDNA ends. The ORF encodes a protein of 272 amino-acid residues with no homology to known proteins. Further studies using TDP cDNA should provide insight into the structure-function relationships of membrane pore-forming toxins.

  16. Molecular cloning of a murine homologue of membrane cofactor protein (CD46): preferential expression in testicular germ cells.

    PubMed Central

    Tsujimura, A; Shida, K; Kitamura, M; Nomura, M; Takeda, J; Tanaka, H; Matsumoto, M; Matsumiya, K; Okuyama, A; Nishimune, Y; Okabe, M; Seya, T

    1998-01-01

    Human membrane cofactor protein (MCP, CD46) has been suggested, although without convincing evidence, to be a fertilization-associated protein, in addition to its primary functions as a complement regulator and a measles virus receptor. We have cloned a cDNA encoding the murine homologue of MCP. This cDNA showed 45% identity in deduced protein sequence and 62% identity in nucleotide sequence with human MCP. Its ectodomain consisted of four short consensus repeats and a serine/threonine-rich domain, and it appeared to be a type 1 membrane protein with a 23-amino acid transmembrane domain and a short cytoplasmic tail. The protein expressed on Chinese hamster ovary cell transfectants was 47 kDa on SDS/PAGE immunoblotting, approximately 6 kDa larger than murine testis MCP. It served as a cofactor for factor I-mediated inactivation of the complement protein C3b in a homologous system and, to a lesser extent, in a human system. Strikingly, the major murine MCP message was 1.5 kb and was expressed predominantly in the testis. It was not detected in mice defective in spermatogenesis or in mice with immature germ cells (up to 23 days old). Thus, murine MCP may be a sperm-dominant protein whose message is expressed selectively in spermatids during germ-cell differentiation. PMID:9461505

  17. The Treacher Collins syndrome (TCOF1) gene product, treacle, is targeted to the nucleolus by signals in its C-terminus.

    PubMed

    Winokur, S T; Shiang, R

    1998-11-01

    The TCOF1 gene product, treacle, responsible for the craniofacial disorder Treacher Collins syndrome, has been predicted from its primary amino acid sequence to be a member of a class of nucleolar phosphoproteins. Treacle is a low-complexity protein with ten repeating units of acidic and basic residues, each of which contains many putative casein kinase 2 and protein kinase C phosphorylation sites. In addition, the C-terminus of treacle contains multiple putative nuclear localization signals. The overall structure of treacle, as well as its sequence similarity to several nucleolar phosphoproteins, supports its assignment to this class of proteins. Using green fluorescent protein fusion constructs with the full-length and deleted domains of the murine homolog of treacle, we demonstrate that treacle localizes to the nucleolus. This localization is mediated by the last 41 residues of the C-terminus (residues 1262-1302). At least two functional nuclear localization signals have been identified in the protein, one between residues 1176 and 1270 and the second within the last 32 residues of the protein (1271-1302). The nucleolar localization signal is disrupted by two constructs that split the C-terminal region between residues 1270 and 1271. This study provides the first direct analysis of treacle and demonstrates that the TCOF1 gene product is a nucleolar protein.

  18. Transcriptome sequencing and de novo analysis of the copepod Calanus sinicus using 454 GS FLX.

    PubMed

    Ning, Juan; Wang, Minxiao; Li, Chaolun; Sun, Song

    2013-01-01

    Despite their species abundance and primary economic importance, genomic information about copepods is still limited. In particular, genomic resources are lacking for the copepod Calanus sinicus, a dominant species in the coastal waters of East Asia. In this study, we performed de novo transcriptome sequencing to produce a large number of expressed sequence tags for C. sinicus. Copepodid larvae and adults were used as the starting material for transcriptome sequencing. Using 454 pyrosequencing, we obtained a total of 1,470,799 reads, which were assembled into 56,809 high-quality expressed sequence tags. Based on their sequence similarity to known proteins, about 14,000 different genes were identified, including members of all major conserved signaling pathways. Transcripts putatively involved in growth, lipid metabolism, molting, and diapause were also identified among these genes. Differentially expressed genes related to several processes were found between C. sinicus copepodid larvae and adults. We detected 284,154 single nucleotide polymorphisms (SNPs) that provide a resource for gene function studies. Our data provide the most comprehensive transcriptome resource available for C. sinicus. This resource allowed us to identify genes associated with primary physiological processes and SNPs in coding regions, and facilitated the quantitative analysis of differential gene expression. These data should provide a foundation for future genetic and genomic studies of this and related species.

  19. P41 IDENTIFICATION OF GLIOMA SPECIFIC APTAMER TARGETS

    PubMed Central

    Arora, Mohit; Alder, Jane; Lawrence, Clare; Davis, Charles; Dawson, Tim; Hall, Greg; Shaw, Lisa

    2014-01-01

    INTRODUCTION: Aptamers are in vitro-generated DNA and RNA sequences that are randomly created as a library with multiple permutations and combinations. These are then exposed to the target structure against which we want an aptamer ‘selected’ using Systematic Evolution of Ligands by Exponential enrichment (SELEX). METHOD: Commercially available glioma and glial cell lines and in-house generated primary glioma cultures were used. Modified aptamers based on published sequences against glioma cell lines and newly generated sequences were used in the project to identify their binding targets. Cy3- or biotin-conjugated aptamers were incubated with live glioma cell cultures and imaged using confocal or light microscopy. To determine the target ligand, aptamers were then reacted with glial cell lysate and subjected to precipitation using streptavidin agarose beads and SDS-polyacrylamide gel electrophoresis. Proteins were analysed by mass spectrometry. RESULTS: Known and unknown aptamer protein ligands were co-precipitated. Ku70 and Ku80 were precipitated along with nucleolin and related proteins. CONCLUSION: The aptamer has shown preferential binding to glioma cells and could act as a delivery system for therapeutic payloads. The aptamer targets Ku70 and Ku80, which are known to be overexpressed in other forms of cancer but whose role in gliomagenesis has not been fully elucidated. Other novel proteins have also been identified. Thus the aptamer co-precipitation technique has identified potential glioma biomarkers that may be of clinical significance.

  20. Prediction of small molecule binding property of protein domains with Bayesian classifiers based on Markov chains.

    PubMed

    Bulashevska, Alla; Stein, Martin; Jackson, David; Eils, Roland

    2009-12-01

    Accurate computational methods that can help to predict biological function of a protein from its sequence are of great interest to research biologists and pharmaceutical companies. One approach to inferring the function of proteins is to predict the interactions between proteins and other molecules. In this work, we propose a machine learning method that uses the primary sequence of a domain to predict its propensity for interaction with small molecules. By curating the Pfam database with respect to the small molecule binding ability of its component domains, we have constructed a dataset of small molecule binding and non-binding domains. This dataset was then used as a training set for a Bayesian classifier that distinguishes members of each class. The domain sequences of both classes are modelled with Markov chains. In a jack-knife test, our classification procedure achieved predictive accuracies of 77.2% and 66.7% for the binding and non-binding classes, respectively. We demonstrate the applicability of our classifier by using it to identify previously unknown small molecule binding domains. Our predictions are available as supplementary material and can provide very useful information to drug discovery specialists. Given the ubiquitous and essential role small molecules play in biological processes, our method is important for identifying pharmaceutically relevant components of complete proteomes. The software is available from the author upon request.
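    The classifier above models each class's domain sequences with Markov chains and assigns a new sequence to the class with the higher posterior probability. A minimal sketch, assuming a first-order chain over the 20 standard amino acids, add-one (Laplace) smoothing and toy training sequences; the abstract states neither the chain order nor the smoothing, and the class names and data here are hypothetical.

```python
import math
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed 20-letter alphabet

class MarkovChainModel:
    """First-order Markov chain with add-one (Laplace) smoothing."""

    def __init__(self, training_seqs):
        self.n_seqs = len(training_seqs)
        self.start = defaultdict(int)                       # counts of first residues
        self.trans = defaultdict(lambda: defaultdict(int))  # counts of a -> b steps
        for seq in training_seqs:
            self.start[seq[0]] += 1
            for a, b in zip(seq, seq[1:]):
                self.trans[a][b] += 1

    def log_likelihood(self, seq):
        """log P(seq | class) for a non-empty sequence."""
        k = len(AMINO_ACIDS)
        ll = math.log((self.start.get(seq[0], 0) + 1) / (self.n_seqs + k))
        for a, b in zip(seq, seq[1:]):
            row = self.trans.get(a, {})
            ll += math.log((row.get(b, 0) + 1) / (sum(row.values()) + k))
        return ll

def classify(seq, models, priors):
    """Bayes rule: pick the class maximising log P(class) + log P(seq | class)."""
    return max(models, key=lambda c: math.log(priors[c]) + models[c].log_likelihood(seq))

# Hypothetical toy training data, not the curated Pfam dataset.
models = {
    "binding": MarkovChainModel(["GGGGGG", "GGGGG"]),
    "non-binding": MarkovChainModel(["LLLLLL", "LLLLL"]),
}
priors = {"binding": 0.5, "non-binding": 0.5}
print(classify("GGGG", models, priors))
```

    A jack-knife test of the kind reported above would retrain the models with each domain held out in turn and then classify the held-out domain.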

  1. cDNA, deduced polypeptide structure and chromosomal assignment of human pulmonary surfactant proteolipid, SPL(pVal)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Glasser, S.W.; Korfhagen, T.R.; Weaver, T.E.

    1988-01-05

    In hyaline membrane disease of premature infants, lack of surfactant leads to pulmonary atelectasis and respiratory distress. Hydrophobic surfactant proteins of Mr = 5,000-14,000, which enhance the rate of spreading and the surface tension-lowering properties of phospholipids during dynamic compression, have been isolated from mammalian surfactants. The authors have characterized the amino-terminal amino acid sequence of pulmonary proteolipids from ether/ethanol extracts of bovine, canine, and human surfactant. Two distinct peptides were identified and termed SPL(pVal) and SPL(Phe). An oligonucleotide probe based on the valine-rich amino-terminal amino acid sequence of SPL(pVal) was utilized to isolate cDNA and genomic DNA encoding the human protein, termed surfactant proteolipid SPL(pVal) on the basis of its unique polyvaline domain. The primary structure of a precursor protein of 20,870 daltons, containing the SPL(pVal) peptide, was deduced from the nucleotide sequence of the cDNAs. Hybrid-arrested translation and immunoprecipitation of labeled translation products of human mRNA demonstrated a precursor protein, the active hydrophobic peptide being produced by proteolytic processing. Two classes of cDNAs encoding SPL(pVal) were identified. Human SPL(pVal) mRNA was more abundant in adult than in fetal lung. The SPL(pVal) gene locus was assigned to chromosome 8.
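    Deducing a precursor's primary structure from cDNA amounts to translating the open reading frame codon by codon. The sketch below assumes the standard genetic code (NCBI translation table 1) and naively starts at the first ATG; real analyses also check the reading frame, initiation context and conservation. The input is a hypothetical fragment chosen only to echo a polyvaline stretch, not the SPL(pVal) cDNA.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, ..., GGG.
AA_TABLE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {a + b + c: AA_TABLE[16 * i + 4 * j + k]
         for i, a in enumerate(BASES)
         for j, b in enumerate(BASES)
         for k, c in enumerate(BASES)}

def translate_orf(cdna: str) -> str:
    """Translate from the first ATG up to (not including) the first in-frame stop."""
    seq = cdna.upper().replace("U", "T")
    start = seq.find("ATG")
    if start < 0:
        return ""
    protein = []
    for pos in range(start, len(seq) - 2, 3):
        aa = CODON.get(seq[pos:pos + 3], "X")  # "X" for codons with ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# Hypothetical fragment, not the real SPL(pVal) cDNA.
print(translate_orf("ccATGGTGGTGGTTGTCTAAgg"))  # MVVVV
```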

  2. Silk from crickets: a new twist on spinning.

    PubMed

    Walker, Andrew A; Weisman, Sarah; Church, Jeffrey S; Merritt, David J; Mudie, Stephen T; Sutherland, Tara D

    2012-01-01

    Raspy crickets (Orthoptera: Gryllacrididae) are unique among the orthopterans in producing silk, which is used to build shelters. This work studied the material composition and the fabrication of cricket silk for the first time. We examined silk webs produced in captivity, which comprised cylindrical fibers and flat films. Spectra obtained from micro-Raman experiments indicated that the silk is composed of protein, primarily in a beta-sheet conformation, and that fibers and films are almost identical in terms of amino acid composition and secondary structure. The primary sequences of four silk proteins were identified through a mass spectrometry/cDNA library approach. The most abundant silk protein was large in size (300 and 220 kDa variants), rich in alanine, glycine and serine, and contained repetitive sequence motifs; these features are shared with several known beta-sheet-forming silk proteins. Convergent evolution at the molecular level contrasts with the crickets' development of a novel mechanism for silk fabrication. After the labial glands secrete the silk proteins, the labium-hypopharynx, which is modified to allow the controlled formation of either fibers or films, fabricates them into mature silk. Unlike what has been reported for other silks, protein folding into beta-sheet structure during cricket silk fabrication is not driven by shear forces.
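    A description such as "rich in alanine, glycine and serine" rests on a simple composition profile. A minimal sketch, run on hypothetical repeats rather than any of the cricket silk sequences:

```python
from collections import Counter

def composition(seq: str) -> dict:
    """Percentage of each residue type in a protein sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return {aa: 100.0 * n / len(seq) for aa, n in Counter(seq).items()}

def combined_fraction(seq: str, residues: str = "AGS") -> float:
    """Combined percentage of the given residue types (default Ala + Gly + Ser)."""
    comp = composition(seq)
    return sum(comp.get(aa, 0.0) for aa in residues)

# Hypothetical repeat, not a cricket silk sequence.
print(composition("GAGAGS"))
print(combined_fraction("GAGAGY"))
```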

  3. Acylation-dependent protein export in Leishmania.

    PubMed

    Denny, P W; Gokool, S; Russell, D G; Field, M C; Smith, D F

    2000-04-14

    The surface of the protozoan parasite Leishmania is unusual in that it consists predominantly of glycosylphosphatidylinositol-anchored glycoconjugates and proteins. Additionally, a family of hydrophilic acylated surface proteins (HASPs) has been localized to the extracellular face of the plasma membrane in infective parasite stages. These surface polypeptides lack a recognizable endoplasmic reticulum secretory signal sequence, transmembrane spanning domain, or glycosylphosphatidylinositol-anchor consensus sequence, indicating that novel mechanisms are involved in their transport and localization. Here, we show that the N-terminal domain of HASPB contains primary structural information that directs both N-myristoylation and palmitoylation and is essential for correct localization of the protein to the plasma membrane. Furthermore, the N-terminal 18 amino acids of HASPB, encoding the dual acylation site, are sufficient to target the heterologous Aequorea victoria green fluorescent protein to the cell surface of Leishmania. Mutagenesis of the predicted acylated residues confirms that modification by both myristate and palmitate is required for correct trafficking. These data suggest that HASPB is a representative of a novel class of proteins whose translocation onto the surface of eukaryotic cells is dependent upon a "non-classical" pathway involving N-myristoylation/palmitoylation. Significantly, HASPB is also translocated onto the extracellular face of the plasma membrane of transfected mammalian cells, indicating that the export signal for HASPB is recognized by a higher eukaryotic export mechanism.

  4. An Aromatic Sensor with Aversion to Damaged Strands Confers Versatility to DNA Repair

    PubMed Central

    Maillard, Olivier; Solyom, Szilvia; Naegeli, Hanspeter

    2007-01-01

    It was not known how xeroderma pigmentosum group C (XPC) protein, the primary initiator of global nucleotide excision repair, achieves its outstanding substrate versatility. Here, we analyzed the molecular pathology of a unique Trp690Ser substitution, which is the only reported missense mutation in xeroderma patients mapping to the evolutionarily conserved region of XPC protein. The function of this critical residue and neighboring conserved aromatics was tested by site-directed mutagenesis followed by screening for excision activity and DNA binding. This comparison demonstrated that Trp690 and Phe733 drive the preferential recruitment of XPC protein to repair substrates by mediating an exquisite affinity for single-stranded sites. Such a dual deployment of aromatic side chains is the distinctive feature of functional oligonucleotide/oligosaccharide-binding folds and, indeed, sequence homologies with replication protein A and breast cancer susceptibility 2 protein indicate that XPC displays a monomeric variant of this recurrent interaction motif. An aversion to associate with damaged oligonucleotides implies that XPC protein avoids direct contacts with base adducts. These results reveal for the first time, to our knowledge, an entirely inverted mechanism of substrate recognition that relies on the detection of single-stranded configurations in the undamaged complementary sequence of the double helix. PMID:17355181

  5. Analysis of the Sarcocystis neurona microneme protein SnMIC10: protein characteristics and expression during intracellular development.

    PubMed

    Hoane, Jessica S; Carruthers, Vernon B; Striepen, Boris; Morrison, David P; Entzeroth, Rolf; Howe, Daniel K

    2003-07-01

    Sarcocystis neurona, an apicomplexan parasite, is the primary causative agent of equine protozoal myeloencephalitis. Like other members of the Apicomplexa, S. neurona zoites possess secretory organelles that contain proteins necessary for host cell invasion and intracellular survival. From a collection of S. neurona expressed sequence tags, we identified a sequence encoding a putative microneme protein based on similarity to Toxoplasma gondii MIC10 (TgMIC10). Pairwise sequence alignments of SnMIC10 to TgMIC10 and NcMIC10 from Neospora caninum revealed approximately 33% identity to both orthologues. The open reading frame of the S. neurona gene encodes a 255 amino acid protein with a predicted 39-residue signal peptide. Like TgMIC10 and NcMIC10, SnMIC10 is predicted to be hydrophilic, highly alpha-helical in structure, and devoid of identifiable adhesive domains. Antibodies raised against recombinant SnMIC10 recognised a protein band with an apparent molecular weight of 24 kDa in Western blots of S. neurona merozoites, consistent with the size predicted for SnMIC10. In vitro secretion assays demonstrated that this protein is secreted by extracellular merozoites in a temperature-dependent manner. Indirect immunofluorescence analysis of SnMIC10 showed a polar labelling pattern, which is consistent with the apical position of the micronemes, and immunoelectron microscopy provided definitive localisation of the protein to these secretory organelles. Further analysis of SnMIC10 in intracellular parasites revealed that expression of this protein is temporally regulated during endopolygeny, supporting the view that micronemes are only needed during host cell invasion. Collectively, the data indicate that SnMIC10 is a microneme protein that is part of the excreted/secreted antigen fraction of S. neurona. Identification and characterisation of additional S. neurona microneme antigens and comparisons to orthologues in other Apicomplexa could provide further insight into the functions that these proteins serve during invasion of host cells.
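    As a rough consistency check on the 24 kDa band, assume the predicted 39-residue signal peptide is cleaved and use the common rule of thumb of about 110 Da per residue (an approximation; the true mass depends on composition):

```latex
\[
255 - 39 = 216\ \text{residues}, \qquad 216 \times 110\ \text{Da} \approx 23{,}760\ \text{Da} \approx 24\ \text{kDa}
\]
```

    By the same estimate the unprocessed 255-residue precursor would sit nearer 28 kDa (255 × 110 Da ≈ 28,050 Da).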

  6. Difficulties in Generating Specific Antibodies for Immunohistochemical Detection of Nitrosylated Tubulins

    PubMed Central

    Kamnev, Anton; Muhar, Matthias; Preinreich, Martina; Ammer, Hermann; Propst, Friedrich

    2013-01-01

    Protein S-nitrosylation, the covalent attachment of a nitroso moiety to thiol groups of specific cysteine residues, is one of the major pathways of nitric oxide signaling. Hundreds of proteins are subject to this transient post-translational modification and for some the functional consequences have been identified. Biochemical assays for the analysis of protein S-nitrosylation have been established and can be used to study whether and under what conditions a given protein is S-nitrosylated. In contrast, the equally desirable subcellular localization of specific S-nitrosylated protein isoforms has not been achieved to date. In the current study we attempted to specifically localize S-nitrosylated α- and β-tubulin isoforms in primary neurons after fixation. The approach was based on in situ replacement of the labile cysteine nitroso modification with a stable tag and the subsequent use of antibodies which recognize the tag in the context of the tubulin polypeptide sequence flanking the cysteine residue of interest. We established a procedure for tagging S-nitrosylated proteins in cultured primary neurons and obtained polyclonal anti-tag antibodies capable of specifically detecting tagged proteins on immunoblots and in fixed cells. However, the antibodies were not specific for tubulin isoforms. We suggest that different tagging strategies or alternative methods such as fluorescence resonance energy transfer techniques might be more successful. PMID:23840827

  7. Gene: a gene-centered information resource at NCBI.

    PubMed

    Brown, Garth R; Hem, Vichet; Katz, Kenneth S; Ovetsky, Michael; Wallin, Craig; Ermolaeva, Olga; Tolstoy, Igor; Tatusova, Tatiana; Pruitt, Kim D; Maglott, Donna R; Murphy, Terence D

    2015-01-01

    The National Center for Biotechnology Information's (NCBI) Gene database (www.ncbi.nlm.nih.gov/gene) integrates gene-specific information from multiple data sources. NCBI Reference Sequence (RefSeq) genomes for viruses, prokaryotes and eukaryotes are the primary foundation for Gene records in that they form the critical association between sequence and a tracked gene upon which additional functional and descriptive content is anchored. Additional content is integrated based on the genomic location and RefSeq transcript and protein sequence data. The content of a Gene record represents the integration of curation and automated processing from RefSeq, collaborating model organism databases, consortia such as Gene Ontology, and other databases within NCBI. Records in Gene are assigned unique, tracked integers as identifiers. The content (citations, nomenclature, genomic location, gene products and their attributes, phenotypes, sequences, interactions, variation details, maps, expression, homologs, protein domains and external databases) is available via interactive browsing through NCBI's Entrez system, via NCBI's Entrez programming utilities (E-Utilities and Entrez Direct) and for bulk transfer by FTP. Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.
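    The abstract notes that Gene content is reachable through NCBI's E-utilities. A minimal sketch that only builds ESearch and ESummary request URLs for the gene database with Python's standard library; the query string and the Gene IDs are illustrative examples, and actually sending the requests (for example with urllib.request.urlopen) should follow NCBI's usage guidelines on request rates and API keys.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def esearch_gene_url(term: str, retmax: int = 20) -> str:
    """ESearch: find Gene IDs matching an Entrez query, with JSON output."""
    params = {"db": "gene", "term": term, "retmode": "json", "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)

def esummary_gene_url(gene_ids) -> str:
    """ESummary: fetch document summaries for a list of Gene IDs."""
    params = {"db": "gene", "id": ",".join(str(i) for i in gene_ids), "retmode": "json"}
    return EUTILS_BASE + "esummary.fcgi?" + urlencode(params)

# Illustrative query and example IDs only; no network request is made here.
search_url = esearch_gene_url("TCOF1 AND human[orgn]")
summary_url = esummary_gene_url([672, 7157])
print(search_url)
print(summary_url)
```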

  8. Electron microscopic analysis and structural characterization of novel NADP(H)-containing methanol: N,N'-dimethyl-4-nitrosoaniline oxidoreductases from the gram-positive methylotrophic bacteria Amycolatopsis methanolica and Mycobacterium gastri MB19.

    PubMed Central

    Bystrykh, L V; Vonck, J; van Bruggen, E F; van Beeumen, J; Samyn, B; Govorukhina, N I; Arfman, N; Duine, J A; Dijkhuizen, L

    1993-01-01

    The quaternary protein structure of two methanol:N,N'-dimethyl-4-nitrosoaniline (NDMA) oxidoreductases purified from Amycolatopsis methanolica and Mycobacterium gastri MB19 was analyzed by electron microscopy and image processing. The enzymes are decameric proteins (displaying fivefold symmetry) with estimated molecular masses of 490 to 500 kDa based on their subunit molecular masses of 49 to 50 kDa. Both methanol:NDMA oxidoreductases possess a tightly but noncovalently bound NADP(H) cofactor at an NADPH-to-subunit molar ratio of 0.7. These cofactors are redox active toward alcohol and aldehyde substrates. Both enzymes contain significant amounts of Zn2+ and Mg2+ ions. The primary amino acid sequences of the A. methanolica and M. gastri MB19 methanol:NDMA oxidoreductases share a high degree of identity, as indicated by N-terminal sequence analysis (63% identity among the first 27 N-terminal amino acids), internal peptide sequence analysis, and overall amino acid composition. The amino acid sequence analysis also revealed significant similarity to a decameric methanol dehydrogenase of Bacillus methanolicus C1. PMID:8449887
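    The "63% identity among the first 27 N-terminal amino acids" corresponds to 17 identical positions out of 27 (17/27 ≈ 63.0%, whereas 16/27 ≈ 59.3% and 18/27 ≈ 66.7%). A minimal sketch of ungapped percent identity over pre-aligned, equal-length stretches, using hypothetical sequences rather than the reported N-termini:

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identical positions between two ungapped, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("expects two non-empty sequences of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

# Hypothetical 27-residue stretches with 17 matching positions.
seq_a = "A" * 27
seq_b = "A" * 17 + "G" * 10
print(round(percent_identity(seq_a, seq_b), 1))  # 63.0
```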

  9. A Single Rainbow Trout Cobalamin-binding Protein Stands in for Three Human Binders

    PubMed Central

    Greibe, Eva; Fedosov, Sergey; Sorensen, Boe S.; Højrup, Peter; Poulsen, Steen S.; Nexo, Ebba

    2012-01-01

    Cobalamin uptake and transport in mammals are mediated by three cobalamin-binding proteins: haptocorrin, intrinsic factor, and transcobalamin. The nature of cobalamin-binding proteins in lower vertebrates remains to be elucidated. The aim of this study was to characterize the cobalamin-binding proteins of the rainbow trout (Oncorhynchus mykiss) and to compare their properties with those of the three human cobalamin-binding proteins. High cobalamin-binding capacity was found in trout stomach (210 pmol/g), roe (400 pmol/g), roe fluid (390 nmol/liter), and plasma (2500 nmol/liter). In all cases, it appeared to be the same protein based on analysis of partial sequences and immunological responses. The trout cobalamin-binding protein was purified from roe fluid, sequenced, and further characterized. Like haptocorrin, the trout cobalamin-binding protein was stable at low pH and had a high binding affinity for the cobalamin analog cobinamide. Like haptocorrin and transcobalamin, the trout cobalamin-binding protein was present in plasma and recognized ligands with altered nucleotide moiety. Like intrinsic factor, the trout cobalamin-binding protein was present in the stomach and resisted degradation by trypsin and chymotrypsin. It also resembled intrinsic factor in the composition of conserved residues in the primary cobalamin-binding site in the C terminus. The trout cobalamin-binding protein was glycosylated and displayed spectral properties comparable with those of haptocorrin and intrinsic factor. In conclusion, only one soluble cobalamin-binding protein was identified in the rainbow trout, a protein that structurally behaves like an intermediate between the three human cobalamin-binding proteins. PMID:22872637

  10. Characterisation and expression of a PP1 serine/threonine protein phosphatase (PfPP1) from the malaria parasite, Plasmodium falciparum: demonstration of its essential role using RNA interference

    PubMed Central

    Kumar, Rajinder; Adams, Brian; Oldenburg, Anja; Musiyenko, Alla; Barik, Sailen

    2002-01-01

    Background: Reversible protein phosphorylation is relatively unexplored in the intracellular protozoa of the phylum Apicomplexa, which includes the genus Plasmodium, to which the causative agents of malaria belong. Members of the PP1 family represent the most highly conserved protein phosphatase sequences in phylogeny and play essential regulatory roles in various cellular pathways. Previous evidence suggested a PP1-like activity in Plasmodium falciparum, which had not yet been identified at the molecular level. Results: We have identified a PP1 catalytic subunit from P. falciparum and named it PfPP1. The predicted primary structure of the 304-amino-acid protein was highly similar to PP1 sequences of other species, and showed conservation of all the signature motifs. The purified recombinant protein exhibited potent phosphatase activity in vitro. Its sensitivity to specific phosphatase inhibitors was characteristic of the PP1 class. The authenticity of the PfPP1 cDNA was further confirmed by mutational analysis of strategic amino acid residues important in catalysis. The protein was expressed in all erythrocytic stages of the parasite. Abrogation of PP1 expression by synthetic short interfering RNA (siRNA) led to inhibition of parasite DNA synthesis. Conclusions: The high sequence similarity of PfPP1 with other PP1 members suggests conservation of function. Phenotypic gene knockdown studies using siRNA confirmed its essential role in the parasite. Detailed studies of PfPP1 and its regulation may unravel the role of reversible protein phosphorylation in the signalling pathways of the parasite, including glucose metabolism and parasitic cell division. The use of siRNA could be an important tool in the functional analysis of Apicomplexan genes. PMID:12057017

  11. Germline genetic variants in men with prostate cancer and one or more additional cancers.

    PubMed

    Pilié, Patrick G; Johnson, Anna M; Hanson, Kristen L; Dayno, Megan E; Kapron, Ashley L; Stoffel, Elena M; Cooney, Kathleen A

    2017-10-15

    Prostate cancer has a significant heritable component, and rare deleterious germline variants in certain genes can increase the risk of the disease. The aim of the current study was to describe the prevalence of pathogenic germline variants in cancer-predisposing genes in men with prostate cancer and at least 1 additional primary cancer. Using a multigene panel, the authors sequenced germline DNA from 102 men with prostate cancer and at least 1 additional primary cancer who also met ≥1 of the following criteria: 1) age ≤55 years at the time of diagnosis of the first malignancy; 2) rare tumor type or atypical presentation of a common tumor; and/or 3) ≥3 primary malignancies. Cancer family history and clinicopathologic data were independently reviewed by a clinical genetic counselor to determine whether the patient met established criteria for testing for a hereditary cancer syndrome. Sequencing identified approximately 3500 variants. Nine protein-truncating deleterious mutations were found across 6 genes, including BRCA2, ataxia telangiectasia mutated (ATM), mutL homolog 1 (MLH1), BRCA1 interacting protein C-terminal helicase 1 (BRIP1), partner and localizer of BRCA2 (PALB2), and fibroblast growth factor receptor 3 (FGFR3). Likely pathogenic missense variants were identified in checkpoint kinase 2 (CHEK2) and homeobox protein Hox-B13 (HOXB13). In total, 11 of 102 patients (10.8%) were found to have pathogenic or likely pathogenic mutations in cancer-predisposing genes. The majority of these men (64%) did not meet current clinical criteria for germline testing. Men with prostate cancer and at least 1 additional primary cancer are enriched for harboring a germline deleterious mutation in a cancer-predisposing gene that may impact cancer prognosis and treatment, but the majority do not meet current criteria for clinical genetic testing. Cancer 2017;123:3925-32. © 2017 American Cancer Society.
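    The reported percentages can be checked against the cohort size. Eleven carriers among 102 men gives 10.8%, and because 6/11 ≈ 54.5% and 8/11 ≈ 72.7%, the 64% who did not meet testing criteria must correspond to 7 of the 11 carriers:

```latex
\[
\frac{11}{102} \approx 0.108 = 10.8\%, \qquad \frac{7}{11} \approx 0.636 \approx 64\%
\]
```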

  12. Profiling of integral membrane proteins and their post translational modifications using high-resolution mass spectrometry.

    PubMed

    Souda, Puneet; Ryan, Christopher M; Cramer, William A; Whitelegge, Julian

    2011-12-01

    Integral membrane proteins pose challenges to traditional proteomics approaches due to unique physicochemical properties, including hydrophobic transmembrane domains that limit solubility in aqueous solvents. A well-resolved intact protein molecular mass profile defines a protein's native covalent state including post-translational modifications, and is thus a vital measurement toward full structure determination. Both soluble loop regions and transmembrane regions potentially contain post-translational modifications that must be characterized if the covalent primary structure of a membrane protein is to be defined. This goal has been achieved using electrospray-ionization mass spectrometry (ESI-MS) with low-resolution mass analyzers for intact protein profiling, and high-resolution instruments for top-down experiments, toward complete covalent primary structure information. In top-down, the intact protein profile is supplemented by gas-phase fragmentation of the intact protein, including its transmembrane regions, using collisionally activated and/or electron-capture dissociation (CAD/ECD) to yield sequence-dependent high-resolution MS information. Dedicated liquid chromatography systems with aqueous/organic solvent mixtures were developed, allowing us to demonstrate that polytopic integral membrane proteins are amenable to ESI-MS analysis, including top-down measurements. Covalent post-translational modifications are localized regardless of their position in transmembrane domains. Top-down measurements provide a more detailed high-resolution description of post-transcriptional and post-translational diversity for enhanced understanding beyond genomic translation. Copyright © 2011 Elsevier Inc. All rights reserved.

  13. High levels of p19/nm23 protein in neuroblastoma are associated with advanced stage disease and with N-myc gene amplification.

    PubMed Central

    Hailat, N; Keim, D R; Melhem, R F; Zhu, X X; Eckerskorn, C; Brodeur, G M; Reynolds, C P; Seeger, R C; Lottspeich, F; Strahler, J R

    1991-01-01

    The gene encoding a novel protein designated nm23-H1, which was recently identified as identical to the A subunit of nucleoside diphosphate kinase from human erythrocytes, has been proposed to play a role in tumor metastasis suppression. We report that untreated neuroblastoma tumors contain a cellular polypeptide (Mr = 19,000) designated p19, identified in two-dimensional electrophoretic gels, which occurs at significantly higher levels (P = 0.0001) in primary tumors containing amplified N-myc gene. The partial amino acid sequence obtained for p19 is identical to the sequence of the human nm23-H1 protein. An antibody to the A subunit of erythrocyte nucleoside diphosphate kinase reacted exclusively with p19. In this study, significantly higher levels of p19/nm23 occurred in primary neuroblastoma tumors from patients with advanced stages (III and IV) relative to tumors from patients with limited stages (I and II) of the disease. Even among patients with a single copy of the N-myc gene, tumors from patients with stages III and IV had statistically significantly higher levels of p19/nm23 than tumors from patients with stages I and II. Our findings indicate that, in contrast to a proposed role for nm23-H1 as a tumor metastasis suppressor, increased p19/nm23 protein in neuroblastoma is correlated with features of the disease that are associated with aggressive tumors. Therefore, nm23-H1 may have distinct if not opposite roles in different tumors. PMID:2056128

  14. Reverse genetics with a full-length infectious cDNA of the Middle East respiratory syndrome coronavirus.

    PubMed

    Scobey, Trevor; Yount, Boyd L; Sims, Amy C; Donaldson, Eric F; Agnihothram, Sudhakar S; Menachery, Vineet D; Graham, Rachel L; Swanstrom, Jesica; Bove, Peter F; Kim, Jeeho D; Grego, Sonia; Randell, Scott H; Baric, Ralph S

    2013-10-01

    Severe acute respiratory syndrome with high mortality rates (~50%) is associated with a novel group 2c betacoronavirus designated Middle East respiratory syndrome coronavirus (MERS-CoV). We synthesized a panel of contiguous cDNAs that spanned the entire genome. Following contig assembly into genome-length cDNA, transfected full-length transcripts recovered several recombinant viruses (rMERS-CoV) that contained the expected marker mutations inserted into the component clones. Because the wild-type MERS-CoV contains a tissue culture-adapted T1015N mutation in the S glycoprotein, rMERS-CoV replicated ~0.5 log less efficiently than wild-type virus. In addition, we ablated expression of the accessory protein ORF5 (rMERS•ORF5) and replaced it with tomato red fluorescent protein (rMERS-RFP) or deleted the entire ORF3, 4, and 5 accessory cluster (rMERS-ΔORF3-5). Recombinant rMERS-CoV, rMERS•ORF5, and rMERS-RFP replicated to high titers, whereas rMERS-ΔORF3-5 showed a 1-1.5 log reduction in titer compared with rMERS-CoV. Northern blot analyses confirmed the associated molecular changes in the recombinant viruses, and sequence analysis demonstrated that RFP was expressed from the appropriate consensus sequence AACGAA. We further show dipeptidyl peptidase 4 expression, MERS-CoV replication, and RNA and protein synthesis in human airway epithelial cell cultures, primary lung fibroblasts, primary lung microvascular endothelial cells, and primary alveolar type II pneumocytes, demonstrating a much broader tissue tropism than severe acute respiratory syndrome coronavirus. The availability of a MERS-CoV molecular clone, as well as recombinant viruses expressing indicator proteins, will allow for high-throughput testing of therapeutic compounds and provide a genetic platform for studying gene function and the rational design of live virus vaccines.
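    Confirming that a reporter gene sits behind the consensus sequence AACGAA starts with locating every copy of that element. A minimal exact-match sketch; analyses of coronavirus transcription-regulatory sequences usually also tolerate mismatches and weigh flanking context, and the inputs here are hypothetical fragments, not MERS-CoV sequence.

```python
def find_motif(genome: str, motif: str = "AACGAA") -> list:
    """Return 0-based start positions of every exact (possibly overlapping) match."""
    genome, motif = genome.upper(), motif.upper()
    hits = []
    i = genome.find(motif)
    while i != -1:
        hits.append(i)
        i = genome.find(motif, i + 1)  # step by one so overlapping copies are found
    return hits

# Hypothetical fragments, not MERS-CoV sequence.
print(find_motif("ttAACGAAxxAACGAA"))  # [2, 10]
print(find_motif("AACGAACGAA"))        # [0, 4]
```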

  15. Chemical probes and engineered constructs reveal a detailed unfolding mechanism for a solvent-free multi-domain protein

    PubMed Central

    Eschweiler, Joseph D.; Martini, Rachel M.; Ruotolo, Brandon T.

    2017-01-01

    Despite the growing application of gas-phase measurements in structural biology and drug discovery, the factors that govern protein stabilities and structures in a solvent-free environment are still poorly understood. Here, we examine the solvent-free unfolding pathway for a group of homologous serum albumins. Utilizing a combination of chemical probes and non-covalent reconstructions, we draw new specific conclusions regarding the unfolding of albumins in the gas phase, as well as more general inferences regarding the sensitivity of collision-induced unfolding to changes in protein primary and tertiary structure. Our findings suggest that the general unfolding pathway of low charge state albumin ions is largely unaffected by changes in primary structure; however, the stabilities of intermediates along these pathways vary widely as sequences diverge. Additionally, we find that human albumin follows a domain-associated unfolding pathway, and we are able to assign each unfolded form observed in our gas-phase dataset to the disruption of specific domains within the protein. The totality of our data informs the first detailed mechanism for multi-domain protein unfolding in the gas phase, and highlights key similarities and differences from the known solution-phase pathway. PMID:27959526

  16. Light-modulated abundance of an mRNA encoding a calmodulin-regulated, chromatin-associated NTPase in pea

    NASA Technical Reports Server (NTRS)

    Hsieh, H. L.; Tong, C. G.; Thomas, C.; Roux, S. J.

    1996-01-01

    A cDNA encoding a 47 kDa nucleoside triphosphatase (NTPase) that is associated with the chromatin of pea nuclei has been cloned and sequenced. The translated sequence of the cDNA includes several domains predicted by known biochemical properties of the enzyme, including five motifs characteristic of the ATP-binding domain of many proteins, several potential casein kinase II phosphorylation sites, a helix-turn-helix region characteristic of DNA-binding proteins, and a potential calmodulin-binding domain. The deduced primary structure also includes an N-terminal sequence that is a predicted signal peptide and an internal sequence that could serve as a bipartite-type nuclear localization signal. Both in situ immunocytochemistry of pea plumules and immunoblots of purified cell fractions indicate that most of the immunodetectable NTPase is within the nucleus, a compartment proteins typically reach through nuclear pores rather than through the endoplasmic reticulum pathway. The translated sequence has some similarity to that of human lamin C, but not high enough to account for the earlier observation that IgG against human lamin C binds to the NTPase in immunoblots. Northern blot analysis shows that the NTPase mRNA is strongly expressed in etiolated plumules, but only poorly or not at all in the leaf and stem tissues of light-grown plants. Accumulation of NTPase mRNA in etiolated seedlings is stimulated by brief treatments with both red and far-red light, as is characteristic of very low-fluence phytochrome responses. Southern blotting with pea genomic DNA indicates the NTPase is likely to be encoded by a single gene.

  17. Algorithm, applications and evaluation for protein comparison by Ramanujan Fourier transform.

    PubMed

    Zhao, Jian; Wang, Jiasong; Hua, Wei; Ouyang, Pingkai

    2015-12-01

    The amino acid sequence of a protein determines its chemical properties, chain conformation and biological functions. Protein sequence comparison is of great importance to identify similarities of protein structures and infer their functions. Many properties of a protein correspond to the low-frequency signals within the sequence. Low-frequency modes in protein sequences are linked to the secondary structures, membrane protein types, and sub-cellular localizations of the proteins. In this paper, we present Ramanujan Fourier transform (RFT) with a fast algorithm to analyze the low-frequency signals of protein sequences. The RFT method is applied to similarity analysis of protein sequences with the Resonant Recognition Model (RRM). The results show that the proposed fast RFT method is more efficient for protein comparison than the commonly used discrete Fourier transform (DFT). RFT can detect common frequencies as significant features for specific protein families, and the RFT spectrum heat-map of protein sequences demonstrates the information conservation in the sequence comparison. The proposed method offers a new tool for pattern recognition, feature extraction and structural analysis on protein sequences. Copyright © 2015 Elsevier Ltd. All rights reserved.
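    The RFT projects a numeric signal onto Ramanujan sums c_q(n) rather than complex exponentials. The sketch below is a naive, finite-length version, assuming a coefficient of the form X_q = (1 / (phi(q) N)) * sum_n x(n) c_q(n); it is not the authors' fast algorithm, and the residue-to-number encoding (for example, the per-residue values used in the RRM) is left to the caller.

```python
from math import cos, gcd, pi


def ramanujan_sum(q: int, n: int) -> int:
    """c_q(n): sum of cos(2*pi*k*n/q) over 1 <= k <= q with gcd(k, q) == 1."""
    total = sum(cos(2 * pi * k * n / q) for k in range(1, q + 1) if gcd(k, q) == 1)
    return round(total)  # Ramanujan sums are integers; rounding removes float noise


def euler_phi(q: int) -> int:
    """Euler's totient: how many 1 <= k <= q are coprime to q."""
    return sum(1 for k in range(1, q + 1) if gcd(k, q) == 1)


def rft(signal: list, q_max: int) -> list:
    """Naive finite-length RFT coefficients X_1..X_qmax of a numeric signal x(1..N)."""
    n_len = len(signal)
    coeffs = []
    for q in range(1, q_max + 1):
        acc = sum(x * ramanujan_sum(q, n) for n, x in enumerate(signal, start=1))
        coeffs.append(acc / (euler_phi(q) * n_len))
    return coeffs
```

    A protein would first be mapped to a list of floats (one value per residue) and passed to `rft`; comparing the resulting coefficient vectors is then a stand-in for the spectrum comparison described above.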

  18. A Stochastic Evolutionary Model for Protein Structure Alignment and Phylogeny

    PubMed Central

    Challis, Christopher J.; Schmidler, Scott C.

    2012-01-01

    We present a stochastic process model for the joint evolution of protein primary and tertiary structure, suitable for use in alignment and estimation of phylogeny. Indels arise from a classic Links model, and mutations follow a standard substitution matrix, whereas backbone atoms diffuse in three-dimensional space according to an Ornstein–Uhlenbeck process. The model allows for simultaneous estimation of evolutionary distances, indel rates, structural drift rates, and alignments, while fully accounting for uncertainty. The inclusion of structural information enables phylogenetic inference on time scales not previously attainable with sequence evolution models. The model also provides a tool for testing evolutionary hypotheses and improving our understanding of protein structural evolution. PMID:22723302
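    The structural part of the model lets backbone atoms diffuse under an Ornstein–Uhlenbeck process. A minimal simulation sketch follows, assuming each atom coordinate reverts independently toward a fixed mean position with rate `theta` and noise scale `sigma`; the parameter names are hypothetical, and the update uses the exact OU transition density rather than anything taken from the paper.

```python
import math
import random


def ou_step(coord, mean, theta, sigma, dt, rng):
    """One exact Ornstein-Uhlenbeck update of a 3D coordinate toward its mean."""
    decay = math.exp(-theta * dt)
    # Standard deviation of the exact OU transition density over a step of length dt.
    sd = sigma * math.sqrt((1.0 - decay**2) / (2.0 * theta))
    return [m + (x - m) * decay + sd * rng.gauss(0.0, 1.0) for x, m in zip(coord, mean)]


def diffuse_backbone(coords, means, theta, sigma, dt, n_steps, seed=0):
    """Let each backbone atom drift independently under an OU process; return final coordinates."""
    rng = random.Random(seed)
    current = [list(c) for c in coords]
    for _ in range(n_steps):
        current = [ou_step(c, m, theta, sigma, dt, rng) for c, m in zip(current, means)]
    return current
```

    With `sigma = 0` the process is deterministic and each coordinate decays toward its mean as exp(-theta * t), which is a convenient sanity check; with `sigma > 0`, the stationary spread of each coordinate is sigma / sqrt(2 * theta).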

  19. IDM-PhyChm-Ens: intelligent decision-making ensemble methodology for classification of human breast cancer using physicochemical properties of amino acids.

    PubMed

    Ali, Safdar; Majid, Abdul; Khan, Asifullah

    2014-04-01

    Development of an accurate and reliable intelligent decision-making method for the construction of a cancer diagnosis system is one of the fast-growing research areas of health sciences. Such a decision-making system can provide adequate information for cancer diagnosis and drug discovery. Descriptors derived from physicochemical properties of protein sequences are very useful for classifying cancerous proteins. Recently, several interesting research studies have been reported on breast cancer classification. To this end, we propose the exploitation of the physicochemical properties of amino acids in protein primary sequences, such as hydrophobicity (Hd) and hydrophilicity (Hb), for breast cancer classification. Hd and Hb properties of amino acids, in recent literature, are reported to be quite effective in characterizing the constituent amino acids and are used to study protein folding, interactions, structures, and sequence-order effects. In particular, using these physicochemical properties, we observed that proline, serine, tyrosine, cysteine, arginine, and asparagine offer high discrimination between cancerous and healthy proteins. In addition, unlike traditional ensemble classification approaches, the proposed 'IDM-PhyChm-Ens' method was developed by combining the decision spaces of a specific classifier trained on different feature spaces. The different feature spaces used were amino acid composition, split amino acid composition, and pseudo amino acid composition. Consequently, we have exploited different feature spaces using Hd and Hb properties of amino acids to develop an accurate method for classification of cancerous protein sequences. We developed ensemble classifiers using diverse learning algorithms such as random forest (RF), support vector machines (SVM), and K-nearest neighbor (KNN) trained on different feature spaces. We observed that ensemble-RF, in the case of cancer classification, performed better than ensemble-SVM and ensemble-KNN. Our analysis demonstrates that ensemble-RF, ensemble-SVM and ensemble-KNN are more effective than their individual counterparts. The proposed 'IDM-PhyChm-Ens' method has shown improved performance compared to existing techniques.
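    Two of the feature spaces named above, amino acid composition (AAC) and split amino acid composition (SAAC), can be sketched as below. This is a minimal illustration, assuming SAAC means the AAC of an N-terminal segment, the middle, and a C-terminal segment of length `k`; the default `k = 25` is a hypothetical choice, and the Hd/Hb weighting used in the paper is not reproduced.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(seq: str) -> list:
    """Amino acid composition: fraction of each of the 20 standard residues."""
    seq = seq.upper()
    n = len(seq)
    return [seq.count(a) / n for a in AMINO_ACIDS]


def split_aac(seq: str, k: int = 25) -> list:
    """Split AAC: AAC of the N-terminal k residues, the middle, and the C-terminal k residues."""
    if len(seq) <= 2 * k:
        raise ValueError("sequence too short for the chosen segment length")
    # 60 values: 20 per segment, in the order N-terminal, middle, C-terminal.
    return aac(seq[:k]) + aac(seq[k:-k]) + aac(seq[-k:])
```

    Feature vectors like these would then be fed to the individual classifiers whose decisions the ensemble combines.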

  20. Hidden Markov models-based system (HMMSPECTR) for detecting structural homologies on the basis of sequential information.

    PubMed

    Tsigelny, Igor; Sharikov, Yuriy; Ten Eyck, Lynn F

    2002-05-01

    HMMSPECTR is a tool for finding putative structural homologs for proteins with known primary sequences. HMMSPECTR contains four major components: a data warehouse with the hidden Markov models (HMM) and alignment libraries; a search program which compares the initial protein sequences with the libraries of HMMs; a secondary structure prediction and comparison program; and a dominant protein selection program that prepares the set of 10-15 "best" proteins from the chosen HMMs. The data warehouse contains four libraries of HMMs. The first two libraries were constructed using different HMM preparation options of the HMMER program. The third library contains parts ("partial HMM") of initial alignments. The fourth library contains trained HMMs. We tested our program against all of the protein targets proposed in the CASP4 competition. The data warehouse included libraries of structural alignments and HMMs constructed on the basis of proteins publicly available in the Protein Data Bank before the CASP4 meeting. The newest fully automated versions of HMMSPECTR 1.02 and 1.02ss produced better results than the best result reported at CASP4 either by r.m.s.d. or by length (or both) in 64% (HMMSPECTR 1.02) and 79% (HMMSPECTR 1.02ss) of the cases. The improvement is most notable for the targets with complexity 4 (difficult fold recognition cases).

  1. Primary and secondary structural analyses of glutathione S-transferase pi from human placenta.

    PubMed

    Ahmad, H; Wilson, D E; Fritz, R R; Singh, S V; Medh, R D; Nagle, G T; Awasthi, Y C; Kurosky, A

    1990-05-01

    The primary structure of glutathione S-transferase (GST) pi from a single human placenta was determined. The structure was established by chemical characterization of tryptic and cyanogen bromide peptides as well as automated sequence analysis of the intact enzyme. The structural analysis indicated that the protein is composed of 209 amino acid residues and gave no evidence of post-translational modifications. The amino acid sequence differed from that of the deduced amino acid sequence determined by nucleotide sequence analysis of a cDNA clone (Kano, T., Sakai, M., and Muramatsu, M., 1987, Cancer Res. 47, 5626-5630) at position 104, which contained both valine and isoleucine, whereas the deduced sequence from nucleotide sequence analysis identified only isoleucine at this position. These results demonstrated that in the one individual placenta studied at least two GST pi genes are coexpressed, probably as a result of allelomorphism. Computer assisted consensus sequence evaluation identified a hydrophobic region in GST pi (residues 155-181) that was predicted to be either a buried transmembrane helical region or a signal sequence region. The significance of this hydrophobic region was interpreted in relation to the mode of action of the enzyme especially in regard to the potential involvement of a histidine in the active site mechanism. A comparison of the chemical similarity of five known human GST complete enzyme structures, one of pi, one of mu, two of alpha, and one microsomal, gave evidence that all five enzymes have evolved by a divergent evolutionary process after gene duplication, with the microsomal enzyme representing the most divergent form.

  2. Folding and Stabilization of Native-Sequence-Reversed Proteins

    PubMed Central

    Zhang, Yuanzhao; Weber, Jeffrey K; Zhou, Ruhong

    2016-01-01

    Though the problem of sequence-reversed protein folding is largely unexplored, one might speculate that reversed native protein sequences should be significantly more foldable than purely random heteropolymer sequences. In this article, we investigate how the reverse-sequences of native proteins might fold by examining a series of small proteins of increasing structural complexity (α-helix, β-hairpin, α-helix bundle, and α/β-protein). Employing a tandem approach that combines protein structure prediction algorithms with molecular dynamics simulation, we find that the ability of reverse sequences to adopt native-like folds is strongly influenced by protein size and the flexibility of the native hydrophobic core. For β-hairpins with reverse-sequences that fail to fold, we employ a simple mutational strategy for guiding stable hairpin formation that involves the insertion of amino acids into the β-turn region. This systematic look at reverse sequence duality sheds new light on the problem of protein sequence-structure mapping and may serve to inspire new protein design and protein structure prediction protocols. PMID:27113844

  3. Folding and Stabilization of Native-Sequence-Reversed Proteins

    NASA Astrophysics Data System (ADS)

    Zhang, Yuanzhao; Weber, Jeffrey K.; Zhou, Ruhong

    2016-04-01

    Though the problem of sequence-reversed protein folding is largely unexplored, one might speculate that reversed native protein sequences should be significantly more foldable than purely random heteropolymer sequences. In this article, we investigate how the reverse-sequences of native proteins might fold by examining a series of small proteins of increasing structural complexity (α-helix, β-hairpin, α-helix bundle, and α/β-protein). Employing a tandem approach that combines protein structure prediction algorithms with molecular dynamics simulation, we find that the ability of reverse sequences to adopt native-like folds is strongly influenced by protein size and the flexibility of the native hydrophobic core. For β-hairpins with reverse-sequences that fail to fold, we employ a simple mutational strategy for guiding stable hairpin formation that involves the insertion of amino acids into the β-turn region. This systematic look at reverse sequence duality sheds new light on the problem of protein sequence-structure mapping and may serve to inspire new protein design and protein structure prediction protocols.

  4. Complete Genome Sequence of the Methanococcus maripaludis Type Strain JJ (DSM 2067), a Model for Selenoprotein Synthesis in Archaea.

    PubMed

    Poehlein, Anja; Heym, Daniel; Quitzke, Vivien; Fersch, Julia; Daniel, Rolf; Rother, Michael

    2018-04-05

    Methanococcus maripaludis type strain JJ (DSM 2067) is an important organism because it serves as a model for primary energy metabolism and hydrogenotrophic methanogenesis and is amenable to genetic manipulation. The complete genome (1.7 Mb) harbors 1,815 predicted protein-encoding genes, including 9 encoding selenoproteins. Copyright © 2018 Poehlein et al.

  5. HFE gene mutations in patients with primary iron overload: is there a significant improvement in molecular diagnosis yield with HFE sequencing?

    PubMed

    Santos, Paulo C J L; Pereira, Alexandre C; Cançado, Rodolfo D; Schettert, Isolmar T; Sobreira, Tiago J P; Oliveira, Paulo S L; Hirata, Rosario D C; Hirata, Mario H; Figueiredo, Maria Stella; Chiattone, Carlos S; Krieger, Jose E; Guerra-Shinohara, Elvira M

    2010-12-15

    Rare HFE variants have been shown to be associated with hereditary hemochromatosis (HH), an iron overload disease. The low frequency of the HFE p.C282Y mutation in HH-affected Brazilian patients may suggest that other HFE-related mutations may also be implicated in the pathogenesis of HH in this population. The main aim was to screen for new HFE mutations in Brazilian individuals with primary iron overload and to investigate their relationship with HH. Fifty Brazilian patients with primary iron overload (transferrin saturation >50% in females and >60% in males) were selected. Subsequent bidirectional sequencing for each HFE exon was performed. The effect of HFE mutations on protein structure was analyzed by molecular dynamics simulation and binding free energy calculations. p.C282Y in homozygosis or in heterozygosis with p.H63D were the most frequent genotypic combinations associated with HH in our sample population (present in 17 individuals, 34%). Thirty-six (72.0%) out of the 50 individuals presented at least one HFE mutation. The most frequent genotype associated with HH was the homozygous p.C282Y mutation (n=11, 22.0%). One novel mutation (p.V256I) was identified in heterozygosis with the p.H63D mutation. In silico modeling analysis of protein behavior indicated that the p.V256I mutation does not reduce the binding affinity between HFE and β2-microglobulin (β2M) in the same way the p.C282Y mutation does compared with the native HFE protein. In conclusion, screening of HFE through direct sequencing, as compared to p.C282Y/p.H63D genotyping, was not able to increase the molecular diagnosis yield of HH. The novel p.V256I mutation could not be implicated in the molecular basis of the HH phenotype, although its role cannot be completely excluded in HH-phenotype development. Our molecular modeling analysis can help in the analysis of novel, previously undescribed, HFE mutations. Copyright © 2010 Elsevier Inc. All rights reserved.

  6. Profiling of integral membrane proteins and their post translational modifications using high-resolution mass spectrometry

    PubMed Central

    Souda, Puneet; Ryan, Christopher M.; Cramer, William A.; Whitelegge, Julian

    2011-01-01

    Integral membrane proteins pose challenges to traditional proteomics approaches due to unique physicochemical properties including hydrophobic transmembrane domains that limit solubility in aqueous solvents. A well resolved intact protein molecular mass profile defines a protein’s native covalent state including post-translational modifications, and is thus a vital measurement toward full structure determination. Both soluble loop regions and transmembrane regions potentially contain post-translational modifications that must be characterized if the covalent primary structure of a membrane protein is to be defined. This goal has been achieved using electrospray-ionization mass spectrometry (ESI-MS) with low-resolution mass analyzers for intact protein profiling, and high-resolution instruments for top-down experiments, toward complete covalent primary structure information. In top-down, the intact protein profile is supplemented by gas-phase fragmentation of the intact protein, including its transmembrane regions, using collisionally activated and/or electron capture dissociation (CAD/ECD) to yield sequence-dependent high-resolution MS information. Dedicated liquid chromatography systems with aqueous/organic solvent mixtures were developed allowing us to demonstrate that polytopic integral membrane proteins are amenable to ESI-MS analysis, including top-down measurements. Covalent post-translational modifications are localized regardless of their position in transmembrane domains. Top-down measurements provide a more detailed, high-resolution description of post-transcriptional and post-translational diversity for enhanced understanding beyond genomic translation. PMID:21982782
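    Comparing a measured intact mass with the mass predicted from the sequence is how the covalent state, including post-translational modifications, is inferred. A minimal sketch of the predicted side, assuming a linear chain with free termini and standard monoisotopic residue masses; initiator-Met removal, disulfides, and other processing are not modeled, and low-resolution intact profiles are usually compared against average rather than monoisotopic masses, which need a different table.

```python
# Monoisotopic residue masses in daltons (free amino acid minus H2O).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # added once for the free N- and C-termini
PHOSPHORYLATION = 79.966331  # HPO3
ACETYLATION = 42.010565  # C2H2O
OXIDATION = 15.994915  # O


def intact_mass(sequence, modification_deltas=()):
    """Neutral monoisotopic mass of a linear polypeptide plus any covalent modification deltas."""
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER + sum(modification_deltas)
```

    A residual between the measured and predicted mass that matches one of the delta constants (or a sum of them) points to a candidate modification, which top-down fragmentation can then localize.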

  7. Characterization of the major proteins of tubers of yam bean (Pachyrhizus ahipa).

    PubMed

    Forsyth, Jane L; Shewry, Peter R

    2002-03-27

    Tubers of six accessions of ahipa (Pachyrhizus ahipa) contained between 0.77 and 1.34% nitrogen on a dry weight basis. This corresponds to 4.8 to 8.4% crude protein based on a nitrogen to protein conversion factor of 6.25; however, detailed analysis of AC230 showed that although 93% of the total N was extracted with buffer containing 1.0 M NaCl, about a third of this was lost on dialysis. It was calculated, therefore, that salt-soluble proteins comprise about 60% of the total tuber nitrogen, with low-molecular-mass nitrogenous components comprising a further 30%. Electrophoretic analysis of the salt-soluble proteins showed similar patterns of components in the six accessions, with none being present in amounts sufficiently high to suggest a role as storage proteins. Furthermore, light microscopy failed to show significant deposits of protein within the tuber cells. Five "major" protein bands, which together accounted for about 19% of the total salt-soluble protein fraction, were purified and subjected to N-terminal amino acid sequencing. Comparison of these with sequences in protein databases revealed similarities to alpha-amylases, chitinases and chitin binding proteins, cysteine proteinases (including major components from P. erosus tubers), a tuberization-specific protein from potato, and proteins induced in soybean and pea by stress or the plant hormone abscisic acid, respectively. It was concluded that the primary roles of these proteins are probably in aspects of tuber metabolism and development and/or conferring protection against pests and pathogens, and that true storage proteins are not present. The absence of storage proteins is consistent with the biological role of the tubers as storage organs for carbohydrates (cf. cassava tuberous roots) rather than as propagules (cf. yam and potato tubers).

  8. A fully decompressed synthetic bacteriophage øX174 genome assembled and archived in yeast.

    PubMed

    Jaschke, Paul R; Lieberman, Erica K; Rodriguez, Jon; Sierra, Adrian; Endy, Drew

    2012-12-20

    The 5386 nucleotide bacteriophage øX174 genome has a complicated architecture that encodes 11 gene products via overlapping protein coding sequences spanning multiple reading frames. We designed a 6302 nucleotide synthetic surrogate, øX174.1, that fully separates all primary phage protein coding sequences along with cognate translation control elements. To specify øX174.1f, a decompressed genome the same length as wild type, we truncated the gene F coding sequence. We synthesized DNA encoding fragments of øX174.1f and used a combination of in vitro- and yeast-based assembly to produce yeast vectors encoding natural or designer bacteriophage genomes. We isolated clonal preparations of yeast plasmid DNA and transfected E. coli C strains. We recovered viable øX174 particles containing the øX174.1f genome from E. coli C strains that independently express full-length gene F. We expect that yeast can serve as a genomic 'drydock' within which to maintain and manipulate clonal lineages of other obligate lytic phage. Copyright © 2012 Elsevier Inc. All rights reserved.

  9. Biosynthesis of small proteoglycan II (decorin) by chondrocytes and evidence for a procore protein.

    PubMed

    Sawhney, R S; Hering, T M; Sandell, L J

    1991-05-15

    We have studied the biosynthesis of cartilage dermatan sulfate proteoglycan II (DS-PGII) (decorin) using in vitro translation of mRNA to determine the size of the primary gene product and by radiolabeling the protein in the presence of tunicamycin to inhibit the addition of Asn-linked oligosaccharides. Pulse-chase experiments were performed to examine post-translational processing and secretion. Inhibitors of oligosaccharide processing were used to determine whether DS-PGII molecules containing partially processed oligosaccharides could become proteoglycans and be secreted. Cell-free translation of sucrose gradient-fractionated RNA and subsequent immunoprecipitation of the core protein confirmed that the functional translated mRNA is in the size range of the two mRNA species observed by hybridization of chondrocyte RNA with a bone PGII cloned probe and that the translation product is a single protein with an apparent molecular mass of 42 kDa. Digestion of the intact proteoglycan (average molecular mass = 103 kDa) with chondroitinase ABC or AC results in an approximately 48-49-kDa product. Chondrocytes treated with tunicamycin to inhibit Asn-linked oligosaccharide addition synthesize and secrete a glycosaminoglycan (GAG)-substituted proteoglycan (average molecular mass = 86 kDa), yielding a 42-kDa core protein after chondroitinase ABC digestion, showing that Asn-linked oligosaccharides are not required for the addition of GAG chains or secretion. Following a short pulse (10 min) of [3H]leucine, three glycosylated forms of the DS-PGII core protein were observed, one of which is likely to be the precursor form of PGII predicted by the implied protein sequence of both bovine and human cDNA clones. Following the apparent cleavage of the propeptide, GAG-substituted intracellular core protein is detectable. Susceptibility to endoglycosidase H indicates that approximately one-third of the secreted core protein contains exclusively complex-type Asn-linked oligosaccharides and approximately two-thirds contain high mannose as well as complex-type oligosaccharides. Secreted DS-PGII appears to be fully substituted with three Asn-linked oligosaccharide chains. Inhibitors of oligosaccharide processing, however, permitted secretion of GAG-substituted DS-PGII that was fully (three chains) or incompletely (one or two chains) substituted with partially processed Asn-linked carbohydrate chains. By comparison of chondrocyte DS-PGII with fibroblast DS-PGII, we conclude that the addition and processing of Asn-linked carbohydrate chains are directed by the amino acid sequence of the core protein. The results reported here also suggest that the addition of xylose, the initial step in GAG chain synthesis, occurs early in biosynthesis and is determined by the primary amino acid sequence of the core protein. (ABSTRACT TRUNCATED AT 400 WORDS)

  10. Database-independent Protein Sequencing (DiPS) Enables Full-length de Novo Protein and Antibody Sequence Determination.

    PubMed

    Savidor, Alon; Barzilay, Rotem; Elinger, Dalia; Yarden, Yosef; Lindzen, Moshit; Gabashvili, Alexandra; Adiv Tal, Ophir; Levin, Yishai

    2017-06-01

    Traditional "bottom-up" proteomic approaches use proteolytic digestion, LC-MS/MS, and database searching to elucidate peptide identities and their parent proteins. Protein sequences absent from the database cannot be identified, and even if present in the database, complete sequence coverage is rarely achieved even for the most abundant proteins in the sample. Thus, sequencing of unknown proteins such as antibodies or constituents of metaproteomes remains a challenging problem. To date, there is no available method for full-length protein sequencing, independent of a reference database, in high throughput. Here, we present Database-independent Protein Sequencing, a method for unambiguous, rapid, database-independent, full-length protein sequencing. The method is a novel combination of non-enzymatic, semi-random cleavage of the protein, LC-MS/MS analysis, peptide de novo sequencing, extraction of peptide tags, and their assembly into a consensus sequence using an algorithm named "Peptide Tag Assembler." As proof-of-concept, the method was applied to samples of three known proteins representing three size classes and to a previously un-sequenced, clinically relevant monoclonal antibody. Excluding leucine/isoleucine and glutamic acid/deamidated glutamine ambiguities, end-to-end full-length de novo sequencing was achieved with 99-100% accuracy for all benchmarking proteins and the antibody light chain. Accuracy of the sequenced antibody heavy chain, including the entire variable region, was also 100%, but there was a 23-residue gap in the constant region sequence. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.
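    The final DiPS step assembles de novo peptide tags into a consensus sequence. The Peptide Tag Assembler itself is not described here, so the sketch below substitutes a plain greedy overlap assembly, a swapped-in technique used only to illustrate the idea: isoleucine is collapsed to leucine (the isobaric ambiguity noted above), contained tags are dropped, and the pair with the longest suffix-prefix overlap is merged repeatedly. The `min_overlap` threshold is a hypothetical parameter.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def assemble_tags(tags, min_overlap=3):
    """Greedy suffix-prefix assembly of peptide tags; returns one or more contigs."""
    # Treat I and L as indistinguishable and remove exact duplicates.
    seqs = list(dict.fromkeys(t.upper().replace("I", "L") for t in tags))
    # Drop tags fully contained in another tag.
    seqs = [s for i, s in enumerate(seqs) if not any(i != j and s in t for j, t in enumerate(seqs))]
    while len(seqs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining pair overlaps enough; return separate contigs
        merged = seqs[best_i] + seqs[best_j][best_k:]
        seqs = [s for n, s in enumerate(seqs) if n not in (best_i, best_j)] + [merged]
    return seqs
```

    Returning several contigs rather than forcing a single string mirrors the gap reported above for the antibody heavy-chain constant region: where tags do not overlap, a greedy assembler cannot bridge the break.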

  11. Optimal subset selection of primary sequence features using the genetic algorithm for thermophilic proteins identification.

    PubMed

    Wang, LiQiang; Li, CuiFeng

    2014-10-01

    A genetic algorithm (GA) coupled with multiple linear regression (MLR) was used to extract useful features from amino acids and g-gap dipeptides for distinguishing between thermophilic and non-thermophilic proteins. The method was trained on a benchmark dataset of 915 thermophilic and 793 non-thermophilic proteins. The method reached an overall accuracy of 95.4% in a Jackknife test using nine amino acids, 38 0-gap dipeptides and 29 1-gap dipeptides. The accuracy as a function of protein size ranged between 85.8 and 96.9%. The overall accuracies of three independent tests were 93, 93.4 and 91.8%. The observed results of detecting thermophilic proteins suggest that the GA-MLR approach described herein should be a powerful method for selecting features that describe thermostable machines and be an aid in the design of more stable proteins.
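    A g-gap dipeptide counts residue pairs separated by g intervening residues, so 0-gap dipeptides are ordinary adjacent pairs and 1-gap dipeptides skip one residue. A minimal sketch of the feature computation follows, assuming frequencies are normalized by the number of pairs in the sequence; the GA-MLR feature selection itself is not shown.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def g_gap_dipeptides(seq: str, g: int) -> dict:
    """Frequencies of residue pairs (seq[i], seq[i + g + 1]); g = 0 gives adjacent dipeptides."""
    seq = seq.upper()
    n_pairs = len(seq) - g - 1
    if n_pairs <= 0:
        raise ValueError("sequence too short for this gap")
    counts = Counter(seq[i] + seq[i + g + 1] for i in range(n_pairs))
    # Report all 400 possible pairs so feature vectors line up across proteins.
    return {a + b: counts[a + b] / n_pairs for a, b in product(AMINO_ACIDS, repeat=2)}
```

    Selecting "38 0-gap dipeptides and 29 1-gap dipeptides", as in the abstract, then amounts to keeping a chosen subset of the keys from `g = 0` and `g = 1` dictionaries.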

  12. Rational identification of aggregation hotspots based on secondary structure and amino acid hydrophobicity.

    PubMed

    Matsui, Daisuke; Nakano, Shogo; Dadashipour, Mohammad; Asano, Yasuhisa

    2017-08-25

    Insolubility of proteins expressed in the Escherichia coli expression system hinders the progress of both basic and applied research. Insoluble proteins contain residues that decrease their solubility (aggregation hotspots). Mutating these hotspots to optimal amino acids is expected to improve protein solubility. To date, however, the identification of these hotspots has proven difficult. In this study, using a combination of approaches involving directed evolution and primary sequence analysis, we found two rules to help inductively identify hotspots: the α-helix rule, which focuses on the hydrophobicity of amino acids in the α-helix structure, and the hydropathy contradiction rule, which focuses on the difference in hydrophobicity relative to the corresponding amino acid in the consensus protein. By properly applying these two rules, we succeeded in improving the probability that expressed proteins would be soluble. Our methods should facilitate research on various insoluble proteins that were previously difficult to study due to their low solubility.
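    The hydropathy contradiction rule compares each residue's hydrophobicity with that of the corresponding residue in a consensus sequence. The sketch below is an illustrative reading of that idea, not the authors' exact criterion: it assumes the Kyte–Doolittle scale, a target already aligned to the consensus (gaps as "-"), and a hypothetical threshold of 3.0 on how much more hydrophobic the target residue may be before the position is flagged as a candidate hotspot.

```python
# Kyte-Doolittle hydropathy index.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_contradictions(target: str, consensus: str, threshold: float = 3.0) -> list:
    """0-based aligned positions where the target is much more hydrophobic than the consensus."""
    if len(target) != len(consensus):
        raise ValueError("target and consensus must be aligned to the same length")
    flagged = []
    for i, (t, c) in enumerate(zip(target.upper(), consensus.upper())):
        if t == "-" or c == "-":
            continue  # skip alignment gaps
        if KYTE_DOOLITTLE[t] - KYTE_DOOLITTLE[c] > threshold:
            flagged.append(i)
    return flagged
```

    Flagged positions would be candidates for mutation back toward the consensus residue; the α-helix rule described above would need secondary structure assignments in addition.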

  13. Replica exchange molecular dynamics simulation of structure variation from α/4β-fold to 3α-fold protein.

    PubMed

    Lazim, Raudah; Mei, Ye; Zhang, Dawei

    2012-03-01

    Replica exchange molecular dynamics (REMD) simulation provides an efficient conformational sampling tool for the study of protein folding. In this study, we explore the mechanism directing the structure variation from α/4β-fold protein to 3α-fold protein after mutation by conducting REMD simulation on 42 replicas with temperatures ranging from 270 K to 710 K. The simulation began from a protein possessing the primary structure of GA88 but the tertiary structure of GB88, two G proteins with "high sequence identity." Despite the large Cα root-mean-square deviation (RMSD) of the folded protein (4.34 Å at 270 K and 4.75 Å at 304 K), a variation in tertiary structure was observed. Together with secondary structure assignment, cluster analysis, and principal component analysis, the simulation provides insights into the folding pathway of the 3α-fold protein and the unfolding pathway of the α/4β-fold protein, respectively, paving the way toward understanding the events that occur during conformational variation.
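    Two ingredients of an REMD run are the temperature ladder and the swap criterion. The sketch below assumes a geometric ladder (a common choice; the abstract does not say how the 42 temperatures were spaced) and the standard Metropolis acceptance min(1, exp[(β_i − β_j)(E_i − E_j)]) for exchanging configurations between replicas i and j, with energies in kcal/mol.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def geometric_ladder(t_min: float, t_max: float, n: int) -> list:
    """n temperatures from t_min to t_max with a constant ratio between neighbors."""
    ratio = (t_max / t_min) ** (1.0 / (n - 1))
    return [t_min * ratio**i for i in range(n)]


def exchange_probability(e_i: float, t_i: float, e_j: float, t_j: float) -> float:
    """Metropolis probability of swapping configurations held at temperatures t_i and t_j."""
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)
```

    For example, `geometric_ladder(270.0, 710.0, 42)` yields a 42-replica ladder spanning the range reported above; a swap that moves the lower-energy configuration to the colder replica is always accepted.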

  14. Prediction of rat protein subcellular localization with pseudo amino acid composition based on multiple sequential features.

    PubMed

    Shi, Ruijia; Xu, Cunshuan

    2011-06-01

    The study of rat proteins is an indispensable task in experimental medicine and drug development. The function of a rat protein is closely related to its subcellular location. Based on this concept, we construct a benchmark rat protein dataset and develop a combined approach for predicting the subcellular localization of rat proteins. From the protein primary sequence, multiple sequential features are obtained using discrete Fourier analysis, a position conservation scoring function and increment of diversity, and these sequential features are selected as input parameters of the support vector machine. By the jackknife test, the overall success rate of prediction is 95.6% on the rat protein dataset. When our method was applied to the apoptosis protein dataset and the Gram-negative bacterial protein dataset, the overall jackknife success rates were 89.9% and 96.4%, respectively. These results indicate that our proposed method is quite promising and may play a complementary role to the existing predictors in this area.

  15. Application of Tandem Two-Dimensional Mass Spectrometry for Top-Down Deep Sequencing of Calmodulin.

    PubMed

    Floris, Federico; Chiron, Lionel; Lynch, Alice M; Barrow, Mark P; Delsuc, Marc-André; O'Connor, Peter B

    2018-06-04

    Two-dimensional mass spectrometry (2DMS) involves simultaneous acquisition of the fragmentation patterns of all the analytes in a mixture by correlating their precursor and fragment ions by modulating precursor ions systematically through a fragmentation zone. Tandem two-dimensional mass spectrometry (MS/2DMS) unites the ultra-high accuracy of Fourier transform ion cyclotron resonance (FT-ICR) MS/MS and the simultaneous data-independent fragmentation of 2DMS to achieve extensive inter-residue fragmentation of entire proteins. 2DMS was recently developed for top-down proteomics (TDP), and applied to the analysis of calmodulin (CaM), reporting a cleavage coverage of ~23% using infrared multiphoton dissociation (IRMPD) as the fragmentation technique. The goal of this work is to expand the utility of top-down protein analysis using MS/2DMS in order to extend the cleavage coverage in top-down proteomics further into the interior regions of the protein. In this case, using MS/2DMS, the cleavage coverage of CaM increased from ~23% to ~42%. Graphical Abstract: Two-dimensional mass spectrometry, when applied to primary fragment ions from the source, allows deep-sequencing of the protein calmodulin.
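    Cleavage coverage is the fraction of the N − 1 inter-residue bonds of an N-residue protein that are explained by at least one observed fragment. A minimal sketch, assuming fragments have already been assigned and are given only by their lengths: an N-terminal fragment (such as a b or c ion) of length k marks bond k, and a C-terminal fragment (such as a y or z ion) of length k marks bond N − k.

```python
def cleavage_coverage(n_residues, n_term_lengths, c_term_lengths):
    """Fraction of the n_residues - 1 inter-residue bonds cleaved by observed fragments.

    Bond k lies between residues k and k + 1 (1-based).
    """
    bonds = {k for k in n_term_lengths if 1 <= k < n_residues}
    bonds |= {n_residues - k for k in c_term_lengths if 1 <= k < n_residues}
    return len(bonds) / (n_residues - 1)
```

    Complementary N- and C-terminal fragments that mark the same bond count once, so the metric rewards new cleavage sites rather than redundant ions.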

  16. Self-assembled bionanostructures: proteins following the lead of DNA nanostructures

    PubMed Central

    2014-01-01

    Natural polymers are able to self-assemble into versatile nanostructures based on the information encoded into their primary structure. The structural richness of biopolymer-based nanostructures depends on the information content of building blocks and the available biological machinery to assemble and decode polymers with a defined sequence. Natural polypeptides comprise 20 amino acids with very different properties, in comparison to only 4 structurally similar nucleotides, the building elements of nucleic acids. Nevertheless, the ease of synthesizing polynucleotides with a selected sequence and the ability to encode the nanostructural assembly based on the two specific nucleotide pairs underlie the development of techniques to self-assemble almost any selected three-dimensional nanostructure from polynucleotides. Despite more complex design rules, peptides were successfully used to assemble symmetric nanostructures, such as fibrils and spheres. While earlier designed protein-based nanostructures used linked natural oligomerizing domains, recent design of new oligomerizing interaction surfaces and introduction of the platform for topologically designed protein folds may enable polypeptide-based design to follow the track of DNA nanostructures. The advantages of protein-based nanostructures, such as functional versatility and cost-effective and sustainable production methods, provide strong incentive for further development in this direction. PMID:24491139

  17. Sequence Complexity of Amyloidogenic Regions in Intrinsically Disordered Human Proteins

    PubMed Central

    Das, Swagata; Pal, Uttam; Das, Supriya; Bagga, Khyati; Roy, Anupam; Mrigwani, Arpita; Maiti, Nakul C.

    2014-01-01

    An amyloidogenic region (AR) in a protein sequence plays a significant role in protein aggregation and amyloid formation. We have investigated the sequence complexity of ARs present in intrinsically disordered human proteins. More than 80% of the human proteins in the disordered protein databases (DisProt+IDEAL) contained one or more ARs. AR content in the protein sequence decreased as protein disorder decreased. A probability density distribution analysis and a discrete analysis of AR sequences showed that ∼8% of the residues in a protein sequence were in ARs, and these regions were on average 8 residues long. Residues in ARs were high in sequence complexity. ARs seldom overlapped with low-complexity regions (LCRs), which are abundant in disordered proteins. The sequences in ARs showed mixed conformational adaptability towards α-helix, β-sheet/strand and coil conformations. PMID:24594841
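
    Sequence complexity of a short segment is often quantified as the Shannon entropy of its residue composition. This is closely related to the compositional complexity used by low-complexity filters such as SEG. The sketch below computes that entropy, in bits, over a sliding window. The window length of 8 only echoes the average AR length reported above. The study's own complexity measure may differ, and the helper names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy, in bits, of the residue composition of a segment."""
    if not segment:
        raise ValueError("empty segment")
    n = len(segment)
    # Sum p * log2(1/p) over the residue types present, with p = c / n.
    return sum((c / n) * math.log2(n / c) for c in Counter(segment).values())


def window_complexity(sequence: str, window: int = 8) -> list[float]:
    """Entropy of every full-length window, scanned left to right."""
    return [
        shannon_entropy(sequence[i : i + window])
        for i in range(len(sequence) - window + 1)
    ]


# A poly-Q stretch (low complexity) versus eight distinct residues (maximal for 8).
print(shannon_entropy("QQQQQQQQ"))  # 0.0
print(shannon_entropy("ACDEFGHI"))  # 3.0
```

    Low-complexity regions show up as windows with entropy near zero. Windows with values close to log2(window) are compositionally diverse, which matches the high-complexity character reported for ARs.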

  18. How Many Protein Sequences Fold to a Given Structure? A Coevolutionary Analysis.

    PubMed

    Tian, Pengfei; Best, Robert B

    2017-10-17

    Quantifying the relationship between protein sequence and structure is key to understanding the protein universe. A fundamental measure of this relationship is the total number of amino acid sequences that can fold to a target protein structure. This quantity, known as the "sequence capacity," has been suggested as a proxy for how designable a given protein fold is. Although sequence capacity has been studied extensively using lattice models and theory, numerical estimates for real protein structures are currently lacking. In this work, we have quantitatively estimated the sequence capacity of 10 proteins with a variety of different structures. We used a statistical model based on residue-residue co-evolution to capture the variation of sequences from the same protein family. Remarkably, we find that even for the smallest protein folds, such as the WW domain, the number of foldable sequences is extremely large, exceeding the Avogadro constant. In agreement with earlier theoretical work, the calculated sequence capacity is positively correlated with the size of the protein, or, more precisely, with the density of contacts. This allows the absolute sequence capacity of a given protein to be approximately predicted from its structure. On the other hand, the relative sequence capacity, i.e., normalized by the total number of possible sequences, is an extremely tiny number and is strongly anti-correlated with protein length. Thus, although there may be more foldable sequences for larger proteins, it will be much harder to find them. Lastly, we have correlated the evolutionary age of proteins in the CATH database with their sequence capacity as predicted by our model. The results suggest a trade-off between the opposing requirements of high designability and the likelihood of a novel fold emerging by chance. Published by Elsevier Inc.
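
    The contrast between absolute and relative sequence capacity is easiest to see in log space. For a 20-letter amino acid alphabet, the number of possible sequences of length L is 20^L, so the relative capacity is N / 20^L. The sketch works with log10 values to avoid overflow. The capacities passed in are placeholders, not numbers from the study, and the function name is hypothetical.

```python
import math


def log10_relative_capacity(log10_capacity: float, length: int, alphabet: int = 20) -> float:
    """Return log10(N / alphabet**length), given log10(N) for N foldable sequences."""
    return log10_capacity - length * math.log10(alphabet)


# Placeholder input: 10**30 foldable sequences of length 40.
print(round(log10_relative_capacity(30.0, 40), 2))  # -22.04
# The same absolute capacity spread over a longer chain is a far smaller fraction.
print(log10_relative_capacity(30.0, 80) < log10_relative_capacity(30.0, 40))  # True
```

    Each extra residue subtracts log10(20), about 1.3, from the log of the relative capacity. Even a rapidly growing absolute capacity therefore tends to correspond to a shrinking fraction of sequence space, which is the anti-correlation with length described above.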

  19. Prediction of protein structural classes by Chou's pseudo amino acid composition: approached using continuous wavelet transform and principal component analysis.

    PubMed

    Li, Zhan-Chao; Zhou, Xi-Bin; Dai, Zong; Zou, Xiao-Yong

    2009-07-01

    Prior knowledge of a protein's structural class provides useful information about its overall structure, so fast and accurate computational determination of structural class is important in protein science. A key requirement for such computational methods is an accurate representation of the protein sample. Here, building on Chou's pseudo-amino acid composition (AAC; Chou, Proteins: Structure, Function, and Genetics, 43:246-255, 2001), we introduce a new feature-extraction method for predicting protein structural classes that combines the continuous wavelet transform (CWT) with principal component analysis (PCA). First, a digital signal was obtained by mapping each amino acid according to various physicochemical properties. Second, CWT was used to extract a new feature vector based on the wavelet power spectrum (WPS), which carries richer sequence-order information in both the frequency and time domains. PCA was then used to reorganize this feature vector to reduce information redundancy and computational complexity. Finally, a pseudo-amino acid composition feature vector representing the primary sequence was formed by coupling the AAC vector with the set of new WPS features in the orthogonal space obtained by PCA. As a showcase, the rigorous jackknife cross-validation test was performed on the working datasets. The results indicated that prediction quality was improved. The approach to protein representation may also serve as a useful complementary tool for classifying other protein attributes, such as enzyme family class, subcellular localization, membrane protein type and secondary structure.
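
    The first two ingredients described above can be sketched briefly: the 20-component amino acid composition that Chou's pseudo-amino acid composition extends, and the mapping of a sequence to a numeric "digital signal". The abstract does not say which physicochemical scales were used. The Kyte-Doolittle hydropathy scale here is an assumed example, and the helper names are hypothetical. The CWT and PCA stages are not shown.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy: an assumed example of a physicochemical scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def amino_acid_composition(sequence: str) -> list[float]:
    """20-dimensional composition vector: fraction of each standard residue."""
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("no standard residues in sequence")
    counts = Counter(residues)
    return [counts[aa] / len(residues) for aa in AMINO_ACIDS]


def property_signal(sequence: str, scale: dict[str, float] = KYTE_DOOLITTLE) -> list[float]:
    """Map each residue to its scale value, giving a 1-D signal (unknown letters skipped)."""
    return [scale[aa] for aa in sequence.upper() if aa in scale]


aac = amino_acid_composition("MKTAYIAKQR")
print(len(aac))                 # 20
print(property_signal("AIK"))   # [1.8, 4.5, -3.9]
```

    The composition vector discards residue order. The per-residue signal keeps it, and this is what a later transform such as the CWT can mine for sequence-order information.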

  20. Topographical localization of the C-terminal region of the voltage-dependent sodium channel from Electrophorus electricus using antibodies raised against a synthetic peptide

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gordon, R.D.; Fieles, W.E.; Schotland, D.L.

    1987-01-01

    A peptide corresponding to amino acid residues 1783-1794, near the C terminus of the primary sequence of the electric eel (Electrophorus electricus) sodium channel, has been synthesized and used to raise an antiserum in rabbits. This antiserum specifically recognized the peptide in a solid-phase radioimmunoassay. Specificity of the antiserum for the native channel protein was shown by its specific binding to a 280-kDa protein in immunoblots of eel electroplax membrane proteins. The antiserum also specifically labeled the innervated membrane of the eel electroplax in immunofluorescence studies. The membrane topology of the peptide recognized by this antiserum was probed in binding studies using oriented electroplax membrane vesicles. These vesicles were 98% right-side-out, as determined by [³H]saxitoxin binding. Binding of the antipeptide antiserum to this fraction was measured before and after permeabilization with 0.01% saponin. Specific binding to intact vesicles was low but increased 10-fold after permeabilization, implying a cytoplasmic orientation for the peptide. Confirmation of this orientation was then sought by localizing the antibody bound to intact electroplax cells with immunogold electron microscopy. The data imply that the region near the C terminus of the sodium channel primary sequence that is recognized by the antiserum is localized on the cytoplasmic side of the membrane. This localization provides further constraints on models of sodium channel tertiary structure.

  1. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Klenk, Hans-Peter; Held, Brittany; Lucas, Susan

    Saccharomonospora azurea Runmao et al. 1987 is a member of the genus Saccharomonospora in the family Pseudonocardiaceae, a genus that has so far been poorly characterized genomically. Members of the genus Saccharomonospora are of interest because they originate from diverse habitats, such as leaf litter, manure, compost, the surface of peat, and moist and over-heated grain. In these habitats they might play a role in the primary degradation of plant material by attacking hemicellulose. They are Gram-negative-staining organisms classified among the usually Gram-positive actinomycetes. After S. viridis, S. azurea is only the second member of the genus Saccharomonospora for which a completely sequenced type-strain genome will be published. Here we describe the features of this organism, together with the complete genome sequence, with project status 'permanent draft', and annotation. The 4,763,832 bp chromosome, with its 4,472 protein-coding and 58 RNA genes, was sequenced as part of the DOE-funded Community Sequencing Program (CSP) 2010 at the Joint Genome Institute (JGI).

  2. Salt-bridging effects on short amphiphilic helical structure and introducing sequence-based short beta-turn motifs.

    PubMed

    Guarracino, Danielle A; Gentile, Kayla; Grossman, Alec; Li, Evan; Refai, Nader; Mohnot, Joy; King, Daniel

    2018-02-01

    Determining the minimal sequence necessary to induce protein folding helps us understand the role of protein-protein interactions in biological systems, because the three-dimensional structures of these proteins often dictate their activity. Proteins are generally composed of discrete secondary structures, from α-helices to β-turns and larger β-sheets, each of which is influenced by its primary structure. Manipulating the sequence of short, moderately helical peptides can help elucidate the influences on folding. We created two new scaffolds based on PT3, a modestly helical eight-residue peptide that we previously published. We used circular dichroism (CD) spectroscopy and changed the possible salt-bridging residues to new combinations of Lys, Arg, Glu, and Asp. The largest gains in helicity came from the Arg-Glu combination, whereas the Lys-Asp combination was not significantly different from the Lys-Glu of the parent scaffold, PT3. The marked 3₁₀-helical contributions in PT3 were lessened in the Arg-Glu-containing peptide, which showed the beginning of cooperative unfolding in a thermal denaturation. However, a unique and unexpected signature was seen for the denaturation of the Lys-Asp peptide, which could help elucidate the stages of folding between the 3₁₀-helix and the α-helix. In addition, we developed a short six-residue peptide with a β-turn/sheet CD signature, again to help study the minimal sequences needed for folding. Overall, the results indicate that fine-tuning the salt-bridging residues of short peptide scaffolds can enhance scaffold structure. Likewise, the results from the new, short β-turn motif can inform future peptidomimetic designs aimed at creating biologically useful, short, structured β-sheet-forming peptides.

  3. Sarcocystis neurona merozoites express a family of immunogenic surface antigens that are orthologues of the Toxoplasma gondii surface antigens (SAGs) and SAG-related sequences.

    PubMed

    Howe, Daniel K; Gaji, Rajshekhar Y; Mroz-Barrett, Meaghan; Gubbels, Marc-Jan; Striepen, Boris; Stamper, Shelby

    2005-02-01

    Sarcocystis neurona is a member of the Apicomplexa that causes myelitis and encephalitis in horses but normally cycles between the opossum and small mammals. Analysis of an S. neurona expressed sequence tag (EST) database revealed four paralogous proteins that exhibit clear homology to the family of surface antigens (SAGs) and SAG-related sequences of Toxoplasma gondii. The primary peptide sequences of the S. neurona proteins are consistent with the two-domain structure that has been described for the T. gondii SAGs, and each was predicted to have an amino-terminal signal peptide and a carboxyl-terminal glycolipid anchor addition site, suggesting surface localization. All four proteins were confirmed to be membrane associated and displayed on the surface of S. neurona merozoites. Due to their surface localization and homology to T. gondii surface antigens, these S. neurona proteins were designated SnSAG1, SnSAG2, SnSAG3, and SnSAG4. Consistent with their homology, the SnSAGs elicited a robust immune response in infected and immunized animals, and their conserved structure further suggests that the SnSAGs similarly serve as adhesins for attachment to host cells. Whether the S. neurona SAG family is as extensive as the T. gondii SAG family remains unresolved, but it is probable that additional SnSAGs will be revealed as more S. neurona ESTs are generated. The existence of an SnSAG family in S. neurona indicates that expression of multiple related surface antigens is not unique to the ubiquitous organism T. gondii. Instead, the SAG gene family is a common trait that presumably has an essential, conserved function(s).

  4. Mapping the Geometric Evolution of Protein Folding Motor.

    PubMed

    Jerath, Gaurav; Hazam, Prakash Kishore; Shekhar, Shashi; Ramakrishnan, Vibin

    2016-01-01

    A polypeptide chain has an invariant main chain and a variable side-chain sequence. How the chemical constitution of the side-chain sequence determines the fold has been scrutinized extensively and verified periodically. However, a focused investigation of the directive effect of side-chain geometry may provide important insights. Such insights could supplement existing algorithms in mapping the geometric evolution of protein chains and their structural preferences. Geometrically, protein folding may be envisaged as the evolution of a set of geometric variables: the ϕ and ψ dihedral angles of the polypeptide main chain, directed by the χ1 and χ2 angles of the side chain. In this work, the protein molecule is metaphorically modelled as a machine with four rotors, ϕ, ψ, χ1 and χ2, whose evolution to the functional fold is directed by combinations of rotor directions. We observe that differential rotor motions lead to different secondary structures, and that the combinatorial pattern is unique and consistent for each secondary structure type. Further, we found that the combination of rotor geometries of each amino acid is unique. This partly explains how different amino acid sequences undergo unique structural evolution and functional adaptation. Quantifying these amino acid rotor preferences produced three substitution matrices. These matrices were then plugged into the BLAST tool to evaluate their efficiency in aligning sequences, with BLOSUM62 and PAM30 as standards for the primary evaluation. Generating substitution matrices is a logical extension of the conceptual framework we attempted to build during this work. Optimization of the matrices following conventional routines, and their possible application to biologically relevant data sets, are beyond the scope of this manuscript, though they are part of the larger project design.
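
    The ϕ, ψ, χ1 and χ2 "rotors" are all torsion angles, each defined by four consecutive atoms: C(i-1), N, Cα, C for ϕ; N, Cα, C, N(i+1) for ψ; N, Cα, Cβ, Xγ for χ1. Below is a minimal, dependency-free sketch of computing one torsion angle from Cartesian coordinates. It uses the standard atan2 formula, with the positive sign meaning a clockwise rotation of the front bond when viewed along the central bond. This is not the authors' code. The function names are hypothetical, and coordinates would come from a structure file parsed elsewhere.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy geometry: central bond along z, p3 rotated a quarter turn from p0.
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)), 6))  # 90.0
```

    Using atan2 instead of an arccos of the normalized dot product keeps the sign of the angle. The sign is what separates, for example, right-handed α-helical ϕ/ψ values from their mirror images.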

  5. Intramolecular control of transcriptional activity by the NK2-specific domain in NK-2 homeodomain proteins

    PubMed Central

    Watada, Hirotaka; Mirmira, Raghavendra G.; Kalamaras, Julie; German, Michael S.

    2000-01-01

    The developmentally important homeodomain transcription factors of the NK-2 class contain a highly conserved region, the NK2-specific domain (NK2-SD). The function of this domain, however, remains unknown. The primary structure of the NK2-SD suggests that it might function as an accessory DNA-binding domain or as a protein–protein interaction interface. To assess whether the NK2-SD contributes to DNA-binding specificity, we used a PCR-based approach to identify a consensus DNA-binding sequence for Nkx2.2, an NK-2 family member involved in pancreas and central nervous system development. The consensus sequence (TCTAAGTGAGCTT) is similar to the known binding sequences for other NK-2 homeodomain proteins, but we show that the NK2-SD does not contribute significantly to specific DNA binding to this sequence. To determine whether the NK2-SD contributes to transactivation, we used GAL4-Nkx2.2 fusion constructs and mapped a powerful transcriptional activation domain in the C-terminal region beyond the conserved NK2-SD. Interestingly, this C-terminal region functions as a transcriptional activator only in the absence of an intact NK2-SD. The NK2-SD can also mask transactivation from the paired homeodomain transcription factor Pax6, but it has no effect on transcription by itself. These results demonstrate that the NK2-SD functions as an intramolecular regulator of the C-terminal activation domain in Nkx2.2. They support a model in which interactions through the NK2-SD regulate the ability of NK-2-class proteins to activate specific genes during development. PMID:10944215

  6. Molecular cloning, sequence identification and tissue expression profile of three novel sheep (Ovis aries) genes - BCKDHA, NAGA and HEXA.

    PubMed

    Liu, G Y; Gao, S Z

    2009-01-01

    The complete coding sequences of three sheep genes, BCKDHA, NAGA and HEXA, were amplified using reverse transcriptase polymerase chain reaction (RT-PCR), based on the conserved sequence information of the mouse or other mammals. The nucleotide sequences of these three genes revealed that the sheep BCKDHA gene encodes a protein of 313 amino acids which has high homology with the BCKDHA gene that encodes a protein of 447 amino acids that has high homology with the branched chain keto acid dehydrogenase E1, alpha polypeptide (BCKDHA) of five species: chimpanzee (93%), human (96%), crab-eating macaque (93%), bovine (98%) and mouse (91%). The sheep NAGA gene encodes a protein of 411 amino acids that has high homology with the alpha-N-acetylgalactosaminidase (NAGA) of five species: human (85%), bovine (94%), mouse (91%), rat (83%) and chicken (74%). The sheep HEXA gene encodes a protein of 529 amino acids that has high homology with the hexosaminidase A (HEXA) of five species: bovine (98%), human (84%), Bornean orangutan (84%), rat (80%) and mouse (81%). Finally, these three novel sheep genes were assigned GeneIDs 100145857, 100145858 and 100145856. Phylogenetic tree analysis revealed that sheep BCKDHA, NAGA and HEXA all have closer genetic relationships to bovine BCKDHA, NAGA and HEXA. Tissue expression profile analysis was also carried out. It showed that the sheep BCKDHA, NAGA and HEXA genes were differentially expressed in tissues including muscle, heart, liver, fat, kidney, lung, and small and large intestine. Our experiment is the first to establish the primary foundation for further research on these three sheep genes.

  7. Construction and characterization of 3A-epitope-tagged foot-and-mouth disease virus.

    PubMed

    Ma, Xueqing; Li, Pinghua; Sun, Pu; Bai, Xingwen; Bao, Huifang; Lu, Zengjun; Fu, Yuanfang; Cao, Yimei; Li, Dong; Chen, Yingli; Qiao, Zilin; Liu, Zaixin

    2015-04-01

    Nonstructural protein 3A of foot-and-mouth disease virus (FMDV) is a partially conserved protein of 153 amino acids (aa) in most FMDVs examined to date. Specific deletions in the FMDV 3A protein have been associated with the inability of FMDV to grow in primary bovine cells and to cause disease in cattle. However, the aa residues playing key roles in these processes are poorly understood. In this study, we constructed epitope-tagged FMDVs in which an 8 aa FLAG epitope, a 9 aa haemagglutinin (HA) epitope, and a 10 aa c-Myc epitope substitute residues 94-101, 93-101, and 93-102 of the 3A protein, respectively. The constructs used a recently developed O/SEA/Mya-98 FMDV infectious cDNA clone. Immunofluorescence assay (IFA), Western blot and sequence analysis showed that the epitope-tagged viruses stably maintained and expressed the foreign epitopes even after 10 serial passages in BHK-21 cells. In BHK-21 cells, the epitope-tagged viruses displayed growth properties and plaque phenotypes similar to those of the parental virus. In primary fetal bovine kidney (FBK) cells, however, the epitope-tagged viruses exhibited lower growth rates and smaller plaques than the parental virus. Their growth properties and plaque phenotypes in FBK cells were similar to those of recombinant viruses harboring a deletion of residues 93-102 in 3A. These results demonstrate that the decreased ability of FMDV to replicate in primary bovine cells was not associated with the length of 3A. They also show that the genetic determinant thought to play a key role in this decreased ability could be narrowed from residues 93-102 to the 8 residues at positions 94-101 of the 3A protein. Copyright © 2015 Elsevier B.V. All rights reserved.
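
    The three constructs share one design rule: each tag replaces a window of exactly its own length (94-101 is 8 residues, 93-101 is 9, 93-102 is 10), so 3A keeps its 153-residue length. The sketch below checks that arithmetic. The tag sequences are the commonly used FLAG, HA and c-Myc epitopes, which the abstract does not list. The 3A sequence is a placeholder, not the O/SEA/Mya-98 sequence. The helper name is hypothetical.

```python
# Commonly used epitope tag sequences (assumed; not given in the abstract).
TAGS = {
    "FLAG": "DYKDDDDK",      # 8 aa
    "HA": "YPYDVPDYA",       # 9 aa
    "c-Myc": "EQKLISEEDL",   # 10 aa
}

# Replaced 3A residue ranges from the abstract (1-based, inclusive).
DESIGNS = {"FLAG": (94, 101), "HA": (93, 101), "c-Myc": (93, 102)}


def substitute(protein: str, start: int, end: int, insert: str) -> str:
    """Replace residues start..end (1-based, inclusive) with `insert`."""
    if not 1 <= start <= end <= len(protein):
        raise ValueError("residue range out of bounds")
    return protein[: start - 1] + insert + protein[end:]


placeholder_3a = "A" * 153  # stand-in for a real 153-aa 3A sequence
for name, (start, end) in DESIGNS.items():
    tagged = substitute(placeholder_3a, start, end, TAGS[name])
    # Replaced window length, tag length, resulting protein length.
    print(name, end - start + 1, len(TAGS[name]), len(tagged))
# FLAG 8 8 153
# HA 9 9 153
# c-Myc 10 10 153
```

    Because the substitutions preserve length, a growth defect shared by all three tagged viruses points to the replaced residues themselves rather than to a change in 3A length.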

  8. Rare and Coding Region Genetic Variants Associated With Risk of Ischemic Stroke: The NHLBI Exome Sequence Project.

    PubMed

    Auer, Paul L; Nalls, Mike; Meschia, James F; Worrall, Bradford B; Longstreth, W T; Seshadri, Sudha; Kooperberg, Charles; Burger, Kathleen M; Carlson, Christopher S; Carty, Cara L; Chen, Wei-Min; Cupples, L Adrienne; DeStefano, Anita L; Fornage, Myriam; Hardy, John; Hsu, Li; Jackson, Rebecca D; Jarvik, Gail P; Kim, Daniel S; Lakshminarayan, Kamakshi; Lange, Leslie A; Manichaikul, Ani; Quinlan, Aaron R; Singleton, Andrew B; Thornton, Timothy A; Nickerson, Deborah A; Peters, Ulrike; Rich, Stephen S

    2015-07-01

    Stroke is the second leading cause of death and the third leading cause of years of life lost. Genetic factors contribute to stroke prevalence, and candidate gene and genome-wide association studies (GWAS) have identified variants associated with ischemic stroke risk. These variants often have small effects without obvious biological significance. Exome sequencing may discover predicted protein-altering variants with a potentially large effect on ischemic stroke risk. The objective was to investigate the contribution of rare and common genetic variants to ischemic stroke risk by targeting the protein-coding regions of the human genome. The National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project (ESP) analyzed approximately 6000 participants from numerous cohorts of European and African ancestry. For discovery, 365 cases of ischemic stroke (small-vessel and large-vessel subtypes) and 809 European ancestry controls were sequenced. For replication, 47 affected sibpairs concordant for stroke subtype and an African American case-control series were sequenced, and 1672 cases and 4509 European ancestry controls were genotyped. The ESP's exome sequencing and genotyping started on January 1, 2010, and continued through June 30, 2012. Analyses were conducted on the full data set between July 12, 2012, and July 13, 2013. There were two main outcomes. The primary analysis sought new variants or genes contributing to ischemic stroke risk and subtype. The secondary analysis sought support for protein-coding variants contributing to risk in previously published candidate genes.
    We identified 2 novel genes associated with an increased risk of ischemic stroke. The first is a protein-coding variant in PDE4DIP (rs1778155; odds ratio, 2.15; P = 2.63 × 10^-8), implicating an intracellular signal transduction mechanism. The second is a variant in ACOT4 (rs35724886; odds ratio, 2.04; P = 1.24 × 10^-7), implicating fatty acid metabolism. PDE4DIP was confirmed in affected sibpair families with the large-vessel stroke subtype and in African Americans. Protein-coding variants in candidate genes replicated 2 previously reported GWAS associations: ZFHX3 (cardioembolic stroke) and ABCA1 (large-vessel stroke). Exome sequencing discovered 2 novel genes and mechanisms, PDE4DIP and ACOT4, associated with increased risk of ischemic stroke. In addition, ZFHX3 and ABCA1 were found to have protein-coding variants associated with ischemic stroke. These results suggest that genetic variation in novel pathways contributes to ischemic stroke risk and serves as a target for prediction, prevention, and therapy.

  9. Statistical mechanics of simple models of protein folding and design.

    PubMed Central

    Pande, V S; Grosberg, A Y; Tanaka, T

    1997-01-01

    It is now believed that the primary equilibrium aspects of simple models of protein folding are understood theoretically. However, current theories often resort to rather heavy mathematics to overcome technical difficulties inherent in the problem, or they start from a phenomenological model. We therefore take a new approach in this pedagogical review of the statistical mechanics of protein folding. The benefit of our approach is a drastic mathematical simplification of the theory, without resorting to any new approximations or phenomenological prescriptions. Indeed, the results we obtain agree precisely with previous calculations. Because of this simplification, we are able to present a thorough and self-contained treatment of the problem. Topics discussed include the statistical mechanics of the random energy model (REM) and tests of the validity of REM as a model for heteropolymer freezing. They also include the freezing transition of random sequences and the phase diagram of designed ("minimally frustrated") sequences. Finally, we consider the degree to which errors in the interactions employed in simulations of either folding or design can still lead to correct folding behavior. PMID:9414231

  10. UTRdb and UTRsite: a collection of sequences and regulatory motifs of the untranslated regions of eukaryotic mRNAs

    PubMed Central

    Mignone, Flavio; Grillo, Giorgio; Licciulli, Flavio; Iacono, Michele; Liuni, Sabino; Kersey, Paul J.; Duarte, Jorge; Saccone, Cecilia; Pesole, Graziano

    2005-01-01

    The 5′ and 3′ untranslated regions of eukaryotic mRNAs play crucial roles in the post-transcriptional regulation of gene expression through the modulation of nucleo-cytoplasmic mRNA transport, translation efficiency, subcellular localization and message stability. UTRdb is a curated database of 5′ and 3′ untranslated sequences of eukaryotic mRNAs, derived from several sources of primary data. Experimentally validated functional motifs are annotated (and also collated as the UTRsite database) and cross-links to genomic and protein data are provided. The integration of UTRdb with genomic and protein data has allowed the implementation of a powerful retrieval resource for the selection and extraction of UTR subsets based on their genomic coordinates and/or features of the protein encoded by the relevant mRNA (e.g. GO term, PFAM domain, etc.). All internet resources implemented for retrieval and functional analysis of 5′ and 3′ untranslated regions of eukaryotic mRNAs are accessible at http://www.ba.itb.cnr.it/UTR/. PMID:15608165

  11. Functional and evolutionary relationships between bacteriorhodopsin and halorhodopsin in the archaebacterium Halobacterium halobium

    NASA Technical Reports Server (NTRS)

    Lanyi, J. K.

    1986-01-01

    The archaebacteria occupy a unique place in phylogenetic trees constructed from analyses of sequences from key informational macromolecules, and their study continues to yield interesting ideas on the early evolution and divergence of biological forms. It is now known that the halobacteria among these species contain various retinal-proteins, resembling eukaryotic rhodopsins, but with different functions. Two of these pigments, located in the cytoplasmic membranes of the bacteria, are bacteriorhodopsin (a light-driven proton pump) and halorhodopsin (a light-driven chloride pump). Comparison of these systems is expected to reveal structure/function relationships in these simple (primitive?) energy transducing membrane components and evolutionary relationships which had produced the structural features which allow the divergent functions. Findings indicate that very different primary structures are needed for these proteins to accomplish their different functions. Indeed, analysis of partial amino acid sequences from halo-opsin shows already that few if any long segments exist which are homologous to bacterio-opsin. Either these proteins diverged a very long time ago to allow for the observed differences, or the evolutionary clock in the halobacteria runs faster than usual.

  12. Finding the target sites of RNA-binding proteins

    PubMed Central

    Li, Xiao; Kazan, Hilal; Lipshitz, Howard D; Morris, Quaid D

    2014-01-01

    RNA–protein interactions differ from DNA–protein interactions because of the central role of RNA secondary structure. Some RNA-binding domains (RBDs) recognize their target sites mainly by their shape and geometry and others are sequence-specific but are sensitive to secondary structure context. A number of small- and large-scale experimental approaches have been developed to measure RNAs associated in vitro and in vivo with RNA-binding proteins (RBPs). Generalizing outside of the experimental conditions tested by these assays requires computational motif finding. Often RBP motif finding is done by adapting DNA motif finding methods; but modeling secondary structure context leads to better recovery of RBP-binding preferences. Genome-wide assessment of mRNA secondary structure has recently become possible, but these data must be combined with computational predictions of secondary structure before they add value in predicting in vivo binding. There are two main approaches to incorporating structural information into motif models: supplementing primary sequence motif models with preferred secondary structure contexts (e.g., MEMERIS and RNAcontext) and directly modeling secondary structure recognized by the RBP using stochastic context-free grammars (e.g., CMfinder and RNApromo). The former better reconstruct known binding preferences for sequence-specific RBPs but are not suitable for modeling RBPs that recognize shape and geometry of RNAs. Future work in RBP motif finding should incorporate interactions between multiple RBDs and multiple RBPs in binding to RNA. WIREs RNA 2014, 5:111–130. doi: 10.1002/wrna.1201 PMID:24217996
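
    The "primary sequence motif models" that structure-aware tools such as MEMERIS and RNAcontext extend are, in their simplest form, position weight matrices (PWMs). Below is a minimal RNA PWM sketch: per-position base counts are converted to log2-odds scores against a uniform background with a pseudocount, and a sequence is scanned for its best-scoring window. The motif counts are invented for illustration, and the function names are hypothetical. The sketch deliberately ignores secondary-structure context, which is exactly the limitation the text describes.

```python
import math

BASES = "ACGU"


def log_odds_pwm(counts: list[dict[str, int]], pseudocount: float = 1.0,
                 background: float = 0.25) -> list[dict[str, float]]:
    """Convert per-position base counts to log2-odds scores against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background)
            for b in BASES
        })
    return pwm


def best_site(rna: str, pwm: list[dict[str, float]]) -> tuple[int, float]:
    """Return (0-based start, score) of the highest-scoring window; RNA must use A/C/G/U."""
    width = len(pwm)
    best = (-1, float("-inf"))
    for i in range(len(rna) - width + 1):
        score = sum(col[base] for col, base in zip(pwm, rna[i : i + width]))
        if score > best[1]:
            best = (i, score)
    return best


# Invented counts for a 3-position motif that mostly reads "UGU".
pwm = log_odds_pwm([{"U": 9, "A": 1}, {"G": 10}, {"U": 10}])
start, score = best_site("AAUGUAA", pwm)
print(start)  # 2, the "UGU" window
```

    A structure-aware model would additionally weight each window by the probability that it sits in a preferred context, for example unpaired or in a hairpin loop, as predicted by an RNA folding algorithm.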

  13. Intrinsically disordered proteins aggregate at fungal cell-to-cell channels and regulate intercellular connectivity

    PubMed Central

    Lai, Julian; Koh, Chuan Hock; Tjota, Monika; Pieuchot, Laurent; Raman, Vignesh; Chandrababu, Karthik Balakrishna; Yang, Daiwen; Wong, Limsoon; Jedd, Gregory

    2012-01-01

    Like animals and plants, multicellular fungi possess cell-to-cell channels (septal pores) that allow intercellular communication and transport. Here, using a combination of MS of Woronin body-associated proteins and a bioinformatics approach that identifies related proteins based on composition and character, we identify 17 septal pore-associated (SPA) proteins that localize to the septal pore in rings and pore-centered foci. SPA proteins are not homologous at the primary sequence level but share overall physical properties with intrinsically disordered proteins. Some SPA proteins form aggregates at the septal pore, and in vitro assembly assays suggest aggregation through a nonamyloidal mechanism involving mainly α-helical and disordered structures. SPA loss-of-function phenotypes include excessive septation, septal pore degeneration, and uncontrolled Woronin body activation. Together, our data identify the septal pore as a complex subcellular compartment and focal point for the assembly of unstructured proteins controlling diverse aspects of intercellular connectivity. PMID:22955885

  14. Cellular Assays for Ferredoxins: A Strategy for Understanding Electron Flow through Protein Carriers That Link Metabolic Pathways.

    PubMed

    Atkinson, Joshua T; Campbell, Ian; Bennett, George N; Silberg, Jonathan J

    2016-12-27

    The ferredoxin (Fd) protein family is a structurally diverse group of iron-sulfur proteins that function as electron carriers, linking biochemical pathways important for energy transduction, nutrient assimilation, and primary metabolism. While considerable biochemical information about individual Fd protein electron carriers and their reactions has been acquired, we cannot yet anticipate the proportion of electrons shuttled between different Fd-partner proteins within cells using biochemical parameters that govern electron flow, such as holo-Fd concentration, midpoint potential (driving force), molecular interactions (affinity and kinetics), conformational changes (allostery), and off-pathway electron leakage (chemical oxidation). Herein, we describe functional and structural gaps in our Fd knowledge within the context of a sequence similarity network and phylogenetic tree, and we propose a strategy for improving our understanding of Fd sequence-function relationships. We suggest comparing the functions of divergent Fds within cells whose growth, or other measurable output, requires electron transfer between defined electron donor and acceptor proteins. By comparing Fd-mediated electron transfer with biochemical parameters that govern electron flow, we posit that models that anticipate energy flow across Fd interactomes can be built. This approach is expected to transform our ability to anticipate Fd control over electron flow in cellular settings, an obstacle to the construction of synthetic electron transfer pathways and rational optimization of existing energy-conserving pathways.

  15. Phylogeny of the Vitamin K 2,3-Epoxide Reductase (VKOR) Family and Evolutionary Relationship to the Disulfide Bond Formation Protein B (DsbB) Family

    PubMed Central

    Bevans, Carville G.; Krettler, Christoph; Reinhart, Christoph; Watzka, Matthias; Oldenburg, Johannes

    2015-01-01

    In humans and other vertebrate animals, vitamin K 2,3-epoxide reductase (VKOR) family enzymes are the gatekeepers between nutritionally acquired K vitamins and the vitamin K cycle responsible for posttranslational modifications that confer biological activity upon vitamin K-dependent proteins with crucial roles in hemostasis, bone development and homeostasis, hormonal carbohydrate regulation and fertility. We report a phylogenetic analysis of the VKOR family that identifies five major clades. Combined phylogenetic and site-specific conservation analyses point to clade-specific similarities and differences in structure and function. We discovered a single-site determinant uniquely identifying VKOR homologs belonging to human pathogenic, obligate intracellular prokaryotes and protists. Building on previous work by Sevier et al. (Protein Science 14:1630), we analyzed structural data from both VKOR and prokaryotic disulfide bond formation protein B (DsbB) families and hypothesize an ancient evolutionary relationship between the two families where one family arose from the other through a gene duplication/deletion event. This has resulted in circular permutation of primary sequence threading through the four-helical bundle protein folds of both families. This is the first report of circular permutation relating distant α-helical membrane protein sequences and folds. In conclusion, we suggest a chronology for the evolution of the five extant VKOR clades. PMID:26230708
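
    In a circular permutation, the N-terminal part of one sequence appears at the C terminus of the other. The toy check below illustrates only that definition, on exact strings. Letters stand in for sequence segments, such as the four helices. Real VKOR and DsbB homologs differ by many substitutions, so detecting the relationship in practice relies on structural and sequence alignment rather than exact matching. The function names are hypothetical.

```python
from typing import Optional


def is_circular_permutation(a: str, b: str) -> bool:
    """True if b equals a cut at one point with the two pieces swapped."""
    return len(a) == len(b) and b in a + a


def permutation_offset(a: str, b: str) -> Optional[int]:
    """Return k such that b == a[k:] + a[:k], or None if no such k exists."""
    if not is_circular_permutation(a, b):
        return None
    # Every rotation of a occurs as a substring of a + a.
    return (a + a).find(b)


print(permutation_offset("ABCDEFGH", "EFGHABCD"))  # 4
```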

  16. Molecular and immunological characterization of subtilisin like serine protease, a major allergen of Curvularia lunata.

    PubMed

    Tripathi, Prabhanshu; Nair, Smitha; Singh, B P; Arora, Naveen

    2011-03-01

    Serine proteases from numerous sources have been identified and characterized as major allergens. The present study aimed to clone, express and characterize a serine protease from Curvularia lunata. cDNA library screening identified partial protease clones. One clone showed significant homology to subtilisin-like serine proteases from Aspergillus and Penicillium species. The full-length sequence was generated by RACE PCR and subcloned into a pET vector. The protein was expressed in Escherichia coli and purified from inclusion bodies, yielding 0.5 mg/L of culture. Bioinformatic analysis identified serine protease motifs of the subtilase family, the catalytic triad and N-glycosylation sites in the primary sequence. The protein resolved at 54 kDa on SDS-PAGE. Sera from 13 of 16 C. lunata-sensitive patients recognized it as a major allergen in ELISA and immunoblot. The recombinant protein reacted with rabbit polyclonal antibodies against alkaline serine proteases from C. lunata. In competitive ELISA, 50-56 ng of the recombinant protein was required for 50% inhibition of IgE binding. In addition, 13 of 16 patients' samples showed significant basophil histamine release upon stimulation with the purified recombinant protein. In conclusion, a 54-kDa major allergen of C. lunata was cloned, expressed and characterized, and it showed biological activity. It has potential for use in molecule-based approaches to allergy diagnosis and therapy. Copyright © 2010 Elsevier GmbH. All rights reserved.

  17. Phylogeny of the Vitamin K 2,3-Epoxide Reductase (VKOR) Family and Evolutionary Relationship to the Disulfide Bond Formation Protein B (DsbB) Family.

    PubMed

    Bevans, Carville G; Krettler, Christoph; Reinhart, Christoph; Watzka, Matthias; Oldenburg, Johannes

    2015-07-29

    In humans and other vertebrate animals, vitamin K 2,3-epoxide reductase (VKOR) family enzymes are the gatekeepers between nutritionally acquired K vitamins and the vitamin K cycle responsible for posttranslational modifications that confer biological activity upon vitamin K-dependent proteins with crucial roles in hemostasis, bone development and homeostasis, hormonal carbohydrate regulation and fertility. We report a phylogenetic analysis of the VKOR family that identifies five major clades. Combined phylogenetic and site-specific conservation analyses point to clade-specific similarities and differences in structure and function. We discovered a single-site determinant uniquely identifying VKOR homologs belonging to human pathogenic, obligate intracellular prokaryotes and protists. Building on previous work by Sevier et al. (Protein Science 14:1630), we analyzed structural data from both VKOR and prokaryotic disulfide bond formation protein B (DsbB) families and hypothesize an ancient evolutionary relationship between the two families where one family arose from the other through a gene duplication/deletion event. This has resulted in circular permutation of primary sequence threading through the four-helical bundle protein folds of both families. This is the first report of circular permutation relating distant α-helical membrane protein sequences and folds. In conclusion, we suggest a chronology for the evolution of the five extant VKOR clades.

  18. cDNA, genomic sequence cloning and overexpression of ribosomal protein S25 gene (RPS25) from the Giant Panda.

    PubMed

    Hao, Yan-Zhe; Hou, Wan-Ru; Hou, Yi-Ling; Du, Yu-Jie; Zhang, Tian; Peng, Zheng-Song

    2009-11-01

    RPS25 is a component of the 40S small ribosomal subunit. It is encoded by the RPS25 gene, which is specific to eukaryotes. Few studies have examined the RPS25 gene in animals. The Giant Panda (Ailuropoda melanoleuca), known as a "living fossil", is attracting increasing attention from the world community. Studies on RPS25 of the Giant Panda could provide scientific data for investigating the hereditary traits of the gene and for formulating protection strategies for the Giant Panda. The cDNA of RPS25 cloned from the Giant Panda is 436 bp in size. It contains an open reading frame of 378 bp encoding 125 amino acids. The genomic sequence is 1,992 bp long and contains four exons and three introns. BLAST alignment analysis indicated that the nucleotide coding sequence shows high homology to those of Homo sapiens, Bos taurus, Mus musculus and Rattus norvegicus (92.6, 94.4, 89.2 and 91.5%, respectively). Primary structure analysis revealed that the putative RPS25 protein has a molecular weight of 13.7421 kDa and a theoretical pI of 10.12. Topology prediction identified several sites in the Giant Panda RPS25 protein: one N-glycosylation site, one cAMP- and cGMP-dependent protein kinase phosphorylation site, two protein kinase C phosphorylation sites and one tyrosine kinase phosphorylation site. The RPS25 gene was overexpressed in E. coli BL21, and the RPS25 protein was analyzed by Western blotting. The results indicated that the RPS25 gene can be expressed in E. coli. The N-terminally His-tagged RPS25 fusion protein accumulated as the expected 17.4-kDa polypeptide. This study reports the first successful cloning of the RPS25 cDNA and genomic sequence from the Giant Panda, using RT-PCR and Touchdown-PCR, respectively. Both were sequenced and preliminarily analyzed, and the cDNA of the RPS25 gene was then overexpressed in E. coli BL21 and immunoblotted. This is the first report on the RPS25 gene from the Giant Panda. The data will enrich and supplement the information about RPS25. They will also contribute to the protection of gene resources and the study of genetic polymorphism.
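
    The reported numbers are internally consistent. A 378-bp ORF is 126 codons. If one of them is the stop codon, 125 residues remain. At a rule-of-thumb average residue mass of about 110 Da, 125 residues predict roughly 13.75 kDa. That is close to the computed 13.74 kDa. The snippet below sketches that arithmetic. It assumes the ORF length includes the stop codon and uses the 110 Da average. The helper names are hypothetical.

```python
def orf_to_residues(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino-acid residues encoded by an ORF of orf_bp nucleotides."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    # The stop codon terminates translation and adds no residue.
    return codons - 1 if includes_stop else codons


def approx_mass_kda(residues: int, mean_residue_da: float = 110.0) -> float:
    """Rough protein mass from residue count, using an average residue mass."""
    return residues * mean_residue_da / 1000.0


n = orf_to_residues(378)  # 126 codons, one of them the stop codon
print(n, approx_mass_kda(n))  # 125 13.75
```

    The 110 Da figure is only an average. Tools that report values such as 13.7421 kDa sum the exact residue masses of the translated sequence instead.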

  19. Analysis of sequence repeats of proteins in the PDB.

    PubMed

    Mary Rajathei, David; Selvaraj, Samuel

    2013-12-01

    Internal repeats in protein sequences play a significant role in the evolution of protein structure and function. Various bioinformatics tools help in the identification and characterization of these repeats. In the present study, we analyzed sequence repeats in a non-redundant set of proteins available in the Protein Data Bank (PDB). We used four tools: RADAR for detecting internal repeats in a protein, PDBeFOLD for assessing structural similarity, PDBsum for finding functional involvement and Pfam for domain assignment of the repeats in a protein. We found that the identity between sequence repeats falls in the range of 20-40%. The superimposed structures of most of the sequence repeats maintain a similar overall fold. At the functional level, most of the sequence repeats are involved in the function of the protein through functionally involved residues in the repeat regions. We also found that sequence repeats in single- and two-domain proteins often contained conserved sequence motifs for the function of the domain. Copyright © 2013 Elsevier Ltd. All rights reserved.
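
    The 20-40% figure refers to sequence identity between aligned copies of a repeat. The snippet below sketches one common way to compute identity from an existing alignment. It counts only columns where neither copy has a gap. Identity definitions vary, for example in whether gapped columns enter the denominator, and RADAR's own scoring is not reproduced here. `repeat_identity` is a hypothetical helper.

```python
def repeat_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither repeat has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned repeats must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


# Two toy repeat copies: 3 of the 4 ungapped columns match.
print(repeat_identity("AC-DE", "ACKDW"))  # 75.0
```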

  20. Optics of Spider Sticky Orb Webs

    DTIC Science & Technology

    2011-01-01

    biopolymer which is almost exclusively protein with repeated sequences of the amino acids glycine and alanine [16]. The capture silk is spiraled... Herberstein, M. E., Craig, C. L. and Separovic, F., "Solid-state NMR relaxation studies of Australian spider silks", Biopolymers 61, 287-297 (2002). [17... silks, scattering from rough/structured surfaces and thin film effects as the primary causes. We report systematic studies carried out using the

  1. Molecular cloning and functional expression of allergenic sarcoplasmic calcium-binding proteins from Penaeus shrimps.

    PubMed

    Mita, Hajime; Koketsu, Aiko; Ishizaki, Shoichiro; Shiomi, Kazuo

    2013-05-01

    Sarcoplasmic calcium-binding proteins (SCPs) have recently been identified as crustacean allergens. However, information on their primary structures is very limited, and no recombinant SCP (rSCP) is available as an alternative to natural SCP (nSCP). This study aimed to elucidate the primary structures of SCPs from two species of Penaeus shrimp (black tiger shrimp and kuruma shrimp) by cDNA cloning. It also aimed to produce a black tiger shrimp rSCP preparation comparable in IgE reactivity to nSCP. The full-length cDNAs encoding black tiger shrimp and kuruma shrimp SCPs were successfully cloned. Both SCPs are composed of 193 amino acid residues and share more than 80% sequence identity with the known crustacean SCPs. The black tiger shrimp SCP was then expressed in Escherichia coli using the pFN6A (HQ) Flexi vector system. Enzyme-linked immunosorbent assay (ELISA) and inhibition ELISA experiments demonstrated that rSCP has the same IgE reactivity as nSCP. Our results provide further evidence for the high sequence identity among crustacean SCPs. In addition, rSCP will be a useful tool in studying crustacean allergens and also in the diagnosis of crustacean allergy. © 2012 Society of Chemical Industry.

  2. Modular protein domains: an engineering approach toward functional biomaterials.

    PubMed

    Lin, Charng-Yu; Liu, Julie C

    2016-08-01

    Protein domains and peptide sequences are powerful tools for conferring specific functions on engineered biomaterials. Protein sequences with a wide variety of functionalities, including structure, bioactivity, protein-protein interactions, and stimuli responsiveness, have been identified, and advances in molecular biology continue to pinpoint new sequences. Protein domains can be combined to make recombinant proteins with multiple functionalities. The high fidelity of the protein translation machinery results in exquisite control over the sequence of recombinant proteins and the resulting properties of protein-based materials. In this review, we discuss protein domains and peptide sequences in the context of functional protein-based materials, composite materials, and their biological applications. Copyright © 2016 Elsevier Ltd. All rights reserved.

  3. Rapid comparison of properties on protein surface

    PubMed Central

    Sael, Lee; La, David; Li, Bin; Rustamov, Raif; Kihara, Daisuke

    2008-01-01

    The mapping of physicochemical characteristics onto the surface of a protein provides crucial insights into its function and evolution. This information can be further used in the characterization and identification of similarities within protein surface regions. We propose a novel method which quantitatively compares global and local properties on the protein surface. We have tested the method on comparison of electrostatic potentials and hydrophobicity. The method is based on 3D Zernike descriptors, which provide a compact representation of a given property defined on a protein surface. Compactness and rotational invariance of this descriptor enable fast comparison suitable for database searches. The usefulness of this method is exemplified by studying several protein families including globins, thermophilic and mesophilic proteins, and active sites of TIM β/α barrel proteins. In all the cases studied, the descriptor is able to cluster proteins into functionally relevant groups. The proposed approach can also be easily extended to other surface properties. This protein surface-based approach will add a new way of viewing and comparing proteins to conventional methods, which compare proteins in terms of their primary sequence or tertiary structure. PMID:18618695
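
    Because 3D Zernike descriptors are rotation-invariant, two surfaces can be compared without first superposing them. A database search then reduces to ranking descriptor vectors by distance. The snippet below sketches that ranking step, assuming Euclidean distance as the metric. The descriptor values are made-up toy numbers, and computing real descriptors from a surface property is not shown. The function names are hypothetical.

```python
import math


def descriptor_distance(d1: list[float], d2: list[float]) -> float:
    """Euclidean distance between two descriptor vectors of equal length."""
    if len(d1) != len(d2):
        raise ValueError("descriptors must have the same length")
    return math.dist(d1, d2)


def rank_database(query: list[float], database: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (name, distance) pairs sorted from most to least similar."""
    scored = [(name, descriptor_distance(query, vec)) for name, vec in database.items()]
    return sorted(scored, key=lambda item: item[1])


# Toy 3-component "descriptors"; real 3D Zernike descriptors have many more components.
db = {"entry_a": [0.10, 0.20, 0.35], "entry_b": [0.90, 0.10, 0.00]}
print(rank_database([0.10, 0.20, 0.30], db)[0][0])  # entry_a
```

    No alignment or rotation search is needed per pair, so each comparison costs only one pass over a fixed-length vector. This is what makes the approach fast enough for large-scale searches.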

  4. Rapid comparison of properties on protein surface.

    PubMed

    Sael, Lee; La, David; Li, Bin; Rustamov, Raif; Kihara, Daisuke

    2008-10-01

    The mapping of physicochemical characteristics onto the surface of a protein provides crucial insights into its function and evolution. This information can be further used in the characterization and identification of similarities within protein surface regions. We propose a novel method which quantitatively compares global and local properties on the protein surface. We have tested the method on comparison of electrostatic potentials and hydrophobicity. The method is based on 3D Zernike descriptors, which provide a compact representation of a given property defined on a protein surface. Compactness and rotational invariance of this descriptor enable fast comparison suitable for database searches. The usefulness of this method is exemplified by studying several protein families including globins, thermophilic and mesophilic proteins, and active sites of TIM beta/alpha barrel proteins. In all the cases studied, the descriptor is able to cluster proteins into functionally relevant groups. The proposed approach can also be easily extended to other surface properties. This protein surface-based approach will add a new way of viewing and comparing proteins to conventional methods, which compare proteins in terms of their primary sequence or tertiary structure.

  5. piRNA biogenesis during adult spermatogenesis in mice is independent of the ping-pong mechanism.

    PubMed

    Beyret, Ergin; Liu, Na; Lin, Haifan

    2012-10-01

    piRNAs, a class of small non-coding RNAs associated with PIWI proteins, have broad functions in germline development, transposon silencing, and epigenetic regulation. In diverse organisms, a subset of piRNAs derived from repeat sequences is produced via the interplay between two PIWI proteins. This mechanism, termed the "ping-pong" cycle, operates among the PIWI proteins of the primordial mouse testis; however, its involvement in postnatal testes remains elusive. Here we show that adult testicular piRNAs are produced independently of the ping-pong mechanism. We identified and characterized large populations of piRNAs in the adult and postnatal developing testes associated with MILI and MIWI, the only PIWI proteins detectable in these testes. In the adult testis, we detected no interaction between MILI and MIWI. Their piRNAs also showed no sequence feature of the ping-pong mechanism. The majority of MILI- and MIWI-associated piRNAs originate from the same DNA strands within the same loci. Both populations of piRNAs are biased for a 5' uracil but not for adenine at the 10th nucleotide position, and they display no complementarity. Furthermore, in Miwi mutants, MILI-associated piRNAs are not downregulated but instead upregulated. These results indicate that adult testicular piRNAs are predominantly, if not exclusively, produced by a primary processing mechanism rather than the ping-pong mechanism. In this primary pathway, the biogenesis of MILI- and MIWI-associated piRNAs may compete for the same precursors. The types of piRNAs produced tend to be non-selectively dictated by the precursors available in the cell. Precursors with introns tend to be spliced before being processed into piRNAs.
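
    The ping-pong signature tested here has two parts. First, the 5'-most 10 nucleotides of a ping-pong pair are complementary. Second, a 1U bias in one piRNA population is therefore mirrored by a 10A bias in its partner population. The snippet below computes both signals on toy reads. It only illustrates the two criteria and is not the study's analysis pipeline. The reads and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(seq: str) -> str:
    """Normalize a DNA- or RNA-style read to uppercase RNA letters."""
    return seq.upper().replace("T", "U")


def revcomp(rna: str) -> str:
    """Reverse complement of an RNA sequence."""
    return rna.translate(COMPLEMENT)[::-1]


def position_bias(reads: list[str]) -> dict[str, float]:
    """Fraction of reads with U at position 1 ("1U") and A at position 10 ("10A")."""
    rna = [to_rna(r) for r in reads if r]
    long_enough = [r for r in rna if len(r) >= 10]
    return {
        "1U": sum(r[0] == "U" for r in rna) / len(rna) if rna else 0.0,
        "10A": sum(r[9] == "A" for r in long_enough) / len(long_enough) if long_enough else 0.0,
    }


def is_ping_pong_pair(a: str, b: str) -> bool:
    """True if the 5'-most 10 nt of a and b are complementary (the ping-pong overlap)."""
    a, b = to_rna(a), to_rna(b)
    return len(a) >= 10 and len(b) >= 10 and a[:10] == revcomp(b[:10])


guide = "UGCAUGCAUGAAAA"   # toy read with a 5' U
partner = "CAUGCAUGCAGGGG"  # toy read whose first 10 nt pair with the guide's
print(is_ping_pong_pair(guide, partner), position_bias([guide, partner]))
```

    In the study's adult testis data, both MILI- and MIWI-associated piRNAs showed the 1U bias without a 10A bias or complementarity. That is, the first signal was present but the ping-pong signature was absent.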

  6. p19-targeted ABD-derived protein variants inhibit IL-23 binding and exert suppressive control over IL-23-stimulated expansion of primary human IL-17+ T-cells.

    PubMed

    Křížová, Lucie; Kuchař, Milan; Petroková, Hana; Osička, Radim; Hlavničková, Marie; Pelák, Ondřej; Černý, Jiří; Kalina, Tomáš; Malý, Petr

    2017-03-01

    Interleukin-23 (IL-23), a heterodimeric cytokine of covalently bound p19 and p40 proteins, has recently been closely associated with the development of several chronic autoimmune diseases such as psoriasis, psoriatic arthritis or inflammatory bowel disease. Released by activated dendritic cells, IL-23 interacts with the IL-23 receptor (IL-23R) on Th17 cells, thus promoting intracellular signaling, a pivotal step in the Th17-driven pro-inflammatory axis. Here, we aimed to block the binding of the IL-23 cytokine to its cell-surface receptor with novel inhibitory protein binders targeted to the p19 subunit of human IL-23. To this end, we used a combinatorial library derived from the scaffold of the albumin-binding domain (ABD) of streptococcal protein G. Ribosome display selection on this library yielded a collection of ABD-derived p19-targeted variants, called ILP binders. Of 214 clones analyzed by ELISA, Western blot and DNA sequencing, 53 provided 35 different sequence variants that were further characterized. Using in silico docking in combination with a cell-surface competition binding assay, we identified a group of inhibitory candidates that substantially diminished binding of recombinant p19 to IL-23R on human monocytic THP-1 cells. Of these best p19 blockers, ILP030, ILP317 and ILP323 inhibited IL-23-driven expansion of IL-17-producing primary human CD4+ T-cells. Thus, these novel binders represent unique IL-23-targeted probes useful for IL-23/IL-23R epitope mapping studies. They could also be used for designing novel p19/IL-23-targeted anti-inflammatory biologics.

  7. Natural Variation of Epstein-Barr Virus Genes, Proteins, and Primary MicroRNA.

    PubMed

    Correia, Samantha; Palser, Anne; Elgueta Karstegl, Claudio; Middeldorp, Jaap M; Ramayanti, Octavia; Cohen, Jeffrey I; Hildesheim, Allan; Fellner, Maria Dolores; Wiels, Joelle; White, Robert E; Kellam, Paul; Farrell, Paul J

    2017-08-01

    Viral gene sequences from an enlarged set of about 200 Epstein-Barr virus (EBV) strains, including many primary isolates, have been used to investigate variation in key viral genetic regions, particularly LMP1, Zp, gp350, EBNA1, and the BART microRNA (miRNA) cluster 2. Determination of type 1 and type 2 EBV in saliva samples from people from a wide range of geographic and ethnic backgrounds demonstrates a small percentage of healthy white Caucasian British people carrying predominantly type 2 EBV. Linkage of Zp and gp350 variants to type 2 EBV is likely to be due to their genes being adjacent to the EBNA3 locus, which is one of the major determinants of the type 1/type 2 distinction. A novel classification of EBNA1 DNA binding domains, named QCIGP, results from phylogeny analysis of their protein sequences but is not linked to the type 1/type 2 classification. The BART cluster 2 miRNA region is classified into three major variants through single-nucleotide polymorphisms (SNPs) in the primary miRNA outside the mature miRNA sequences. These SNPs can result in altered levels of expression of some miRNAs from the BART variant frequently present in Chinese and Indonesian nasopharyngeal carcinoma (NPC) samples. The EBV genetic variants identified here provide a basis for future, more directed analysis of association of specific EBV variations with EBV biology and EBV-associated diseases. IMPORTANCE Incidence of diseases associated with EBV varies greatly in different parts of the world. Thus, relationships between EBV genome sequence variation and health, disease, geography, and ethnicity of the host may be important for understanding the role of EBV in diseases and for development of an effective EBV vaccine. This paper provides the most comprehensive analysis so far of variation in specific EBV genes relevant to these diseases and proposed EBV vaccines. 
By focusing on variation in LMP1, Zp, gp350, EBNA1, and the BART miRNA cluster 2, new relationships with the known type 1/type 2 strains are demonstrated, and a novel classification of EBNA1 and the BART miRNAs is proposed. Copyright © 2017 Correia et al.

  8. DNA-Catalyzed Amide Hydrolysis.

    PubMed

    Zhou, Cong; Avins, Joshua L; Klauser, Paul C; Brandsen, Benjamin M; Lee, Yujeong; Silverman, Scott K

    2016-02-24

    DNA catalysts (deoxyribozymes) for a variety of reactions have been identified by in vitro selection. However, for certain reactions this identification has not been achieved. One important example is DNA-catalyzed amide hydrolysis, for which a previous selection experiment instead led to DNA-catalyzed DNA phosphodiester hydrolysis. Subsequent efforts in which the selection strategy deliberately avoided phosphodiester hydrolysis led to DNA-catalyzed ester and aromatic amide hydrolysis, but aliphatic amide hydrolysis has been elusive. In the present study, we show that including modified nucleotides that bear protein-like functional groups (any one of primary amino, carboxyl, or primary hydroxyl) enables identification of amide-hydrolyzing deoxyribozymes. In one case, the same deoxyribozyme sequence without the modifications still retains substantial catalytic activity. Overall, these findings establish the utility of introducing protein-like functional groups into deoxyribozymes for identifying new catalytic function. The results also suggest the longer-term feasibility of deoxyribozymes as artificial proteases.

  9. Long-range comparison of human and mouse Sprr loci to identify conserved noncoding sequences involved in coordinate regulation

    PubMed Central

    Martin, Natalia; Patel, Satyakam; Segre, Julia A.

    2004-01-01

    Mammalian epidermis provides a permeability barrier between an organism and its environment. Under homeostatic conditions, epidermal cells produce structural proteins, which are cross-linked in an orderly fashion to form a cornified envelope (CE). However, under genetic or environmental stress, specific genes are induced to rapidly build a temporary barrier. Small proline-rich (SPRR) proteins are the primary constituents of the CE. Under stress the entire family of 14 Sprr genes is upregulated. The Sprr genes are clustered within the larger epidermal differentiation complex on mouse chromosome 3, human chromosome 1q21. The clustering of the Sprr genes and their upregulation under stress suggest that these genes may be coordinately regulated. To identify enhancer elements that regulate this stress response activation of the Sprr locus, we utilized bioinformatic tools and classical biochemical dissection. Long-range comparative sequence analysis identified conserved noncoding sequences (CNSs). Clusters of epidermal-specific DNaseI-hypersensitive sites (HSs) mapped to specific CNSs. Increased prevalence of these HSs in barrier-deficient epidermis provides in vivo evidence of the regulation of the Sprr locus by these conserved sequences. Individual components of these HSs were cloned, and one was shown to have strong enhancer activity specific to conditions when the Sprr genes are coordinately upregulated. PMID:15574822

  10. Genome-wide identification of pathogenicity factors of the free-living amoeba Naegleria fowleri.

    PubMed

    Zysset-Burri, Denise C; Müller, Norbert; Beuret, Christian; Heller, Manfred; Schürch, Nadia; Gottstein, Bruno; Wittwer, Matthias

    2014-06-19

    The free-living amoeba Naegleria fowleri is the causative agent of the rapidly progressing and typically fatal primary amoebic meningoencephalitis (PAM) in humans. Despite the devastating nature of this disease, which results in >97% mortality, knowledge of the pathogenic mechanisms of the amoeba is incomplete. This work presents a comparative proteomic approach based on an experimental model in which the pathogenic potential of N. fowleri trophozoites is influenced by the composition of different media. As a scaffold for proteomic analysis, we sequenced the genome and transcriptome of N. fowleri. The sequence similarity to the recently published genome of Naegleria gruberi was far lower than the close taxonomic relationship of these species would suggest. A de novo sequencing approach was therefore chosen. After excluding cell regulatory mechanisms arising from the different media compositions, we identified 22 proteins with a potential role in the pathogenesis of PAM. Functional annotation of these proteins revealed that the membrane is the major location where the amoeba exerts its pathogenic potential. This may involve actin-dependent processes such as intracellular trafficking via vesicles. This study describes for the first time the 30-Mb genome and the transcriptome sequence of N. fowleri. It provides the basis for the further definition of effective intervention strategies against this rare but highly fatal form of amoebic meningoencephalitis.

  11. Aplysia attractin: biophysical characterization and modeling of a water-borne pheromone.

    PubMed Central

    Schein, C H; Nagle, G T; Page, J S; Sweedler, J V; Xu, Y; Painter, S D; Braun, W

    2001-01-01

    Attractin, a 58-residue protein secreted by the mollusk Aplysia californica, stimulates sexually mature animals to approach egg cordons. Attractins from five different Aplysia species are approximately 40% identical in sequence. Recombinant attractin was expressed in insect cells and purified by reverse-phase high-performance liquid chromatography (RP-HPLC). It is active in a bioassay using A. brasiliana, and its circular dichroism (CD) spectrum indicates a predominantly alpha-helical structure. Matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) characterization of proteolytic fragments identified disulfide bonds between the six conserved cysteines (I-VI, II-V, III-IV, where the Roman numeral indicates the order of occurrence in the primary sequence). Attractin has no significant similarity to any other sequence in the database. The protozoan Euplotes pheromones were selected by fold recognition as possible templates. These diverse proteins have three alpha-helices, with six cysteine residues disulfide-bonded in a different pattern from attractin. Model structures with good stereochemical parameters were prepared using the EXDIS/DIAMOD/FANTOM program suite. The modeling used constraints based on sequence alignments with the Euplotes templates and on the attractin disulfide bonds. A potential receptor-binding site is suggested based on these data. Future structural characterization of attractin will be needed to confirm these models. PMID:11423429
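
    The Roman-numeral notation indexes cysteines by their order of occurrence, not by residue number. The snippet below converts such a connectivity pattern into residue positions. It uses a made-up toy sequence, not attractin's, and the helper names are hypothetical.

```python
# Roman numerals for the first six cysteines, as used in disulfide notation.
ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5, "VI": 6}


def cysteine_positions(seq: str) -> list[int]:
    """1-based positions of cysteine residues in order of occurrence."""
    return [i + 1 for i, aa in enumerate(seq.upper()) if aa == "C"]


def map_disulfides(seq: str, pairs: list[tuple[str, str]]) -> list[tuple[int, int]]:
    """Translate Roman-numeral cysteine pairs into residue-position pairs."""
    cys = cysteine_positions(seq)
    needed = max(ROMAN[n] for pair in pairs for n in pair)
    if len(cys) < needed:
        raise ValueError(f"sequence has {len(cys)} cysteines, pattern needs {needed}")
    return [(cys[ROMAN[x] - 1], cys[ROMAN[y] - 1]) for x, y in pairs]


toy = "ACDCEFCGHCIKCLMC"  # made-up sequence with six cysteines
print(map_disulfides(toy, [("I", "VI"), ("II", "V"), ("III", "IV")]))
# [(2, 16), (4, 13), (7, 10)]
```

    An I-VI, II-V, III-IV pattern is fully nested: each bond encloses the next. Comparing such mapped positions against a template's bonding pattern is one way to see how attractin's connectivity differs from that of the Euplotes pheromones.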

  12. Characterization of constitutive and putative differentially expressed mRNAs by means of expressed sequence tags, differential display reverse transcriptase-PCR and randomly amplified polymorphic DNA-PCR from the sand fly vector Lutzomyia longipalpis.

    PubMed

    Ramalho-Ortigão, J M; Temporal, P; de Oliveira, S M; Barbosa, A F; Vilela, M L; Rangel, E F; Brazil, R P; Traub-Cseko, Y M

    2001-01-01

    Molecular studies of insect disease vectors are of paramount importance for understanding parasite-vector relationships. Advances in this area have led to important findings regarding changes in vector physiology upon blood feeding and parasite infection. Mechanisms for interfering with the vectorial capacity of insects that transmit diseases such as malaria, Chagas disease and dengue fever are being devised, with the ultimate goal of developing transgenic insects. A primary requirement for this goal is information on gene expression and control in the target insect. Our group is investigating molecular aspects of the interaction between Leishmania parasites and Lutzomyia sand flies. As an initial step, we randomly sequenced cDNA clones to identify expressed sequence tags (ESTs). The clones came from two expression libraries made from the head/thorax and the abdomen of sugar-fed L. longipalpis. We applied differential display reverse transcriptase-PCR and randomly amplified polymorphic DNA-PCR to characterize differentially expressed mRNA from sugar-fed and blood-fed insects, and, in one case, from an L. (V.) braziliensis-infected L. longipalpis. We identified 37 cDNAs that showed homology to known sequences in GenBank. Of these, 32 cDNAs code for constitutive proteins such as zinc finger protein, glutamine synthetase, G binding protein and ubiquitin-conjugating enzyme. Three are putative differentially expressed cDNAs from blood-fed and Leishmania-infected midgut: a chitinase, a V-ATPase and a MAP kinase. Finally, two sequences are homologous to Drosophila melanogaster gene products recently discovered through the Drosophila genome initiative.

  13. Shotgun protein sequencing: assembly of peptide tandem mass spectra from mixtures of modified proteins.

    PubMed

    Bandeira, Nuno; Clauser, Karl R; Pevzner, Pavel A

    2007-07-01

    Despite significant advances in the identification of known proteins, the analysis of unknown proteins by MS/MS still remains a challenging open problem. Although Klaus Biemann recognized the potential of MS/MS for sequencing of unknown proteins in the 1980s, low throughput Edman degradation followed by cloning still remains the main method to sequence unknown proteins. The automated interpretation of MS/MS spectra has been limited by a focus on individual spectra and has not capitalized on the information contained in spectra of overlapping peptides. Indeed the powerful shotgun DNA sequencing strategies have not been extended to automated protein sequencing. We demonstrate, for the first time, the feasibility of automated shotgun protein sequencing of protein mixtures by utilizing MS/MS spectra of overlapping and possibly modified peptides generated via multiple proteases of different specificities. We validate this approach by generating highly accurate de novo reconstructions of multiple regions of various proteins in western diamondback rattlesnake venom. We further argue that shotgun protein sequencing has the potential to overcome the limitations of current protein sequencing approaches and thus catalyze the otherwise impractical applications of proteomics methodologies in studies of unknown proteins.
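
    The core idea is that overlapping peptides from proteases with different specificities can be stitched into longer stretches of sequence. A greedy overlap assembler illustrates this. This is a deliberate simplification. The sketch works on already-sequenced peptide strings, whereas the study assembles MS/MS spectra of possibly modified peptides directly. The function names, the greedy merge rule and the `min_len` parameter are assumptions of this sketch, and the peptides are a made-up toy example.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that is also a prefix of b, or 0 if < min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(peptides: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge the pair of contigs with the longest overlap until none remain."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    contigs = list(dict.fromkeys(peptides))  # drop exact duplicates, keep order
    # Drop peptides wholly contained in another one; they add no new sequence.
    contigs = [p for p in contigs if not any(p != q and p in q for q in contigs)]
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)] + [merged]


reads = ["MKTAYIAKQ", "IAKQRQISF", "ISFVKSHFSRQ"]  # toy overlapping peptides
print(greedy_assemble(reads))  # ['MKTAYIAKQRQISFVKSHFSRQ']
```

    Greedy merging can mis-join repeated stretches. Working on spectra also has to tolerate noisy and missing fragment peaks, so a practical spectral assembler is considerably more involved than this.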

  14. Efficient CRISPR-mediated mutagenesis in primary immune cells using CrispRGold and a C57BL/6 Cas9 transgenic mouse line.

    PubMed

    Chu, Van Trung; Graf, Robin; Wirtz, Tristan; Weber, Timm; Favret, Jeremy; Li, Xun; Petsch, Kerstin; Tran, Ngoc Tung; Sieweke, Michael H; Berek, Claudia; Kühn, Ralf; Rajewsky, Klaus

    2016-11-01

    Applying clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR associated protein 9 (Cas9)-mediated mutagenesis to primary mouse immune cells, we used high-fidelity single guide RNAs (sgRNAs) designed with an sgRNA design tool (CrispRGold) to target genes in primary B cells, T cells, and macrophages isolated from a Cas9 transgenic mouse line. Using this system, we achieved an average knockout efficiency of 80% in B cells. On this basis, we established a robust small-scale CRISPR-mediated screen in these cells and identified genes essential for B-cell activation and plasma cell differentiation. This screening system does not require deep sequencing and may serve as a precedent for the application of CRISPR/Cas9 to primary mouse cells.

  15. Crystal structure and confirmation of the alanine:glyoxylate aminotransferase activity of the YFL030w yeast protein.

    PubMed

    Meyer, Philippe; Liger, Dominique; Leulliot, Nicolas; Quevillon-Cheruel, Sophie; Zhou, Cong-Zhao; Borel, Franck; Ferrer, Jean-Luc; Poupon, Anne; Janin, Joël; van Tilbeurgh, Herman

    2005-12-01

    We have determined the three-dimensional crystal structure of the protein encoded by the open reading frame YFL030w from Saccharomyces cerevisiae to a resolution of 2.6 Å using single wavelength anomalous diffraction. YFL030w is a 385 amino-acid protein with sequence similarity to the aminotransferase family. The structure of the protein reveals a homodimer adopting the fold-type I of pyridoxal 5'-phosphate (PLP)-dependent aminotransferases. The PLP co-factor is covalently bound to the active site in the crystal structure. The protein shows close structural resemblance with the human alanine:glyoxylate aminotransferase (EC 2.6.1.44), an enzyme involved in the hereditary kidney stone disease primary hyperoxaluria type 1. In this paper we show that YFL030w codes for an alanine:glyoxylate aminotransferase, highly specific for its amino donor and acceptor substrates.

  16. Geary autocorrelation and DCCA coefficient: Application to predict apoptosis protein subcellular localization via PSSM

    NASA Astrophysics Data System (ADS)

    Liang, Yunyun; Liu, Sanyang; Zhang, Shengli

    2017-02-01

    Apoptosis is a fundamental process controlling normal tissue homeostasis by regulating a balance between cell proliferation and death. Predicting the subcellular location of apoptosis proteins is very helpful for understanding the mechanism of programmed cell death. Prediction of apoptosis protein subcellular location is still a challenging and complicated task, and existing methods are mainly based on protein primary sequences. In this paper, we propose a new position-specific scoring matrix (PSSM)-based model using the Geary autocorrelation function and the detrended cross-correlation coefficient (DCCA coefficient). A 270-dimensional (270D) feature vector is then constructed on three widely used datasets, ZD98, ZW225 and CL317, and a support vector machine is adopted as the classifier. Under rigorous jackknife testing, the overall prediction accuracies are significantly improved. The results show that our model offers a reliable and effective PSSM-based tool for prediction of apoptosis protein subcellular localization.
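
    For readers unfamiliar with the descriptor, the Geary autocorrelation of a property series P_1, ..., P_L at lag d is commonly defined as C(d) = [Σ_{i=1}^{L-d} (P_i - P_{i+d})² / (2(L-d))] / [Σ_{i=1}^{L} (P_i - P̄)² / (L-1)]. The sketch below applies it to each column of a PSSM for lags 1 to max_lag. It assumes the PSSM is given as a list of L rows with 20 scores each; the function names and lag range are illustrative, and it does not reproduce how the authors combine Geary and DCCA terms into 270 dimensions.

```python
from statistics import fmean


def geary_autocorrelation(values, lag):
    """Geary autocorrelation C(lag) of a 1-D numeric series."""
    x = [float(v) for v in values]
    n = len(x)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(values)")
    mean = fmean(x)
    variance = sum((v - mean) ** 2 for v in x) / (n - 1)
    if variance == 0:
        return 0.0  # undefined for a constant series; this sketch returns 0.0 by convention
    lagged = sum((x[i] - x[i + lag]) ** 2 for i in range(n - lag)) / (2 * (n - lag))
    return lagged / variance


def pssm_geary_features(pssm, max_lag):
    """Geary autocorrelation of every PSSM column for lags 1..max_lag.

    `pssm` is a list of L rows, each holding one score per amino acid (20 columns).
    Returns a flat list of len(columns) * max_lag values, grouped by column.
    """
    columns = list(zip(*pssm))
    return [geary_autocorrelation(col, lag)
            for col in columns
            for lag in range(1, max_lag + 1)]


print(geary_autocorrelation([1, 2, 3, 4, 5], 1))  # smooth trend: value well below 1
print(geary_autocorrelation([1, -1, 1, -1], 1))   # alternating signs: value above 1
```

    Values near 1 indicate no autocorrelation at that lag; values below 1 indicate that residues d positions apart tend to have similar scores.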

  17. Detrended cross-correlation coefficient: Application to predict apoptosis protein subcellular localization.

    PubMed

    Liang, Yunyun; Liu, Sanyang; Zhang, Shengli

    2016-12-01

    Apoptosis, or programmed cell death, plays a central role in the development and homeostasis of an organism. Obtaining information on the subcellular location of apoptosis proteins is very helpful for understanding the apoptosis mechanism. The prediction of subcellular localization of an apoptosis protein is still a challenging task, and existing methods are mainly based on protein primary sequences. In this paper, we introduce a new position-specific scoring matrix (PSSM)-based method using the detrended cross-correlation (DCCA) coefficient of non-overlapping windows. A 190-dimensional (190D) feature vector is then constructed on two widely used datasets, CL317 and ZD98, and a support vector machine is adopted as the classifier. To evaluate the proposed method, objective and rigorous jackknife cross-validation tests are performed on the two datasets. The results show that our approach offers a novel and reliable PSSM-based tool for prediction of apoptosis protein subcellular localization. Copyright © 2016 Elsevier Inc. All rights reserved.
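
    The DCCA coefficient of two series is usually computed by (i) integrating each mean-centered series into a cumulative profile, (ii) splitting the profiles into boxes and removing a least-squares linear trend within each box, and (iii) dividing the averaged detrended covariance by the product of the two detrended standard deviations, which gives a value in [-1, 1]. The sketch below follows those steps with non-overlapping boxes of a fixed size, as the abstract describes; box handling, normalization details and all function names are assumptions, not the authors' code. In the PSSM setting, x and y would presumably be two PSSM columns; 190 equals the number of column pairs (20 × 19 / 2), which is suggestive, but the abstract does not say so.

```python
from itertools import accumulate
from statistics import fmean


def _profile(series):
    """Cumulative sum of the mean-centered series."""
    mean = fmean(series)
    return list(accumulate(v - mean for v in series))


def _detrended_residuals(segment):
    """Residuals after fitting a least-squares line to the segment."""
    n = len(segment)
    t_mean = (n - 1) / 2
    v_mean = fmean(segment)
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(segment)) / sxx
    return [v - (v_mean + slope * (t - t_mean)) for t, v in enumerate(segment)]


def _detrended_cov(px, py, window):
    """Average detrended covariance over non-overlapping boxes (a trailing partial box is dropped)."""
    total, boxes = 0.0, 0
    for start in range(0, len(px) - window + 1, window):
        rx = _detrended_residuals(px[start:start + window])
        ry = _detrended_residuals(py[start:start + window])
        total += sum(a * b for a, b in zip(rx, ry)) / window
        boxes += 1
    return total / boxes


def dcca_coefficient(x, y, window):
    """DCCA cross-correlation coefficient of two equal-length series."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    if not 3 <= window <= len(x):
        raise ValueError("window must satisfy 3 <= window <= len(x)")
    px, py = _profile(x), _profile(y)
    fxy = _detrended_cov(px, py, window)
    fxx = _detrended_cov(px, px, window)
    fyy = _detrended_cov(py, py, window)
    return fxy / (fxx * fyy) ** 0.5
```

    A window of at least three points is required because a line fits any two points exactly, leaving zero residual variance; a series whose profile is exactly linear in every box would likewise make the denominator zero.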

  18. Searching for Potential Silicon-associated Genes in Cyanobacteria

    NASA Astrophysics Data System (ADS)

    Collier, J.; Brzezinski, M. A.; Baines, S. B.; Krause, J. W.; Ohnemus, D.; Twining, B. S.

    2016-02-01

    Recent studies have demonstrated the accumulation of Si in both wild cells and laboratory cultures of marine Synechococcus. Because of their abundance, the cellular Si quotas measured are sufficient to suggest a substantial, unrecognized role for these organisms in the marine Si cycle. Since there is no known role for Si in cyanobacteria, we are using sequenced cyanobacterial genomes to search for pathways of Si metabolism known from other organisms. Si transporters belonging to four different protein superfamilies have been identified in diverse Si-metabolizing organisms, including diatoms and other protists, plants, bacteria, and sponges. A homolog of ArsB/Lsi2, the arsenite-antimonite efflux porter that can also transport silicate in plants, can be found in many cyanobacteria. However, we have been unable to identify likely influx porter homologs in cyanobacteria, except for predicted proteins with similarity to diatom SIT but only half the length, as well as a few atypical members of the Major Intrinsic Protein (aquaporin) superfamily. Proteins catalyzing and/or controlling the polymerization of silica have been identified in diatoms and sponges. We have been unable to identify clear homologs of these proteins in cyanobacteria, although cathepsins (belonging to the same protein superfamily as silicateins) are broadly present in cyanobacteria. Proteins that may bind silica in other bacteria (CotB in Bacillus) also lack clear homologs in cyanobacteria. However, since the function of these proteins may depend largely on charge and protein folding characteristics, proteins involved in Si deposition may not be readily identifiable by primary sequence similarity. The broad diversity of proteins involved in Si metabolism in diverse organisms suggests that each had an independent evolutionary origin. Our results suggest that if Si-associated proteins exist in Synechococcus, they also may have a distinct evolutionary origin unrelated to known Si metabolic pathways.

  19. Gene Unprediction with Spurio: A tool to identify spurious protein sequences.

    PubMed

    Höps, Wolfram; Jeffryes, Matt; Bateman, Alex

    2018-01-01

    We now have access to the sequences of tens of millions of proteins. These protein sequences are essential for modern molecular biology and computational biology. The vast majority of protein sequences are derived from gene prediction tools and have no experimental supporting evidence for their translation. Despite the increasing accuracy of gene prediction tools, there likely exists a large number of spurious protein predictions in the sequence databases. We have developed the Spurio tool to help identify spurious protein predictions in prokaryotes. Spurio searches the query protein sequence against a prokaryotic nucleotide database using tblastn and identifies homologous sequences. The tblastn matches are used to score the query sequence's likelihood of being a spurious protein prediction using a Gaussian process model. The most informative feature is the appearance of stop codons within the presumed translation of homologous DNA sequences. Benchmarking shows that the Spurio tool is able to distinguish spurious from true proteins. However, transposon proteins are prone to be predicted as spurious because of the frequency of degraded homologs found in the DNA sequence databases. Our initial experiments suggest that less than 1% of the proteins in the UniProtKB sequence database are likely to be spurious and that Spurio is able to identify over 60 times more spurious proteins than the AntiFam resource. The Spurio software and source code are available under an MIT license at the following URL: https://bitbucket.org/bateman-group/spurio.
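
    The feature the authors single out, stop codons appearing inside the presumed translation of homologous DNA, can be illustrated with a simple count. The sketch assumes the standard stop codons TAA, TAG and TGA (shared by the bacterial genetic code) and an already-extracted DNA segment in a known reading frame; it is not Spurio's tblastn pipeline or its Gaussian process scoring, and the function names are hypothetical.

```python
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})


def count_stop_codons(dna: str, frame: int = 0) -> int:
    """Count in-frame stop codons in a DNA (or RNA) segment read from offset `frame`."""
    seq = dna.upper().replace("U", "T")
    return sum(1 for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS)


def stop_codon_rate(dna: str, frame: int = 0) -> float:
    """Stop codons per complete codon; a genuine coding homolog would be expected to score near zero."""
    n_codons = (len(dna) - frame) // 3
    return count_stop_codons(dna, frame) / n_codons if n_codons > 0 else 0.0


print(count_stop_codons("ATGTAAGGGTGA"))           # -> 2
print(count_stop_codons("ATGTAAGGGTGA", frame=1))  # -> 0
```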

  20. Functional characterisation of the Schizosaccharomyces pombe homologue of the leukaemia-associated translocation breakpoint binding protein translin and its binding partner, TRAX.

    PubMed

    Jaendling, Alessa; Ramayah, Soshila; Pryce, David W; McFarlane, Ramsay J

    2008-02-01

    Translin is a conserved protein that associates with the breakpoint junctions of chromosomal translocations linked with the development of some human cancers. It binds to both DNA and RNA and has been implicated in mRNA metabolism and regulation of genome stability. It has a binding partner, translin-associated protein X (TRAX), whose levels are regulated by the translin protein in higher eukaryotes. In this study we find that this regulatory function is conserved in the lower eukaryotes, suggesting that translin and TRAX have important functions that provide a selective advantage to both unicellular and multicellular eukaryotes and that this function may not be tissue-specific in nature. However, to date, the biological importance of translin and TRAX remains unclear. Here we systematically investigate proposals that translin and TRAX play roles in controlling mitotic cell proliferation, DNA damage responses, genome stability, meiotic/mitotic recombination and stability of GT-rich repeat sequences. We find no evidence for a primary function of translin and/or TRAX in these pathways, indicating that the conserved biochemical function of translin is not implicated in primary pathways for regulating genome stability and/or segregation.

  1. Prediction of redox-sensitive cysteines using sequential distance and other sequence-based features.

    PubMed

    Sun, Ming-An; Zhang, Qing; Wang, Yejun; Ge, Wei; Guo, Dianjing

    2016-08-24

    Reactive oxygen species can modify the structure and function of proteins and may also act as important signaling molecules in various cellular processes. Cysteine thiol groups of proteins are particularly susceptible to oxidation, and their reversible oxidation plays critical roles in redox regulation and signaling. Recently, several computational tools have been developed for predicting redox-sensitive cysteines; however, those methods either focus only on catalytic redox-sensitive cysteines in thiol oxidoreductases or depend heavily on protein structural data, and thus cannot be widely applied. In this study, we analyzed various sequence-based features potentially related to cysteine redox-sensitivity, and identified three types of features for efficient computational prediction of redox-sensitive cysteines. These features are: sequential distance to the nearby cysteines, PSSM profile and predicted secondary structure of flanking residues. After further feature selection using SVM-RFE, we developed the Redox-Sensitive Cysteine Predictor (RSCP), an SVM-based classifier for redox-sensitive cysteine prediction using primary sequence only. Using 10-fold cross-validation on the RSC758 dataset, the accuracy, sensitivity, specificity, MCC and AUC were estimated as 0.679, 0.602, 0.756, 0.362 and 0.727, respectively. When evaluated using 10-fold cross-validation on the BALOSCTdb dataset, which includes structural information, the model achieved performance comparable to the current structure-based method. Further validation using an independent dataset indicates that it is robust and of relatively better accuracy for predicting redox-sensitive cysteines from non-enzyme proteins. In this study, we developed a sequence-based classifier for predicting redox-sensitive cysteines. The major advantage of this method is that it does not rely on protein structure data, which allows broader application than other current implementations.
Accurate prediction of redox-sensitive cysteines not only enhances our understanding of the redox sensitivity of cysteine but may also complement the proteomics approach and facilitate further experimental investigation of important redox-sensitive cysteines.
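
    One of the three feature types, sequential distance to the nearby cysteines, can be computed from sequence alone. The sketch below reports, for each cysteine, the number of residues to the nearest cysteine on each side. How RSCP encodes or windows these distances is not specified in the abstract, so this representation (0-based positions, and None when there is no neighbor) is an assumption, and the function name is hypothetical.

```python
from typing import Dict, Optional, Tuple


def cysteine_neighbor_distances(sequence: str) -> Dict[int, Tuple[Optional[int], Optional[int]]]:
    """Map each cysteine's 0-based position to (distance to previous C, distance to next C)."""
    positions = [i for i, aa in enumerate(sequence.upper()) if aa == "C"]
    distances = {}
    for k, pos in enumerate(positions):
        left = pos - positions[k - 1] if k > 0 else None
        right = positions[k + 1] - pos if k + 1 < len(positions) else None
        distances[pos] = (left, right)
    return distances


print(cysteine_neighbor_distances("ACDCAAC"))  # -> {1: (None, 2), 3: (2, 3), 6: (3, None)}
```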

  2. Structure, function, and evolution of bacterial ATP-binding cassette systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Davidson, A.L.; Dassa, E.; Orelle, C.

    2010-07-27

    The ATP-binding cassette (ABC) systems constitute one of the largest superfamilies of paralogous sequences. All ABC systems share a highly conserved ATP-hydrolyzing domain or protein (the ABC; also referred to as a nucleotide-binding domain [NBD]) that is unequivocally characterized by three short sequence motifs (Fig. 1): these are the Walker A and Walker B motifs, indicative of the presence of a nucleotide-binding site, and the signature motif, unique to ABC proteins, located upstream of the Walker B motif (426). Other motifs diagnostic of ABC proteins are also indicated in Fig. 1. The biological significance of these motifs is discussed in Structure, Function, and Dynamics of the ABC. ABC systems are widespread among living organisms and have been detected in all genera of the three kingdoms of life, with remarkable conservation in the primary sequence of the cassette and in the organization of the constitutive domains or subunits (203, 420). ABC systems couple the energy of ATP hydrolysis to an impressively large variety of essential biological phenomena, comprising not only transmembrane (TM) transport, for which they are best known, but also several non-transport-related processes, such as translation elongation (62) and DNA repair (174). Although ABC systems deserve much attention because they are involved in severe human inherited diseases (107), they were first discovered and characterized in detail in prokaryotes, as early as the 1970s (13, 148, 238, 468). The most extensively analyzed systems were the high-affinity histidine and maltose uptake systems of Salmonella enterica serovar Typhimurium and Escherichia coli.
Over 2 decades ago, after the completion of the nucleotide sequences encoding these transporters in the respective laboratories of Giovanna Ames and Maurice Hofnung, Hiroshi Nikaido and colleagues noticed that the two systems displayed a global similarity in the nature of their components and, moreover, that the primary sequences of MalK and HisP, the proteins suspected to energize these transporters, shared as much as 32% identity in amino acid residues when their sequences were aligned (171). Later, it was found that several bacterial proteins involved in uptake of nutrients, export of toxins, cell division, bacterial nodulation of plants, and DNA repair displayed the same similarity in their sequences (127, 196). This led to the notion that the conserved protein, which had been shown to bind ATP (198, 201), would probably energize the systems mentioned above by coupling the energy of ATP hydrolysis to transport. The latter was demonstrated with the maltose and histidine transporters by use of isolated membrane vesicles (105, 379) and purified transporters reconstituted into proteoliposomes (30, 98). The determination of the sequence of the first eukaryotic protein strongly similar to these bacterial transporters (the P-glycoprotein, involved in resistance of cancer cells to multiple drugs) (169, 179) demonstrated that these proteins were not restricted to prokaryotes. Two names, 'traffic ATPases' (15) and the more accepted name 'ABC transporters' (193, 218), were proposed for members of this new superfamily. ABC systems can be divided into three main functional categories, as follows. Importers mediate the uptake of nutrients in prokaryotes. The nature of the substrates that are transported is very wide, including mono- and oligosaccharides, organic and inorganic ions, amino acids, peptides, iron-siderophores, metals, polyamine cations, opines, and vitamins.
Exporters are involved in the secretion of various molecules, such as peptides, lipids, hydrophobic drugs, polysaccharides, and proteins, including toxins such as hemolysin. The third category of systems is apparently not involved in transport, with some members being involved in translation of mRNA and in DNA repair. Despite the large, diverse population of substrates handled and the difference in the polarity of transport, importers and exporters share a common organization made of two hydrophobic membrane-spanning or integral membrane (IM) domains and two hydrophilic domains carrying the ABC peripherally associated with the IM domains on the cytosolic side of the membrane (26). In importers, these four domains are almost always independent polypeptide chains that come together to form a multimeric complex. In most exporters, including the E. coli hemolysin exporter HlyB, the N-terminal IM and the C-terminal ABC domains are fused as a single polypeptide chain (IM-ABC). An inverted organization in which the IM domain is C-terminal with respect to the ABC domain (ABC-IM) exists, such as in the MacB protein, involved in macrolide resistance in E. coli. No IM domain partners have been identified for ABC proteins falling into the third category, and these proteins consist of two ABCs fused together (ABC2).
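
    The 32% identity reported for MalK and HisP refers to identical residues in a pairwise alignment. Percent identity has several conventions (dividing by the full alignment length, by the ungapped columns, or by the length of the shorter sequence), and the review does not say which was used. The sketch below assumes the two sequences are already aligned, with '-' as the gap character, and divides by the columns where neither sequence has a gap; the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical residues over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


print(percent_identity("AC-GT", "ACTGA"))  # -> 75.0
```

    Under a different convention (for example, dividing by all five alignment columns) the same toy pair would score 60%, so identity figures are only comparable when the convention is known.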

  3. A proteomic analysis of leaf sheaths from rice.

    PubMed

    Shen, Shihua; Matsubae, Masami; Takao, Toshifumi; Tanaka, Naoki; Komatsu, Setsuko

    2002-10-01

    The proteins extracted from the leaf sheaths of rice seedlings were separated by 2-D PAGE, and analyzed by Edman sequencing and mass spectrometry, followed by database searching. Image analysis revealed 352 protein spots on 2-D PAGE after staining with Coomassie Brilliant Blue. The amino acid sequences of 44 of 84 proteins were determined; for 31 of these proteins, a clear function could be assigned, whereas for 12 proteins, no function could be assigned. Forty proteins did not yield amino acid sequence information, because they were N-terminally blocked, or the obtained sequences were too short and/or did not give unambiguous results. Fifty-nine proteins were analyzed by mass spectrometry; all of these proteins were identified by matching to the protein database. The amino acid sequences of 19 of 27 proteins analyzed by mass spectrometry were similar to the results of Edman sequencing. These results suggest that 2-D PAGE combined with Edman sequencing and mass spectrometry analysis can be effectively used to identify plant proteins.

  4. NetTurnP – Neural Network Prediction of Beta-turns by Use of Evolutionary Information and Predicted Protein Sequence Features

    PubMed Central

    Petersen, Bent; Lundegaard, Claus; Petersen, Thomas Nordahl

    2010-01-01

    β-turns are the most common type of non-repetitive structures, and constitute on average 25% of the amino acids in proteins. The formation of β-turns plays an important role in protein folding, protein stability and molecular recognition processes. In this work we present the neural network method NetTurnP, for prediction of two-class β-turns and prediction of the individual β-turn types, by use of evolutionary information and predicted protein sequence features. It has been evaluated against a commonly used dataset BT426, and achieves a Matthews correlation coefficient of 0.50, which is the highest reported performance on a two-class prediction of β-turn and not-β-turn. Furthermore NetTurnP shows improved performance on some of the specific β-turn types. In the present work, neural network methods have been trained to predict β-turn or not and individual β-turn types from the primary amino acid sequence. The individual β-turn types I, I', II, II', VIII, VIa1, VIa2, VIba and IV have been predicted based on classifications by PROMOTIF, and the two-class prediction of β-turn or not is a superset comprised of all β-turn types. The performance is evaluated using a golden set of non-homologous sequences known as BT426. Our two-class prediction method achieves a performance of: MCC = 0.50, Qtotal = 82.1%, sensitivity = 75.6%, PPV = 68.8% and AUC = 0.864. We have compared our performance to eleven other prediction methods that obtain Matthews correlation coefficients in the range of 0.17-0.47. For the type specific β-turn predictions, only type I and II can be predicted with reasonable Matthews correlation coefficients, where we obtain performance values of 0.36 and 0.31, respectively. Conclusion: The NetTurnP method has been implemented as a webserver, which is freely available at http://www.cbs.dtu.dk/services/NetTurnP/. NetTurnP is the only available webserver that allows submission of multiple sequences. PMID:21152409

  5. NetTurnP--neural network prediction of beta-turns by use of evolutionary information and predicted protein sequence features.

    PubMed

    Petersen, Bent; Lundegaard, Claus; Petersen, Thomas Nordahl

    2010-11-30

    β-turns are the most common type of non-repetitive structures, and constitute on average 25% of the amino acids in proteins. The formation of β-turns plays an important role in protein folding, protein stability and molecular recognition processes. In this work we present the neural network method NetTurnP, for prediction of two-class β-turns and prediction of the individual β-turn types, by use of evolutionary information and predicted protein sequence features. It has been evaluated against a commonly used dataset BT426, and achieves a Matthews correlation coefficient of 0.50, which is the highest reported performance on a two-class prediction of β-turn and not-β-turn. Furthermore NetTurnP shows improved performance on some of the specific β-turn types. In the present work, neural network methods have been trained to predict β-turn or not and individual β-turn types from the primary amino acid sequence. The individual β-turn types I, I', II, II', VIII, VIa1, VIa2, VIba and IV have been predicted based on classifications by PROMOTIF, and the two-class prediction of β-turn or not is a superset comprised of all β-turn types. The performance is evaluated using a golden set of non-homologous sequences known as BT426. Our two-class prediction method achieves a performance of: MCC=0.50, Qtotal=82.1%, sensitivity=75.6%, PPV=68.8% and AUC=0.864. We have compared our performance to eleven other prediction methods that obtain Matthews correlation coefficients in the range of 0.17-0.47. For the type specific β-turn predictions, only type I and II can be predicted with reasonable Matthews correlation coefficients, where we obtain performance values of 0.36 and 0.31, respectively. The NetTurnP method has been implemented as a webserver, which is freely available at http://www.cbs.dtu.dk/services/NetTurnP/. NetTurnP is the only available webserver that allows submission of multiple sequences.
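
    The NetTurnP records above report standard two-class performance measures. As a reminder of how they relate, the sketch below computes them from confusion-matrix counts using the usual definitions: sensitivity = TP/(TP+FN), PPV = TP/(TP+FP), Qtotal = (TP+TN)/N and MCC = (TP·TN - FP·FN)/√((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function name is hypothetical, and the counts in the usage line are made up for illustration; they are not taken from the study.

```python
from math import sqrt


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard two-class performance measures from confusion-matrix counts."""
    def ratio(num, den):
        return num / den if den else 0.0  # guard against empty rows/columns

    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "q_total": ratio(tp + tn, tp + fp + tn + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Illustrative counts only.
print(binary_metrics(tp=50, fp=10, tn=30, fn=10))
```

    Unlike Qtotal, MCC stays informative when one class (here, not-β-turn) dominates, which is one reason the abstracts lead with it.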

  6. Sequence space and the ongoing expansion of the protein universe.

    PubMed

    Povolotskaya, Inna S; Kondrashov, Fyodor A

    2010-06-17

    The need to maintain the structural and functional integrity of an evolving protein severely restricts the repertoire of acceptable amino-acid substitutions. However, it is not known whether these restrictions impose a global limit on how far homologous protein sequences can diverge from each other. Here we explore the limits of protein evolution using sequence divergence data. We formulate a computational approach to study the rate of divergence of distant protein sequences and measure this rate for ancient proteins, those that were present in the last universal common ancestor. We show that ancient proteins are still diverging from each other, indicating an ongoing expansion of the protein sequence universe. The slow rate of this divergence is imposed by the sparseness of functional protein sequences in sequence space and the ruggedness of the protein fitness landscape: approximately 98 per cent of sites cannot accept an amino-acid substitution at any given moment but a vast majority of all sites may eventually be permitted to evolve when other, compensatory, changes occur. Thus, approximately 3.5 × 10^9 years has not been enough to reach the limit of divergent evolution of proteins, and for most proteins the limit of sequence similarity imposed by common function may not exceed that of random sequences.

  7. ELMO Domains, Evolutionary and Functional Characterization of a Novel GTPase-activating Protein (GAP) Domain for Arf Protein Family GTPases

    PubMed Central

    East, Michael P.; Bowzard, J. Bradford; Dacks, Joel B.; Kahn, Richard A.

    2012-01-01

    The human family of ELMO domain-containing proteins (ELMODs) consists of six members and is defined by the presence of the ELMO domain. Within this family are two subclassifications of proteins, based on primary sequence conservation, protein size, and domain architecture, deemed ELMOD and ELMO. In this study, we used homology searching and phylogenetics to identify ELMOD family homologs in genomes from across eukaryotic diversity. This demonstrated not only that the protein family is ancient but also that ELMOs are potentially restricted to the supergroup Opisthokonta (Metazoa and Fungi), whereas proteins with the ELMOD organization are found in diverse eukaryotes and thus were likely the form present in the last eukaryotic common ancestor. The segregation of the ELMO clade from the larger ELMOD group is consistent with their contrasting functions as unconventional Rac1 guanine nucleotide exchange factors and the Arf family GTPase-activating proteins, respectively. We used unbiased, phylogenetic sorting and sequence alignments to identify the most highly conserved residues within the ELMO domain to identify a putative GAP domain within the ELMODs. Three independent but complementary assays were used to provide an initial characterization of this domain. We identified a highly conserved arginine residue critical for both the biochemical and cellular GAP activity of ELMODs. We also provide initial evidence of the function of human ELMOD1 as an Arf family GAP at the Golgi. These findings provide the basis for the future study of the ELMOD family of proteins and a new avenue for the study of Arf family GTPases. PMID:23014990

  8. Evaluation of the Contributions of Individual Viral Genes to Newcastle Disease Virus Virulence and Pathogenesis

    PubMed Central

    Paldurai, Anandan; Kim, Shin-Hee; Nayak, Baibaswata; Xiao, Sa; Shive, Heather; Collins, Peter L.

    2014-01-01

    Naturally occurring Newcastle disease virus (NDV) strains vary greatly in virulence. The presence of multibasic residues at the proteolytic cleavage site of the fusion (F) protein has been shown to be a primary determinant differentiating virulent versus avirulent strains. However, there is wide variation in virulence among virulent strains. There also are examples of incongruity between cleavage site sequence and virulence. These observations suggest that additional viral factors contribute to virulence. In this study, we evaluated the contribution of each viral gene to virulence individually and in different combinations by exchanging genes between velogenic (highly virulent) strain GB Texas (GBT) and mesogenic (moderately virulent) strain Beaudette C (BC). These two strains are phylogenetically closely related, and their F proteins contain identical cleavage site sequences, RRQKR↓F (residues 112 to 117). A total of 20 chimeric viruses were constructed and evaluated in vitro, in 1-day-old chicks, and in 2-week-old chickens. The results showed that both the envelope-associated and polymerase-associated proteins contribute to the difference in virulence between rBC and rGBT, with the envelope-associated proteins playing the greater role. The F protein was the major individual contributor and was sometimes augmented by the homologous M and HN proteins. The dramatic effect of F was independent of its cleavage site sequence since that was identical in the two strains. The polymerase L protein was the next major individual contributor and was sometimes augmented by the homologous N and P proteins. The leader and trailer regions did not appear to contribute to the difference in virulence between BC and GBT. Importance: This study is the first comprehensive and systematic study of NDV virulence and pathogenesis.
Genetic exchanges between a mesogenic and a velogenic strain revealed that the fusion glycoprotein is the major virulence determinant, even though both strains share the same virulence-associated protease cleavage site sequence. The contribution of the large polymerase protein to NDV virulence is second only to that of the fusion glycoprotein. The identification of virulence determinants is of considerable importance, because of the potential to generate better live attenuated NDV vaccines. It may also be possible to apply these findings to other paramyxoviruses. PMID:24850737

  9. Conformational analysis of processivity clamps in solution demonstrates that tertiary structure does not correlate with protein dynamics.

    PubMed

    Fang, Jing; Nevin, Philip; Kairys, Visvaldas; Venclovas, Česlovas; Engen, John R; Beuning, Penny J

    2014-04-08

    The relationship between protein sequence, structure, and dynamics has been elusive. Here, we report a comprehensive analysis using an in-solution experimental approach to study how the conservation of tertiary structure correlates with protein dynamics. Hydrogen exchange measurements of eight processivity clamp proteins from different species revealed that, despite highly similar three-dimensional structures, clamp proteins display a wide range of dynamic behavior. Differences were apparent both for structurally similar domains within proteins and for corresponding domains of different proteins. Several of the clamps contained regions that underwent local unfolding with different half-lives. We also observed a conserved pattern of alternating dynamics of the α helices lining the inner pore of the clamps as well as a correlation between dynamics and the number of salt bridges in these α helices. Our observations reveal that tertiary structure and dynamics are not directly correlated and that primary structure plays an important role in dynamics. Copyright © 2014 Elsevier Ltd. All rights reserved.

  10. Variants in ACTG2 underlie a substantial number of Australasian patients with primary chronic intestinal pseudo-obstruction.

    PubMed

    Ravenscroft, G; Pannell, S; O'Grady, G; Ong, R; Ee, H C; Faiz, F; Marns, L; Goel, H; Kumarasinghe, P; Sollis, E; Sivadorai, P; Wilson, M; Magoffin, A; Nightingale, S; Freckmann, M-L; Kirk, E P; Sachdev, R; Lemberg, D A; Delatycki, M B; Kamm, M A; Basnayake, C; Lamont, P J; Amor, D J; Jones, K; Schilperoort, J; Davis, M R; Laing, N G

    2018-05-21

    Primary chronic intestinal pseudo-obstruction (CIPO) is a rare, potentially life-threatening disorder characterized by severely impaired gastrointestinal motility. The objective of this study was to examine the contribution of ACTG2, LMOD1, MYH11, and MYLK mutations in an Australasian cohort of patients with a diagnosis of primary CIPO associated with visceral myopathy. Pediatric and adult patients with primary CIPO and suspected visceral myopathy were recruited from across Australia and New Zealand. Sanger sequencing of the genes encoding enteric gamma-actin (ACTG2) and smooth muscle leiomodin (LMOD1) was performed on DNA from patients, and their relatives, where available. MYH11 and MYLK were screened by next-generation sequencing. We identified heterozygous missense variants in ACTG2 in 7 of 17 families (~41%) diagnosed with CIPO and its associated conditions. We also identified a previously unpublished missense mutation (c.443C>T, p.Arg148Leu) in one family. One case presented with megacystis-microcolon-intestinal hypoperistalsis syndrome in utero with subsequent termination of pregnancy at 28 weeks' gestation. All of the substitutions identified occurred at arginine residues. No likely pathogenic variants in LMOD1, MYH11, or MYLK were identified within our cohort. ACTG2 mutations represent a significant underlying cause of primary CIPO with visceral myopathy and associated phenotypes in Australasian patients. Thus, ACTG2 sequencing should be considered in cases presenting with hypoperistalsis phenotypes with suspected visceral myopathy. It is likely that variants in other genes encoding enteric smooth muscle contractile proteins will contribute further to the genetic heterogeneity of hypoperistalsis phenotypes. © 2018 John Wiley & Sons Ltd.

  11. The 1-aminocyclopropane-1-carboxylate synthase of Cucurbita. Purification, properties, expression in Escherichia coli, and primary structure determination by DNA sequence analysis.

    PubMed

    Sato, T; Oeller, P W; Theologis, A

    1991-02-25

    The key regulatory enzyme in the biosynthetic pathway of the plant hormone ethylene is 1-aminocyclopropane-1-carboxylic acid (ACC) synthase (EC 4.4.1.14). We have partially purified ACC synthase 6,000-fold from Cucurbita fruit tissue treated with indoleacetic acid + benzyladenine + aminooxyacetic acid + LiCl. The enzyme, a dimer of two identical subunits of approximately 46,000 Da each, has a specific activity of 35,000 nmol/h/mg protein, a pH optimum of 9.5, an isoelectric point of 5.0, and a Km of 17 microM with respect to S-adenosylmethionine. The subunit exists in vivo as a 55,000-Da species similar in size to the primary in vitro translation product. DNA sequence analysis of the cDNA clone pACC1 revealed that the coding region of the ACC synthase mRNA encodes 493 amino acids, corresponding to a 55,779-Da polypeptide. The polypeptide expressed from the coding sequence (pACC1) in Escherichia coli, either as a COOH-terminal hybrid with beta-galactosidase or as a nonhybrid polypeptide, catalyzed the conversion of S-adenosylmethionine to ACC (Sato, T., and Theologis, A. (1989) Proc. Natl. Acad. Sci. U.S.A. 86, 6621-6625). Immunoblotting experiments herein show that the molecular mass of the beta-galactosidase hybrid polypeptide is 170,000 Da, and the size of the largest nonhybrid polypeptide is 53,000 Da. The data suggest that the enzyme is post-translationally processed during protein purification.

  12. Diversity in the origins of proteostasis networks: a driver for protein function in evolution

    PubMed Central

    Powers, Evan T.; Balch, William E.

    2013-01-01

    Although a protein’s primary sequence largely determines its function, proteins can adopt different folding states in response to changes in the environment, some of which may be deleterious to the organism. All organisms, including Bacteria, Archaea and Eukarya, have evolved a protein homeostasis network, or proteostasis network, that consists of chaperones and folding factors, degradation components, signalling pathways and specialized compartmentalized modules that manage protein folding in response to environmental stimuli and variation. Surveying the origins of proteostasis networks reveals that they have co-evolved with the proteome to regulate the physiological state of the cell, reflecting the unique stresses that different cells or organisms experience, and that they have a key role in driving evolution by closely managing the link between the phenotype and the genotype. PMID:23463216

  13. Protein-RNA interface residue prediction using machine learning: an assessment of the state of the art.

    PubMed

    Walia, Rasna R; Caragea, Cornelia; Lewis, Benjamin A; Towfic, Fadi; Terribilini, Michael; El-Manzalawy, Yasser; Dobbs, Drena; Honavar, Vasant

    2012-05-10

    RNA molecules play diverse functional and structural roles in cells. They function as messengers for transferring genetic information from DNA to proteins, as the primary genetic material in many viruses, as catalysts (ribozymes) important for protein synthesis and RNA processing, and as essential and ubiquitous regulators of gene expression in living organisms. Many of these functions depend on precisely orchestrated interactions between RNA molecules and specific proteins in cells. Understanding the molecular mechanisms by which proteins recognize and bind RNA is essential for comprehending the functional implications of these interactions, but the recognition 'code' that mediates interactions between proteins and RNA is not yet understood. Success in deciphering this code would dramatically impact the development of new therapeutic strategies for intervening in devastating diseases such as AIDS and cancer. Because of the high cost of experimental determination of protein-RNA interfaces, there is an increasing reliance on statistical machine learning methods for training predictors of RNA-binding residues in proteins. However, because of differences in the choice of datasets, performance measures, and data representations used, it has been difficult to obtain an accurate assessment of the current state of the art in protein-RNA interface prediction. We provide a review of published approaches for predicting RNA-binding residues in proteins and a systematic comparison and critical assessment of protein-RNA interface residue predictors trained using these approaches on three carefully curated non-redundant datasets. We directly compare two widely used machine learning algorithms (Naïve Bayes (NB) and Support Vector Machine (SVM)) using three different data representations in which features are encoded using either sequence- or structure-based windows. 
Our results show that (i) Sequence-based classifiers that use a position-specific scoring matrix (PSSM)-based representation (PSSMSeq) outperform those that use an amino acid identity based representation (IDSeq) or a smoothed PSSM (SmoPSSMSeq); (ii) Structure-based classifiers that use smoothed PSSM representation (SmoPSSMStr) outperform those that use PSSM (PSSMStr) as well as sequence identity based representation (IDStr). PSSMSeq classifiers, when tested on an independent test set of 44 proteins, achieve performance that is comparable to that of three state-of-the-art structure-based predictors (including those that exploit geometric features) in terms of Matthews Correlation Coefficient (MCC), although the structure-based methods achieve substantially higher Specificity (albeit at the expense of Sensitivity) compared to sequence-based methods. We also find that the expected performance of the classifiers on a residue level can be markedly different from that on a protein level. Our experiments show that the classifiers trained on three different non-redundant protein-RNA interface datasets achieve comparable cross-validation performance. However, we find that the results are significantly affected by differences in the distance threshold used to define interface residues. Our results demonstrate that protein-RNA interface residue predictors that use a PSSM-based encoding of sequence windows outperform classifiers that use other encodings of sequence windows. While structure-based methods that exploit geometric features can yield significant increases in the Specificity of protein-RNA interface residue predictions, such increases are offset by decreases in Sensitivity. 
These results underscore the importance of comparing alternative methods using rigorous statistical procedures, multiple performance measures, and datasets that are constructed based on several alternative definitions of interface residues and redundancy cutoffs as well as including evaluations on independent test sets into the comparisons.
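    The comparison above turns on MCC, Sensitivity, and Specificity, and on the gap between residue-level and protein-level performance. A minimal Python sketch of the standard confusion-matrix definitions follows. The per-protein averaging used for the "protein-level" figure is an assumption about one way such a score can be computed, not necessarily the authors' procedure, and all counts are hypothetical.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, specificity, and MCC for one binary confusion matrix.

    Positives are interface (RNA-binding) residues; negatives are all other residues.
    """
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # fraction of interface residues recovered
    specificity = tn / (tn + fp) if tn + fp else 0.0  # fraction of non-interface residues rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "mcc": mcc}


def pooled_vs_per_protein_mcc(per_protein_counts):
    """Return (pooled residue-level MCC, mean of per-protein MCCs).

    per_protein_counts: list of (tp, fp, tn, fn) tuples, one per protein (hypothetical data).
    """
    # Residue level: sum the confusion matrices over all proteins, then score once.
    totals = [sum(counts[k] for counts in per_protein_counts) for k in range(4)]
    pooled = classification_metrics(*totals)["mcc"]
    # Protein level (assumed definition): score each protein, then average.
    per_protein = [classification_metrics(*counts)["mcc"] for counts in per_protein_counts]
    return pooled, sum(per_protein) / len(per_protein)


if __name__ == "__main__":
    proteins = [(40, 10, 40, 10), (0, 5, 90, 5)]  # hypothetical (tp, fp, tn, fn) counts
    print(classification_metrics(*proteins[0]))
    print(pooled_vs_per_protein_mcc(proteins))
```

    With these two hypothetical proteins, pooling all residues gives an MCC of about 0.62, while averaging the per-protein MCCs gives about 0.27, which illustrates how the two levels of evaluation can diverge.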

  14. Thioesterases: A new perspective based on their primary and tertiary structures

    PubMed Central

    Cantu, David C; Chen, Yingfei; Reilly, Peter J

    2010-01-01

    Thioesterases (TEs) are classified into EC 3.1.2.1 through EC 3.1.2.27 based on their activities on different substrates, with many remaining unclassified (EC 3.1.2.–). Analysis of primary and tertiary structures of known TEs casts a new light on this enzyme group. We used strong primary sequence conservation based on experimentally verified proteins as the main criterion, followed by verification with tertiary structure superpositions, mechanisms, and catalytic residue positions, to accurately define TE families. At present, TEs fall into 23 families almost completely unrelated to each other by primary structure. It is assumed that all members of the same family have essentially the same tertiary structure; however, TEs in different families can have markedly different folds and mechanisms. Conversely, TEs in different families sometimes have very similar tertiary structures and catalytic mechanisms despite being only slightly or not at all related by primary structure, indicating that they have common distant ancestors and can be grouped into clans. At present, four clans encompass 12 TE families. The new, constantly updated ThYme (Thioester-active enzYmes) database contains TE primary and tertiary structures, classified into families and clans that are different from those currently found in the literature or in other databases. We review all types of TEs, including those cleaving CoA, ACP, glutathione, and other protein molecules, and we discuss their structures, functions, and mechanisms. PMID:20506386

  15. SIMAP—the database of all-against-all protein sequence similarities and annotations with new interfaces and increased coverage

    PubMed Central

    Arnold, Roland; Goldenberg, Florian; Mewes, Hans-Werner; Rattei, Thomas

    2014-01-01

    The Similarity Matrix of Proteins (SIMAP, http://mips.gsf.de/simap/) database has been designed to massively accelerate computationally expensive protein sequence analysis tasks in bioinformatics. It provides pre-calculated sequence similarities interconnecting the entire known protein sequence universe, complemented by pre-calculated protein features and domains, similarity clusters and functional annotations. SIMAP covers all major public protein databases as well as many consistently re-annotated metagenomes from different repositories. As of September 2013, SIMAP contains >163 million proteins corresponding to ∼70 million non-redundant sequences. SIMAP uses the sensitive FASTA search heuristics, the Smith–Waterman alignment algorithm, the InterPro database of protein domain models and the BLAST2GO functional annotation algorithm. SIMAP assists biologists by facilitating the interactive exploration of the protein sequence universe. Web-Service and DAS interfaces allow connecting SIMAP with any other bioinformatic tool and resource. All-against-all protein sequence similarity matrices of project-specific protein collections are generated on request. Recent improvements allow SIMAP to keep pace with the rapidly growing protein sequence universe. New Web-Service interfaces enhance the connectivity of SIMAP. Novel tools for interactive extraction of protein similarity networks have been added. Open access to SIMAP is provided through the web portal; the portal also contains instructions and links for software access and flat file downloads. PMID:24165881
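    SIMAP's pre-computed similarities rest partly on the Smith-Waterman local alignment algorithm. The sketch below computes only the optimal local alignment score, using a flat match/mismatch scheme and a linear gap penalty. These scoring parameters are illustrative assumptions; protein similarity searches normally use a substitution matrix (such as BLOSUM62) and affine gap penalties instead, and this is not SIMAP's own code.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal local alignment score of sequences a and b (score only, no traceback).

    Uses a linear gap penalty and a flat match/mismatch score (illustrative parameters).
    Keeps only two rows of the dynamic-programming matrix, so memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: cell scores never drop below zero, so an alignment can start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("MKVLAACDEW", "ACDE"))
```

    Each call costs time proportional to the product of the two sequence lengths, and an all-against-all comparison repeats it for every pair of sequences; that quadratic cost is what a pre-computed resource avoids recomputing.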

  16. CRISPR-Cas9-mediated gene knockout in primary human airway epithelial cells reveals a proinflammatory role for MUC18.

    PubMed

    Chu, H W; Rios, C; Huang, C; Wesolowska-Andersen, A; Burchard, E G; O'Connor, B P; Fingerlin, T E; Nichols, D; Reynolds, S D; Seibold, M A

    2015-10-01

    Targeted knockout of genes in primary human cells using CRISPR-Cas9-mediated genome editing represents a powerful approach to study gene function and to discern molecular mechanisms underlying complex human diseases. We used lentiviral delivery of CRISPR-Cas9 machinery and conditional reprogramming culture methods to knock out the MUC18 gene in human primary nasal airway epithelial cells (AECs). Massively parallel sequencing technology was used to confirm that the genome of essentially all cells in the edited AEC populations contained coding region insertions and deletions (indels). Correspondingly, we found that MUC18 mRNA expression was greatly reduced and protein expression was absent. Characterization of MUC18 knockout cell populations stimulated with TLR2, 3 and 4 agonists revealed that IL-8 (a proinflammatory chemokine) responses of AECs were greatly reduced in the absence of functional MUC18 protein. Our results show the feasibility of CRISPR-Cas9-mediated gene knockouts in AEC culture (both submerged and polarized), and suggest a proinflammatory role for MUC18 in airway epithelial response to bacterial and viral stimuli.

  17. Prediction of TF target sites based on atomistic models of protein-DNA complexes

    PubMed Central

    Angarica, Vladimir Espinosa; Pérez, Abel González; Vasconcelos, Ana T; Collado-Vides, Julio; Contreras-Moreira, Bruno

    2008-01-01

    Background: The specific recognition of genomic cis-regulatory elements by transcription factors (TFs) plays an essential role in the regulation of coordinated gene expression. Studying the mechanisms determining binding specificity in protein-DNA interactions is thus an important goal. Most current approaches for modeling TF specific recognition rely on the knowledge of large sets of cognate target sites and consider only the information contained in their primary sequence. Results: Here we describe a structure-based methodology for predicting sequence motifs starting from the coordinates of a TF-DNA complex. Our algorithm combines information regarding the direct and indirect readout of DNA into an atomistic statistical model, which is used to estimate the interaction potential. We first measure the ability of our method to correctly estimate the binding specificities of eight prokaryotic and eukaryotic TFs that belong to different structural superfamilies. Secondly, the method is applied to two homology models, finding that sampling of interface side-chain rotamers remarkably improves the results. Thirdly, the algorithm is compared with a reference structural method based on contact counts, obtaining comparable predictions for the experimental complexes and more accurate sequence motifs for the homology models. Conclusion: Our results demonstrate that atomic-detail structural information can be feasibly used to predict TF binding sites. The computational method presented here is universal and might be applied to other systems involving protein-DNA recognition. PMID:18922190

  18. Interactions of the EcoRV restriction endonuclease with fluorescent oligodeoxynucleotides.

    PubMed

    Erskine, S G; Halford, S E

    1995-05-19

    A self-complementary dodecadeoxyribonucleotide that contains the recognition sequence for the R.EcoRV ENase was synthesised with a primary amino group at its 5' terminus. The 5' amino function was labeled with the fluorescent dye 5-(dimethylamino)naphthalene-1-sulfonyl chloride. The labeled oligodeoxyribonucleotide in its duplex form was shown to be a suitable substrate for kinetic studies on the ENase, with no significant dye-DNA or dye-protein interactions. Finally, the binding of R.EcoRV to the labeled DNA was followed by detecting fluorescence resonance energy transfer between the tryptophans of the protein and the fluorescent labels of the DNA.
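    The binding assay relies on FRET from protein tryptophans (donors) to the dansyl label (acceptor). As a sketch of the standard relationships, not of the authors' analysis, transfer efficiency can be estimated from donor quenching as E = 1 - F_DA/F_D and related to the donor-acceptor distance r through the Förster radius R0 as E = 1/(1 + (r/R0)^6). The R0 and intensity values below are hypothetical placeholders.

```python
def efficiency_from_quenching(f_da: float, f_d: float) -> float:
    """FRET efficiency from donor fluorescence with (f_da) and without (f_d) the acceptor."""
    return 1.0 - f_da / f_d


def efficiency_from_distance(r: float, r0: float) -> float:
    """Förster relation: efficiency falls off with the sixth power of distance (r, r0 in the same units)."""
    return 1.0 / (1.0 + (r / r0) ** 6)


def distance_from_efficiency(e: float, r0: float) -> float:
    """Invert the Förster relation; valid for 0 < e < 1."""
    return r0 * (1.0 / e - 1.0) ** (1.0 / 6.0)


if __name__ == "__main__":
    r0_nm = 2.2  # hypothetical Förster radius, not a value from the study
    e = efficiency_from_quenching(f_da=70.0, f_d=100.0)  # hypothetical donor intensities
    print(e, distance_from_efficiency(e, r0_nm))
```

    The sixth-power dependence is why FRET is sensitive to binding: small changes in donor-acceptor distance near R0 produce large changes in efficiency.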

  19. Detection of mitochondrial DNA mutations in primary breast cancer and fine-needle aspirates.

    PubMed

    Parrella, P; Xiao, Y; Fliss, M; Sanchez-Cespedes, M; Mazzarelli, P; Rinaldi, M; Nicol, T; Gabrielson, E; Cuomo, C; Cohen, D; Pandit, S; Spencer, M; Rabitti, C; Fazio, V M; Sidransky, D

    2001-10-15

    To determine the frequency and distribution of mitochondrial DNA mutations in breast cancer, 18 primary breast tumors were analyzed by direct sequencing. Twelve somatic mutations not present in matched lymphocytes and normal breast tissues were detected in 11 of the tumors screened (61%). Of these mutations, five (42%) were deletions or insertions in a homopolymeric C-stretch between nucleotides 303-315 (D310) within the D-loop. The remaining seven mutations (58%) were single-base substitutions in the coding (ND1, ND4, ND5, and cytochrome b genes) or noncoding regions (D-loop) of the mitochondrial genome. In three cases (25%), the mutations detected in coding regions led to amino acid substitutions in the protein sequence. We then screened an additional 46 primary breast tumors with a rapid PCR-based assay to identify poly-C alterations in D310, and we found seven more cancers with alterations. Using D310 mutations as a clonal marker, we detected identical changes in five of five matched fine-needle aspirates and in four of four metastasis-positive lymph nodes. The high frequency of D310 alterations in primary breast cancer combined with the high sensitivity of the PCR-based assays provides a new molecular tool for cancer detection.
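    The D310 alterations are length changes in a homopolymeric C tract. The study detected them with a PCR-based assay; purely as a sequence-level illustration, the sketch below measures the longest C run in a read covering the tract and flags a tumor read whose run length differs from the matched normal. The sequences are hypothetical and are not the actual D310 reference, and comparing only the longest run is a simplifying assumption.

```python
import re


def longest_c_run(seq: str) -> int:
    """Length of the longest run of consecutive C bases (case-insensitive)."""
    runs = re.findall(r"C+", seq.upper())
    return max((len(run) for run in runs), default=0)


def poly_c_altered(tumor_region: str, normal_region: str) -> bool:
    """Flag a change in the longest C-run length between tumor and matched normal,
    a simplified proxy for an insertion or deletion in the poly-C tract."""
    return longest_c_run(tumor_region) != longest_c_run(normal_region)


if __name__ == "__main__":
    normal = "ATAA" + "C" * 7 + "TAGG"  # hypothetical normal read
    tumor = "ATAA" + "C" * 8 + "TAGG"   # hypothetical tumor read with one extra C
    print(longest_c_run(normal), longest_c_run(tumor), poly_c_altered(tumor, normal))
```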

  20. Sequence-based prediction of protein-binding sites in DNA: comparative study of two SVM models.

    PubMed

    Park, Byungkyu; Im, Jinyong; Tuvshinjargal, Narankhuu; Lee, Wook; Han, Kyungsook

    2014-11-01

    As many structures of protein-DNA complexes have been determined in recent years, several computational methods have been developed to predict DNA-binding sites in proteins. However, the inverse problem (i.e., predicting protein-binding sites in DNA) has received much less attention. One of the reasons is that the differences between the interaction propensities of nucleotides are much smaller than those between amino acids. Another reason is that DNA exhibits less diverse sequence patterns than protein. Therefore, predicting protein-binding DNA nucleotides is much harder than predicting DNA-binding amino acids. We computed the interaction propensity (IP) of nucleotide triplets with amino acids using an extensive dataset of protein-DNA complexes, and developed two support vector machine (SVM) models that predict protein-binding nucleotides from sequence data alone. One SVM model predicts protein-binding nucleotides using DNA sequence data alone, and the other SVM model predicts protein-binding nucleotides using both DNA and protein sequences. In a 10-fold cross-validation with 1519 DNA sequences, the SVM model that uses DNA sequence data only predicted protein-binding nucleotides with an accuracy of 67.0%, an F-measure of 67.1%, and a Matthews correlation coefficient (MCC) of 0.340. With an independent dataset of 181 DNAs that were not used in training, it achieved an accuracy of 66.2%, an F-measure of 66.3%, and an MCC of 0.324. Another SVM model that uses both DNA and protein sequences achieved an accuracy of 69.6%, an F-measure of 69.6%, and an MCC of 0.383 in a 10-fold cross-validation with 1519 DNA sequences and 859 protein sequences. With an independent dataset of 181 DNAs and 143 proteins, it showed an accuracy of 67.3%, an F-measure of 66.5%, and an MCC of 0.329. Both in cross-validation and independent testing, the second SVM model, which used both DNA and protein sequence data, showed better performance than the first model, which used DNA sequence data alone. 
To the best of our knowledge, this is the first attempt to predict protein-binding nucleotides in a given DNA sequence from the sequence data alone. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
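    The abstract does not define interaction propensity. A common form for such knowledge-based scores, assumed here rather than taken from the paper, is the ratio of the observed frequency of a (nucleotide triplet, amino acid) contact to the frequency expected if triplets and amino acids paired independently; values above 1 indicate enrichment. The contact list in the example is hypothetical.

```python
from collections import Counter


def triplets(dna: str) -> list:
    """Overlapping nucleotide triplets along a DNA sequence."""
    return [dna[i:i + 3] for i in range(len(dna) - 2)]


def interaction_propensity(contacts) -> dict:
    """Observed/expected ratio for each (triplet, amino_acid) contact pair.

    contacts: iterable of (triplet, amino_acid) pairs observed at protein-DNA interfaces.
    Expected frequency is the product of the two marginal frequencies (assumed definition).
    """
    pair_counts = Counter(contacts)
    total = sum(pair_counts.values())
    triplet_counts, aa_counts = Counter(), Counter()
    for (trip, aa), n in pair_counts.items():
        triplet_counts[trip] += n
        aa_counts[aa] += n
    return {
        (trip, aa): (n / total) / ((triplet_counts[trip] / total) * (aa_counts[aa] / total))
        for (trip, aa), n in pair_counts.items()
    }


if __name__ == "__main__":
    observed = [("AAA", "R"), ("AAA", "R"), ("AAA", "K"), ("TTT", "K")]  # hypothetical contacts
    print(triplets("ACGTA"))
    print(interaction_propensity(observed))
```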

  1. Light regulation of the abundance of mRNA encoding a nucleolin-like protein localized in the nucleoli of pea nuclei.

    PubMed Central

    Tong, C G; Reichler, S; Blumenthal, S; Balk, J; Hsieh, H L; Roux, S J

    1997-01-01

    A cDNA encoding a nucleolar protein was selected from a pea (Pisum sativum) plumule library, cloned, and sequenced. The translated sequence of the cDNA has significant percent identity to Xenopus laevis nucleolin (31%), the alfalfa (Medicago sativa) nucleolin homolog (66%), and the yeast (Saccharomyces cerevisiae) nucleolin homolog (NSR1) (28%). It also has sequence patterns in its primary structure that are characteristic of all nucleolins, including an N-terminal acidic motif, RNA recognition motifs, and a C-terminal Gly- and Arg-rich domain. By immunoblot analysis, the polyclonal antibodies used to select the cDNA bind selectively to a 90-kD protein in purified pea nuclei and nucleoli and to an 88-kD protein in extracts of Escherichia coli expressing the cDNA. In immunolocalization assays of pea plumule cells, the antibodies stained primarily a region surrounding the fibrillar center of nucleoli, where animal nucleolins are typically found. Southern analysis indicated that the pea nucleolin-like protein is encoded by a single gene, and northern analysis showed that the labeled cDNA binds to a single band of RNA, approximately the same size as the cDNA. After irradiation of etiolated pea seedlings by red light, the mRNA level in plumules decreased during the 1st hour and then increased to a peak of six times the 0-h level at 12 h. Far-red light reversed this effect of red light, and the mRNA accumulation from red/far-red light irradiation was equal to that found in the dark control. This indicates that phytochrome may regulate the expression of this gene. PMID:9193096

  2. Shotgun Protein Sequencing with Meta-contig Assembly

    PubMed Central

    Guthals, Adrian; Clauser, Karl R.; Bandeira, Nuno

    2012-01-01

    Full-length de novo sequencing from tandem mass (MS/MS) spectra of unknown proteins such as antibodies or proteins from organisms with unsequenced genomes remains a challenging open problem. Conventional algorithms designed to individually sequence each MS/MS spectrum are limited by incomplete peptide fragmentation or low signal-to-noise ratios and tend to result in short de novo sequences at low sequencing accuracy. Our shotgun protein sequencing (SPS) approach was developed to ameliorate these limitations by first finding groups of unidentified spectra from the same peptides (contigs) and then deriving a consensus de novo sequence for each assembled set of spectra (contig sequences). Although SPS enables much more accurate reconstruction of de novo sequences longer than those recoverable from individual MS/MS spectra, it still requires error-tolerant matching to homologous proteins to group smaller contig sequences into full-length protein sequences, thus limiting its effectiveness on sequences from poorly annotated proteins. Using low and high resolution CID and high resolution HCD MS/MS spectra, we address this limitation with a Meta-SPS algorithm designed to overlap and further assemble SPS contigs into Meta-SPS de novo contig sequences extending as long as 100 amino acids at over 97% accuracy without requiring any knowledge of homologous protein sequences. We demonstrate Meta-SPS using distinct MS/MS data sets obtained with separate enzymatic digestions and discuss how the remaining de novo sequencing limitations relate to MS/MS acquisition settings. PMID:22798278
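    Meta-SPS assembles contigs by aligning spectra, which is beyond a short example. As a deliberately simplified string-level analogy only, the sketch below greedily merges peptide contig strings by their longest suffix-prefix overlap, without any homologous reference. It assumes error-free contigs and exact overlaps of at least a minimum length, which real MS/MS-derived sequences do not guarantee; the contig strings and the min_len parameter are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (at least min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(contigs: list, min_len: int = 3) -> list:
    """Repeatedly merge the pair of contigs with the largest exact overlap until none remains."""
    contigs = list(contigs)
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            return contigs
        merged = contigs[i] + contigs[j][k:]  # append only the non-overlapping tail of b
        contigs = [c for idx, c in enumerate(contigs) if idx not in (i, j)] + [merged]


if __name__ == "__main__":
    print(greedy_assemble(["PEPTIDES", "IDESEQ", "SEQUENCE"]))
```

    Greedy merging is easy to follow but can mis-join contigs when overlaps are short or repetitive, which is one reason the minimum-overlap threshold matters.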

  3. Shotgun protein sequencing with meta-contig assembly.

    PubMed

    Guthals, Adrian; Clauser, Karl R; Bandeira, Nuno

    2012-10-01

    Full-length de novo sequencing from tandem mass (MS/MS) spectra of unknown proteins such as antibodies or proteins from organisms with unsequenced genomes remains a challenging open problem. Conventional algorithms designed to individually sequence each MS/MS spectrum are limited by incomplete peptide fragmentation or low signal-to-noise ratios and tend to result in short de novo sequences at low sequencing accuracy. Our shotgun protein sequencing (SPS) approach was developed to ameliorate these limitations by first finding groups of unidentified spectra from the same peptides (contigs) and then deriving a consensus de novo sequence for each assembled set of spectra (contig sequences). Although SPS enables much more accurate reconstruction of de novo sequences longer than those recoverable from individual MS/MS spectra, it still requires error-tolerant matching to homologous proteins to group smaller contig sequences into full-length protein sequences, thus limiting its effectiveness on sequences from poorly annotated proteins. Using low and high resolution CID and high resolution HCD MS/MS spectra, we address this limitation with a Meta-SPS algorithm designed to overlap and further assemble SPS contigs into Meta-SPS de novo contig sequences extending as long as 100 amino acids at over 97% accuracy without requiring any knowledge of homologous protein sequences. We demonstrate Meta-SPS using distinct MS/MS data sets obtained with separate enzymatic digestions and discuss how the remaining de novo sequencing limitations relate to MS/MS acquisition settings.

  4. Discovery and information-theoretic characterization of transcription factor binding sites that act cooperatively.

    PubMed

    Clifford, Jacob; Adami, Christoph

    2015-09-02

    Transcription factor binding to the surface of DNA regulatory regions is one of the primary mechanisms regulating gene expression levels. One probabilistic approach to modeling protein-DNA interactions at the sequence level uses position weight matrices (PWMs), which estimate the joint probability of a DNA binding site sequence by assuming positional independence within the DNA sequence. Here we construct conditional PWMs that depend on the motif signatures in the flanking DNA sequence, by conditioning known binding site loci on the presence or absence of additional binding sites in the flanking sequence of each site's locus. Pooling known sites with similar flanking sequence patterns allows for the estimation of the conditional distribution function over the binding site sequences. We apply our model to the Dorsal transcription factor binding sites active in patterning the Dorsal-Ventral axis of Drosophila development. We find that those binding sites that cooperate with nearby Twist sites on average contain about 0.5 bits of information about the presence of Twist transcription factor binding sites in the flanking sequence. We also find that Dorsal binding site detectors conditioned on flanking sequence information make better predictions about what is a Dorsal site relative to background DNA than detectors that lack information about flanking sequence features.
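    A position weight matrix treats each column of aligned binding sites independently; the conditional variant described above builds separate matrices for sites grouped by a flanking-sequence feature (for example, whether a partner site such as Twist lies nearby). The sketch below illustrates that idea under simplifying assumptions chosen here, not taken from the paper: Laplace pseudocounts, a uniform background, and a precomputed boolean flag per site. The sites are hypothetical, and this is not the authors' implementation.

```python
import math
from collections import Counter

BASES = "ACGT"


def pwm(sites, pseudocount=1.0):
    """Column-wise base probabilities from equal-length aligned sites (positional independence).

    sites must be non-empty; pseudocounts keep unseen bases from getting zero probability.
    """
    matrix = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        total = len(sites) + 4 * pseudocount
        matrix.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return matrix


def log_odds(seq, matrix, background=0.25):
    """Log2 likelihood ratio of seq under the PWM versus a uniform background."""
    return sum(math.log2(col[b] / background) for col, b in zip(matrix, seq))


def conditional_pwms(sites, has_flanking_partner, pseudocount=1.0):
    """Split sites by a flanking-sequence flag and build one PWM per group (both groups non-empty)."""
    with_partner = [s for s, flag in zip(sites, has_flanking_partner) if flag]
    without_partner = [s for s, flag in zip(sites, has_flanking_partner) if not flag]
    return pwm(with_partner, pseudocount), pwm(without_partner, pseudocount)


if __name__ == "__main__":
    sites = ["GGAA", "GGAA", "GGTA"]  # hypothetical aligned binding sites
    flags = [True, True, False]       # hypothetical: partner site present in flanking sequence?
    with_p, without_p = conditional_pwms(sites, flags)
    print(log_odds("GGAA", with_p), log_odds("GGAA", without_p))
```

    A detector would score a candidate with whichever matrix matches its flanking context, which is the sense in which the conditioned detectors use extra information.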

  5. The application of CRISPR technology to high content screening in primary neurons.

    PubMed

    Callif, Ben L; Maunze, Brian; Krueger, Nick L; Simpson, Matthew T; Blackmore, Murray G

    2017-04-01

    Axon growth is coordinated by multiple interacting proteins that remain incompletely characterized. High content screening (HCS), in which manipulation of candidate genes is combined with rapid image analysis of phenotypic effects, has emerged as a powerful technique to identify key regulators of axon outgrowth. Here we explore the utility of a genome editing approach referred to as CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) for knockout screening in primary neurons. In the CRISPR approach a DNA-cleaving Cas enzyme is guided to genomic target sequences by user-created guide RNA (sgRNA), where it initiates a double-stranded break that ultimately results in frameshift mutation and loss of protein production. Using electroporation of plasmid DNA that co-expresses Cas9 enzyme and sgRNA, we first verified the ability of CRISPR targeting to achieve protein-level knockdown in cultured postnatal cortical neurons. Targeted proteins included NeuN (RbFox3) and PTEN, a well-studied regulator of axon growth. Effective knockdown lagged at least four days behind transfection, but targeted proteins were eventually undetectable by immunohistochemistry in >80% of transfected cells. Consistent with this, anti-PTEN sgRNA produced no changes in neurite outgrowth when assessed three days post-transfection. When week-long cultures were replated, however, PTEN knockdown consistently increased neurite lengths. These CRISPR-mediated PTEN effects were achieved using multi-well transfection and automated phenotypic analysis, indicating the suitability of PTEN as a positive control for future CRISPR-based screening efforts. Combined, these data establish an example of CRISPR-mediated protein knockdown in primary cortical neurons and its compatibility with HCS workflows. Copyright © 2017 Elsevier Inc. All rights reserved.

  6. Genome sequences of the human body louse and its primary endosymbiont provide insights into the permanent parasitic lifestyle.

    PubMed

    Kirkness, Ewen F; Haas, Brian J; Sun, Weilin; Braig, Henk R; Perotti, M Alejandra; Clark, John M; Lee, Si Hyeock; Robertson, Hugh M; Kennedy, Ryan C; Elhaik, Eran; Gerlach, Daniel; Kriventseva, Evgenia V; Elsik, Christine G; Graur, Dan; Hill, Catherine A; Veenstra, Jan A; Walenz, Brian; Tubío, José Manuel C; Ribeiro, José M C; Rozas, Julio; Johnston, J Spencer; Reese, Justin T; Popadic, Aleksandar; Tojo, Marta; Raoult, Didier; Reed, David L; Tomoyasu, Yoshinori; Kraus, Emily; Krause, Emily; Mittapalli, Omprakash; Margam, Venu M; Li, Hong-Mei; Meyer, Jason M; Johnson, Reed M; Romero-Severson, Jeanne; Vanzee, Janice Pagel; Alvarez-Ponce, David; Vieira, Filipe G; Aguadé, Montserrat; Guirao-Rico, Sara; Anzola, Juan M; Yoon, Kyong S; Strycharz, Joseph P; Unger, Maria F; Christley, Scott; Lobo, Neil F; Seufferheld, Manfredo J; Wang, Naikuan; Dasch, Gregory A; Struchiner, Claudio J; Madey, Greg; Hannick, Linda I; Bidwell, Shelby; Joardar, Vinita; Caler, Elisabet; Shao, Renfu; Barker, Stephen C; Cameron, Stephen; Bruggner, Robert V; Regier, Allison; Johnson, Justin; Viswanathan, Lakshmi; Utterback, Terry R; Sutton, Granger G; Lawson, Daniel; Waterhouse, Robert M; Venter, J Craig; Strausberg, Robert L; Berenbaum, May R; Collins, Frank H; Zdobnov, Evgeny M; Pittendrigh, Barry R

    2010-07-06

    As an obligatory parasite of humans, the body louse (Pediculus humanus humanus) is an important vector for human diseases, including epidemic typhus, relapsing fever, and trench fever. Here, we present genome sequences of the body louse and its primary bacterial endosymbiont Candidatus Riesia pediculicola. The body louse has the smallest known insect genome, spanning 108 Mb. Despite its status as an obligate parasite, it retains a remarkably complete basal insect repertoire of 10,773 protein-coding genes and 57 microRNAs. Representing hemimetabolous insects, the genome of the body louse thus provides a reference for studies of holometabolous insects. Compared with other insect genomes, the body louse genome contains significantly fewer genes associated with environmental sensing and response, including odorant and gustatory receptors and detoxifying enzymes. The unique architecture of the 18 minicircular mitochondrial chromosomes of the body louse may be linked to the loss of the gene encoding the mitochondrial single-stranded DNA binding protein. The genome of the obligatory louse endosymbiont Candidatus Riesia pediculicola encodes fewer than 600 genes on a short, linear chromosome and a circular plasmid. The plasmid harbors a unique arrangement of genes required for the synthesis of pantothenate, an essential vitamin deficient in the louse diet. The human body louse, its primary endosymbiont, and the bacterial pathogens that it vectors all possess genomes reduced in size compared with their free-living close relatives. Thus, the body louse genome project offers unique information and tools to use in advancing understanding of coevolution among vectors, symbionts, and pathogens.

  7. Genome sequences of the human body louse and its primary endosymbiont provide insights into the permanent parasitic lifestyle

    PubMed Central

    Kirkness, Ewen F.; Haas, Brian J.; Sun, Weilin; Braig, Henk R.; Perotti, M. Alejandra; Clark, John M.; Lee, Si Hyeock; Robertson, Hugh M.; Kennedy, Ryan C.; Elhaik, Eran; Gerlach, Daniel; Kriventseva, Evgenia V.; Elsik, Christine G.; Graur, Dan; Hill, Catherine A.; Veenstra, Jan A.; Walenz, Brian; Tubío, José Manuel C.; Ribeiro, José M. C.; Rozas, Julio; Johnston, J. Spencer; Reese, Justin T.; Popadic, Aleksandar; Tojo, Marta; Raoult, Didier; Reed, David L.; Tomoyasu, Yoshinori; Kraus, Emily; Mittapalli, Omprakash; Margam, Venu M.; Li, Hong-Mei; Meyer, Jason M.; Johnson, Reed M.; Romero-Severson, Jeanne; VanZee, Janice Pagel; Alvarez-Ponce, David; Vieira, Filipe G.; Aguadé, Montserrat; Guirao-Rico, Sara; Anzola, Juan M.; Yoon, Kyong S.; Strycharz, Joseph P.; Unger, Maria F.; Christley, Scott; Lobo, Neil F.; Seufferheld, Manfredo J.; Wang, NaiKuan; Dasch, Gregory A.; Struchiner, Claudio J.; Madey, Greg; Hannick, Linda I.; Bidwell, Shelby; Joardar, Vinita; Caler, Elisabet; Shao, Renfu; Barker, Stephen C.; Cameron, Stephen; Bruggner, Robert V.; Regier, Allison; Johnson, Justin; Viswanathan, Lakshmi; Utterback, Terry R.; Sutton, Granger G.; Lawson, Daniel; Waterhouse, Robert M.; Venter, J. Craig; Strausberg, Robert L.; Collins, Frank H.; Zdobnov, Evgeny M.; Pittendrigh, Barry R.

    2010-01-01

    As an obligatory parasite of humans, the body louse (Pediculus humanus humanus) is an important vector for human diseases, including epidemic typhus, relapsing fever, and trench fever. Here, we present genome sequences of the body louse and its primary bacterial endosymbiont Candidatus Riesia pediculicola. The body louse has the smallest known insect genome, spanning 108 Mb. Despite its status as an obligate parasite, it retains a remarkably complete basal insect repertoire of 10,773 protein-coding genes and 57 microRNAs. As a representative of the hemimetabolous insects, the body louse genome thus provides a reference for comparison with studies of holometabolous insects. Compared with other insect genomes, the body louse genome contains significantly fewer genes associated with environmental sensing and response, including odorant and gustatory receptors and detoxifying enzymes. The unique architecture of the 18 minicircular mitochondrial chromosomes of the body louse may be linked to the loss of the gene encoding the mitochondrial single-stranded DNA binding protein. The genome of the obligatory louse endosymbiont Candidatus Riesia pediculicola encodes fewer than 600 genes on a short, linear chromosome and a circular plasmid. The plasmid harbors a unique arrangement of genes required for the synthesis of pantothenate, an essential vitamin deficient in the louse diet. The human body louse, its primary endosymbiont, and the bacterial pathogens that it vectors all possess genomes reduced in size compared with their free-living close relatives. Thus, the body louse genome project offers unique information and tools to use in advancing understanding of coevolution among vectors, symbionts, and pathogens. PMID:20566863

  8. Assessing Analytical Similarity of Proposed Amgen Biosimilar ABP 501 to Adalimumab.

    PubMed

    Liu, Jennifer; Eris, Tamer; Li, Cynthia; Cao, Shawn; Kuhns, Scott

    2016-08-01

    ABP 501 is being developed as a biosimilar to adalimumab. Comprehensive comparative analytical characterization studies have been conducted and completed. The objective of this study was to assess analytical similarity between ABP 501 and two adalimumab reference products (RPs), licensed by the United States Food and Drug Administration (adalimumab [US]) and authorized by the European Union (adalimumab [EU]), using state-of-the-art analytical methods. Comprehensive analytical characterization incorporating orthogonal analytical techniques was used to compare products. Physicochemical property comparisons comprised the primary structure related to amino acid sequence and post-translational modifications including glycans; higher-order structure; primary biological properties mediated by target and receptor binding; product-related substances and impurities; host-cell impurities; general properties of the finished drug product, including strength and formulation; subvisible and submicron particles and aggregates; and forced thermal degradation. ABP 501 had the same amino acid sequence and similar post-translational modification profiles compared with adalimumab RPs. Primary structure, higher-order structure, and biological activities were similar for the three products. Product-related size and charge variants and aggregate and particle levels were also similar. ABP 501 had very low residual host-cell protein and DNA. The finished ABP 501 drug product has the same strength with regard to protein concentration and fill volume as adalimumab RPs. ABP 501 and the RPs had a similar stability profile both in normal storage and thermal stress conditions. Based on the comprehensive analytical similarity assessment, ABP 501 was found to be similar to adalimumab with respect to physicochemical and biological properties.

  9. Visualizing and Clustering Protein Similarity Networks: Sequences, Structures, and Functions.

    PubMed

    Mai, Te-Lun; Hu, Geng-Ming; Chen, Chi-Ming

    2016-07-01

    Research over the past decade has demonstrated the usefulness of protein network knowledge in furthering the study of the molecular evolution of proteins, understanding the robustness of cells to perturbation, and annotating new protein functions. In this study, we aimed to provide a general clustering approach to visualize the sequence-structure-function relationship of protein networks and to investigate possible causes of inconsistency among protein classifications based on sequences, structures, and functions. Such visualization of protein networks could facilitate our understanding of the overall relationships among proteins and help researchers comprehend various protein databases. As a demonstration, we clustered 1437 enzymes by their sequences and structures using the minimum span clustering (MSC) method. The general structure of this protein network was delineated at two clustering resolutions, and the second-level MSC clustering was found to be highly similar to existing enzyme classifications. The clusterings of these enzymes based on sequence, structure, and function information are consistent with one another. For proteases, the Jaccard similarity coefficient is 0.86 between sequence and function classifications, 0.82 between sequence and structure classifications, and 0.78 between structure and function classifications. Based on our clustering results, we discuss possible examples of divergent and convergent evolution of enzymes. Our clustering approach provides a panoramic view of the sequence-structure-function network of proteins, helps visualize the relationships between related proteins intuitively, and is useful in predicting the structure and function of newly determined protein sequences.
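
    The Jaccard coefficients quoted above compare two classifications of the same enzymes. The abstract does not spell out the formula. This sketch assumes one common pair-counting definition. It counts item pairs grouped together in both classifications (a), only in the first (b), and only in the second (c), then returns a / (a + b + c). The function name and labels are hypothetical.

```python
from itertools import combinations
from typing import Hashable, Sequence


def pair_counting_jaccard(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Jaccard coefficient between two partitions of the same items (pair counting).

    a = pairs grouped together in both partitions,
    b = pairs grouped together only in partition A,
    c = pairs grouped together only in partition B.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both classifications must label the same items")
    a = b = c = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            a += 1
        elif same_a:
            b += 1
        elif same_b:
            c += 1
    denom = a + b + c
    # Assumed convention: no co-clustered pairs anywhere counts as full agreement.
    return a / denom if denom else 1.0
```

    For example, pair_counting_jaccard(["x", "x", "y", "y"], ["p", "p", "p", "q"]) returns 0.25. One pair is shared, one is grouped only in the first classification, and two only in the second. The loop is quadratic in the number of items. For 1437 enzymes that is roughly a million pairs, which is still cheap.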

  10. The Structure of Rauvolfia serpentina Strictosidine Synthase Is a Novel Six-Bladed β-Propeller Fold in Plant Proteins

    PubMed Central

    Ma, Xueyan; Panjikar, Santosh; Koepke, Juergen; Loris, Elke; Stöckigt, Joachim

    2006-01-01

    The enzyme strictosidine synthase (STR1) from the Indian medicinal plant Rauvolfia serpentina is of primary importance for the biosynthetic pathway of the indole alkaloid ajmaline. Moreover, STR1 initiates all biosynthetic pathways leading to the entire monoterpenoid indole alkaloid family representing an enormous structural variety of ∼2000 compounds in higher plants. The crystal structures of STR1 in complex with its natural substrates tryptamine and secologanin provide structural understanding of the observed substrate preference and identify residues lining the active site surface that contact the substrates. STR1 catalyzes a Pictet-Spengler–type reaction and represents a novel six-bladed β-propeller fold in plant proteins. Structure-based sequence alignment revealed a common repetitive sequence motif (three hydrophobic residues are followed by a small residue and a hydrophilic residue), indicating a possible evolutionary relationship between STR1 and several sequence-unrelated six-bladed β-propeller structures. Structural analysis and site-directed mutagenesis experiments demonstrate the essential role of Glu-309 in catalysis. The data will aid in deciphering the details of the reaction mechanism of STR1 as well as other members of this enzyme family. PMID:16531499

  11. The structure of Rauvolfia serpentina strictosidine synthase is a novel six-bladed beta-propeller fold in plant proteins.

    PubMed

    Ma, Xueyan; Panjikar, Santosh; Koepke, Juergen; Loris, Elke; Stöckigt, Joachim

    2006-04-01

    The enzyme strictosidine synthase (STR1) from the Indian medicinal plant Rauvolfia serpentina is of primary importance for the biosynthetic pathway of the indole alkaloid ajmaline. Moreover, STR1 initiates all biosynthetic pathways leading to the entire monoterpenoid indole alkaloid family representing an enormous structural variety of approximately 2000 compounds in higher plants. The crystal structures of STR1 in complex with its natural substrates tryptamine and secologanin provide structural understanding of the observed substrate preference and identify residues lining the active site surface that contact the substrates. STR1 catalyzes a Pictet-Spengler-type reaction and represents a novel six-bladed beta-propeller fold in plant proteins. Structure-based sequence alignment revealed a common repetitive sequence motif (three hydrophobic residues are followed by a small residue and a hydrophilic residue), indicating a possible evolutionary relationship between STR1 and several sequence-unrelated six-bladed beta-propeller structures. Structural analysis and site-directed mutagenesis experiments demonstrate the essential role of Glu-309 in catalysis. The data will aid in deciphering the details of the reaction mechanism of STR1 as well as other members of this enzyme family.
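
    The repetitive STR1 motif is three hydrophobic residues, then a small residue, then a hydrophilic residue. A character-class pattern can search a sequence for it. The residue classes below are assumptions chosen for illustration, since the abstract does not define them. Note that the authors found the motif by structure-based alignment, not by a plain text scan.

```python
import re

# Hypothetical residue classes (assumptions, not taken from the paper).
HYDROPHOBIC = "AVILMFWC"
SMALL = "GAS"
HYDROPHILIC = "STNQDEKRH"

# Three hydrophobic residues, one small residue, one hydrophilic residue.
# The lookahead makes finditer report overlapping occurrences too.
MOTIF = re.compile(f"(?=([{HYDROPHOBIC}]{{3}}[{SMALL}][{HYDROPHILIC}]))")


def find_motif(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched pentapeptide) for every motif occurrence."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(seq.upper())]
```

    For instance, find_motif("KLVIGSE") returns [(1, "LVIGS")]. The classes overlap: A is both hydrophobic and small, and S is both small and hydrophilic. The chosen sets therefore strongly affect the hit count.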

  12. Mapping PDB chains to UniProtKB entries.

    PubMed

    Martin, Andrew C R

    2005-12-01

    UniProtKB/SwissProt is the main resource for detailed annotations of protein sequences. This database provides a jumping-off point to many other resources through the links it provides. Among others, these include other primary databases, secondary databases, the Gene Ontology and OMIM. While a large number of links are provided to Protein Data Bank (PDB) files, obtaining a regularly updated mapping between UniProtKB entries and PDB entries at the chain or residue level is not straightforward. In particular, there is no regularly updated resource that allows a UniProtKB/SwissProt entry to be identified for a given residue of a PDB file. We have created a fully automatically maintained database that maps PDB residues to residues in UniProtKB/SwissProt and UniProtKB/trEMBL entries. The protocol uses links from PDB to UniProtKB, from UniProtKB to PDB and a brute-force sequence scan to resolve PDB chains for which no annotated link is available. Finally, the sequences from PDB and UniProtKB are aligned to obtain a residue-level mapping. The resource may be queried interactively or downloaded from http://www.bioinf.org.uk/pdbsws/.
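
    The final step of this protocol turns a PDB-versus-UniProtKB sequence alignment into a residue-level mapping. In practice this means walking both alignment rows in parallel. The sketch assumes the alignment has already been computed and that both sides use contiguous integer numbering. Real PDB author numbering can have gaps and insertion codes, which this hypothetical residue_mapping helper ignores.

```python
def residue_mapping(aligned_pdb: str, aligned_uniprot: str,
                    pdb_start: int = 1, uniprot_start: int = 1) -> dict[int, int]:
    """Map PDB residue numbers to UniProtKB positions from one pairwise alignment.

    Both strings are rows of the same alignment, with '-' marking gaps.
    Residues aligned against a gap are left unmapped.
    """
    if len(aligned_pdb) != len(aligned_uniprot):
        raise ValueError("alignment rows must have equal length")
    mapping: dict[int, int] = {}
    i, j = pdb_start, uniprot_start
    for p, u in zip(aligned_pdb, aligned_uniprot):
        if p != "-" and u != "-":
            mapping[i] = j  # both sides have a residue in this column
        if p != "-":
            i += 1  # advance the PDB counter only past real residues
        if u != "-":
            j += 1  # likewise for the UniProtKB counter
    return mapping
```

    For example, residue_mapping("-ACD-E", "MAC-GE") returns {1: 2, 2: 3, 4: 5}. PDB residue 3 (D) faces a gap and stays unmapped.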

  13. Hypothesis and Theory: Revisiting Views on the Co-evolution of the Melanocortin Receptors and the Accessory Proteins, MRAP1 and MRAP2.

    PubMed

    Dores, Robert M

    2016-01-01

    The evolution of the melanocortin receptors (MCRs) is closely associated with the evolution of the melanocortin-2 receptor accessory proteins (MRAPs). Recent annotation of the elephant shark genome project revealed the sequence of a putative MRAP1 ortholog. The presence of this sequence in the genome of a cartilaginous fish raises the possibility that the mrap1 and mrap2 genes in the genomes of gnathostome vertebrates were the result of the chordate 2R genome duplication event. The presence of a putative MRAP1 ortholog in a cartilaginous fish genome is perplexing. Recent studies on melanocortin-2 receptor (MC2R) in the genomes of the elephant shark and the Japanese stingray indicate that these MC2R orthologs can be functionally expressed in CHO cells without co-expression of an exogenous mrap1 cDNA. The novel ligand selectivity of these cartilaginous fish MC2R orthologs is discussed. Finally, the origin of the mc2r and mc5r genes is reevaluated. The distinctive primary sequence conservation of MC2R and MC5R is discussed in light of the physiological roles of these two MCR paralogs.

  14. Primary structure and mapping of the hupA gene of Salmonella typhimurium.

    PubMed Central

    Higgins, N P; Hillyard, D

    1988-01-01

    In bacteria, the complex nucleoid structure is folded and maintained by negative superhelical tension and a set of type II DNA-binding proteins, also called histonelike proteins. The most abundant type II DNA-binding protein is HU. Southern blot analysis showed that Salmonella typhimurium contained two HU genes that corresponded to Escherichia coli genes hupA (encoding HU-2 protein) and hupB (encoding HU-1). Salmonella hupA was cloned, and the nucleotide sequence of the gene was determined. Comparison of hupA of E. coli and S. typhimurium revealed that the HU-2 proteins were identical and that there was high conservation of nucleotide sequences outside the coding frames of the genes. A 300-member genomic library of S. typhimurium was constructed by using random transposition of MudP, a specialized chimeric P22-Mu phage that packages chromosomal DNA unidirectionally from its insertion point. Oligonucleotide hybridization against the library identified one MudP insertion that lies within 28 kilobases of hupA; the MudP was 12% linked to purH at 90.5 min on the standard map. Plasmids expressing HU-2 had a surprising phenotype; they caused growth arrest when they were introduced into E. coli strains bearing a himA or hip mutation. These results suggest that IHF and HU have interactive roles in bacteria. PMID:3056912

  15. Recovery of known T-cell epitopes by computational scanning of a viral genome

    NASA Astrophysics Data System (ADS)

    Logean, Antoine; Rognan, Didier

    2002-04-01

    A new computational method (EpiDock) is proposed for predicting peptide binding to class I MHC proteins from the amino acid sequence of any protein of immunological interest. Starting from the primary structure of the target protein, individual three-dimensional structures of all possible MHC-peptide complexes (8-, 9- and 10-mers) are obtained by homology modelling. A free energy scoring function (Fresno) is then used to predict the absolute binding free energy of all possible peptides to the class I MHC restriction protein. Assuming that immunodominant epitopes are usually found among the top MHC binders, the method can thus be applied to predict the location of immunogenic peptides on the sequence of the protein target. When applied to the prediction of HLA-A*0201-restricted T-cell epitopes from the Hepatitis B virus, EpiDock was able to recover 92% of known high-affinity binders and 80% of known epitopes within a filtered subset of all possible nonapeptides corresponding to about one tenth of the full theoretical list. The proposed method is fully automated and fast enough to scan a viral genome in less than an hour on a parallel computing architecture. As it requires very little starting experimental data, EpiDock can be used (i) to predict potential T-cell epitopes from viral genomes and (ii) to roughly predict still unknown peptide binding motifs for novel class I MHC alleles.
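
    The scan starts by enumerating every candidate 8-, 9- and 10-mer from the primary structure. This is a sliding window: a sequence of length L yields L - k + 1 peptides of length k. The sketch covers only this enumeration step. It does not reproduce the homology modelling of each MHC-peptide complex or the Fresno scoring, and the candidate_peptides name is hypothetical.

```python
def candidate_peptides(sequence: str, lengths=(8, 9, 10)) -> list[tuple[int, str]]:
    """Return (1-based start, peptide) for every window of each requested length."""
    seq = sequence.upper()
    peptides = []
    for k in lengths:
        # range() is empty when the sequence is shorter than k.
        for start in range(len(seq) - k + 1):
            peptides.append((start + 1, seq[start:start + k]))
    return peptides
```

    For a 12-residue sequence this gives 5 octamers, 4 nonamers and 3 decamers. Each (start, peptide) pair would then go to the structure-building and scoring stages.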

  16. MIPS: a database for protein sequences, homology data and yeast genome information.

    PubMed Central

    Mewes, H W; Albermann, K; Heumann, K; Liebl, S; Pfeiffer, F

    1997-01-01

    The MIPS group (Martinsried Institute for Protein Sequences) at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, collects, processes and distributes protein sequence data within the framework of the tripartite association of the PIR-International Protein Sequence Database. MIPS contributes nearly 50% of the data input to the PIR-International Protein Sequence Database. The database is distributed on CD-ROM together with PATCHX, an exhaustive supplement of unique, unverified protein sequences from external sources compiled by MIPS. Through its WWW server (http://www.mips.biochem.mpg.de/) MIPS permits internet access to sequence databases, homology data and yeast genome information. (i) Sequence similarity results from the FASTA program are stored in the FASTA database for all proteins from PIR-International and PATCHX. The database is dynamically maintained and permits instant access to FASTA results. (ii) Starting with FASTA database queries, proteins have been classified into families and superfamilies (PROT-FAM). (iii) The HPT (hashed position tree) data structure developed at MIPS is a new approach for rapid sequence and pattern searching. (iv) MIPS provides access to the sequence and annotation of the complete yeast genome, the functional classification of yeast genes (FunCat) and its graphical display, the 'Genome Browser'. A CD-ROM based on the JAVA programming language providing dynamic interactive access to the yeast genome and the related protein sequences has been compiled and is available on request. PMID:9016498

  17. BLAST and FASTA similarity searching for multiple sequence alignment.

    PubMed

    Pearson, William R

    2014-01-01

    BLAST, FASTA, and other similarity searching programs seek to identify homologous proteins and DNA sequences based on excess sequence similarity. If two sequences share much more similarity than expected by chance, the simplest explanation for the excess similarity is common ancestry (homology). The most effective similarity searches compare protein sequences, rather than DNA sequences, for sequences that encode proteins, and use expectation values, rather than percent identity, to infer homology. The BLAST and FASTA packages of sequence comparison programs provide programs for comparing protein and DNA sequences to protein databases (the most sensitive searches). Protein and translated-DNA comparisons to protein databases routinely allow evolutionary look-back times of 1 to 2 billion years; DNA:DNA searches are 5- to 10-fold less sensitive. BLAST and FASTA can be run on popular web sites, but can also be downloaded and installed on local computers. With local installation, target databases can be customized for the sequence data being characterized. With today's very large protein databases, search sensitivity can also be improved by searching smaller comprehensive databases, for example, a complete protein set from an evolutionarily neighboring model organism. By default, BLAST and FASTA use scoring strategies targeted at distant evolutionary relationships; for comparisons involving short domains or queries, or searches that seek relatively close homologs (e.g. mouse-human), shallower scoring matrices will be more effective. Both BLAST and FASTA provide very accurate statistical estimates, which can be used to reliably identify protein sequences that diverged more than 2 billion years ago.
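
    The expectation values recommended above follow Karlin-Altschul statistics for local alignments. For a raw score S, query length m and database length n, E = K m n e^(-λS). Equivalently, E = m n 2^(-S'), where S' = (λS - ln K) / ln 2 is the normalized bit score. The sketch evaluates these formulas directly. The λ and K values are illustrative, close to those usually reported for gapped BLOSUM62 searches. Real BLAST and FASTA runs also apply effective-length corrections that are omitted here.

```python
import math


def expect_value(raw_score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul expectation E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Illustrative parameters only; real programs derive them per scoring scheme.
LAM, K = 0.267, 0.041
query_len, db_len = 300, 50_000_000  # hypothetical query and database sizes
for s in (40, 60, 80):
    e = expect_value(s, query_len, db_len, LAM, K)
    print(f"raw score {s}: {bit_score(s, LAM, K):.1f} bits, E = {e:.2e}")
```

    Because E scales with m and n, the same raw score is less significant against a larger database. This is one reason why searching a smaller comprehensive database, as suggested above, can improve sensitivity.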

  18. Novel Mechanism of Hemin Capture by Hbp2, the Hemoglobin-binding Hemophore from Listeria monocytogenes*

    PubMed Central

    Malmirchegini, G. Reza; Sjodt, Megan; Shnitkind, Sergey; Sawaya, Michael R.; Rosinski, Justin; Newton, Salete M.; Klebba, Phillip E.; Clubb, Robert T.

    2014-01-01

    Iron is an essential nutrient that is required for the growth of the bacterial pathogen Listeria monocytogenes. In cell cultures, this microbe secretes hemin/hemoglobin-binding protein 2 (Hbp2; Lmo2185), which has been proposed to function as a hemophore that scavenges heme from the environment. Based on its primary sequence, Hbp2 contains three NEAr transporter (NEAT) domains of unknown function. Here we show that each of these domains mediates high-affinity binding to ferric heme (hemin) and that its N- and C-terminal domains interact with hemoglobin (Hb). The results of hemin transfer experiments are consistent with Hbp2 functioning as an Hb-binding hemophore that delivers hemin to other Hbp2 proteins that are attached to the cell wall. Surprisingly, our work reveals that the central NEAT domain in Hbp2 binds hemin even though its primary sequence lacks a highly conserved YXXXY motif that is used by all other previously characterized NEAT domains to coordinate iron in the hemin molecule. To elucidate the mechanism of hemin binding by Hbp2, we determined crystal structures of its central NEAT domain (Hbp2N2; residues 183–303) in its free and hemin-bound states. The structures reveal an unprecedented mechanism of hemin binding in which Hbp2N2 undergoes a major conformational rearrangement that facilitates metal coordination by a non-canonical tyrosine residue. These studies highlight previously unrecognized plasticity in the hemin binding mechanism of NEAT domains and provide insight into how L. monocytogenes captures heme iron. PMID:25315777

  19. Mapping the distribution of packing topologies within protein interiors shows predominant preference for specific packing motifs

    PubMed Central

    2011-01-01

    Background: Mapping protein primary sequences to their three-dimensional folds, referred to as the 'second genetic code', remains an unsolved scientific problem. A crucial part of the problem concerns the geometrical specificity in side chain association leading to densely packed protein cores, a hallmark of correctly folded native structures. Thus, any model of packing within proteins should constitute an indispensable component of protein folding and design. Results: In this study, an attempt has been made to find, characterize and classify recurring patterns in the side chain packing that sustains a protein's native fold. The interaction of side chain atoms within the protein core has been represented as a contact network based on the surface complementarity and overlap between associating side chain surfaces. Some network topologies clearly appear to be preferred, and they have been termed 'packing motifs', analogous to supersecondary structures in proteins. Study of the distribution of these motifs reveals the ubiquitous presence of typical smaller graphs, which appear to get linked or coalesce to give larger graphs, reminiscent of the nucleation-condensation model in protein folding. One such frequently occurring motif, the three-residue clique, also envisaged as the unit of clustering, was invariably found in regions of dense packing. Finally, topological measures based on surface contact networks appeared to be effective in discriminating sequences native to a specific fold amongst a set of decoys. Conclusions: Out of innumerable topological possibilities, only a finite number of specific packing motifs are actually realized in proteins. This small number of motifs could serve as a basis set in the construction of larger networks. Of these, the triplet clique exhibits a distinct preference both in terms of composition and geometry. PMID:21605466
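
    The three-residue clique highlighted above is a triangle in the residue contact graph: three residues that are all in contact with one another. Given a contact list, the sketch below enumerates such triangles. The edges are assumed as input. The study derives contacts from side-chain surface complementarity and overlap, which is not shown here, and the residue labels are hypothetical.

```python
from itertools import combinations


def three_residue_cliques(contacts: list[tuple[str, str]]) -> list[tuple[str, str, str]]:
    """Return every triangle (3-clique) in an undirected residue contact graph."""
    neighbours: dict[str, set[str]] = {}
    for a, b in contacts:
        if a == b:
            continue  # ignore self-contacts
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    triangles = []
    for node in sorted(neighbours):
        # Only consider neighbours that sort after `node`, so each triangle
        # is reported exactly once, with its members in sorted order.
        later = sorted(n for n in neighbours[node] if n > node)
        for u, v in combinations(later, 2):
            if v in neighbours[u]:
                triangles.append((node, u, v))
    return triangles
```

    Take the contacts L12-V45, V45-I78, I78-L12 and I78-F90. The only triangle is (I78, L12, V45), because F90 touches just one residue and cannot close a clique.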

  20. Design and testing for a nontagged F1-V fusion protein as vaccine antigen against bubonic and pneumonic plague.

    PubMed

    Powell, Bradford S; Andrews, Gerard P; Enama, Jeffrey T; Jendrek, Scott; Bolt, Chris; Worsham, Patricia; Pullen, Jeffrey K; Ribot, Wilson; Hines, Harry; Smith, Leonard; Heath, David G; Adamovicz, Jeffrey J

    2005-01-01

    A two-component recombinant fusion protein antigen was re-engineered and tested as a medical countermeasure against the possible biological threat of aerosolized Yersinia pestis. The active component of the proposed subunit vaccine combines the F1 capsular protein and V virulence antigen of Y. pestis and improves upon the design of an earlier histidine-tagged fusion protein. In the current study, different production strains were screened for suitable expression and a purification process was optimized to isolate an F1-V fusion protein lacking extraneous coding sequences. Soluble F1-V protein was isolated to 99% purity by sequential liquid chromatography, including capture and refolding of urea-denatured protein via anion exchange, followed by hydrophobic interaction, concentration, and then transfer into buffered saline for direct use after frozen storage. Protein identity and primary structure were verified by mass spectrometry and Edman sequencing, confirming a purified product of 477 amino acids and removal of the N-terminal methionine. Purity, quality, and higher-order structure were compared between lots using RP-HPLC, intrinsic fluorescence, CD spectroscopy, and multi-angle light scattering spectroscopy, all of which indicated a consistent and properly folded product. As formulated with aluminum hydroxide adjuvant and administered in a single subcutaneous dose, this new F1-V protein also protected mice from wild-type and non-encapsulated Y. pestis challenge strains, modeling prophylaxis against pneumonic and bubonic plague. These findings confirm that the fusion protein architecture provides superior protection over the former licensed product, establish a foundation from which to create a robust production process, and set forth assays for the development of F1-V as the active pharmaceutical ingredient of the next plague vaccine.

  1. Isolation and characterization of target sequences of the chicken CdxA homeobox gene.

    PubMed Central

    Margalit, Y; Yarus, S; Shapira, E; Gruenbaum, Y; Fainsod, A

    1993-01-01

    The DNA binding specificity of the chicken homeodomain protein CDXA was studied. Using a CDXA-glutathione-S-transferase fusion protein, DNA fragments containing the binding site for this protein were isolated. The sources of DNA were oligonucleotides with random sequence and chicken genomic DNA. The isolated DNA fragments were sequenced and tested in DNA binding assays. Sequencing revealed that most DNA fragments are AT-rich, which is a common feature of homeodomain binding sites. Electrophoretic mobility shift assays showed that the different target sequences isolated bind to the CDXA protein with different affinities. The specific sequences bound by the CDXA protein in the isolated genomic fragments were determined by DNase I footprinting. From the footprinted sequences, the CDXA consensus binding site was determined. The CDXA protein binds the consensus sequence A, A/T, T, A/T, A, T, A/G. The CAUDAL binding site in the ftz promoter is also included in this consensus sequence. When tested, some of the genomic target sequences were capable of enhancing the transcriptional activity of reporter plasmids when introduced into CDXA-expressing cells. This study determined the DNA sequence specificity of the CDXA protein and also shows that this protein can further activate transcription in cells in culture. PMID:7909943
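
    The CDXA consensus A, A/T, T, A/T, A, T, A/G is AWTWATR in IUPAC notation and can be written as a regular expression. A site may lie on either DNA strand, so the sketch also scans the reverse complement and converts those hits back to forward-strand coordinates. The find_cdxa_sites helper is hypothetical. Exact consensus matching also ignores the different affinities reported for the isolated targets.

```python
import re

# A, A/T, T, A/T, A, T, A/G (IUPAC: AWTWATR); lookahead allows overlapping hits.
CDXA_SITE = re.compile(r"(?=(A[AT]T[AT]AT[AG]))")
SITE_LEN = 7
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_cdxa_sites(seq: str) -> list[tuple[int, str, str]]:
    """Return (0-based forward-strand start, strand, matched site) for every hit."""
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in CDXA_SITE.finditer(seq)]
    for m in CDXA_SITE.finditer(reverse_complement(seq)):
        # A hit at position p on the reverse strand covers forward
        # positions len(seq) - p - SITE_LEN .. len(seq) - p - 1.
        hits.append((len(seq) - m.start() - SITE_LEN, "-", m.group(1)))
    return sorted(hits)
```

    For example, find_cdxa_sites("CTATAAATC") returns [(1, "-", "ATTTATA")]. The forward strand reads TATAAAT at positions 1 to 7, and its reverse complement matches the consensus.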

  2. Characterization of the canine mda-7 gene, transcripts and expression patterns

    PubMed Central

    Sandey, Maninder; Bird, R. Curtis; Das, Swadesh K.; Sarkar, Devanand; Curiel, David T.; Fisher, Paul B.; Smith, Bruce F.

    2014-01-01

    Human melanoma differentiation associated gene-7/interleukin-24 (mda-7/IL-24) displays potent growth-suppressing and cell-killing activity against a wide variety of human and rodent cancer cells. In this study, we identified a canine ortholog of the human mda-7/IL-24 gene located within a cluster of IL-10 family members on chromosome 7. The full-length mRNA sequence of canine mda-7 was determined; it encodes a 186-amino acid protein that has 66% similarity to human MDA-7/IL-24. Canine MDA-7 is constitutively expressed in cultured normal canine epidermal keratinocytes (NCEKs), and its expression levels increase after lipopolysaccharide stimulation. In cultured NCEKs, the canine mda-7 pre-mRNA is differentially spliced, via exon skipping and alternate 5′-splice donor sites, to yield five splice variants (canine mda-7sv1, canine mda-7sv2, canine mda-7sv3, canine mda-7sv4 and canine mda-7sv5) that encode four protein isoforms of the canine MDA-7 protein. These protein isoforms have a conserved N-terminus (signal peptide sequence) and dissimilar amino acid sequences at their C-terminus. Like its human counterpart, canine MDA-7 is not expressed in primary canine tumor samples or in most tumor-derived cancer cell lines tested. Unlike human MDA-7/IL-24, canine mda-7 mRNA is not expressed in unstimulated canine peripheral blood mononuclear cells (PBMCs) or in PBMCs stimulated with lipopolysaccharide (LPS), concanavalin A (ConA) or phytohemagglutinin (PHA). Furthermore, in silico analysis revealed that canonical canine MDA-7 has a potential 28-amino acid signal peptide sequence that can target it for active secretion. These data suggest that canine mda-7 is indeed an ortholog of human mda-7/IL-24, that its protein product has high amino acid similarity to human MDA-7/IL-24 protein and may possess similar biological properties, but that its expression pattern is more restricted than that of its human ortholog. PMID:24865935

  3. A Multifeatures Fusion and Discrete Firefly Optimization Method for Prediction of Protein Tyrosine Sulfation Residues.

    PubMed

    Guo, Song; Liu, Chunhua; Zhou, Peng; Li, Yanling

    2016-01-01

    Tyrosine sulfation is a ubiquitous protein posttranslational modification in which sulfate groups are added to tyrosine residues. It plays significant roles in various physiological processes in eukaryotic cells. To explore the molecular mechanism of tyrosine sulfation, one prerequisite is to correctly identify possible protein tyrosine sulfation residues. In this paper, a novel method was presented to predict protein tyrosine sulfation residues from primary sequences. By means of informative feature construction and an elaborate feature selection and parameter optimization scheme, the proposed predictor achieved promising results and outperformed many other state-of-the-art predictors. Using the optimal feature subset, the proposed method achieved a mean MCC of 94.41% on the benchmark dataset and an MCC of 90.09% on the independent dataset. The experimental performance indicated that our new method could be effective in identifying important protein posttranslational modifications and that the feature selection scheme could be powerful in research on protein functional residue prediction.

  4. A Multifeatures Fusion and Discrete Firefly Optimization Method for Prediction of Protein Tyrosine Sulfation Residues

    PubMed Central

    Liu, Chunhua; Zhou, Peng; Li, Yanling

    2016-01-01

    Tyrosine sulfation is a ubiquitous protein posttranslational modification in which sulfate groups are added to tyrosine residues. It plays significant roles in various physiological processes in eukaryotic cells. To explore the molecular mechanism of tyrosine sulfation, one prerequisite is to correctly identify possible protein tyrosine sulfation residues. In this paper, a novel method was presented to predict protein tyrosine sulfation residues from primary sequences. By means of informative feature construction and an elaborate feature selection and parameter optimization scheme, the proposed predictor achieved promising results and outperformed many other state-of-the-art predictors. Using the optimal feature subset, the proposed method achieved a mean MCC of 94.41% on the benchmark dataset and an MCC of 90.09% on the independent dataset. The experimental performance indicated that our new method could be effective in identifying important protein posttranslational modifications and that the feature selection scheme could be powerful in research on protein functional residue prediction. PMID:27034949
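
    The Matthews correlation coefficient (MCC) used to score these predictors is computed from the confusion matrix as (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). It ranges from -1 to 1, so the percentages above presumably correspond to coefficients of about 0.944 and 0.901. Returning 0 when a marginal total is zero is an assumed convention in this minimal sketch.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: if any row or column of the confusion matrix is empty,
    # the coefficient is undefined and 0.0 is returned instead.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

    A perfect predictor scores 1.0 and a perfectly inverted one -1.0. Unlike accuracy, MCC stays informative when sulfated tyrosines are far rarer than unmodified ones.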

  5. How do corals make rocks?

    NASA Astrophysics Data System (ADS)

    Falkowski, P. G.; Mass, T.; Drake, J.; Schaller, M. F.; Rosenthal, Y.; Schofield, O.; Sherrell, R. M.

    2014-12-01

    We have developed a three-pronged approach to understanding how corals precipitate aragonite crystals and contain proxy biogeochemical information. Using proteomic and genomic approaches, we have identified 35 proteins in coral skeletons. Among these are a series of coral acidic proteins (CARPs). Based on their gene sequences, we cloned a subset of these proteins and purified them. Each of the proteins precipitates aragonite in vitro in unamended seawater. Antibodies raised against these proteins react with individual crystals of the native coral, clearly revealing that they are part of a biomineral structure. Based on the primary structure of the proteins, we have developed a model of the precipitation reaction that focuses on a Lewis acid displacement of protons from bicarbonate anions by calcium ligated to the carboxyl groups on the CARPs. The reactions are highly acidic and are not manifestly influenced by pH above ca. 6. These results suggest that corals will maintain the ability to calcify in the coming centuries, despite acidification of the oceans.
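
    As a hedged summary, the proposed reaction has the following overall stoichiometry. Calcium bound to CARP carboxylates acts as a Lewis acid that takes up carbonate and releases the bicarbonate proton, which is why the reaction generates acidity. The equation shows only the net balance, not the carboxylate-mediated mechanism described by the authors.

```latex
\mathrm{Ca^{2+} + HCO_3^{-} \longrightarrow CaCO_3\,(aragonite) + H^{+}}
```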

  6. Terminal sequence importance of de novo proteins from binary-patterned library: stable artificial proteins with 11- or 12-amino acid alphabet.

    PubMed

    Okura, Hiromichi; Takahashi, Tsuyoshi; Mihara, Hisakazu

    2012-06-01

    Successful approaches to de novo protein design suggest great potential for creating novel structural folds and for understanding the natural rules of protein folding. For these purposes, smaller and simpler de novo proteins have been developed. Here, we constructed smaller proteins by removing the terminal sequences from stable de novo vTAJ proteins and compared the stabilities of the mutant and original proteins. vTAJ proteins were screened from an α3β3 binary-patterned library designed with the polar/nonpolar periodicities of α-helices and β-sheets. vTAJ proteins carry additional terminal sequences as a consequence of the method used to construct the genetically repeated library sequences. By removing these parts of the sequences, we successfully obtained stable, smaller de novo protein mutants with smaller amino acid alphabets than the originals. However, these mutants differed from the originals in ANS binding properties and in stability against denaturant and pH change. The terminal sequences, which were designed merely as flexible linkers rather than as secondary structure units, appreciably affected these physicochemical properties. This study has implications for adjusting protein stability by designing N- and C-terminal sequences.
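
    Binary patterning specifies only whether each position is polar (P) or nonpolar (N), chosen to match secondary-structure periodicity: an α-helix repeats about every 3.6 residues (roughly 100° of rotation per residue), while a β-strand alternates sides at every residue. The Python sketch below derives such patterns; the one-face-nonpolar rule and the 180° face width are illustrative assumptions, not the α3β3 library's actual design rules.

```python
def helix_pattern(length: int, face_width_deg: float = 180.0) -> str:
    """N where the helical-wheel angle (100 deg per residue) falls on the nonpolar face, else P."""
    out = []
    for i in range(length):
        angle = (i * 100.0) % 360.0
        out.append("N" if angle < face_width_deg else "P")
    return "".join(out)


def strand_pattern(length: int) -> str:
    """Alternate nonpolar/polar so that one face of the strand is hydrophobic."""
    return "".join("N" if i % 2 == 0 else "P" for i in range(length))


print(helix_pattern(18))
print(strand_pattern(7))
```

    Each P or N position would then be filled from a small polar or nonpolar amino-acid subset, which is how binary-patterned designs end up with a reduced alphabet.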

  7. Combining one-step Sanger sequencing with phasing probe hybridization for HLA class I typing yields rapid, G-group resolution predicting 99% of unique full length protein sequences.

    PubMed

    Tu, Bin; Masaberg, Carly; Hou, Lihua; Behm, Daniel; Brescia, Peter; Cha, Nuri; Kariyawasam, Kanthi; Lee, Jar How; Nong, Thoa; Sells, John; Tausch, Paul; Yang, Ruyan; Ng, Jennifer; Hurley, Carolyn Katovich

    2017-02-01

    Sanger-based DNA sequencing of exons 2+3 of HLA class I alleles from a heterozygote frequently results in two or more alternative genotypes. This study was undertaken to reduce the time and effort required to produce a single high resolution HLA genotype. Samples were typed in parallel by Sanger sequencing and oligonucleotide probe hybridization. This workflow, together with optimization of analysis software, was tested and refined during the typing of over 42,000 volunteers for an unrelated hematopoietic progenitor cell donor registry. Next generation DNA sequencing (NGS) was applied to over 1000 of these samples to identify the alleles present within the G group designations. Single genotypes at G level resolution were obtained for over 95% of the loci without additional assays. The vast majority of alleles identified (>99%) were the primary allele giving the G groups their name. Only 0.7% of the alleles identified encoded protein variants that were not detected by a focus on the antigen recognition domain (ARD)-encoding exons. Our combined method routinely provides biologically relevant typing resolution at the level of the ARD. It can be applied either to single samples or to large-volume typing supporting bone marrow or solid organ transplantation, using technologies currently available in many HLA laboratories. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
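
    The gain from running probe hybridization in parallel can be pictured as a filter on the genotypes left ambiguous by Sanger sequencing: each candidate pair of alleles predicts a set of positive probes, and only pairs whose prediction matches the observed signals survive. A minimal Python sketch; the allele names, probe IDs, and exact-match rule are hypothetical placeholders, not the study's software.

```python
from typing import Dict, List, Set, Tuple

Genotype = Tuple[str, str]


def consistent_genotypes(
    candidates: List[Genotype],
    allele_probes: Dict[str, Set[str]],
    observed: Set[str],
) -> List[Genotype]:
    """Keep candidate genotypes whose predicted probe-positive set equals the observed one.

    candidates: allele pairs left ambiguous by Sanger sequencing (hypothetical names)
    allele_probes: allele -> IDs of probes expected to hybridize to it (hypothetical)
    observed: IDs of probes that gave a positive signal for the sample
    """
    keep = []
    for a1, a2 in candidates:
        # A heterozygote is positive for every probe matching either allele.
        predicted = allele_probes[a1] | allele_probes[a2]
        if predicted == observed:
            keep.append((a1, a2))
    return keep


# Two genotypes fit the Sanger trace equally well; probe p3 separates them.
candidates = [("allele_1", "allele_2"), ("allele_3", "allele_4")]
allele_probes = {
    "allele_1": {"p1"},
    "allele_2": {"p2"},
    "allele_3": {"p1", "p3"},
    "allele_4": {"p2"},
}
print(consistent_genotypes(candidates, allele_probes, observed={"p1", "p2"}))
```

    One surviving pair roughly corresponds to the single-genotype outcome described above; more than one survivor would still call for an additional assay.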

  8. Both coding exons of the c-myc gene contribute to its posttranscriptional regulation in the quiescent liver and regenerating liver and after protein synthesis inhibition.

    PubMed Central

    Lavenu, A; Pistoi, S; Pournin, S; Babinet, C; Morello, D

    1995-01-01

    In vivo, the steady-state level of c-myc mRNA is mainly controlled by posttranscriptional mechanisms. Using a panel of transgenic mice in which various versions of the human c-myc proto-oncogene were under the control of major histocompatibility complex H-2Kb class I regulatory sequences, we have shown that the 5' and the 3' noncoding sequences are dispensable for obtaining a regulated expression of the transgene in adult quiescent tissues, at the start of liver regeneration, and after inhibition of protein synthesis. These results indicated that the coding sequences were sufficient to ensure a regulated c-myc expression. In the present study, we have pursued this analysis with transgenes containing one or the other of the two c-myc coding exons either alone or in association with the c-myc 3' untranslated region. We demonstrate that each of the exons contains determinants which control c-myc mRNA expression. Moreover, we show that in the liver, c-myc exon 2 sequences are able to down-regulate an otherwise stable H-2K mRNA when embedded within it and to induce its transient accumulation after cycloheximide treatment and soon after liver ablation. Finally, the use of transgenes with different coding capacities has allowed us to postulate that the primary mRNA sequence itself and not c-Myc peptides is an important component of c-myc posttranscriptional regulation. PMID:7623834

  9. On the Origin of Protein Superfamilies and Superfolds

    NASA Astrophysics Data System (ADS)

    Magner, Abram; Szpankowski, Wojciech; Kihara, Daisuke

    2015-02-01

    Distributions of protein families and folds in genomes are highly skewed, with a small number of prevalent superfamilies/superfolds and a large number of families/folds of small size. Why are the distributions of protein families and folds skewed? Why are there only a limited number of protein families? Here, we employ an information theoretic approach to investigate the protein sequence-structure relationship that leads to the skewed distributions. We consider protein sequences and folds to constitute an information theoretic channel and compute the most efficient distribution of sequences that code all protein folds. The identified distributions of sequences and folds are found to follow a power law, consistent with those observed for proteins in nature. Importantly, the skewed distributions of sequences and folds are suggested to have different origins: the skewed distribution of sequences is due to evolutionary pressure to achieve efficient coding of necessary folds, whereas that of folds is based on the thermodynamic stability of folds. The current study provides a new information theoretic framework for proteins that could be widely applied for understanding protein sequences, structures, functions, and interactions.
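
    One standard way to make "most efficient distribution" precise for a discrete channel is the capacity-achieving input distribution, computable with the Blahut-Arimoto algorithm. The Python sketch below is an assumption about how such a calculation could look, not the authors' model: the channel matrix is a toy stand-in for the sequence-to-fold mapping.

```python
import math


def _divergences(p, channel):
    """KL divergence D(P(.|x) || q) in nats for each input x, with q the output law under p."""
    n_out = len(channel[0])
    q = [sum(px * row[y] for px, row in zip(p, channel)) for y in range(n_out)]
    return [
        sum(row[y] * math.log(row[y] / q[y]) for y in range(n_out) if row[y] > 0)
        for row in channel
    ]


def blahut_arimoto(channel, iterations=500):
    """Capacity (bits) and capacity-achieving input distribution of a discrete channel.

    channel[x][y] is P(y | x); every row must sum to 1.
    """
    p = [1.0 / len(channel)] * len(channel)  # start from a uniform input distribution
    for _ in range(iterations):
        d = _divergences(p, channel)
        weights = [px * math.exp(dx) for px, dx in zip(p, d)]
        total = sum(weights)
        p = [w / total for w in weights]
    d = _divergences(p, channel)
    capacity_bits = sum(px * dx for px, dx in zip(p, d)) / math.log(2)
    return capacity_bits, p


# Toy channel: 3 hypothetical "sequence classes" mapped noisily onto 2 "folds".
toy = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
cap, p_opt = blahut_arimoto(toy)
print(round(cap, 4), [round(x, 3) for x in p_opt])
```

    In the toy channel the uninformative third input is driven toward zero probability, a small-scale picture of an efficient code concentrating usage on a few inputs.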

  10. Optimizing high performance computing workflow for protein functional annotation.

    PubMed

    Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene

    2014-09-10

    Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optimized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. On the basis of the Position-Specific Iterative Basic Local Alignment Search Tool, the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data.
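
    The per-protein step of such a workflow is embarrassingly parallel: once alignment hits are known, each protein can be assigned to a cluster of orthologous groups (COG) independently. A minimal Python sketch, assuming hits are already parsed into (COG ID, score) pairs and using a simple best-hit rule with a hypothetical score cutoff; it is not the authors' classification algorithm.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Tuple

Hit = Tuple[str, float]  # (COG ID, alignment score), hypothetical parsed format


def classify(hits: List[Hit], min_score: float = 50.0) -> Optional[str]:
    """Return the COG of the best-scoring hit, or None if no hit clears min_score."""
    best = max(hits, key=lambda h: h[1], default=None)
    if best is None or best[1] < min_score:
        return None
    return best[0]


def classify_all(protein_hits: Dict[str, List[Hit]], workers: int = 4) -> Dict[str, Optional[str]]:
    """Classify many proteins concurrently; returns {protein_id: COG ID or None}."""
    ids = list(protein_hits)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        labels = pool.map(lambda pid: classify(protein_hits[pid]), ids)
    return dict(zip(ids, labels))


hits = {
    "prot_1": [("COG_A", 80.0), ("COG_B", 120.0)],
    "prot_2": [("COG_C", 10.0)],
    "prot_3": [],
}
print(classify_all(hits))
```

    Threads keep the sketch simple; for CPU-bound classification, a ProcessPoolExecutor with a module-level function (lambdas cannot be pickled) would give true parallelism on one node.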

  11. Optimizing high performance computing workflow for protein functional annotation

    PubMed Central

    Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene

    2014-01-01

    Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optimized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. On the basis of the Position-Specific Iterative Basic Local Alignment Search Tool, the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data. PMID:25313296

  12. BACHSCORE. A tool for evaluating efficiently and reliably the quality of large sets of protein structures

    NASA Astrophysics Data System (ADS)

    Sarti, E.; Zamuner, S.; Cossio, P.; Laio, A.; Seno, F.; Trovato, A.

    2013-12-01

    In protein structure prediction it is of crucial importance, especially at the refinement stage, to score large sets of models efficiently and select the ones that are closest to the native state. We present a new computational tool, BACHSCORE, that allows its users to rank different structural models of the same protein according to their quality, evaluated using the BACH++ (Bayesian Analysis Conformation Hunt) scoring function. The original BACH statistical potential had already been shown to discriminate the native state of a protein very reliably within large sets of misfolded models of the same protein. BACH++ features a novel upgrade in the solvation potential of the scoring function, which is now computed by adapting the LCPO (Linear Combination of Pairwise Overlaps) algorithm. This change further enhances the already good performance of the scoring function. BACHSCORE can be accessed directly through the web server: bachserver.pd.infn.it.

    Catalogue identifier: AEQD_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEQD_v1_0.html
    Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland
    Licensing provisions: GNU General Public License version 3
    No. of lines in distributed program, including test data, etc.: 130159
    No. of bytes in distributed program, including test data, etc.: 24 687 455
    Distribution format: tar.gz
    Programming language: C++
    Computer: Any computer capable of running an executable produced by a g++ compiler (version 4.6.3)
    Operating system: Linux, Unix
    RAM: 1 073 741 824 bytes
    Classification: 3
    Nature of problem: Evaluate the quality of a protein structural model, taking into account possible a priori knowledge of a reference primary sequence that may differ from the amino-acid sequence of the model; the native protein structure should be recognized as the best model.
    Solution method: The contact potential scores the occurrence of any given type of residue pair in 5 possible contact classes (α-helical contact, parallel β-sheet contact, anti-parallel β-sheet contact, side-chain contact, no contact). The solvation potential scores the occurrence of any residue type in 2 possible environments: buried and solvent exposed. Residue environment is assigned by adapting the LCPO algorithm. Residues present in the reference primary sequence but absent from the model structure contribute to the model score as solvent exposed and as not contacting any other residue.
    Restrictions: Input files must follow the Protein Data Bank format standard.
    Additional comments: Parameter values used in the scoring function can be found in the file /folder-to-bachscore/BACH/examples/bach_std.par.
    Running time: Roughly one minute to score one hundred structures on a desktop PC, depending on their size.
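
    The solution method above combines a pairwise contact term (5 classes) with a per-residue solvation term (2 environments), and treats residues missing from the model as exposed and non-contacting. A minimal Python sketch of that bookkeeping; the energy functions are hypothetical stand-ins for the parameters in bach_std.par, and "lower total is better" is an assumed sign convention.

```python
from typing import Callable, Dict, Tuple

# The five contact classes named in the solution method.
CONTACT_CLASSES = ("alpha", "parallel_beta", "antiparallel_beta", "side_chain", "none")


def bach_like_score(
    sequence: str,
    contacts: Dict[Tuple[int, int], str],
    buried: Dict[int, bool],
    pair_energy: Callable[[str, str, str], float],
    solvation_energy: Callable[[str, str], float],
) -> float:
    """Sum contact and solvation terms over every residue of the reference sequence.

    contacts: (i, j) with i < j -> contact class, for residues present in the model
    buried: residue index -> True (buried) / False (exposed), for residues in the model
    Residues absent from the model have no entries, so they score as 'none' and exposed.
    """
    total = 0.0
    n = len(sequence)
    for i in range(n):
        for j in range(i + 1, n):
            cls = contacts.get((i, j), "none")
            if cls not in CONTACT_CLASSES:
                raise ValueError(f"unknown contact class: {cls}")
            total += pair_energy(sequence[i], sequence[j], cls)
        env = "buried" if buried.get(i, False) else "exposed"
        total += solvation_energy(sequence[i], env)
    return total


# Toy energies (hypothetical): helical contacts and burial are favourable.
def toy_pair(a: str, b: str, cls: str) -> float:
    return -1.0 if cls == "alpha" else 0.0


def toy_solv(a: str, env: str) -> float:
    return -0.5 if env == "buried" else 0.2


# Residue 1 is in the reference sequence but missing from the model's burial data.
print(bach_like_score("AG", {(0, 1): "alpha"}, {0: True}, toy_pair, toy_solv))
```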

  13. Chemistry of gluten proteins.

    PubMed

    Wieser, Herbert

    2007-04-01

    Gluten proteins play a key role in determining the unique baking quality of wheat by conferring water absorption capacity, cohesivity, viscosity and elasticity on dough. Gluten proteins can be divided into two main fractions according to their solubility in aqueous alcohols: the soluble gliadins and the insoluble glutenins. Both fractions consist of numerous, partially closely related protein components characterized by high glutamine and proline contents. Gliadins are mainly monomeric proteins with molecular weights (MWs) around 28,000-55,000 and can be classified according to their different primary structures into the alpha/beta-, gamma- and omega-type. Disulphide bonds are either absent or present as intrachain crosslinks. The glutenin fraction comprises aggregated proteins linked by interchain disulphide bonds; their size varies from about 500,000 to more than 10 million. After reduction of disulphide bonds, the resulting glutenin subunits show a solubility in aqueous alcohols similar to that of gliadins. Based on primary structure, glutenin subunits have been divided into the high-molecular-weight (HMW) subunits (MW=67,000-88,000) and low-molecular-weight (LMW) subunits (MW=32,000-35,000). Each gluten protein type consists of two or three different structural domains; one of them contains unique repetitive sequences rich in glutamine and proline. Native glutenins are composed of a backbone formed by HMW subunit polymers and of LMW subunit polymers branched off from HMW subunits. Non-covalent bonds such as hydrogen bonds, ionic bonds and hydrophobic bonds are important for the aggregation of gliadins and glutenins and influence the structure and physical properties of dough.

  14. [Environment of tryptophan residues in proteins--a factor for stability to oxidative nitrosylation. I. Analysis of primary structure].

    PubMed

    Beda, N V; Nedospasov, A A

    2001-01-01

    Micellar catalysis under aerobic conditions effectively accelerates oxidative nitrosylation because NO and O2 are solubilized in membranes and in hydrophobic protein cores. The nitrosylating intermediates NOx (NO2, N2O3, N2O4) form mainly in the hydrophobic phase; because their solubility in the aqueous phase is low and their hydrolysis there is rapid, the local concentration of NOx in the hydrophobic phase is substantially higher than in the aqueous phase. Tryptophan is a hydrophobic residue and can be nitrosylated to form isomeric N-nitrosotryptophans (NOW). Without a denitrosylation mechanism, NOW would accumulate continuously in the proteins of NO-synthesizing organisms, and long-lived proteins would contain substantial amounts of NOW, which is not the case. Using the Protein Data Bank (more than 78,000 sequences), we investigated the distribution of residues in the environment of tryptophan (22 residues on each side along the polypeptide chain) in proteins of known primary structure. Charged and polar residues (D, H, K, N, Q, R, S) occur more frequently, and hydrophobic residues (A, F, I, L, V, Y) less frequently, in the immediate surroundings of tryptophan (positions -6, -5, -2, -1, 1, 2, 4) than at remote positions. Hence, a substantial proportion of tryptophan residues is located in a hydrophilic environment, which decreases the nitrosylation rate because of the lower NOx concentration in the aqueous phase and allows denitrosylation to proceed via nitrosonium ion transfer to nucleophilic functional groups of proteins and low-molecular-weight compounds in the aqueous phase.
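
    The positional analysis described above amounts to counting, for every tryptophan, which residues occur at each offset along the chain, then comparing residue-class frequencies at near and remote offsets. A minimal Python sketch over plain one-letter sequences; it does not reproduce the authors' PDB processing, and the input sequences are toy examples.

```python
from collections import Counter
from typing import Dict, Iterable

POLAR_CHARGED = "DHKNQRS"  # residue set named in the abstract
HYDROPHOBIC = "AFILVY"     # residue set named in the abstract


def trp_environment(sequences: Iterable[str], window: int = 22) -> Dict[int, Counter]:
    """Count residues at each offset -window..+window (excluding 0) around every W."""
    counts = {k: Counter() for k in range(-window, window + 1) if k != 0}
    for seq in sequences:
        for i, aa in enumerate(seq):
            if aa != "W":
                continue
            for k, counter in counts.items():
                j = i + k
                if 0 <= j < len(seq):  # skip offsets that fall off either chain end
                    counter[seq[j]] += 1
    return counts


def class_fraction(counter: Counter, residues: str) -> float:
    """Fraction of residues observed at one offset that belong to the given class."""
    total = sum(counter.values())
    return sum(counter[r] for r in residues) / total if total else 0.0


env = trp_environment(["MKDWQRLLAVIF", "SNWKEAILV"], window=3)
for offset in (-1, 1, 3):
    print(offset, round(class_fraction(env[offset], POLAR_CHARGED), 2))
```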

  15. Mutations in the Gene Encoding IFT Dynein Complex Component WDR34 Cause Jeune Asphyxiating Thoracic Dystrophy

    PubMed Central

    Schmidts, Miriam; Vodopiutz, Julia; Christou-Savina, Sonia; Cortés, Claudio R.; McInerney-Leo, Aideen M.; Emes, Richard D.; Arts, Heleen H.; Tüysüz, Beyhan; D’Silva, Jason; Leo, Paul J.; Giles, Tom C.; Oud, Machteld M.; Harris, Jessica A.; Koopmans, Marije; Marshall, Mhairi; Elçioglu, Nursel; Kuechler, Alma; Bockenhauer, Detlef; Moore, Anthony T.; Wilson, Louise C.; Janecke, Andreas R.; Hurles, Matthew E.; Emmet, Warren; Gardiner, Brooke; Streubel, Berthold; Dopita, Belinda; Zankl, Andreas; Kayserili, Hülya; Scambler, Peter J.; Brown, Matthew A.; Beales, Philip L.; Wicking, Carol; Duncan, Emma L.; Mitchison, Hannah M.

    2013-01-01

    Bidirectional (anterograde and retrograde) motor-based intraflagellar transport (IFT) governs cargo transport and delivery processes that are essential for primary cilia growth and maintenance and for hedgehog signaling functions. The IFT dynein-2 motor complex that regulates ciliary retrograde protein transport contains a heavy chain dynein ATPase/motor subunit, DYNC2H1, along with other, less well-characterized subunits. Deficiency of IFT proteins, including DYNC2H1, underlies a spectrum of skeletal ciliopathies. Here, by using exome sequencing and a targeted next-generation sequencing panel, we identified a total of 11 mutations in WDR34 in 9 families with the clinical diagnosis of Jeune syndrome (asphyxiating thoracic dystrophy). WDR34 encodes a WD40 repeat-containing protein orthologous to Chlamydomonas FAP133, a dynein intermediate chain associated with the retrograde intraflagellar transport motor. Three-dimensional protein modeling suggests that the identified mutations all affect residues critical for WDR34 protein-protein interactions. We find that WDR34 concentrates around the centrioles and basal bodies in mammalian cells, also showing axonemal staining. WDR34 coimmunoprecipitates with the dynein-1 light chain DYNLL1 in vitro, and mining of proteomics data suggests that WDR34 could represent a previously unrecognized link between the cytoplasmic dynein-1 and IFT dynein-2 motors. Together, these data show that WDR34 is critical for ciliary functions essential to normal development and survival, most probably as a previously unrecognized component of the mammalian dynein-IFT machinery. PMID:24183451

  16. Recombinant Mip-PilE-FlaA dominant epitopes vaccine candidate against Legionella pneumophila.

    PubMed

    He, Jinlei; Huang, Fan; Chen, Han; Chen, Qiwei; Zhang, Junrong; Li, Jiao; Chen, Dali; Chen, Jianping

    2017-06-01

    Legionella pneumophila is the main causative agent of Legionnaires' disease, which is a severe multi-system disease with pneumonia as the primary manifestation. We designed a recombinant Mip-PilE-FlaA dominant epitopes vaccine against Legionella pneumophila to prevent the disease and evaluated its immunogenicity and protective immunity. The protein structures of Mip, PilE and FlaA were analyzed computationally, and the gene sequences of the dominant epitopes of the three proteins were selected to construct and optimize the vaccine. The optimized mip, pilE, flaA and recombinant mip-pilE-flaA gene sequences were cloned, expressed and purified. The purified proteins were used as dominant epitopes vaccines to immunize BALB/c mice and determine the protective immunity and immunogenicity of these purified proteins. The identification confirmed that the recombinant mip-pilE-flaA was successfully cloned and expressed. ELISA revealed that the Mip-PilE-FlaA group produced the highest IgG response, and this protein may considerably improve the production of some cytokines in BALB/c mice. Histopathology analyses of lungs from mice immunized with Mip-PilE-FlaA revealed some protective effect. Our work demonstrated that the recombinant dominant epitopes of Mip-PilE-FlaA exhibited strong immunogenicity and immune protection, and this protein may be an efficient epitopes vaccine candidate against Legionella pneumophila. Copyright © 2017 European Federation of Immunological Societies. Published by Elsevier B.V. All rights reserved.

  17. Recurrent and functional regulatory mutations in breast cancer.

    PubMed

    Rheinbay, Esther; Parasuraman, Prasanna; Grimsby, Jonna; Tiao, Grace; Engreitz, Jesse M; Kim, Jaegil; Lawrence, Michael S; Taylor-Weiner, Amaro; Rodriguez-Cuevas, Sergio; Rosenberg, Mara; Hess, Julian; Stewart, Chip; Maruvka, Yosef E; Stojanov, Petar; Cortes, Maria L; Seepo, Sara; Cibulskis, Carrie; Tracy, Adam; Pugh, Trevor J; Lee, Jesse; Zheng, Zongli; Ellisen, Leif W; Iafrate, A John; Boehm, Jesse S; Gabriel, Stacey B; Meyerson, Matthew; Golub, Todd R; Baselga, Jose; Hidalgo-Miranda, Alfredo; Shioda, Toshi; Bernards, Andre; Lander, Eric S; Getz, Gad

    2017-07-06

    Genomic analysis of tumours has led to the identification of hundreds of cancer genes on the basis of the presence of mutations in protein-coding regions. By contrast, much less is known about cancer-causing mutations in non-coding regions. Here we perform deep sequencing in 360 primary breast cancers and develop computational methods to identify significantly mutated promoters. Clear signals are found in the promoters of three genes. FOXA1, a known driver of hormone-receptor positive breast cancer, harbours a mutational hotspot in its promoter leading to overexpression through increased E2F binding. RMRP and NEAT1, two non-coding RNA genes, carry mutations that affect protein binding to their promoters and alter expression levels. Our study shows that promoter regions harbour recurrent mutations in cancer with functional consequences and that the mutations occur at similar frequencies as in coding regions. Power analyses indicate that more such regions remain to be discovered through deep sequencing of adequately sized cohorts of patients.
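
    The abstract does not spell out the statistics behind "significantly mutated promoters". The simplest form of such a burden test asks how surprising it is to see k mutated tumours out of n if each tumour carried a mutation at that promoter independently with background probability p. A minimal Python sketch; this is not the authors' method, and the background rate in the usage line is a hypothetical placeholder.

```python
from math import comb


def binomial_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more mutated tumours by background alone."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 360 tumours as in the abstract; the 0.5% background rate and 10 hits are hypothetical.
print(binomial_sf(k=10, n=360, p=0.005))
```

    When many promoters are scanned, the resulting p-values would still need a multiple-testing correction before any promoter is called significant.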

  18. Landscape of genomic diversity and trait discovery in soybean.

    PubMed

    Valliyodan, Babu; Dan Qiu; Patil, Gunvant; Zeng, Peng; Huang, Jiaying; Dai, Lu; Chen, Chengxuan; Li, Yanjun; Joshi, Trupti; Song, Li; Vuong, Tri D; Musket, Theresa A; Xu, Dong; Shannon, J Grover; Shifeng, Cheng; Liu, Xin; Nguyen, Henry T

    2016-03-31

    Cultivated soybean [Glycine max (L.) Merr.] is a primary source of vegetable oil and protein. We report a landscape analysis of genome-wide genetic variation and an association study of major domestication and agronomic traits in soybean. A total of 106 soybean genomes representing wild accessions, landraces, and elite lines were re-sequenced at an average depth of 17x with 97.5% coverage. Over 10 million high-quality SNPs were discovered, and 35.34% of these have not been previously reported. Additionally, 159 putative domestication sweeps were identified, which include 54.34 Mbp (4.9%) and 4,414 genes; 146 regions were involved in artificial selection during domestication. A genome-wide association study of major traits including oil and protein content, salinity, and domestication traits resulted in the discovery of novel alleles. Genomic information from this study provides a valuable resource for understanding soybean genome structure and evolution, and can also facilitate trait dissection leading to sequencing-based molecular breeding.
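
    The abstract does not say how the putative domestication sweeps were located. One widely used signal is a local drop in nucleotide diversity (π) in cultivated lines relative to wild accessions, scanned in windows along the genome. A minimal Python sketch of that ratio for a single window, assuming aligned haplotype strings as input; it is not necessarily the method used in this study.

```python
from itertools import combinations
from typing import List


def nucleotide_diversity(haplotypes: List[str]) -> float:
    """Mean pairwise differences per site (pi) among equal-length aligned haplotypes."""
    pairs = list(combinations(haplotypes, 2))
    length = len(haplotypes[0])
    diffs = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    return diffs / (len(pairs) * length)


def diversity_ratio(wild: List[str], cultivated: List[str]) -> float:
    """pi_wild / pi_cultivated for one window; large values flag candidate sweeps."""
    pi_cult = nucleotide_diversity(cultivated)
    return float("inf") if pi_cult == 0 else nucleotide_diversity(wild) / pi_cult


# Toy window of 4 sites (hypothetical haplotypes).
print(diversity_ratio(["AAAA", "TTTT", "ATAT"], ["AAAA", "AAAT", "AAAA"]))
```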

  19. Landscape of genomic diversity and trait discovery in soybean

    PubMed Central

    Valliyodan, Babu; Dan Qiu; Patil, Gunvant; Zeng, Peng; Huang, Jiaying; Dai, Lu; Chen, Chengxuan; Li, Yanjun; Joshi, Trupti; Song, Li; Vuong, Tri D.; Musket, Theresa A.; Xu, Dong; Shannon, J. Grover; Shifeng, Cheng; Liu, Xin; Nguyen, Henry T.

    2016-01-01

    Cultivated soybean [Glycine max (L.) Merr.] is a primary source of vegetable oil and protein. We report a landscape analysis of genome-wide genetic variation and an association study of major domestication and agronomic traits in soybean. A total of 106 soybean genomes representing wild accessions, landraces, and elite lines were re-sequenced at an average depth of 17x with 97.5% coverage. Over 10 million high-quality SNPs were discovered, and 35.34% of these have not been previously reported. Additionally, 159 putative domestication sweeps were identified, which include 54.34 Mbp (4.9%) and 4,414 genes; 146 regions were involved in artificial selection during domestication. A genome-wide association study of major traits including oil and protein content, salinity, and domestication traits resulted in the discovery of novel alleles. Genomic information from this study provides a valuable resource for understanding soybean genome structure and evolution, and can also facilitate trait dissection leading to sequencing-based molecular breeding. PMID:27029319

  20. Databases and Associated Tools for Glycomics and Glycoproteomics.

    PubMed

    Lisacek, Frederique; Mariethoz, Julien; Alocci, Davide; Rudd, Pauline M; Abrahams, Jodie L; Campbell, Matthew P; Packer, Nicolle H; Ståhle, Jonas; Widmalm, Göran; Mullen, Elaine; Adamczyk, Barbara; Rojas-Macias, Miguel A; Jin, Chunsheng; Karlsson, Niclas G

    2017-01-01

    Access to biodatabases for glycomics and glycoproteomics has proven to be essential for current glycobiological research. This chapter presents available databases that are devoted to different aspects of glycobioinformatics. These include oligosaccharide sequence databases, experimental databases, 3D structure databases (of both glycans and glycorelated proteins) and databases associating glycans with tissue, disease, and proteins. Specific search protocols are also provided using tools associated with experimental databases for converting primary glycoanalytical data to glycan structural information. In particular, researchers using glycoanalysis methods by U/HPLC (GlycoBase), MS (GlycoWorkbench, UniCarb-DB, GlycoDigest), and NMR (CASPER) will benefit from this chapter. In addition, we include information on how to utilize glycan structural information to query databases that associate glycans with proteins (UniCarbKB) and with interactions with pathogens (SugarBind).

  1. CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction.

    PubMed

    Cui, Xuefeng; Lu, Zhiwu; Wang, Sheng; Jing-Yan Wang, Jim; Gao, Xin

    2016-06-15

    Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades in sequence alignment, threading and alignment-free methods, protein homology detection remains a challenging open problem. Recently, network methods that try to find transitive paths in the protein structure space have demonstrated the importance of incorporating network information of the structure space. Yet, current methods merge the sequence space and the structure space into a single space, and thus introduce inconsistency in combining different sources of information. We present a novel network-based protein homology detection method, CMsearch, based on cross-modal learning. Instead of exploring a single network built from the mixture of sequence and structure space information, CMsearch builds two separate networks to represent the sequence space and the structure space. It then learns sequence-structure correlation by simultaneously taking sequence information, structure information, sequence space information and structure space information into consideration. We tested CMsearch on two challenging tasks, protein homology detection and protein structure prediction, by querying all 8332 PDB40 proteins. Our results demonstrate that CMsearch is insensitive to the similarity metrics used to define the sequence and the structure spaces. By using HMM-HMM alignment as the sequence similarity metric, CMsearch clearly outperforms state-of-the-art homology detection methods and the CASP-winning template-based protein structure prediction methods. Our program is freely available for download from http://sfb.kaust.edu.sa/Pages/Software.aspx. Contact: xin.gao@kaust.edu.sa. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  2. PREvaIL, an integrative approach for inferring catalytic residues using sequence, structural, and network features in a machine-learning framework.

    PubMed

    Song, Jiangning; Li, Fuyi; Takemoto, Kazuhiro; Haffari, Gholamreza; Akutsu, Tatsuya; Chou, Kuo-Chen; Webb, Geoffrey I

    2018-04-14

    Determining the catalytic residues in an enzyme is critical to our understanding of the relationship between protein sequence, structure, and function, and to enhancing our ability to design novel enzymes and their inhibitors. Although many enzymes have been sequenced, and their primary and tertiary structures determined, experimental methods for enzyme functional characterization lag behind. Because experimental methods used for identifying catalytic residues are resource- and labor-intensive, computational approaches have considerable value and are highly desirable for their ability to complement experimental studies in identifying catalytic residues and helping to bridge the sequence-structure-function gap. In this study, we describe a new computational method called PREvaIL for predicting enzyme catalytic residues. This method was developed by leveraging a comprehensive set of informative features extracted from multiple levels, including sequence, structure, and residue-contact network, in a random forest machine-learning framework. Extensive benchmarking experiments on eight different datasets based on 10-fold cross-validation and independent tests, as well as side-by-side performance comparisons with seven modern sequence- and structure-based methods, showed that PREvaIL achieved competitive predictive performance, with an area under the receiver operating characteristic curve and area under the precision-recall curve ranging from 0.896 to 0.973 and from 0.294 to 0.523, respectively. We demonstrated that this method was able to capture useful signals arising from different levels, and that combining these complementary types of features allowed us to significantly improve the performance of catalytic residue prediction. We believe that this new method can be utilized as a valuable tool for both understanding the complex sequence-structure-function relationships of proteins and facilitating the characterization of novel enzymes lacking functional annotations.
Copyright © 2018 Elsevier Ltd. All rights reserved.
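
    The two summary metrics quoted above can be computed directly from per-residue scores and labels. ROC AUC equals the probability that a randomly chosen catalytic residue scores higher than a randomly chosen non-catalytic one, whereas the precision-recall area depends on how rare positives are; since catalytic residues are a small minority of all residues, this is one reason the quoted AUPR range sits well below the AUROC range. A minimal Python sketch, taking step-wise average precision as the PR-curve area (an assumption); the random forest itself is not reproduced.

```python
from typing import List


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """P(random positive outscores random negative); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def average_precision(scores: List[float], labels: List[int]) -> float:
    """Mean of the precision values at the rank of each true positive (step-wise AUPR)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, ap = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            ap += hits / rank
    total_pos = sum(labels)
    return ap / total_pos if total_pos else 0.0


# Hypothetical per-residue scores; 1 = annotated catalytic residue.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
print(roc_auc(scores, labels), round(average_precision(scores, labels), 3))
```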

  3. Elman RNN based classification of protein sequences on account of their mutual information.

    PubMed

    Mishra, Pooja; Nath Pandey, Paras

    2012-10-21

    In the present work, we estimate residue correlation within protein sequences by using the mutual information (MI) of adjacent residues, based on the structural and solvent accessibility properties of amino acids. Estimation of the long-range correlation between nonadjacent residues is improved by constructing a mutual information vector (MIV) for a single protein sequence, so that each protein sequence is associated with its corresponding MIVs. These MIVs are given to an Elman RNN to classify the protein sequences. The modeling power of the MIV was shown to be significantly better, giving a new approach to alignment-free classification of protein sequences. We also conclude that MIVs based on sequence structural and solvent accessibility properties are better predictors. Copyright © 2012 Elsevier Ltd. All rights reserved.
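
    A minimal Python sketch of a mutual-information vector: for a single sequence, compute the MI between the residue class at position i and at position i + d for d = 1, 2, ..., and collect the values into a vector that a classifier (an Elman RNN in the paper) could take as input. The lag-based construction and the three-class reduced alphabet are assumptions, standing in for the structural and solvent-accessibility groupings the abstract mentions.

```python
import math
from collections import Counter
from typing import List

# Hypothetical 3-class reduced alphabet: hydrophobic (h), polar (p), charged (c).
GROUPS = {
    **{a: "h" for a in "AVLIMFWC"},
    **{a: "p" for a in "GSTYNQPH"},
    **{a: "c" for a in "DEKR"},
}


def lag_mutual_information(seq: str, lag: int) -> float:
    """MI in bits between the class at position i and the class at position i + lag."""
    classes = [GROUPS[a] for a in seq if a in GROUPS]
    pairs = list(zip(classes, classes[lag:]))
    if not pairs:
        return 0.0
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(x for x, _ in pairs)
    right = Counter(y for _, y in pairs)
    return sum(
        (c / n) * math.log2((c / n) / ((left[x] / n) * (right[y] / n)))
        for (x, y), c in joint.items()
    )


def mi_vector(seq: str, max_lag: int = 5) -> List[float]:
    """MI at lags 1..max_lag, one fixed-length feature vector per sequence."""
    return [lag_mutual_information(seq, d) for d in range(1, max_lag + 1)]


print([round(v, 3) for v in mi_vector("MKVLAADEKRLLSTGHWQ", max_lag=3)])
```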

  4. Computationally mapping sequence space to understand evolutionary protein engineering.

    PubMed

    Armstrong, Kathryn A; Tidor, Bruce

    2008-01-01

    Evolutionary protein engineering has been dramatically successful, producing a wide variety of new proteins with altered stability, binding affinity, and enzymatic activity. However, the success of such procedures is often unreliable, and the impact of the choice of protein, engineering goal, and evolutionary procedure is not well understood. We have created a framework for understanding aspects of the protein engineering process by computationally mapping regions of feasible sequence space for three small proteins using structure-based design protocols. We then tested the ability of different evolutionary search strategies to explore these sequence spaces. The results point to a non-intuitive relationship between the error-prone PCR mutation rate and the number of rounds of replication. The evolutionary relationships among feasible sequences reveal hub-like sequences that serve as particularly fruitful starting sequences for evolutionary search. Moreover, genetic recombination procedures were examined, and tradeoffs relating sequence diversity and search efficiency were identified. This framework allows us to consider the impact of protein structure on the allowed sequence space and therefore on the challenges that each protein presents to error-prone PCR and genetic recombination procedures.
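
    The "hub-like" sequences mentioned above can be made concrete as high-degree nodes in a network whose nodes are feasible sequences and whose edges join sequences one point substitution apart. A minimal Python sketch; the tiny feasible set is a toy stand-in for the structure-based design calculations in the paper.

```python
from typing import Iterable, List, Set, Tuple


def one_mutation_neighbors(seq: str, feasible: Set[str]) -> List[str]:
    """Feasible sequences reachable from seq by exactly one point substitution."""
    return [
        s for s in feasible
        if len(s) == len(seq) and sum(a != b for a, b in zip(s, seq)) == 1
    ]


def hub_ranking(feasible: Iterable[str]) -> List[Tuple[str, int]]:
    """Rank feasible sequences by how many single-mutation neighbours are also feasible."""
    feasible = set(feasible)
    degree = {s: len(one_mutation_neighbors(s, feasible)) for s in feasible}
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))


toy_feasible = {"AAA", "AAG", "AGA", "GAA", "GGG"}
print(hub_ranking(toy_feasible))
```

    In this picture, starting an evolutionary search from a hub gives more single-mutation moves that stay inside the feasible set.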

  5. Rosetta stone method for detecting protein function and protein-protein interactions from genome sequences

    DOEpatents

    Eisenberg, David; Marcotte, Edward M.; Pellegrini, Matteo; Thompson, Michael J.; Yeates, Todd O.

    2002-10-15

    A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are called Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequences of two or more organisms to create a phylogenetic profile for each protein, indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method, a combination of the above two methods is used to predict functional links.
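
    The phylogenetic-profile idea can be sketched as follows, assuming homology detection has already been done so that each genome is represented as a set of protein family identifiers (the identifiers and genomes below are hypothetical). Families whose presence/absence profiles agree across genomes, here within a Hamming-distance threshold, are reported as candidate functional links. Fusion (Rosetta Stone) detection is not shown, since it requires a sequence-similarity search.

```python
from itertools import combinations


def phylogenetic_profile(family, genomes):
    """Presence (1) / absence (0) of a protein family across genomes."""
    return tuple(1 if family in genome else 0 for genome in genomes)


def hamming(p, q):
    """Number of genomes in which two profiles disagree."""
    return sum(a != b for a, b in zip(p, q))


def functional_links(families, genomes, max_distance=0):
    """Pairs of families whose profiles differ in at most max_distance genomes."""
    profiles = {f: phylogenetic_profile(f, genomes) for f in families}
    return [(a, b) for a, b in combinations(sorted(families), 2)
            if hamming(profiles[a], profiles[b]) <= max_distance]


# Hypothetical genomes, each a set of family identifiers.
genomes = [{"A", "B", "C"}, {"A", "B"}, {"C"}, {"A", "B", "C"}]
```

    Requiring identical profiles (max_distance=0) is the strictest choice; relaxing the threshold trades precision for more candidate links.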

  6. Analysis of the complete nucleotide sequence and functional organization of the genome of Streptococcus pneumoniae bacteriophage Cp-1.

    PubMed

    Martín, A C; López, R; García, P

    1996-06-01

    Cp-1, a bacteriophage infecting Streptococcus pneumoniae, has a linear double-stranded DNA genome, with a terminal protein covalently linked to its 5' ends, that replicates by the protein-priming mechanism. We describe here the complete DNA sequence and transcriptional map of the Cp-1 genome. These analyses have led to the firm assignment of 10 genes and the localization of 19 additional open reading frames in the 19,345-bp Cp-1 DNA. Striking similarities and differences have been revealed between some of these proteins and those of the Bacillus subtilis phage phi 29, a system that also replicates its DNA by the protein-priming mechanism. The genes coding for structural proteins and assembly factors are located in the central part of the Cp-1 genome. Several proteins corresponding to the predicted gene products were identified by in vitro and in vivo expression of the cloned genes. The mature major head protein from the virion particles results from hydrolysis of the primary gene product at the His-49 residue, whereas the phage gene is expressed in Escherichia coli without modification. We have also identified two open reading frames coding for proteins that show high degrees of similarity to the N- and C-terminal regions, respectively, of the single tail protein identified in phi 29. Sequencing and primer extension analysis suggest transcription of a small RNA with a secondary structure similar to that of the prohead RNA required for the ATP-dependent packaging of phi 29 DNA. On the basis of its temporal expression, transcription of the Cp-1 genome takes place in two stages, early and late. Combined Northern (RNA) blot and primer extension experiments allowed us to map the 5' initiation sites of the transcripts, and we found that only three genes were transcribed from right to left. These analyses reveal that there are also noticeable differences between Cp-1 and phi 29 in transcriptional organization. Considered together, the observations reported here provide new, tangible evidence of phylogenetic relationships between B. subtilis and S. pneumoniae.

  7. Protein Information Resource: a community resource for expert annotation of protein data

    PubMed Central

    Barker, Winona C.; Garavelli, John S.; Hou, Zhenglin; Huang, Hongzhan; Ledley, Robert S.; McGarvey, Peter B.; Mewes, Hans-Werner; Orcutt, Bruce C.; Pfeiffer, Friedhelm; Tsugita, Akira; Vinayaka, C. R.; Xiao, Chunlin; Yeh, Lai-Su L.; Wu, Cathy

    2001-01-01

    The Protein Information Resource, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the most comprehensive and expertly annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database. To provide timely, high-quality annotation and promote database interoperability, PIR-International employs rule-based and classification-driven procedures based on controlled vocabulary and standard nomenclature, and includes status tags to distinguish experimentally determined from predicted protein features. The database contains about 200 000 non-redundant protein sequences, which are classified into families and superfamilies, with their domains and motifs identified. Entries are extensively cross-referenced to other sequence, classification, genome, structure and activity databases. The PIR web site features search engines that use sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. The PIR-International databases and search tools are accessible on the PIR web site at http://pir.georgetown.edu/ and at the MIPS web site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP. PMID:11125041

  8. ORFer--retrieval of protein sequences and open reading frames from GenBank and storage into relational databases or text files.

    PubMed

    Büssow, Konrad; Hoffmann, Steve; Sievert, Volker

    2002-12-19

    Functional genomics involves parallel experimentation on large sets of proteins. This requires management of large sets of open reading frames as a prerequisite for the cloning and recombinant expression of these proteins. A Java program was developed for retrieving protein and nucleic acid sequences and annotations from NCBI GenBank, using the XML sequence format. Annotations retrieved by ORFer include the sequence name, the organism and the completeness of the sequence. The program has a graphical user interface, although it can also be used in a non-interactive mode. For protein sequences, the program also extracts the open reading frame sequence, if available, and checks that it translates correctly. ORFer accepts user input in the form of single GenBank GI identifiers or accession numbers, or lists of them. It can be used to extract complete sets of open reading frames and protein sequences from any kind of GenBank sequence entry, including complete genomes or chromosomes. Sequences are either stored with their features in a relational database or exported as text files in FASTA or tab-delimited format. The ORFer program is freely available at http://www.proteinstrukturfabrik.de/orfer. The ORFer program allows for fast retrieval of DNA sequences, protein sequences and their open reading frames and sequence annotations from GenBank. Furthermore, storage of sequences and features in a relational database is supported. Such a database can supplement a laboratory information management system (LIMS) with appropriate sequence information.
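
    The translation check mentioned above can be sketched in a few lines. This is not ORFer's Java implementation; it is a minimal illustration assuming the standard genetic code (NCBI translation table 1) and a GenBank-style protein that omits the terminal stop. Alternative start codons and non-standard genetic codes are deliberately not handled.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate codon by codon; unknown codons (e.g. containing N) become X."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def translation_matches(orf: str, protein: str) -> bool:
    """True if the ORF translates to the protein, allowing one terminal stop."""
    if len(orf) % 3 != 0:
        return False
    peptide = translate(orf)
    if peptide.endswith("*"):
        peptide = peptide[:-1]
    # An internal stop codon means the ORF is not a valid coding sequence.
    return "*" not in peptide and peptide == protein.upper()
```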

  9. Protein Sequence Classification with Improved Extreme Learning Machine Algorithms

    PubMed Central

    2014-01-01

    Precisely classifying a protein sequence from a large biological protein sequence database plays an important role in developing competitive pharmacological products. Conventional methods, which compare the unseen sequence with all identified protein sequences and return the category index of the highest-scoring protein, are usually time-consuming. Therefore, it is urgent and necessary to build an efficient protein sequence classification system. In this paper, we study the performance of protein sequence classification using single-hidden-layer feedforward networks (SLFNs). The recent efficient extreme learning machine (ELM) and its variants are utilized as the training algorithms. The optimally pruned ELM (OP-ELM) is employed for protein sequence classification for the first time in this paper. To further enhance performance, an ensemble-based SLFN structure is constructed in which multiple SLFNs with the same number of hidden nodes and the same activation function are used as ensemble members. For each ensemble, the same training algorithm is adopted. The final category index is derived by majority voting. Two approaches, namely the basic ELM and the OP-ELM, are adopted for the ensemble-based SLFNs. The performance is analyzed and compared with several existing methods using datasets obtained from the Protein Information Resource center. The experimental results show the superiority of the proposed algorithms. PMID:24795876
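
    A basic ELM ensemble with majority voting can be sketched with NumPy. In a basic ELM the input weights and biases of the SLFN are random and fixed, and only the output weights are solved in closed form with the Moore-Penrose pseudoinverse. The amino-acid-composition features are an assumption made for illustration (the abstract does not state the feature encoding), and OP-ELM pruning is omitted.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq: str) -> np.ndarray:
    """Amino-acid composition: a hypothetical 20-dimensional feature vector."""
    counts = np.array([seq.count(aa) for aa in AMINO_ACIDS], dtype=float)
    return counts / max(len(seq), 1)


class BasicELM:
    """Single-hidden-layer feedforward network trained as a basic ELM."""

    def __init__(self, n_hidden: int, rng: np.random.Generator):
        self.n_hidden = n_hidden
        self.rng = rng

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        # Sigmoid activation of randomly weighted inputs.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X: np.ndarray, y: np.ndarray, n_classes: int) -> "BasicELM":
        # Input weights and biases are random and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        T = np.eye(n_classes)[y]  # one-hot targets
        # Output weights: least-squares solution via the pseudoinverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return np.argmax(self._hidden(X) @ self.beta, axis=1)


def ensemble_predict(X_train, y_train, X_test, n_classes,
                     n_models=5, n_hidden=30, seed=0):
    """Majority vote over ELMs sharing n_hidden and the activation function."""
    rng = np.random.default_rng(seed)
    votes = np.stack([
        BasicELM(n_hidden, rng).fit(X_train, y_train, n_classes).predict(X_test)
        for _ in range(n_models)
    ])  # shape: (n_models, n_test)
    return np.array([np.bincount(col, minlength=n_classes).argmax()
                     for col in votes.T])
```

    Each ensemble member differs only in its random hidden layer, so voting reduces the variance introduced by that randomness.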

  10. Intermediates and the folding of proteins L and G

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brown, Scott; Head-Gordon, Teresa

    We use a minimalist protein model, in combination with a sequence design strategy, to determine differences in primary structure for proteins L and G that are responsible for the two proteins folding through distinctly different folding mechanisms. We find that the folding of proteins L and G is consistent with a nucleation-condensation mechanism in each case, described as helix-assisted β-1 and β-2 hairpin formation, respectively. We determine that the model for protein G exhibits an early intermediate that precedes the rate-limiting barrier of folding and which draws together misaligned secondary structure elements that are stabilized by hydrophobic core contacts involving the third β-strand, and presages the later transition state in which the correct strand alignment of these same secondary structure elements is restored. Finally, the validity of the targeted intermediate ensemble for protein G was analyzed by fitting the kinetic data to a two-step first-order reversible reaction, proving that protein G folding involves an on-pathway early intermediate, which should be populated and therefore observable by experiment.

  11. Intermediates and the folding of proteins L and G

    PubMed Central

    Brown, Scott; Head-Gordon, Teresa

    2004-01-01

    We use a minimalist protein model, in combination with a sequence design strategy, to determine differences in primary structure for proteins L and G, which are responsible for the two proteins folding through distinctly different folding mechanisms. We find that the folding of proteins L and G is consistent with a nucleation-condensation mechanism in each case, described as helix-assisted β-1 and β-2 hairpin formation, respectively. We determine that the model for protein G exhibits an early intermediate that precedes the rate-limiting barrier of folding, and which draws together misaligned secondary structure elements that are stabilized by hydrophobic core contacts involving the third β-strand, and presages the later transition state in which the correct strand alignment of these same secondary structure elements is restored. Finally, the validity of the targeted intermediate ensemble for protein G was analyzed by fitting the kinetic data to a two-step first-order reversible reaction, proving that protein G folding involves an on-pathway early intermediate, which should be populated and therefore observable by experiment. PMID:15044729
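
    A two-step first-order reversible reaction with an on-pathway intermediate corresponds to the scheme U ⇌ I ⇌ N. The sketch below only propagates that model forward in time (fitting would wrap it in a least-squares routine); the rate constants are hypothetical and are not the values obtained for protein G. Populations evolve as dp/dt = K p, solved here by eigendecomposition of the rate matrix K.

```python
import numpy as np


def rate_matrix(k_ui, k_iu, k_in, k_ni):
    """Rate matrix for U <-> I <-> N; columns sum to zero (mass conservation)."""
    return np.array([
        [-k_ui, k_iu, 0.0],
        [k_ui, -(k_iu + k_in), k_ni],
        [0.0, k_in, -k_ni],
    ])


def populations(t, rates, p0=(1.0, 0.0, 0.0)):
    """Populations [U, I, N] at time t, starting from p0 (all unfolded by default)."""
    K = rate_matrix(*rates)
    eigvals, V = np.linalg.eig(K)
    coeffs = np.linalg.solve(V, np.asarray(p0, dtype=float))
    # p(t) = V exp(Lambda t) V^-1 p0; eigenvalues are real for this linear chain.
    return (V @ (coeffs * np.exp(eigvals * t))).real


rates = (10.0, 1.0, 2.0, 0.1)  # hypothetical k_UI, k_IU, k_IN, k_NI in 1/s
```

    At long times the populations approach the equilibrium ratios set by detailed balance (I/U = k_UI/k_IU, N/I = k_IN/k_NI), while at short times the intermediate is transiently populated.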

  12. Special AT-rich sequence binding protein 1 promotes tumor growth and metastasis of esophageal squamous cell carcinoma.

    PubMed

    Ma, Jun; Wu, Kaiming; Zhao, Zhenxian; Miao, Rong; Xu, Zhe

    2017-03-01

    Esophageal squamous cell carcinoma is one of the most aggressive malignancies worldwide. Special AT-rich sequence binding protein 1 (SATB1) is a nuclear matrix attachment region binding protein that participates in higher-order chromatin organization and tissue-specific gene expression. However, the role of SATB1 in esophageal squamous cell carcinoma remains unknown. In this study, western blot and quantitative real-time polymerase chain reaction analyses were performed to identify differential expression of SATB1 in a series of esophageal squamous cell carcinoma tissue samples. The effects of silencing SATB1 with two short-hairpin RNAs on cell proliferation, migration, and invasion were assessed in vitro in esophageal squamous cell carcinoma cells by the CCK-8 assay and transwell assays. SATB1 was significantly upregulated in esophageal squamous cell carcinoma tissue samples and cell lines. Silencing of SATB1 inhibited the proliferation of KYSE450 and EC9706 cells, which have relatively high levels of SATB1, and distinctly suppressed the migration and invasion of these cells. SATB1 could be a potential target for the treatment of esophageal squamous cell carcinoma, and inhibition of SATB1 may provide a new strategy for preventing esophageal squamous cell carcinoma invasion and metastasis.

  13. Conserved structural and functional aspects of the tripartite motif gene family point towards therapeutic applications in multiple diseases.

    PubMed

    Gushchina, Liubov V; Kwiatkowski, Thomas A; Bhattacharya, Sayak; Weisleder, Noah L

    2018-05-01

    The tripartite motif (TRIM) gene family is a highly conserved group of E3 ubiquitin ligase proteins that can establish substrate specificity for the ubiquitin-proteasome complex and also have proteasome-independent functions. While several family members were studied previously, only relatively recently were over 80 genes grouped, based on sequence homology, to establish the TRIM gene family. Functional studies of various TRIM genes linked these proteins to modulation of inflammatory responses, showing that they can contribute to a wide variety of disease states, including cardiovascular, neurological and musculoskeletal diseases, as well as various forms of cancer. Given the fundamental role of the ubiquitin-proteasome complex in protein turnover and the importance of this regulation in most aspects of cellular physiology, it is not surprising that TRIM proteins display a wide spectrum of functions in a variety of cellular processes. This broad range of function and the highly conserved primary amino acid sequence of family members, particularly in the canonical TRIM E3 ubiquitin ligase domain, complicate the development of therapeutics that specifically target these proteins. A more comprehensive understanding of the structure and function of TRIM proteins will help guide therapeutic development for a number of different diseases. This review summarizes the structural organization of TRIM proteins, their domain architecture, common and unique post-translational modifications within the family, and potential binding partners and targets. Further discussion is provided on efforts to target TRIM proteins as therapeutic agents and how our increasing understanding of the nature of TRIM proteins can guide discovery of other therapeutics in the future. Copyright © 2017 Elsevier Inc. All rights reserved.

  14. Relative stability of major types of beta-turns as a function of amino acid composition: a study based on Ab initio energetic and natural abundance data.

    PubMed

    Perczel, András; Jákli, Imre; McAllister, Michael A; Csizmadia, Imre G

    2003-06-06

    Folding properties of small globular proteins are determined by their amino acid sequence (primary structure). This holds both for local (secondary structure) and for global conformational features of linear polypeptides and proteins composed of natural amino acid derivatives. It thus provides the rational basis of structure prediction algorithms. The shortest secondary structure element, the beta-turn, most typically adopts either a type I or a type II form, depending on the amino acid composition. Herein we investigate the sequence-dependent folding stability of both major types of beta-turns using simple dipeptide models (-Xxx-Yyy-). Gas-phase ab initio properties of 16 carefully selected and suitably protected dipeptide models (for example Val-Ser, Ala-Gly, Ser-Ser) were studied. For each backbone fold, the most probable side-chain conformers were considered. Fully optimized RHF/3-21G molecular structures were employed in medium-level [B3LYP/6-311++G(d,p)//RHF/3-21G] energy calculations to estimate the relative populations of the different backbone conformers. Our results show that the beta-turn preferences calculated by quantum mechanics correlate significantly with those observed in X-ray-determined protein structures.
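
    The abstract does not say how relative energies were converted into populations; a Boltzmann distribution is the conventional choice and is assumed in this minimal sketch. For relative energies E_i, the fractional population is p_i = exp(-E_i/RT) / sum_j exp(-E_j/RT). The temperature and any energies passed in are illustrative, not values from the study.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def boltzmann_populations(energies_kcal, temperature=298.15):
    """Fractional populations from relative energies given in kcal/mol."""
    rt = R_KCAL * temperature
    e_min = min(energies_kcal)  # shift by the minimum for numerical stability
    weights = [math.exp(-(e - e_min) / rt) for e in energies_kcal]
    total = sum(weights)
    return [w / total for w in weights]
```

    For example, `boltzmann_populations([0.0, 1.2])` would give the relative populations of two hypothetical turn conformers separated by 1.2 kcal/mol at room temperature.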

  15. Hemoglobin redux: combining neutron and X-ray diffraction with mass spectrometry to analyse the quaternary state of oxidized hemoglobins

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mueser, Timothy C., E-mail: timothy.mueser@utoledo.edu; Griffith, Wendell P.; Kovalevsky, Andrey Y.

    2010-11-01

    X-ray and neutron diffraction studies of cyanomethemoglobin are being used to evaluate the structural waters within the dimer–dimer interface involved in quaternary-state transitions. Improvements in neutron diffraction instrumentation are affording the opportunity to re-examine the structures of vertebrate hemoglobins and to interrogate proton and solvent position changes between the different quaternary states of the protein. For hemoglobins of unknown primary sequence, structural studies of cyanomethemoglobin (CNmetHb) are being used to help to resolve sequence ambiguity in the mass spectra. These studies have also provided additional structural evidence for the involvement of oxidized hemoglobin in the process of erythrocyte senescence. X-ray crystal studies of Tibetan snow leopard CNmetHb have shown that this protein crystallizes in the B state, a structure with a more open dyad, which possibly has relevance to RBC band 3 protein binding and erythrocyte senescence. R-state equine CNmetHb crystal studies elaborate the solvent differences in the switch and hinge region compared with a human deoxyhemoglobin T-state neutron structure. Lastly, comparison of histidine protonation between the T and R state should enumerate the Bohr-effect protons.

  16. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Klenk, Hans-Peter; Lu, Megan; Lucas, Susan

    Saccharomonospora marina Liu et al. 2010 is a member of the so far genomically poorly characterized genus Saccharomonospora in the family Pseudonocardiaceae. Members of the genus Saccharomonospora are of interest because they originate from diverse habitats, such as leaf litter, manure, compost, the surface of peat, moist, over-heated grain, and ocean sediment, where they might play a role in the primary degradation of plant material by attacking hemicellulose. Organisms belonging to the genus are usually Gram-positive staining, non-acid fast, and classified among the actinomycetes. Next to S. viridis and S. azurea, S. marina is the third member of the genus Saccharomonospora for which a completely sequenced (permanent draft status) type strain genome will be published. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 5,965,593 bp long chromosome with its 5,727 protein-coding and 57 RNA genes was sequenced as part of the DOE funded Community Sequencing Program (CSP) 2010 at the Joint Genome Institute (JGI).

  17. ProfileGrids: a sequence alignment visualization paradigm that avoids the limitations of Sequence Logos.

    PubMed

    Roca, Alberto I

    2014-01-01

    The 2013 BioVis Contest provided an opportunity to evaluate different paradigms for visualizing protein multiple sequence alignments. Such data sets are becoming extremely large and thus taxing current visualization paradigms. Sequence Logos represent consensus sequences but have limitations for protein alignments. As an alternative, ProfileGrids are a new protein sequence alignment visualization paradigm that represents an alignment as a color-coded matrix of the residue frequency occurring at every homologous position in the aligned protein family. The JProfileGrid software program was used to analyze the BioVis contest data sets to generate figures for comparison with the Sequence Logo reference images. The ProfileGrid representation allows for the clear and effective analysis of protein multiple sequence alignments. This includes both a general overview of the conservation and diversity sequence patterns as well as the interactive ability to query the details of the protein residue distributions in the alignment. The JProfileGrid software is free and available from http://www.ProfileGrid.org.
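
    The core of a ProfileGrid, a matrix of residue frequencies at every aligned position, can be computed as sketched below. This is not JProfileGrid's code; the text shading characters are a hypothetical stand-in for the colour scale, and the toy alignment is illustrative.

```python
from collections import Counter

SHADES = " .:+#"  # text stand-in for a colour scale, low to high frequency


def profile_grid(alignment):
    """Residue-frequency matrix: symbol -> list of per-column frequencies."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_seqs = len(alignment)
    counts = [Counter(column) for column in zip(*alignment)]
    symbols = sorted({s for row in alignment for s in row})  # gaps included
    return {s: [c[s] / n_seqs for c in counts] for s in symbols}


def render(profile):
    """One text row per symbol, shading each cell by its frequency."""
    top = len(SHADES) - 1
    return [f"{sym}|" + "".join(SHADES[round(f * top)] for f in freqs)
            for sym, freqs in profile.items()]


alignment = ["ACD", "ACE", "A-D"]  # hypothetical three-sequence alignment
```

    Unlike a Sequence Logo, every residue type keeps its own row, so low-frequency residues at a position remain visible instead of being stacked under the consensus.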

  18. A topological approach for protein classification

    DOE PAGES

    Cang, Zixuan; Mu, Lin; Wu, Kedi; ...

    2015-11-04

    Protein function and dynamics are closely related to protein sequence and structure. However, predicting protein function and dynamics from sequence and structure is still a fundamental challenge in molecular biology. Protein classification, which is typically done by measuring the similarity between proteins based on protein sequence or physical information, serves as a crucial step toward understanding protein function and dynamics.

  19. Combining protein sequence, structure, and dynamics: A novel approach for functional evolution analysis of PAS domain superfamily.

    PubMed

    Dong, Zheng; Zhou, Hongyu; Tao, Peng

    2018-02-01

    PAS domains are widespread in archaea, bacteria, and eukaryota, and play important roles in various functions. In this study, we aim to explore the functional evolutionary relationships among proteins in the PAS domain superfamily in view of the sequence-structure-dynamics-function relationship. We collected protein sequences and crystal structure data from the RCSB Protein Data Bank for members of the PAS domain superfamily belonging to three biological functions (nucleotide binding, photoreceptor activity, and transferase activity). Protein sequences were aligned and then used to select sequence-conserved residues and build a phylogenetic tree. Three-dimensional structure alignment was also applied to obtain structure-conserved residues. The protein dynamics were analyzed using an elastic network model (ENM) and validated by molecular dynamics (MD) simulation. The results showed that proteins with the same function could be grouped by sequence similarity, and proteins in different functional groups displayed statistically significant differences in their vibrational patterns. Interestingly, in all three functional groups, conserved amino acid residues identified by sequence and structure conservation analysis generally have lower fluctuations than other residues. In addition, the fluctuation of conserved residues in each biological function group was strongly correlated with the corresponding biological function. This research suggests a direct connection in which protein sequences are related to various functions through structural dynamics. This is a new attempt to delineate the functional evolution of proteins using the integrated information of sequence, structure, and dynamics. © 2017 The Protein Society.
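
    The abstract does not specify which elastic network model was used. The sketch below assumes the Gaussian network model (GNM), a common isotropic ENM: residues within a cutoff distance (7.0 Å here, an assumed value) are connected by identical springs, and the relative mean-square fluctuation of residue i is the sum over nonzero modes k of u_k(i)^2 / lambda_k, i.e. the diagonal of the pseudo-inverse of the Kirchhoff matrix. The straight-chain coordinates are a hypothetical toy input.

```python
import numpy as np


def kirchhoff(coords: np.ndarray, cutoff: float = 7.0) -> np.ndarray:
    """GNM Kirchhoff (connectivity) matrix from C-alpha coordinates in angstroms."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    contact = (dist < cutoff) & ~np.eye(len(coords), dtype=bool)
    gamma = -contact.astype(float)                # -1 for each contacting pair
    np.fill_diagonal(gamma, contact.sum(axis=1))  # number of contacts on the diagonal
    return gamma


def gnm_fluctuations(coords: np.ndarray, cutoff: float = 7.0) -> np.ndarray:
    """Relative mean-square fluctuations summed over the nonzero GNM modes."""
    eigvals, eigvecs = np.linalg.eigh(kirchhoff(coords, cutoff))
    nonzero = eigvals > 1e-8  # drop the zero (rigid-body) mode
    return (eigvecs[:, nonzero] ** 2 / eigvals[nonzero]).sum(axis=1)


# Hypothetical straight chain of 10 C-alpha atoms spaced 3.8 angstroms apart.
coords = np.array([[3.8 * i, 0.0, 0.0] for i in range(10)])
```

    Comparing these values between conserved and non-conserved residue positions would mirror, in spirit, the fluctuation comparison described in the abstract.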

  20. Sequence variations of the partially dominant DELLA gene Rht-B1c in wheat and their functional impacts

    PubMed Central

    Ma, Zhengqiang

    2013-01-01

    Rht-B1c, allelic to the DELLA protein-encoding gene Rht-B1a, is a natural mutation documented in common wheat (Triticum aestivum). It confers variation in a number of traits related to cell and plant morphology, seed dormancy, and photosynthesis. The present study was conducted to examine the sequence variations of Rht-B1c and their functional impacts. The results showed that Rht-B1c was partially dominant or co-dominant for plant height and exhibited an increased dwarfing effect. At the sequence level, Rht-B1c differed from Rht-B1a by one 2-kb Veju retrotransposon insertion, three coding-region single nucleotide polymorphisms (SNPs), one 197-bp insertion, and four SNPs in the 1-kb upstream sequence. Haplotype investigations, association analyses, transient expression assays, and expression profiling showed that the Veju insertion was primarily responsible for the extreme dwarfing effect. The Veju insertion was found to change the processing of Rht-B1c transcripts, disrupting the primary structure of the DELLA motif. Expression assays showed that Rht-B1c reduced total Rht-1 transcript levels and up-regulated GATA-like transcription factors and genes positively regulated by these factors, suggesting that one way in which Rht-1 proteins affect plant growth and development is through regulation of GATA-like transcription factors. PMID:23918966

  1. Dissecting the relationship between protein structure and sequence variation

    NASA Astrophysics Data System (ADS)

    Shahmoradi, Amir; Wilke, Claus; Wilke Lab Team

    2015-03-01

    Over the past decade, several independent studies have shown that some structural properties of proteins are capable of predicting protein evolution. The strength and significance of these structure-sequence relations, however, appear to vary widely among different proteins, with absolute correlation strengths ranging from 0.1 to 0.8. Here we present the results from a comprehensive search for the potential biophysical and structural determinants of protein evolution, studying more than 200 structural and evolutionary properties in a dataset of 209 monomeric enzymes. We discuss the main protein characteristics responsible for the general patterns of protein evolution, and identify sequence divergence as the main determinant of the strengths of virtually all structure-evolution relationships, explaining ~10-30% of observed variation in sequence-structure relations. In addition to sequence divergence, we identify several protein structural properties that are moderately but significantly coupled with the strength of sequence-structure relations. In particular, proteins with more homogeneous backbone hydrogen bond energies, large fractions of helical secondary structure and low fractions of beta sheets tend to have the strongest sequence-structure relations. BEACON-NSF center for the study of evolution in action.

  2. Correlation between protein sequence similarity and x-ray diffraction quality in the protein data bank.

    PubMed

    Lu, Hui-Meng; Yin, Da-Chuan; Ye, Ya-Jing; Luo, Hui-Min; Geng, Li-Qiang; Li, Hai-Sheng; Guo, Wei-Hong; Shang, Peng

    2009-01-01

    As the most widely used technique for determining the three-dimensional structures of protein molecules, X-ray crystallography can provide structures of the highest resolution among the developed techniques. The resolution obtained via X-ray crystallography is known to be influenced by many factors, such as crystal quality, diffraction techniques, and X-ray sources. In this paper, the authors found that the protein sequence could also be one of these factors. We extracted resolution and sequence information for proteins from the Protein Data Bank (PDB), classified the proteins into clusters according to sequence similarity, and statistically analyzed the relationship between sequence similarity and the best resolution obtained. The results showed a pronounced correlation between sequence similarity and the obtained resolution. These results indicate that protein structure itself is one variable that may affect resolution when X-ray crystallography is used.

  3. Characterization of R132H mutation-specific IDH1 antibody binding in brain tumors.

    PubMed

    Capper, David; Weissert, Susanne; Balss, Jörg; Habel, Antje; Meyer, Jochen; Jäger, Diana; Ackermann, Ulrike; Tessmer, Claudia; Korshunov, Andrey; Zentgraf, Hanswalter; Hartmann, Christian; von Deimling, Andreas

    2010-01-01

    Heterozygous point mutations of isocitrate dehydrogenase (IDH)1 codon 132 are frequent in grade II and III gliomas. Recently, we reported an antibody specific for the IDH1R132H mutation. Here we investigate the capability of this antibody to differentiate between wild-type and mutated IDH1 protein in central nervous system (CNS) tumors by Western blot and immunohistochemistry. Results of protein analysis are correlated with sequencing data. In Western blot, anti-IDH1R132H mouse monoclonal antibody mIDH1R132H detected a specific band only in mutated tumors. Immunohistochemistry of 345 primary brain tumors demonstrated a strong cytoplasmic and weaker nuclear staining in 122 cases. Correlation with direct sequencing of 186 cases resulted in agreement in 177 cases. Genetic retesting of cases with conflicting findings resulted in a match of 186/186 cases, with all discrepancies resolving in favor of immunohistochemistry. Intriguing is the ability of mIDH1R132H to detect single infiltrating tumor cells. The very high frequency and the distribution of this mutation among specific brain tumor entities allow the highly sensitive and specific discrimination of various tumors by immunohistochemistry, such as anaplastic astrocytoma from primary glioblastoma or diffuse astrocytoma World Health Organization (WHO) grade II from pilocytic astrocytoma or ependymoma. Noteworthy is the discrimination of the infiltrating edge of tumors with IDH1 mutation from reactive gliosis.

  4. Complete nucleotide and derived amino acid sequence of cDNA encoding the mitochondrial uncoupling protein of rat brown adipose tissue: lack of a mitochondrial targeting presequence.

    PubMed Central

    Ridley, R G; Patel, H V; Gerber, G E; Morton, R C; Freeman, K B

    1986-01-01

    A cDNA clone spanning the entire amino acid sequence of the nuclear-encoded uncoupling protein of rat brown adipose tissue mitochondria has been isolated and sequenced. With the exception of the N-terminal methionine the deduced N-terminus of the newly synthesized uncoupling protein is identical to the N-terminal 30 amino acids of the native uncoupling protein as determined by protein sequencing. This proves that the protein contains no N-terminal mitochondrial targeting prepiece and that a targeting region must reside within the amino acid sequence of the mature protein. PMID:3012461

  5. LRRC6 Mutation Causes Primary Ciliary Dyskinesia with Dynein Arm Defects

    PubMed Central

    Horani, Amjad; Ferkol, Thomas W.; Shoseyov, David; Wasserman, Mollie G.; Oren, Yifat S.; Kerem, Batsheva; Amirav, Israel; Cohen-Cymberknoh, Malena; Dutcher, Susan K.; Brody, Steven L.; Elpeleg, Orly; Kerem, Eitan

    2013-01-01

    Despite recent progress in defining the ciliome, the genetic basis for many cases of primary ciliary dyskinesia (PCD) remains elusive. We evaluated five children from two unrelated, consanguineous Palestinian families who had PCD with typical clinical features, reduced nasal nitric oxide concentrations, and absent dynein arms. Linkage analyses revealed a single common homozygous region on chromosome 8, and one candidate gene in this region was conserved in organisms with motile cilia. Sequencing revealed a single novel mutation in LRRC6 (Leucine-rich repeat containing protein 6) that fit the model of autosomal recessive genetic transmission, leading to a change of a highly conserved amino acid from aspartic acid to histidine (Asp146His). LRRC6 was localized to the cytoplasm and was up-regulated during ciliogenesis in human airway epithelial cells in a Foxj1-dependent fashion. Nasal epithelial cells isolated from affected individuals, and human airway epithelial cells subjected to shRNA-mediated silencing, showed reduced LRRC6 expression, absent dynein arms, and slowed cilia beat frequency. Dynein arm proteins were either absent or mislocalized to the cytoplasm in airway epithelial cells from a primary ciliary dyskinesia subject. These findings suggest that LRRC6 plays a role in dynein arm assembly or trafficking and, when mutated, leads to primary ciliary dyskinesia with laterality defects. PMID:23527195

  6. MIPS: a database for genomes and protein sequences.

    PubMed Central

    Mewes, H W; Heumann, K; Kaps, A; Mayer, K; Pfeiffer, F; Stocker, S; Frishman, D

    1999-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Martinsried near Munich, Germany, develops and maintains genome oriented databases. It is widely recognized that the amount of available sequence data is increasing rapidly, whereas the capacity for qualified manual annotation at the sequence databases is not. Therefore, our strategy aims to cope with the data stream by the comprehensive application of analysis tools to sequences of complete genomes, the systematic classification of protein sequences and the active support of sequence analysis and functional genomics projects. This report describes the systematic and up-to-date analysis of genomes (PEDANT), a comprehensive database of the yeast genome (MYGD), a database reflecting the progress in sequencing the Arabidopsis thaliana genome (MATD), the database of assembled, annotated human EST clusters (MEST), and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). MIPS provides access through its WWW server (http://www.mips.biochem.mpg.de) to a spectrum of generic databases, including the above mentioned as well as a database of protein families (PROTFAM), the MITOP database, and the all-against-all FASTA database. PMID:9847138

  7. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cang, Zixuan; Mu, Lin; Wu, Kedi

    Protein function and dynamics are closely related to protein sequence and structure. However, predicting protein function and dynamics from sequence and structure is still a fundamental challenge in molecular biology. Protein classification, which is typically done by measuring the similarity between proteins based on sequence or physical information, is a crucial step toward understanding protein function and dynamics.

  8. The primary structure of the hemoglobin of spectacled bear (Tremarctos ornatus, Carnivora).

    PubMed

    Hofmann, O; Braunitzer, G

    1987-08-01

    The complete primary structure of the alpha- and beta-chains of the hemoglobin of the Spectacled Bear (Tremarctos ornatus) is presented. Following cleavage of the heme-protein link and chain separation by RP-HPLC, the amino-acid sequences were determined by Edman degradation in liquid- and gas-phase sequenators. The hemoglobin of the Spectacled Bear differs by only five amino-acid exchanges from those of the Polar Bear (Ursus maritimus, Ursinae) and the Asiatic Black Bear (Ursus tibetanus, Ursinae), whereas it differs by 8 and 12 replacements, respectively, from those of the Giant Panda (Ailuropoda melanoleuca) and the Lesser Panda (Ailurus fulgens). This clearly demonstrates that the Spectacled Bear, the most aberrant bear of the Ursidae, is somewhat intermediate between the pandas and the Ursinae.

  9. Understanding protein evolution: from protein physics to Darwinian selection.

    PubMed

    Zeldovich, Konstantin B; Shakhnovich, Eugene I

    2008-01-01

    Efforts in whole-genome sequencing and structural proteomics are beginning to provide a global view of the protein universe, the set of existing protein structures and sequences. However, approaches based on the selection of individual sequences have not been entirely successful in quantitatively describing the distribution of structures and sequences in the protein universe, because evolutionary pressure acts on the entire organism rather than on a particular molecule. In parallel to this line of study, studies in population genetics and phenomenological molecular evolution established a mathematical framework to describe the changes in genome sequences in populations of organisms over time. Here, we review both microscopic (physics-based) and macroscopic (organism-level) models of protein-sequence evolution and demonstrate that bridging the two scales provides the most complete description of the protein universe starting from clearly defined, testable, and physiologically relevant assumptions.

  10. Covalent structure of chicken pepsinogen.

    PubMed

    Baudys, M; Kostka, V

    1983-10-17

    Chicken pepsinogen is a glycoprotein consisting of a single polypeptide chain and containing the following 367 amino acid residues: Asp23, Asn16, Thr26, Ser41, Glu14, Gln11, Pro18, Gly31, Ala17, Cys7, Val25, Met9, Ile23, Leu28, Tyr22, Phe20, His8, Lys17, Arg7, Trp4. The Mr value of the protein is 42 074. This value includes the carbohydrate moiety of the protein, i.e. Man3, (GlcNAc)7, (-SO3H)5. The primary fragmentation of the molecule was effected by limited trypsinolysis at arginine residues after prior modification of the lysines with citraconic anhydride. All eight peptides expected in theory were obtained, and their size, amino acid composition, and N-terminal amino acid sequence were characterized. To elucidate the amino acid sequences of these large fragments, they were subjected to secondary cleavage by CNBr, trypsin (after removal of the protecting groups from the lysines), the proteinase from Staphylococcus aureus V8 strain, alpha-chymotrypsin, hydroxylamine, or dilute acid; the resulting peptides were isolated by gel permeation and ion-exchange chromatography and by fingerprint techniques. Overlaps at sites of the arginine residues were obtained in an earlier study [Baudys, M. & Kostka, V. (1982) Collect. Czech. Chem. Commun. 47, 2814-2832]. Chicken pepsinogen shows the highest degree of homology with the primary structures of pepsinogens A. The internal homologies are apparent in the neighborhood of the two active aspartic acid residues. We have tentatively assigned chicken pepsinogen to the group of pepsinogens A (EC 3.4.23.1); this assignment is based both on our sequence studies and on an investigation of the kinetic characteristics of the enzyme.
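
    The composition above can be checked with a short Python sketch: the listed counts sum to the stated 367 residues, and with the lysines blocked by citraconylation, trypsin cuts only after the seven arginines, which accounts for the eight fragments "expected in theory". The names `COMPOSITION`, `total_residues`, and `expected_arg_fragments` are illustrative, and the fragment count assumes that every Arg site is cleaved and that no Arg is the C-terminal residue.

```python
# Amino acid composition of chicken pepsinogen as listed in the abstract
# (three-letter code -> number of residues).
COMPOSITION = {
    "Asp": 23, "Asn": 16, "Thr": 26, "Ser": 41, "Glu": 14,
    "Gln": 11, "Pro": 18, "Gly": 31, "Ala": 17, "Cys": 7,
    "Val": 25, "Met": 9, "Ile": 23, "Leu": 28, "Tyr": 22,
    "Phe": 20, "His": 8, "Lys": 17, "Arg": 7, "Trp": 4,
}


def total_residues(composition: dict[str, int]) -> int:
    """Sum the residue counts over all amino acid types."""
    return sum(composition.values())


def expected_arg_fragments(composition: dict[str, int]) -> int:
    """Number of peptides from cutting a single chain after every Arg.

    Assumption (illustrative): all Arg sites are cleaved and none is the
    C-terminal residue, so n internal cut sites give n + 1 peptides.
    """
    return composition["Arg"] + 1


print(total_residues(COMPOSITION))          # 367
print(expected_arg_fragments(COMPOSITION))  # 8
```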

  11. Regulation of polycystin-1 ciliary trafficking by motifs at its C-terminus and polycystin-2 but not by cleavage at the GPS site

    PubMed Central

    Su, Xuefeng; Wu, Maoqing; Yao, Gang; El-Jouni, Wassim; Luo, Chong; Tabari, Azadeh; Zhou, Jing

    2015-01-01

    Failure to localize membrane proteins to the primary cilium causes a group of diseases collectively named ciliopathies. Polycystin-1 (PC1, also known as PKD1) is a large ciliary membrane protein defective in autosomal dominant polycystic kidney disease (ADPKD). Here, we developed a large set of PC1 expression constructs and identified multiple sequences, including a coiled-coil motif in the C-terminal tail of PC1, regulating full-length PC1 trafficking to the primary cilium. Ciliary trafficking of wild-type and mutant PC1 depends on the dose of polycystin-2 (PC2, also known as PKD2) and on the formation of a PC1–PC2 complex. Modulation of the ciliary trafficking module mediated by the VxP ciliary-targeting sequence and Arf4 and Asap1 does not affect the ciliary localization of full-length PC1. PC1 also promotes PC2 ciliary trafficking. PC2 mutations that truncate its C-terminal tail affect PC1 ciliary trafficking, whereas mutations that change the VxP sequence to AxA or impair the channel pore (producing a dead channel) do not. Cleavage at the GPCR proteolytic site (GPS) of PC1 is not required for PC1 trafficking to cilia. We propose a mutually dependent model for the ciliary trafficking of PC1 and PC2, and we propose that PC1 ciliary trafficking is regulated by multiple cis-acting elements. As all pathogenic PC1 mutations tested here are defective in ciliary trafficking, ciliary trafficking might serve as a functional read-out for ADPKD. PMID:26430213

  12. Regulation of polycystin-1 ciliary trafficking by motifs at its C-terminus and polycystin-2 but not by cleavage at the GPS site.

    PubMed

    Su, Xuefeng; Wu, Maoqing; Yao, Gang; El-Jouni, Wassim; Luo, Chong; Tabari, Azadeh; Zhou, Jing

    2015-11-15

    Failure to localize membrane proteins to the primary cilium causes a group of diseases collectively named ciliopathies. Polycystin-1 (PC1, also known as PKD1) is a large ciliary membrane protein defective in autosomal dominant polycystic kidney disease (ADPKD). Here, we developed a large set of PC1 expression constructs and identified multiple sequences, including a coiled-coil motif in the C-terminal tail of PC1, regulating full-length PC1 trafficking to the primary cilium. Ciliary trafficking of wild-type and mutant PC1 depends on the dose of polycystin-2 (PC2, also known as PKD2) and on the formation of a PC1-PC2 complex. Modulation of the ciliary trafficking module mediated by the VxP ciliary-targeting sequence and Arf4 and Asap1 does not affect the ciliary localization of full-length PC1. PC1 also promotes PC2 ciliary trafficking. PC2 mutations that truncate its C-terminal tail affect PC1 ciliary trafficking, whereas mutations that change the VxP sequence to AxA or impair the channel pore (producing a dead channel) do not. Cleavage at the GPCR proteolytic site (GPS) of PC1 is not required for PC1 trafficking to cilia. We propose a mutually dependent model for the ciliary trafficking of PC1 and PC2, and we propose that PC1 ciliary trafficking is regulated by multiple cis-acting elements. As all pathogenic PC1 mutations tested here are defective in ciliary trafficking, ciliary trafficking might serve as a functional read-out for ADPKD. © 2015. Published by The Company of Biologists Ltd.

  13. A highly divergent gene cluster in honey bees encodes a novel silk family.

    PubMed

    Sutherland, Tara D; Campbell, Peter M; Weisman, Sarah; Trueman, Holly E; Sriskantha, Alagacone; Wanjura, Wolfgang J; Haritos, Victoria S

    2006-11-01

    The pupal cocoon of the domesticated silk moth Bombyx mori is the best known and most extensively studied insect silk. It is not widely known that Apis mellifera larvae also produce silk. We have used a combination of genomic and proteomic techniques to identify four honey bee fiber genes (AmelFibroin1-4) and two silk-associated genes (AmelSA1 and 2). The four fiber genes are small, comprise a single exon each, and are clustered on a short genomic region where the open reading frames are GC-rich amid low GC intergenic regions. The genes encode similar proteins that are highly helical and predicted to form unusually tight coiled coils. Despite the similarity in size, structure, and composition of the encoded proteins, the genes have low primary sequence identity. We propose that the four fiber genes have arisen from gene duplication events but have subsequently diverged significantly. The silk-associated genes encode proteins likely to act as a glue (AmelSA1) and to be involved in silk processing (AmelSA2). Although the silks of honey bees and silkmoths both originate in larval labial glands, the silk proteins are completely different in their primary, secondary, and tertiary structures as well as in the genomic arrangement of the genes encoding them. This implies independent evolutionary origins for these functionally related proteins.

  14. Ligatoxin B, a new cytotoxic protein with a novel helix-turn-helix DNA-binding domain from the mistletoe Phoradendron liga.

    PubMed Central

    Li, Shi-Sheng; Gullbo, Joachim; Lindholm, Petra; Larsson, Rolf; Thunberg, Eva; Samuelsson, Gunnar; Bohlin, Lars; Claeson, Per

    2002-01-01

    A new basic protein, designated ligatoxin B, containing 46 amino acid residues has been isolated from the mistletoe Phoradendron liga (Gill.) Eichl. (Viscaceae). The protein's primary structure, determined unambiguously using a combination of automated Edman degradation, trypsin enzymic digestion, and tandem MS analysis, was 1-KSCCPSTTAR-NIYNTCRLTG-ASRSVCASLS-GCKIISGSTC-DSGWNH-46. Ligatoxin B exhibited in vitro cytotoxic activities on the human lymphoma cell line U-937-GTB and the primary multidrug-resistant renal adenocarcinoma cell line ACHN, with IC50 values of 1.8 microM and 3.2 microM, respectively. Sequence alignment with other thionins identified ligatoxin B as a new member of the class 3 thionins, similar to the previously described ligatoxin A. As predicted by homology modelling, ligatoxin B shares a three-dimensional structure with the viscotoxins and purothionins and so may have the same mode of cytotoxic action. The novel similarities observed by structural comparison of the helix-turn-helix (HTH) motifs of the thionins, including ligatoxin B, and the HTH DNA-binding proteins led us to propose the working hypothesis that thionins represent a new group of DNA-binding proteins. This working hypothesis could be useful in further dissecting the molecular mechanisms of thionin cytotoxicity and of thionin opposition to multidrug resistance, and in clarifying the physiological function of thionins in plants. PMID:12049612
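
    The reported primary structure can be parsed directly. The Python sketch below (all function names are hypothetical) strips the block separators, confirms the 46-residue length, lists the cysteine positions, and computes a crude net charge that counts only Lys/Arg and Asp/Glu; the positive result is consistent with the description of ligatoxin B as a basic protein, but it is a rough indicator, not a pI calculation.

```python
# Ligatoxin B primary structure as given in the abstract; the hyphens only
# group the residues into blocks of ten and are stripped before analysis.
LIGATOXIN_B = "KSCCPSTTAR-NIYNTCRLTG-ASRSVCASLS-GCKIISGSTC-DSGWNH"


def clean(seq: str) -> str:
    """Remove block separators, keeping only one-letter residue codes."""
    return seq.replace("-", "")


def cysteine_positions(seq: str) -> list[int]:
    """1-based positions of all Cys residues."""
    return [i for i, aa in enumerate(seq, start=1) if aa == "C"]


def crude_net_charge(seq: str) -> int:
    """Rough charge near pH 7: +1 per Lys/Arg, -1 per Asp/Glu.

    His and the chain termini are ignored, so this only indicates whether
    the protein is basic or acidic overall.
    """
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return positive - negative


seq = clean(LIGATOXIN_B)
print(len(seq))                 # 46
print(cysteine_positions(seq))  # [3, 4, 16, 26, 32, 40]
print(crude_net_charge(seq))    # 4
```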

  15. Comparative characterization of random-sequence proteins consisting of 5, 12, and 20 kinds of amino acids

    PubMed Central

    Tanaka, Junko; Doi, Nobuhide; Takashima, Hideaki; Yanagawa, Hiroshi

    2010-01-01

    Screening of functional proteins from a random-sequence library has been used to evolve novel proteins in the field of evolutionary protein engineering. However, random-sequence proteins consisting of the 20 natural amino acids tend to aggregate, and the occurrence rate of functional proteins in a random-sequence library is low. From the viewpoint of the origin of life, it has been proposed that primordial proteins consisted of a limited set of amino acids that could have been abundantly formed early during chemical evolution. We have previously found that members of a random-sequence protein library constructed with five primitive amino acids show high solubility (Doi et al., Protein Eng Des Sel 2005;18:279–284). Although such a library is expected to be appropriate for finding functional proteins, its functionality may be limited because the alphabet contains no positively charged amino acid. Here, we constructed three libraries of 120-residue random-sequence proteins using alphabets of 5, 12, and 20 amino acids, with preselection by mRNA display to eliminate sequences containing stop codons and frameshifts, and we characterized and compared the structural properties of random-sequence proteins arbitrarily chosen from these libraries. We found that random-sequence proteins constructed with the 12-member alphabet (including the five primitive amino acids and positively charged amino acids) have higher solubility than those constructed with the 20-member alphabet, though other biophysical properties are very similar in the two libraries. Thus, a library of moderate complexity constructed from 12 amino acids may be a more appropriate resource for functional screening than one constructed from 20 amino acids. PMID:20162614
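
    To put the three alphabets in perspective, the number of distinct 120-residue sequences over an alphabet of k amino acids is k^120. A small Python sketch of that arithmetic follows (the function name is illustrative); it ignores codon usage and the mRNA-display preselection, so it describes the theoretical sequence space, not the size of any physical library.

```python
import math

LENGTH = 120  # residues per random-sequence protein, as in the abstract


def log10_sequence_space(alphabet_size: int, length: int = LENGTH) -> float:
    """log10 of the number of distinct sequences, alphabet_size ** length."""
    return length * math.log10(alphabet_size)


for k in (5, 12, 20):
    print(k, round(log10_sequence_space(k), 1))
# 5 83.9
# 12 129.5
# 20 156.1
```

Working in log10 avoids printing integers with more than 150 digits while still showing that each larger alphabet adds tens of orders of magnitude.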

  16. Diversity of the P2 protein among nontypeable Haemophilus influenzae isolates.

    PubMed Central

    Bell, J; Grass, S; Jeanteur, D; Munson, R S

    1994-01-01

    The genes for outer membrane protein P2 of four nontypeable Haemophilus influenzae strains were cloned and sequenced. The derived amino acid sequences were compared with the outer membrane protein P2 sequence from H. influenzae type b MinnA and the sequences of P2 from three additional nontypeable H. influenzae strains. The sequences were 76 to 94% identical. The sequences had regions with considerable variability separated by regions which were highly conserved. The variable regions mapped to putative surface-exposed loops of the protein. PMID:8188390
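
    One simple way to locate "regions with considerable variability separated by regions which were highly conserved" in aligned sequences is to score each alignment column by the fraction of sequences that share its most common residue. The Python sketch below assumes the sequences are already aligned; the function names and the toy alignment are hypothetical and are not the P2 sequences from the study.

```python
from collections import Counter


def column_conservation(aligned: list[str]) -> list[float]:
    """Fraction of sequences sharing the most common residue at each column.

    Assumes all sequences are aligned to the same length; a gap '-' is
    counted like any other symbol.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    n = len(aligned)
    return [Counter(col).most_common(1)[0][1] / n for col in zip(*aligned)]


def variable_columns(aligned: list[str], threshold: float = 1.0) -> list[int]:
    """0-based indices of columns whose conservation is below threshold."""
    return [i for i, c in enumerate(column_conservation(aligned)) if c < threshold]


# Hypothetical toy alignment: conserved flanks around a variable stretch,
# mimicking a surface-exposed loop.
toy = [
    "MKAVLGNSTDKEW",
    "MKAVLGQATEKEW",
    "MKAVLGDSRNKEW",
]
print(variable_columns(toy))  # [6, 7, 8, 9]
```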

  17. Sequence of the chloroplast 16S rRNA gene and its surrounding regions of Chlamydomonas reinhardii.

    PubMed Central

    Dron, M; Rahire, M; Rochaix, J D

    1982-01-01

    The sequence of a 2 kb DNA fragment containing the chloroplast 16S ribosomal RNA gene from Chlamydomonas reinhardii and its flanking regions has been determined. The algal 16S rRNA sequence (1475 nucleotides) and secondary structure are highly related to those found in bacteria and in the chloroplasts of higher plants. In contrast, the flanking regions are very different. In C. reinhardii the 16S rRNA gene is surrounded by AT-rich segments of about 180 bases, which are followed by long stretches of mutually complementary bases separated from each other by 1833 nucleotides. It is likely that these structures play an important role in the folding and processing of the 16S rRNA precursor. The primary and secondary structures of the binding sites of two ribosomal proteins in the 16S rRNAs of E. coli and C. reinhardii are closely related. PMID:6296784
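
    The flanking arrangement described above (complementary stretches 1833 nucleotides apart) matters because, in the transcribed precursor, segments that are reverse complements of each other can base-pair into a stem. A minimal Python sketch of the two underlying checks, exact reverse complementarity and AT content, is given below; the function names and segments are hypothetical toy data, and real stems tolerate mismatches that this exact check does not.

```python
# Watson-Crick complements for DNA; str.maketrans builds a per-character map.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (uppercase A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def can_form_stem(left: str, right: str) -> bool:
    """True if `right` is the exact reverse complement of `left`.

    Two such segments on the same strand can pair with each other, forming
    a stem that encloses the intervening sequence as a loop. Real RNA stems
    also tolerate mismatches and G-U pairs, which this check ignores.
    """
    return reverse_complement(left) == right


def at_fraction(seq: str) -> float:
    """Fraction of A and T bases, a simple measure of AT richness."""
    return (seq.count("A") + seq.count("T")) / len(seq)


# Hypothetical toy segments, not the actual C. reinhardii flanking sequences.
print(can_form_stem("GGATCCAAG", "CTTGGATCC"))  # True
print(at_fraction("ATTATAGCAT"))                # 0.8
```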

  18. Amino acid sequence of tyrosinase from Neurospora crassa.

    PubMed Central

    Lerch, K

    1978-01-01

    The amino-acid sequence of tyrosinase from Neurospora crassa (monophenol,dihydroxyphenylalanine:oxygen oxidoreductase, EC 1.14.18.1) is reported. This copper-containing oxidase consists of a single polypeptide chain of 407 amino acids. The primary structure was determined by automated and manual sequence analysis on fragments produced by cleavage with cyanogen bromide and on peptides obtained by digestion with trypsin, pepsin, thermolysin, or chymotrypsin. The amino terminus of the protein is acetylated and the single cysteinyl residue 96 is covalently linked via a thioether bridge to histidyl residue 94. The formation and possible role of this unusual structure in Neurospora tyrosinase are discussed. Dye-sensitized photooxidation of apotyrosinase and active-site-directed inactivation of the native enzyme indicate the possible involvement of histidyl residues 188, 192, 289, and 305 or 306 as ligands to the active-site copper as well as in the catalytic mechanism of this monooxygenase. PMID:151279

  19. Amino acid sequences of ribosomal proteins S11 from Bacillus stearothermophilus and S19 from Halobacterium marismortui. Comparison of the ribosomal protein S11 family.

    PubMed

    Kimura, M; Kimura, J; Hatakeyama, T

    1988-11-21

    The complete amino acid sequences of ribosomal proteins S11 from the Gram-positive eubacterium Bacillus stearothermophilus and of S19 from the archaebacterium Halobacterium marismortui have been determined. A search for homologous sequences revealed that both proteins belong to the ribosomal protein S11 family. Homologous proteins have previously been sequenced from Escherichia coli as well as from chloroplast, yeast, and mammalian ribosomes. A pairwise comparison of the amino acid sequences showed that Bacillus protein S11 shares 68% identical residues with S11 from Escherichia coli and a lower identity (52%) with the homologous chloroplast protein. The halophilic protein S19 is more closely related to its eukaryotic counterparts (45-49% identity) than to its eubacterial counterparts (35%).
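
    Percent identity figures such as 68% or 35% depend on how the denominator is defined. A minimal Python sketch, assuming two already-aligned sequences and excluding gapped columns (one common convention, not necessarily the one used in the study; the function name and sequences are hypothetical), is:

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identical residues over aligned, ungapped positions.

    Assumes a and b are already aligned to equal length; columns where
    either sequence has a gap '-' are excluded from the denominator.
    Other conventions (e.g. dividing by the shorter sequence length)
    give different numbers.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    same = sum(1 for x, y in pairs if x == y)
    return 100.0 * same / len(pairs)


# Hypothetical aligned fragments, not the real S11/S19 sequences.
print(percent_identity("MAKKPTRK-RV", "MAKRPSRKQRV"))  # 80.0
```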

  20. Molecular characteristics of insect vitellogenins and vitellogenin receptors.

    PubMed

    Sappington, T W; Raikhel, A S

    1998-01-01

    The recent cloning and sequencing of several insect vitellogenins (Vg), the major yolk protein precursor of most oviparous animals, and the mosquito Vg receptor (VgR) has brought the study of insect vitellogenesis to a new plane. Insect Vgs are homologous to nematode and vertebrate Vgs. All but one of the insect Vgs for which we know the primary structure are cleaved into two subunits at a site [(R/K)X(R/K)R or RXXR with an adjacent beta-turn] recognized by subtilisin-like proprotein convertases. In four of the Vgs, the cleavage site is near the N-terminus, but in one insect species, it is near the C-terminus of the Vg precursor. Multiple alignments of these Vg sequences indicate that the variation in cleavage location has not arisen through exon shuffling, but through local modifications of the amino acid sequences. A wasp Vg precursor is not cleaved, apparently because the sequence at the presumed ancestral cleavage site has been mutated from RXRR to LYRR and is no longer recognized by convertases. Some insect Vgs contain polyserine domains which are reminiscent of, but not homologous to, the phosvitin domain in vertebrate Vgs. The sequence of the mosquito VgR revealed that it is a member of the low-density lipoprotein receptor (LDLR) family. Though resembling chicken and frog VgRs, which are also members of the LDLR family, it is twice as big, carrying two clusters of cysteine-rich complement-type (Class A) repeats (implicated in ligand-binding) instead of one like vertebrate VgRs and LDLRs. It is very similar in sequence and domain arrangement to the Drosophila yolk protein receptor (YPR), even though the latter binds a non-vitellogenin ligand.
    Though vertebrate VgRs, insect VgR/YPRs, and LDLR-related proteins/megalins all accommodate one cluster of eight Class A repeats, fingerprint analysis of the repeats in these clusters indicates that they are not directly homologous with one another but have undergone differing histories of duplications, deletions, and exon shuffling, so that their apparent similarity is superficial. The so-called epidermal growth factor precursor region contains two types of motifs (cysteine-rich Class B repeats and YWXD repeats) which occur independently of one another in diverse proteins and are often involved in protein-protein interactions, suggesting that they may be involved in dimerization of VgRs and other LDLR-family proteins. Like the LDLR, but unlike vertebrate VgRs and the Drosophila YPR, the mosquito VgR contains a putative O-linked sugar region on the extra-cellular side of the transmembrane domain. Its function is unclear, but it may protect the receptor from membrane-bound proteases. The cytoplasmic tail of insect VgR/YPRs contains a di-leucine (or leucine-isoleucine) internalization signal, unlike the tight-turn tyrosine motif of other LDLR-family proteins. The importance of understanding the details of yolk protein uptake by oocytes lies in its potential for exploitation in novel insect control strategies, and the molecular characterization of the proteins involved has made the development of such strategies a realistic possibility.
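
    The convertase-site consensus quoted above, (R/K)X(R/K)R or RXXR, translates directly into a regular expression. The Python sketch below is illustrative (the pattern and function names are not from the source); it checks sequence only, so the adjacent beta-turn requirement is not modelled. Applied to the tetrapeptides alone, it matches an RXRR site (written here with X = A) but not the mutated wasp site LYRR.

```python
import re

# Convertase-site consensus motifs as written in the abstract:
# (R/K)X(R/K)R, or the shorter RXXR, where X is any residue.
# The lookahead makes finditer report overlapping sites too.
CONVERTASE_MOTIF = re.compile(r"(?=([RK].[RK]R|R..R))")


def candidate_cleavage_sites(seq: str) -> list[tuple[int, str]]:
    """1-based start positions and matched tetrapeptides.

    Sequence pattern only: the adjacent beta-turn is not checked, so hits
    are candidate sites, not predictions of actual cleavage.
    """
    return [(m.start() + 1, m.group(1)) for m in CONVERTASE_MOTIF.finditer(seq)]


print(candidate_cleavage_sites("RARR"))  # [(1, 'RARR')]
print(candidate_cleavage_sites("LYRR"))  # []
```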

  1. Hepatitis C Virus E2 Protein Induces Upregulation of IL-8 Pathways and Production of Heat Shock Proteins in Human Thyroid Cells.

    PubMed

    Hammerstad, Sara Salehi; Stefan, Mihaela; Blackard, Jason; Owen, Randall P; Lee, Hanna J; Concepcion, Erlinda; Yi, Zhengzi; Zhang, Weijia; Tomer, Yaron

    2017-02-01

    Thyroiditis is one of the most common extrahepatic manifestations of hepatitis C virus (HCV) infection. By binding to the cell-surface receptor CD81, HCV envelope glycoprotein E2 mediates entry of HCV into cells. Studies have shown that different viral proteins may individually induce host responses to infection. We hypothesized that HCV E2 protein binding to CD81 expressed on thyroid cells activates a cascade of inflammatory responses that can trigger autoimmune thyroiditis in susceptible individuals. The human thyroid cell line ML-1 and human thyrocytes in primary cell culture were treated with recombinant HCV E2 protein. The expression of major proinflammatory cytokines was measured at the messenger RNA and protein levels. Next-generation transcriptome analysis was used to identify early changes in gene expression in thyroid cells induced by E2. HCV envelope protein E2 induced strong inflammatory responses in human thyrocytes, resulting in production of interleukin (IL)-8, IL-6, and tumor necrosis factor-α. Furthermore, the E2 protein induced production of several heat shock proteins, including HSP60, HSP70p12A, and HSP10, in human primary thyrocytes. In the thyroid cell line ML-1, RNA sequencing identified upregulation of molecules involved in innate immune pathways, with high levels of proinflammatory cytokines and chemokines and increased expression of costimulatory molecules, specifically CD40, known to be a major thyroid autoimmunity gene. Our data support a key role for HCV envelope protein E2 in triggering thyroid autoimmunity through activation of cytokine pathways by bystander mechanisms. Copyright © 2017 by the Endocrine Society

  2. "De-novo" amino acid sequence elucidation of protein G'e by combined "top-down" and "bottom-up" mass spectrometry.

    PubMed

    Yefremova, Yelena; Al-Majdoub, Mahmoud; Opuni, Kwabena F M; Koy, Cornelia; Cui, Weidong; Yan, Yuetian; Gross, Michael L; Glocker, Michael O

    2015-03-01

    Mass spectrometric de-novo sequencing was applied to re-examine the amino acid sequence of a commercially available recombinant protein G' of great scientific and economic importance. Substantial deviations from the published amino acid sequence (Uniprot Q54181) were found: 46 additional amino acids at the N-terminus, including a so-called "His-tag", as well as partial N-terminal α-N-gluconoylation and α-N-phosphogluconoylation. The unexpected amino acid sequence of the commercial protein G' comprised 241 amino acids and resulted in a molecular mass of 25,998.9 ± 0.2 Da for the unmodified protein. Because of the higher mass caused by its extended amino acid sequence compared with the original protein G' (185 amino acids), we named this protein "protein G'e." By means of mass spectrometric peptide mapping, the suggested amino acid sequence, as well as the partial N-terminal α-N-gluconoylations, was confirmed with 100% sequence coverage. After the protein G'e sequence was determined, we were able to identify the Novagen expression vector pET-28b, with its XhoI restriction enzyme cleavage site, as the most likely vector used for cloning and expressing the recombinant protein G'e in E. coli. A dissociation constant (K(d)) value of 9.4 nM for protein G'e was determined thermophoretically, showing that the N-terminal flanking sequence extension did not cause significant changes in the binding affinity to immunoglobulins.

  3. Pfarao: a web application for protein family analysis customized for cytoskeletal and motor proteins (CyMoBase).

    PubMed

    Odronitz, Florian; Kollmar, Martin

    2006-11-29

    Annotation of protein sequences of eukaryotic organisms is crucial for the understanding of their function in the cell. Manual annotation is still by far the most accurate way to correctly predict genes. The classification of protein sequences, their phylogenetic relation and the assignment of function involves information from various sources. This often leads to a collection of heterogeneous data, which is hard to track. Cytoskeletal and motor proteins consist of large and diverse superfamilies comprising up to several dozen members per organism. To date, there is no integrated tool available to assist in the manual large-scale comparative genomic analysis of protein families. Pfarao (Protein Family Application for Retrieval, Analysis and Organisation) is a database driven online working environment for the analysis of manually annotated protein sequences and their relationship. Currently, the system can store and interrelate a wide range of information about protein sequences, species, phylogenetic relations and sequencing projects as well as links to literature and domain predictions. Sequences can be imported from multiple sequence alignments that are generated during the annotation process. A web interface allows users to conveniently browse the database and to compile tabular and graphical summaries of its content. We implemented a protein sequence-centric web application to store, organize, interrelate, and present heterogeneous data that is generated in manual genome annotation and comparative genomics. The application has been developed for the analysis of cytoskeletal and motor proteins (CyMoBase) but can easily be adapted for any protein.

  4. Primary structure of prostaglandin G/H synthase from sheep vesicular gland determined from the complementary DNA sequence.

    PubMed Central

    DeWitt, D L; Smith, W L

    1988-01-01

    Prostaglandin G/H synthase (8,11,14-icosatrienoate, hydrogen-donor:oxygen oxidoreductase, EC 1.14.99.1) catalyzes the first step in the formation of prostaglandins and thromboxanes, the conversion of arachidonic acid to prostaglandin endoperoxides G and H. This enzyme is the site of action of nonsteroidal anti-inflammatory drugs. We have isolated a 2.7-kilobase complementary DNA (cDNA) encompassing the entire coding region of prostaglandin G/H synthase from sheep vesicular glands. This cDNA, cloned from a lambda gt 10 library prepared from poly(A)+ RNA of vesicular glands, hybridizes with a single 2.75-kilobase mRNA species. The cDNA clone was selected using oligonucleotide probes modeled from amino acid sequences of tryptic peptides prepared from the purified enzyme. The full-length cDNA encodes a protein of 600 amino acids, including a signal sequence of 24 amino acids. Identification of the cDNA as coding for prostaglandin G/H synthase is based on comparison of amino acid sequences of seven peptides comprising 103 amino acids with the amino acid sequence deduced from the nucleotide sequence of the cDNA. The molecular weight of the unglycosylated enzyme lacking the signal peptide is 65,621. The synthase is a glycoprotein, and there are three potential sites for N-glycosylation, two of them in the amino-terminal half of the molecule. The serine reported to be acetylated by aspirin is at position 530, near the carboxyl terminus. There is no significant similarity between the sequence of the synthase and that of any other protein in amino acid or nucleotide sequence libraries, and a heme binding site(s) is not apparent from the amino acid sequence. The availability of a full-length cDNA clone coding for prostaglandin G/H synthase should facilitate studies of the regulation of expression of this enzyme and the structural features important for catalysis and for interaction with anti-inflammatory drugs. PMID:3125548
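
    "Potential sites for N-glycosylation" are conventionally located by scanning for the N-X-S/T sequon, in which X is any residue except proline. A hedged Python sketch with a hypothetical toy sequence (not the synthase sequence; the pattern and function names are also illustrative) follows. A sequon marks only a potential site, since actual glycosylation also depends on the residue's location and accessibility.

```python
import re

# N-linked glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons (e.g. in "NNST") both be found.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def potential_n_glyc_sites(seq: str) -> list[int]:
    """1-based positions of the Asn in each N-X-S/T (X != P) sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq)]


# Hypothetical toy sequence: N4-G-T is a sequon, N10-P-S is not (X = Pro),
# and N15-N-S and N16-S-T overlap.
toy = "MSRNGTAQLNPSEKNNSTW"
print(potential_n_glyc_sites(toy))  # [4, 15, 16]
```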

  5. Conservation of coevolving protein interfaces bridges prokaryote-eukaryote homologies in the twilight zone.

    PubMed

    Rodriguez-Rivas, Juan; Marsili, Simone; Juan, David; Valencia, Alfonso

    2016-12-27

    Protein-protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue coevolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that coevolution points to structurally conserved contacts at protein-protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a domain-centered protocol to study the interplay between residue coevolution and structural conservation of protein-protein interfaces. We show that sequence-based coevolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein-protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high divergence in sequence (the twilight zone), where standard homology modeling of protein complexes is unreliable, our approach provides accurate, sequence-based information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic coevolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this approach.
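
    The abstract does not specify which coevolution statistic was used; current methods typically rely on global statistical models such as direct coupling analysis. As a simpler, explicitly substituted illustration, the Python sketch below computes the mutual information between two columns of a multiple sequence alignment, a classic local coevolution score. The function name and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_mi(aligned: list[str], i: int, j: int) -> float:
    """Mutual information (in nats) between alignment columns i and j.

    Uses raw frequencies with no pseudocounts, sequence weighting, or
    correction for indirect couplings, all of which practical coevolution
    methods add.
    """
    n = len(aligned)
    fi = Counter(s[i] for s in aligned)
    fj = Counter(s[j] for s in aligned)
    fij = Counter((s[i], s[j]) for s in aligned)
    mi = 0.0
    for (a, b), count in fij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((fi[a] / n) * (fj[b] / n)))
    return mi


# Hypothetical toy alignment: columns 0 and 2 change together (D pairs with
# K, E pairs with R), while column 1 varies independently of column 0.
toy = ["DAK", "DGK", "EAR", "EGR"]
print(round(column_mi(toy, 0, 2), 3))  # 0.693 (= ln 2, perfect covariation)
print(round(column_mi(toy, 0, 1), 3))  # 0.0
```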

  6. Aminoacyl-tRNA synthetases database Y2K

    PubMed Central

    Szymanski, Maciej; Barciszewski, Jan

    2000-01-01

    The aminoacyl-tRNA synthetases (AARS) are a diverse group of enzymes that ensure the fidelity of transfer of genetic information from DNA into protein. They catalyse the attachment of amino acids to transfer RNAs and thereby establish the rules of the genetic code by virtue of matching the nucleotide triplet of the anticodon with its cognate amino acid. Currently, 818 AARS primary structures have been reported from archaebacteria, eubacteria, mitochondria, chloroplasts and eukaryotic cells. The database is a compilation of the amino acid sequences of all AARSs, known to date, which are available as separate entries or alignments of related proteins via the WWW at http://rose.man.poznan.pl/aars/index.html PMID:10592262

  7. Aminoacyl-tRNA synthetases database Y2K.

    PubMed

    Szymanski, M; Barciszewski, J

    2000-01-01

    The aminoacyl-tRNA synthetases (AARS) are a diverse group of enzymes that ensure the fidelity of transfer of genetic information from DNA into protein. They catalyse the attachment of amino acids to transfer RNAs and thereby establish the rules of the genetic code by matching the nucleotide triplet of the anticodon with its cognate amino acid. Currently, 818 AARS primary structures have been reported from archaebacteria, eubacteria, mitochondria, chloroplasts and eukaryotic cells. The database compiles the amino acid sequences of all AARSs known to date, which are available as separate entries or as alignments of related proteins via the WWW at http://rose.man.poznan.pl/aars/index.html

  8. Stability of the Influenza Virus Hemagglutinin Protein Correlates with Evolutionary Dynamics.

    PubMed

    Klein, Eili Y; Blumenkrantz, Deena; Serohijos, Adrian; Shakhnovich, Eugene; Choi, Jeong-Mo; Rodrigues, João V; Smith, Brendan D; Lane, Andrew P; Feldman, Andrew; Pekosz, Andrew

    2018-01-01

    Protein thermodynamics are an integral determinant of viral fitness and one of the major drivers of protein evolution. Mutations in the influenza A virus (IAV) hemagglutinin (HA) protein can eliminate neutralizing antibody binding to mediate escape from preexisting antiviral immunity. Prior research on the IAV nucleoprotein suggests that protein stability may constrain seasonal IAV evolution; however, the role of stability in shaping the evolutionary dynamics of the HA protein has not been explored. We used the full coding sequences of 9,797 H1N1pdm09 HA sequences and 16,716 human seasonal H3N2 HA sequences to computationally estimate relative changes in the thermal stability of the HA protein between 2009 and 2016. Phylogenetic methods were used to characterize how stability differences impacted the evolutionary dynamics of the virus. We found that pandemic H1N1 IAV strains split into two lineages that had different relative HA protein stabilities and that later variants were descended from the higher-stability lineage. Analysis of the mutations associated with the selective sweep of the higher-stability lineage showed an early appearance of highly stabilizing mutations, the earliest of which was not located in a known antigenic site. Experimental evidence further suggested that H1N1 HA stability may be correlated with in vitro virus production and infection. A similar analysis of H3N2 strains found that surviving lineages were also largely descended from viruses predicted to encode more-stable HA proteins. Our results suggest that HA protein stability plays a significant role in the persistence of different IAV lineages.

    IMPORTANCE: One of the constraints on fast-evolving viruses, such as influenza virus, is protein stability, or how strongly the folded protein holds together. Despite the importance of this protein property, there has been limited investigation of how the stability of the influenza virus hemagglutinin protein (the primary antibody target of the immune system) affects its evolution. Using a combination of computational estimates of stability and experiments, our analysis found that viruses with more-stable hemagglutinin proteins were associated with long-term persistence in the population. There are two potential reasons for the observed persistence. One is that more-stable proteins can tolerate destabilizing mutations that less-stable proteins cannot, thus increasing opportunities for immune escape. The second is that greater stability increases the fitness of the virus through increased production of infectious particles. Further research on the relative importance of these mechanisms could help inform the annual influenza vaccine composition decision process.

  9. Sequence repeats and protein structure

    NASA Astrophysics Data System (ADS)

    Hoang, Trinh X.; Trovato, Antonio; Seno, Flavio; Banavar, Jayanth R.; Maritan, Amos

    2012-11-01

    Repeats are frequently found in known protein sequences. The level of sequence conservation in tandem repeats correlates with their propensities to be intrinsically disordered. We employ a coarse-grained model of a protein with a two-letter amino acid alphabet, hydrophobic (H) and polar (P), to examine the sequence-structure relationship in the realm of repeated sequences. A fraction of repeated sequences comprises a distinct class of bad folders, whose folding temperatures are much lower than those of random sequences. Imperfection in sequence repetition improves the folding properties of the bad folders while deteriorating those of the good folders. Our results may explain why nature has utilized repeated sequences for their versatility and especially to design functional proteins that are intrinsically unstructured at physiological temperatures.
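
    As an illustration of how a two-letter H/P alphabet turns a sequence and a conformation into an energy, the sketch below uses the classic two-dimensional square-lattice HP model, in which each non-bonded H-H contact contributes a fixed energy. This is an assumption for illustration only: the abstract does not say the authors' coarse-grained model is lattice-based, and `hp_energy` and its `epsilon` parameter are hypothetical.

```python
def hp_energy(sequence: str, coords: list[tuple[int, int]], epsilon: float = -1.0) -> float:
    """Energy of a 2D square-lattice HP conformation: epsilon per non-bonded H-H contact."""
    if len(sequence) != len(coords):
        raise ValueError("one lattice site per residue is required")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must occupy adjacent sites")

    site_to_index = {site: i for i, site in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for neighbour in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            j = site_to_index.get(neighbour)
            # j > i + 1 counts each pair once and skips bonded chain neighbours
            if j is not None and j > i + 1 and sequence[j] == "H":
                contacts += 1
    return epsilon * contacts


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
line = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(hp_energy("HPPH", square), hp_energy("HPPH", line))
```

    Folding temperatures, as discussed in the abstract, would come from simulating many such conformations; the function only scores one.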

  10. Rapid Identification of Sequences for Orphan Enzymes to Power Accurate Protein Annotation

    PubMed Central

    Ojha, Sunil; Watson, Douglas S.; Bomar, Martha G.; Galande, Amit K.; Shearer, Alexander G.

    2013-01-01

    The power of genome sequencing depends on the ability to understand what sequenced genes and their protein products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately, this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the “back catalog” of enzymology: “orphan enzymes,” those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme “back catalog” is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry-based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology’s “back catalog” another powerful tool to drive accurate genome annotation. PMID:24386392
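
    The sequence similarity networks mentioned above can be sketched in a few lines of Python. In this simplified, hypothetical version, edges connect pre-aligned sequences whose percent identity meets a threshold, and clusters are the connected components found with union-find. Real networks are usually built from BLAST scores or E-values, and none of the names or sequences below come from the study.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Identity over gap-free columns of two pre-aligned, equal-length sequences."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def ssn_clusters(seqs: dict[str, str], threshold: float) -> list[set[str]]:
    """Connected components of a graph whose edges join pairs at >= threshold identity."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if percent_identity(seqs[a], seqs[b]) >= threshold:
            parent[find(a)] = find(b)  # union the two components

    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=sorted)


toy = {"orphan": "MKTAYIAK", "hit1": "MKTAYIAR", "hit2": "MKSAYIAR", "other": "GGGPPLLW"}
print(ssn_clusters(toy, threshold=80.0))
```

    Note that "orphan" and "hit2" share only 75% identity yet end up in one cluster through "hit1"; this transitive grouping is what lets a network place a newly identified sequence among related, possibly misannotated proteins.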

  11. Rapid identification of sequences for orphan enzymes to power accurate protein annotation.

    PubMed

    Ramkissoon, Kevin R; Miller, Jennifer K; Ojha, Sunil; Watson, Douglas S; Bomar, Martha G; Galande, Amit K; Shearer, Alexander G

    2013-01-01

    The power of genome sequencing depends on the ability to understand what sequenced genes and their protein products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately, this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the "back catalog" of enzymology: "orphan enzymes," those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme "back catalog" is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry-based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology's "back catalog" another powerful tool to drive accurate genome annotation.

  12. X-exome sequencing of 405 unresolved families identifies seven novel intellectual disability genes.

    PubMed

    Hu, H; Haas, S A; Chelly, J; Van Esch, H; Raynaud, M; de Brouwer, A P M; Weinert, S; Froyen, G; Frints, S G M; Laumonnier, F; Zemojtel, T; Love, M I; Richard, H; Emde, A-K; Bienek, M; Jensen, C; Hambrock, M; Fischer, U; Langnick, C; Feldkamp, M; Wissink-Lindhout, W; Lebrun, N; Castelnau, L; Rucci, J; Montjean, R; Dorseuil, O; Billuart, P; Stuhlmann, T; Shaw, M; Corbett, M A; Gardner, A; Willis-Owen, S; Tan, C; Friend, K L; Belet, S; van Roozendaal, K E P; Jimenez-Pocquet, M; Moizard, M-P; Ronce, N; Sun, R; O'Keeffe, S; Chenna, R; van Bömmel, A; Göke, J; Hackett, A; Field, M; Christie, L; Boyle, J; Haan, E; Nelson, J; Turner, G; Baynam, G; Gillessen-Kaesbach, G; Müller, U; Steinberger, D; Budny, B; Badura-Stronka, M; Latos-Bieleńska, A; Ousager, L B; Wieacker, P; Rodríguez Criado, G; Bondeson, M-L; Annerén, G; Dufke, A; Cohen, M; Van Maldergem, L; Vincent-Delorme, C; Echenne, B; Simon-Bouy, B; Kleefstra, T; Willemsen, M; Fryns, J-P; Devriendt, K; Ullmann, R; Vingron, M; Wrogemann, K; Wienker, T F; Tzschach, A; van Bokhoven, H; Gecz, J; Jentsch, T J; Chen, W; Ropers, H-H; Kalscheuer, V M

    2016-01-01

    X-linked intellectual disability (XLID) is a clinically and genetically heterogeneous disorder. During the past two decades, more than 100 X-chromosome ID genes have been identified. Yet a large number of families mapping to the X chromosome remained unresolved, suggesting that more XLID genes or loci are yet to be identified. Here, we have investigated 405 unresolved families with XLID. We employed massively parallel sequencing of all X-chromosome exons in the index males. The majority of these males had previously tested negative for copy number variations and for mutations in a subset of known XLID genes by Sanger sequencing. In total, 745 X-chromosomal genes were screened. After stringent filtering, a total of 1297 non-recurrent exonic variants remained for prioritization. Co-segregation analysis of potentially clinically relevant changes revealed that 80 families (20%) carried pathogenic variants in established XLID genes. In 19 families, we detected likely causative protein-truncating and missense variants in 7 novel and validated XLID genes (CLCN4, CNKSR2, FRMPD4, KLHL15, LAS1L, RLIM and USP27X) and potentially deleterious variants in 2 novel candidate XLID genes (CDK16 and TAF1). We show that the CLCN4 and CNKSR2 variants impair protein function, as indicated by electrophysiological studies and by altered differentiation of cultured primary neurons from Clcn4(-/-) mice or after mRNA knock-down. The newly identified and candidate XLID proteins belong to pathways and networks with established roles in cognitive function and intellectual disability in particular. We suggest that systematic sequencing of all X-chromosomal genes in a cohort of patients with genetic evidence for X-chromosome locus involvement may resolve up to 58% of Fragile X-negative cases.
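
    In its simplest form, co-segregation analysis for an X-linked recessive candidate variant is a consistency check over the pedigree. The Python sketch below assumes full penetrance and ignores affected females and de novo events; the `Member` record and the `cosegregates_x_recessive` function are hypothetical and are not the filtering pipeline used in the study.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    sex: str                 # "M" or "F"
    affected: bool
    alt_alleles: int         # copies of the candidate allele: 0/1 for males, 0/1/2 for females
    obligate_carrier: bool = False  # e.g. the mother of an affected son


def cosegregates_x_recessive(family: list[Member]) -> bool:
    """Simplified X-linked recessive co-segregation test for one candidate variant."""
    for member in family:
        if member.sex == "M":
            # hemizygous males: carrying the allele must coincide exactly with being affected
            if (member.alt_alleles >= 1) != member.affected:
                return False
        elif member.obligate_carrier and member.alt_alleles != 1:
            # an unaffected obligate carrier female is expected to be heterozygous
            return False
    return True


family = [
    Member("mother", "F", affected=False, alt_alleles=1, obligate_carrier=True),
    Member("son1", "M", affected=True, alt_alleles=1),
    Member("son2", "M", affected=True, alt_alleles=1),
    Member("son3", "M", affected=False, alt_alleles=0),
]
print(cosegregates_x_recessive(family))
```

    A single unaffected male carrying the variant is enough to reject it under these assumptions, which is why co-segregation is an effective filter in families with several informative males.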

  13. Demonstration of GTG as an endogenous initiation codon for a human mRNA transcript revealed by molecular cloning of the serpin endopin 2B.

    PubMed

    Hwang, Shin-Rong; Garza, Christina Z; Wegrzyn, Jill; Hook, Vivian Y H

    2004-08-16

    This study demonstrates the use of an unconventional GTG initiation codon for translation of a human mRNA transcript that encodes the serpin endopin 2B, a protease inhibitor. Molecular cloning revealed the nucleotide sequence of the human endopin 2B cDNA. Its deduced primary sequence shows high homology to bovine endopin 2A, which possesses cross-class protease inhibition of elastase and papain. Notably, the human endopin 2B cDNA sequence revealed GTG as the predicted translation initiation codon; the predicted 46 kDa endopin 2B translation product was produced by in vitro translation of 35S-endopin 2B with mammalian (rabbit) protein translation components. Importantly, bioinformatic studies demonstrated the presence of the entire human endopin 2B cDNA sequence, with GTG as the initiation codon, within the human genome on chromosome 14. Further evidence for GTG as a functional initiation codon came from GTG-mediated in vitro translation of the heterologous protein EGFP and from GTG-mediated expression of EGFP in mammalian PC12 cells. Mutagenesis of GTG to GTC abolished EGFP expression in PC12 cells, indicating that GTG functions as an initiation codon. In addition, the GTG initiation codon produced lower levels of translated protein than ATG. Significantly, GTG-mediated translation of endopin 2B demonstrates a functional human gene product not previously predicted from initial analyses of the human genome. Further analyses that consider GTG as an alternative initiation codon may predict new candidate genes in the human genome.
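
    Why an ATG-only scan can miss a gene such as endopin 2B can be illustrated with a toy forward-strand ORF scanner. This Python sketch is an assumption-laden simplification: it ignores the reverse strand, Kozak context and minimum ORF length, and `find_orfs` is a hypothetical helper, not a tool used in the study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, start_codons=("ATG",)) -> list[tuple[int, int, str]]:
    """Forward-strand ORFs as (start, end, start_codon), 0-based with end exclusive.

    In each frame, scanning resumes after a stop codon, so downstream starts
    nested inside an ORF are not reported separately.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            codon = dna[i:i + 3]
            if codon in start_codons:
                j = i + 3
                # extend codon by codon until an in-frame stop codon
                while j + 3 <= len(dna) and dna[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 <= len(dna):  # an in-frame stop was found
                    orfs.append((i, j + 3, codon))
                    i = j + 3
                    continue
            i += 3
    return orfs


toy = "GTGAAACCCTAG"  # hypothetical sequence: GTG start, two codons, TAG stop
print(find_orfs(toy))                                   # ATG-only scan
print(find_orfs(toy, start_codons=("ATG", "GTG")))      # GTG allowed
print(find_orfs("GTCAAACCCTAG", start_codons=("ATG", "GTG")))  # GTG -> GTC
```

    The third call mirrors the GTG-to-GTC mutagenesis described above: once the start codon is lost, no ORF is reported.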

  14. X-exome sequencing of 405 unresolved families identifies seven novel intellectual disability genes

    PubMed Central

    Hu, H; Haas, S A; Chelly, J; Van Esch, H; Raynaud, M; de Brouwer, A P M; Weinert, S; Froyen, G; Frints, S G M; Laumonnier, F; Zemojtel, T; Love, M I; Richard, H; Emde, A-K; Bienek, M; Jensen, C; Hambrock, M; Fischer, U; Langnick, C; Feldkamp, M; Wissink-Lindhout, W; Lebrun, N; Castelnau, L; Rucci, J; Montjean, R; Dorseuil, O; Billuart, P; Stuhlmann, T; Shaw, M; Corbett, M A; Gardner, A; Willis-Owen, S; Tan, C; Friend, K L; Belet, S; van Roozendaal, K E P; Jimenez-Pocquet, M; Moizard, M-P; Ronce, N; Sun, R; O'Keeffe, S; Chenna, R; van Bömmel, A; Göke, J; Hackett, A; Field, M; Christie, L; Boyle, J; Haan, E; Nelson, J; Turner, G; Baynam, G; Gillessen-Kaesbach, G; Müller, U; Steinberger, D; Budny, B; Badura-Stronka, M; Latos-Bieleńska, A; Ousager, L B; Wieacker, P; Rodríguez Criado, G; Bondeson, M-L; Annerén, G; Dufke, A; Cohen, M; Van Maldergem, L; Vincent-Delorme, C; Echenne, B; Simon-Bouy, B; Kleefstra, T; Willemsen, M; Fryns, J-P; Devriendt, K; Ullmann, R; Vingron, M; Wrogemann, K; Wienker, T F; Tzschach, A; van Bokhoven, H; Gecz, J; Jentsch, T J; Chen, W; Ropers, H-H; Kalscheuer, V M

    2016-01-01

    X-linked intellectual disability (XLID) is a clinically and genetically heterogeneous disorder. During the past two decades, more than 100 X-chromosome ID genes have been identified. Yet a large number of families mapping to the X chromosome remained unresolved, suggesting that more XLID genes or loci are yet to be identified. Here, we have investigated 405 unresolved families with XLID. We employed massively parallel sequencing of all X-chromosome exons in the index males. The majority of these males had previously tested negative for copy number variations and for mutations in a subset of known XLID genes by Sanger sequencing. In total, 745 X-chromosomal genes were screened. After stringent filtering, a total of 1297 non-recurrent exonic variants remained for prioritization. Co-segregation analysis of potentially clinically relevant changes revealed that 80 families (20%) carried pathogenic variants in established XLID genes. In 19 families, we detected likely causative protein-truncating and missense variants in 7 novel and validated XLID genes (CLCN4, CNKSR2, FRMPD4, KLHL15, LAS1L, RLIM and USP27X) and potentially deleterious variants in 2 novel candidate XLID genes (CDK16 and TAF1). We show that the CLCN4 and CNKSR2 variants impair protein function, as indicated by electrophysiological studies and by altered differentiation of cultured primary neurons from Clcn4−/− mice or after mRNA knock-down. The newly identified and candidate XLID proteins belong to pathways and networks with established roles in cognitive function and intellectual disability in particular. We suggest that systematic sequencing of all X-chromosomal genes in a cohort of patients with genetic evidence for X-chromosome locus involvement may resolve up to 58% of Fragile X-negative cases. PMID:25644381

  15. Cloning and expression of a CYP720B orthologue involved in the biosynthesis of diterpene resin acids in Pinus brutia.

    PubMed

    Semiz, Asli; Sen, Alaattin

    2015-03-01

    Cytochrome P450 monooxygenases mediate a broad range of oxidative reactions involved in the biosynthesis of both primary and secondary metabolites in plants. Until now, only two P450 genes of the CYP720B subfamily, CYP720B1 from Pinus taeda and CYP720B4 from Picea sitchensis, have been functionally characterised and described in the literature. The purpose of this study was to describe the cloning and expression of CYP720B from Pinus brutia, given its suggested role in the synthesis of bioactive compounds used for chemical defence against insects. A PCR product of the P. brutia CYP720B gene was cloned into the pCR8/GW/TOPO cloning vector. After the sequence was optimised for codon usage in yeast, it was transferred into the inducible expression vector pYES-DEST52 and transformed into the S. cerevisiae INVSc1 strain. Sequence analysis showed that the P. brutia CYP720B gene contains an open reading frame of 1,464 nucleotides, which encodes a 53,570 Da putative protein of 487 amino acid residues. The putative protein contains the classic heme-binding sequence motif that is conserved in all P450 enzymes. It shares 99% and 61% identity with the deduced amino acid sequences of CYP720B1 from Pinus taeda and CYP720B4 from Picea sitchensis, respectively. Recombinant CYP720B protein expression was confirmed by western blot analysis. Furthermore, recombinant CYP720B was functionally active, showing a Soret peak at approximately 448 nm in the reduced CO difference spectrum. These data suggest that the cloned gene is an orthologue of CYP720B in P. brutia and might be involved in diterpene resin acid (DRA) biosynthesis.
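
    Two numbers in this abstract can be cross-checked with simple arithmetic, and the conserved heme-binding motif can be searched with a regular expression. The Python sketch below assumes the 1,464 nucleotides include the stop codon and uses a commonly cited, simplified P450 consensus (F-x-x-G-x-R-x-C-x-G) rather than the full PROSITE pattern; the helper names and the toy peptide are hypothetical.

```python
import re


def residues_from_orf(orf_nt: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of orf_nt nucleotides."""
    if orf_nt % 3:
        raise ValueError("an ORF length must be a multiple of 3")
    codons = orf_nt // 3
    return codons - 1 if includes_stop else codons


# Simplified P450 heme-binding consensus F-x-x-G-x-R-x-C-x-G; the Cys is the heme ligand.
HEME_MOTIF = re.compile(r"F..G.R.C.G")


def heme_motif_sites(protein: str) -> list[tuple[int, str]]:
    """1-based start positions and matched text of the simplified motif."""
    return [(m.start() + 1, m.group()) for m in HEME_MOTIF.finditer(protein)]


n_res = residues_from_orf(1464)       # 488 codons minus the stop codon
mean_residue_mass = 53570 / n_res     # Da per residue
print(n_res, mean_residue_mass)
print(heme_motif_sites("MKTFSAGKRICLGE"))  # hypothetical toy peptide
```

    The ORF length is consistent with 487 residues plus a stop codon, and 53,570 Da divided by 487 residues gives exactly 110 Da per residue, which matches the usual rule-of-thumb average residue mass.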

  16. Dynamics of domain coverage of the protein sequence universe.

    PubMed

    Rekapalli, Bhanu; Wuichet, Kristin; Peterson, Gregory D; Zhulin, Igor B

    2012-11-16

    The currently known protein sequence space consists of millions of sequences in public databases and is rapidly expanding. Assigning sequences to families leads to a better understanding of protein function and the nature of the protein universe. However, a large portion of the current protein space remains unassigned and is referred to as its "dark matter". Here we suggest that the true size of "dark matter" is much larger than stated by current definitions. We propose an approach to reducing the size of "dark matter" by identifying and subtracting regions in protein sequences that are not likely to contain any domain. Recent improvements in computational domain modeling result in a slow decrease in the relative size of "dark matter"; however, its absolute size increases substantially with the growth of sequence data.
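
    One possible reading of residue-level "dark matter" is sketched below in Python. It assumes 1-based inclusive intervals for domain hits and for regions judged unlikely to contain a domain (for example, predicted disorder or low complexity); the function and its output keys are hypothetical and are not the authors' definitions. It contrasts a sequence-level count, where any domain hit assigns the whole protein, with a residue-level count before and after subtracting non-domain regions.

```python
def residues(intervals) -> set[int]:
    """Expand 1-based inclusive (start, end) intervals into a set of residue positions."""
    return {i for start, end in intervals for i in range(start, end + 1)}


def dark_matter_counts(length: int, domain_hits, non_domain_regions=()) -> dict[str, int]:
    all_res = set(range(1, length + 1))
    in_domains = residues(domain_hits) & all_res
    # residues judged unlikely to contain any domain, not already covered by a hit
    excluded = (residues(non_domain_regions) & all_res) - in_domains
    unassigned = all_res - in_domains
    return {
        # a protein with any domain hit counts as fully assigned at sequence level
        "sequence_level_dark": 0 if domain_hits else length,
        "residue_level_dark": len(unassigned),
        "dark_after_subtraction": len(unassigned - excluded),
    }


# Hypothetical 100-residue protein with overlapping domain hits.
print(dark_matter_counts(100, [(10, 40), (35, 60)], [(1, 5), (90, 100)]))
```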

  17. The prokaryotic antecedents of the ubiquitin-signaling system and the early evolution of ubiquitin-like β-grasp domains

    PubMed Central

    Iyer, Lakshminarayan M; Burroughs, A Maxwell; Aravind, L

    2006-01-01

    Background: Ubiquitin (Ub)-mediated signaling is one of the hallmarks of all eukaryotes. Prokaryotic homologs of Ub (ThiS and MoaD) and E1 ligases have been studied in relation to sulfur incorporation reactions in thiamine and molybdenum/tungsten cofactor biosynthesis. However, there is no evidence for entire protein modification systems with Ub-like proteins and deconjugation by deubiquitinating enzymes in prokaryotes. Hence, the evolutionary assembly of the eukaryotic Ub-signaling apparatus remains unclear.

    Results: We systematically analyzed prokaryotic Ub-related β-grasp fold proteins using sensitive sequence profile searches and structural analysis. Consequently, we identified novel Ub-related proteins beyond the characterized ThiS, MoaD, TGS, and YukD domains. To understand their functional associations, we sought and recovered several conserved gene neighborhoods and domain architectures. These included novel associations involving diverse sulfur metabolism proteins, siderophore biosynthesis and the gene encoding the transfer-messenger RNA (tmRNA)-binding protein SmpB, as well as domain fusions between Ub-like domains and PIN-domain-related RNases. Most strikingly, we found conserved gene neighborhoods in phylogenetically diverse bacteria combining genes for JAB domains (the primary de-ubiquitinating isopeptidases of the proteasomal complex) with E1-like adenylating enzymes and different Ub-related proteins. Further sequence analysis of other conserved genes in these neighborhoods revealed several Ub-conjugating enzyme/E2-ligase-related proteins. Genes for a Ub-like protein and a JAB domain peptidase were also found in the tail assembly gene cluster of certain caudate bacteriophages.

    Conclusion: These observations imply that members of the Ub family had already formed strong functional associations with E1-like proteins, UBC/E2-related proteins, and JAB peptidases in bacteria. Several of these Ub-like proteins and the associated protein families are likely to function together in signaling systems just as in eukaryotes. PMID:16859499
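
    The search for conserved gene neighborhoods can be sketched as a sliding-window test over gene order. In this hypothetical Python example, each genome is reduced to a list of family labels in contig order, and a neighborhood is reported when all required families fall within a window of consecutive genes; the labels, genomes and window size are illustrative assumptions, not the study's criteria.

```python
def neighborhoods_with(genomes: dict[str, list[str]], required, window: int = 5) -> dict[str, int]:
    """Return {genome_id: first window start} where all required families co-occur.

    A contig shorter than the window is tested as a single window.
    """
    required = set(required)
    hits = {}
    for genome_id, genes in genomes.items():
        for start in range(max(1, len(genes) - window + 1)):
            if required <= set(genes[start:start + window]):
                hits[genome_id] = start
                break
    return hits


genomes = {
    "bact1": ["abc", "Ub-like", "E1-like", "JAB", "tRNA"],
    "bact2": ["JAB", "x", "y", "z", "w", "E1-like", "Ub-like"],  # genes too far apart
    "phage": ["tail1", "Ub-like", "JAB", "tail2"],               # no E1-like gene
}
print(neighborhoods_with(genomes, {"Ub-like", "E1-like", "JAB"}, window=4))
```

    Requiring the same co-localized set across phylogenetically diverse genomes, rather than in a single genome, is what makes such a neighborhood suggestive of a functional association.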

  18. The proteomic landscape of triple-negative breast cancer.

    PubMed

    Lawrence, Robert T; Perez, Elizabeth M; Hernández, Daniel; Miller, Chris P; Haas, Kelsey M; Irie, Hanna Y; Lee, Su-In; Blau, C Anthony; Villén, Judit

    2015-04-28

    Triple-negative breast cancer is a heterogeneous disease characterized by poor clinical outcomes and a shortage of targeted treatment options. To discover molecular features of triple-negative breast cancer, we performed quantitative proteomics analysis of twenty human-derived breast cell lines and four primary breast tumors to a depth of more than 12,000 distinct proteins. We used these data to identify breast cancer subtypes at the protein level and demonstrate the precise quantification of biomarkers, signaling proteins, and biological pathways by mass spectrometry. We integrated proteomics data with exome sequence resources to identify genomic aberrations that affect protein expression. We performed a high-throughput drug screen to identify protein markers of drug sensitivity and understand the mechanisms of drug resistance. The genome and proteome provide complementary information that, when combined, yields a powerful engine for therapeutic discovery. This resource is available to the cancer research community to catalyze further analysis and investigation. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  19. A Thermoacidophile-Specific Protein Family, DUF3211, Functions as a Fatty Acid Carrier with Novel Binding Mode

    PubMed Central

    Miyakawa, Takuya; Sawano, Yoriko; Miyazono, Ken-ichi; Miyauchi, Yumiko; Hatano, Ken-ichi

    2013-01-01

    STK_08120 is a member of the thermoacidophile-specific DUF3211 protein family from Sulfolobus tokodaii strain 7. Its molecular function remains obscure, and no sequence similarity to characterized proteins is available to suggest one. In this study, the crystal structure of STK_08120 was determined at 1.79-Å resolution to predict its probable function using structure similarity searches. The protein adopts the α/β helix-grip fold found in START domain proteins, which have cavities for hydrophobic substrates or ligands. The detailed structural features implied that fatty acids are the primary ligand candidates for STK_08120, and binding assays revealed that the protein bound long-chain saturated fatty acids (>C14) and their trans-unsaturated counterparts with an affinity equal to that of the major fatty acid binding proteins in mammals and plants. Moreover, the structure of an STK_08120-myristic acid complex revealed a binding mode unique among fatty acid binding proteins. These results suggest that the thermoacidophile-specific protein family DUF3211 functions as a fatty acid carrier with a novel binding mode. PMID:23836863

  20. Identification of a novel MIP frameshift mutation associated with congenital cataract in a Chinese family by whole-exome sequencing and functional analysis.

    PubMed

    Long, Xigui; Huang, Yanru; Tan, Hu; Li, Zhuo; Zhang, Rui; Linpeng, Siyuan; Lv, Weigang; Cao, Yingxi; Li, Haoxian; Liang, Desheng; Wu, Lingqian

    2018-04-26

    To investigate the underlying pathogenesis of congenital cataract in a four-generation Chinese family, whole-exome sequencing (WES) of family members III:4, IV:4, and IV:6 was performed, followed by Sanger sequencing and bioinformatics analysis. Full-length WT-MIP or K228fs-MIP fused to an N-terminal HA tag was transfected into HeLa cells, and quantitative real-time PCR, western blotting and immunofluorescence confocal laser scanning microscopy were then performed. Male patients developed nonsyndromic cataracts by 1 year of age, earlier than female patients, whose onset was in adulthood. A novel c.682_683delAA (p.K228fs230X) mutation in major intrinsic protein (MIP) cosegregated with the cataract phenotype. Bioinformatics analysis predicted an increased instability index and more unfolded states for the truncated MIP. The mRNA level of K228fs-MIP was reduced compared with that of WT-MIP, and K228fs-MIP protein expression was also lower than that of WT-MIP. Immunofluorescence images showed that WT-MIP localized principally to the plasma membrane, whereas the mutant protein was trapped in the cytoplasm. Our study provides genetic and initial functional evidence for a novel c.682_683delAA mutation in MIP, which expands the variant spectrum of MIP and helps us better understand the molecular basis of cataract.
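
    The coding-DNA coordinates in this abstract can be checked with simple arithmetic. The Python sketch below (hypothetical helper names) assumes the usual convention that c.1 is the A of the ATG start codon. Under that convention, c.682 is the first base of codon 228, and a 2-nt deletion is not a multiple of three and therefore shifts the reading frame, consistent with p.K228fs.

```python
def codon_number(c_pos: int) -> int:
    """Codon (1-based) containing coding-DNA position c.N, with c.1 = A of ATG."""
    if c_pos < 1:
        raise ValueError("this sketch handles coding positions c.1 and above only")
    return (c_pos - 1) // 3 + 1


def position_in_codon(c_pos: int) -> int:
    """1, 2 or 3: where c.N sits inside its codon."""
    return (c_pos - 1) % 3 + 1


def is_frameshift(del_start: int, del_end: int) -> bool:
    """A deletion shifts the frame unless its length is a multiple of three."""
    return (del_end - del_start + 1) % 3 != 0


# c.682_683delAA from the abstract
print(codon_number(682), position_in_codon(682), is_frameshift(682, 683))
```

    The position of the new stop codon (230 in p.K228fs230X) depends on the downstream MIP sequence and cannot be derived from these coordinates alone.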

  1. Recombinant protein secretion in Pseudozyma flocculosa and Pseudozyma antarctica with a novel signal peptide.

    PubMed

    Cheng, Yali; Avis, Tyler J; Bolduc, Sébastien; Zhao, Yingyi; Anguenot, Raphaël; Neveu, Bertrand; Labbé, Caroline; Belzile, François; Bélanger, Richard R

    2008-12-01

    Secretion of recombinant proteins aims to reproduce the correct posttranslational modifications of the expressed protein while simplifying its recovery. In this study, secretion signal sequences from an abundantly secreted 34-kDa protein (P34) from Pseudozyma flocculosa were cloned. The efficiency of these sequences in the secretion of recombinant green fluorescent protein (GFP) was investigated in two Pseudozyma species and compared with other secretion signal sequences from S. cerevisiae and Pseudozyma spp. The results indicate that various secretion signal sequences were functional and that the P34 signal peptide was the most effective secretion signal sequence in both P. flocculosa and P. antarctica. The cells correctly processed the secretion signal sequences, including the P34 signal peptide, and mature GFP was recovered from the culture medium. This is the first report of functional secretion signal sequences in P. flocculosa. These sequences can be used to test the secretion of other recombinant proteins and to study the secretion pathway in P. flocculosa and P. antarctica.

  2. A homozygous FANCM mutation underlies a familial case of non-syndromic primary ovarian insufficiency

    PubMed Central

    Caburet, Sandrine; Guigon, Celine; Mäkinen, Marika; Tanner, Laura; Hietala, Marja; Urbanska, Kaja; Bellutti, Laura; Legois, Bérangère; Bessieres, Bettina; Gougeon, Alain; Benachi, Alexandra; Livera, Gabriel; Rosselli, Filippo

    2017-01-01

    Primary Ovarian Insufficiency (POI) affects ~1% of women under forty. Exome sequencing of two Finnish sisters with non-syndromic POI revealed a homozygous mutation in FANCM, leading to a truncated protein (p.Gln1701*). FANCM is a DNA-damage response gene whose heterozygous mutations predispose to breast cancer. Compared to the mother's cells, the patients’ lymphocytes displayed higher levels of basal and mitomycin C (MMC)-induced chromosomal abnormalities. Their lymphoblasts were hypersensitive to MMC, and MMC-induced monoubiquitination of FANCD2 was impaired. Genetic complementation of the patients’ cells with wild-type FANCM improved their resistance to MMC and re-established FANCD2 monoubiquitination. FANCM was more strongly expressed in human fetal germ cells than in somatic cells. FANCM protein was preferentially expressed along the chromosomes in pachytene cells, which undergo meiotic recombination. This mutation may provoke meiotic defects leading to a depleted follicular stock, as in Fancm-/- mice. Our findings document the first Mendelian phenotype due to a biallelic FANCM mutation. PMID:29231814

  3. Assessment of the Requirements for Magnesium Transporters in Bacillus subtilis

    PubMed Central

    Wakeman, Catherine A.; Goodson, Jonathan R.; Zacharia, Vineetha M.

    2014-01-01

    Magnesium is the most abundant divalent metal in cells and is required for many structural and enzymatic functions. For bacteria, at least three families of proteins function as magnesium transporters. In recent years, it has been shown that a subset of these transport proteins is regulated by magnesium-responsive genetic control elements. In this study, we investigated the cellular requirements for magnesium homeostasis in the model microorganism Bacillus subtilis. Putative magnesium transporter genes were mutationally disrupted, singly and in combination, in order to assess their general importance. Mutation of only one of these genes resulted in strong dependency on supplemental extracellular magnesium. Notably, this transporter gene, mgtE, is known to be under magnesium-responsive genetic regulatory control. This suggests that magnesium-responsive genetic control may generally mark the primary magnesium transport proteins in bacteria. To investigate whether B. subtilis encodes additional classes of transport mechanisms, suppressor strains that permitted the growth of a transporter-defective mutant were identified. Several of these strains were sequenced to determine the genetic basis of the suppressor phenotypes. None of these mutations occurred in transport protein homologues; instead, they affected housekeeping functions, such as signal recognition particle components and ATP synthase machinery. From these aggregate data, we speculate that the MgtE protein provides the primary route of magnesium import in B. subtilis and that the other putative transport proteins are likely to be utilized under more-specialized growth conditions. PMID:24415722

  4. Protein 3D Structure Computed from Evolutionary Sequence Variation

    PubMed Central

    Sheridan, Robert; Hopf, Thomas A.; Pagnani, Andrea; Zecchina, Riccardo; Sander, Chris

    2011-01-01

    The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein-coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7–4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org).
This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures, new strategies in protein and drug design, and the identification of functional genetic variants in normal and disease genomes. PMID:22163331

  5. Prediction of glycolipid-binding domains from the amino acid sequence of lipid raft-associated proteins: application to HpaA, a protein involved in the adhesion of Helicobacter pylori to gastrointestinal cells.

    PubMed

    Fantini, Jacques; Garmy, Nicolas; Yahi, Nouara

    2006-09-12

    Protein-glycolipid interactions mediate the attachment of various pathogens to the host cell surface as well as the association of numerous cellular proteins with lipid rafts. Thus, it is of primary importance to identify the protein domains involved in glycolipid recognition. Using structure similarity searches, we could identify a common glycolipid-binding domain in the three-dimensional structure of several proteins known to interact with lipid rafts. Yet the three-dimensional structure of most raft-targeted proteins is still unknown. In the present study, we have identified a glycolipid-binding domain in the amino acid sequence of a bacterial adhesin (Helicobacter pylori adhesin A, HpaA). The prediction was based on the major properties of the glycolipid-binding domains previously characterized by structural searches. A short (15-mer) peptide corresponding to this putative glycolipid-binding domain was synthesized, and we studied its interaction with glycolipid monolayers at the air-water interface. The synthetic HpaA peptide recognized LacCer but not Gb3. This glycolipid specificity was in line with that of the whole bacterium. Molecular modeling studies gave some insight into this high selectivity of interaction. They also suggested that Phe147 in HpaA played a key role in LacCer recognition, through sugar-aromatic CH-π stacking interactions with the hydrophobic side of the galactose ring of LacCer. Correspondingly, the replacement of Phe147 with Ala strongly affected LacCer recognition, whereas substitution with Trp did not. Our method could be used to identify glycolipid-binding domains in microbial and cellular proteins interacting with lipid shells, rafts, and other specialized membrane microdomains.

  6. The PRC2-binding long non-coding RNAs in human and mouse genomes are associated with predictive sequence features

    NASA Astrophysics Data System (ADS)

    Tu, Shiqi; Yuan, Guo-Cheng; Shao, Zhen

    2017-01-01

    Recently, long non-coding RNAs (lncRNAs) have emerged as an important class of molecules involved in many cellular processes. One of their primary functions is to shape the epigenetic landscape through interactions with chromatin-modifying proteins. However, the mechanisms contributing to the specificity of such interactions remain poorly understood. Here we took the human and mouse lncRNAs that were experimentally determined to interact physically with Polycomb repressive complex 2 (PRC2) and systematically investigated their sequence features by developing a new computational pipeline for sequence composition analysis, in which each sequence is considered as a series of transitions between adjacent nucleotides. Using this approach, we found PRC2-binding lncRNAs to be associated with a set of distinctive and evolutionarily conserved sequence features, which can be used to distinguish them from other lncRNAs with considerable accuracy. We further identified fragments of PRC2-binding lncRNAs that are enriched in these sequence features and found that they show strong PRC2-binding signals and are more highly conserved across species than the other parts, implying their functional importance.
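    The idea of treating each sequence as a series of transitions between adjacent nucleotides can be illustrated with a first-order transition-frequency table. The sketch below is a minimal illustration under our own assumptions (the function name `transition_frequencies`, the conversion of T to U, and the skipping of non-ACGU symbols are hypothetical choices); it is not the authors' pipeline, whose exact feature definitions the abstract does not give.

```python
from collections import Counter

NUCLEOTIDES = "ACGU"  # RNA alphabet; DNA-style T is converted to U first


def transition_frequencies(seq: str) -> dict[tuple[str, str], float]:
    """Estimate P(next nucleotide = b | current nucleotide = a) for all a, b.

    Adjacent pairs involving any symbol outside ACGU (e.g. N) are skipped.
    Rows with no observed transitions are left at 0.0.
    """
    seq = seq.upper().replace("T", "U")
    counts = Counter(
        (a, b) for a, b in zip(seq, seq[1:])
        if a in NUCLEOTIDES and b in NUCLEOTIDES
    )
    probs: dict[tuple[str, str], float] = {}
    for a in NUCLEOTIDES:
        row_total = sum(counts[(a, b)] for b in NUCLEOTIDES)
        for b in NUCLEOTIDES:
            probs[(a, b)] = counts[(a, b)] / row_total if row_total else 0.0
    return probs


freqs = transition_frequencies("ACGUAC")
print(freqs[("A", "C")])  # 1.0: both transitions leaving A go to C
```

    The 16 values of such a table form a fixed-length feature vector per sequence; which features and classifier the authors actually used to separate PRC2-binding lncRNAs is not stated in the abstract.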

  7. Nucleotide sequence of soybean chloroplast DNA regions which contain the psb A and trn H genes and cover the ends of the large single copy region and one end of the inverted repeats.

    PubMed

    Spielmann, A; Stutz, E

    1983-10-25

    The soybean chloroplast psb A gene (photosystem II thylakoid membrane protein of Mr 32 000, lysine-free) and the trn H gene (tRNAHisGUG), which both map in the large single copy region adjacent to one of the inverted repeat structures (IR1), have been sequenced including flanking regions. The psb A gene shows 92% sequence homology in its structural part with the corresponding genes of spinach and N. debneyi and also contains an open reading frame for 353 amino acids. The amino acid sequence of a potential primary translation product (calculated Mr 38 904, no lysine) diverges from that of spinach and N. debneyi in only two positions in the C-terminal part. The trn H gene has the same polarity as the psb A gene, and its coding region is located at the very end of the large single copy region. The deduced sequence of the soybean chloroplast tRNAHisGUG is identical to that of Zea mays chloroplasts. Both ends of the large single copy region were sequenced, including a small segment of the adjacent IR1 and IR2.

  8. Modeling repetitive, non‐globular proteins

    PubMed Central

    Basu, Koli; Campbell, Robert L.; Guo, Shuaiqi; Sun, Tianjun

    2016-01-01

    While ab initio modeling of protein structures is not routine, certain types of proteins are more straightforward to model than others. Proteins with short repetitive sequences typically exhibit repetitive structures. These repetitive sequences can be more amenable to modeling if some information is known about the predominant secondary structure or other key features of the protein sequence. We have successfully built models of a number of repetitive structures with novel folds using knowledge of the consensus sequence within the sequence repeat and an understanding of the likely secondary structures that these may adopt. Our methods for achieving this success are reviewed here. PMID:26914323

  9. Evaluating the protein coding potential of exonized transposable element sequences

    PubMed Central

    Piriyapongsa, Jittima; Rutledge, Mark T; Patel, Sanil; Borodovsky, Mark; Jordan, I King

    2007-01-01

    Background: Transposable element (TE) sequences, once thought to be merely selfish or parasitic members of the genomic community, have been shown to contribute a wide variety of functional sequences to their host genomes. Analyses of complete genome sequences have turned up numerous cases where TE sequences have been incorporated as exons into mRNAs, and it is widely assumed that such 'exonized' TEs encode protein sequences. However, the extent to which TE-derived sequences actually encode proteins is unknown and a matter of some controversy. We have tried to address this outstanding issue from two perspectives: (i) by evaluating ascertainment biases related to the search methods used to uncover TE-derived protein coding sequences (CDS) and (ii) through a probabilistic codon-frequency-based analysis of the protein coding potential of TE-derived exons. Results: We compared the ability of three classes of sequence similarity search methods to detect TE-derived sequences among data sets of experimentally characterized proteins: (1) a profile-based hidden Markov model (HMM) approach, (2) BLAST methods and (3) RepeatMasker. Profile-based methods are more sensitive and more selective than the other methods evaluated. However, the application of profile-based search methods to the detection of TE-derived sequences among well-curated experimentally characterized protein data sets did not turn up many more cases than had been previously detected, and nowhere near as many cases as recent genome-wide searches have. We observed that the different search methods used were complementary in the sense that they yielded largely non-overlapping sets of hits and differed in their ability to recover known cases of TE-derived CDS. The probabilistic analysis of TE-derived exon sequences indicates that these sequences have low protein coding potential on average. 
In particular, non-autonomous TEs that do not encode protein sequences, such as Alu elements, are frequently exonized but unlikely to encode protein sequences. Conclusion: The exaptation of the numerous TE sequences found in exons as bona fide protein coding sequences may prove to be far less common than has been suggested by the analysis of complete genomes. We hypothesize that many exonized TE sequences actually function as post-transcriptional regulators of gene expression, rather than as coding sequences, and that they may act through a variety of double-stranded-RNA-related regulatory pathways. Indeed, their relatively high copy numbers and similarity to sequences dispersed throughout the genome suggest that exonized TE sequences could serve as master regulators with a wide scope of regulatory influence. Reviewers: This article was reviewed by Itai Yanai, Kateryna D. Makova, Melissa Wilson (nominated by Kateryna D. Makova) and Cedric Feschotte (nominated by John M. Logsdon Jr.). PMID:18036258
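    The abstract does not spell out its probabilistic codon-frequency model. As a generic illustration only, a coding-versus-non-coding log-likelihood ratio over in-frame codons can be sketched as below. The function name `codon_log_odds`, the single reading frame, the `floor` value for unseen codons, and the caller-supplied frequency tables are all assumptions; this is not the authors' model, and the toy tables used in testing are not real codon usage data.

```python
import math


def codon_log_odds(seq: str, coding: dict[str, float], noncoding: dict[str, float],
                   floor: float = 1e-6) -> float:
    """Sum of log2(P_coding(codon) / P_noncoding(codon)) over in-frame codons.

    Hypothetical helper: only reading frame 1 is scored and a trailing
    partial codon is ignored. Codons missing from a table get probability
    `floor` instead of zero. Positive totals favor the coding model,
    negative totals the non-coding one.
    """
    seq = seq.upper().replace("U", "T")
    score = 0.0
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        p_coding = max(coding.get(codon, 0.0), floor)
        p_noncoding = max(noncoding.get(codon, 0.0), floor)
        score += math.log2(p_coding / p_noncoding)
    return score
```

    Because the total grows with sequence length, comparing exons of different sizes would call for some normalization, for example dividing by the number of codons scored.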

  10. Lack of Association of Mutations in Optineurin With Disease in Patients With Adult-onset Primary Open-angle Glaucoma

    PubMed Central

    Wiggs, Janey L.; Auguste, Josette; Allingham, R. Rand; Flor, Jason D.; Pericak-Vance, Margaret A.; Rogers, Kathryn; LaRocque, Karen R.; Graham, Felicia L.; Broomer, Bob; Del Bono, Elizabeth; Haines, Jonathan L.; Hauser, Michael

    2005-01-01

    Objective: To determine whether mutations in the optineurin gene contribute to susceptibility to adult-onset primary open-angle glaucoma. Methods: The optineurin gene was screened in 86 probands with adult-onset primary open-angle glaucoma and in 80 age-matched control subjects. Exons 4 and 5, containing the recurrent mutations identified in patients with normal-tension glaucoma, were sequenced in all individuals studied, while the remaining exons were screened for DNA sequence variants with denaturing high-performance liquid chromatography. Results: The recurrent mutation, Met98Lys, previously found to be associated with an increased risk of disease was found in 8 (9%) of 86 probands. We also found the Met98Lys mutation in 10% of individuals from a control population of similar age, sex, and ethnicity. Consistent segregation of the mutation with the disease was not demonstrated in any of the 8 families. No other DNA changes altering the amino acid sequence of the protein were found. Conclusion: The mutations in the optineurin gene associated with normal-tension glaucoma are not associated with adult-onset primary open-angle glaucoma in this patient population. Clinical Relevance: Genetic abnormalities that render the optic nerve susceptible to degeneration are excellent candidates for genetic factors that could contribute to adult-onset primary open-angle glaucoma. Mutations in optineurin have been associated with normal-tension glaucoma, but are not associated with disease in patients with adult-onset primary open-angle glaucoma. This result may indicate that normal-tension glaucoma is not necessarily part of the phenotypic spectrum of adult open-angle glaucoma. PMID:12912697

  11. Nucleotide sequence of Hungarian grapevine chrome mosaic nepovirus RNA1.

    PubMed Central

    Le Gall, O; Candresse, T; Brault, V; Dunez, J

    1989-01-01

    The nucleotide sequence of the RNA1 of Hungarian grapevine chrome mosaic virus, a nepovirus very closely related to tomato black ring virus, has been determined from cDNA clones. It is 7212 nucleotides in length excluding the 3' terminal poly(A) tail and contains a large open reading frame extending from nucleotides 216 to 6971. The putative encoded polyprotein is 2252 amino acids in length with a molecular weight of 250 kDa. The primary structure of the polyprotein was compared with that of other viral polyproteins, revealing the same general genetic organization as that of other picorna-like viruses (comoviruses, potyviruses and picornaviruses), except that an additional protein is suspected to occupy the N-terminus of the polyprotein. PMID:2798128

  12. On the Role of Aggregation Prone Regions in Protein Evolution, Stability, and Enzymatic Catalysis: Insights from Diverse Analyses

    PubMed Central

    Buck, Patrick M.; Kumar, Sandeep; Singh, Satish K.

    2013-01-01

    The various roles that aggregation prone regions (APRs) are capable of playing in proteins are investigated here via comprehensive analyses of multiple non-redundant datasets containing randomly generated amino acid sequences, monomeric proteins, intrinsically disordered proteins (IDPs) and catalytic residues. Results from this study indicate that the aggregation propensities of monomeric protein sequences have been minimized compared to random sequences with uniform and natural amino acid compositions, as observed by a lower average aggregation propensity and fewer APRs that are shorter in length and more often punctuated by gate-keeper residues. However, evidence for evolutionary selective pressure to disrupt these sequence regions among homologous proteins is inconsistent. APRs are less conserved than average sequence identity among closely related homologues (≥80% sequence identity with a parent), but APRs are more conserved than average sequence identity among homologues that have at least 50% sequence identity with a parent. Structural analyses of APRs indicate that APRs are three times more likely to contain ordered than disordered residues and that APRs frequently contribute more towards stabilizing proteins than equal-length segments from the same protein. Catalytic residues and APRs were also found to be in structural contact significantly more often than expected by random chance. Our findings suggest that proteins have evolved by optimizing their risk of aggregation for cellular environments, both by minimizing aggregation prone regions and by conserving those that are important for folding and function. In many cases, these sequence optimizations are insufficient to develop recombinant proteins into commercial products. Rational design strategies aimed at improving protein solubility for biotechnological purposes should carefully evaluate the contributions made by candidate APRs, targeted for disruption, towards protein structure and activity. 
PMID:24146608

  13. ProfileGrids: a sequence alignment visualization paradigm that avoids the limitations of Sequence Logos

    PubMed Central

    2014-01-01

    Background: The 2013 BioVis Contest provided an opportunity to evaluate different paradigms for visualizing protein multiple sequence alignments. Such data sets are becoming extremely large and thus taxing current visualization paradigms. Sequence Logos represent consensus sequences but have limitations for protein alignments. As an alternative, ProfileGrids are a new protein sequence alignment visualization paradigm that represents an alignment as a color-coded matrix of the residue frequency occurring at every homologous position in the aligned protein family. Results: The JProfileGrid software program was used to analyze the BioVis contest data sets to generate figures for comparison with the Sequence Logo reference images. Conclusions: The ProfileGrid representation allows for the clear and effective analysis of protein multiple sequence alignments. This includes both a general overview of conservation and diversity patterns in the sequences and the interactive ability to query the details of the protein residue distributions in the alignment. The JProfileGrid software is free and available from http://www.ProfileGrid.org. PMID:25237393
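    The matrix at the heart of a ProfileGrid, residue frequency per aligned position, can be computed in a few lines. This is a minimal sketch under our own assumptions (hypothetical function name `profile_grid`, rows ordered by the 20 standard amino acids plus a gap row, symbols such as X ignored); it is not code from JProfileGrid, and it leaves out the color mapping and interactivity.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def profile_grid(alignment: list[str], alphabet: str = AMINO_ACIDS + "-") -> list[list[float]]:
    """Frequency matrix: one row per symbol in `alphabet`, one column per alignment position."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    row_of = {symbol: i for i, symbol in enumerate(alphabet)}
    grid = [[0.0] * length for _ in alphabet]
    for col in range(length):
        counts = Counter(seq[col].upper() for seq in alignment)
        for symbol, n in counts.items():
            if symbol in row_of:  # symbols outside the alphabet (e.g. "X") are ignored
                grid[row_of[symbol]][col] = n / len(alignment)
    return grid
```

    Shading each cell by its value yields a color-coded grid; unlike a Sequence Logo, every residue keeps its own fixed row, so low-frequency residues remain readable.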

  14. SIMAP--a comprehensive database of pre-calculated protein sequence similarities, domains, annotations and clusters.

    PubMed

    Rattei, Thomas; Tischler, Patrick; Götz, Stefan; Jehl, Marc-André; Hoser, Jonathan; Arnold, Roland; Conesa, Ana; Mewes, Hans-Werner

    2010-01-01

    Large-scale sequence comparison remains the most powerful tool in sequence analysis for predicting protein function and reconstructing evolutionary history. Owing to the exponential growth in the number of known protein sequences, and the resulting quadratic growth of the similarity matrix, computing the Similarity Matrix of Proteins (SIMAP) is a computationally intensive task. The SIMAP database provides a comprehensive and up-to-date pre-calculation of the protein sequence similarity matrix, sequence-based features and sequence clusters. As of September 2009, SIMAP covers 48 million proteins and more than 23 million non-redundant sequences. Novel features of SIMAP include the expansion of the sequence space by including databases such as ENSEMBL as well as the integration of metagenomes based on their consistent processing and annotation. Furthermore, protein function predictions by Blast2GO are pre-calculated for all sequences in SIMAP, and the data access and query functions have been improved. SIMAP helps biologists query the up-to-date sequence space systematically and facilitates large-scale downstream projects in computational biology. Access to SIMAP is freely provided through the web portal for individuals (http://mips.gsf.de/simap/) and, for programmatic access, through DAS (http://webclu.bio.wzw.tum.de/das/) and a Web Service (http://mips.gsf.de/webservices/services/SimapService2.0?wsdl).
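    The quadratic growth of the similarity matrix follows from simple counting: n sequences give n(n - 1)/2 unordered pairs. The helper names below are hypothetical, and the update function assumes that previously computed pairs are stored and reused; we do not claim this describes SIMAP's actual update procedure.

```python
def all_against_all_pairs(n: int) -> int:
    """Unordered pairs among n sequences, n * (n - 1) / 2: quadratic in n."""
    return n * (n - 1) // 2


def pairs_for_update(existing: int, added: int) -> int:
    """Pairs still to compute when `added` sequences join `existing` ones,
    assuming (hypothetically) that all existing-vs-existing results are reused."""
    return all_against_all_pairs(existing + added) - all_against_all_pairs(existing)


print(all_against_all_pairs(23_000_000))  # 264499988500000, about 2.6e14
```

    For the 23 million non-redundant sequences reported for September 2009, this gives roughly 2.6 × 10^14 candidate pairs; how many of them a database actually stores depends on similarity cutoffs that the abstract does not state.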

  15. Use of designed sequences in protein structure recognition.

    PubMed

    Kumar, Gayatri; Mudgal, Richa; Srinivasan, Narayanaswamy; Sandhya, Sankaran

    2018-05-09

    Knowledge of protein structure is a prerequisite for improved understanding of molecular function. The gap between sequence and structure space has widened in the post-genomic era. Grouping related protein sequences into families can help narrow the gap. In the Pfam database, structure descriptions are provided for parts or full lengths of proteins in 7726 families. For the remaining 52% of the families, information on 3-D structure is not yet available. We use computationally designed sequences that are intermediately related to two protein domain families already known to share the same fold. These strategically designed sequences enable detection of distant relationships, and here we have employed them for structure recognition of protein families of yet unknown structure. We first measured the success rate of our approach using a dataset of protein families of known fold and achieved a success rate of 88%. Next, for 1392 families of yet unknown structure, we made structural assignments for parts or full lengths of the proteins. Fold associations for 423 domains of unknown function (DUFs) are provided as a step towards functional annotation. The results indicate that knowledge-based filling of gaps in protein sequence space is a promising approach for structure recognition. Such sequences assist in traversal through protein sequence space and effectively function as 'linkers' where natural linkers between distant proteins are unavailable. This article was reviewed by Oliviero Carugo, Christine Orengo and Srikrishna Subramanian.

  16. Increasing Sequence Diversity with Flexible Backbone Protein Design: The Complete Redesign of a Protein Hydrophobic Core

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Murphy, Grant S.; Mills, Jeffrey L.; Miley, Michael J.

    2015-10-15

    Protein design tests our understanding of protein stability and structure. Successful design methods should allow the exploration of sequence space not found in nature. However, when redesigning naturally occurring protein structures, most fixed-backbone design algorithms return amino acid sequences that share strong sequence identity with wild-type sequences, especially in the protein core. This behavior restricts the functional space that can be explored and is not consistent with observations from nature, where sequences of low identity have similar structures. Here, we allow backbone flexibility during design to mutate every position in the core (38 residues) of a four-helix bundle protein. Only small perturbations to the backbone, 1–2 Å, were needed to entirely mutate the core. The redesigned protein, DRNN, is exceptionally stable (melting point > 140 °C). NMR and X-ray crystal structures show that the side chains and backbone were accurately modeled (all-atom RMSD = 1.3 Å).

  17. Conservation of Shannon's redundancy for proteins. [information theory applied to amino acid sequences]

    NASA Technical Reports Server (NTRS)

    Gatlin, L. L.

    1974-01-01

    Concepts of information theory are applied to examine the redundancy of various proteins from natural sources such as animals and plants. The Monte Carlo method is used to derive information parameters for random protein sequences. Real protein sequence parameters are compared with the standard parameters for protein sequences of a given length. The tendency of a chain to contain some amino acids more frequently than others, and its tendency to contain certain amino acid pairs more frequently than other pairs, are used as measures of randomness for individual protein sequences. Non-periodic proteins are generally found to have random Shannon redundancies except where constrained by short chain length and the genetic code. Redundant characteristics of highly periodic proteins are discussed. A degree-of-periodicity parameter is derived.
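    The two randomness measures described above (uneven single-residue frequencies and uneven neighboring-pair frequencies) correspond, we assume, to the first- and second-order divergences usually written D1 and D2 in information-theoretic treatments of sequences: D1 = log2(20) - H1 and D2 = H1 - HM, where H1 is the single-residue entropy and HM the conditional entropy of a residue given its predecessor. The sketch below estimates both from one sequence with plain frequency counts; the function name is hypothetical, and the report's own estimators and Monte Carlo reference values are not reproduced.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def redundancy_measures(seq: str, alphabet: str = AMINO_ACIDS) -> tuple[float, float]:
    """Return (D1, D2) in bits, estimated from one sequence by frequency counts.

    D1 = log2(|alphabet|) - H1: divergence of residue frequencies from equiprobability.
    D2 = H1 - HM: divergence of neighboring-pair statistics from independence,
    where HM is the conditional entropy of a residue given the one before it.
    """
    residues = [c for c in seq.upper() if c in alphabet]
    if len(residues) < 2:
        raise ValueError("need at least two residues from the alphabet")
    n = len(residues)
    h1 = -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())
    pairs = Counter(zip(residues, residues[1:]))
    firsts = Counter(residues[:-1])  # how often each residue starts a pair
    m = n - 1
    # HM = -sum over pairs of p(a, b) * log2 p(b | a)
    h_markov = -sum((k / m) * math.log2(k / firsts[a]) for (a, _), k in pairs.items())
    d1 = math.log2(len(alphabet)) - h1
    d2 = h1 - h_markov
    return d1, d2
```

    To mimic a random-sequence baseline, one could compute the same measures for many shuffled copies of a sequence of the same length; this is our assumption about a reasonable comparison, not necessarily the report's procedure.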

  18. The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families.

    PubMed

    Yooseph, Shibu; Sutton, Granger; Rusch, Douglas B; Halpern, Aaron L; Williamson, Shannon J; Remington, Karin; Eisen, Jonathan A; Heidelberg, Karla B; Manning, Gerard; Li, Weizhong; Jaroszewski, Lukasz; Cieplak, Piotr; Miller, Christopher S; Li, Huiying; Mashiyama, Susan T; Joachimiak, Marcin P; van Belle, Christopher; Chandonia, John-Marc; Soergel, David A; Zhai, Yufeng; Natarajan, Kannan; Lee, Shaun; Raphael, Benjamin J; Bafna, Vineet; Friedman, Robert; Brenner, Steven E; Godzik, Adam; Eisenberg, David; Dixon, Jack E; Taylor, Susan S; Strausberg, Robert L; Frazier, Marvin; Venter, J Craig

    2007-03-01

    Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting a poor sampling of viral diversity until now. Protein domain distributions in the GOS dataset and current protein databases show distinct biases. Several protein domains that were previously categorized as kingdom specific are shown to have GOS examples in other kingdoms. About 6,000 sequences (ORFans) from the literature that heretofore lacked similarity to known proteins have matches in the GOS data. The GOS dataset is also used to improve remote homology detection. Overall, besides nearly doubling the number of current proteins, the predicted GOS proteins also add a great deal of diversity to known protein families and shed light on their evolution. These observations are illustrated using several protein families, including phosphatases, proteases, ultraviolet-irradiation DNA damage repair enzymes, glutamine synthetase, and RuBisCO. The diversity added by GOS data has implications for choosing targets for experimental structure characterization as part of structural genomics efforts. Our analysis indicates that new families are being discovered at a rate that is linear or almost linear with the addition of new sequences, implying that we are still far from discovering all protein families in nature.

  19. Can natural proteins designed with 'inverted' peptide sequences adopt native-like protein folds?

    PubMed

    Sridhar, Settu; Guruprasad, Kunchur

    2014-01-01

    We have carried out a systematic computational analysis on a representative dataset of proteins of known three-dimensional structure, in order to evaluate whether it would be possible to 'swap' certain short peptide sequences in naturally occurring proteins with their corresponding 'inverted' peptides and generate 'artificial' proteins that are predicted to retain a native-like fold. The analysis of 3,967 representative proteins from the Protein Data Bank revealed 102,677 unique identical inverted peptide sequence pairs that vary in sequence length between 5-12 and 18 amino acid residues. Our analysis illustrates with examples that such 'artificial' proteins may be generated by identifying peptides with 'similar structural environment' and by using comparative protein modeling and validation studies. Our analysis suggests that natural proteins may be tolerant to accommodating such peptides.
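    Finding candidate 'inverted' peptide pairs amounts to collecting every length-k peptide in a dataset and checking whether its reverse also occurs. This is a minimal sketch under our own assumptions (exact string matching, a single fixed k, palindromic peptides excluded, hypothetical function name `inverted_peptide_pairs`, toy input sequences); the structural-environment comparison, comparative modeling and validation steps described in the abstract are not covered.

```python
def inverted_peptide_pairs(sequences: dict[str, str], k: int) -> set[tuple[str, str]]:
    """Return pairs (p, reverse(p)) of length-k peptides that both occur in the dataset.

    Each pair is stored once, in alphabetical order; palindromic peptides
    (p == reverse(p)) are skipped because they are their own inversion.
    """
    if k < 1:
        raise ValueError("k must be positive")
    seen: set[str] = set()
    for seq in sequences.values():
        for i in range(len(seq) - k + 1):
            seen.add(seq[i:i + k])
    pairs: set[tuple[str, str]] = set()
    for peptide in seen:
        inverted = peptide[::-1]
        if inverted != peptide and inverted in seen:
            first, second = sorted((peptide, inverted))
            pairs.add((first, second))
    return pairs


demo = {"protein_x": "MKTAYIAK", "protein_y": "GGYATKGG"}  # toy sequences
print(inverted_peptide_pairs(demo, 4))  # {('KTAY', 'YATK')}
```

    Keeping peptides in a set makes each reverse lookup a constant-time membership test, so the scan stays linear in the total sequence length for a given k; recording positions as well would be needed to locate each peptide in its structure.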

  20. N-Terminal Amino Acid Sequence Determination of Proteins by N-Terminal Dimethyl Labeling: Pitfalls and Advantages When Compared with Edman Degradation Sequence Analysis.

    PubMed

    Chang, Elizabeth; Pourmal, Sergei; Zhou, Chun; Kumar, Rupesh; Teplova, Marianna; Pavletich, Nikola P; Marians, Kenneth J; Erdjument-Bromage, Hediye

    2016-07-01

    In recent years, alternative approaches to Edman sequencing have been investigated, and to this end, the Association of Biomolecular Resource Facilities (ABRF) Protein Sequencing Research Group (PSRG) initiated studies in 2014 and 2015 looking into bottom-up and top-down N-terminal (Nt) dimethyl derivatization of standard quantities of intact proteins, with the aim of determining Nt sequence information. We have expanded this initiative and used low picomole amounts of myoglobin to determine the efficiency of Nt-dimethylation. Application of this approach to protein domains, generated by limited proteolysis of overexpressed proteins, confirms that it is a universal labeling technique and is very sensitive when compared with Edman sequencing. Finally, we compared Edman sequencing and Nt-dimethylation of the same polypeptide fragments; the results confirm that the two methods agree on the identity of the Nt amino acid sequence.

  1. Adenovirus E1A and E1B-19K Proteins Protect Human Hepatoma Cells from Transforming Growth Factor β1-induced Apoptosis

    PubMed Central

    Tarakanova, Vera L.; Wold, William S. M.

    2009-01-01

    Primary and some transformed hepatocytes undergo apoptosis in response to transforming growth factor β1 (TGFβ). We report that infection with species C human adenovirus conferred resistance to TGFβ-induced apoptosis in human hepatocellular carcinoma cells (Huh-7). Protection against TGFβ-mediated cell death in adenovirus-infected cells correlated with the maintenance of normal nuclear morphology, lack of pro-caspases 8 and 3 processing, maintenance of the mitochondrial membrane potential, and lack of cellular DNA degradation. The TGFβ pro-apoptotic signaling pathway was blocked upstream of mitochondria in adenovirus-infected cells. Both the N-terminal sequences of the E1A proteins and the E1B-19K protein were necessary to protect infected cells against TGFβ-induced apoptosis. PMID:19854227

  2. De novo protein sequencing by combining top-down and bottom-up tandem mass spectra.

    PubMed

    Liu, Xiaowen; Dekker, Lennard J M; Wu, Si; Vanduijn, Martijn M; Luider, Theo M; Tolić, Nikola; Kou, Qiang; Dvorkin, Mikhail; Alexandrova, Sonya; Vyatkina, Kira; Paša-Tolić, Ljiljana; Pevzner, Pavel A

    2014-07-03

    There are two approaches for de novo protein sequencing: Edman degradation and mass spectrometry (MS). Existing MS-based methods characterize a novel protein by assembling tandem mass spectra of overlapping peptides generated from multiple proteolytic digestions of the protein. Because each tandem mass spectrum covers only a short peptide of the target protein, the key to high-coverage protein sequencing is to find spectral pairs from overlapping peptides in order to assemble tandem mass spectra into longer ones. However, the overlapping regions of peptides may be too short to be confidently identified. High-resolution mass spectrometers have become accessible to many laboratories. These mass spectrometers are capable of analyzing molecules of large mass, boosting the development of top-down MS. Top-down tandem mass spectra cover whole proteins. However, top-down tandem mass spectra, even combined, rarely provide full ion fragmentation coverage of a protein. We propose an algorithm, TBNovo, for de novo protein sequencing by combining top-down and bottom-up MS. In TBNovo, a top-down tandem mass spectrum is used as a scaffold, and bottom-up tandem mass spectra are aligned to the scaffold to increase sequence coverage. Experiments on data sets of two proteins showed that TBNovo achieved high sequence coverage and high sequence accuracy.

  3. Complex alternative splicing of acetylcholinesterase transcripts in Torpedo electric organ; primary structure of the precursor of the glycolipid-anchored dimeric form.

    PubMed Central

    Sikorav, J L; Duval, N; Anselmet, A; Bon, S; Krejci, E; Legay, C; Osterlund, M; Reimund, B; Massoulié, J

    1988-01-01

    In this paper, we show the existence of alternative splicing in the 3' region of the coding sequence of Torpedo acetylcholinesterase (AChE). We describe two cDNA structures which both diverge from the previously described coding sequence of the catalytic subunit of asymmetric (A) forms (Schumacher et al., 1986; Sikorav et al., 1987). They both contain a coding sequence followed by a non-coding sequence and a poly(A) stretch. Both of these structures were shown to exist in poly(A)+ RNAs, by S1 mapping experiments. The divergent region encoded by the first sequence corresponds to the precursor of the globular dimeric form (G2a), since it contains the expected C-terminal amino acids, Ala-Cys. These amino acids are followed by a 29 amino acid extension which contains a hydrophobic segment and must be replaced by a glycolipid in the mature protein. Analyses of intact G2a AChE showed that the common domain of the protein contains intersubunit disulphide bonds. The divergent region of the second type of cDNA consists of an adjacent genomic sequence, which is removed as an intron in A and Ga mRNAs, but may encode a distinct, less abundant catalytic subunit. The structures of the cDNA clones indicate that they are derived from minor mRNAs, shorter than the three major transcripts which have been described previously (14.5, 10.5 and 5.5 kb). Oligonucleotide probes specific for the asymmetric and globular terminal regions hybridize with the three major transcripts, indicating that their size is determined by 3'-untranslated regions which are not related to the differential splicing leading to A and Ga forms. PMID:3181125

  4. De novo assembly and comparative transcriptome analysis of the foot from Chinese green mussel (Perna viridis) in response to cadmium stimulation

    PubMed Central

    You, Xinxin; Wang, Jintu; Chen, Jieming; Peng, Chao; Shi, Qiong

    2017-01-01

    The Chinese green mussel, Perna viridis, is a marine bivalve of considerable economic value that also serves as a biomonitor of aquatic pollution. Byssus, secreted by the foot gland, has been shown to bind heavy metals effectively. In this study, using RNA sequencing technology, we performed a comparative transcriptomic analysis of mussel feet with or without cadmium (Cd) induction. This work aims to provide insights into the molecular mechanisms by which byssus binds heavy metal ions. Transcriptome sequencing generated a total of 26.13 Gb of raw data. After careful assembly of the clean data, we obtained a primary set of 105,127 unigenes, of which 32,268 were annotated. Based on the expression profiles, we identified 9,048 differentially expressed genes (DEGs) between Cd treatment (50 or 100 μg/L) at 48 h and the control, suggesting an extensive transcriptome response of the mussels during Cd stimulation. Moreover, we observed that the expression levels of 54 byssus protein coding genes increased significantly after the 48-h Cd stimulation. In addition, 16 critical byssus protein coding genes were selected for profiling by quantitative real-time PCR (qRT-PCR). Finally, we reached a preliminary conclusion that a high content of tyrosine (Tyr), cysteine (Cys), and histidine (His) residues, or a special motif, plays an important role in the accumulation of heavy metals in byssus. We also propose a model for the confirmed byssal Cd accumulation, in which the biosynthesis of byssus proteins may simultaneously play critical roles, since their transcription levels were significantly elevated. PMID:28520756

  5. Bone Matrix Proteins: Isolation and Characterization of a Novel Cell-binding Keratan Sulfate Proteoglycan (Osteoadherin) from Bovine Bone

    PubMed Central

    Wendel, Mikael; Sommarin, Yngve; Heinegård, Dick

    1998-01-01

    A small cell-binding proteoglycan for which we propose the name osteoadherin was extracted from bovine bone with guanidine hydrochloride–containing EDTA. It was purified to homogeneity using a combination of ion-exchange chromatography, hydroxyapatite chromatography, and gel filtration. The Mr of the proteoglycan was 85,000 as determined by SDS-PAGE. The protein is rich in aspartic acid, glutamic acid, and leucine. Two internal octapeptides from the proteoglycan contained the sequences Glu-Ile-Asn-Leu-Ser-His-Asn-Lys and Arg-Asp-Leu-Tyr-Phe-Asn-Lys-Ile. These sequences are not previously described, and support the notion that osteoadherin belongs to the family of leucine-rich repeat proteins. A monospecific antiserum was raised in rabbits. An enzyme-linked immunosorbent assay was developed, and showed the osteoadherin content of bone extracts to be 0.4 mg/g of tissue wet weight, whereas none was found in extracts of various other bovine tissues. Metabolic labeling of primary bovine osteoblasts followed by immunoprecipitation showed the cells to synthesize and secrete the proteoglycan. Digesting the immunoprecipitated osteoadherin with N-glycosidase reduced its apparent size to 47 kD, thus showing the presence of several N-linked oligosaccharides. Digestion with keratanase indicated some of the oligosaccharides to be extended to keratan sulfate chains. In immunohistochemical studies of the bovine fetal rib growth plate, osteoadherin was exclusively identified in the primary bone spongiosa. Osteoadherin binds to hydroxyapatite. A potential function of this proteoglycan is to bind cells, since we showed it to be as efficient as fibronectin in promoting osteoblast attachment in vitro. The binding appears to be mediated by the integrin αvβ3, since this was the only integrin isolated by osteoadherin affinity chromatography of surface-iodinated osteoblast extracts. PMID:9566981

  6. PhDAHP1 is required for floral volatile benzenoid/phenylpropanoid biosynthesis in Petunia × hybrida cv 'Mitchell Diploid'.

    PubMed

    Langer, Kelly M; Jones, Correy R; Jaworski, Elizabeth A; Rushing, Gabrielle V; Kim, Joo Young; Clark, David G; Colquhoun, Thomas A

    2014-07-01

    Floral volatile benzenoid/phenylpropanoid (FVBP) biosynthesis consists of numerous enzymatic and regulatory processes. The initial enzymatic step bridging primary metabolism to secondary metabolism is the condensation of phosphoenolpyruvate (PEP) and erythrose-4-phosphate (E4P), carried out by 3-deoxy-D-arabino-heptulosonate-7-phosphate (DAHP) synthase. Here, two DAHP synthases from the model plant species Petunia × hybrida cv 'Mitchell Diploid' (MD) were identified, cloned, localized, and functionally characterized. Full-length transcript sequences for PhDAHP1 and PhDAHP2 were identified and cloned using cDNA SMART libraries constructed from pooled MD corolla and leaf total RNA. The predicted amino acid sequences of the PhDAHP1 and PhDAHP2 proteins were 76% and 80% identical to AtDAHP1 and AtDAHP2 from Arabidopsis, respectively. PhDAHP1 transcript accumulated to its highest relative levels in petal limb and tube tissues, while PhDAHP2 transcript accumulated to its highest levels in leaf and stem tissues. During floral development, PhDAHP1 transcript accumulated to its highest levels at open-flower stages, whereas PhDAHP2 transcript remained constitutive throughout. Radiolabeled PhDAHP1 and PhDAHP2 proteins localized to plastids; however, PhDAHP2 localization appeared less efficient. PhDAHP1 RNAi knockdown petunia lines showed reduced total FVBP emission compared with MD, while PhDAHP2 RNAi lines emitted wild-type FVBP levels. These results demonstrate that PhDAHP1 is the principal DAHP synthase responsible for coupling metabolites from primary metabolism to secondary metabolism, and for the ultimate biosynthesis of FVBPs in the MD flower. Copyright © 2014 Elsevier Ltd. All rights reserved.
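
    Identity figures such as 76% and 80% depend on how the sequences were aligned and how gaps were counted, and the abstract specifies neither. A minimal sketch for two sequences that are already aligned, assuming gapped columns are excluded from the denominator (one of several conventions in use), might look like this; the function name and toy sequences are illustrative only.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where either sequence has a gap are left out of the comparison.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != gap and y != gap]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)
```

    For the toy alignment `MKV-LA` / `MRVQLA`, four of the five ungapped columns match, giving 80.0. Counting gapped columns as mismatches instead would give a lower value, which is why the convention should always be reported alongside the number.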

  7. A Functional-Phylogenetic Classification System for Transmembrane Solute Transporters

    PubMed Central

    Saier, Milton H.

    2000-01-01

    A comprehensive classification system for transmembrane molecular transporters has been developed and recently approved by the transport panel of the nomenclature committee of the International Union of Biochemistry and Molecular Biology. This system is based on (i) transporter class and subclass (mode of transport and energy coupling mechanism), (ii) protein phylogenetic family and subfamily, and (iii) substrate specificity. Almost all of the more than 250 identified families of transporters include members that function exclusively in transport. Channels (115 families), secondary active transporters (uniporters, symporters, and antiporters) (78 families), primary active transporters (23 families), group translocators (6 families), and transport proteins of ill-defined function or of unknown mechanism (51 families) constitute distinct categories. Transport mode and energy coupling prove to be relatively immutable characteristics and therefore provide primary bases for classification. Phylogenetic grouping reflects structure, function, mechanism, and often substrate specificity and therefore provides a reliable secondary basis for classification. Substrate specificity and polarity of transport prove to be more readily altered during evolutionary history and therefore provide a tertiary basis for classification. With very few exceptions, a phylogenetic family of transporters includes members that function by a single transport mode and energy coupling mechanism, although a variety of substrates may be transported, sometimes with either inwardly or outwardly directed polarity. In this review, I provide cross-referencing of well-characterized constituent transporters according to (i) transport mode, (ii) energy coupling mechanism, (iii) phylogenetic grouping, and (iv) substrates transported. The structural features and distribution of recognized family members throughout the living world are also evaluated. The tabulations should facilitate familial and functional assignments of newly sequenced transport proteins that will result from future genome sequencing projects. PMID:10839820
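
    The hierarchy described in this abstract (class and subclass, then phylogenetic family and subfamily, then substrate specificity) is what a dotted Transporter Classification (TC) identifier encodes. The sketch below assumes the five-field format used by TCDB (for example `2.A.1.1.1`) and the class digits listed in the comment; both reflect our understanding of the TC system rather than anything stated in this abstract, so check them against TCDB before relying on them. The identifiers used in the test are arbitrary examples, not specific transporters.

```python
import re
from dataclasses import dataclass

# Top-level classes matching the categories named in the abstract. The digit
# assignments are assumed from the TC system as we understand it (verify in TCDB).
TC_CLASSES = {
    1: "channels",
    2: "secondary active transporters",
    3: "primary active transporters",
    4: "group translocators",
    9: "ill-defined function or unknown mechanism",
}

TC_PATTERN = re.compile(r"^(\d+)\.([A-Z])\.(\d+)\.(\d+)\.(\d+)$")


@dataclass(frozen=True)
class TCNumber:
    tc_class: int     # mode of transport / energy coupling
    subclass: str     # finer subdivision of the class
    family: int       # phylogenetic family
    subfamily: int    # phylogenetic subfamily
    transporter: int  # individual transporter (assumed to resolve substrate specificity)

    @property
    def class_name(self) -> str:
        return TC_CLASSES.get(self.tc_class, "other")

    def same_family(self, other: "TCNumber") -> bool:
        """True when both identifiers share class, subclass and family."""
        return (self.tc_class, self.subclass, self.family) == (
            other.tc_class, other.subclass, other.family)


def parse_tc(text: str) -> TCNumber:
    """Parse a five-field dotted TC identifier such as '2.A.1.1.1'."""
    m = TC_PATTERN.match(text.strip())
    if m is None:
        raise ValueError(f"not a five-field TC identifier: {text!r}")
    c, sub, fam, subfam, tr = m.groups()
    return TCNumber(int(c), sub, int(fam), int(subfam), int(tr))
```

    Comparing only the first three fields mirrors the abstract's point that members of one phylogenetic family almost always share a transport mode and energy coupling mechanism while differing in substrate.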

  8. The length but not the sequence of peptide linker modules exerts the primary influence on the conformations of protein domains in cellulosome multi-enzyme complexes.

    PubMed

    Różycki, Bartosz; Cazade, Pierre-André; O'Mahony, Shane; Thompson, Damien; Cieplak, Marek

    2017-08-16

    Cellulosomes are large multi-protein catalysts produced by various anaerobic microorganisms to efficiently degrade plant cell-wall polysaccharides into simple sugars. X-ray and physicochemical structural characterisations show that cellulosomes are composed of numerous protein domains connected by unstructured polypeptide segments, yet the properties and possible roles of these 'linker' peptides are largely unknown. We have performed coarse-grained and all-atom molecular dynamics computer simulations of a number of cellulosomal linkers of different lengths and compositions. Our data demonstrate that the effective stiffness of the linker peptides, as quantified by the equilibrium fluctuations in the end-to-end distances, depends primarily on the length of the linker and less so on the specific amino acid sequence. The excluded volume provided by the connected domains dampens the motion of the linker residues and reduces the effective stiffness of the linkers. Simultaneously, the presence of the linkers alters the conformations of the connected protein domains. We demonstrate that short, stiff linkers induce significant rearrangements in the folded domains of the mini-cellulosome composed of endoglucanase Cel8A in complex with scaffoldin ScafT (Cel8A-ScafT) of Clostridium thermocellum, as well as in a two-cohesin system derived from the scaffoldin ScaB of Acetivibrio cellulolyticus. We give experimentally testable predictions of structural changes in protein domains that depend on linker length.
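
    The abstract quantifies linker stiffness through equilibrium fluctuations of the end-to-end distance. A minimal sketch, assuming the coordinates of the two terminal residues have already been extracted frame by frame from a trajectory (in nm), computes the mean and variance of that distance. Converting the variance into an effective spring constant with the harmonic equipartition relation k_eff = k_B T / var(R) is our assumption and may differ from the authors' exact estimator; the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]

# Molar Boltzmann constant (gas constant) in kJ/(mol*K).
KB_KJ_PER_MOL_K = 0.0083144626


def end_to_end_stats(first: Sequence[Vec3], last: Sequence[Vec3]) -> Tuple[float, float]:
    """Mean and population variance of the end-to-end distance over frames.

    first[i] and last[i] are the coordinates of the two terminal residues in frame i.
    """
    if len(first) != len(last) or len(first) < 2:
        raise ValueError("need matching coordinates for at least two frames")
    dists = [math.dist(a, b) for a, b in zip(first, last)]
    mean = sum(dists) / len(dists)
    var = sum((d - mean) ** 2 for d in dists) / len(dists)
    return mean, var


def effective_spring_constant(variance_nm2: float, temperature_k: float = 300.0) -> float:
    """k_eff = k_B T / var(R), assuming a harmonic end-to-end potential.

    Returns kJ/(mol*nm^2) when the variance is given in nm^2.
    """
    if variance_nm2 <= 0:
        raise ValueError("variance must be positive")
    return KB_KJ_PER_MOL_K * temperature_k / variance_nm2
```

    Under this assumption, a larger variance of the end-to-end distance translates directly into a smaller effective spring constant, that is, a softer linker.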

  9. Pfarao: a web application for protein family analysis customized for cytoskeletal and motor proteins (CyMoBase)

    PubMed Central

    Odronitz, Florian; Kollmar, Martin

    2006-01-01

    Background Annotation of protein sequences of eukaryotic organisms is crucial for understanding their function in the cell. Manual annotation is still by far the most accurate way to predict genes. The classification of protein sequences, their phylogenetic relationships, and the assignment of function involve information from various sources. This often leads to a collection of heterogeneous data that is hard to track. Cytoskeletal and motor proteins form large and diverse superfamilies comprising up to several dozen members per organism. To date, no integrated tool is available to assist in the manual large-scale comparative genomic analysis of protein families. Description Pfarao (Protein Family Application for Retrieval, Analysis and Organisation) is a database-driven online working environment for the analysis of manually annotated protein sequences and their relationships. Currently, the system can store and interrelate a wide range of information about protein sequences, species, phylogenetic relations, and sequencing projects, as well as links to literature and domain predictions. Sequences can be imported from multiple sequence alignments generated during the annotation process. A web interface allows users to conveniently browse the database and to compile tabular and graphical summaries of its content. Conclusion We implemented a protein sequence-centric web application to store, organize, interrelate, and present heterogeneous data generated in manual genome annotation and comparative genomics. The application has been developed for the analysis of cytoskeletal and motor proteins (CyMoBase) but can easily be adapted for any protein. PMID:17134497
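
    The abstract does not describe Pfarao's actual database schema. As an illustrative sketch only, importing the sequences of a multiple sequence alignment into a relational store could look like the following, using Python's built-in `sqlite3` module; the table names, columns, identifiers, and the way species are supplied are all hypothetical.

```python
import sqlite3


def parse_aligned_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {identifier: aligned sequence}, keeping gap characters."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("empty FASTA header")
            name = fields[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def import_alignment(conn: sqlite3.Connection, fasta: str, species: dict[str, str]) -> None:
    """Store ungapped sequences with a species link (hypothetical schema)."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS species (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE IF NOT EXISTS sequence (
            id TEXT PRIMARY KEY,
            species_id INTEGER REFERENCES species(id),
            residues TEXT NOT NULL
        );
        """
    )
    for seq_id, aligned in parse_aligned_fasta(fasta).items():
        sp = species.get(seq_id, "unknown")
        conn.execute("INSERT OR IGNORE INTO species(name) VALUES (?)", (sp,))
        (sp_id,) = conn.execute("SELECT id FROM species WHERE name = ?", (sp,)).fetchone()
        residues = aligned.replace("-", "").replace(".", "")  # drop alignment gaps
        conn.execute(
            "INSERT OR REPLACE INTO sequence(id, species_id, residues) VALUES (?, ?, ?)",
            (seq_id, sp_id, residues),
        )
    conn.commit()
```

    Keeping species in their own table lets several sequences point at one species row, which is the kind of interrelation between sequences, species, and other metadata the abstract describes.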

  10. Sarcocystis neurona Merozoites Express a Family of Immunogenic Surface Antigens That Are Orthologues of the Toxoplasma gondii Surface Antigens (SAGs) and SAG-Related Sequences

    PubMed Central

    Howe, Daniel K.; Gaji, Rajshekhar Y.; Mroz-Barrett, Meaghan; Gubbels, Marc-Jan; Striepen, Boris; Stamper, Shelby

    2005-01-01

    Sarcocystis neurona is a member of the Apicomplexa that causes myelitis and encephalitis in horses but normally cycles between the opossum and small mammals. Analysis of an S. neurona expressed sequence tag (EST) database revealed four paralogous proteins that exhibit clear homology to the family of surface antigens (SAGs) and SAG-related sequences of Toxoplasma gondii. The primary peptide sequences of the S. neurona proteins are consistent with the two-domain structure that has been described for the T. gondii SAGs, and each was predicted to have an amino-terminal signal peptide and a carboxyl-terminal glycolipid anchor addition site, suggesting surface localization. All four proteins were confirmed to be membrane associated and displayed on the surface of S. neurona merozoites. Due to their surface localization and homology to T. gondii surface antigens, these S. neurona proteins were designated SnSAG1, SnSAG2, SnSAG3, and SnSAG4. Consistent with their homology, the SnSAGs elicited a robust immune response in infected and immunized animals, and their conserved structure further suggests that the SnSAGs similarly serve as adhesins for attachment to host cells. Whether the S. neurona SAG family is as extensive as the T. gondii SAG family remains unresolved, but it is probable that additional SnSAGs will be revealed as more S. neurona ESTs are generated. The existence of an SnSAG family in S. neurona indicates that expression of multiple related surface antigens is not unique to the ubiquitous organism T. gondii. Instead, the SAG gene family is a common trait that presumably has an essential, conserved function(s). PMID:15664946

  11. One precursor, three apolipoproteins: the relationship between two crustacean lipoproteins, the large discoidal lipoprotein and the high density lipoprotein/β-glucan binding protein.

    PubMed

    Stieb, Stefanie; Roth, Ziv; Dal Magro, Christina; Fischer, Sabine; Butz, Eric; Sagi, Amir; Khalaila, Isam; Lieb, Bernhard; Schenk, Sven; Hoeger, Ulrich

    2014-12-01

    The novel discoidal lipoprotein (dLp) recently detected in the crayfish differs from other crustacean lipoproteins in its large size, apoprotein composition, and high lipid binding capacity. We identified the dLp sequence by transcriptome analysis of the hepatopancreas and by mass spectrometry. Further de novo assembly of the NGS data, followed by BLAST searches using the sequence of the high density lipoprotein/β-glucan binding protein (HDL-BGBP) of Astacus leptodactylus as query, revealed a putative precursor molecule with an open reading frame of 14.7 kb and a deduced primary structure of 4889 amino acids. The presence of an N-terminal lipid binding domain and a DUF1943 domain suggests a relationship with the large lipid transfer proteins. Two putative dibasic furin cleavage sites were identified bordering the sequence of the HDL-BGBP. When subjected to mass spectrometric analyses, tryptic peptides of the large apoprotein of dLp matched the N-terminal part of the precursor, while the peptides obtained for its small apoprotein matched the C-terminal part. Repeating the analysis in the prawn Macrobrachium rosenbergii revealed a similar protein with identical domain architecture, suggesting that our findings do not represent an isolated instance. Our results indicate that the above three apolipoproteins (i.e., HDL-BGBP and both the large and the small subunit of dLp) are translated as a single large precursor. Cleavage at the furin-type sites releases two subunits forming a heterodimeric dLp particle, while the remaining part forms an HDL-BGBP whose relationship with other lipoproteins, as well as its specific functions, remain to be elucidated.
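
    As a quick consistency check, an open reading frame of 14.7 kb and a deduced primary structure of 4889 amino acids agree at three nucleotides per codon. The abstract does not say whether the stop codon is counted in the ORF length, so the sketch below computes both variants; the helper name is hypothetical.

```python
def orf_length_nt(n_residues: int, include_stop: bool = True) -> int:
    """Nucleotides needed to encode n_residues (3 nt per codon, optional stop codon)."""
    return 3 * (n_residues + (1 if include_stop else 0))


residues = 4889  # deduced primary structure reported in the abstract
with_stop = orf_length_nt(residues)
without_stop = orf_length_nt(residues, include_stop=False)
print(with_stop, without_stop, round(with_stop / 1000, 1))  # 14670 14667 14.7
```

    Either way the ORF rounds to 14.7 kb, so the two reported figures are mutually consistent.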

  12. Characterization of causative allergens for wheat-dependent exercise-induced anaphylaxis sensitized with hydrolyzed wheat proteins in facial soap.

    PubMed

    Yokooji, Tomoharu; Kurihara, Saki; Murakami, Tomoko; Chinuki, Yuko; Takahashi, Hitoshi; Morita, Eishin; Harada, Susumu; Ishii, Kaori; Hiragun, Makiko; Hide, Michihiro; Matsuo, Hiroaki

    2013-12-01

    In Japan, hydrolyzed wheat proteins (HWP) have been reported to cause wheat-dependent exercise-induced anaphylaxis (WDEIA) through transcutaneous sensitization by HWP-containing soap. Patients develop allergic reactions not only with soap use, but also with exercise after the intake of wheat protein (WP). ω5-Gliadin and high molecular weight (HMW)-glutenin were identified as major allergens in conventional WP-WDEIA patients. However, the allergens in HWP-WDEIA have yet to be elucidated. Sera were obtained from 22 patients with HWP-sensitized WDEIA. The allergenic activities of HWP and six recombinant wheat gluten proteins, including α/β-, γ-, ω1,2- and ω5-gliadin and low- and high molecular weight glutenins, were characterized by immunoblot analysis and histamine release testing. IgE-binding epitopes were identified using arrays of overlapping peptides synthesized on a SPOTs membrane. Immunoblot analysis showed that IgE antibodies (Abs) from HWP-WDEIA patients bound to α/β-, γ- and ω1,2-gliadin. Recombinant γ-gliadin induced significant histamine release from basophils in eight of 11 patients with HWP-WDEIA. An IgE-binding epitope, "QPQQPFPQ", was identified within the primary sequence of γ-gliadin, and the deamidated peptide containing the "PEEPFP" sequence bound IgE Abs more strongly than the native epitope peptide. The epitope peptide inhibited IgE binding to HWP, indicating that the specific IgE to HWP cross-reacts with γ-gliadin. HWP-WDEIA patients could be sensitized to HWP containing a PEEPFP sequence, and WDEIA symptoms after WP ingestion could be partly induced by γ-gliadin. These findings could help in developing tools for the diagnosis and desensitization therapy of HWP-WDEIA.
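
    Deamidation converts glutamine (Q) to glutamic acid (E), which is how the native epitope QPQQPFPQ relates to the more strongly binding PEEPFP sequence. The abstract does not list the exact deamidated peptide, so the sketch below assumes that the two glutamines inside the PQQPFP core are the converted sites; the helper names are hypothetical.

```python
from typing import Iterable, List


def deamidate(peptide: str, positions: Iterable[int]) -> str:
    """Convert Gln (Q) to Glu (E) at the given 0-based positions."""
    residues = list(peptide)
    for i in positions:
        if residues[i] != "Q":
            raise ValueError(f"position {i} is {residues[i]!r}, not Q")
        residues[i] = "E"
    return "".join(residues)


def find_motif(sequence: str, motif: str) -> List[int]:
    """0-based start positions of every (possibly overlapping) occurrence of motif."""
    return [i for i in range(len(sequence) - len(motif) + 1)
            if sequence.startswith(motif, i)]


native = "QPQQPFPQ"                     # epitope reported in the abstract
deamidated = deamidate(native, [2, 3])  # assumed sites: the two Gln inside PQQPFP
print(deamidated, find_motif(deamidated, "PEEPFP"))  # QPEEPFPQ [1]
```

    The PEEPFP motif is absent from the native epitope and appears only after the assumed Q-to-E conversions.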

  13. Confirmation of the "protein-traffic-hypothesis" and the "protein-localization-hypothesis" using the diabetes-mellitus-type-1-knock-in and transgenic-murine-models and the trepitope sequences.

    PubMed

    Arneth, Borros

    2012-10-01

    As possible mechanisms to explain the emergence of autoimmune diseases, the current author has suggested two new pathways in earlier papers: the "protein localization hypothesis" and the "protein traffic hypothesis". The "protein localization hypothesis" states that an autoimmune disease develops if a protein accumulates in a compartment that did not previously contain that protein. Similarly, the "protein traffic hypothesis" states that a sudden error in the transport of a certain protein leads to the emergence of an autoimmune disease. The current article discusses the usefulness of the different commercially available transgenic murine models of diabetes mellitus type 1 for confirming the aforementioned hypotheses. This discussion shows that several transgenic murine models of diabetes mellitus type 1 are in line with, and confirm, the aforementioned hypotheses. Furthermore, these hypotheses are also in line with the occurrence of several newly discovered protein sequences, the so-called trepitope sequences, which modulate the immune response to certain proteins. The current study analyzed to what extent the hypotheses are supported by the occurrence of these new sequences; the occurrence of the trepitope sequences thereby provides additional evidence supporting the hypotheses. Both the "protein localization hypothesis" and the "protein traffic hypothesis" have the potential to lead to new causal therapy concepts, and they provide conceptual explanations for the diabetes mouse models as well as for the newly discovered trepitope sequences. Copyright © 2012 Elsevier Ltd. All rights reserved.

  14. Dynamics of domain coverage of the protein sequence universe

    PubMed Central

    2012-01-01

    Background The currently known protein sequence space consists of millions of sequences in public databases and is rapidly expanding. Assigning sequences to families leads to a better understanding of protein function and the nature of the protein universe. However, a large portion of the current protein space remains unassigned and is referred to as its “dark matter”. Results Here we suggest that the true size of “dark matter” is much larger than current definitions indicate. We propose an approach to reducing the size of “dark matter” by identifying and subtracting regions in protein sequences that are not likely to contain any domain. Conclusions Recent improvements in computational domain modeling result in a slow decrease in the relative size of “dark matter”; however, its absolute size increases substantially with the growth of sequence data. PMID:23157439
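
    The proposal to subtract regions unlikely to contain any domain can be illustrated with simple interval arithmetic. This sketch assumes that domain hits and excluded regions (for example disordered or low-complexity stretches, which the abstract does not enumerate) are given as half-open residue intervals lying within the protein; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end) residue range


def covered_length(intervals: List[Interval]) -> int:
    """Number of residues covered by the union of half-open intervals."""
    total = 0
    cur_start, cur_end = None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start  # close the previous merged run
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlapping or adjacent: extend the run
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def dark_matter_fraction(length: int, domains: Sequence[Interval],
                         excluded: Sequence[Interval] = ()) -> float:
    """Fraction of residues assigned neither to a domain nor to an excluded region."""
    if length <= 0:
        raise ValueError("length must be positive")
    assigned = covered_length(list(domains) + list(excluded))
    return (length - assigned) / length
```

    For a 100-residue protein with overlapping domain hits at [0, 40) and [30, 50), the unassigned fraction is 0.5; additionally subtracting an excluded region at [80, 100) lowers it to 0.3.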

  15. Rational Protein Engineering Guided by Deep Mutational Scanning

    PubMed Central

    Shin, HyeonSeok; Cho, Byung-Kwan

    2015-01-01

    The sequence–function relationship of a protein is commonly determined from its three-dimensional structure, followed by various biochemical experiments. However, with the explosive increase in the number of genome sequences, facilitated by recent advances in sequencing technology, the gap between the number of available protein sequences and the number of known three-dimensional structures is rapidly widening. A recently developed method termed deep mutational scanning explores the functional phenotypes of thousands of mutants via massively parallel sequencing. Coupled with a highly efficient screening system, this approach assesses the phenotypic changes caused by substituting each amino acid residue of a protein. Such an informational resource reveals the functional role of each residue, thereby providing a rationale for selecting target residues for protein engineering. Here, we discuss the current applications of deep mutational scanning and consider experimental design. PMID:26404267
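
    Deep mutational scanning data are often summarized as an enrichment score: the log ratio of a variant's frequency after versus before selection, normalized to wild type. The review does not prescribe a particular score, so the sketch below shows one common convention, with an assumed pseudocount to avoid division by zero; the variant names and counts in the test are made up.

```python
import math
from typing import Dict


def enrichment_score(in_var: int, sel_var: int, in_wt: int, sel_wt: int,
                     pseudocount: float = 0.5) -> float:
    """log2 change in a variant's frequency across selection, relative to wild type."""
    ratio_var = (sel_var + pseudocount) / (in_var + pseudocount)
    ratio_wt = (sel_wt + pseudocount) / (in_wt + pseudocount)
    return math.log2(ratio_var / ratio_wt)


def score_variants(input_counts: Dict[str, int], selected_counts: Dict[str, int],
                   wt: str = "WT", pseudocount: float = 0.5) -> Dict[str, float]:
    """Score every non-wild-type variant; missing selected counts are treated as 0."""
    in_wt, sel_wt = input_counts[wt], selected_counts[wt]
    return {
        variant: enrichment_score(count, selected_counts.get(variant, 0),
                                  in_wt, sel_wt, pseudocount)
        for variant, count in input_counts.items()
        if variant != wt
    }
```

    Negative scores mark substitutions depleted by selection (candidate functionally important residues), and positive scores mark enriched ones; dividing by the wild-type ratio cancels differences in sequencing depth between the two samples.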

  16. Global Transcriptome Analysis of the Tentacle of the Jellyfish Cyanea capillata Using Deep Sequencing and Expressed Sequence Tags: Insight into the Toxin- and Degenerative Disease-Related Transcripts

    PubMed Central

    Liu, Dan; Wang, Qianqian; Ruan, Zengliang; He, Qian; Zhang, Liming

    2015-01-01

    Background Jellyfish contain diverse toxins and other bioactive components. However, large-scale identification of novel toxins and bioactive components from jellyfish has been hampered by the low efficiency of traditional isolation and purification methods. Results We performed de novo transcriptome sequencing of the tentacle tissue of the jellyfish Cyanea capillata. A total of 51,304,108 reads were obtained and assembled into 50,536 unigenes. Of these, 21,357 unigenes had homologues in public databases, but the remaining unigenes had no significant matches due to the limited sequence information available and species-specific novel sequences. Functional annotation of the unigenes also revealed general gene expression profile characteristics in the tentacle of C. capillata. A primary goal of this study was to identify putative toxin transcripts. As expected, we identified many transcripts encoding proteins similar to several well-known toxin families, including phospholipases, metalloproteases, serine proteases and serine protease inhibitors. In addition, some transcripts also resembled molecules with potential toxic activities, including cnidarian CfTX-like toxins with hemolytic activity, plancitoxin-1, venom toxin-like peptide-6, histamine-releasing factor, neprilysin, dipeptidyl peptidase 4, vascular endothelial growth factor A, angiotensin-converting enzyme-like and endothelin-converting enzyme 1-like proteins. Most of these molecules have not been previously reported in jellyfish. Interestingly, we also characterized a number of transcripts with similarities to proteins relevant to several degenerative diseases, including Huntington’s, Alzheimer’s and Parkinson’s diseases. This is the first description of degenerative disease-associated genes in jellyfish. Conclusion We obtained a well-categorized and annotated transcriptome of C. capillata tentacle that will be an important and valuable resource for further understanding of jellyfish at the molecular level and information on the underlying molecular mechanisms of jellyfish stinging. The findings of this study may also be used in comparative studies of gene expression profiling among different jellyfish species. PMID:26551022

  17. Global Transcriptome Analysis of the Tentacle of the Jellyfish Cyanea capillata Using Deep Sequencing and Expressed Sequence Tags: Insight into the Toxin- and Degenerative Disease-Related Transcripts.

    PubMed

    Liu, Guoyan; Zhou, Yonghong; Liu, Dan; Wang, Qianqian; Ruan, Zengliang; He, Qian; Zhang, Liming

    2015-01-01

    Jellyfish contain diverse toxins and other bioactive components. However, large-scale identification of novel toxins and bioactive components from jellyfish has been hampered by the low efficiency of traditional isolation and purification methods. We performed de novo transcriptome sequencing of the tentacle tissue of the jellyfish Cyanea capillata. A total of 51,304,108 reads were obtained and assembled into 50,536 unigenes. Of these, 21,357 unigenes had homologues in public databases, but the remaining unigenes had no significant matches due to the limited sequence information available and species-specific novel sequences. Functional annotation of the unigenes also revealed general gene expression profile characteristics in the tentacle of C. capillata. A primary goal of this study was to identify putative toxin transcripts. As expected, we identified many transcripts encoding proteins similar to several well-known toxin families, including phospholipases, metalloproteases, serine proteases and serine protease inhibitors. In addition, some transcripts also resembled molecules with potential toxic activities, including cnidarian CfTX-like toxins with hemolytic activity, plancitoxin-1, venom toxin-like peptide-6, histamine-releasing factor, neprilysin, dipeptidyl peptidase 4, vascular endothelial growth factor A, angiotensin-converting enzyme-like and endothelin-converting enzyme 1-like proteins. Most of these molecules have not been previously reported in jellyfish. Interestingly, we also characterized a number of transcripts with similarities to proteins relevant to several degenerative diseases, including Huntington's, Alzheimer's and Parkinson's diseases. This is the first description of degenerative disease-associated genes in jellyfish. We obtained a well-categorized and annotated transcriptome of C. capillata tentacle that will be an important and valuable resource for further understanding of jellyfish at the molecular level and information on the underlying molecular mechanisms of jellyfish stinging. The findings of this study may also be used in comparative studies of gene expression profiling among different jellyfish species.

  18. Chameleon sequences in neurodegenerative diseases.

    PubMed

    Bahramali, Golnaz; Goliaei, Bahram; Minuchehr, Zarrin; Salari, Ali

    2016-03-25

    Chameleon sequences can adopt an α-helix, a β-sheet, or a coil conformation. Identifying chameleon sequences in the PDB (Protein Data Bank) may yield insight into the peptides and proteins involved in neurodegeneration. In this research, we took advantage of the large size of the PDB to perform a sequence analysis of chameleons, developing an algorithm to extract peptide segments with identical sequences but different structures. To find new chameleon sequences, we extracted a set of 8315 non-redundant protein sequences from the PDB with less than 25% sequence identity. Our data were classified into "helix to strand (HE)", "helix to coil (HC)" and "strand to coil (CE)" alterations. We also analyzed the occurrence of singlet and doublet amino acids and the solvent accessibility in the chameleon sequences; we then sorted out the proteins with the largest number of chameleon sequences and named them Chameleon Flexible Proteins (CFPs) in our dataset. Our data revealed that Gly, Val, Ile, Tyr and Phe are the major amino acids in chameleons. We also found that proteins such as insulin-degrading enzyme (IDE) and GTP-binding nuclear protein Ran (RAN) have the largest numbers of chameleons (640 and 405, respectively). These proteins have known roles in neurodegenerative diseases. Therefore, it can be inferred that other CFPs can serve as key proteins in neurodegeneration, and studying them can shed light on curing and preventing neurodegenerative diseases. Copyright © 2016 Elsevier Inc. All rights reserved.
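
    The abstract describes extracting segments with identical sequences but different structures and classifying them as HE, HC, or CE. A minimal sketch of that idea, assuming secondary structure has already been reduced to three states (H, E, C) per residue and using a fixed window of k = 5 residues, is shown below. The window length, the rule that every residue of an occurrence must share one state, and the toy chains in the test are our assumptions, not details from the paper; the upstream selection of non-redundant chains below 25% identity is not shown.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

# Labels for the three switch types named in the abstract (strand-to-coil is "CE").
PAIR_LABELS = {frozenset("HE"): "HE", frozenset("HC"): "HC", frozenset("CE"): "CE"}


def find_chameleons(chains: Iterable[Tuple[str, str]], k: int = 5) -> Dict[str, Set[str]]:
    """Return identical k-residue segments observed in more than one conformation.

    chains: (sequence, structure) pairs, with one H/E/C state per residue.
    Only windows whose k residues all share a single H, E or C state are counted.
    """
    states: Dict[str, Set[str]] = defaultdict(set)
    for seq, ss in chains:
        if len(seq) != len(ss):
            raise ValueError("sequence and structure lengths differ")
        for i in range(len(seq) - k + 1):
            window_ss = ss[i:i + k]
            if len(set(window_ss)) == 1 and window_ss[0] in "HEC":
                states[seq[i:i + k]].add(window_ss[0])
    chameleons: Dict[str, Set[str]] = {}
    for segment, seen in states.items():
        labels = {PAIR_LABELS[frozenset(pair)] for pair in combinations(sorted(seen), 2)}
        if labels:  # seen in at least two different conformations
            chameleons[segment] = labels
    return chameleons
```

    With the toy chains `("AVGIYKL", "HHHHHCC")` and `("TAVGIYP", "CEEEEEC")`, the pentapeptide AVGIY occurs as a full helix in one chain and a full strand in the other, so it is reported with the label HE.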

  19. Chameleon sequences in neurodegenerative diseases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bahramali, Golnaz; Goliaei, Bahram, E-mail: goliaei@ut.ac.ir; Minuchehr, Zarrin, E-mail: minuchehr@nigeb.ac.ir

    2016-03-25

    Chameleon sequences can adopt an α-helix, a β-sheet, or a coil conformation. Identifying chameleon sequences in the PDB (Protein Data Bank) may yield insight into the peptides and proteins involved in neurodegeneration. In this research, we took advantage of the large size of the PDB to perform a sequence analysis of chameleons, developing an algorithm to extract peptide segments with identical sequences but different structures. To find new chameleon sequences, we extracted a set of 8315 non-redundant protein sequences from the PDB with less than 25% sequence identity. Our data were classified into “helix to strand (HE)”, “helix to coil (HC)” and “strand to coil (CE)” alterations. We also analyzed the occurrence of singlet and doublet amino acids and the solvent accessibility in the chameleon sequences; we then sorted out the proteins with the largest number of chameleon sequences and named them Chameleon Flexible Proteins (CFPs) in our dataset. Our data revealed that Gly, Val, Ile, Tyr and Phe are the major amino acids in chameleons. We also found that proteins such as insulin-degrading enzyme (IDE) and GTP-binding nuclear protein Ran (RAN) have the largest numbers of chameleons (640 and 405, respectively). These proteins have known roles in neurodegenerative diseases. Therefore, it can be inferred that other CFPs can serve as key proteins in neurodegeneration, and studying them can shed light on curing and preventing neurodegenerative diseases.

  20. Comparative analysis of ribosomal protein L5 sequences from bacteria of the genus Thermus.

    PubMed

    Jahn, O; Hartmann, R K; Boeckh, T; Erdmann, V A

    1991-06-01

    The genes for the ribosomal 5S rRNA binding protein L5 have been cloned from three extremely thermophilic eubacteria, Thermus flavus, Thermus thermophilus HB8 and Thermus aquaticus (Jahn et al., submitted). The L5 genes from all three Thermus strains display 95% G+C at the third codon positions. The amino acid sequences deduced from the DNA sequences were shown to be identical for T. flavus and T. thermophilus, although the corresponding DNA sequences differed by two T-to-C transitions in the T. thermophilus gene. Protein L5 sequences from T. flavus and T. thermophilus are 95% homologous to L5 from T. aquaticus and 56.5% homologous to the corresponding E. coli sequence. The lowest degrees of homology were found between the T. flavus/T. thermophilus L5 proteins and those of yeast L16 (27.5%), Halobacterium marismortui (34.0%) and Methanococcus vannielii (36.6%). Sequence comparison makes clear that the thermostability of Thermus L5 proteins is achieved by an increase in hydrophobic interactions and/or by restriction of steric flexibility through the introduction of amino acids with branched aliphatic side chains, such as leucine. Alignment of the nine protein sequences equivalent to Thermus L5 proteins led to the identification of a conserved internal segment, rich in acidic amino acids, which shows homology to subsequences of E. coli L18 and L25. The occurrence of conserved sequence elements in 5S rRNA binding proteins and ribosomal proteins in general is discussed in terms of evolution and function.
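
    The 95% G/C figure refers to the third positions of codons (often called GC3). A minimal sketch of how such a value can be computed from an in-frame coding sequence is shown below; the abstract does not say whether the stop codon was included, and this sketch simply counts every codon it is given. The function name and toy sequence are illustrative.

```python
def gc3(cds: str) -> float:
    """Percentage of G or C at third codon positions of an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    thirds = cds[2::3]  # every third base, starting at the first codon's third position
    if not thirds:
        raise ValueError("empty coding sequence")
    return 100.0 * sum(base in "GC" for base in thirds) / len(thirds)
```

    For the toy sequence `ATGGCCAAGTAA`, the third positions are G, C, G, and A, so `gc3` returns 75.0.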
