functionally similar proteins: Topics by Science.gov

Sample records for functionally similar proteins

Protein-protein interaction network-based detection of functionally similar proteins within species.

PubMed

Song, Baoxing; Wang, Fen; Guo, Yang; Sang, Qing; Liu, Min; Li, Dengyun; Fang, Wei; Zhang, Deli

2012-07-01

Although functionally similar proteins across species have been widely studied, functionally similar proteins within species showing low sequence similarity have not been examined in detail. Identification of these proteins is of significant importance for understanding biological functions, evolution of protein families, progression of co-evolution, convergent evolution, and other phenomena that cannot be studied by detecting functionally similar proteins across species. Here, we explored a method of detecting functionally similar proteins within species based on graph theory. After representing protein-protein interaction networks as graphs, we split the graphs into subgraphs using the 1-hop method. Proteins with functional similarities in a species were detected using a modified shortest-path method to compare these subgraphs and to find the eligible optimal results. Using seven protein-protein interaction networks and this method, some functionally similar proteins with low sequence similarity that cannot be detected by sequence alignment were identified. By analyzing the results, we found that it is sometimes difficult to distinguish homology from convergent evolution. Evaluation of the performance of our method by gene ontology term overlap showed that the precision of our method was excellent. Copyright © 2012 Wiley Periodicals, Inc.
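The 1-hop split described in this abstract can be illustrated with a minimal sketch. It assumes the interaction network is given as a list of protein pairs; the protein names are made up, and `neighbor_jaccard` is a hypothetical stand-in score, not the modified shortest-path comparison used by the authors (the abstract does not specify that method).

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            adj[a].add(b)
            adj[b].add(a)
    return adj


def one_hop_subgraph(adj, center):
    """Return the edges induced by `center` and its direct interaction partners."""
    nodes = {center} | adj[center]
    return {frozenset((u, v)) for u in nodes for v in adj[u] if v in nodes}


def neighbor_jaccard(adj, p, q):
    """Hypothetical stand-in score: Jaccard overlap of interaction partners."""
    a, b = adj[p] - {q}, adj[q] - {p}
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy network (invented for illustration only).
edges = [("P1", "A"), ("P1", "B"), ("P1", "C"),
         ("P2", "A"), ("P2", "B"), ("P2", "D"), ("A", "B")]
adj = build_graph(edges)
sub = one_hop_subgraph(adj, "P1")          # edges among P1, A, B, C
score = neighbor_jaccard(adj, "P1", "P2")  # shared partners A and B
```

Two proteins whose 1-hop subgraphs overlap strongly become candidates for functional similarity even when their sequences differ; any real comparison would need the scoring scheme from the paper itself.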
Understand protein functions by comparing the similarity of local structural environments.

PubMed

Chen, Jiawen; Xie, Zhong-Ru; Wu, Yinghao

2017-02-01

The three-dimensional structures of proteins play an essential role in regulating binding between proteins and their partners, offering a direct relationship between structures and functions of proteins. It is widely accepted that the function of a protein can be determined if its structure is similar to other proteins whose functions are known. However, it is also observed that proteins with similar global structures do not necessarily correspond to the same function, while proteins with very different folds can share similar functions. This indicates that functional similarity originates from the local structural information of proteins rather than from their global shapes. We assume that proteins with similar local environments prefer binding to similar types of molecular targets. To test this assumption, we designed a new structural indicator to define the similarity of local environment between residues in different proteins. This indicator was further used to calculate the probability that a given residue binds to a specific type of structural neighbor, including DNA, RNA, small molecules and proteins. After applying the method to a large-scale non-redundant database of proteins, we show that the positive signal of binding probability calculated from the local structural indicator is statistically meaningful. In summary, our studies suggest that the local environment of residues in a protein is a good indicator for recognizing specific binding partners of the protein. The new method could be a potential addition to a suite of existing template-based approaches for protein function prediction. Copyright © 2016 Elsevier B.V. All rights reserved.
FunSimMat: a comprehensive functional similarity database

PubMed Central

Schlicker, Andreas; Albrecht, Mario

2008-01-01

Functional similarity based on Gene Ontology (GO) annotation is used in diverse applications like gene clustering, gene expression data analysis, protein interaction prediction and evaluation. However, there exists no comprehensive resource of functional similarity values although such a database would facilitate the use of functional similarity measures in different applications. Here, we describe FunSimMat (Functional Similarity Matrix, http://funsimmat.bioinf.mpi-inf.mpg.de/), a large new database that provides several different semantic similarity measures for GO terms. It offers various precomputed functional similarity values for proteins contained in UniProtKB and for protein families in Pfam and SMART. The web interface allows users to efficiently perform both semantic similarity searches with GO terms and functional similarity searches with proteins or protein families. All results can be downloaded in tab-delimited files for use with other tools. An additional XML–RPC interface gives automatic online access to FunSimMat for programs and remote services. PMID:17932054
Fold independent structural comparisons of protein-ligand binding sites for exploring functional relationships.

PubMed

Gold, Nicola D; Jackson, Richard M

2006-02-03

The rapid growth in protein structural data and the emergence of structural genomics projects have increased the need for automatic structure analysis and tools for function prediction. Small molecule recognition is critical to the function of many proteins; therefore, determination of ligand binding site similarity is important for understanding ligand interactions and may allow their functional classification. Here, we present a binding sites database (SitesBase) that, given a known protein-ligand binding site, allows rapid retrieval of other binding sites with similar structure independent of overall sequence or fold similarity. However, each match is also annotated with sequence similarity and fold information to aid interpretation of structure and functional similarity. Similarity in ligand binding sites can indicate common binding modes and recognition of similar molecules, allowing potential inference of function for an uncharacterised protein or providing additional evidence of common function where sequence or fold similarity is already known. Alternatively, the resource can provide valuable information for detailed studies of molecular recognition including structure-based ligand design and in understanding ligand cross-reactivity. Here, we show examples of atomic similarity between superfamily or more distant fold relatives as well as between seemingly unrelated proteins. Assignment of unclassified proteins to structural superfamilies is also undertaken and in most cases substantiates assignments made using sequence similarity. Correct assignment is also possible where sequence similarity fails to find significant matches, illustrating the potential use of binding site comparisons for newly determined proteins.
SVM-Prot 2016: A Web-Server for Machine Learning Prediction of Protein Functional Families from Sequence Irrespective of Similarity.

PubMed

Li, Ying Hong; Xu, Jing Yu; Tao, Lin; Li, Xiao Feng; Li, Shuang; Zeng, Xian; Chen, Shang Ying; Zhang, Peng; Qin, Chu; Zhang, Cheng; Chen, Zhe; Zhu, Feng; Chen, Yu Zong

2016-01-01

Knowledge of protein function is important for biological, medical and therapeutic studies, but the functions of many proteins are still unknown. There is a need for improved functional prediction methods. Our SVM-Prot web-server employed a machine learning method for predicting protein functional families from protein sequences irrespective of similarity, which complemented similarity-based and other methods in predicting diverse classes of proteins including distantly-related proteins and homologous proteins of different functions. Since its publication in 2003, we have made major improvements to SVM-Prot with (1) expanded coverage from 54 to 192 functional families, (2) more diverse protein descriptors for protein representation, (3) improved predictive performance due to the use of more enriched training datasets and a greater variety of protein descriptors, (4) a newly integrated BLAST analysis option for assessing proteins in the SVM-Prot predicted functional families that are similar in sequence to a query protein, and (5) a newly added batch submission option for supporting the classification of multiple proteins. Moreover, 2 more machine learning approaches, K nearest neighbor and probabilistic neural networks, were added for facilitating collective assessment of protein functions by multiple methods. SVM-Prot can be accessed at http://bidd2.nus.edu.sg/cgi-bin/svmprot/svmprot.cgi.
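As a rough illustration of predicting a functional family from sequence without alignment, the sketch below turns each sequence into a fixed-length descriptor and classifies it with K nearest neighbor, one of the methods the abstract names. The amino-acid composition descriptor, the toy sequences, and the family labels are all assumptions made for this example; the abstract does not say which descriptors SVM-Prot uses.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Descriptor (assumed for illustration): fraction of each standard residue."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def knn_predict(query, training, k=3):
    """Majority label among the k training sequences closest in descriptor space."""
    q = composition(query)
    ranked = sorted(training, key=lambda item: math.dist(q, composition(item[0])))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented sequences and labels; not real protein families.
training = [
    ("KKKKRRRKKA", "basic_rich"),
    ("KRKRKRKKGG", "basic_rich"),
    ("DDEEDDEEDA", "acidic_rich"),
    ("EEDDEEDDGG", "acidic_rich"),
]
predicted = knn_predict("KKRRKKRRAA", training, k=3)
```

Because the descriptor has the same length for every sequence, the classifier never needs the query to align with any training sequence, which is the sense in which such methods work "irrespective of similarity".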
Topology-function conservation in protein-protein interaction networks.

PubMed

Davis, Darren; Yaveroğlu, Ömer Nebil; Malod-Dognin, Noël; Stojmirovic, Aleksandar; Pržulj, Nataša

2015-05-15

Proteins underlie the functioning of a cell, and the wiring of proteins in a protein-protein interaction network (PIN) relates to their biological functions. Proteins with similar wiring in the PIN (topology around them) have been shown to have similar functions. This property has been successfully exploited for predicting protein functions. Topological similarity is also used to guide network alignment algorithms that find similarly wired proteins between PINs of different species; these similarities are used to transfer annotation across PINs, e.g. from model organisms to human. To refine these functional predictions and annotation transfers, we need to gain insight into the variability of the topology-function relationships. For example, a function may be significantly associated with specific topologies, while another function may be weakly associated with several different topologies. Also, the topology-function relationships may differ between different species. To improve our understanding of topology-function relationships and of their conservation among species, we develop a statistical framework that is built upon canonical correlation analysis. Using the graphlet degrees to represent the wiring around proteins in PINs and gene ontology (GO) annotations to describe their functions, our framework: (i) characterizes statistically significant topology-function relationships in a given species, and (ii) uncovers the functions that have conserved topology in PINs of different species, which we term topologically orthologous functions. We apply our framework to PINs of yeast and human, identifying seven biological process and two cellular component GO terms to be topologically orthologous for the two organisms. © The Author 2015. Published by Oxford University Press.
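To make "graphlet degrees" concrete, here is a minimal sketch that counts, for one node, how often it touches each orbit of the 2-node and 3-node graphlets (orbits 0 to 3 in the usual numbering). The toy network is invented, the full graphlet degree vector used in such studies covers larger graphlets as well, and the canonical correlation step is not shown.

```python
from itertools import combinations


def graphlet_orbits(adj, v):
    """Counts for orbits 0-3 of node v in an undirected graph (dict of sets).

    orbit 0: edges at v (degree)
    orbit 1: v is an end of an induced 3-node path
    orbit 2: v is the middle of an induced 3-node path
    orbit 3: v is part of a triangle
    """
    nbrs = adj[v]
    # Pairs of neighbours that are themselves connected close a triangle.
    triangles = sum(1 for u, w in combinations(nbrs, 2) if w in adj[u])
    pairs = len(nbrs) * (len(nbrs) - 1) // 2
    path_middle = pairs - triangles
    # Paths v-u-w where w is neither v nor a neighbour of v.
    path_end = sum(1 for u in nbrs for w in adj[u] if w != v and w not in nbrs)
    return (len(nbrs), path_end, path_middle, triangles)


# Toy network: triangle A-B-C with a pendant node D attached to C.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
signatures = {node: graphlet_orbits(adj, node) for node in adj}
```

Nodes A and B receive the same signature, which matches the intuition that proteins with identical wiring around them are candidates for similar function.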
Comparison of topological clustering within protein networks using edge metrics that evaluate full sequence, full structure, and active site microenvironment similarity.

PubMed

Leuthaeuser, Janelle B; Knutson, Stacy T; Kumar, Kiran; Babbitt, Patricia C; Fetrow, Jacquelyn S

2015-09-01

The development of accurate protein function annotation methods has emerged as a major unsolved biological problem. Protein similarity networks, one approach to function annotation via annotation transfer, group proteins into similarity-based clusters. An underlying assumption is that the edge metric used to identify such clusters correlates with functional information. In this contribution, this assumption is evaluated by observing topologies in similarity networks using three different edge metrics: sequence (BLAST), structure (TM-Align), and active site similarity (active site profiling, implemented in DASP). Network topologies for four well-studied protein superfamilies (enolase, peroxiredoxin (Prx), glutathione transferase (GST), and crotonase) were compared with curated functional hierarchies and structure. As expected, network topology differs, depending on edge metric; comparison of topologies provides valuable information on structure/function relationships. Subnetworks based on active site similarity correlate with known functional hierarchies at a single edge threshold more often than sequence- or structure-based networks. Sequence- and structure-based networks are useful for identifying sequence and domain similarities and differences; therefore, it is important to consider the clustering goal before deciding appropriate edge metric. Further, conserved active site residues identified in enolase and GST active site subnetworks correspond with published functionally important residues. Extension of this analysis yields predictions of functionally determinant residues for GST subgroups. These results support the hypothesis that active site similarity-based networks reveal clusters that share functional details and lay the foundation for capturing functionally relevant hierarchies using an approach that is both automatable and can deliver greater precision in function annotation than current similarity-based methods. 
© 2015 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.
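The idea of reading clusters off a similarity network at a single edge threshold can be sketched as follows. The sketch assumes pairwise scores where a higher value means more similar (a metric such as a BLAST E-value would need the comparison reversed), and the protein names and scores are invented.

```python
def clusters_at_threshold(scores, threshold):
    """Connected components of the network keeping only edges with score >= threshold."""
    adj = {}
    for (a, b), s in scores.items():
        adj.setdefault(a, set())
        adj.setdefault(b, set())
        if s >= threshold:
            adj[a].add(b)
            adj[b].add(a)

    seen, clusters = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adj[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Invented pairwise similarity scores between four proteins.
scores = {
    ("e1", "e2"): 0.90,
    ("e2", "e3"): 0.60,
    ("e3", "e4"): 0.85,
    ("e1", "e4"): 0.20,
}
strict = clusters_at_threshold(scores, 0.8)
loose = clusters_at_threshold(scores, 0.5)
```

Running the same function with scores from different edge metrics, and comparing the resulting clusters with a curated functional hierarchy, is the kind of comparison the abstract describes.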
A novel method for identifying disease associated protein complexes based on functional similarity protein complex networks.

PubMed

Le, Duc-Hau

2015-01-01

Protein complexes formed by non-covalent interaction among proteins play important roles in cellular functions. Computational and purification methods have been used to identify many protein complexes and their cellular functions. However, their roles in causing disease have not yet been well characterized. There exist only a few studies for the identification of disease-associated protein complexes. However, they mostly utilize complicated heterogeneous networks which are constructed based on an out-of-date database of phenotype similarity network collected from literature. In addition, they apply only to diseases for which tissue-specific data exist. In this study, we propose a method to identify novel disease-protein complex associations. First, we introduce a framework to construct functional similarity protein complex networks where two protein complexes are functionally connected by either shared protein elements, shared annotating GO terms or based on protein interactions between elements in each protein complex. Second, we propose a simple but effective neighborhood-based algorithm, which yields a local similarity measure, to rank disease candidate protein complexes. Comparing the predictive performance of our proposed algorithm with that of two state-of-the-art network propagation algorithms, including one we used in our previous study, we found that it performed statistically significantly better than these two algorithms for all the constructed functional similarity protein complex networks. In addition, it ran about 32 times faster than these two algorithms. Moreover, our proposed method always achieved high performance in terms of AUC values irrespective of the ways to construct the functional similarity protein complex networks and the algorithms used. The performance of our method was also higher than that reported for some existing methods which were based on complicated heterogeneous networks. 
Finally, we also tested our method with prostate cancer and selected the top 100 highly ranked candidate protein complexes. Interestingly, 69 of them were supported by evidence, since at least one of their protein elements is known to be associated with prostate cancer. Our proposed method, including the framework to construct functional similarity protein complex networks and the neighborhood-based algorithm on these networks, could be used for identification of novel disease-protein complex associations.
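A neighborhood-based ranking of candidate complexes might look like the sketch below. The scoring rule (the weighted share of a candidate's neighbours that are already known disease complexes) is an assumption chosen for illustration, since the abstract does not give the formula; the network, weights, and complex names are invented.

```python
def rank_candidates(network, known, candidates):
    """Rank candidates by the weighted fraction of neighbours that are known disease complexes.

    network: dict mapping complex -> {neighbour: similarity weight}
    known: set of complexes already associated with the disease
    """
    def score(candidate):
        neighbours = network.get(candidate, {})
        total = sum(neighbours.values())
        if total == 0:
            return 0.0
        return sum(w for n, w in neighbours.items() if n in known) / total

    scored = [(c, score(c)) for c in candidates]
    # Highest score first; ties broken by name for a stable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Invented functional similarity network between protein complexes.
network = {
    "C1": {"K1": 1.0, "K2": 1.0, "C2": 1.0},
    "C2": {"C1": 1.0, "C3": 1.0},
    "C3": {"C2": 1.0, "K1": 0.5},
}
known = {"K1", "K2"}
ranking = rank_candidates(network, known, ["C1", "C2", "C3"])
```

A local score of this kind only inspects direct neighbours, which is one plausible reason such an approach can run much faster than propagation over the whole network.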
Recognition of functional sites in protein structures.

PubMed

Shulman-Peleg, Alexandra; Nussinov, Ruth; Wolfson, Haim J

2004-06-04

Recognition of regions on the surface of one protein, that are similar to a binding site of another is crucial for the prediction of molecular interactions and for functional classifications. We first describe a novel method, SiteEngine, that assumes no sequence or fold similarities and is able to recognize proteins that have similar binding sites and may perform similar functions. We achieve high efficiency and speed by introducing a low-resolution surface representation via chemically important surface points, by hashing triangles of physico-chemical properties and by application of hierarchical scoring schemes for a thorough exploration of global and local similarities. We proceed to rigorously apply this method to functional site recognition in three possible ways: first, we search a given functional site on a large set of complete protein structures. Second, a potential functional site on a protein of interest is compared with known binding sites, to recognize similar features. Third, a complete protein structure is searched for the presence of an a priori unknown functional site, similar to known sites. Our method is robust and efficient enough to allow computationally demanding applications such as the first and the third. From the biological standpoint, the first application may identify secondary binding sites of drugs that may lead to side-effects. The third application finds new potential sites on the protein that may provide targets for drug design. Each of the three applications may aid in assigning a function and in classification of binding patterns. We highlight the advantages and disadvantages of each type of search, provide examples of large-scale searches of the entire Protein Data Base and make functional predictions.
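The phrase "hashing triangles of physico-chemical properties" can be illustrated with a small geometric-hashing sketch. It assumes each site is a list of labelled 3D points; the property labels, coordinates, and bin width are hypothetical, and the key used here (sorted labels plus binned side lengths) is a simplification rather than the SiteEngine implementation.

```python
from collections import defaultdict
from itertools import combinations
import math


def triangle_key(triangle, bin_width=2.0):
    """Order-independent key: sorted property labels plus binned side lengths."""
    labels = tuple(sorted(label for label, _ in triangle))
    sides = tuple(sorted(
        int(math.dist(p, q) // bin_width)
        for (_, p), (_, q) in combinations(triangle, 2)
    ))
    return labels, sides


def build_table(site_points, bin_width=2.0):
    """Count how often each triangle key occurs among all point triplets of a site."""
    table = defaultdict(int)
    for triangle in combinations(site_points, 3):
        table[triangle_key(triangle, bin_width)] += 1
    return table


def shared_triangles(site_a, site_b, bin_width=2.0):
    """Number of triangle keys (with multiplicity) found in both sites."""
    table_a = build_table(site_a, bin_width)
    table_b = build_table(site_b, bin_width)
    return sum(min(n, table_b[k]) for k, n in table_a.items() if k in table_b)


# Hypothetical surface points: (property label, (x, y, z)).
site_a = [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)),
          ("aromatic", (0, 4, 0)), ("donor", (0, 0, 5))]
# Same arrangement, translated and listed in a different order.
site_b = [("donor", (10, 10, 15)), ("aromatic", (10, 14, 10)),
          ("acceptor", (13, 10, 10)), ("donor", (10, 10, 10))]
site_c = [("hydrophobic", (0, 0, 0)), ("hydrophobic", (3, 0, 0)),
          ("hydrophobic", (0, 4, 0))]

same = shared_triangles(site_a, site_b)
different = shared_triangles(site_a, site_c)
```

Because the key depends only on labels and distances, two sites match regardless of sequence, fold, or orientation, and table lookups avoid comparing every triplet of one site against every triplet of the other.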
Resolving protein structure-function-binding site relationships from a binding site similarity network perspective.

PubMed

Mudgal, Richa; Srinivasan, Narayanaswamy; Chandra, Nagasuma

2017-07-01

Functional annotation is seldom straightforward, with complexities arising due to functional divergence in protein families or functional convergence between non-homologous protein families, leading to mis-annotations. An enzyme may contain multiple domains and not all domains may be involved in a given function, adding to the complexity in function annotation. To address this, we use binding site information from bound cognate ligands and catalytic residues, since it can help in resolving fold-function relationships at a finer level and with higher confidence. A comprehensive database of 2,020 fold-function-binding site relationships has been systematically generated. A network-based approach is employed to capture the complexity in these relationships, from which different types of associations are deciphered that identify versatile protein folds performing diverse functions, the same function associated with multiple folds, and one-to-one relationships. Binding site similarity networks integrated with fold, function, and ligand similarity information are generated to understand the depth of these relationships. Apart from the observed continuity in the functional site space, network properties of these revealed versatile families with topologically different or dissimilar binding sites and structural families that perform very similar functions. As a case study, subtle changes in the active site of a set of evolutionarily related superfamilies are studied using these networks. Tracing of such similarities in evolutionarily related proteins provides clues into the transition and evolution of protein functions. Insights from this study will be helpful in accurate and reliable functional annotations of uncharacterized proteins, poly-pharmacology, and designing enzymes with new functional capabilities. Proteins 2017; 85:1319-1335. © 2017 Wiley Periodicals, Inc.
[Analysis of virulence factors of Porphyromonas endodontalis based on comparative proteomics technique].

PubMed

Li, H; Ji, H; Wu, S S; Hou, B X

2016-12-09

Objective: To analyze the protein expression profile and the potential virulence factors of Porphyromonas endodontalis (Pe) via comparison with that of two strains of Porphyromonas gingivalis (Pg) with high and low virulence, respectively. Methods: Whole cell comparative proteomics of Pe ATCC35406 was examined and compared with that of the high-virulence strain Pg W83 and the low-virulence strain Pg ATCC33277, respectively. Isobaric tags for relative and absolute quantitation (iTRAQ) combined with nano liquid chromatography-tandem mass spectrometry (Nano-LC-MS/MS) were adopted to identify and quantitate the proteins of Pe and the two strains of Pg with different virulence by using isotopically labeled peptides, mass spectrometric detection and bioinformatics analysis. The biological functions of similar proteins expressed by Pe ATCC35406 and the two strains of Pg were quantified and analyzed. Results: In total, 1 210 proteins were identified when Pe was compared with Pg W83. There were 130 proteins (10.74% of the total proteins) expressed similarly, including 89 proteins of known function and 41 proteins of unknown function. In total, 1 223 proteins were identified when Pe was compared with Pg ATCC33277. There were 110 proteins (8.99% of the total proteins) expressed similarly, including 72 proteins of known function and 38 proteins of unknown function. The similarly expressed proteins in Pe and the Pg strains of different virulence were mainly associated with catalytic activity and binding function, including recombination activation gene (RagA), lipoprotein, chaperonin Dnak, Clp family proteins (ClpC and ClpX) and various iron-binding proteins. They were involved in metabolism and cellular processes. In addition, the type and number of similar virulence proteins between Pe and high-virulence Pg were higher than those between Pe and low-virulence Pg. Conclusions: Lipoprotein, oxygen resistance protein and iron-binding protein were probably the potential virulence factors of Pe ATCC35406. 
It was speculated that the pathogenicity of Pe is more similar to that of the high-virulence Pg strain than to that of the low-virulence strain.
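The percentages quoted in the Results can be checked directly from the counts given in the abstract. The short sketch below only reproduces that arithmetic; the function name is made up for the example.

```python
def percent_shared(shared, total):
    """Percentage of identified proteins that were expressed similarly."""
    return round(100 * shared / total, 2)


# Counts taken from the abstract.
w83_shared = 89 + 41        # known-function + unknown-function proteins, Pe vs Pg W83
atcc_shared = 72 + 38       # known-function + unknown-function proteins, Pe vs Pg ATCC33277
w83_percent = percent_shared(w83_shared, 1210)
atcc_percent = percent_shared(atcc_shared, 1223)
```

The subtotals add up to the reported 130 and 110 proteins, and the rounded ratios agree with the 10.74% and 8.99% stated in the text.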
Comparison of topological clustering within protein networks using edge metrics that evaluate full sequence, full structure, and active site microenvironment similarity

PubMed Central

Leuthaeuser, Janelle B; Knutson, Stacy T; Kumar, Kiran; Babbitt, Patricia C; Fetrow, Jacquelyn S

2015-01-01

The development of accurate protein function annotation methods has emerged as a major unsolved biological problem. Protein similarity networks, one approach to function annotation via annotation transfer, group proteins into similarity-based clusters. An underlying assumption is that the edge metric used to identify such clusters correlates with functional information. In this contribution, this assumption is evaluated by observing topologies in similarity networks using three different edge metrics: sequence (BLAST), structure (TM-Align), and active site similarity (active site profiling, implemented in DASP). Network topologies for four well-studied protein superfamilies (enolase, peroxiredoxin (Prx), glutathione transferase (GST), and crotonase) were compared with curated functional hierarchies and structure. As expected, network topology differs, depending on edge metric; comparison of topologies provides valuable information on structure/function relationships. Subnetworks based on active site similarity correlate with known functional hierarchies at a single edge threshold more often than sequence- or structure-based networks. Sequence- and structure-based networks are useful for identifying sequence and domain similarities and differences; therefore, it is important to consider the clustering goal before deciding appropriate edge metric. Further, conserved active site residues identified in enolase and GST active site subnetworks correspond with published functionally important residues. Extension of this analysis yields predictions of functionally determinant residues for GST subgroups. These results support the hypothesis that active site similarity-based networks reveal clusters that share functional details and lay the foundation for capturing functionally relevant hierarchies using an approach that is both automatable and can deliver greater precision in function annotation than current similarity-based methods. PMID:26073648
Investigating Correlation between Protein Sequence Similarity and Semantic Similarity Using Gene Ontology Annotations.

PubMed

Ikram, Najmul; Qadir, Muhammad Abdul; Afzal, Muhammad Tanvir

2018-01-01

Sequence similarity is a commonly used measure to compare proteins. With the increasing use of ontologies, semantic (function) similarity is gaining importance. The correlation between these measures has been applied in the evaluation of new semantic similarity methods, and in protein function prediction. In this research, we investigate the relationship between the two similarity methods. The results suggest the absence of a strong correlation between sequence and semantic similarities. There is a large number of proteins with low sequence similarity and high semantic similarity. We observe that Pearson's correlation coefficient is not sufficient to explain the nature of this relationship. Interestingly, term semantic similarity values above 0 and below 1 do not seem to play a role in improving the correlation. That is, the correlation coefficient depends only on the number of common GO terms in the proteins under comparison, and the semantic similarity measurement method does not influence it. Semantic similarity and sequence similarity behave differently. These findings have significant implications for future work on protein comparison, and will help in better understanding the semantic similarity between proteins.
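The comparison described here can be sketched with two small functions: a term-overlap score standing in for semantic similarity, and Pearson's correlation coefficient between that score and sequence similarity. The Jaccard overlap is an assumption made for illustration (the abstract does not name the measures it tested), and the term identifiers and sequence identity values are placeholders, not real annotations.

```python
import math


def go_overlap(terms_a, terms_b):
    """Share of annotation terms common to both proteins (Jaccard index)."""
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Placeholder protein pairs: (sequence identity, terms of protein 1, terms of protein 2).
pairs = [
    (0.95, {"T1", "T2"}, {"T1", "T2"}),
    (0.20, {"T1", "T2"}, {"T1", "T2"}),   # low sequence, high semantic similarity
    (0.25, {"T3"}, {"T4"}),
    (0.60, {"T1", "T5"}, {"T1", "T6"}),
]
sequence_sim = [identity for identity, _, _ in pairs]
semantic_sim = [go_overlap(a, b) for _, a, b in pairs]
r = pearson(sequence_sim, semantic_sim)
```

A single pair with low sequence identity but identical annotations, like the second one above, is enough to pull the coefficient well below 1, which is why one summary number can hide the structure of the relationship.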
Space-related pharma-motifs for fast search of protein binding motifs and polypharmacological targets

PubMed Central

2012-01-01

Background Discovering a compound that inhibits multiple proteins (i.e. polypharmacological targets) is a new paradigm for treating complex diseases (e.g. cancers and diabetes). In general, polypharmacological proteins often share similar local binding environments and motifs. With the exponential growth of the number of protein structures, finding similar structural binding motifs (pharma-motifs) is an urgent task for drug discovery (e.g. side effects and new uses for old drugs) and for understanding protein functions. Results We have developed a Space-Related Pharmamotifs (called SRPmotif) method to recognize the binding motifs by searching against a protein structure database. SRPmotif is able to recognize conserved binding environments containing spatially discontinuous pharma-motifs, which are often short conserved peptides with specific physico-chemical properties for protein functions. Among 356 pharma-motifs, 56.5% of interacting residues are highly conserved. Experimental results indicate that 81.1% and 92.7% of polypharmacological targets of each protein-ligand complex are annotated with the same biological process (BP) and molecular function (MF) terms, respectively, based on Gene Ontology (GO). Our experimental results show that the identified pharma-motifs often consist of key residues in functional (active) sites and play key roles in protein functions. The SRPmotif is available at http://gemdock.life.nctu.edu.tw/SRP/. Conclusions SRPmotif is able to identify similar pharma-interfaces and pharma-motifs sharing similar binding environments for polypharmacological targets by rapidly searching against the protein structure database. Pharma-motifs describe the conservation of binding environments for drug discovery and protein functions. Additionally, these pharma-motifs provide clues for discovering new sequence-based motifs to predict protein functions from protein sequence databases. We believe that SRPmotif is useful for elucidating protein functions and drug discovery. 
PMID:23281852
Space-related pharma-motifs for fast search of protein binding motifs and polypharmacological targets.

PubMed

Chiu, Yi-Yuan; Lin, Chun-Yu; Lin, Chih-Ta; Hsu, Kai-Cheng; Chang, Li-Zen; Yang, Jinn-Moon

2012-01-01

Discovering a compound that inhibits multiple proteins (i.e. polypharmacological targets) is a new paradigm for treating complex diseases (e.g. cancers and diabetes). In general, polypharmacological proteins often share similar local binding environments and motifs. With the exponential growth of the number of protein structures, finding similar structural binding motifs (pharma-motifs) is an urgent task for drug discovery (e.g. side effects and new uses for old drugs) and for understanding protein functions. We have developed a Space-Related Pharmamotifs (called SRPmotif) method to recognize the binding motifs by searching against a protein structure database. SRPmotif is able to recognize conserved binding environments containing spatially discontinuous pharma-motifs, which are often short conserved peptides with specific physico-chemical properties for protein functions. Among 356 pharma-motifs, 56.5% of interacting residues are highly conserved. Experimental results indicate that 81.1% and 92.7% of polypharmacological targets of each protein-ligand complex are annotated with the same biological process (BP) and molecular function (MF) terms, respectively, based on Gene Ontology (GO). Our experimental results show that the identified pharma-motifs often consist of key residues in functional (active) sites and play key roles in protein functions. The SRPmotif is available at http://gemdock.life.nctu.edu.tw/SRP/. SRPmotif is able to identify similar pharma-interfaces and pharma-motifs sharing similar binding environments for polypharmacological targets by rapidly searching against the protein structure database. Pharma-motifs describe the conservation of binding environments for drug discovery and protein functions. Additionally, these pharma-motifs provide clues for discovering new sequence-based motifs to predict protein functions from protein sequence databases. We believe that SRPmotif is useful for elucidating protein functions and drug discovery.
Protein Function Prediction: Problems and Pitfalls.

PubMed

Pearson, William R

2015-09-03

The characterization of new genomes based on their protein sets has been revolutionized by new sequencing technologies, but biologists seeking to exploit new sequence information are often frustrated by the challenges associated with accurately assigning biological functions to newly identified proteins. Here, we highlight some of the challenges in functional inference from sequence similarity. Investigators can improve the accuracy of function prediction by (1) being conservative about the evolutionary distance to a protein of known function; (2) considering the ambiguous meaning of "functional similarity," and (3) being aware of the limitations of annotations in functional databases. Protein function prediction does not offer "one-size-fits-all" solutions. Prediction strategies work better when the idiosyncrasies of function and functional annotation are better understood. Copyright © 2015 John Wiley & Sons, Inc.
PaperBLAST: Text Mining Papers for Information about Homologs.

PubMed

Price, Morgan N; Arkin, Adam P

2017-01-01

Large-scale genome sequencing has identified millions of protein-coding genes whose function is unknown. Many of these proteins are similar to characterized proteins from other organisms, but much of this information is missing from annotation databases and is hidden in the scientific literature. To make this information accessible, PaperBLAST uses EuropePMC to search the full text of scientific articles for references to genes. PaperBLAST also takes advantage of curated resources (Swiss-Prot, GeneRIF, and EcoCyc) that link protein sequences to scientific articles. PaperBLAST's database includes over 700,000 scientific articles that mention over 400,000 different proteins. Given a protein of interest, PaperBLAST quickly finds similar proteins that are discussed in the literature and presents snippets of text from relevant articles or from the curators. PaperBLAST is available at http://papers.genomics.lbl.gov/. IMPORTANCE With the recent explosion of genome sequencing data, there are now millions of uncharacterized proteins. If a scientist becomes interested in one of these proteins, it can be very difficult to find information as to its likely function. Often a protein whose sequence is similar, and which is likely to have a similar function, has been studied already, but this information is not available in any database. To help find articles about similar proteins, PaperBLAST searches the full text of scientific articles for protein identifiers or gene identifiers, and it links these articles to protein sequences. Then, given a protein of interest, it can quickly find similar proteins in its database by using standard software (BLAST), and it can show snippets of text from relevant papers. We hope that PaperBLAST will make it easier for biologists to predict proteins' functions.
PaperBLAST: Text Mining Papers for Information about Homologs

DOE PAGES

Price, Morgan N.; Arkin, Adam P.

2017-08-15

Large-scale genome sequencing has identified millions of protein-coding genes whose function is unknown. Many of these proteins are similar to characterized proteins from other organisms, but much of this information is missing from annotation databases and is hidden in the scientific literature. To make this information accessible, PaperBLAST uses EuropePMC to search the full text of scientific articles for references to genes. PaperBLAST also takes advantage of curated resources (Swiss-Prot, GeneRIF, and EcoCyc) that link protein sequences to scientific articles. PaperBLAST’s database includes over 700,000 scientific articles that mention over 400,000 different proteins. Given a protein of interest, PaperBLAST quickly finds similar proteins that are discussed in the literature and presents snippets of text from relevant articles or from the curators. With the recent explosion of genome sequencing data, there are now millions of uncharacterized proteins. If a scientist becomes interested in one of these proteins, it can be very difficult to find information as to its likely function. Often a protein whose sequence is similar, and which is likely to have a similar function, has been studied already, but this information is not available in any database. To help find articles about similar proteins, PaperBLAST searches the full text of scientific articles for protein identifiers or gene identifiers, and it links these articles to protein sequences. Then, given a protein of interest, it can quickly find similar proteins in its database by using standard software (BLAST), and it can show snippets of text from relevant papers. We hope that PaperBLAST will make it easier for biologists to predict proteins’ functions.
PaperBLAST: Text Mining Papers for Information about Homologs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Price, Morgan N.; Arkin, Adam P.

Large-scale genome sequencing has identified millions of protein-coding genes whose function is unknown. Many of these proteins are similar to characterized proteins from other organisms, but much of this information is missing from annotation databases and is hidden in the scientific literature. To make this information accessible, PaperBLAST uses EuropePMC to search the full text of scientific articles for references to genes. PaperBLAST also takes advantage of curated resources (Swiss-Prot, GeneRIF, and EcoCyc) that link protein sequences to scientific articles. PaperBLAST’s database includes over 700,000 scientific articles that mention over 400,000 different proteins. Given a protein of interest, PaperBLAST quickly finds similar proteins that are discussed in the literature and presents snippets of text from relevant articles or from the curators. With the recent explosion of genome sequencing data, there are now millions of uncharacterized proteins. If a scientist becomes interested in one of these proteins, it can be very difficult to find information as to its likely function. Often a protein whose sequence is similar, and which is likely to have a similar function, has been studied already, but this information is not available in any database. To help find articles about similar proteins, PaperBLAST searches the full text of scientific articles for protein identifiers or gene identifiers, and it links these articles to protein sequences. Then, given a protein of interest, it can quickly find similar proteins in its database by using standard software (BLAST), and it can show snippets of text from relevant papers. We hope that PaperBLAST will make it easier for biologists to predict proteins’ functions.
PaperBLAST: Text Mining Papers for Information about Homologs

PubMed Central

Price, Morgan N.; Arkin, Adam P.

2017-01-01

ABSTRACT Large-scale genome sequencing has identified millions of protein-coding genes whose function is unknown. Many of these proteins are similar to characterized proteins from other organisms, but much of this information is missing from annotation databases and is hidden in the scientific literature. To make this information accessible, PaperBLAST uses EuropePMC to search the full text of scientific articles for references to genes. PaperBLAST also takes advantage of curated resources (Swiss-Prot, GeneRIF, and EcoCyc) that link protein sequences to scientific articles. PaperBLAST’s database includes over 700,000 scientific articles that mention over 400,000 different proteins. Given a protein of interest, PaperBLAST quickly finds similar proteins that are discussed in the literature and presents snippets of text from relevant articles or from the curators. PaperBLAST is available at http://papers.genomics.lbl.gov/. IMPORTANCE With the recent explosion of genome sequencing data, there are now millions of uncharacterized proteins. If a scientist becomes interested in one of these proteins, it can be very difficult to find information as to its likely function. Often a protein whose sequence is similar, and which is likely to have a similar function, has been studied already, but this information is not available in any database. To help find articles about similar proteins, PaperBLAST searches the full text of scientific articles for protein identifiers or gene identifiers, and it links these articles to protein sequences. Then, given a protein of interest, it can quickly find similar proteins in its database by using standard software (BLAST), and it can show snippets of text from relevant papers. We hope that PaperBLAST will make it easier for biologists to predict proteins’ functions. PMID:28845458

Visualizing and Clustering Protein Similarity Networks: Sequences, Structures, and Functions.

PubMed

Mai, Te-Lun; Hu, Geng-Ming; Chen, Chi-Ming

2016-07-01

Research in the past decade has demonstrated the usefulness of protein network knowledge in furthering the study of molecular evolution of proteins, understanding the robustness of cells to perturbation, and annotating new protein functions. In this study, we aimed to provide a general clustering approach to visualize the sequence-structure-function relationship of protein networks, and investigate possible causes for inconsistency in the protein classifications based on sequences, structures, and functions. Such visualization of protein networks could facilitate our understanding of the overall relationship among proteins and help researchers comprehend various protein databases. As a demonstration, we clustered 1437 enzymes by their sequences and structures using the minimum span clustering (MSC) method. The general structure of this protein network was delineated at two clustering resolutions, and the second level MSC clustering was found to be highly similar to existing enzyme classifications. The clusterings of these enzymes based on sequence, structure, and function information are consistent with one another. For proteases, the Jaccard's similarity coefficient is 0.86 between sequence and function classifications, 0.82 between sequence and structure classifications, and 0.78 between structure and function classifications. From our clustering results, we discussed possible examples of divergent evolution and convergent evolution of enzymes. Our clustering approach provides a panoramic view of the sequence-structure-function network of proteins, helps visualize the relation between related proteins intuitively, and is useful in predicting the structure and function of newly determined protein sequences.
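The abstract above reports Jaccard coefficients between pairs of classifications but does not say how they were computed. The following is a minimal Python sketch of one common pair-counting formulation, offered as an assumption and not as the authors' procedure: two proteins count as "together" in a classification if they share a label, and the coefficient is the overlap of the two sets of co-classified pairs. The protein names and labels are made-up toy data.

```python
from itertools import combinations


def co_clustered_pairs(labels):
    """Return the set of item pairs that share a label in one classification."""
    return {
        (a, b)
        for a, b in combinations(sorted(labels), 2)
        if labels[a] == labels[b]
    }


def jaccard_between_classifications(labels_a, labels_b):
    """Pair-counting Jaccard coefficient between two classifications.

    Both arguments map the same item names to a class label.
    """
    pairs_a = co_clustered_pairs(labels_a)
    pairs_b = co_clustered_pairs(labels_b)
    union = pairs_a | pairs_b
    if not union:
        # Neither classification groups any two items together.
        return 1.0
    return len(pairs_a & pairs_b) / len(union)


# Hypothetical toy data: four proteins classified two different ways.
by_sequence = {"P1": "s1", "P2": "s1", "P3": "s1", "P4": "s2"}
by_function = {"P1": "f1", "P2": "f1", "P3": "f2", "P4": "f2"}

score = jaccard_between_classifications(by_sequence, by_function)
print(score)
```

In this toy case the sequence grouping yields three co-classified pairs and the function grouping yields two, with one pair in common, so the coefficient is 1/4.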
A new definition and properties of the similarity value between two protein structures.

PubMed

Saberi Fathi, S M

2016-10-01

Knowledge regarding the 3D structure of a protein provides useful information about the protein's functional properties. In particular, structural similarity between proteins can be used as a good predictor of functional similarity. One method that uses the 3D geometrical structure of proteins in order to compare them is the similarity value (SV). In this paper, we introduce a new definition of the SV measure for comparing two proteins. To this end, we consider the mass of the protein's atoms and concentrate on the number of the protein's atoms to be compared. This defines a new measure, called the weighted similarity value (WSV), adding physical properties to geometrical properties. We also show that our results are in good agreement with the results obtained by TM-score and DaliLite. WSV can be of use in protein classification and in drug discovery.
Predicting protein-protein interactions on a proteome scale by matching evolutionary and structural similarities at interfaces using PRISM.

PubMed

Tuncbag, Nurcan; Gursoy, Attila; Nussinov, Ruth; Keskin, Ozlem

2011-08-11

Prediction of protein-protein interactions at the structural level on the proteome scale is important because it allows prediction of protein function, helps drug discovery and takes steps toward genome-wide structural systems biology. We provide a protocol (termed PRISM, protein interactions by structural matching) for large-scale prediction of protein-protein interactions and assembly of protein complex structures. The method consists of two components: rigid-body structural comparisons of target proteins to known template protein-protein interfaces and flexible refinement using a docking energy function. The PRISM rationale follows our observation that globally different protein structures can interact via similar architectural motifs. PRISM predicts binding residues by using structural similarity and evolutionary conservation of putative binding residue 'hot spots'. Ultimately, PRISM could help to construct cellular pathways and functional, proteome-scale annotation. PRISM is implemented in Python and runs in a UNIX environment. The program accepts Protein Data Bank-formatted protein structures and is available at http://prism.ccbb.ku.edu.tr/prism_protocol/.
Quantitative protein localization signatures reveal an association between spatial and functional divergences of proteins.

PubMed

Loo, Lit-Hsin; Laksameethanasan, Danai; Tung, Yi-Ling

2014-03-01

Protein subcellular localization is a major determinant of protein function. However, this important protein feature is often described in terms of discrete and qualitative categories of subcellular compartments, and therefore it has limited applications in quantitative protein function analyses. Here, we present Protein Localization Analysis and Search Tools (PLAST), an automated analysis framework for constructing and comparing quantitative signatures of protein subcellular localization patterns based on microscopy images. PLAST produces human-interpretable protein localization maps that quantitatively describe the similarities in the localization patterns of proteins and major subcellular compartments, without requiring manual assignment or supervised learning of these compartments. Using the budding yeast Saccharomyces cerevisiae as a model system, we show that PLAST is more accurate than existing, qualitative protein localization annotations in identifying known co-localized proteins. Furthermore, we demonstrate that PLAST can reveal protein localization-function relationships that are not obvious from these annotations. First, we identified proteins that have similar localization patterns and participate in closely-related biological processes, but do not necessarily form stable complexes with each other or localize at the same organelles. Second, we found an association between spatial and functional divergences of proteins during evolution. Surprisingly, as proteins with common ancestors evolve, they tend to develop more diverged subcellular localization patterns, but still occupy similar numbers of compartments. This suggests that divergence of protein localization might be more frequently due to the development of more specific localization patterns over ancestral compartments than the occupation of new compartments. 
PLAST enables systematic and quantitative analyses of protein localization-function relationships, and will be useful to elucidate protein functions and how these functions were acquired in cells from different organisms or species. A public web interface of PLAST is available at http://plast.bii.a-star.edu.sg.
Quantitative Protein Localization Signatures Reveal an Association between Spatial and Functional Divergences of Proteins

PubMed Central

Loo, Lit-Hsin; Laksameethanasan, Danai; Tung, Yi-Ling

2014-01-01

Protein subcellular localization is a major determinant of protein function. However, this important protein feature is often described in terms of discrete and qualitative categories of subcellular compartments, and therefore it has limited applications in quantitative protein function analyses. Here, we present Protein Localization Analysis and Search Tools (PLAST), an automated analysis framework for constructing and comparing quantitative signatures of protein subcellular localization patterns based on microscopy images. PLAST produces human-interpretable protein localization maps that quantitatively describe the similarities in the localization patterns of proteins and major subcellular compartments, without requiring manual assignment or supervised learning of these compartments. Using the budding yeast Saccharomyces cerevisiae as a model system, we show that PLAST is more accurate than existing, qualitative protein localization annotations in identifying known co-localized proteins. Furthermore, we demonstrate that PLAST can reveal protein localization-function relationships that are not obvious from these annotations. First, we identified proteins that have similar localization patterns and participate in closely-related biological processes, but do not necessarily form stable complexes with each other or localize at the same organelles. Second, we found an association between spatial and functional divergences of proteins during evolution. Surprisingly, as proteins with common ancestors evolve, they tend to develop more diverged subcellular localization patterns, but still occupy similar numbers of compartments. This suggests that divergence of protein localization might be more frequently due to the development of more specific localization patterns over ancestral compartments than the occupation of new compartments. 
PLAST enables systematic and quantitative analyses of protein localization-function relationships, and will be useful to elucidate protein functions and how these functions were acquired in cells from different organisms or species. A public web interface of PLAST is available at http://plast.bii.a-star.edu.sg. PMID:24603469
A new method to improve network topological similarity search: applied to fold recognition

PubMed Central

Lhota, John; Hauptman, Ruth; Hart, Thomas; Ng, Clara; Xie, Lei

2015-01-01

Motivation: Similarity search is the foundation of bioinformatics. It plays a key role in establishing structural, functional and evolutionary relationships between biological sequences. Although the power of the similarity search has increased steadily in recent years, a high percentage of sequences remain uncharacterized in the protein universe. Thus, new similarity search strategies are needed to efficiently and reliably infer the structure and function of new sequences. The existing paradigm for studying protein sequence, structure, function and evolution has been established based on the assumption that the protein universe is discrete and hierarchical. Cumulative evidence suggests that the protein universe is continuous. As a result, conventional sequence homology search methods may not be able to detect novel structural, functional and evolutionary relationships between proteins from weak and noisy sequence signals. To overcome the limitations in existing similarity search methods, we propose a new algorithmic framework, Enrichment of Network Topological Similarity (ENTS), to improve the performance of large scale similarity searches in bioinformatics. Results: We apply ENTS to a challenging unsolved problem: protein fold recognition. Our rigorous benchmark studies demonstrate that ENTS considerably outperforms state-of-the-art methods. As the concept of ENTS can be applied to any similarity metric, it may provide a general framework for similarity search on any set of biological entities, given their representation as a network. Availability and implementation: Source code freely available upon request. Contact: lxie@iscb.org. PMID:25717198
PDB-UF: database of predicted enzymatic functions for unannotated protein structures from structural genomics.

PubMed

von Grotthuss, Marcin; Plewczynski, Dariusz; Ginalski, Krzysztof; Rychlewski, Leszek; Shakhnovich, Eugene I

2006-02-06

The number of protein structures from structural genomics centers is increasing dramatically in the Protein Data Bank (PDB). Many of these structures are functionally unannotated because they have no sequence similarity to proteins of known function. However, it is possible to successfully infer function using only structural similarity. Here we present the PDB-UF database, a web-accessible collection of predictions of enzymatic properties using structure-function relationships. The assignments were conducted for three-dimensional protein structures of unknown function that come from structural genomics initiatives. We show that 4 hypothetical proteins (with PDB accession codes: 1VH0, 1NS5, 1O6D, and 1TO0), for which standard BLAST tools such as PSI-BLAST or RPS-BLAST failed to assign any function, are probably methyltransferase enzymes. We suggest that the structure-based prediction of an EC number should use a different similarity score cutoff for different protein folds. Moreover, performing the annotation using two different algorithms can reduce the rate of false positive assignments. We believe that the presented web-based repository will help to decrease the number of protein structures that have functions marked as "unknown" in the PDB file. Available at http://paradox.harvard.edu/PDB-UF and http://bioinfo.pl/PDB-UF.
Pattern similarity study of functional sites in protein sequences: lysozymes and cystatins

PubMed Central

Nakai, Shuryo; Li-Chan, Eunice CY; Dou, Jinglie

2005-01-01

Background: Although it is generally agreed that topography is more conserved than sequences, proteins sharing the same fold can have different functions, while there are protein families with low sequence similarity. An alternative method for profile analysis of characteristic conserved positions of the motifs within the 3D structures may be needed for functional annotation of protein sequences. Using the approach of quantitative structure-activity relationships (QSAR), we have proposed a new algorithm for postulating functional mechanisms on the basis of pattern similarity and average of property values of side-chains in segments within sequences. This approach was used to search for functional sites of proteins belonging to the lysozyme and cystatin families. Results: Hydrophobicity and β-turn propensity of reference segments with 3–7 residues were used for the homology similarity search (HSS) for active sites. Hydrogen bonding was used as the side-chain property for searching the binding sites of lysozymes. The profiles of similarity constants and average values of these parameters as functions of their positions in the sequences could identify both active and substrate binding sites of the lysozyme of Streptomyces coelicolor, which has been reported as a new fold enzyme (Cellosyl). The same approach was successfully applied to cystatins, especially for postulating the mechanisms of amyloidosis of human cystatin C as well as human lysozyme. Conclusion: Pattern similarity and average index values of structure-related properties of side chains in short segments of three residues or longer were, for the first time, successfully applied for predicting functional sites in sequences. This new approach may be applicable to studying functional sites in un-annotated proteins, for which complete 3D structures are not yet available. PMID:15904486
Computational mining for hypothetical patterns of amino acid side chains in protein data bank (PDB)

NASA Astrophysics Data System (ADS)

Ghani, Nur Syatila Ab; Firdaus-Raih, Mohd

2018-04-01

The three-dimensional structure of a protein can provide insights regarding its function. Functional relationships between proteins can be inferred from fold and sequence similarities. In certain cases, sequence or fold comparison fails to establish homology between proteins with similar mechanisms. Since structure is more conserved than sequence, a constellation of functional residues can be similarly arranged among proteins of similar mechanism. Local structural similarity searches are able to detect such constellations of amino acids among distinct proteins, which can be useful for annotating proteins of unknown function. Detection of such patterns of amino acids on a large scale can increase the repertoire of important 3D motifs, since the 3D motifs currently known cannot keep pace with the ever-increasing number of uncharacterized proteins to be annotated. Here, a computational platform for automated detection of 3D motifs is described. A fuzzy-pattern searching algorithm derived from IMagine an Amino Acid 3D Arrangement search EnGINE (IMAAAGINE) was implemented to develop an automated method for searching for hypothetical patterns of amino acid side chains in the Protein Data Bank (PDB), without the need for prior knowledge of the related sequence or structure of the pattern of interest. We present an example of the searches, which is the detection of a hypothetical pattern derived from the known C2H2 structural motif of zinc fingers. The conservation of particular patterns of amino acid side chains in unrelated proteins is highlighted. This approach can act as a complementary method for available structure- and sequence-based platforms and may contribute to improving functional association between proteins.
Structural and Sequence Similarity Makes a Significant Impact on Machine-Learning-Based Scoring Functions for Protein-Ligand Interactions.

PubMed

Li, Yang; Yang, Jianyi

2017-04-24

The prediction of protein-ligand binding affinity has recently been improved remarkably by machine-learning-based scoring functions. For example, using a set of simple descriptors representing the atomic distance counts, the RF-Score improves the Pearson correlation coefficient to about 0.8 on the core set of the PDBbind 2007 database, which is significantly higher than the performance of any conventional scoring function on the same benchmark. A few studies have discussed the performance of machine-learning-based methods, but the reason for this improvement remains unclear. In this study, by systematically controlling the structural and sequence similarity between the training and test proteins of the PDBbind benchmark, we demonstrate that protein structural and sequence similarity makes a significant impact on machine-learning-based methods. After removal of training proteins that are highly similar to the test proteins identified by structure alignment and sequence alignment, machine-learning-based methods trained on the new training sets no longer outperform the conventional scoring functions. On the contrary, the performance of conventional functions like X-Score is relatively stable no matter what training data are used to fit the weights of its energy terms.
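The similarity control described in the abstract above (dropping training proteins that are too similar to any test protein) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name, the precomputed similarity table, the identifiers, and the cutoff values are all hypothetical, and a real study would obtain similarities from structure and sequence alignment tools.

```python
def filter_training_set(train_ids, test_ids, similarity, cutoff):
    """Keep training proteins whose best match in the test set is <= cutoff.

    `similarity` maps (train_id, test_id) pairs to a score in [0, 1];
    pairs missing from the table are treated as dissimilar (score 0.0).
    """
    kept = []
    for train_id in train_ids:
        best = max(
            (similarity.get((train_id, test_id), 0.0) for test_id in test_ids),
            default=0.0,
        )
        if best <= cutoff:
            kept.append(train_id)
    return kept


# Hypothetical precomputed similarity scores (made-up values).
similarity = {
    ("train1", "test1"): 0.95,
    ("train2", "test1"): 0.40,
    ("train3", "test2"): 0.70,
}
train_set = ["train1", "train2", "train3", "train4"]
test_set = ["test1", "test2"]

strict = filter_training_set(train_set, test_set, similarity, cutoff=0.5)
loose = filter_training_set(train_set, test_set, similarity, cutoff=0.9)
print(strict, loose)
```

Retraining a scoring function on the training sets obtained at progressively stricter cutoffs, and re-evaluating on the fixed test set, is one way to see how much of a reported gain depends on near-duplicates between the two sets.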
An improved method for functional similarity analysis of genes based on Gene Ontology.

PubMed

Tian, Zhen; Wang, Chunyu; Guo, Maozu; Liu, Xiaoyan; Teng, Zhixia

2016-12-23

Measures of gene functional similarity are essential tools for gene clustering, gene function prediction, evaluation of protein-protein interaction, disease gene prioritization and other applications. In recent years, many gene functional similarity methods have been proposed based on the semantic similarity of GO terms. However, these leading approaches may make error-prone judgments, especially when they measure the specificity of GO terms as well as the IC of a term set. Therefore, how to estimate the gene functional similarity reliably is still a challenging problem. We propose WIS, an effective method to measure the gene functional similarity. First of all, WIS computes the IC of a term by employing its depth, the number of its ancestors as well as the topology of its descendants in the GO graph. Secondly, WIS calculates the IC of a term set by means of considering the weighted inherited semantics of terms. Finally, WIS estimates the gene functional similarity based on the IC overlap ratio of term sets. WIS is superior to some other representative measures on the experiments of functional classification of genes in a biological pathway, collaborative evaluation of GO-based semantic similarity measures, protein-protein interaction prediction and correlation with gene expression. Further analysis suggests that WIS fully takes into account the specificity of terms and the weighted inherited semantics of terms between GO terms. The proposed WIS method is an effective and reliable way to compare gene function. The web service of WIS is freely available at http://nclab.hit.edu.cn/WIS/ .
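The final step described above, a similarity based on the information content (IC) overlap ratio of two term sets, can be illustrated with a generic Python sketch. This is not the WIS method: WIS derives term IC from depth, ancestors and descendant topology and uses weighted inherited semantics for term sets, none of which is reproduced here. The sketch simply sums per-term IC over shared terms and divides by the sum over all terms of either gene. The term identifiers and IC values are placeholders, not real GO data.

```python
def ic_overlap_ratio(terms_a, terms_b, ic):
    """Sum of IC over shared terms divided by sum of IC over all terms.

    `terms_a` and `terms_b` are sets of term identifiers annotated to two
    genes; `ic` maps each term identifier to a non-negative IC value.
    """
    shared = terms_a & terms_b
    union = terms_a | terms_b
    total = sum(ic[term] for term in union)
    if total == 0:
        # No informative annotations on either gene.
        return 0.0
    return sum(ic[term] for term in shared) / total


# Placeholder term identifiers with made-up IC values.
ic = {"term_a": 1.0, "term_b": 2.0, "term_c": 3.0, "term_d": 4.0}
gene1_terms = {"term_a", "term_b", "term_c"}
gene2_terms = {"term_b", "term_c", "term_d"}

similarity_score = ic_overlap_ratio(gene1_terms, gene2_terms, ic)
print(similarity_score)
```

Here the shared terms carry IC 2.0 + 3.0 and the union carries 10.0, giving a ratio of 0.5. Weighting by IC means that sharing a specific term counts for more than sharing a generic one.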
Functional Evolution of PLP-dependent Enzymes based on Active-Site Structural Similarities

PubMed Central

Catazaro, Jonathan; Caprez, Adam; Guru, Ashu; Swanson, David; Powers, Robert

2014-01-01

Families of distantly related proteins typically have very low sequence identity, which hinders evolutionary analysis and functional annotation. Slowly evolving features of proteins, such as an active site, are therefore valuable for annotating putative and distantly related proteins. To date, a complete evolutionary analysis of the functional relationship of an entire enzyme family based on active-site structural similarities has not yet been undertaken. Pyridoxal-5’-phosphate (PLP) dependent enzymes are primordial enzymes that diversified in the last universal ancestor. Using the Comparison of Protein Active Site Structures (CPASS) software and database, we show that the active site structures of PLP-dependent enzymes can be used to infer evolutionary relationships based on functional similarity. The enzymes successfully clustered together based on substrate specificity, function, and three-dimensional fold. This study demonstrates the value of using active site structures for functional evolutionary analysis and the effectiveness of CPASS. PMID:24920327
Functional evolution of PLP-dependent enzymes based on active-site structural similarities.

PubMed

Catazaro, Jonathan; Caprez, Adam; Guru, Ashu; Swanson, David; Powers, Robert

2014-10-01

Families of distantly related proteins typically have very low sequence identity, which hinders evolutionary analysis and functional annotation. Slowly evolving features of proteins, such as an active site, are therefore valuable for annotating putative and distantly related proteins. To date, a complete evolutionary analysis of the functional relationship of an entire enzyme family based on active-site structural similarities has not yet been undertaken. Pyridoxal-5'-phosphate (PLP) dependent enzymes are primordial enzymes that diversified in the last universal ancestor. Using the comparison of protein active site structures (CPASS) software and database, we show that the active site structures of PLP-dependent enzymes can be used to infer evolutionary relationships based on functional similarity. The enzymes successfully clustered together based on substrate specificity, function, and three-dimensional-fold. This study demonstrates the value of using active site structures for functional evolutionary analysis and the effectiveness of CPASS. © 2014 Wiley Periodicals, Inc.
Interrogation of Mammalian Protein Complex Structure, Function, and Membership Using Genome-Scale Fitness Screens.

PubMed

Pan, Joshua; Meyers, Robin M; Michel, Brittany C; Mashtalir, Nazar; Sizemore, Ann E; Wells, Jonathan N; Cassel, Seth H; Vazquez, Francisca; Weir, Barbara A; Hahn, William C; Marsh, Joseph A; Tsherniak, Aviad; Kadoch, Cigall

2018-05-23

Protein complexes are assemblies of subunits that have co-evolved to execute one or many coordinated functions in the cellular environment. Functional annotation of mammalian protein complexes is critical to understanding biological processes, as well as disease mechanisms. Here, we used genetic co-essentiality derived from genome-scale RNAi- and CRISPR-Cas9-based fitness screens performed across hundreds of human cancer cell lines to assign measures of functional similarity. From these measures, we systematically built and characterized functional similarity networks that recapitulate known structural and functional features of well-studied protein complexes and resolve novel functional modules within complexes lacking structural resolution, such as the mammalian SWI/SNF complex. Finally, by integrating functional networks with large protein-protein interaction networks, we discovered novel protein complexes involving recently evolved genes of unknown function. Taken together, these findings demonstrate the utility of genetic perturbation screens alone, and in combination with large-scale biophysical data, to enhance our understanding of mammalian protein complexes in normal and disease states. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
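The idea of using co-essentiality as a functional similarity measure can be sketched as follows: genes whose fitness effects rise and fall together across cell lines are scored as similar, and high-scoring pairs become edges in a network. The Python below is a minimal illustration, assuming plain Pearson correlation and an arbitrary edge threshold; the abstract does not specify the statistic used, and the gene names, fitness values, and threshold are made up.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical fitness scores: one value per cell line for each gene.
fitness = {
    "GENE_A": [-1.0, -0.8, 0.1, 0.0],
    "GENE_B": [-0.9, -0.7, 0.2, 0.1],
    "GENE_C": [0.1, 0.0, -1.0, -0.9],
}

# Pairwise co-essentiality scores.
co_essentiality = {
    (g1, g2): pearson(fitness[g1], fitness[g2])
    for g1, g2 in combinations(sorted(fitness), 2)
}

# Keep strongly correlated pairs as edges of a functional similarity network.
threshold = 0.5  # arbitrary illustrative cutoff
edges = [pair for pair, r in co_essentiality.items() if r >= threshold]
print(edges)
```

In the toy data GENE_A and GENE_B share a dependency profile and are linked, while GENE_C, which is essential in a different set of cell lines, is not.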
A topological approach for protein classification

DOE PAGES

Cang, Zixuan; Mu, Lin; Wu, Kedi; ...

2015-11-04

Here, protein function and dynamics are closely related to its sequence and structure. However, prediction of protein function and dynamics from its sequence and structure is still a fundamental challenge in molecular biology. Protein classification, which is typically done through measuring the similarity between proteins based on protein sequence or physical information, serves as a crucial step toward the understanding of protein function and dynamics.
A topological approach for protein classification

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cang, Zixuan; Mu, Lin; Wu, Kedi

Here, protein function and dynamics are closely related to its sequence and structure. However, prediction of protein function and dynamics from its sequence and structure is still a fundamental challenge in molecular biology. Protein classification, which is typically done through measuring the similarity between proteins based on protein sequence or physical information, serves as a crucial step toward the understanding of protein function and dynamics.
Composite Structural Motifs of Binding Sites for Delineating Biological Functions of Proteins

PubMed Central

Kinjo, Akira R.; Nakamura, Haruki

2012-01-01

Most biological processes are described as a series of interactions between proteins and other molecules, and interactions are in turn described in terms of atomic structures. To annotate protein functions as sets of interaction states at atomic resolution, and thereby to better understand the relation between protein interactions and biological functions, we conducted exhaustive all-against-all atomic structure comparisons of all known binding sites for ligands including small molecules, proteins and nucleic acids, and identified recurring elementary motifs. By integrating the elementary motifs associated with each subunit, we defined composite motifs that represent context-dependent combinations of elementary motifs. It is demonstrated that function similarity can be better inferred from composite motif similarity compared to the similarity of protein sequences or of individual binding sites. By integrating the composite motifs associated with each protein function, we define meta-composite motifs each of which is regarded as a time-independent diagrammatic representation of a biological process. It is shown that meta-composite motifs provide richer annotations of biological processes than sequence clusters. The present results serve as a basis for bridging atomic structures to higher-order biological phenomena by classification and integration of binding site structures. PMID:22347478
Bacterial interactomes: Interacting protein partners share similar function and are validated in independent assays more frequently than previously reported.

DOE PAGES

Shatsky, Maxim; Allen, Simon; Gold, Barbara; ...

2016-05-01

Numerous affinity purification – mass-spectrometry (AP-MS) and yeast two hybrid (Y2H) screens have each defined thousands of pairwise protein-protein interactions (PPIs), most between functionally unrelated proteins. The accuracy of these networks, however, is under debate. Here we present an AP-MS survey of the bacterium Desulfovibrio vulgaris together with a critical reanalysis of nine published bacterial Y2H and AP-MS screens. We have identified 459 high confidence PPIs from D. vulgaris and 391 from Escherichia coli. Compared to the nine published interactomes, our two networks are smaller; are much less highly connected; have significantly lower false discovery rates; and are much more enriched in protein pairs that are encoded in the same operon, have similar functions, and are reproducibly detected in other physical interaction assays. Lastly, our work establishes more stringent benchmarks for the properties of protein interactomes and suggests that bona fide PPIs much more frequently involve protein partners that are annotated with similar functions or that can be validated in independent assays than earlier studies suggested.
Analysis of functional redundancies within the Arabidopsis TCP transcription factor family.

PubMed

Danisman, Selahattin; van Dijk, Aalt D J; Bimbo, Andrea; van der Wal, Froukje; Hennig, Lars; de Folter, Stefan; Angenent, Gerco C; Immink, Richard G H

2013-12-01

Analyses of the functions of TEOSINTE-LIKE1, CYCLOIDEA, and PROLIFERATING CELL FACTOR1 (TCP) transcription factors have been hampered by functional redundancy between its individual members. In general, putative functionally redundant genes are predicted based on sequence similarity and confirmed by genetic analysis. In the TCP family, however, identification is impeded by relatively low overall sequence similarity. In a search for functionally redundant TCP pairs that control Arabidopsis leaf development, this work performed an integrative bioinformatics analysis, combining protein sequence similarities, gene expression data, and results of pair-wise protein-protein interaction studies for the 24 members of the Arabidopsis TCP transcription factor family. For this, the work completed any lacking gene expression and protein-protein interaction data experimentally and then performed a comprehensive prediction of potential functional redundant TCP pairs. Subsequently, redundant functions could be confirmed for selected predicted TCP pairs by genetic and molecular analyses. It is demonstrated that the previously uncharacterized class I TCP19 gene plays a role in the control of leaf senescence in a redundant fashion with TCP20. Altogether, this work shows the power of combining classical genetic and molecular approaches with bioinformatics predictions to unravel functional redundancies in the TCP transcription factor family.
On the Regularities of the Polar Profiles of Proteins Related to Ebola Virus Infection and their Functional Domains.

PubMed

Polanco, Carlos; Samaniego Mendoza, José Lino; Buhse, Thomas; Uversky, Vladimir N; Bañuelos Chao, Ingrid Paola; Bañuelos Cedano, Marcela Angola; Tavera, Fernando Michel; Tavera, Daniel Michel; Falconi, Manuel; Ponce de León, Abelardo Vela

2018-03-06

The number of fatalities and economic losses caused by the Ebola virus infection across the planet culminated in the havoc that occurred between August and November 2014. However, little is known about the molecular protein profile of this devastating virus. This work represents a thorough bioinformatics analysis of the regularities of charge distribution (polar profiles) in two groups of proteins and their functional domains associated with Ebola virus disease: Ebola virus proteins and human proteins interacting with Ebola virus. Our analysis reveals that a fragment, named the "functional domain", exists in each of these proteins, with a polar profile similar to that of the protein that contains it. Each protein is formed by a group of short sub-sequences, where each fragment has a different and distinctive polar profile and where the polar profile between adjacent short sub-sequences changes in an orderly and gradual manner to coincide with the polar profile of the whole protein. When using the charge distribution as a metric, it was observed that it effectively discriminates the proteins from their functional domains. As a counterexample, the same test was applied to a set of synthetic proteins built for that purpose, revealing that none of the regularities reported here for the Ebola virus proteins and human proteins interacting with Ebola virus were present in the synthetic proteins. Our results indicate that the polar profile of each protein studied and its corresponding functional domain are similar. Thus, when building each protein from its functional domain (adding one amino acid at a time and plotting its polar profile each time), it was observed that the resulting graphs can be divided into groups with similar polar profiles.

Binding ligand prediction for proteins using partial matching of local surface patches.

PubMed

Sael, Lee; Kihara, Daisuke

2010-01-01

Functional elucidation of uncharacterized protein structures is an important task in bioinformatics. We report our new approach for structure-based function prediction which captures local surface features of ligand binding pockets. Function of proteins, specifically, binding ligands of proteins, can be predicted by finding similar local surface regions of known proteins. To enable partial comparison of binding sites in proteins, a weighted bipartite matching algorithm is used to match pairs of surface patches. The surface patches are encoded with the 3D Zernike descriptors. Unlike the existing methods which compare global characteristics of the protein fold or the global pocket shape, the local surface patch method can find functional similarity between non-homologous proteins and binding pockets for flexible ligand molecules. The proposed method improves prediction results over global pocket shape-based method which was previously developed by our group.
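The partial matching step described above can be illustrated with a minimal sketch. It assumes each surface patch is already encoded as a numeric descriptor vector and that the matching weight is the Euclidean distance between descriptors; both choices, the function name `best_patch_matching`, and the brute-force search are illustrative assumptions, not the authors' implementation.

```python
from itertools import permutations
from math import dist


def best_patch_matching(patches_a, patches_b):
    """Minimum-cost one-to-one matching of every patch in A to a patch in B.

    Brute force over all assignments, so only suitable for a handful of
    patches. The cost of a pair is the Euclidean distance between the two
    descriptor vectors (an assumption made for this sketch).
    """
    if len(patches_a) > len(patches_b):
        raise ValueError("patches_a must not be larger than patches_b")
    best_cost, best_pairs = float("inf"), []
    # Each permutation assigns patch i of A to patch perm[i] of B.
    for perm in permutations(range(len(patches_b)), len(patches_a)):
        cost = sum(dist(patches_a[i], patches_b[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_pairs = cost, list(enumerate(perm))
    return best_cost, best_pairs
```

Because the smaller pocket only has to be covered once, unmatched patches in the larger pocket are simply ignored, which is what makes the comparison partial. For realistic patch counts a polynomial-time assignment algorithm would replace the brute-force loop.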
Binding Ligand Prediction for Proteins Using Partial Matching of Local Surface Patches

PubMed Central

Sael, Lee; Kihara, Daisuke

2010-01-01

Functional elucidation of uncharacterized protein structures is an important task in bioinformatics. We report our new approach for structure-based function prediction which captures local surface features of ligand binding pockets. Function of proteins, specifically, binding ligands of proteins, can be predicted by finding similar local surface regions of known proteins. To enable partial comparison of binding sites in proteins, a weighted bipartite matching algorithm is used to match pairs of surface patches. The surface patches are encoded with the 3D Zernike descriptors. Unlike the existing methods which compare global characteristics of the protein fold or the global pocket shape, the local surface patch method can find functional similarity between non-homologous proteins and binding pockets for flexible ligand molecules. The proposed method improves prediction results over global pocket shape-based method which was previously developed by our group. PMID:21614188
Identification of functional candidates amongst hypothetical proteins of Mycobacterium leprae Br4923, a causative agent of leprosy.

PubMed

Naqvi, Ahmad Abu Turab; Ahmad, Faizan; Hassan, Md Imtaiyaz

2015-01-01

Mycobacterium leprae is an intracellular obligate parasite that causes leprosy in humans, and it leads to the destruction of peripheral nerves and skin deformation. Here, we report an extensive analysis of the hypothetical proteins (HPs) from M. leprae strain Br4923, assigning their functions to better understand the mechanism of pathogenesis and to search for potential therapeutic interventions. The genome of M. leprae encodes 1604 proteins, of which the functions of 632 are not known (HPs). In this paper, we predicted the probable functions of 312 HPs. First, we classified all HPs into families and subfamilies on the basis of sequence similarity, followed by domain assignment, which provides many clues for their possible function. However, the functions of 320 proteins were not predicted because of low sequence similarity with proteins of known function. Annotated HPs were categorized into enzymes, binding proteins, transporters, and proteins involved in cellular processes. We found several novel proteins whose functions were unknown for M. leprae. These proteins have a requisite association with bacterial virulence and pathogenicity. Finally, our sequence-based analysis will be helpful for further validation and the search for potential drug targets while developing effective drugs to cure leprosy.
Analysis of sequence repeats of proteins in the PDB.

PubMed

Mary Rajathei, David; Selvaraj, Samuel

2013-12-01

Internal repeats in protein sequences play a significant role in the evolution of protein structure and function. Applications of different bioinformatics tools help in the identification and characterization of these repeats. In the present study, we analyzed sequence repeats in a non-redundant set of proteins available in the Protein Data Bank (PDB). We used RADAR for detecting internal repeats in a protein, PDBeFOLD for assessing structural similarity, PDBsum for finding functional involvement and Pfam for domain assignment of the repeats in a protein. Through the analysis of sequence repeats, we found that the identity of the sequence repeats falls in the range of 20-40% and that the superimposed structures of most of the sequence repeats maintain a similar overall fold. Analysis of sequence repeats at the functional level reveals that most of the sequence repeats are involved in the function of the protein through functionally involved residues in the repeat regions. We also found that sequence repeats in single and two domain proteins often contained conserved sequence motifs for the function of the domain. Copyright © 2013 Elsevier Ltd. All rights reserved.
SIMAP—the database of all-against-all protein sequence similarities and annotations with new interfaces and increased coverage

PubMed Central

Arnold, Roland; Goldenberg, Florian; Mewes, Hans-Werner; Rattei, Thomas

2014-01-01

The Similarity Matrix of Proteins (SIMAP, http://mips.gsf.de/simap/) database has been designed to massively accelerate computationally expensive protein sequence analysis tasks in bioinformatics. It provides pre-calculated sequence similarities interconnecting the entire known protein sequence universe, complemented by pre-calculated protein features and domains, similarity clusters and functional annotations. SIMAP covers all major public protein databases as well as many consistently re-annotated metagenomes from different repositories. As of September 2013, SIMAP contains >163 million proteins corresponding to ∼70 million non-redundant sequences. SIMAP uses the sensitive FASTA search heuristics, the Smith–Waterman alignment algorithm, the InterPro database of protein domain models and the BLAST2GO functional annotation algorithm. SIMAP assists biologists by facilitating the interactive exploration of the protein sequence universe. Web-Service and DAS interfaces allow connecting SIMAP with any other bioinformatic tool and resource. All-against-all protein sequence similarity matrices of project-specific protein collections are generated on request. Recent improvements allow SIMAP to cover the rapidly growing sequenced protein sequence universe. New Web-Service interfaces enhance the connectivity of SIMAP. Novel tools for interactive extraction of protein similarity networks have been added. Open access to SIMAP is provided through the web portal; the portal also contains instructions and links for software access and flat file downloads. PMID:24165881
Isofunctional Protein Subfamily Detection Using Data Integration and Spectral Clustering.

PubMed

Boari de Lima, Elisa; Meira, Wagner; Melo-Minardi, Raquel Cardoso de

2016-06-01

As increasingly more genomes are sequenced, the vast majority of proteins may only be annotated computationally, given experimental investigation is extremely costly. This highlights the need for computational methods to determine protein functions quickly and reliably. We believe dividing a protein family into subtypes which share specific functions uncommon to the whole family reduces the function annotation problem's complexity. Hence, this work's purpose is to detect isofunctional subfamilies inside a family of unknown function, while identifying differentiating residues. Similarity between protein pairs according to various properties is interpreted as functional similarity evidence. Data are integrated using genetic programming and provided to a spectral clustering algorithm, which creates clusters of similar proteins. The proposed framework was applied to well-known protein families and to a family of unknown function, then compared to ASMC. Results showed our fully automated technique obtained better clusters than ASMC for two families, besides equivalent results for other two, including one whose clusters were manually defined. Clusters produced by our framework showed great correspondence with the known subfamilies, besides being more contrasting than those produced by ASMC. Additionally, for the families whose specificity determining positions are known, such residues were among those our technique considered most important to differentiate a given group. When run with the crotonase and enolase SFLD superfamilies, the results showed great agreement with this gold-standard. Best results consistently involved multiple data types, thus confirming our hypothesis that similarities according to different knowledge domains may be used as functional similarity evidence. 
Our main contributions are the proposed strategy for selecting and integrating data types, along with the ability to work with noisy and incomplete data; domain knowledge usage for detecting subfamilies in a family with different specificities, thus reducing the complexity of the experimental function characterization problem; and the identification of residues responsible for specificity.
Isofunctional Protein Subfamily Detection Using Data Integration and Spectral Clustering

PubMed Central

Boari de Lima, Elisa; Meira, Wagner; de Melo-Minardi, Raquel Cardoso

2016-01-01

As increasingly more genomes are sequenced, the vast majority of proteins may only be annotated computationally, given experimental investigation is extremely costly. This highlights the need for computational methods to determine protein functions quickly and reliably. We believe dividing a protein family into subtypes which share specific functions uncommon to the whole family reduces the function annotation problem’s complexity. Hence, this work’s purpose is to detect isofunctional subfamilies inside a family of unknown function, while identifying differentiating residues. Similarity between protein pairs according to various properties is interpreted as functional similarity evidence. Data are integrated using genetic programming and provided to a spectral clustering algorithm, which creates clusters of similar proteins. The proposed framework was applied to well-known protein families and to a family of unknown function, then compared to ASMC. Results showed our fully automated technique obtained better clusters than ASMC for two families, besides equivalent results for other two, including one whose clusters were manually defined. Clusters produced by our framework showed great correspondence with the known subfamilies, besides being more contrasting than those produced by ASMC. Additionally, for the families whose specificity determining positions are known, such residues were among those our technique considered most important to differentiate a given group. When run with the crotonase and enolase SFLD superfamilies, the results showed great agreement with this gold-standard. Best results consistently involved multiple data types, thus confirming our hypothesis that similarities according to different knowledge domains may be used as functional similarity evidence. 
Our main contributions are the proposed strategy for selecting and integrating data types, along with the ability to work with noisy and incomplete data; domain knowledge usage for detecting subfamilies in a family with different specificities, thus reducing the complexity of the experimental function characterization problem; and the identification of residues responsible for specificity. PMID:27348631
Efficient protein structure search using indexing methods

PubMed Central

2013-01-01

Understanding functions of proteins is one of the most important challenges in many studies of biological processes. The function of a protein can be predicted by analyzing the functions of structurally similar proteins, thus finding structurally similar proteins accurately and efficiently from a large set of proteins is crucial. A protein structure can be represented as a vector by 3D-Zernike Descriptor (3DZD) which compactly represents the surface shape of the protein tertiary structure. This simplified representation accelerates the searching process. However, computing the similarity of two protein structures is still computationally expensive, thus it is hard to efficiently process many simultaneous requests of structurally similar protein search. This paper proposes indexing techniques which substantially reduce the search time to find structurally similar proteins. In particular, we first exploit two indexing techniques, i.e., iDistance and iKernel, on the 3DZDs. After that, we extend the techniques to further improve the search speed for protein structures. The extended indexing techniques build and utilize a reduced index constructed from the first few attributes of 3DZDs of protein structures. To retrieve top-k similar structures, top-10 × k similar structures are first found using the reduced index, and top-k structures are selected among them. We also modify the indexing techniques to support θ-based nearest neighbor search, which returns data points whose distance to the query point is less than θ. The results show that both iDistance and iKernel significantly enhance the searching speed. In top-k nearest neighbor search, the searching time is reduced by 69.6%, 77%, 77.4% and 87.9% using iDistance, iKernel, the extended iDistance, and the extended iKernel, respectively. In θ-based nearest neighbor search, the searching time is reduced by 80%, 81%, 95.6% and 95.6% using iDistance, iKernel, the extended iDistance, and the extended iKernel, respectively. PMID:23691543
Efficient protein structure search using indexing methods.

PubMed

Kim, Sungchul; Sael, Lee; Yu, Hwanjo

2013-01-01

Understanding functions of proteins is one of the most important challenges in many studies of biological processes. The function of a protein can be predicted by analyzing the functions of structurally similar proteins, thus finding structurally similar proteins accurately and efficiently from a large set of proteins is crucial. A protein structure can be represented as a vector by 3D-Zernike Descriptor (3DZD) which compactly represents the surface shape of the protein tertiary structure. This simplified representation accelerates the searching process. However, computing the similarity of two protein structures is still computationally expensive, thus it is hard to efficiently process many simultaneous requests of structurally similar protein search. This paper proposes indexing techniques which substantially reduce the search time to find structurally similar proteins. In particular, we first exploit two indexing techniques, i.e., iDistance and iKernel, on the 3DZDs. After that, we extend the techniques to further improve the search speed for protein structures. The extended indexing techniques build and utilize a reduced index constructed from the first few attributes of 3DZDs of protein structures. To retrieve top-k similar structures, top-10 × k similar structures are first found using the reduced index, and top-k structures are selected among them. We also modify the indexing techniques to support θ-based nearest neighbor search, which returns data points whose distance to the query point is less than θ. The results show that both iDistance and iKernel significantly enhance the searching speed. In top-k nearest neighbor search, the searching time is reduced by 69.6%, 77%, 77.4% and 87.9% using iDistance, iKernel, the extended iDistance, and the extended iKernel, respectively. In θ-based nearest neighbor search, the searching time is reduced by 80%, 81%, 95.6% and 95.6% using iDistance, iKernel, the extended iDistance, and the extended iKernel, respectively.
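The two-stage top-k retrieval described in this abstract can be sketched as follows. This is only an illustration of the filter-then-refine idea: a linear scan over the first few attributes stands in for the reduced iDistance/iKernel index, Euclidean distance is assumed as the dissimilarity, and the names `top_k_two_stage`, `reduced_dims`, and `factor` are hypothetical.

```python
import heapq
from math import dist


def top_k_two_stage(query, database, k, reduced_dims=3, factor=10):
    """Return indices of the k database vectors closest to the query.

    Stage 1 keeps the factor * k candidates that are closest when only the
    first reduced_dims attributes are compared (a stand-in for the reduced
    index). Stage 2 re-ranks those candidates with the full vectors.
    """
    candidates = heapq.nsmallest(
        factor * k,
        range(len(database)),
        key=lambda i: dist(query[:reduced_dims], database[i][:reduced_dims]),
    )
    return heapq.nsmallest(k, candidates, key=lambda i: dist(query, database[i]))
```

The result is approximate in general: a true neighbor is missed whenever it falls outside the first-stage candidate set, which is why the candidate set is taken larger than k (10 × k in the abstract).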
simDEF: definition-based semantic similarity measure of gene ontology terms for functional similarity analysis of genes.

PubMed

Pesaranghader, Ahmad; Matwin, Stan; Sokolova, Marina; Beiko, Robert G

2016-05-01

Measures of protein functional similarity are essential tools for function prediction, evaluation of protein-protein interactions (PPIs) and other applications. Several existing methods perform comparisons between proteins based on the semantic similarity of their GO terms; however, these measures are highly sensitive to modifications in the topological structure of GO, tend to be focused on specific analytical tasks and concentrate on the GO terms themselves rather than considering their textual definitions. We introduce simDEF, an efficient method for measuring semantic similarity of GO terms using their GO definitions, which is based on the Gloss Vector measure commonly used in natural language processing. The simDEF approach builds optimized definition vectors for all relevant GO terms, and expresses the similarity of a pair of proteins as the cosine of the angle between their definition vectors. Relative to existing similarity measures, when validated on a yeast reference database, simDEF improves correlation with sequence homology by up to 50%, shows a correlation improvement >4% with gene expression in the biological process hierarchy of GO and increases PPI predictability by >2.5% in F1 score for molecular function hierarchy. Datasets, results and source code are available at http://kiwi.cs.dal.ca/Software/simDEF. Contact: ahmad.pgh@dal.ca or beiko@cs.dal.ca. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved.
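The cosine measure mentioned in this abstract is straightforward to write down. The sketch below assumes the definition vectors are already available as equal-length lists of numbers; how simDEF builds and optimizes those vectors is not shown, and the function name is illustrative.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # The angle is undefined for a zero vector; treat it as no similarity.
        return 0.0
    return dot / (norm_u * norm_v)
```

The value depends only on the direction of the vectors, not their length, so two definitions that use the same words in the same proportions score 1 regardless of how long they are.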
A graph-based semantic similarity measure for the gene ontology.

PubMed

Alvarez, Marco A; Yan, Changhui

2011-12-01

Existing methods for calculating semantic similarities between pairs of Gene Ontology (GO) terms and gene products often rely on external databases like Gene Ontology Annotation (GOA) that annotate gene products using the GO terms. This dependency leads to some limitations in real applications. Here, we present a semantic similarity algorithm (SSA) that relies exclusively on the GO. When calculating the semantic similarity between a pair of input GO terms, SSA takes into account the shortest path between them, the depth of their nearest common ancestor, and a novel similarity score calculated between the definitions of the involved GO terms. In our work, we use SSA to calculate semantic similarities between pairs of proteins by combining pairwise semantic similarities between the GO terms that annotate the involved proteins. The reliability of SSA was evaluated by comparing the resulting semantic similarities between proteins with the functional similarities between proteins derived from expert annotations or sequence similarity. Comparisons with existing state-of-the-art methods showed that SSA is highly competitive with the other methods. SSA provides a reliable measure of semantic similarity independent of external databases of functional-annotation observations.
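Two of the graph quantities named in this abstract, the shortest path between two terms and the depth of their common ancestor, can be computed on a toy ontology as sketched below. The ontology, the term names, and the choice of the deepest shared ancestor as the "nearest" one are assumptions made for illustration; the abstract does not say how SSA combines these quantities, so no combined score is shown.

```python
from collections import deque

# Hypothetical toy ontology (child -> parents); these are not real GO terms.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "dna_binding": ["binding"],
    "rna_binding": ["binding"],
}


def ancestors(term):
    """Return the term itself plus every term reachable via parent links."""
    seen = {term}
    queue = deque([term])
    while queue:
        for parent in PARENTS[queue.popleft()]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def depth(term):
    """Length of the shortest parent chain from the term up to a root."""
    parents = PARENTS[term]
    return 0 if not parents else 1 + min(depth(p) for p in parents)


def deepest_common_ancestor(a, b):
    """Shared ancestor with the greatest depth (assumed meaning of 'nearest')."""
    return max(ancestors(a) & ancestors(b), key=depth)


def shortest_path_length(a, b):
    """Breadth-first search over parent/child links treated as undirected."""
    neighbours = {term: set(parents) for term, parents in PARENTS.items()}
    for child, parents in PARENTS.items():
        for parent in parents:
            neighbours[parent].add(child)
    distance = {a: 0}
    queue = deque([a])
    while queue:
        current = queue.popleft()
        if current == b:
            return distance[current]
        for nxt in neighbours[current]:
            if nxt not in distance:
                distance[nxt] = distance[current] + 1
                queue.append(nxt)
    return None  # the two terms are not connected
```

In this toy graph, sibling terms under a specific parent have a short path and a deep common ancestor, while terms that only meet at the root have a longer path and a common ancestor of depth 0.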
Evolutionary turnover of kinetochore proteins: a ship of Theseus?

PubMed Central

Drinnenberg, Ines A.; Henikoff, Steven; Malik, Harmit S.

2016-01-01

Summary: The kinetochore is a multi-protein complex that mediates the attachment of a eukaryotic chromosome to the mitotic spindle. The protein composition of kinetochores is similar across species as divergent as yeast and human. However, recent findings have revealed an unexpected degree of compositional diversity in kinetochores. For example, kinetochore proteins that are essential in some species have been lost in others, whereas new kinetochore proteins have emerged in other lineages. Even in lineages with similar kinetochore composition, individual kinetochore proteins have functionally diverged to acquire either essential or redundant roles. Thus, despite functional conservation, the repertoire of kinetochore proteins has undergone recurrent evolutionary turnover. PMID:26877204
Quality assessment of protein model-structures based on structural and functional similarities.

PubMed

Konopka, Bogumil M; Nebel, Jean-Christophe; Kotulska, Malgorzata

2012-09-21

Experimental determination of protein 3D structures is expensive, time consuming and sometimes impossible. The gap between the number of protein structures deposited in the World Wide Protein Data Bank and the number of sequenced proteins is constantly widening. Computational modeling is deemed to be one of the ways to deal with the problem. Although protein 3D structure prediction is a difficult task, many tools are available. These tools can model a structure from a sequence or from partial structural information, e.g. contact maps. Consequently, biologists are able to automatically generate a putative 3D structure model of any protein. However, the main issue becomes evaluation of the model quality, which is one of the most important challenges of structural biology. GOBA (Gene Ontology-Based Assessment) is a novel Protein Model Quality Assessment Program. It estimates the compatibility between a model-structure and its expected function. GOBA is based on the assumption that a high quality model is expected to be structurally similar to proteins functionally similar to the prediction target. Whereas DALI is used to measure structure similarity, protein functional similarity is quantified using the standardized and hierarchical description of proteins provided by Gene Ontology combined with Wang's algorithm for calculating semantic similarity. Two approaches are proposed to express the quality of protein model-structures. One is a single model quality assessment method, the other is its modification, which provides a relative measure of model quality. Exhaustive evaluation is performed on data sets of model-structures submitted to the CASP8 and CASP9 contests. The validation shows that the method is able to discriminate between good and bad model-structures. The best of the tested GOBA scores achieved 0.74 and 0.8 as a mean Pearson correlation to the observed quality of models in our CASP8 and CASP9-based validation sets. 
GOBA also obtained the best result for two targets of CASP8, and one of CASP9, compared to the contest participants. Consequently, GOBA offers a novel single model quality assessment program that addresses the practical needs of biologists. In conjunction with other Model Quality Assessment Programs (MQAPs), it would prove useful for the evaluation of single protein models.
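The Pearson correlation used above to compare predicted scores with observed model quality can be computed as in the following sketch. The function name and input layout (two equal-length lists, one of predicted scores and one of observed quality values for the same models) are assumptions; no GOBA or CASP data are reproduced here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)
```

A mean Pearson correlation, as reported in the abstract, would be obtained by computing this value separately for each prediction target and averaging the results.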
Predicting Functions of Proteins in Mouse Based on Weighted Protein-Protein Interaction Network and Protein Hybrid Properties

PubMed Central

Shi, Xiaohe; Lu, Wen-Cong; Cai, Yu-Dong; Chou, Kuo-Chen

2011-01-01

Background: With the huge amount of uncharacterized protein sequences generated in the post-genomic age, it is highly desirable to develop effective computational methods for quickly and accurately predicting their functions. The information thus obtained would be very useful for both basic research and drug development in a timely manner. Methodology/Principal Findings: Although many efforts have been made in this regard, most of them were based on either sequence similarity or protein-protein interaction (PPI) information. However, the former often fails to work if a query protein has no or very little sequence similarity to any function-known proteins, while the latter has a similar problem if the relevant PPI information is not available. In view of this, a new approach is proposed by hybridizing the PPI information and the biochemical/physicochemical features of protein sequences. The overall first-order success rates by the new predictor for the functions of mouse proteins on training set and test set were 69.1% and 70.2%, respectively, and the success rate covered by the results of the top-4 order from a total of 24 orders was 65.2%. Conclusions/Significance: The results indicate that the new approach is quite promising and may open a new avenue or direction for addressing the difficult and complicated problem. PMID:21283518
Adsorption and conformations of lysozyme and α-lactalbumin at a water-octane interface

NASA Astrophysics Data System (ADS)

Cheung, David L.

2017-11-01

As proteins contain both hydrophobic and hydrophilic amino acids, they will readily adsorb onto interfaces between water and hydrophobic fluids such as oil. This adsorption normally causes changes in the protein structure, which can result in loss of protein function and irreversible adsorption, leading to the formation of protein interfacial films. While this can be advantageous in some applications (e.g., food technology), in most cases it limits our ability to exploit protein functionality at interfaces. To understand and control protein interfacial adsorption and function, it is necessary to understand the microscopic conformation of proteins at liquid interfaces. In this paper, molecular dynamics simulations are used to investigate the adsorption and conformation of two similar proteins, lysozyme and α-lactalbumin, at a water-octane interface. While they both adsorb onto the interface, α-lactalbumin does so in a specific orientation, mediated by two amphipathic helices, while lysozyme adsorbs in a non-specific manner. Using replica exchange simulations, both proteins are found to possess a number of distinct interfacial conformations, with compact states similar to the solution conformation being most common for both proteins. Decomposing the different contributions to the protein energy at oil-water interfaces suggests that conformational change for α-lactalbumin, unlike lysozyme, is driven by favourable protein-oil interactions. Revealing these differences between the factors that govern the conformational change at interfaces in otherwise similar proteins can give insight into the control of protein interfacial adsorption, aggregation, and function.
Bacterial Interactomes: Interacting Protein Partners Share Similar Function and Are Validated in Independent Assays More Frequently Than Previously Reported*

PubMed Central

Shatsky, Maxim; Allen, Simon; Gold, Barbara L.; Liu, Nancy L.; Juba, Thomas R.; Reveco, Sonia A.; Elias, Dwayne A.; Prathapam, Ramadevi; He, Jennifer; Yang, Wenhong; Szakal, Evelin D.; Liu, Haichuan; Singer, Mary E.; Geller, Jil T.; Lam, Bonita R.; Saini, Avneesh; Trotter, Valentine V.; Hall, Steven C.; Fisher, Susan J.; Brenner, Steven E.; Chhabra, Swapnil R.; Hazen, Terry C.; Wall, Judy D.; Witkowska, H. Ewa; Biggin, Mark D.; Chandonia, John-Marc; Butland, Gareth

2016-01-01

Numerous affinity purification-mass spectrometry (AP-MS) and yeast two-hybrid screens have each defined thousands of pairwise protein-protein interactions (PPIs), most of which are between functionally unrelated proteins. The accuracy of these networks, however, is under debate. Here, we present an AP-MS survey of the bacterium Desulfovibrio vulgaris together with a critical reanalysis of nine published bacterial yeast two-hybrid and AP-MS screens. We have identified 459 high confidence PPIs from D. vulgaris and 391 from Escherichia coli. Compared with the nine published interactomes, our two networks are smaller, are much less highly connected, and have significantly lower false discovery rates. In addition, our interactomes are much more enriched in protein pairs that are encoded in the same operon, have similar functions, and are reproducibly detected in other physical interaction assays than the pairs reported in prior studies. Our work establishes more stringent benchmarks for the properties of protein interactomes and suggests that bona fide PPIs much more frequently involve protein partners that are annotated with similar functions or that can be validated in independent assays than earlier studies suggested. PMID:26873250
Dissimilar sweet proteins from plants: oddities or normal components?

PubMed

Picone, Delia; Temussi, Piero Andrea

2012-10-01

The fruits of a few tropical plants contain intensely sweet proteins. Their common property points to a protein family. Generally, proteins belonging to the same family share similar folds, similar sequences and, at least in part, similar function but sweet proteins constitute an exception to this rule. Apart from sharing the rather unusual taste function, they show no obvious similarities either in their sequences or in three-dimensional structures. In this review we describe the nature, structure and mechanism of action of the best known sweet tasting proteins, including two taste modifying proteins. Sweet proteins stand out among sweet molecules because their volume is not compatible with an interaction with orthosteric active sites of the sweet taste receptor. The best explanation of their mechanism of action is the interaction with the external surface of the sweet taste receptor, according to a model that has been named "wedge model". It is hypothesized that this mode of action may be related to the ability of other members of their protein families to inhibit different enzymes. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Sequence-similar, structure-dissimilar protein pairs in the PDB.

PubMed

Kosloff, Mickey; Kolodny, Rachel

2008-05-01

It is often assumed that in the Protein Data Bank (PDB), two proteins with similar sequences will also have similar structures. Accordingly, it has proved useful to develop subsets of the PDB from which "redundant" structures have been removed, based on a sequence-based criterion for similarity. Similarly, when predicting protein structure using homology modeling, if a template structure for modeling a target sequence is selected by sequence alone, this implicitly assumes that all sequence-similar templates are equivalent. Here, we show that this assumption is often not correct and that standard approaches to create subsets of the PDB can lead to the loss of structurally and functionally important information. We have carried out sequence-based structural superpositions and geometry-based structural alignments of a large number of protein pairs to determine the extent to which sequence similarity ensures structural similarity. We find many examples where two proteins that are similar in sequence have structures that differ significantly from one another. The source of the structural differences usually has a functional basis. The number of such protein pairs that are identified and the magnitude of the dissimilarity depend on the approach that is used to calculate the differences; in particular, sequence-based structure superpositioning will identify a larger number of structurally dissimilar pairs than geometry-based structural alignments. When two sequences can be aligned in a statistically meaningful way, sequence-based structural superpositioning provides a meaningful measure of structural differences. This approach and geometry-based structure alignments reveal somewhat different information and one or the other might be preferable in a given application. Our results suggest that in some cases, notably homology modeling, the common use of nonredundant datasets, culled from the PDB based on sequence, may mask important structural and functional information. We have established a database of sequence-similar, structurally dissimilar protein pairs that will help address this problem (http://luna.bioc.columbia.edu/rachel/seqsimstrdiff.htm).
The crystal structure of Erwinia amylovora AmyR, a member of the YbjN protein family, shows similarity to type III secretion chaperones but suggests different cellular functions

PubMed Central

Bartho, Joseph D.; Bellini, Dom; Wuerges, Jochen; Demitri, Nicola; Toccafondi, Mirco; Schmitt, Armin O.; Zhao, Youfu; Walsh, Martin A.

2017-01-01

AmyR is a stress and virulence associated protein from the plant pathogenic Enterobacteriaceae species Erwinia amylovora, and is a functionally conserved ortholog of YbjN from Escherichia coli. The crystal structure of E. amylovora AmyR reveals a class I type III secretion chaperone-like fold, despite the lack of sequence similarity between these two classes of protein and lacking any evidence of a secretion-associated role. The results indicate that AmyR, and YbjN proteins in general, function through protein-protein interactions without any enzymatic action. The YbjN proteins of Enterobacteriaceae show remarkably low sequence similarity with other members of the YbjN protein family in Eubacteria, yet a high level of structural conservation is observed. Across the YbjN protein family sequence conservation is limited to residues stabilising the protein core and dimerization interface, while interacting regions are only conserved between closely related species. This study presents the first structure of a YbjN protein from Enterobacteriaceae, the most highly divergent and well-studied subgroup of YbjN proteins, and an in-depth sequence and structural analysis of this important but poorly understood protein family. PMID:28426806
The crystal structure of Erwinia amylovora AmyR, a member of the YbjN protein family, shows similarity to type III secretion chaperones but suggests different cellular functions.

PubMed

Bartho, Joseph D; Bellini, Dom; Wuerges, Jochen; Demitri, Nicola; Toccafondi, Mirco; Schmitt, Armin O; Zhao, Youfu; Walsh, Martin A; Benini, Stefano

2017-01-01

AmyR is a stress and virulence associated protein from the plant pathogenic Enterobacteriaceae species Erwinia amylovora, and is a functionally conserved ortholog of YbjN from Escherichia coli. The crystal structure of E. amylovora AmyR reveals a class I type III secretion chaperone-like fold, despite the lack of sequence similarity between these two classes of protein and lacking any evidence of a secretion-associated role. The results indicate that AmyR, and YbjN proteins in general, function through protein-protein interactions without any enzymatic action. The YbjN proteins of Enterobacteriaceae show remarkably low sequence similarity with other members of the YbjN protein family in Eubacteria, yet a high level of structural conservation is observed. Across the YbjN protein family sequence conservation is limited to residues stabilising the protein core and dimerization interface, while interacting regions are only conserved between closely related species. This study presents the first structure of a YbjN protein from Enterobacteriaceae, the most highly divergent and well-studied subgroup of YbjN proteins, and an in-depth sequence and structural analysis of this important but poorly understood protein family.

Exhaustive comparison and classification of ligand-binding surfaces in proteins

PubMed Central

Murakami, Yoichi; Kinoshita, Kengo; Kinjo, Akira R; Nakamura, Haruki

2013-01-01

Many proteins function by interacting with other small molecules (ligands). Identification of ligand-binding sites (LBS) in proteins can therefore help to infer their molecular functions. A comprehensive comparison among local structures of LBSs was previously performed, in order to understand their relationships and to classify their structural motifs. However, a similar exhaustive comparison among local surfaces of LBSs (patches) has never been performed, due to computational complexity. To enhance our understanding of LBSs, it is worth performing such comparisons among patches and classifying them based on similarities of their surface configurations and electrostatic potentials. In this study, we first developed a rapid method to compare two patches. We then clustered patches corresponding to the same PDB chemical component identifier for a ligand, and selected a representative patch from each cluster. We subsequently compared the representative patches exhaustively and clustered them using a similarity score, PatSim. Finally, the resultant PatSim scores were compared with similarities of atomic structures of the LBSs and those of the ligand-binding protein sequences and functions. Consequently, we classified the patches into ∼2000 well-characterized clusters. We found that about 63% of these clusters are used in identical protein folds, although about 25% of the clusters are conserved in distantly related proteins and even in proteins with cross-fold similarity. Furthermore, we showed that patches with higher PatSim scores have the potential to be involved in similar biological processes. PMID:23934772
Functional Module Search in Protein Networks based on Semantic Similarity Improves the Analysis of Proteomics Data

PubMed Central

Boyanova, Desislava; Nilla, Santosh; Klau, Gunnar W.; Dandekar, Thomas; Müller, Tobias; Dittrich, Marcus

2014-01-01

The continuously evolving field of proteomics produces increasing amounts of data while improving the quality of protein identifications. Although quantitative measurements are becoming more popular, many proteomic studies are still based on non-quantitative methods for protein identification. These studies result in potentially large sets of identified proteins, where the biological interpretation of proteins can be challenging. Systems biology develops innovative network-based methods, which allow an integrated analysis of these data. Here we present a novel approach, which combines prior knowledge of protein-protein interactions (PPI) with proteomics data using functional similarity measurements of interacting proteins. This integrated network analysis exactly identifies network modules with a maximal consistent functional similarity reflecting biological processes of the investigated cells. We validated our approach on small (H9N2 virus-infected gastric cells) and large (blood constituents) proteomic data sets. Using this novel algorithm, we identified characteristic functional modules in virus-infected cells, comprising key signaling proteins (e.g. the stress-related kinase RAF1) and demonstrate that this method allows a module-based functional characterization of cell types. Analysis of a large proteome data set of blood constituents resulted in clear separation of blood cells according to their developmental origin. A detailed investigation of the T-cell proteome further illustrates how the algorithm partitions large networks into functional subnetworks each representing specific cellular functions. These results demonstrate that the integrated network approach not only allows a detailed analysis of proteome networks but also yields a functional decomposition of complex proteomic data sets and thereby provides deeper insights into the underlying cellular processes of the investigated system. PMID:24807868
On the detection of functionally coherent groups of protein domains with an extension to protein annotation

PubMed Central

McLaughlin, William A; Chen, Ken; Hou, Tingjun; Wang, Wei

2007-01-01

Background: Protein domains coordinate to perform multifaceted cellular functions, and domain combinations serve as the functional building blocks of the cell. The available methods to identify functional domain combinations are limited in their scope, e.g. to the identification of combinations falling within individual proteins or within specific regions in a translated genome. Further effort is needed to identify groups of domains that span across two or more proteins and are linked by a cooperative function. Such functional domain combinations can be useful for protein annotation. Results: Using a new computational method, we have identified 114 groups of domains, referred to as domain assembly units (DASSEM units), in the proteome of budding yeast Saccharomyces cerevisiae. The units participate in many important cellular processes such as transcription regulation, translation initiation, and mRNA splicing. Within the units the domains were found to function in a cooperative manner, and each domain contributed to a different aspect of the unit's overall function. The member domains of DASSEM units were found to be significantly enriched among proteins contained in transcription modules, defined as genes sharing similar expression profiles and presumably similar functions. The observation further confirmed the functional coherence of DASSEM units. The functional linkages of units were found in both functionally characterized and uncharacterized proteins, which enabled the assessment of protein function based on domain composition. Conclusion: A new computational method was developed to identify groups of domains that are linked by a common function in the proteome of Saccharomyces cerevisiae. These groups can either lie within individual proteins or span across different proteins. We propose that the functional linkages among the domains within the DASSEM units can be used as a non-homology based tool to annotate uncharacterized proteins. PMID:17937820
High-confidence prediction of global interactomes based on genome-wide coevolutionary networks

PubMed Central

Juan, David; Pazos, Florencio; Valencia, Alfonso

2008-01-01

Interacting or functionally related protein families tend to have similar phylogenetic trees. Based on this observation, techniques have been developed to predict interaction partners. The observed degree of similarity between the phylogenetic trees of two proteins is the result of many different factors besides the actual interaction or functional relationship between them. Such factors influence the performance of interaction predictions. One aspect that can influence this similarity is related to the fact that a given protein interacts with many others, and hence it must adapt to all of them. Accordingly, the interaction or coadaptation signal within its tree is a composite of the influence of all of the interactors. Here, we introduce a new estimator of coevolution to overcome this and other problems. Instead of relying on the individual value of tree similarity between two proteins, we use the whole network of similarities between all of the pairs of proteins within a genome to reassess the similarity of that pair, thereby taking into account its coevolutionary context. We show that this approach offers a substantial improvement in interaction prediction performance, providing a degree of accuracy/coverage comparable with, or in some cases better than, that of experimental techniques. Moreover, important information on the structure, function, and evolution of macromolecular complexes can be inferred with this methodology. PMID:18199838
High-confidence prediction of global interactomes based on genome-wide coevolutionary networks.

PubMed

Juan, David; Pazos, Florencio; Valencia, Alfonso

2008-01-22

Interacting or functionally related protein families tend to have similar phylogenetic trees. Based on this observation, techniques have been developed to predict interaction partners. The observed degree of similarity between the phylogenetic trees of two proteins is the result of many different factors besides the actual interaction or functional relationship between them. Such factors influence the performance of interaction predictions. One aspect that can influence this similarity is related to the fact that a given protein interacts with many others, and hence it must adapt to all of them. Accordingly, the interaction or coadaptation signal within its tree is a composite of the influence of all of the interactors. Here, we introduce a new estimator of coevolution to overcome this and other problems. Instead of relying on the individual value of tree similarity between two proteins, we use the whole network of similarities between all of the pairs of proteins within a genome to reassess the similarity of that pair, thereby taking into account its coevolutionary context. We show that this approach offers a substantial improvement in interaction prediction performance, providing a degree of accuracy/coverage comparable with, or in some cases better than, that of experimental techniques. Moreover, important information on the structure, function, and evolution of macromolecular complexes can be inferred with this methodology.
Molecular Dynamics Information Improves cis-Peptide-Based Function Annotation of Proteins.

PubMed

Das, Sreetama; Bhadra, Pratiti; Ramakumar, Suryanarayanarao; Pal, Debnath

2017-08-04

cis-Peptide bonds, whose occurrence in proteins is rare but evolutionarily conserved, are implicated to play an important role in protein function. This has led to their previous use in a homology-independent, fragment-match-based protein function annotation method. However, proteins are not static molecules; dynamics is integral to their activity. This is nicely epitomized by the geometric isomerization of cis-peptide to trans form for molecular activity. Hence we have incorporated both static (cis-peptide) and dynamics information to improve the prediction of protein molecular function. Our results show that cis-peptide information alone cannot detect functional matches in cases where cis-trans isomerization exists but 3D coordinates have been obtained for only the trans isomer or when the cis-peptide bond is incorrectly assigned as trans. On the contrary, use of dynamics information alone includes false-positive matches for cases where fragments with similar secondary structure show similar dynamics, but the proteins do not share a common function. Combining the two methods reduces errors while detecting the true matches, thereby enhancing the utility of our method in function annotation. A combined approach, therefore, opens up new avenues of improving existing automated function annotation methodologies.
Network-based function prediction and interactomics: the case for metabolic enzymes.

PubMed

Janga, S C; Díaz-Mejía, J Javier; Moreno-Hagelsieb, G

2011-01-01

As sequencing technologies increase in power, determining the functions of unknown proteins encoded by the DNA sequences so produced becomes a major challenge. Functional annotation is commonly done on the basis of amino-acid sequence similarity alone. Long after sequence similarity becomes undetectable by pair-wise comparison, profile-based identification of homologs can often succeed due to the conservation of position-specific patterns, important for a protein's three-dimensional folding and function. Nevertheless, prediction of protein function from homology-driven approaches is not without problems. Homologous proteins might evolve different functions and the power of homology detection has already started to reach its maximum. Computational methods for inferring protein function, which exploit the context of a protein in cellular networks, have come to be built on top of homology-based approaches. These network-based functional inference techniques both provide a first-hand hint of a protein's functional role and offer complementary insights to traditional methods for understanding the function of uncharacterized proteins. Most recent network-based approaches aim to integrate diverse kinds of functional interactions to boost both coverage and confidence level. These techniques not only promise to solve the moonlighting aspect of proteins by annotating proteins with multiple functions, but also increase our understanding of the interplay between different functional classes in a cell. In this article we review the state of the art in network-based function prediction and describe some of the underlying difficulties and successes. Given the volume of high-throughput data that is being reported, the time is ripe to employ these network-based approaches, which can be used to unravel the functions of the uncharacterized proteins accumulating in the genomic databases. © 2010 Elsevier Inc. All rights reserved.
Distinguishing between biochemical and cellular function: Are there peptide signatures for cellular function of proteins?

PubMed

Jain, Shruti; Bhattacharyya, Kausik; Bakshi, Rachit; Narang, Ankita; Brahmachari, Vani

2017-04-01

Genome annotation and the identification of gene function depend on conserved biochemical activity. However, in the cell, proteins with the same biochemical function can participate in different cellular pathways and cannot complement one another. Similarly, two proteins of very different biochemical functions are put in the same class of cellular function; for example, the classification of a gene as an oncogene or a tumour suppressor gene is not related to its biochemical function, but is related to its cellular function. We have taken an approach to identify peptide signatures for cellular function in proteins with known biochemical function. Taking ATPases as a test case, we classified ATPases (2360 proteins) and kinases (517 proteins) from the human genome into different cellular function categories such as transcriptional, replicative, and chromatin remodelling proteins. Using a publicly available tool, MEME, we identify peptide signatures shared among the members of a given category but not between cellular functional categories; for example, no motif sharing is seen between chromatin remodelling and transporter ATPases, or between receptor Serine/Threonine Kinase and Receptor Tyrosine Kinase. There are motifs shared within each category with significant E value and high occurrence. This concept of signature for cellular function was applied to developmental regulators, the polycomb and trithorax proteins, which led to the prediction of the role of INO80, a chromatin remodelling protein, in development. This has been experimentally validated earlier for its role in homeotic gene regulation and its interaction with regulatory complexes like the Polycomb and Trithorax complex. Proteins 2017; 85:682-693. © 2016 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.
Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST): A web tool for generating protein sequence similarity networks.

PubMed

Gerlt, John A; Bouvier, Jason T; Davidson, Daniel B; Imker, Heidi J; Sadkhin, Boris; Slater, David R; Whalen, Katie L

2015-08-01

The Enzyme Function Initiative, an NIH/NIGMS-supported Large-Scale Collaborative Project (EFI; U54GM093342; http://enzymefunction.org/), is focused on devising and disseminating bioinformatics and computational tools as well as experimental strategies for the prediction and assignment of functions (in vitro activities and in vivo physiological/metabolic roles) to uncharacterized enzymes discovered in genome projects. Protein sequence similarity networks (SSNs) are visually powerful tools for analyzing sequence relationships in protein families (H.J. Atkinson, J.H. Morris, T.E. Ferrin, and P.C. Babbitt, PLoS One 2009, 4, e4345). However, the members of the biological/biomedical community have not had access to the capability to generate SSNs for their "favorite" protein families. In this article we announce the EFI-EST (Enzyme Function Initiative-Enzyme Similarity Tool) web tool (http://efi.igb.illinois.edu/efi-est/) that is available without cost for the automated generation of SSNs by the community. The tool can create SSNs for the "closest neighbors" of a user-supplied protein sequence from the UniProt database (Option A) or of members of any user-supplied Pfam and/or InterPro family (Option B). We provide an introduction to SSNs, a description of EFI-EST, and a demonstration of the use of EFI-EST to explore sequence-function space in the OMP decarboxylase superfamily (PF00215). This article is designed as a tutorial that will allow members of the community to use the EFI-EST web tool for exploring sequence/function space in protein families. Copyright © 2015 Elsevier B.V. All rights reserved.
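The idea of a sequence similarity network described in the record above can be illustrated with a minimal sketch. It assumes that pairwise similarity scores are already available (the abstract does not say how EFI-EST computes them), that an edge is drawn when a score meets a user-chosen threshold, and that connected components are read as candidate clusters. The function names `build_ssn` and `clusters` and the example scores are hypothetical and are not part of the EFI-EST tool.

```python
def build_ssn(pair_scores, threshold):
    """Build an undirected network from pairwise similarity scores.

    pair_scores maps (protein_a, protein_b) to a similarity score
    (hypothetical values; how scores are computed is not specified here).
    An edge is kept only when the score is >= threshold.
    """
    graph = {}
    for (a, b), score in pair_scores.items():
        # Every protein becomes a node, even if it ends up with no edges.
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def clusters(graph):
    """Return connected components as sorted lists of node names."""
    seen = set()
    components = []
    for node in sorted(graph):
        if node in seen:
            continue
        stack = [node]
        component = set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(graph[current] - component)
        seen |= component
        components.append(sorted(component))
    return components


# Illustrative scores for four made-up sequences.
scores = {("A", "B"): 80, ("B", "C"): 75, ("A", "C"): 20, ("C", "D"): 10}
loose = clusters(build_ssn(scores, threshold=50))
strict = clusters(build_ssn(scores, threshold=90))
```

Raising the threshold breaks the network into smaller groups, which is the usual way such networks are explored: a loose cutoff shows broad family membership, a strict one separates closer subgroups.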
Ab Initio Structural Modeling of and Experimental Validation for Chlamydia trachomatis Protein CT296 Reveal Structural Similarity to Fe(II) 2-Oxoglutarate-Dependent Enzymes

PubMed Central

Kemege, Kyle E.; Hickey, John M.; Lovell, Scott; Battaile, Kevin P.; Zhang, Yang; Hefty, P. Scott

2011-01-01

Chlamydia trachomatis is a medically important pathogen that encodes a relatively high percentage of proteins with unknown function. The three-dimensional structure of a protein can be very informative regarding the protein's functional characteristics; however, determining protein structures experimentally can be very challenging. Computational methods that model protein structures with sufficient accuracy to facilitate functional studies have had notable successes. To evaluate the accuracy and potential impact of computational protein structure modeling of hypothetical proteins encoded by Chlamydia, a successful computational method termed I-TASSER was utilized to model the three-dimensional structure of a hypothetical protein encoded by open reading frame (ORF) CT296. CT296 has been reported to exhibit functional properties of a divalent cation transcription repressor (DcrA), with similarity to the Escherichia coli iron-responsive transcriptional repressor, Fur. Unexpectedly, the I-TASSER model of CT296 exhibited no structural similarity to any DNA-interacting proteins or motifs. To validate the I-TASSER-generated model, the structure of CT296 was solved experimentally using X-ray crystallography. Impressively, the ab initio I-TASSER-generated model closely matched (2.72-Å Cα root mean square deviation [RMSD]) the high-resolution (1.8-Å) crystal structure of CT296. Modeled and experimentally determined structures of CT296 share structural characteristics of non-heme Fe(II) 2-oxoglutarate-dependent enzymes, although key enzymatic residues are not conserved, suggesting a unique biochemical process is likely associated with CT296 function. Additionally, functional analyses did not support prior reports that CT296 has properties shared with divalent cation repressors such as Fur. PMID:21965559
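The Cα root mean square deviation quoted in the record above is the square root of the mean squared distance between matched Cα atoms. The sketch below is a minimal illustration under stated assumptions: the two coordinate lists are already superposed and matched residue by residue (no optimal superposition step is performed here), and the function name `ca_rmsd` and the example coordinates are hypothetical rather than taken from the study.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) coordinates.

    Assumes the structures are already superposed and that position i in
    coords_a corresponds to position i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Two made-up two-atom "structures" that differ by 2 units at one atom.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
value = ca_rmsd(model, crystal)
```

In practice the reported value depends on how the structures are superposed and which residues are matched, so a number such as 2.72 Å is only comparable between studies that use the same alignment procedure.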
Ab initio structural modeling of and experimental validation for Chlamydia trachomatis protein CT296 reveal structural similarity to Fe(II) 2-oxoglutarate-dependent enzymes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kemege, Kyle E.; Hickey, John M.; Lovell, Scott

2012-02-13

Chlamydia trachomatis is a medically important pathogen that encodes a relatively high percentage of proteins with unknown function. The three-dimensional structure of a protein can be very informative regarding the protein's functional characteristics; however, determining protein structures experimentally can be very challenging. Computational methods that model protein structures with sufficient accuracy to facilitate functional studies have had notable successes. To evaluate the accuracy and potential impact of computational protein structure modeling of hypothetical proteins encoded by Chlamydia, a successful computational method termed I-TASSER was utilized to model the three-dimensional structure of a hypothetical protein encoded by open reading frame (ORF) CT296. CT296 has been reported to exhibit functional properties of a divalent cation transcription repressor (DcrA), with similarity to the Escherichia coli iron-responsive transcriptional repressor, Fur. Unexpectedly, the I-TASSER model of CT296 exhibited no structural similarity to any DNA-interacting proteins or motifs. To validate the I-TASSER-generated model, the structure of CT296 was solved experimentally using X-ray crystallography. Impressively, the ab initio I-TASSER-generated model closely matched (2.72-Å Cα root mean square deviation [RMSD]) the high-resolution (1.8-Å) crystal structure of CT296. Modeled and experimentally determined structures of CT296 share structural characteristics of non-heme Fe(II) 2-oxoglutarate-dependent enzymes, although key enzymatic residues are not conserved, suggesting a unique biochemical process is likely associated with CT296 function. Additionally, functional analyses did not support prior reports that CT296 has properties shared with divalent cation repressors such as Fur.
A Thermoacidophile-Specific Protein Family, DUF3211, Functions as a Fatty Acid Carrier with Novel Binding Mode

PubMed Central

Miyakawa, Takuya; Sawano, Yoriko; Miyazono, Ken-ichi; Miyauchi, Yumiko; Hatano, Ken-ichi

2013-01-01

STK_08120 is a member of the thermoacidophile-specific DUF3211 protein family from Sulfolobus tokodaii strain 7. Its molecular function remains obscure, and sequence similarities for obtaining functional remarks are not available. In this study, the crystal structure of STK_08120 was determined at 1.79-Å resolution to predict its probable function using structure similarity searches. The structure adopts an α/β structure of a helix-grip fold, which is found in the START domain proteins with cavities for hydrophobic substrates or ligands. The detailed structural features implied that fatty acids are the primary ligand candidates for STK_08120, and binding assays revealed that the protein bound long-chain saturated fatty acids (>C14) and their trans-unsaturated types with an affinity equal to that for major fatty acid binding proteins in mammals and plants. Moreover, the structure of an STK_08120-myristic acid complex revealed a unique binding mode among fatty acid binding proteins. These results suggest that the thermoacidophile-specific protein family DUF3211 functions as a fatty acid carrier with a novel binding mode. PMID:23836863
In silico search for functionally similar proteins involved in meiosis and recombination in evolutionarily distant organisms.

PubMed

Bogdanov, Yuri F; Dadashev, Sergei Y; Grishaeva, Tatiana M

2003-01-01

Evolutionarily distant organisms have not only orthologs, but also nonhomologous proteins that build functionally similar subcellular structures. For instance, this is true of the protein components of the synaptonemal complex (SC), a universal ultrastructure that ensures the successful pairing and recombination of homologous chromosomes during meiosis. We aimed at developing a method to search databases for genes that code for such nonhomologous but functionally analogous proteins. Advantage was taken of the ultrastructural parameters of the SC and the conformation of the SC proteins responsible for these. Proteins involved in the SC central space are known to be similar in secondary structure. Using published data, we found a highly significant correlation between the width of the SC central space and the length of the rod-shaped central domain of mammalian and yeast intermediate proteins forming transversal filaments in the SC central space. Based on this, we suggested a method for searching genome databases of distant organisms for genes whose virtual proteins meet the above correlation requirement. Our recent finding of the Drosophila melanogaster CG17604 gene coding for synaptonemal complex transversal filament protein received experimental support from another lab. With the same strategy, we showed that the Arabidopsis thaliana and Caenorhabditis elegans genomes contain unique genes coding for such proteins.
The Widespread Prevalence and Functional Significance of Silk-Like Structural Proteins in Metazoan Biological Materials

PubMed Central

McDougall, Carmel; Woodcroft, Ben J.

2016-01-01

In nature, numerous mechanisms have evolved by which organisms fabricate biological structures with an impressive array of physical characteristics. Some examples of metazoan biological materials include the highly elastic byssal threads by which bivalves attach themselves to rocks, biomineralized structures that form the skeletons of various animals, and spider silks that are renowned for their exceptional strength and elasticity. The remarkable properties of silks, which are perhaps the best studied biological materials, are the result of the highly repetitive, modular, and biased amino acid composition of the proteins that compose them. Interestingly, similar levels of modularity/repetitiveness and similar bias in amino acid compositions have been reported in proteins that are components of structural materials in other organisms, however the exact nature and extent of this similarity, and its functional and evolutionary relevance, is unknown. Here, we investigate this similarity and use sequence features common to silks and other known structural proteins to develop a bioinformatics-based method to identify similar proteins from large-scale transcriptome and whole-genome datasets. We show that a large number of proteins identified using this method have roles in biological material formation throughout the animal kingdom. Despite the similarity in sequence characteristics, most of the silk-like structural proteins (SLSPs) identified in this study appear to have evolved independently and are restricted to a particular animal lineage. Although the exact function of many of these SLSPs is unknown, the apparent independent evolution of proteins with similar sequence characteristics in divergent lineages suggests that these features are important for the assembly of biological materials. 
The identification of these characteristics enables the generation of testable hypotheses regarding the mechanisms by which these proteins assemble and direct the construction of biological materials with diverse morphologies. The SilkSlider predictor software developed here is available at https://github.com/wwood/SilkSlider. PMID:27415783
Quality assessment of protein model-structures based on structural and functional similarities

PubMed Central

2012-01-01

Background Experimental determination of protein 3D structures is expensive, time consuming and sometimes impossible. The gap between the number of protein structures deposited in the World Wide Protein Data Bank and the number of sequenced proteins is constantly widening. Computational modeling is deemed to be one of the ways to deal with the problem. Although protein 3D structure prediction is a difficult task, many tools are available. These tools can model a structure from a sequence or from partial structural information, e.g. contact maps. Consequently, biologists can automatically generate a putative 3D structure model of any protein. However, the main issue becomes evaluation of the model quality, which is one of the most important challenges of structural biology. Results GOBA - Gene Ontology-Based Assessment is a novel Protein Model Quality Assessment Program. It estimates the compatibility between a model-structure and its expected function. GOBA is based on the assumption that a high quality model is expected to be structurally similar to proteins functionally similar to the prediction target. Whereas DALI is used to measure structure similarity, protein functional similarity is quantified using the standardized and hierarchical description of proteins provided by Gene Ontology combined with Wang's algorithm for calculating semantic similarity. Two approaches are proposed to express the quality of protein model-structures. One is a single model quality assessment method, the other is its modification, which provides a relative measure of model quality. Exhaustive evaluation is performed on data sets of model-structures submitted to the CASP8 and CASP9 contests. Conclusions The validation shows that the method is able to discriminate between good and bad model-structures. The best of the tested GOBA scores achieved 0.74 and 0.8 as a mean Pearson correlation to the observed quality of models in our CASP8 and CASP9-based validation sets. 
GOBA also obtained the best result for two targets of CASP8, and one of CASP9, compared to the contest participants. Consequently, GOBA offers a novel single model quality assessment program that addresses the practical needs of biologists. In conjunction with other Model Quality Assessment Programs (MQAPs), it would prove useful for the evaluation of single protein models. PMID:22998498
Improving protein complex classification accuracy using amino acid composition profile.

PubMed

Huang, Chien-Hung; Chou, Szu-Yu; Ng, Ka-Lok

2013-09-01

Protein complex prediction approaches are based on the assumptions that complexes have dense protein-protein interactions and high functional similarity between their subunits. We investigated those assumptions by studying the subunits' interaction topology, sequence similarity and molecular function for human and yeast protein complexes. Inclusion of amino acids' physicochemical properties can provide better understanding of protein complex properties. Principal component analysis is carried out to determine the major features. Adopting amino acid composition profile information with the SVM classifier serves as an effective post-processing step for complexes classification. Improvement is based on primary sequence information only, which is easy to obtain. Copyright © 2013 Elsevier Ltd. All rights reserved.
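The amino acid composition profile mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes the profile is simply the fraction of each of the 20 standard residues in a sequence; the abstract does not give the exact feature definition, and the function name and example sequence are hypothetical. The SVM classification step is not shown.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so profiles are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition_profile(sequence):
    """Return the fraction of each standard amino acid in `sequence`.

    Non-standard characters (e.g. 'X') are ignored. This is an assumed
    definition of a composition profile, not the authors' exact feature set.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]

# Hypothetical 8-residue sequence used only for illustration.
profile = composition_profile("MKKLAAGG")
```

Each sequence becomes a 20-dimensional vector that sums to 1, which is the kind of fixed-length input a classifier such as an SVM would need.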
Evolutionary Turnover of Kinetochore Proteins: A Ship of Theseus?

PubMed

Drinnenberg, Ines A; Henikoff, Steven; Malik, Harmit S

2016-07-01

The kinetochore is a multiprotein complex that mediates the attachment of a eukaryotic chromosome to the mitotic spindle. The protein composition of kinetochores is similar across species as divergent as yeast and human. However, recent findings have revealed an unexpected degree of compositional diversity in kinetochores. For example, kinetochore proteins that are essential in some species have been lost in others, whereas new kinetochore proteins have emerged in other lineages. Even in lineages with similar kinetochore composition, individual kinetochore proteins have functionally diverged to acquire either essential or redundant roles. Thus, despite functional conservation, the repertoire of kinetochore proteins has undergone recurrent evolutionary turnover. Copyright © 2016 Elsevier Ltd. All rights reserved.
Proteomics profiling of interactome dynamics by colocalisation analysis (COLA).

PubMed

Mardakheh, Faraz K; Sailem, Heba Z; Kümper, Sandra; Tape, Christopher J; McCully, Ryan R; Paul, Angela; Anjomani-Virmouni, Sara; Jørgensen, Claus; Poulogiannis, George; Marshall, Christopher J; Bakal, Chris

2016-12-20

Localisation and protein function are intimately linked in eukaryotes, as proteins are localised to specific compartments where they come into proximity of other functionally relevant proteins. Significant co-localisation of two proteins can therefore be indicative of their functional association. We here present COLA, a proteomics based strategy coupled with a bioinformatics framework to detect protein-protein co-localisations on a global scale. COLA reveals functional interactions by matching proteins with significant similarity in their subcellular localisation signatures. The rapid nature of COLA allows mapping of interactome dynamics across different conditions or treatments with high precision.
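The idea of matching proteins by the similarity of their localisation signatures can be sketched as follows. This is an illustration only: the abstract does not name the similarity measure or the significance test, so Pearson correlation with a fixed threshold is assumed here, and the protein names and abundance values are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equal-length numeric profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / math.sqrt(var_x * var_y)

def colocalised_pairs(signatures, threshold):
    """Return protein pairs whose signatures correlate at or above `threshold`."""
    return [
        (p, q)
        for p, q in combinations(sorted(signatures), 2)
        if pearson(signatures[p], signatures[q]) >= threshold
    ]

# Hypothetical relative abundances across three subcellular fractions.
signatures = {
    "P1": [0.7, 0.2, 0.1],
    "P2": [0.6, 0.3, 0.1],
    "P3": [0.1, 0.2, 0.7],
}
pairs = colocalised_pairs(signatures, threshold=0.9)
```

With these made-up values only P1 and P2 pass the threshold. A real analysis would replace the fixed cut-off with a statistical test of significance.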
A novel highly divergent protein family identified from a viviparous insect by RNA-seq analysis: a potential target for tsetse fly-specific abortifacients.

PubMed

Benoit, Joshua B; Attardo, Geoffrey M; Michalkova, Veronika; Krause, Tyler B; Bohova, Jana; Zhang, Qirui; Baumann, Aaron A; Mireji, Paul O; Takáč, Peter; Denlinger, David L; Ribeiro, Jose M; Aksoy, Serap

2014-04-01

In tsetse flies, nutrients for intrauterine larval development are synthesized by the modified accessory gland (milk gland) and provided in mother's milk during lactation. Interference with at least two milk proteins has been shown to extend larval development and reduce fecundity. The goal of this study was to perform a comprehensive characterization of tsetse milk proteins using lactation-specific transcriptome/milk proteome analyses and to define functional role(s) for the milk proteins during lactation. Differential analysis of RNA-seq data from lactating and dry (non-lactating) females revealed enrichment of transcripts coding for protein synthesis machinery, lipid metabolism and secretory proteins during lactation. Among the genes induced during lactation were those encoding the previously identified milk proteins (milk gland proteins 1-3, transferrin and acid sphingomyelinase 1) and seven new genes (mgp4-10). The genes encoding mgp2-10 are organized on a 40 kb syntenic block in the tsetse genome, have similar exon-intron arrangements, and share regions of amino acid sequence similarity. Expression of mgp2-10 is female-specific and high during milk secretion. While knockdown of a single mgp failed to reduce fecundity, simultaneous knockdown of multiple variants reduced milk protein levels and lowered fecundity. The genomic localization, gene structure similarities, and functional redundancy of MGP2-10 suggest that they constitute a novel highly divergent protein family. Our data indicates that MGP2-10 function both as the primary amino acid resource for the developing larva and in the maintenance of milk homeostasis, similar to the function of the mammalian casein family of milk proteins. This study underscores the dynamic nature of the lactation cycle and identifies a novel family of lactation-specific proteins, unique to Glossina sp., that are essential to larval development. 
The specificity of MGP2-10 to tsetse and their critical role during lactation suggest that these proteins may be an excellent target for tsetse-specific population control approaches.
Identification of photoactivated adenylyl cyclases in Naegleria australiensis and BLUF-containing protein in Naegleria fowleri.

PubMed

Yasukawa, Hiro; Sato, Aya; Kita, Ayaka; Kodaira, Ken-Ichi; Iseki, Mineo; Takahashi, Tetsuo; Shibusawa, Mami; Watanabe, Masakatsu; Yagita, Kenji

2013-01-01

Complete genome sequencing of Naegleria gruberi has revealed that the organism encodes polypeptides similar to photoactivated adenylyl cyclases (PACs). Screening in the N. australiensis genome showed that the organism also encodes polypeptides similar to PACs. Each of the Naegleria proteins consists of a "sensors of blue-light using FAD" domain (BLUF domain) and an adenylyl cyclase domain (AC domain). PAC activity of the Naegleria proteins was assayed by comparing sensitivities of Escherichia coli cells heterologously expressing the proteins to antibiotics in a dark condition and a blue light-irradiated condition. Antibiotics used in the assays were fosfomycin and fosmidomycin. E. coli cells expressing the Naegleria proteins showed increased fosfomycin sensitivity and fosmidomycin sensitivity when incubated under blue light, indicating that the proteins functioned as PACs in the bacterial cells. Analysis of the N. fowleri genome revealed that the organism encodes a protein bearing an amino acid sequence similar to that of BLUF. A plasmid expressing a chimeric protein consisting of the BLUF-like sequence found in N. fowleri and the adenylyl cyclase domain of N. gruberi PAC was constructed to determine whether the BLUF-like sequence functioned as a sensor of blue light. E. coli cells expressing a chimeric protein showed increased fosfomycin sensitivity and fosmidomycin sensitivity when incubated under blue light. These experimental results indicated that the sequence similar to the BLUF domain found in N. fowleri functioned as a sensor of blue light.

Using the underlying biological organization of the Mycobacterium tuberculosis functional network for protein function prediction.

PubMed

Mazandu, Gaston K; Mulder, Nicola J

2012-07-01

Despite ever-increasing amounts of sequence and functional genomics data, there is still a deficiency of functional annotation for many newly sequenced proteins. For Mycobacterium tuberculosis (MTB), more than half of its genome is still uncharacterized, which hampers the search for new drug targets within the bacterial pathogen and limits our understanding of its pathogenicity. As for many other genomes, the annotations of proteins in the MTB proteome were generally inferred from sequence homology, which is effective but limited in its applicability. We have carried out large-scale biological data integration to produce an MTB protein functional interaction network. Protein functional relationships were extracted from the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database, and additional functional interactions from microarray, sequence and protein signature data. The confidence level of protein relationships in the additional functional interaction data was evaluated using a dynamic data-driven scoring system. This functional network has been used to predict functions of uncharacterized proteins using Gene Ontology (GO) terms, and the semantic similarity between these terms measured using a state-of-the-art GO similarity metric. To achieve better trade-off between improvement of quality, genomic coverage and scalability, this prediction is done by observing the key principles driving the biological organization of the functional network. This study yields a new functionally characterized MTB strain CDC1551 proteome, consisting of 3804 and 3698 proteins out of 4195 with annotations in terms of the biological process and molecular function ontologies, respectively. These data can contribute to research into the development of effective anti-tubercular drugs with novel biological mechanisms of action. Copyright © 2011 Elsevier B.V. All rights reserved.
A plasma membrane sucrose-binding protein that mediates sucrose uptake shares structural and sequence similarity with seed storage proteins but remains functionally distinct.

PubMed

Overvoorde, P J; Chao, W S; Grimes, H D

1997-06-20

Photoaffinity labeling of a soybean cotyledon membrane fraction identified a sucrose-binding protein (SBP). Subsequent studies have shown that the SBP is a unique plasma membrane protein that mediates the linear uptake of sucrose in the presence of up to 30 mM external sucrose when ectopically expressed in yeast. Analysis of the SBP-deduced amino acid sequence indicates it lacks sequence similarity with other known transport proteins. Data presented here, however, indicate that the SBP shares significant sequence and structural homology with the vicilin-like seed storage proteins that organize into homotrimers. These similarities include a repeated sequence that forms the basis of the reiterated domain structure characteristic of the vicilin-like protein family. In addition, analytical ultracentrifugation and nonreducing SDS-polyacrylamide gel electrophoresis demonstrate that the SBP appears to be organized into oligomeric complexes with a Mr indicative of the existence of SBP homotrimers and homodimers. The structural similarity shared by the SBP and vicilin-like proteins provides a novel framework to explore the mechanistic basis of SBP-mediated sucrose uptake. Expression of the maize Glb protein (a vicilin-like protein closely related to the SBP) in yeast demonstrates that a closely related vicilin-like protein is unable to mediate sucrose uptake. Thus, despite sequence and structural similarities shared by the SBP and the vicilin-like protein family, the SBP is functionally divergent from other members of this group.
Integrative network alignment reveals large regions of global network similarity in yeast and human.

PubMed

Kuchaiev, Oleksii; Przulj, Natasa

2011-05-15

High-throughput methods for detecting molecular interactions have produced large sets of biological network data with much more yet to come. Analogous to sequence alignment, efficient and reliable network alignment methods are expected to improve our understanding of biological systems. Unlike sequence alignment, network alignment is computationally intractable. Hence, devising efficient network alignment heuristics is currently a foremost challenge in computational biology. We introduce a novel network alignment algorithm, called Matching-based Integrative GRAph ALigner (MI-GRAAL), which can integrate any number and type of similarity measures between network nodes (e.g. proteins), including, but not limited to, any topological network similarity measure, sequence similarity, functional similarity and structural similarity. Hence, we resolve the ties in similarity measures and find a combination of similarity measures yielding the largest contiguous (i.e. connected) and biologically sound alignments. MI-GRAAL exposes the largest functional, connected regions of protein-protein interaction (PPI) network similarity to date: surprisingly, it reveals that 77.7% of proteins in the baker's yeast high-confidence PPI network participate in such a subnetwork that is fully contained in the human high-confidence PPI network. This is the first demonstration that species as diverse as yeast and human contain such large, continuous regions of global network similarity. We apply MI-GRAAL's alignments to predict functions of un-annotated proteins in yeast, human and bacteria, validating our predictions in the literature. Furthermore, using network alignment scores for PPI networks of different herpes viruses, we reconstruct their phylogenetic relationship. This is the first time that phylogeny is exactly reconstructed from purely topological alignments of PPI networks. Supplementary files and MI-GRAAL executables: http://bio-nets.doc.ic.ac.uk/MI-GRAAL/.
Predicting multicellular function through multi-layer tissue networks

PubMed Central

Zitnik, Marinka; Leskovec, Jure

2017-01-01

Motivation: Understanding functions of proteins in specific human tissues is essential for insights into disease diagnostics and therapeutics, yet prediction of tissue-specific cellular function remains a critical challenge for biomedicine. Results: Here, we present OhmNet, a hierarchy-aware unsupervised node feature learning approach for multi-layer networks. We build a multi-layer network, where each layer represents molecular interactions in a different human tissue. OhmNet then automatically learns a mapping of proteins, represented as nodes, to a neural embedding-based low-dimensional space of features. OhmNet encourages sharing of similar features among proteins with similar network neighborhoods and among proteins activated in similar tissues. The algorithm generalizes prior work, which generally ignores relationships between tissues, by modeling tissue organization with a rich multiscale tissue hierarchy. We use OhmNet to study multicellular function in a multi-layer protein interaction network of 107 human tissues. In 48 tissues with known tissue-specific cellular functions, OhmNet provides more accurate predictions of cellular function than alternative approaches, and also generates more accurate hypotheses about tissue-specific protein actions. We show that taking into account the tissue hierarchy leads to improved predictive power. Remarkably, we also demonstrate that it is possible to leverage the tissue hierarchy in order to effectively transfer cellular functions to a functionally uncharacterized tissue. Overall, OhmNet moves from flat networks to multiscale models able to predict a range of phenotypes spanning cellular subsystems. Availability and implementation: Source code and datasets are available at http://snap.stanford.edu/ohmnet. Contact: jure@cs.stanford.edu PMID:28881986
Semantic integration to identify overlapping functional modules in protein interaction networks

PubMed Central

Cho, Young-Rae; Hwang, Woochang; Ramanathan, Murali; Zhang, Aidong

2007-01-01

Background The systematic analysis of protein-protein interactions can enable a better understanding of cellular organization, processes and functions. Functional modules can be identified from the protein interaction networks derived from experimental data sets. However, these analyses are challenging because of the presence of unreliable interactions and the complex connectivity of the network. The integration of protein-protein interactions with the data from other sources can be leveraged for improving the effectiveness of functional module detection algorithms. Results We have developed novel metrics, called semantic similarity and semantic interactivity, which use Gene Ontology (GO) annotations to measure the reliability of protein-protein interactions. The protein interaction networks can be converted into a weighted graph representation by assigning the reliability values to each interaction as a weight. We presented a flow-based modularization algorithm to efficiently identify overlapping modules in the weighted interaction networks. The experimental results show that the semantic similarity and semantic interactivity of interacting pairs were positively correlated with functional co-occurrence. The effectiveness of the algorithm for identifying modules was evaluated using functional categories from the MIPS database. We demonstrated that our algorithm had higher accuracy compared to other competing approaches. Conclusion The integration of protein interaction networks with GO annotation data and the capability of detecting overlapping modules substantially improve the accuracy of module identification. PMID:17650343
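The conversion of an interaction network into a weighted graph, as described in this abstract, might look like the Python sketch below. The authors' semantic similarity and semantic interactivity metrics are not defined in the abstract, so a simple overlap coefficient between annotation sets is swapped in as a stand-in reliability score. The protein names and annotation labels are hypothetical placeholders, not real GO terms.

```python
def overlap_coefficient(terms_a, terms_b):
    """Shared annotations divided by the size of the smaller annotation set.

    A stand-in reliability score; NOT the metric defined by the authors.
    """
    if not terms_a or not terms_b:
        return 0.0
    return len(terms_a & terms_b) / min(len(terms_a), len(terms_b))

def to_weighted_graph(interactions, annotations):
    """Build an undirected adjacency dict with one weight per interaction."""
    graph = {}
    for u, v in interactions:
        weight = overlap_coefficient(
            annotations.get(u, set()), annotations.get(v, set())
        )
        graph.setdefault(u, {})[v] = weight
        graph.setdefault(v, {})[u] = weight
    return graph

# Hypothetical annotation sets ("t1".."t4" are placeholder term labels).
annotations = {"A": {"t1", "t2"}, "B": {"t1", "t2", "t3"}, "C": {"t4"}}
interactions = [("A", "B"), ("B", "C")]
graph = to_weighted_graph(interactions, annotations)
```

Here the A-B interaction receives weight 1.0 and the B-C interaction weight 0.0, so a downstream module-detection step could discount the poorly supported edge.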
SIMAP--a comprehensive database of pre-calculated protein sequence similarities, domains, annotations and clusters.

PubMed

Rattei, Thomas; Tischler, Patrick; Götz, Stefan; Jehl, Marc-André; Hoser, Jonathan; Arnold, Roland; Conesa, Ana; Mewes, Hans-Werner

2010-01-01

The prediction of protein function as well as the reconstruction of evolutionary genesis employing sequence comparison at large is still the most powerful tool in sequence analysis. Due to the exponential growth of the number of known protein sequences and the subsequent quadratic growth of the similarity matrix, the computation of the Similarity Matrix of Proteins (SIMAP) becomes a computationally intensive task. The SIMAP database provides a comprehensive and up-to-date pre-calculation of the protein sequence similarity matrix, sequence-based features and sequence clusters. As of September 2009, SIMAP covers 48 million proteins and more than 23 million non-redundant sequences. Novel features of SIMAP include the expansion of the sequence space by including databases such as ENSEMBL as well as the integration of metagenomes based on their consistent processing and annotation. Furthermore, protein function predictions by Blast2GO are pre-calculated for all sequences in SIMAP and the data access and query functions have been improved. SIMAP helps biologists query the up-to-date sequence space systematically and facilitates large-scale downstream projects in computational biology. Access to SIMAP is freely provided through the web portal for individuals (http://mips.gsf.de/simap/) and for programmatic access through DAS (http://webclu.bio.wzw.tum.de/das/) and Web-Service (http://mips.gsf.de/webservices/services/SimapService2.0?wsdl).
Detecting Local Ligand-Binding Site Similarity in Non-Homologous Proteins by Surface Patch Comparison

PubMed Central

Sael, Lee; Kihara, Daisuke

2012-01-01

Functional elucidation of proteins is one of the essential tasks in biology. Function of a protein, specifically, small ligand molecules that bind to a protein, can be predicted by finding similar local surface regions in binding sites of known proteins. Here, we developed an alignment free local surface comparison method for predicting a ligand molecule which binds to a query protein. The algorithm, named Patch-Surfer, represents a binding pocket as a combination of segmented surface patches, each of which is characterized by its geometrical shape, the electrostatic potential, the hydrophobicity, and the concaveness. Representing a pocket by a set of patches is effective to absorb difference of global pocket shape while capturing local similarity of pockets. The shape and the physicochemical properties of surface patches are represented using the 3D Zernike descriptor, which is a series expansion of mathematical 3D function. Two pockets are compared using a modified weighted bipartite matching algorithm, which matches similar patches from the two pockets. Patch-Surfer was benchmarked on three datasets, which consist in total of 390 proteins that bind to one of 21 ligands. Patch-Surfer showed superior performance to existing methods including a global pocket comparison method, Pocket-Surfer, which we have previously introduced. Particularly, as intended, the accuracy showed large improvement for flexible ligand molecules, which bind to pockets in different conformations. PMID:22275074
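The patch-matching step can be illustrated with a small Python sketch of plain minimum-cost bipartite matching. This is not the modified weighted bipartite matching used by Patch-Surfer, whose details are not given in the abstract. It assumes each patch is summarised by a short numeric descriptor vector, uses Euclidean distance as the cost, and searches all assignments exhaustively, which is only practical for a handful of patches. All names and values are hypothetical.

```python
import itertools
import math

def best_matching(patches_a, patches_b):
    """Match every patch in A to a distinct patch in B at minimum total distance.

    Brute-force search over assignments; requires len(patches_a) <= len(patches_b).
    Returns (total_cost, [(index_in_a, index_in_b), ...]).
    """
    if len(patches_a) > len(patches_b):
        raise ValueError("patches_a must not be larger than patches_b")
    best_cost, best_pairs = math.inf, []
    for perm in itertools.permutations(range(len(patches_b)), len(patches_a)):
        cost = sum(
            math.dist(patches_a[i], patches_b[j]) for i, j in enumerate(perm)
        )
        if cost < best_cost:
            best_cost, best_pairs = cost, list(enumerate(perm))
    return best_cost, best_pairs

# Hypothetical 2-D descriptors; real patch descriptors would be much longer.
pocket_a = [(0.0, 0.0), (1.0, 1.0)]
pocket_b = [(1.0, 1.1), (0.0, 0.1), (5.0, 5.0)]
cost, pairs = best_matching(pocket_a, pocket_b)
```

In this toy case the two patches of the first pocket pair with the two nearby patches of the second, and the outlying third patch is left unmatched. The low total cost would be read as high pocket similarity.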
Detecting local ligand-binding site similarity in nonhomologous proteins by surface patch comparison.

PubMed

Sael, Lee; Kihara, Daisuke

2012-04-01

Functional elucidation of proteins is one of the essential tasks in biology. Function of a protein, specifically, small ligand molecules that bind to a protein, can be predicted by finding similar local surface regions in binding sites of known proteins. Here, we developed an alignment free local surface comparison method for predicting a ligand molecule which binds to a query protein. The algorithm, named Patch-Surfer, represents a binding pocket as a combination of segmented surface patches, each of which is characterized by its geometrical shape, the electrostatic potential, the hydrophobicity, and the concaveness. Representing a pocket by a set of patches is effective to absorb difference of global pocket shape while capturing local similarity of pockets. The shape and the physicochemical properties of surface patches are represented using the 3D Zernike descriptor, which is a series expansion of mathematical 3D function. Two pockets are compared using a modified weighted bipartite matching algorithm, which matches similar patches from the two pockets. Patch-Surfer was benchmarked on three datasets, which consist in total of 390 proteins that bind to one of 21 ligands. Patch-Surfer showed superior performance to existing methods including a global pocket comparison method, Pocket-Surfer, which we have previously introduced. Particularly, as intended, the accuracy showed large improvement for flexible ligand molecules, which bind to pockets in different conformations. Copyright © 2011 Wiley Periodicals, Inc.
Assessment of protein set coherence using functional annotations

PubMed Central

Chagoyen, Monica; Carazo, Jose M; Pascual-Montano, Alberto

2008-01-01

Background Analysis of large-scale experimental datasets frequently produces one or more sets of proteins that are subsequently mined for functional interpretation and validation. To this end, a number of computational methods have been devised that rely on the analysis of functional annotations. Although current methods provide valuable information (e.g. significantly enriched annotations, pairwise functional similarities), they do not specifically measure the degree of homogeneity of a protein set. Results In this work we present a method that scores the degree of functional homogeneity, or coherence, of a set of proteins on the basis of the global similarity of their functional annotations. The method uses statistical hypothesis testing to assess the significance of the set in the context of the functional space of a reference set. As such, it can be used as a first step in the validation of sets expected to be homogeneous prior to further functional interpretation. Conclusion We evaluate our method by analysing known biologically relevant sets as well as random ones. The known relevant sets comprise macromolecular complexes, cellular components and pathways described for Saccharomyces cerevisiae, which are mostly significantly coherent. Finally, we illustrate the usefulness of our approach for validating 'functional modules' obtained from computational analysis of protein-protein interaction networks. Matlab code and supplementary data are available at PMID:18937846
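Scoring the coherence of a protein set and testing it against a reference set can be sketched in Python as a permutation test. The abstract does not specify the similarity measure or the test, so mean pairwise Jaccard similarity and random resampling from the reference are assumptions made here. The protein identifiers and annotation labels are hypothetical.

```python
import random
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two annotation sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def coherence(proteins, annotations):
    """Mean pairwise annotation similarity of a protein set."""
    pairs = list(combinations(proteins, 2))
    if not pairs:
        raise ValueError("need at least two proteins")
    return sum(jaccard(annotations[p], annotations[q]) for p, q in pairs) / len(pairs)

def permutation_p_value(protein_set, annotations, n_random=1000, seed=0):
    """Compare the set's coherence with random same-size sets from the reference."""
    rng = random.Random(seed)
    observed = coherence(protein_set, annotations)
    reference = sorted(annotations)
    hits = sum(
        coherence(rng.sample(reference, len(protein_set)), annotations) >= observed
        for _ in range(n_random)
    )
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_random + 1)

# Hypothetical reference: three proteins share annotations, seven do not.
annotations = {f"p{i}": {"a", "b"} for i in range(1, 4)}
annotations.update({f"p{i}": {f"u{i}"} for i in range(4, 11)})
observed, p_value = permutation_p_value(["p1", "p2", "p3"], annotations)
```

A small p-value indicates that random sets of the same size rarely reach the observed coherence, which is the sense in which a set would be called significantly coherent.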
StralSV: assessment of sequence variability within similar 3D structures and application to polio RNA-dependent RNA polymerase.

PubMed

Zemla, Adam T; Lang, Dorothy M; Kostova, Tanya; Andino, Raul; Ecale Zhou, Carol L

2011-06-02

Most of the currently used methods for protein function prediction rely on sequence-based comparisons between a query protein and those for which a functional annotation is provided. A serious limitation of sequence similarity-based approaches for identifying residue conservation among proteins is the low confidence in assigning residue-residue correspondences among proteins when the level of sequence identity between the compared proteins is poor. Multiple sequence alignment methods are more satisfactory--still, they cannot provide reliable results at low levels of sequence identity. Our goal in the current work was to develop an algorithm that could help overcome these difficulties by facilitating the identification of structurally (and possibly functionally) relevant residue-residue correspondences between compared protein structures. Here we present StralSV (structure-alignment sequence variability), a new algorithm for detecting closely related structure fragments and quantifying residue frequency from tight local structure alignments. We apply StralSV in a study of the RNA-dependent RNA polymerase of poliovirus, and we demonstrate that the algorithm can be used to determine regions of the protein that are relatively unique, or that share structural similarity with proteins that would be considered distantly related. By quantifying residue frequencies among many residue-residue pairs extracted from local structural alignments, one can infer potential structural or functional importance of specific residues that are determined to be highly conserved or that deviate from a consensus. We further demonstrate that considerable detailed structural and phylogenetic information can be derived from StralSV analyses. StralSV is a new structure-based algorithm for identifying and aligning structure fragments that have similarity to a reference protein. 
StralSV analysis can be used to quantify residue-residue correspondences and identify residues that may be of particular structural or functional importance, as well as unusual or unexpected residues at a given sequence position. StralSV is provided as a web service at http://proteinmodel.org/AS2TS/STRALSV/.
Fast and automated functional classification with MED-SuMo: an application on purine-binding proteins.

PubMed

Doppelt-Azeroual, Olivia; Delfaud, François; Moriaud, Fabrice; de Brevern, Alexandre G

2010-04-01

Ligand-protein interactions are essential for biological processes, and precise characterization of protein binding sites is crucial to understand protein functions. MED-SuMo is a powerful technology to localize similar local regions on protein surfaces. Its heuristic is based on a 3D representation of macromolecules using specific surface chemical features associating chemical characteristics with geometrical properties. MED-SMA is an automated and fast method to classify binding sites. It is based on MED-SuMo technology, which builds a similarity graph, and it uses the Markov Clustering algorithm. Purine binding sites are well studied as drug targets. Here, purine binding sites of the Protein DataBank (PDB) are classified. Proteins potentially inhibited or activated through the same mechanism are gathered. Results are analyzed according to PROSITE annotations and to carefully refined functional annotations extracted from the PDB. As expected, binding sites associated with related mechanisms are gathered, for example, the Small GTPases. Nevertheless, protein kinases from different Kinome families are also found together, for example, Aurora-A and CDK2 proteins which are inhibited by the same drugs. Representative examples of different clusters are presented. The effectiveness of the MED-SMA approach is demonstrated as it gathers binding sites of proteins with similar structure-activity relationships. Moreover, an efficient new protocol associates structures absent of cocrystallized ligands to the purine clusters enabling those structures to be associated with a specific binding mechanism. Applications of this classification by binding mode similarity include target-based drug design and prediction of cross-reactivity and therefore potential toxic side effects.
Fast and automated functional classification with MED-SuMo: An application on purine-binding proteins

PubMed Central

Doppelt-Azeroual, Olivia; Delfaud, François; Moriaud, Fabrice; de Brevern, Alexandre G

2010-01-01

Ligand–protein interactions are essential for biological processes, and precise characterization of protein binding sites is crucial to understand protein functions. MED-SuMo is a powerful technology to localize similar local regions on protein surfaces. Its heuristic is based on a 3D representation of macromolecules using specific surface chemical features associating chemical characteristics with geometrical properties. MED-SMA is an automated and fast method to classify binding sites. It is based on MED-SuMo technology, which builds a similarity graph, and it uses the Markov Clustering algorithm. Purine binding sites are well studied as drug targets. Here, purine binding sites of the Protein DataBank (PDB) are classified. Proteins potentially inhibited or activated through the same mechanism are gathered. Results are analyzed according to PROSITE annotations and to carefully refined functional annotations extracted from the PDB. As expected, binding sites associated with related mechanisms are gathered, for example, the Small GTPases. Nevertheless, protein kinases from different Kinome families are also found together, for example, Aurora-A and CDK2 proteins which are inhibited by the same drugs. Representative examples of different clusters are presented. The effectiveness of the MED-SMA approach is demonstrated as it gathers binding sites of proteins with similar structure-activity relationships. Moreover, an efficient new protocol associates structures absent of cocrystallized ligands to the purine clusters enabling those structures to be associated with a specific binding mechanism. Applications of this classification by binding mode similarity include target-based drug design and prediction of cross-reactivity and therefore potential toxic side effects. PMID:20162627
Exploring Mouse Protein Function via Multiple Approaches.

PubMed

Huang, Guohua; Chu, Chen; Huang, Tao; Kong, Xiangyin; Zhang, Yunhua; Zhang, Ning; Cai, Yu-Dong

2016-01-01

Although the number of available protein sequences is growing exponentially, functional protein annotations lag far behind. Therefore, accurate identification of protein functions remains one of the major challenges in molecular biology. In this study, we presented a novel approach to predict mouse protein functions. The approach was a sequential combination of a similarity-based approach, an interaction-based approach and a pseudo amino acid composition-based approach. The method achieved an accuracy of about 0.8450 for the 1st-order predictions in the leave-one-out and ten-fold cross-validations. For the results yielded by the leave-one-out cross-validation, although the similarity-based approach alone achieved an accuracy of 0.8756, it was unable to predict the functions of proteins with no homologues. Comparatively, the pseudo amino acid composition-based approach alone reached an accuracy of 0.6786. Although the accuracy was lower than that of the previous approach, it could predict the functions of almost all proteins, even proteins with no homologues. Therefore, the combined method balanced the advantages and disadvantages of both approaches to achieve efficient performance. Furthermore, the results yielded by the ten-fold cross-validation indicate that the combined method is still effective and stable when no close homologues are available. However, the accuracy of the predicted functions can only be determined according to known protein functions based on current knowledge. Many protein functions remain unknown. By exploring the functions of proteins for which the 1st-order predicted functions are wrong but the 2nd-order predicted functions are correct, the 1st-order wrongly predicted functions were shown to be closely associated with the genes encoding the proteins. The so-called wrongly predicted functions could also potentially be correct upon future experimental verification. Therefore, the accuracy of the presented method may be much higher in reality.
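The "sequential combination" described in this abstract can be pictured as a fallback cascade: each predictor is tried in order, and the first one that returns an answer wins. The sketch below illustrates only that control flow. The predictor functions, the lookup tables, and the function labels are hypothetical placeholders, not the authors' models or data.

```python
from typing import Callable, List, Optional, Tuple

# A predictor returns a function label, or None when it cannot decide.
Predictor = Callable[[str], Optional[str]]


def predict_sequentially(protein: str,
                         predictors: List[Tuple[str, Predictor]]) -> Optional[Tuple[str, str]]:
    """Return (predictor name, predicted function) from the first predictor that answers."""
    for name, predictor in predictors:
        label = predictor(protein)
        if label is not None:
            return name, label
    return None


# Hypothetical stand-ins for the three approaches named in the abstract.
HOMOLOG_LABELS = {"P1": "kinase activity"}        # proteins with an annotated homologue
NEIGHBOUR_LABELS = {"P2": "DNA binding"}          # proteins with annotated interactors


def similarity_based(protein: str) -> Optional[str]:
    return HOMOLOG_LABELS.get(protein)


def interaction_based(protein: str) -> Optional[str]:
    return NEIGHBOUR_LABELS.get(protein)


def composition_based(protein: str) -> Optional[str]:
    # A composition-based model can score any sequence, so it always answers.
    return "transport"


CASCADE = [
    ("similarity", similarity_based),
    ("interaction", interaction_based),
    ("composition", composition_based),
]

for query in ("P1", "P2", "P3"):
    print(query, predict_sequentially(query, CASCADE))
```

Ordering the cascade from the most accurate but least general predictor to the least accurate but most general one matches the trade-off the abstract reports between the similarity-based and composition-based approaches.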
Exploring Mouse Protein Function via Multiple Approaches

PubMed Central

Huang, Tao; Kong, Xiangyin; Zhang, Yunhua; Zhang, Ning

2016-01-01

Although the number of available protein sequences is growing exponentially, functional protein annotations lag far behind. Therefore, accurate identification of protein functions remains one of the major challenges in molecular biology. In this study, we presented a novel approach to predict mouse protein functions. The approach was a sequential combination of a similarity-based approach, an interaction-based approach and a pseudo amino acid composition-based approach. The method achieved an accuracy of about 0.8450 for the 1st-order predictions in the leave-one-out and ten-fold cross-validations. For the results yielded by the leave-one-out cross-validation, although the similarity-based approach alone achieved an accuracy of 0.8756, it was unable to predict the functions of proteins with no homologues. Comparatively, the pseudo amino acid composition-based approach alone reached an accuracy of 0.6786. Although the accuracy was lower than that of the previous approach, it could predict the functions of almost all proteins, even proteins with no homologues. Therefore, the combined method balanced the advantages and disadvantages of both approaches to achieve efficient performance. Furthermore, the results yielded by the ten-fold cross-validation indicate that the combined method is still effective and stable when no close homologues are available. However, the accuracy of the predicted functions can only be determined according to known protein functions based on current knowledge. Many protein functions remain unknown. By exploring the functions of proteins for which the 1st-order predicted functions are wrong but the 2nd-order predicted functions are correct, the 1st-order wrongly predicted functions were shown to be closely associated with the genes encoding the proteins. The so-called wrongly predicted functions could also potentially be correct upon future experimental verification. Therefore, the accuracy of the presented method may be much higher in reality. PMID:27846315
Evolutionarily Conserved Linkage between Enzyme Fold, Flexibility, and Catalysis

PubMed Central

Ramanathan, Arvind; Agarwal, Pratul K.

2011-01-01

Proteins are intrinsically flexible molecules. The role of internal motions in a protein's designated function is widely debated. The role of protein structure in enzyme catalysis is well established, and conservation of structural features provides vital clues to their role in function. Recently, it has been proposed that the protein function may involve multiple conformations: the observed deviations are not random thermodynamic fluctuations; rather, flexibility may be closely linked to protein function, including enzyme catalysis. We hypothesize that the argument of conservation of important structural features can also be extended to identification of protein flexibility in interconnection with enzyme function. Three classes of enzymes (prolyl-peptidyl isomerase, oxidoreductase, and nuclease) that catalyze diverse chemical reactions have been examined using detailed computational modeling. For each class, the identification and characterization of the internal protein motions coupled to the chemical step in enzyme mechanisms in multiple species show identical enzyme conformational fluctuations. In addition to the active-site residues, motions of protein surface loop regions (>10 Å away) are observed to be identical across species, and networks of conserved interactions/residues connect these highly flexible surface regions to the active-site residues that make direct contact with substrates. More interestingly, examination of reaction-coupled motions in non-homologous enzyme systems (with no structural or sequence similarity) that catalyze the same biochemical reaction shows motions that induce remarkably similar changes in the enzyme–substrate interactions during catalysis. The results indicate that the reaction-coupled flexibility is a conserved aspect of the enzyme molecular architecture. 
Protein motions in distal areas of homologous and non-homologous enzyme systems mediate similar changes in the active-site enzyme–substrate interactions, thereby impacting the mechanism of catalyzed chemistry. These results have implications for understanding the mechanism of allostery, and for protein engineering and drug design. PMID:22087074
Evolutionarily conserved linkage between enzyme fold, flexibility, and catalysis.

PubMed

Ramanathan, Arvind; Agarwal, Pratul K

2011-11-01

Proteins are intrinsically flexible molecules. The role of internal motions in a protein's designated function is widely debated. The role of protein structure in enzyme catalysis is well established, and conservation of structural features provides vital clues to their role in function. Recently, it has been proposed that the protein function may involve multiple conformations: the observed deviations are not random thermodynamic fluctuations; rather, flexibility may be closely linked to protein function, including enzyme catalysis. We hypothesize that the argument of conservation of important structural features can also be extended to identification of protein flexibility in interconnection with enzyme function. Three classes of enzymes (prolyl-peptidyl isomerase, oxidoreductase, and nuclease) that catalyze diverse chemical reactions have been examined using detailed computational modeling. For each class, the identification and characterization of the internal protein motions coupled to the chemical step in enzyme mechanisms in multiple species show identical enzyme conformational fluctuations. In addition to the active-site residues, motions of protein surface loop regions (>10 Å away) are observed to be identical across species, and networks of conserved interactions/residues connect these highly flexible surface regions to the active-site residues that make direct contact with substrates. More interestingly, examination of reaction-coupled motions in non-homologous enzyme systems (with no structural or sequence similarity) that catalyze the same biochemical reaction shows motions that induce remarkably similar changes in the enzyme-substrate interactions during catalysis. The results indicate that the reaction-coupled flexibility is a conserved aspect of the enzyme molecular architecture. 
Protein motions in distal areas of homologous and non-homologous enzyme systems mediate similar changes in the active-site enzyme-substrate interactions, thereby impacting the mechanism of catalyzed chemistry. These results have implications for understanding the mechanism of allostery, and for protein engineering and drug design.
Evolutionarily conserved linkage between enzyme fold, flexibility, and catalysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ramanathan, Arvind; Agarwal, Pratul K

Proteins are intrinsically flexible molecules. The role of internal motions in a protein's designated function is widely debated. The role of protein structure in enzyme catalysis is well established, and conservation of structural features provides vital clues to their role in function. Recently, it has been proposed that the protein function may involve multiple conformations: the observed deviations are not random thermodynamic fluctuations; rather, flexibility may be closely linked to protein function, including enzyme catalysis. We hypothesize that the argument of conservation of important structural features can also be extended to identification of protein flexibility in interconnection with enzyme function. Three classes of enzymes (prolyl-peptidyl isomerase, oxidoreductase, and nuclease) that catalyze diverse chemical reactions have been examined using detailed computational modeling. For each class, the identification and characterization of the internal protein motions coupled to the chemical step in enzyme mechanisms in multiple species show identical enzyme conformational fluctuations. In addition to the active-site residues, motions of protein surface loop regions (>10 Å away) are observed to be identical across species, and networks of conserved interactions/residues connect these highly flexible surface regions to the active-site residues that make direct contact with substrates. More interestingly, examination of reaction-coupled motions in non-homologous enzyme systems (with no structural or sequence similarity) that catalyze the same biochemical reaction shows motions that induce remarkably similar changes in the enzyme-substrate interactions during catalysis. The results indicate that the reaction-coupled flexibility is a conserved aspect of the enzyme molecular architecture. Protein motions in distal areas of homologous and non-homologous enzyme systems mediate similar changes in the active-site enzyme-substrate interactions, thereby impacting the mechanism of catalyzed chemistry. These results have implications for understanding the mechanism of allostery, and for protein engineering and drug design.
Classifying proteins into functional groups based on all-versus-all BLAST of 10 million proteins.

PubMed

Kolker, Natali; Higdon, Roger; Broomall, William; Stanberry, Larissa; Welch, Dean; Lu, Wei; Haynes, Winston; Barga, Roger; Kolker, Eugene

2011-01-01

To address the monumental challenge of assigning function to millions of sequenced proteins, we completed the first-of-a-kind all-versus-all sequence alignment using BLAST for 9.9 million proteins in the UniRef100 database. Microsoft Windows Azure produced over 3 billion filtered records in 6 days using 475 eight-core virtual machines. Protein classification into functional groups was then performed using Hive and custom jars implemented on top of Apache Hadoop utilizing the MapReduce paradigm. First, using the Clusters of Orthologous Genes (COG) database, a length normalized bit score (LNBS) was determined to be the best similarity measure for classification of proteins. LNBS achieved sensitivity and specificity of 98% each. Second, out of 5.1 million bacterial proteins, about two-thirds were assigned to significantly extended COG groups, encompassing 30 times more assigned proteins. Third, the remaining proteins were classified into protein functional groups using an innovative implementation of a single-linkage algorithm on an in-house Hadoop compute cluster. This implementation significantly reduces the run time for nonindexed queries and optimizes efficient clustering on a large scale. The performance was also verified on Amazon Elastic MapReduce. This clustering assigned nearly 2 million proteins to approximately half a million different functional groups. A similar approach was applied to classify 2.8 million eukaryotic sequences, resulting in over 1 million proteins being assigned to existing KOG groups and the remainder clustered into 100,000 functional groups.
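Single-linkage clustering, as mentioned in this abstract, places two proteins in the same group whenever a chain of sufficiently similar pairs connects them. The sketch below shows that idea on a single machine with a union-find structure. It is not the Hadoop implementation described by the authors; the input format, the threshold of 0.5, and the toy scores (standing in for a precomputed normalized similarity such as LNBS) are assumptions.

```python
def single_linkage(pairs, threshold):
    """Group items linked by any chain of pairs scoring at or above the threshold.

    pairs: iterable of (protein_a, protein_b, score) tuples (assumed input format).
    Returns a sorted list of sorted groups; unlinked proteins form singleton groups.
    """
    parent = {}

    def find(x):
        # Register unseen items, then walk to the root with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in pairs:
        root_a, root_b = find(a), find(b)
        if score >= threshold and root_a != root_b:
            parent[root_a] = root_b  # merge the two groups

    groups = {}
    for item in list(parent):
        groups.setdefault(find(item), []).append(item)
    return sorted(sorted(members) for members in groups.values())


# Toy similarity records (assumed values, not real BLAST output).
records = [
    ("A", "B", 0.9),
    ("B", "C", 0.8),
    ("C", "D", 0.1),
    ("D", "E", 0.95),
    ("E", "F", 0.2),
]
print(single_linkage(records, threshold=0.5))
```

Because a single strong link is enough to merge two groups, single linkage scales well but can chain loosely related proteins together, which is one reason the choice of similarity measure and cutoff matters.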
Centrins in unicellular organisms: functional diversity and specialization.

PubMed

Zhang, Yu; He, Cynthia Y

2012-07-01

Centrins (also known as caltractins) are conserved, EF hand-containing proteins ubiquitously found in eukaryotes. Similar to calmodulins, the calcium-binding EF hands in centrins fold into two structurally similar domains separated by an alpha-helical linker region, giving the protein a dumbbell-like shape. The small size (15-22 kDa) and domain organization of centrins and their functional diversity/specialization make them an ideal system to study the protein structure-function relationship. Here, we review the work on centrins with a focus on their structures and functions characterized in unicellular organisms.
M-Finder: Uncovering functionally associated proteins from interactome data integrated with GO annotations

PubMed Central

2013-01-01

Background: Protein-protein interactions (PPIs) play a key role in understanding the mechanisms of cellular processes. The availability of interactome data has catalyzed the development of computational approaches to elucidate functional behaviors of proteins on a system level. Gene Ontology (GO) and its annotations are a significant resource for functional characterization of proteins. Because of wide coverage, GO data have often been adopted as a benchmark for protein function prediction on the genomic scale. Results: We propose a computational approach, called M-Finder, for functional association pattern mining. This method employs semantic analytics to integrate the genome-wide PPIs with GO data. We also introduce an interactive web application tool that visualizes a functional association network linked to a protein specified by a user. The proposed approach comprises two major components. First, the PPIs that have been generated by high-throughput methods are weighted in terms of their functional consistency using GO and its annotations. We assess two advanced semantic similarity metrics which quantify the functional association level of each interacting protein pair. We demonstrate that these measures outperform the other existing methods by evaluating their agreement with other biological features, such as sequence similarity, the presence of common Pfam domains, and core PPIs. Second, the information flow-based algorithm is employed to efficiently discover a set of proteins functionally associated with the query protein, together with their links. This algorithm reconstructs a functional association network of the query protein. The output network size can be flexibly determined by parameters. Conclusions: M-Finder provides a useful framework to investigate functional association patterns with any protein. This software will also allow users to perform further systematic analysis of a set of proteins for any specific function. It is available online at http://bionet.ecs.baylor.edu/mfinder. PMID:24565382

Prioritization of orphan disease-causing genes using topological feature and GO similarity between proteins in interaction networks.

PubMed

Li, Min; Li, Qi; Ganegoda, Gamage Upeksha; Wang, JianXin; Wu, FangXiang; Pan, Yi

2014-11-01

Identification of disease-causing genes among a large number of candidates is a fundamental challenge in human disease studies. However, it is still time-consuming and laborious to determine the real disease-causing genes by biological experiments. With advances in high-throughput techniques, a large number of protein-protein interactions have been produced. Therefore, to address this issue, several methods based on protein interaction networks have been proposed. In this paper, we propose a shortest path-based algorithm, named SPranker, to prioritize disease-causing genes in protein interaction networks. Considering the fact that diseases with similar phenotypes are generally caused by functionally related genes, we further propose an improved algorithm, SPGOranker, by integrating the semantic similarity of GO annotations. SPGOranker not only considers the topological similarity between protein pairs in a protein interaction network but also takes their functional similarity into account. The proposed algorithms SPranker and SPGOranker were applied to 1598 known orphan disease-causing genes from 172 orphan diseases and compared with three state-of-the-art approaches, ICN, VS and RWR. The experimental results show that SPranker and SPGOranker outperform ICN, VS, and RWR for the prioritization of orphan disease-causing genes. Importantly, for the case study of severe combined immunodeficiency, SPranker and SPGOranker predict several novel causal genes.
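To make the idea of shortest-path-based prioritization concrete, the sketch below ranks candidate genes by how close they sit to known disease genes in an unweighted interaction network. It is a simplified stand-in and not the SPranker or SPGOranker scoring function, which the abstract does not specify; the inverse-distance score, the function names, and the toy network are assumptions.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


def rank_candidates(graph, seeds, candidates):
    """Rank candidates by the sum of inverse shortest-path distances to seed genes.

    Assumed scoring rule for illustration: closer seeds contribute more,
    unreachable seeds contribute nothing.
    """
    scores = {}
    for candidate in candidates:
        distances = bfs_distances(graph, candidate)
        scores[candidate] = sum(
            1.0 / distances[seed]
            for seed in seeds
            if seed in distances and distances[seed] > 0
        )
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(candidates, key=lambda c: (-scores[c], c))


# Toy undirected network (assumed): S1 and S2 are known disease genes.
network = {
    "S1": ["A"],
    "S2": ["A"],
    "A": ["S1", "S2", "B"],
    "B": ["A", "C"],
    "C": ["B"],
}
print(rank_candidates(network, seeds=["S1", "S2"], candidates=["C", "B", "A"]))
```

A GO-aware variant in the spirit of SPGOranker would additionally weight each candidate by the semantic similarity of its annotations to those of the seed genes.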
Graph pyramids for protein function prediction

PubMed Central

2015-01-01

Background: Uncovering the hidden organizational characteristics and regularities among biological sequences is the key issue for detailed understanding of an underlying biological phenomenon. Thus pattern recognition from nucleic acid sequences is an important affair for protein function prediction. As proteins from the same family exhibit similar characteristics, homology based approaches predict protein functions via protein classification. But conventional classification approaches mostly rely on the global features by considering only strong protein similarity matches. This leads to significant loss of prediction accuracy. Methods: Here we construct the Protein-Protein Similarity (PPS) network, which captures the subtle properties of protein families. The proposed method considers the local as well as the global features, by examining the interactions among 'weakly interacting proteins' in the PPS network and by using hierarchical graph analysis via the graph pyramid. Different underlying properties of the protein families are uncovered by operating the proposed graph based features at various pyramid levels. Results: Experimental results on benchmark data sets show that the proposed hierarchical voting algorithm using graph pyramid helps to improve computational efficiency as well as the protein classification accuracy. Quantitatively, among 14,086 test sequences, on average the proposed method misclassified only 21.1 sequences whereas the baseline BLAST score based global feature matching method misclassified 362.9 sequences. With each correctly classified test sequence, the fast incremental learning ability of the proposed method further enhances the training model. Thus it has achieved more than 96% protein classification accuracy using only 20% per class training data. PMID:26044522
Graph pyramids for protein function prediction.

PubMed

Sandhan, Tushar; Yoo, Youngjun; Choi, Jin; Kim, Sun

2015-01-01

Uncovering the hidden organizational characteristics and regularities among biological sequences is the key issue for detailed understanding of an underlying biological phenomenon. Thus pattern recognition from nucleic acid sequences is an important affair for protein function prediction. As proteins from the same family exhibit similar characteristics, homology based approaches predict protein functions via protein classification. But conventional classification approaches mostly rely on the global features by considering only strong protein similarity matches. This leads to significant loss of prediction accuracy. Here we construct the Protein-Protein Similarity (PPS) network, which captures the subtle properties of protein families. The proposed method considers the local as well as the global features, by examining the interactions among 'weakly interacting proteins' in the PPS network and by using hierarchical graph analysis via the graph pyramid. Different underlying properties of the protein families are uncovered by operating the proposed graph based features at various pyramid levels. Experimental results on benchmark data sets show that the proposed hierarchical voting algorithm using graph pyramid helps to improve computational efficiency as well as the protein classification accuracy. Quantitatively, among 14,086 test sequences, on average the proposed method misclassified only 21.1 sequences whereas the baseline BLAST score based global feature matching method misclassified 362.9 sequences. With each correctly classified test sequence, the fast incremental learning ability of the proposed method further enhances the training model. Thus it has achieved more than 96% protein classification accuracy using only 20% per class training data.
Fast and accurate non-sequential protein structure alignment using a new asymmetric linear sum assignment heuristic.

PubMed

Brown, Peter; Pullan, Wayne; Yang, Yuedong; Zhou, Yaoqi

2016-02-01

The three-dimensional tertiary structure of a protein at near atomic level resolution provides insight into its function and evolution. As protein structure decides its functionality, similarity in structure usually implies similarity in function. As such, structure alignment techniques are often useful in the classification of protein function. Given the rapidly growing rate of new, experimentally determined structures being made available from repositories such as the Protein Data Bank, fast and accurate computational structure comparison tools are required. This paper presents SPalignNS, a non-sequential protein structure alignment tool using a novel asymmetrical greedy search technique. The performance of SPalignNS was evaluated against existing sequential and non-sequential structure alignment methods by performing trials with commonly used datasets. These benchmark datasets used to gauge alignment accuracy include (i) 9538 pairwise alignments implied by the HOMSTRAD database of homologous proteins; (ii) a subset of 64 difficult alignments from set (i) that have low structure similarity; (iii) 199 pairwise alignments of proteins with similar structure but different topology; and (iv) a subset of 20 pairwise alignments from the RIPC set. SPalignNS is shown to achieve greater alignment accuracy (lower or comparable root-mean-square distance with increased structure overlap coverage) for all datasets, and the highest agreement with reference alignments from the challenging dataset (iv) above, when compared with both sequentially constrained alignments and other non-sequential alignments. SPalignNS was implemented in C++. The source code, binary executable, and a web server version are freely available at http://sparks-lab.org. Contact: yaoqi.zhou@griffith.edu.au. © The Author 2015. Published by Oxford University Press. All rights reserved.
Exploring Protein Dynamics Space: The Dynasome as the Missing Link between Protein Structure and Function

PubMed Central

Hensen, Ulf; Meyer, Tim; Haas, Jürgen; Rex, René; Vriend, Gert; Grubmüller, Helmut

2012-01-01

Proteins are usually described and classified according to amino acid sequence, structure or function. Here, we develop a minimally biased scheme to compare and classify proteins according to their internal mobility patterns. This approach is based on the notion that proteins not only fold into recurring structural motifs but might also be carrying out only a limited set of recurring mobility motifs. The complete set of these patterns, which we tentatively call the dynasome, spans a multi-dimensional space with axes, the dynasome descriptors, characterizing different aspects of protein dynamics. The unique dynamic fingerprint of each protein is represented as a vector in the dynasome space. The difference between any two vectors, consequently, gives a reliable measure of the difference between the corresponding protein dynamics. We characterize the properties of the dynasome by comparing the dynamics fingerprints obtained from molecular dynamics simulations of 112 proteins but our approach is, in principle, not restricted to any specific source of data of protein dynamics. We conclude that: 1. the dynasome consists of a continuum of proteins, rather than well separated classes. 2. For the majority of proteins we observe strong correlations between structure and dynamics. 3. Proteins with similar function carry out similar dynamics, which suggests a new method to improve protein function annotation based on protein dynamics. PMID:22606222
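The abstract represents each protein as a vector of dynamics descriptors and uses the difference between two vectors as a measure of how differently the proteins move. A minimal sketch of that comparison follows, assuming Euclidean distance and descriptors already scaled to comparable ranges. The descriptor values and protein names are invented for illustration and are not taken from the study.

```python
import math


def fingerprint_distance(u, v):
    """Euclidean distance between two dynamics fingerprints of equal length."""
    if len(u) != len(v):
        raise ValueError("fingerprints must have the same number of descriptors")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_neighbour(query, fingerprints):
    """Name of the protein whose fingerprint lies closest to the query vector."""
    return min(fingerprints, key=lambda name: fingerprint_distance(query, fingerprints[name]))


# Hypothetical three-descriptor fingerprints (assumed, pre-scaled values).
library = {
    "protein_a": [0.10, 0.80, 0.30],
    "protein_b": [0.90, 0.20, 0.70],
    "protein_c": [0.15, 0.75, 0.35],
}
print(nearest_neighbour([0.12, 0.78, 0.33], library))
```

If proteins with similar function do cluster in this space, as the abstract concludes, the annotated nearest neighbours of an unannotated protein could serve as candidate function labels.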
Family-specific scaling laws in bacterial genomes.

PubMed

De Lazzari, Eleonora; Grilli, Jacopo; Maslov, Sergei; Cosentino Lagomarsino, Marco

2017-07-27

Among several quantitative invariants found in evolutionary genomics, one of the most striking is the scaling of the overall abundance of proteins, or protein domains, sharing a specific functional annotation across genomes of given size. The sizes of these functional categories change, on average, as power laws in the total number of protein-coding genes. Here, we show that such regularities are not restricted to the overall behavior of high-level functional categories, but also exist systematically at the level of single evolutionary families of protein domains. Specifically, the number of proteins within each family follows family-specific scaling laws with genome size. Functionally similar sets of families tend to follow similar scaling laws, but this is not always the case. To understand this systematically, we provide a comprehensive classification of families based on their scaling properties. Additionally, we develop a quantitative score for the heterogeneity of the scaling of families belonging to a given category or predefined group. Under the common reasonable assumption that selection is driven solely or mainly by biological function, these findings point to fine-tuned and interdependent functional roles of specific protein domains, beyond our current functional annotations. This analysis provides a deeper view on the links between evolutionary expansion of protein families and the functional constraints shaping the gene repertoire of bacterial genomes. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
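A scaling law of the kind described here has the form n = A * N^beta, where N is the number of protein-coding genes in a genome and n is the number of proteins in a given family. One common way to estimate the exponent beta is an ordinary least-squares fit in log-log space, sketched below. This is a generic fitting procedure, not necessarily the estimator used by the authors, and the data points are synthetic.

```python
import math


def fit_power_law(genome_sizes, family_counts):
    """Fit n = A * N**beta by least squares on (log N, log n).

    Returns (beta, A). All inputs must be strictly positive.
    """
    xs = [math.log(size) for size in genome_sizes]
    ys = [math.log(count) for count in family_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                      # slope in log-log space
    prefactor = math.exp(mean_y - beta * mean_x)
    return beta, prefactor


# Synthetic example (assumed data): a family that grows quadratically with genome size.
sizes = [1000, 2000, 4000, 8000]
counts = [0.00001 * size ** 2 for size in sizes]
beta, prefactor = fit_power_law(sizes, counts)
print(round(beta, 3), prefactor)
```

Genomes in which a family is absent (count of zero) cannot be log-transformed, so a real analysis would need an explicit rule for handling them.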
Rift Valley fever virus NSs protein functions and the similarity to other bunyavirus NSs proteins.

PubMed

Ly, Hoai J; Ikegami, Tetsuro

2016-07-02

Rift Valley fever is a mosquito-borne zoonotic disease that affects both ruminants and humans. The nonstructural (NSs) protein, which is a major virulence factor for Rift Valley fever virus (RVFV), is encoded on the S-segment. Through the cullin 1-Skp1-Fbox E3 ligase complex, the NSs protein promotes the degradation of at least two host proteins, the TFIIH p62 and the PKR proteins. NSs protein bridges the Fbox protein with subsequent substrates, and facilitates the transfer of ubiquitin. The SAP30-YY1 complex also bridges the NSs protein with chromatin DNA, affecting cohesion and segregation of chromatin DNA as well as the activation of interferon-β promoter. The presence of NSs filaments in the nucleus induces DNA damage responses and causes cell-cycle arrest, p53 activation, and apoptosis. Although NSs proteins have poor amino acid similarity among bunyaviruses, the strategies they use to hijack host cells are similar. This review summarizes recent findings on the biological functions of the NSs protein of RVFV as well as the differences from those of other bunyaviruses.
Hidden relationships between metalloproteins unveiled by structural comparison of their metal sites

NASA Astrophysics Data System (ADS)

Valasatava, Yana; Andreini, Claudia; Rosato, Antonio

2015-03-01

Metalloproteins account for a substantial fraction of all proteins. They incorporate metal atoms, which are required for their structure and/or function. Here we describe a new computational protocol to systematically compare and classify metal-binding sites on the basis of their structural similarity. These sites are extracted from the MetalPDB database of minimal functional sites (MFSs) in metal-binding biological macromolecules. Structural similarity is measured by the scoring function of the available MetalS2 program. Hierarchical clustering was used to organize MFSs into clusters, for each of which a representative MFS was identified. The comparison of all representative MFSs provided a thorough structure-based classification of the sites analyzed. As examples, the application of the proposed computational protocol to all heme-binding proteins and zinc-binding proteins of known structure highlighted the existence of structural subtypes, validated known evolutionary links and shed new light on the occurrence of similar sites in systems at different evolutionary distances. The present approach thus makes available an innovative viewpoint on metalloproteins, where the functionally crucial metal sites effectively lead the discovery of structural and functional relationships in a largely protein-independent manner.
StralSV: assessment of sequence variability within similar 3D structures and application to polio RNA-dependent RNA polymerase

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zemla, A; Lang, D; Kostova, T

2010-11-29

Most of the currently used methods for protein function prediction rely on sequence-based comparisons between a query protein and those for which a functional annotation is provided. A serious limitation of sequence similarity-based approaches for identifying residue conservation among proteins is the low confidence in assigning residue-residue correspondences among proteins when the level of sequence identity between the compared proteins is poor. Multiple sequence alignment methods are more satisfactory; still, they cannot provide reliable results at low levels of sequence identity. Our goal in the current work was to develop an algorithm that could overcome these difficulties and facilitate the identification of structurally (and possibly functionally) relevant residue-residue correspondences between compared protein structures. Here we present StralSV, a new algorithm for detecting closely related structure fragments and quantifying residue frequency from tight local structure alignments. We apply StralSV in a study of the RNA-dependent RNA polymerase of poliovirus and demonstrate that the algorithm can be used to determine regions of the protein that are relatively unique or that share structural similarity with structures that are distantly related. By quantifying residue frequencies among many residue-residue pairs extracted from local alignments, one can infer potential structural or functional importance of specific residues that are determined to be highly conserved or that deviate from a consensus. We further demonstrate that considerable detailed structural and phylogenetic information can be derived from StralSV analyses. StralSV is a new structure-based algorithm for identifying and aligning structure fragments that have similarity to a reference protein. StralSV analysis can be used to quantify residue-residue correspondences and identify residues that may be of particular structural or functional importance, as well as unusual or unexpected residues at a given sequence position.
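The residue-frequency step described in this abstract can be illustrated with a simple tally: for each position of the reference protein, count which residues the structurally aligned fragments place at that position, then flag positions dominated by one residue. The sketch below assumes the residue-residue correspondences have already been extracted. It is not StralSV itself; the input format, the 0.7 conservation cutoff, and the example positions and residues are invented.

```python
from collections import Counter


def residue_frequencies(aligned_residues):
    """Per-position residue frequencies.

    aligned_residues: dict mapping a reference position to the list of residues
    (one-letter codes) aligned to it across local structure alignments.
    Returns a dict: position -> {residue: fraction of aligned fragments}.
    """
    frequencies = {}
    for position, residues in aligned_residues.items():
        counts = Counter(residues)
        total = sum(counts.values())
        frequencies[position] = {res: count / total for res, count in counts.items()}
    return frequencies


def conserved_positions(frequencies, cutoff=0.7):
    """Positions whose most frequent residue reaches the (assumed) cutoff."""
    return sorted(pos for pos, freq in frequencies.items() if max(freq.values()) >= cutoff)


# Invented correspondences for three reference positions.
observed = {
    101: ["D", "D", "D", "E"],
    102: ["G", "A", "S", "T"],
    103: ["K", "K", "K", "K"],
}
freqs = residue_frequencies(observed)
print(freqs[101])
print(conserved_positions(freqs))
```

Positions where the reference residue itself is rare among the aligned fragments could be reported in the same pass, which corresponds to the "unusual or unexpected residues" mentioned in the abstract.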
Analysis of functional redundancies within the Arabidopsis TCP transcription factor family

PubMed Central

Danisman, Selahattin; de Folter, Stefan; Immink, Richard G. H.

2013-01-01

Analyses of the functions of TEOSINTE-LIKE1, CYCLOIDEA, and PROLIFERATING CELL FACTOR1 (TCP) transcription factors have been hampered by functional redundancy between its individual members. In general, putative functionally redundant genes are predicted based on sequence similarity and confirmed by genetic analysis. In the TCP family, however, identification is impeded by relatively low overall sequence similarity. In a search for functionally redundant TCP pairs that control Arabidopsis leaf development, this work performed an integrative bioinformatics analysis, combining protein sequence similarities, gene expression data, and results of pair-wise protein–protein interaction studies for the 24 members of the Arabidopsis TCP transcription factor family. For this, the work completed any lacking gene expression and protein–protein interaction data experimentally and then performed a comprehensive prediction of potential functional redundant TCP pairs. Subsequently, redundant functions could be confirmed for selected predicted TCP pairs by genetic and molecular analyses. It is demonstrated that the previously uncharacterized class I TCP19 gene plays a role in the control of leaf senescence in a redundant fashion with TCP20. Altogether, this work shows the power of combining classical genetic and molecular approaches with bioinformatics predictions to unravel functional redundancies in the TCP transcription factor family. PMID:24129704
Claudin Loss-of-Function Disrupts Tight Junctions and Impairs Amelogenesis

PubMed Central

Bardet, Claire; Ribes, Sandy; Wu, Yong; Diallo, Mamadou Tidiane; Salmon, Benjamin; Breiderhoff, Tilman; Houillier, Pascal; Müller, Dominik; Chaussain, Catherine

2017-01-01

Claudins are a family of proteins that forms paracellular barriers and pores determining tight junctions (TJ) permeability. Claudin-16 and -19 are pore forming TJ proteins allowing calcium and magnesium reabsorption in the thick ascending limb of Henle's loop (TAL). Loss-of-function mutations in the encoding genes, initially identified to cause Familial Hypomagnesemia with Hypercalciuria and Nephrocalcinosis (FHHNC), were recently shown to be also involved in Amelogenesis Imperfecta (AI). In addition, both claudins were expressed in the murine tooth germ and Claudin-16 knockout (KO) mice displayed abnormal enamel formation. Claudin-3, an ubiquitous claudin expressed in epithelia including kidney, acts as a barrier-forming tight junction protein. We determined that, similarly to claudin-16 and claudin-19, claudin-3 was expressed in the tooth germ, more precisely in the TJ located at the apical end of secretory ameloblasts. The observation of Claudin-3 KO teeth revealed enamel defects associated to impaired TJ structure at the secretory ends of ameloblasts and accumulation of matrix proteins in the forming enamel. Thus, claudin-3 protein loss-of-function disturbs amelogenesis similarly to claudin-16 loss-of-function, highlighting the importance of claudin proteins for the TJ structure. These findings unravel that loss-of-function of either pore or barrier-forming TJ proteins leads to enamel defects. Hence, the major structural function of claudin proteins appears essential for amelogenesis. PMID:28596736
NaviGO: interactive tool for visualization and functional similarity and coherence analysis with gene ontology.

PubMed

Wei, Qing; Khan, Ishita K; Ding, Ziyun; Yerneni, Satwica; Kihara, Daisuke

2017-03-20

The number of genomics and proteomics experiments is growing rapidly, producing an ever-increasing amount of data that are awaiting functional interpretation. A number of function prediction algorithms were developed and improved to enable fast and automatic function annotation. With the well-defined structure and manual curation, Gene Ontology (GO) is the most frequently used vocabulary for representing gene functions. To understand relationship and similarity between GO annotations of genes, it is important to have a convenient pipeline that quantifies and visualizes the GO function analyses in a systematic fashion. NaviGO is a web-based tool for interactive visualization, retrieval, and computation of functional similarity and associations of GO terms and genes. Similarity of GO terms and gene functions is quantified with six different scores including protein-protein interaction and context based association scores we have developed in our previous works. Interactive navigation of the GO function space provides intuitive and effective real-time visualization of functional groupings of GO terms and genes as well as statistical analysis of enriched functions. We developed NaviGO, which visualizes and analyses functional similarity and associations of GO terms and genes. The NaviGO webserver is freely available at: http://kiharalab.org/web/navigo .
A Novel Highly Divergent Protein Family Identified from a Viviparous Insect by RNA-seq Analysis: A Potential Target for Tsetse Fly-Specific Abortifacients

PubMed Central

Benoit, Joshua B.; Attardo, Geoffrey M.; Michalkova, Veronika; Krause, Tyler B.; Bohova, Jana; Zhang, Qirui; Baumann, Aaron A.; Mireji, Paul O.; Takáč, Peter; Denlinger, David L.; Ribeiro, Jose M.; Aksoy, Serap

2014-01-01

In tsetse flies, nutrients for intrauterine larval development are synthesized by the modified accessory gland (milk gland) and provided in mother's milk during lactation. Interference with at least two milk proteins has been shown to extend larval development and reduce fecundity. The goal of this study was to perform a comprehensive characterization of tsetse milk proteins using lactation-specific transcriptome/milk proteome analyses and to define functional role(s) for the milk proteins during lactation. Differential analysis of RNA-seq data from lactating and dry (non-lactating) females revealed enrichment of transcripts coding for protein synthesis machinery, lipid metabolism and secretory proteins during lactation. Among the genes induced during lactation were those encoding the previously identified milk proteins (milk gland proteins 1–3, transferrin and acid sphingomyelinase 1) and seven new genes (mgp4–10). The genes encoding mgp2–10 are organized on a 40 kb syntenic block in the tsetse genome, have similar exon-intron arrangements, and share regions of amino acid sequence similarity. Expression of mgp2–10 is female-specific and high during milk secretion. While knockdown of a single mgp failed to reduce fecundity, simultaneous knockdown of multiple variants reduced milk protein levels and lowered fecundity. The genomic localization, gene structure similarities, and functional redundancy of MGP2–10 suggest that they constitute a novel highly divergent protein family. Our data indicates that MGP2–10 function both as the primary amino acid resource for the developing larva and in the maintenance of milk homeostasis, similar to the function of the mammalian casein family of milk proteins. This study underscores the dynamic nature of the lactation cycle and identifies a novel family of lactation-specific proteins, unique to Glossina sp., that are essential to larval development. 
The specificity of MGP2–10 to tsetse and their critical role during lactation suggests that these proteins may be an excellent target for tsetse-specific population control approaches. PMID:24763277
The correlation of fragmentation and structure of a protein

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wu, Qinyuan; Cheng, Xueheng; Van Orden, S.

1995-12-31

Characterization of proteins of similar structures is important to understanding the biological function of the proteins and the processes with which they are involved. Cytochrome c variants typically have similar sequences, and have similar conformations in solution with almost identical absorption spectra and redox potentials. The authors chose cytochromes c from bovine, tuna, rabbit and horse as a model system in studying large biomolecules using MS^n of multiply charged ions generated from electrospray ionization (ESI).
A new family of β-helix proteins with similarities to the polysaccharide lyases

DOE PAGES

Close, Devin W.; D'Angelo, Sara; Bradbury, Andrew R. M.

2014-09-27

Microorganisms that degrade biomass produce diverse assortments of carbohydrate-active enzymes and binding modules. Despite tremendous advances in the genomic sequencing of these organisms, many genes do not have an ascribed function owing to low sequence identity to genes that have been annotated. Consequently, biochemical and structural characterization of genes with unknown function is required to complement the rapidly growing pool of genomic sequencing data. A protein with previously unknown function (Cthe_2159) was recently isolated in a genome-wide screen using phage display to identify cellulose-binding protein domains from the biomass-degrading bacterium Clostridium thermocellum. Here, the crystal structure of Cthe_2159 is presented and it is shown that it is a unique right-handed parallel β-helix protein. Despite very low sequence identity to known β-helix or carbohydrate-active proteins, Cthe_2159 displays structural features that are very similar to those of polysaccharide lyase (PL) families 1, 3, 6 and 9. Cthe_2159 is conserved across bacteria and some archaea and is a member of the domain of unknown function family DUF4353. This suggests that Cthe_2159 is the first representative of a previously unknown family of cellulose and/or acid-sugar binding β-helix proteins that share structural similarities with PLs. More importantly, these results demonstrate how functional annotation by biochemical and structural analysis remains a critical tool in the characterization of new gene products.
Detecting similarities among distant homologous proteins by comparison of domain flexibilities.

PubMed

Pandini, Alessandro; Mauri, Giancarlo; Bordogna, Annalisa; Bonati, Laura

2007-06-01

The aim of this work is to assess the informativeness of protein dynamics in the detection of similarities among distant homologous proteins. To this end, an approach to perform large-scale comparisons of protein domain flexibilities is proposed. CONCOORD is confirmed as a reliable method for fast conformational sampling. The root mean square fluctuation of alpha carbon positions in the essential dynamics subspace is employed as a measure of local flexibility and a synthetic index of similarity is presented. The dynamics of a large collection of protein domains from ASTRAL/SCOP40 is analyzed and the possibility of identifying relationships, at both the family and the superfamily levels, on the basis of the dynamical features is discussed. The obtained picture is in agreement with the SCOP classification, and furthermore suggests the presence of a distinguishable familiar trend in the flexibility profiles. The results support the complementarity of the dynamical and the structural information, suggesting that information from dynamics analysis can arise from functional similarities, often partially hidden by a static comparison. On the basis of this first test, flexibility annotation can be expected to help in automatically detecting functional similarities otherwise unrecoverable.
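The flexibility measure named in the abstract above, the root mean square fluctuation (RMSF) of alpha carbon positions, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: it computes the plain Cartesian RMSF over a set of already superimposed conformations, not the fluctuation restricted to the essential dynamics subspace that the authors use, and the function name `rmsf` and the toy coordinates are hypothetical.

```python
from math import sqrt


def rmsf(frames):
    """Per-residue RMSF over a list of conformations.

    `frames` is a list of conformations; each conformation is a list of
    (x, y, z) alpha-carbon coordinates, one per residue. The frames are
    assumed to be superimposed already.
    """
    n_frames = len(frames)
    n_residues = len(frames[0])
    profile = []
    for i in range(n_residues):
        # Mean position of residue i across all frames.
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        # Mean squared displacement from that mean position.
        msd = sum(
            sum((f[i][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        profile.append(sqrt(msd))
    return profile


# Toy ensemble: residue 0 is rigid, residue 1 moves along x between 1 and 3.
toy_frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
toy_profile = rmsf(toy_frames)  # [0.0, 1.0]
```

Two domains could then be compared by correlating their flexibility profiles over aligned residues; the specific similarity index proposed in the paper is not reproduced here.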
Optimal network alignment with graphlet degree vectors.

PubMed

Milenković, Tijana; Ng, Weng Leong; Hayes, Wayne; Przulj, Natasa

2010-06-30

Important biological information is encoded in the topology of biological networks. Comparative analyses of biological networks are proving to be valuable, as they can lead to transfer of knowledge between species and give deeper insights into biological function, disease, and evolution. We introduce a new method that uses the Hungarian algorithm to produce optimal global alignment between two networks using any cost function. We design a cost function based solely on network topology and use it in our network alignment. Our method can be applied to any two networks, not just biological ones, since it is based only on network topology. We use our new method to align protein-protein interaction networks of two eukaryotic species and demonstrate that our alignment exposes large and topologically complex regions of network similarity. At the same time, our alignment is biologically valid, since many of the aligned protein pairs perform the same biological function. From the alignment, we predict function of yet unannotated proteins, many of which we validate in the literature. Also, we apply our method to find topological similarities between metabolic networks of different species and build phylogenetic trees based on our network alignment score. The phylogenetic trees obtained in this way bear a striking resemblance to the ones obtained by sequence alignments. Our method detects topologically similar regions in large networks that are statistically significant. It does this independent of protein sequence or any other information external to network topology.
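The idea of aligning two networks by minimising a cost that depends only on topology can be sketched as follows. The sketch makes two simplifying assumptions that differ from the paper: the cost is the absolute difference in node degree (a crude stand-in for graphlet degree vectors), and the optimal assignment is found by exhaustive search over permutations rather than by the Hungarian algorithm, which is only practical for toy inputs. All names (`degrees`, `align`, the toy networks) are hypothetical.

```python
from itertools import permutations


def degrees(nodes, edges):
    """Count how many edges touch each node of an undirected graph."""
    deg = {n: 0 for n in nodes}
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def align(nodes_a, edges_a, nodes_b, edges_b):
    """Return (total_cost, mapping) with minimal summed degree difference.

    Requires len(nodes_a) <= len(nodes_b). Exhaustive search stands in for
    the Hungarian algorithm; both return an assignment of minimal cost.
    """
    deg_a = degrees(nodes_a, edges_a)
    deg_b = degrees(nodes_b, edges_b)
    best_cost, best_map = None, None
    for perm in permutations(nodes_b, len(nodes_a)):
        cost = sum(abs(deg_a[a] - deg_b[b]) for a, b in zip(nodes_a, perm))
        if best_cost is None or cost < best_cost:
            best_cost, best_map = cost, dict(zip(nodes_a, perm))
    return best_cost, best_map


# Toy networks: a star with hub "a1" and a relabelled star with hub "b3".
net_a = (["a1", "a2", "a3", "a4"], [("a1", "a2"), ("a1", "a3"), ("a1", "a4")])
net_b = (["b1", "b2", "b3", "b4"], [("b3", "b1"), ("b3", "b2"), ("b3", "b4")])
cost, mapping = align(*net_a, *net_b)
```

On this toy input the minimal cost is 0 and the hub "a1" is mapped to the hub "b3", using no node labels or sequence information. For networks of realistic size, a polynomial-time assignment solver would replace the permutation loop.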
NDE1 and NDEL1: twin neurodevelopmental proteins with similar ‘nature’ but different ‘nurture’

PubMed Central

Bradshaw, Nicholas J.; Hennah, William; Soares, Dinesh C.

2013-01-01

Nuclear distribution element 1 (NDE1, also known as NudE) and NDE-like 1 (NDEL1, also known as Nudel) are paralogous proteins essential for mitosis and neurodevelopment that have been implicated in psychiatric and neurodevelopmental disorders. The two proteins possess high sequence similarity and have been shown to physically interact with one another. Numerous lines of experimental evidence in vivo and in cell culture have demonstrated that these proteins share common functions, although instances of differing functions between the two have recently emerged. We review the key aspects of NDE1 and NDEL1 in terms of recent advances in structure elucidation and cellular function, with an emphasis on their differing mechanisms of post-translational modification. Based on a review of the literature and bioinformatics assessment, we advance the concept that the twin proteins NDE1 and NDEL1, while sharing a similar ‘nature’ in terms of their structure and basic functions, appear to be different in their ‘nurture’, the manner in which they are regulated both in terms of expression and of post-translational modification within the cell. These differences are likely to be of significant importance in understanding the specific roles of NDE1 and NDEL1 in neurodevelopment and disease. PMID:24093049

FunTree: advances in a resource for exploring and contextualising protein function evolution.

PubMed

Sillitoe, Ian; Furnham, Nicholas

2016-01-04

FunTree is a resource that brings together protein sequence, structure and functional information, including overall chemical reaction and mechanistic data, for structurally defined domain superfamilies. Developed in tandem with the CATH database, the original FunTree contained just 276 superfamilies focused on enzymes. Here, we present an update of FunTree that has expanded to include 2340 superfamilies including both enzymes and proteins with non-enzymatic functions annotated by Gene Ontology (GO) terms. This allows the investigation of how novel functions have evolved within a structurally defined superfamily and provides a means to analyse trends across many superfamilies. This is done not only within the context of a protein's sequence and structure but also the relationships of their functions. New measures of functional similarity have been integrated, including for enzymes comparisons of overall reactions based on overall bond changes, reaction centres (the local environment atoms involved in the reaction) and the sub-structure similarities of the metabolites involved in the reaction and for non-enzymes semantic similarities based on the GO. To identify and highlight changes in function through evolution, ancestral character estimations are made and presented. All this is accessible through a new re-designed web interface that can be found at http://www.funtree.info. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Structure-Based Phylogenetic Analysis of the Lipocalin Superfamily.

PubMed

Lakshmi, Balasubramanian; Mishra, Madhulika; Srinivasan, Narayanaswamy; Archunan, Govindaraju

2015-01-01

Lipocalins constitute a superfamily of extracellular proteins that are found in all three kingdoms of life. Although very divergent in their sequences and functions, they show remarkable similarity in 3-D structures. Lipocalins bind and transport small hydrophobic molecules. Earlier sequence-based phylogenetic studies of lipocalins highlighted that they have a long evolutionary history. However the molecular and structural basis of their functional diversity is not completely understood. The main objective of the present study is to understand functional diversity of the lipocalins using a structure-based phylogenetic approach. The present study with 39 protein domains from the lipocalin superfamily suggests that the clusters of lipocalins obtained by structure-based phylogeny correspond well with the functional diversity. The detailed analysis on each of the clusters and sub-clusters reveals that the 39 lipocalin domains cluster based on their mode of ligand binding though the clustering was performed on the basis of gross domain structure. The outliers in the phylogenetic tree are often from single member families. Also structure-based phylogenetic approach has provided pointers to assign putative function for the domains of unknown function in lipocalin family. The approach employed in the present study can be used in the future for the functional identification of new lipocalin proteins and may be extended to other protein families where members show poor sequence similarity but high structural similarity.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Shatsky, Maxim; Allen, Simon; Gold, Barbara

Numerous affinity purification – mass-spectrometry (AP-MS) and yeast two hybrid (Y2H) screens have each defined thousands of pairwise protein-protein interactions (PPIs), most between functionally unrelated proteins. The accuracy of these networks, however, is under debate. Here we present an AP-MS survey of the bacterium Desulfovibrio vulgaris together with a critical reanalysis of nine published bacterial Y2H and AP-MS screens. We have identified 459 high confidence PPIs from D. vulgaris and 391 from Escherichia coli. Compared to the nine published interactomes, our two networks are smaller; are much less highly connected; have significantly lower false discovery rates; and are much more enriched in protein pairs that are encoded in the same operon, have similar functions, and are reproducibly detected in other physical interaction assays. Lastly, our work establishes more stringent benchmarks for the properties of protein interactomes and suggests that bona fide PPIs much more frequently involve protein partners that are annotated with similar functions or that can be validated in independent assays than earlier studies suggested.
Structure refinement of membrane proteins via molecular dynamics simulations.

PubMed

Dutagaci, Bercem; Heo, Lim; Feig, Michael

2018-07-01

A refinement protocol based on physics-based techniques established for water soluble proteins is tested for membrane protein structures. Initial structures were generated by homology modeling and sampled via molecular dynamics simulations in explicit lipid bilayer and aqueous solvent systems. Snapshots from the simulations were selected based on scoring with either knowledge-based or implicit membrane-based scoring functions and averaged to obtain refined models. The protocol resulted in consistent and significant refinement of the membrane protein structures similar to the performance of refinement methods for soluble proteins. Refinement success was similar between sampling in the presence of lipid bilayers and aqueous solvent but the presence of lipid bilayers may benefit the improvement of lipid-facing residues. Scoring with knowledge-based functions (DFIRE and RWplus) was found to be as good as scoring using implicit membrane-based scoring functions suggesting that differences in internal packing is more important than orientations relative to the membrane during the refinement of membrane protein homology models. © 2018 Wiley Periodicals, Inc.
Protein function prediction using neighbor relativity in protein-protein interaction network.

PubMed

Moosavi, Sobhan; Rahgozar, Masoud; Rahimi, Amir

2013-04-01

There is a large gap between the number of discovered proteins and the number of functionally annotated ones. Due to the high cost of determining protein function by wet-lab research, function prediction has become a major task for computational biology and bioinformatics. Some studies use protein interaction information to predict function for un-annotated proteins. In this paper, we propose a novel approach called "Neighbor Relativity Coefficient" (NRC) based on interaction network topology which estimates the functional similarity between two proteins. NRC is calculated for each pair of proteins based on their graph-based features including distance, common neighbors and the number of paths between them. In order to ascribe function to an un-annotated protein, NRC estimates a weight for each neighbor to transfer its annotation to the unknown protein. Finally, the unknown protein is annotated with the top-scoring transferred functions. We also investigate the effect of using different coefficients for various types of functions. The proposed method has been evaluated on Saccharomyces cerevisiae and Homo sapiens interaction networks. The performance analysis demonstrates that NRC yields better results in comparison with previous protein function prediction approaches that use interaction networks. Copyright © 2012 Elsevier Ltd. All rights reserved.
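The general scheme described above, in which each neighbor transfers its annotations to an un-annotated protein with a topology-derived weight, can be sketched in Python. The weight used here (one plus the number of common neighbors) is a hypothetical placeholder and is not the published NRC formula, which also uses distance and path counts; the function names and toy network are likewise assumptions.

```python
from collections import defaultdict


def adjacency(edges):
    """Build an undirected adjacency map from a list of (a, b) interactions."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def predict_functions(protein, edges, annotations, top_k=1):
    """Rank functions for `protein` by weighted votes from its neighbors.

    Placeholder weight: 1 + number of neighbors shared with `protein`.
    """
    adj = adjacency(edges)
    scores = defaultdict(float)
    for neighbor in adj[protein]:
        weight = 1.0 + len(adj[protein] & adj[neighbor])
        for term in annotations.get(neighbor, ()):
            scores[term] += weight
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:top_k]


# Toy network: X interacts with A, B and C; A and B also interact.
toy_edges = [("X", "A"), ("X", "B"), ("X", "C"), ("A", "B")]
toy_annotations = {"A": {"kinase"}, "B": {"kinase"}, "C": {"transport"}}
prediction = predict_functions("X", toy_edges, toy_annotations)
```

Here A and B each share one neighbor with X and so vote with weight 2, while C votes with weight 1, which makes "kinase" the top-scoring transferred function.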
PSSP-RFE: accurate prediction of protein structural class by recursive feature extraction from PSI-BLAST profile, physical-chemical property and functional annotations.

PubMed

Li, Liqi; Cui, Xiang; Yu, Sanjiu; Zhang, Yuan; Luo, Zhong; Yang, Hua; Zhou, Yue; Zheng, Xiaoqi

2014-01-01

Protein structure prediction is critical to functional annotation of the massively accumulated biological sequences, which prompts an imperative need for the development of high-throughput technologies. As a first and key step in protein structure prediction, protein structural class prediction becomes an increasingly challenging task. Amongst most homological-based approaches, the accuracies of protein structural class prediction are sufficiently high for high similarity datasets, but still far from being satisfactory for low similarity datasets, i.e., below 40% in pairwise sequence similarity. Therefore, we present a novel method for accurate and reliable protein structural class prediction for both high and low similarity datasets. This method is based on Support Vector Machine (SVM) in conjunction with integrated features from position-specific score matrix (PSSM), PROFEAT and Gene Ontology (GO). A feature selection approach, SVM-RFE, is also used to rank the integrated feature vectors through recursively removing the feature with the lowest ranking score. The definitive top features selected by SVM-RFE are input into the SVM engines to predict the structural class of a query protein. To validate our method, jackknife tests were applied to seven widely used benchmark datasets, reaching overall accuracies between 84.61% and 99.79%, which are significantly higher than those achieved by state-of-the-art tools. These results suggest that our method could serve as an accurate and cost-effective alternative to existing methods in protein structural classification, especially for low similarity datasets.
Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification.

PubMed

Sinclair, Robert M; Ravantti, Janne J; Bamford, Dennis H

2017-04-15

Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allow them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. 
Importantly, we detected similarity at the nucleotide level between capsid protein-coding regions from viruses infecting cells belonging to all three domains of life, reproducing a previously established structure-based classification of icosahedral viral capsids. Copyright © 2017 Sinclair et al.
Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification

PubMed Central

Sinclair, Robert M.; Ravantti, Janne J.

2017-01-01

Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allow them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids.
Importantly, we detected similarity at the nucleotide level between capsid protein-coding regions from viruses infecting cells belonging to all three domains of life, reproducing a previously established structure-based classification of icosahedral viral capsids. PMID:28122979
Tardigrade workbench: comparing stress-related proteins, sequence-similar and functional protein clusters as well as RNA elements in tardigrades

PubMed Central

2009-01-01

Background: Tardigrades represent an animal phylum with extraordinary resistance to environmental stress. Results: To gain insights into their stress-specific adaptation potential, major clusters of related and similar proteins are identified, as well as specific functional clusters delineated comparing all tardigrades and individual species (Milnesium tardigradum, Hypsibius dujardini, Echiniscus testudo, Tulinus stephaniae, Richtersius coronifer) and functional elements in tardigrade mRNAs are analysed. We find that 39.3% of the total sequences clustered in 58 clusters of more than 20 proteins. Among these are ten tardigrade specific as well as a number of stress-specific protein clusters. Tardigrade-specific functional adaptations include strong protein, DNA- and redox protection, maintenance and protein recycling. Specific regulatory elements regulate tardigrade mRNA stability such as lox P DICE elements whereas 14 other RNA elements of higher eukaryotes are not found. Further features of tardigrade specific adaption are rapidly identified by sequence and/or pattern search on the web-tool tardigrade analyzer http://waterbear.bioapps.biozentrum.uni-wuerzburg.de. The work-bench offers nucleotide pattern analysis for promotor and regulatory element detection (tardigrade specific; nrdb) as well as rapid COG search for function assignments including species-specific repositories of all analysed data. Conclusion: Different protein clusters and regulatory elements implicated in tardigrade stress adaptations are analysed including unpublished tardigrade sequences. PMID:19821996
Tardigrade workbench: comparing stress-related proteins, sequence-similar and functional protein clusters as well as RNA elements in tardigrades.

PubMed

Förster, Frank; Liang, Chunguang; Shkumatov, Alexander; Beisser, Daniela; Engelmann, Julia C; Schnölzer, Martina; Frohme, Marcus; Müller, Tobias; Schill, Ralph O; Dandekar, Thomas

2009-10-12

Tardigrades represent an animal phylum with extraordinary resistance to environmental stress. To gain insights into their stress-specific adaptation potential, major clusters of related and similar proteins are identified, as well as specific functional clusters delineated comparing all tardigrades and individual species (Milnesium tardigradum, Hypsibius dujardini, Echiniscus testudo, Tulinus stephaniae, Richtersius coronifer) and functional elements in tardigrade mRNAs are analysed. We find that 39.3% of the total sequences clustered in 58 clusters of more than 20 proteins. Among these are ten tardigrade specific as well as a number of stress-specific protein clusters. Tardigrade-specific functional adaptations include strong protein, DNA- and redox protection, maintenance and protein recycling. Specific regulatory elements regulate tardigrade mRNA stability such as lox P DICE elements whereas 14 other RNA elements of higher eukaryotes are not found. Further features of tardigrade specific adaption are rapidly identified by sequence and/or pattern search on the web-tool tardigrade analyzer http://waterbear.bioapps.biozentrum.uni-wuerzburg.de. The work-bench offers nucleotide pattern analysis for promotor and regulatory element detection (tardigrade specific; nrdb) as well as rapid COG search for function assignments including species-specific repositories of all analysed data. Different protein clusters and regulatory elements implicated in tardigrade stress adaptations are analysed including unpublished tardigrade sequences.
Origins of Protein Functions in Cells

NASA Technical Reports Server (NTRS)

Seelig, Burchard; Pohorille, Andrzej

2011-01-01

In modern organisms proteins perform a majority of cellular functions, such as chemical catalysis, energy transduction and transport of material across cell walls. Although great strides have been made towards understanding protein evolution, a meaningful extrapolation from contemporary proteins to their earliest ancestors is virtually impossible. In an alternative approach, the origin of water-soluble proteins was probed through the synthesis and in vitro evolution of very large libraries of random amino acid sequences. In combination with computer modeling and simulations, these experiments allow us to address a number of fundamental questions about the origins of proteins. Can functionality emerge from random sequences of proteins? How did the initial repertoire of functional proteins diversify to facilitate new functions? Did this diversification proceed primarily through drawing novel functionalities from random sequences or through evolution of already existing proto-enzymes? Did protein evolution start from a pool of proteins defined by a frozen accident and other collections of proteins could start a different evolutionary pathway? Although we do not have definitive answers to these questions yet, important clues have been uncovered. In one example (Keefe and Szostak, 2001), novel ATP binding proteins were identified that appear to be unrelated in both sequence and structure to any known ATP binding proteins. One of these proteins was subsequently redesigned computationally to bind GTP through introducing several mutations that introduce targeted structural changes to the protein, improve its binding to guanine and prevent water from accessing the active center. This study facilitates further investigations of individual evolutionary steps that lead to a change of function in primordial proteins. In a second study (Seelig and Szostak, 2007), novel enzymes were generated that can join two pieces of RNA in a reaction for which no natural enzymes are known. 
Recently it was found that, as in the previous case, the proteins have a structure unknown among modern enzymes. In this case, in vitro evolution started from a small, non-enzymatic protein. A similar selection process initiated from a library of random polypeptides is in progress. These results not only allow for estimating the occurrence of function in random protein assemblies but also provide evidence for the possibility of alternative protein worlds. Extant proteins might simply represent a frozen accident in the world of possible proteins. Alternative collections of proteins, even with similar functions, could originate alternative evolutionary paths.
Algorithm, applications and evaluation for protein comparison by Ramanujan Fourier transform.

PubMed

Zhao, Jian; Wang, Jiasong; Hua, Wei; Ouyang, Pingkai

2015-12-01

The amino acid sequence of a protein determines its chemical properties, chain conformation and biological functions. Protein sequence comparison is of great importance to identify similarities of protein structures and infer their functions. Many properties of a protein correspond to the low-frequency signals within the sequence. Low frequency modes in protein sequences are linked to the secondary structures, membrane protein types, and sub-cellular localizations of the proteins. In this paper, we present Ramanujan Fourier transform (RFT) with a fast algorithm to analyze the low-frequency signals of protein sequences. The RFT method is applied to similarity analysis of protein sequences with the Resonant Recognition Model (RRM). The results show that the proposed fast RFT method on protein comparison is more efficient than commonly used discrete Fourier transform (DFT). RFT can detect common frequencies as significant feature for specific protein families, and the RFT spectrum heat-map of protein sequences demonstrates the information conservation in the sequence comparison. The proposed method offers a new tool for pattern recognition, feature extraction and structural analysis on protein sequences. Copyright © 2015 Elsevier Ltd. All rights reserved.
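The transform named in this abstract can be illustrated with a minimal sketch. It assumes the textbook definition of the Ramanujan sum, c_q(n) = sum of cos(2*pi*k*n/q) over all k in 1..q coprime to q, and a finite-length estimate of the RFT coefficient, x_q ≈ (1 / (phi(q) * N)) * sum over n of x(n) * c_q(n). This is a direct evaluation, not the fast algorithm the paper proposes; the function names and the toy signal are hypothetical, and a real protein would first need a numeric encoding of its residues.

```python
import math

def ramanujan_sum(q: int, n: int) -> int:
    """Ramanujan sum c_q(n): sum of cos(2*pi*k*n/q) over k coprime to q."""
    total = sum(
        math.cos(2 * math.pi * k * n / q)
        for k in range(1, q + 1)
        if math.gcd(k, q) == 1
    )
    return round(total)  # c_q(n) is always an integer

def totient(q: int) -> int:
    """Euler's phi(q): count of k in 1..q coprime to q."""
    return sum(1 for k in range(1, q + 1) if math.gcd(k, q) == 1)

def rft_coefficients(signal, max_q):
    """Finite-length estimate of RFT coefficients x_q for q = 1..max_q."""
    n_len = len(signal)
    coeffs = {}
    for q in range(1, max_q + 1):
        acc = sum(x * ramanujan_sum(q, n) for n, x in enumerate(signal, start=1))
        coeffs[q] = acc / (totient(q) * n_len)
    return coeffs

# Toy numeric signal with period 3 (hypothetical, not a real protein encoding).
toy_signal = [-1, -1, 2, -1, -1, 2]
print(rft_coefficients(toy_signal, 3))
```

For this toy signal, which equals c_3(n) itself, all of the weight lands on the q = 3 coefficient, which is the sense in which the transform picks out periodic components.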
HIP12 is a non-proapoptotic member of a gene family including HIP1, an interacting protein with huntingtin.

PubMed

Chopra, V S; Metzler, M; Rasper, D M; Engqvist-Goldstein, A E; Singaraja, R; Gan, L; Fichter, K M; McCutcheon, K; Drubin, D; Nicholson, D W; Hayden, M R

2000-11-01

Huntingtin-interacting protein I (HIP1) is a membrane-associated protein that interacts with huntingtin, the protein altered in Huntington disease. HIP1 shows homology to Sla2p, a protein essential for the assembly and function of the cytoskeleton and endocytosis in Saccharomyces cerevisiae. We have determined that the HIP1 gene comprises 32 exons spanning approximately 215 kb of genomic DNA and gives rise to two alternate splice forms termed HIP1-1 and HIP1-2. Additionally, we have identified a novel protein termed HIP12 with significant sequence and biochemical similarities to HIP1 and high sequence similarity to Sla2p. HIP12 differs from HIP1 in its pattern of expression both at the mRNA and protein level. However, HIP1 and HIP12 are both found within the brain and show a similar subcellular distribution pattern. In contrast to HIP1, which is toxic in cell culture, HIP12 does not confer toxicity in the same assay systems. Interestingly, HIP12 does not interact with huntingtin but can interact with HIP1, suggesting a potential interaction in vivo that may influence the function of each respective protein.
Interrogation of Mammalian Protein Complex Structure, Function, and Membership Using Genome-Scale Fitness Screens.

Cancer.gov

Protein complexes are assemblies of subunits that have co-evolved to execute one or many coordinated functions in the cellular environment. Functional annotation of mammalian protein complexes is critical to understanding biological processes, as well as disease mechanisms. Here, we used genetic co-essentiality derived from genome-scale RNAi- and CRISPR-Cas9-based fitness screens performed across hundreds of human cancer cell lines to assign measures of functional similarity.
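One common way to turn fitness screens into a co-essentiality score is to correlate the fitness profiles of two genes across cell lines. The sketch below assumes Pearson correlation as that measure; the abstract does not state which statistic was used, and the gene names and fitness values are invented for illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length fitness profiles."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical fitness scores for three genes across four cell lines
# (more negative = stronger fitness defect when the gene is perturbed).
profiles = {
    "geneA": [-1.0, -0.5, 0.0, -2.0],
    "geneB": [-2.0, -1.0, 0.0, -4.0],
    "geneC": [0.0, -1.5, -2.0, 0.0],
}

# geneA and geneB rise and fall together, so they score as co-essential;
# geneC follows a different pattern across the same cell lines.
print(pearson(profiles["geneA"], profiles["geneB"]))
print(pearson(profiles["geneA"], profiles["geneC"]))
```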
Crystal structures of MW1337R and lin2004: Representatives of a novel protein family that adopt a four-helical bundle fold

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kozbial, Piotr; Xu, Qingping; Chiu, Hsiu-Ju

2009-08-28

To extend the structural coverage of proteins with unknown functions, we targeted a novel protein family (Pfam accession number PF08807, DUF1798) for which we proposed and determined the structures of two representative members. The MW1337R gene of Staphylococcus aureus subsp. aureus Rosenbach (Wood 46) encodes a protein with a molecular weight of 13.8 kDa (residues 1-116) and a calculated isoelectric point of 5.15. The lin2004 gene of the nonspore-forming bacterium Listeria innocua Clip11262 encodes a protein with a molecular weight of 14.6 kDa (residues 1-121) and a calculated isoelectric point of 5.45. MW1337R and lin2004, as well as their homologs, which, so far, have been found only in Bacillus, Staphylococcus, Listeria, and related genera (Geobacillus, Exiguobacterium, and Oceanobacillus), have unknown functions and are annotated as hypothetical proteins. The genomic contexts of MW1337R and lin2004 are similar and conserved in related species. In prokaryotic genomes, most often, functionally interacting proteins are coded by genes, which are colocated in conserved operons. Proteins from the same operon as MW1337R and lin2004 either have unknown functions (i.e., belong to DUF1273, Pfam accession number PF06908) or are similar to ypsB from Bacillus subtilis. The function of ypsB is unclear, although it has a strong similarity to the N-terminal region of DivIVA, which was characterized as a bifunctional protein with distinct roles during vegetative growth and sporulation. In addition, members of the DUF1273 family display distant sequence similarity with the DprA/Smf protein, which acts downstream of the DNA uptake machinery, possibly in conjunction with RecA. The RecA activities in Bacillus subtilis are modulated by RecU Holliday-junction resolvase. In all analyzed cases, the gene coding for RecU is in the vicinity of MW1337R, lin2004, or their orthologs, but on a different operon located in the complementary DNA strand. 
Here, we report the crystal structures of MW1337R and lin2004, which were determined using the semiautomated, high-throughput pipeline of the Joint Center for Structural Genomics (JCSG), part of the National Institute of General Medical Sciences Protein Structure Initiative.
Prediction of protein-protein interaction network using a multi-objective optimization approach.

PubMed

Chowdhury, Archana; Rakshit, Pratyusha; Konar, Amit

2016-06-01

Protein-Protein Interactions (PPIs) are very important as they coordinate almost all cellular processes. This paper attempts to formulate PPI prediction problem in a multi-objective optimization framework. The scoring functions for the trial solution deal with simultaneous maximization of functional similarity, strength of the domain interaction profiles, and the number of common neighbors of the proteins predicted to be interacting. The above optimization problem is solved using the proposed Firefly Algorithm with Nondominated Sorting. Experiments undertaken reveal that the proposed PPI prediction technique outperforms existing methods, including gene ontology-based Relative Specific Similarity, multi-domain-based Domain Cohesion Coupling method, domain-based Random Decision Forest method, Bagging with REP Tree, and evolutionary/swarm algorithm-based approaches, with respect to sensitivity, specificity, and F1 score.
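The nondominated sorting step mentioned in this abstract can be sketched on its own. The example assumes three objectives that are all maximized (functional similarity, domain interaction strength, number of common neighbors) and uses invented scores. It shows only how candidates are ranked into Pareto fronts; it is not the Firefly Algorithm or the authors' implementation.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def nondominated_fronts(scores):
    """Split candidate indices into successive Pareto fronts (maximization)."""
    remaining = set(range(len(scores)))
    fronts = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(scores[j], scores[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts

# Hypothetical candidate protein pairs scored as
# (functional similarity, domain interaction strength, common neighbors).
candidates = [
    (0.9, 0.8, 5),
    (0.7, 0.9, 3),
    (0.6, 0.5, 2),
    (0.9, 0.8, 4),
]
print(nondominated_fronts(candidates))  # [[0, 1], [3], [2]]
```

Candidates 0 and 1 trade off against each other, so neither dominates the other and both sit on the first front; candidate 3 is beaten only by candidate 0, and candidate 2 is beaten by all the others.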
An integrative approach to inferring biologically meaningful gene modules.

PubMed

Cho, Ji-Hoon; Wang, Kai; Galas, David J

2011-07-26

The ability to construct biologically meaningful gene networks and modules is critical for contemporary systems biology. Though recent studies have demonstrated the power of using gene modules to shed light on the functioning of complex biological systems, most modules in these networks have shown little association with meaningful biological function. We have devised a method, the Semantic Similarity-Integrated approach for Modularization (SSIM), which directly incorporates gene ontology (GO) annotation in the construction of gene modules in order to gain better functional association. SSIM integrates various gene-gene pairwise similarity values, including information obtained from gene expression, protein-protein interactions and GO annotations, in the construction of modules using affinity propagation clustering. We demonstrated the performance of the proposed method using data from two complex biological responses: (1) the osmotic shock response in Saccharomyces cerevisiae, and (2) the prion-induced pathogenic mouse model. In comparison with two previously reported algorithms, modules identified by SSIM showed significantly stronger association with biological functions. The incorporation of semantic similarity based on GO annotation with gene expression and protein-protein interaction data can greatly enhance the functional relevance of inferred gene modules. In addition, the SSIM approach can also reveal the hierarchical structure of gene modules to gain a broader functional view of the biological system. Hence, the proposed method can facilitate comprehensive and in-depth analysis of high throughput experimental data at the gene network level.
Geometrical comparison of two protein structures using Wigner-D functions.

PubMed

Saberi Fathi, S M; White, Diana T; Tuszynski, Jack A

2014-10-01

In this article, we develop a quantitative comparison method for two arbitrary protein structures. This method uses a root-mean-square deviation characterization and employs a series expansion of the protein's shape function in terms of the Wigner-D functions to define a new criterion, which is called a "similarity value." We further demonstrate that the expansion coefficients for the shape function obtained with the help of the Wigner-D functions correspond to structure factors. Our method addresses the common problem of comparing two proteins with different numbers of atoms. We illustrate it with a worked example. © 2014 Wiley Periodicals, Inc.
Revealing protein functions based on relationships of interacting proteins and GO terms.

PubMed

Teng, Zhixia; Guo, Maozu; Liu, Xiaoyan; Tian, Zhen; Che, Kai

2017-09-20

In recent years, numerous computational methods have predicted protein function based on the protein-protein interaction (PPI) network. These methods assume that two proteins share the same function if they interact with each other. However, recent studies report that the functions of two interacting proteins may be merely related, which can mislead the prediction of protein function. Therefore, there is a need to investigate the functional relationship between interacting proteins. In this paper, the functional relationship between interacting proteins is studied and a novel method, called GoDIN, is proposed to annotate the functions of interacting proteins in the Gene Ontology (GO) context. It is assumed that the functional difference between interacting proteins can be expressed by the semantic difference between a GO term and its relatives. Thus, the method uses a GO term and its relatives to annotate the interacting proteins separately according to their functional roles in the PPI network. The method is validated by a series of experiments and compared with related methods. The experimental results confirm the assumption and suggest that GoDIN is effective at predicting protein functions. This study demonstrates that: (1) interacting proteins are not equal in the PPI network, and their functions may be the same, similar, or just related; (2) the functional difference between interacting proteins can be measured by their degrees in the PPI network; (3) the functional relationship between interacting proteins can be expressed by the relationship between a GO term and its relatives.
Detection of functionally important regions in "hypothetical proteins" of known structure.

PubMed

Nimrod, Guy; Schushan, Maya; Steinberg, David M; Ben-Tal, Nir

2008-12-10

Structural genomics initiatives provide ample structures of "hypothetical proteins" (i.e., proteins of unknown function) at an ever increasing rate. However, without function annotation, this structural goldmine is of little use to biologists who are interested in particular molecular systems. To this end, we used (an improved version of) the PatchFinder algorithm for the detection of functional regions on the protein surface, which could mediate its interactions with, e.g., substrates, ligands, and other proteins. Examination, using a data set of annotated proteins, showed that PatchFinder outperforms similar methods. We collected 757 structures of hypothetical proteins and their predicted functional regions in the N-Func database. Inspection of several of these regions demonstrated that they are useful for function prediction. For example, we suggested an interprotein interface and a putative nucleotide-binding site. A web-server implementation of PatchFinder and the N-Func database are available at http://patchfinder.tau.ac.il/.

Efficient encapsulation of proteins with random copolymers.

PubMed

Nguyen, Trung Dac; Qiao, Baofu; Olvera de la Cruz, Monica

2018-06-12

Membraneless organelles are aggregates of disordered proteins that form spontaneously to promote specific cellular functions in vivo. The possibility of synthesizing membraneless organelles out of cells will therefore enable fabrication of protein-based materials with functions inherent to biological matter. Since random copolymers contain various compositions and sequences of solvophobic and solvophilic groups, they are expected to function in nonbiological media similarly to a set of disordered proteins in membraneless organelles. Interestingly, the internal environment of these organelles has been noted to behave more like an organic solvent than like water. Therefore, an adsorbed layer of random copolymers that mimics the function of disordered proteins could, in principle, protect and enhance the proteins' enzymatic activity even in organic solvents, which are ideal when the products and/or the reactants have limited solubility in aqueous media. Here, we demonstrate via multiscale simulations that random copolymers efficiently incorporate proteins into different solvents with the potential to optimize their enzymatic activity. We investigate the key factors that govern the ability of random copolymers to encapsulate proteins, including the adsorption energy, copolymer average composition, and solvent selectivity. The adsorbed polymer chains have remarkably similar sequences, indicating that the proteins are able to select certain sequences that best reduce their exposure to the solvent. We also find that the protein surface coverage decreases when the fluctuation in the average distance between the protein adsorption sites increases. The results herein set the stage for computational design of random copolymers for stabilizing and delivering proteins across multiple media.
Bioinformatic analysis of microRNA biogenesis and function related proteins in eleven animal genomes.

PubMed

Liu, Xiuying; Luo, GuanZheng; Bai, Xiujuan; Wang, Xiu-Jie

2009-10-01

MicroRNAs are approximately 22 nt long small non-coding RNAs that play important regulatory roles in eukaryotes. The biogenesis and functional processes of microRNAs require the participation of many proteins, of which, the well studied ones are Dicer, Drosha, Argonaute and Exportin 5. To systematically study these four protein families, we screened 11 animal genomes to search for genes encoding above mentioned proteins, and identified some new members for each family. Domain analysis results revealed that most proteins within the same family share identical or similar domains. Alternative spliced transcript variants were found for some proteins. We also examined the expression patterns of these proteins in different human tissues and identified other proteins that could potentially interact with these proteins. These findings provided systematic information on the four key proteins involved in microRNA biogenesis and functional pathways in animals, and will shed light on further functional studies of these proteins.
Clustering and visualizing similarity networks of membrane proteins.

PubMed

Hu, Geng-Ming; Mai, Te-Lun; Chen, Chi-Ming

2015-08-01

We proposed a fast and unsupervised clustering method, minimum span clustering (MSC), for analyzing the sequence-structure-function relationship of biological networks, and demonstrated its validity in clustering the sequence/structure similarity networks (SSN) of 682 membrane protein (MP) chains. The MSC clustering of MPs based on their sequence information was found to be consistent with their tertiary structures and functions. For the largest seven clusters predicted by MSC, the consistency in chain function within the same cluster is found to be 100%. From analyzing the edge distribution of SSN for MPs, we found a characteristic threshold distance for the boundary between clusters, over which SSN of MPs could be properly clustered by an unsupervised sparsification of the network distance matrix. The clustering results of MPs from both MSC and the unsupervised sparsification methods are consistent with each other, and have high intracluster similarity and low intercluster similarity in sequence, structure, and function. Our study showed a strong sequence-structure-function relationship of MPs. We discussed evidence of convergent evolution of MPs and suggested applications in finding structural similarities and predicting biological functions of MP chains based on their sequence information. © 2015 Wiley Periodicals, Inc.
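The sparsification idea in this abstract can be sketched as follows: drop every edge whose distance exceeds a threshold, then read clusters off as the connected components of what remains. The sketch assumes the threshold is already known (the paper derives a characteristic value from the edge distribution) and uses an invented distance matrix. It is not the MSC algorithm itself.

```python
def cluster_by_threshold(dist, threshold):
    """Connected components of the graph keeping only edges with dist <= threshold."""
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

# Hypothetical pairwise distances between four protein chains.
distances = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.2],
    [0.9, 0.9, 0.2, 0.0],
]
print(cluster_by_threshold(distances, 0.5))  # [[0, 1], [2, 3]]
```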
A statistical physics perspective on alignment-independent protein sequence comparison.

PubMed

Chattopadhyay, Amit K; Nasiev, Diar; Flower, Darren R

2015-08-01

Within bioinformatics, the textual alignment of amino acid sequences has long dominated the determination of similarity between proteins, with all that implies for shared structure, function and evolutionary descent. Despite the relative success of modern-day sequence alignment algorithms, so-called alignment-free approaches offer a complementary means of determining and expressing similarity, with potential benefits in certain key applications, such as regression analysis of protein structure-function studies, where alignment-base similarity has performed poorly. Here, we offer a fresh, statistical physics-based perspective focusing on the question of alignment-free comparison, in the process adapting results from 'first passage probability distribution' to summarize statistics of ensemble averaged amino acid propensity values. In this article, we introduce and elaborate this approach. © The Author 2015. Published by Oxford University Press.
Validating a Coarse-Grained Potential Energy Function through Protein Loop Modelling

PubMed Central

MacDonald, James T.; Kelley, Lawrence A.; Freemont, Paul S.

2013-01-01

Coarse-grained (CG) methods for sampling protein conformational space have the potential to increase computational efficiency by reducing the degrees of freedom. The gain in computational efficiency of CG methods often comes at the expense of non-protein like local conformational features. This could cause problems when transitioning to full atom models in a hierarchical framework. Here, a CG potential energy function was validated by applying it to the problem of loop prediction. A novel method to sample the conformational space of backbone atoms was benchmarked using a standard test set consisting of 351 distinct loops. This method used a sequence-independent CG potential energy function representing the protein using α-carbon positions only and sampling conformations with a Monte Carlo simulated annealing based protocol. Backbone atoms were added using a method previously described and then gradient minimised in the Rosetta force field. Despite the CG potential energy function being sequence-independent, the method performed similarly to methods that explicitly use either fragments of known protein backbones with similar sequences or residue-specific φ/ψ-maps to restrict the search space. The method was also able to predict with sub-Angstrom accuracy two out of seven loops from recently solved crystal structures of proteins with low sequence and structure similarity to previously deposited structures in the PDB. The ability to sample realistic loop conformations directly from a potential energy function enables the incorporation of additional geometric restraints and the use of more advanced sampling methods in a way that is not possible to do easily with fragment replacement methods and also enables multi-scale simulations for protein design and protein structure prediction. These restraints could be derived from experimental data or could be design restraints in the case of computational protein design. 
C++ source code is available for download from http://www.sbg.bio.ic.ac.uk/phyre2/PD2/. PMID:23824634
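The Monte Carlo simulated annealing protocol mentioned in this abstract follows a general pattern that can be sketched on a toy problem. The example below assumes a one-dimensional quadratic "energy", a geometric cooling schedule, and the Metropolis acceptance rule. None of this reflects the authors' CG potential, move set, or their C++ code, and every name and parameter here is hypothetical.

```python
import math
import random

def metropolis_accept(delta_e, temperature, rng):
    """Always accept downhill moves; accept uphill moves with prob exp(-dE/T)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)

def simulated_annealing(energy, start, step, t_start, t_end, n_steps, seed=0):
    """Minimise energy(x) over a scalar x while cooling from t_start to t_end."""
    rng = random.Random(seed)
    x = best = start
    e = best_e = energy(x)
    for i in range(n_steps):
        # Geometric cooling schedule between t_start and t_end.
        t = t_start * (t_end / t_start) ** (i / max(1, n_steps - 1))
        candidate = x + rng.uniform(-step, step)
        e_candidate = energy(candidate)
        if metropolis_accept(e_candidate - e, t, rng):
            x, e = candidate, e_candidate
            if e < best_e:
                best, best_e = x, e
    return best, best_e

def toy_energy(x):
    """Toy energy surface with a single minimum at x = 2."""
    return (x - 2.0) ** 2

best_x, best_energy = simulated_annealing(toy_energy, 10.0, 0.5, 1.0, 0.01, 5000)
print(best_x, best_energy)
```

In a loop-modelling setting the scalar x would be replaced by a set of backbone coordinates and the random step by a conformational move, but the accept/reject logic keeps the same shape.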
The similarity between N-terminal targeting signals for protein import into different organelles and its evolutionary relevance

PubMed Central

Kunze, Markus; Berger, Johannes

2015-01-01

The proper distribution of proteins between the cytosol and various membrane-bound compartments is crucial for the functionality of eukaryotic cells. This requires the cooperation between protein transport machineries that translocate diverse proteins from the cytosol into these compartments and targeting signal(s) encoded within the primary sequence of these proteins that define their cellular destination. The mechanisms exerting protein translocation differ remarkably between the compartments, but the predominant targeting signals for mitochondria, chloroplasts and the ER share the N-terminal position, an α-helical structural element and the removal from the core protein by intraorganellar cleavage. Interestingly, similar properties have been described for the peroxisomal targeting signal type 2 mediating the import of a fraction of soluble peroxisomal proteins, whereas other peroxisomal matrix proteins encode the type 1 targeting signal residing at the extreme C-terminus. The structural similarity of N-terminal targeting signals poses a challenge to the specificity of protein transport, but allows the generation of ambiguous targeting signals that mediate dual targeting of proteins into different compartments. Dual targeting might represent an advantage for adaptation processes that involve a redistribution of proteins, because it circumvents the hierarchy of targeting signals. Thus, the co-existence of two equally functional import pathways into peroxisomes might reflect a balance between evolutionary constant and flexible transport routes. PMID:26441678
webPIPSA: a web server for the comparison of protein interaction properties

PubMed Central

Richter, Stefan; Wenzel, Anne; Stein, Matthias; Gabdoulline, Razif R.; Wade, Rebecca C.

2008-01-01

Protein molecular interaction fields are key determinants of protein functionality. PIPSA (Protein Interaction Property Similarity Analysis) is a procedure to compare and analyze protein molecular interaction fields, such as the electrostatic potential. PIPSA may assist in protein functional assignment, classification of proteins, the comparison of binding properties and the estimation of enzyme kinetic parameters. webPIPSA is a web server that enables the use of PIPSA to compare and analyze protein electrostatic potentials. While PIPSA can be run with downloadable software (see http://projects.eml.org/mcm/software/pipsa), webPIPSA extends and simplifies a PIPSA run. This allows non-expert users to perform PIPSA for their protein datasets. With input protein coordinates, the superposition of protein structures, as well as the computation and analysis of electrostatic potentials, is automated. The results are provided as electrostatic similarity matrices from an all-pairwise comparison of the proteins which can be subjected to clustering and visualized as epograms (tree-like diagrams showing electrostatic potential differences) or heat maps. webPIPSA is freely available at: http://pipsa.eml.org. PMID:18420653
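An all-pairwise similarity matrix of the kind described here can be sketched with one widely used index for comparing potentials, the Hodgkin index, SI = 2 (p1·p2) / (p1·p1 + p2·p2). The abstract does not name the index, so its use here is an assumption, as is the representation of each potential as a flat list of values sampled on a shared grid after superposition. The protein names and values are invented.

```python
def hodgkin_index(p1, p2):
    """Hodgkin similarity index between two potentials sampled on the same grid.

    Ranges from -1 (anticorrelated) to +1 (identical).
    """
    dot = sum(a * b for a, b in zip(p1, p2))
    norm = sum(a * a for a in p1) + sum(b * b for b in p2)
    return 2.0 * dot / norm

def similarity_matrix(potentials):
    """All-pairwise Hodgkin indices for a dict of name -> sampled potential."""
    names = sorted(potentials)
    return {
        (a, b): hodgkin_index(potentials[a], potentials[b])
        for a in names
        for b in names
    }

# Hypothetical electrostatic potential values at four shared grid points.
potentials = {
    "protA": [1.0, 2.0, -1.0, 0.5],
    "protB": [0.9, 2.1, -1.2, 0.4],
    "protC": [-1.0, -2.0, 1.0, -0.5],
}
matrix = similarity_matrix(potentials)
print(matrix[("protA", "protB")], matrix[("protA", "protC")])
```

A matrix like this could then be passed to any standard clustering routine or plotted as a heat map.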
The Popeye Domain Containing Genes and Their Function as cAMP Effector Proteins in Striated Muscle.

PubMed

Brand, Thomas

2018-03-13

The Popeye domain containing (POPDC) genes encode transmembrane proteins, which are abundantly expressed in striated muscle cells. Hallmarks of the POPDC proteins are the presence of three transmembrane domains and the Popeye domain, which makes up a large part of the cytoplasmic portion of the protein and functions as a cAMP-binding domain. Interestingly, despite the prediction of structural similarity between the Popeye domain and other cAMP binding domains, at the protein sequence level they strongly differ from each other suggesting an independent evolutionary origin of POPDC proteins. Loss-of-function experiments in zebrafish and mouse established an important role of POPDC proteins for cardiac conduction and heart rate adaptation after stress. Loss-of function mutations in patients have been associated with limb-girdle muscular dystrophy and AV-block. These data suggest an important role of these proteins in the maintenance of structure and function of striated muscle cells.
OAS proteins and cGAS: unifying concepts in sensing and responding to cytosolic nucleic acids.

PubMed

Hornung, Veit; Hartmann, Rune; Ablasser, Andrea; Hopfner, Karl-Peter

2014-08-01

Recent discoveries in the field of innate immunity have highlighted the existence of a family of nucleic acid-sensing proteins that have similar structural and functional properties. These include the well-known oligoadenylate synthase (OAS) family proteins and the recently identified OAS homologue cyclic GMP-AMP (cGAMP) synthase (cGAS). The OAS proteins and cGAS are template-independent nucleotidyltransferases that, once activated by double-stranded nucleic acids in the cytosol, produce unique classes of 2'-5'-linked second messenger molecules, which - through distinct mechanisms - have crucial antiviral functions. 2'-5'-linked oligoadenylates limit viral propagation through the activation of the enzyme RNase L, which degrades host and viral RNA, and 2'-5'-linked cGAMP activates downstream signalling pathways to induce de novo antiviral gene expression. In this Progress article, we describe the striking functional and structural similarities between OAS proteins and cGAS, and highlight their roles in antiviral immunity.
Proteomics profiling of interactome dynamics by colocalisation analysis (COLA) (electronic supplementary information available; see DOI: 10.1039/c6mb00701e)

PubMed Central

Sailem, Heba Z.; Kümper, Sandra; Tape, Christopher J.; McCully, Ryan R.; Paul, Angela; Anjomani-Virmouni, Sara; Jørgensen, Claus; Poulogiannis, George; Marshall, Christopher J.

2017-01-01

Localisation and protein function are intimately linked in eukaryotes, as proteins are localised to specific compartments where they come into proximity of other functionally relevant proteins. Significant co-localisation of two proteins can therefore be indicative of their functional association. We here present COLA, a proteomics based strategy coupled with a bioinformatics framework to detect protein–protein co-localisations on a global scale. COLA reveals functional interactions by matching proteins with significant similarity in their subcellular localisation signatures. The rapid nature of COLA allows mapping of interactome dynamics across different conditions or treatments with high precision. PMID:27824369
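The matching step described above, pairing proteins whose subcellular localisation signatures are similar, can be illustrated with a minimal Python sketch. It assumes each signature is a vector of abundance fractions across compartments and uses Pearson correlation with a fixed cut-off; the abstract does not state which similarity measure or significance test COLA uses, so `pearson`, `colocalised_pairs`, the threshold and the protein names are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length numeric profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no localisation signal
    return cov / sqrt(var_x * var_y)

def colocalised_pairs(signatures, threshold=0.9):
    """Return (protein_a, protein_b, r) for pairs whose signatures correlate at or above threshold."""
    names = sorted(signatures)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(signatures[a], signatures[b])
            if r >= threshold:
                pairs.append((a, b, r))
    return pairs

# Hypothetical abundance fractions across four compartments (illustrative only).
signatures = {
    "protA": [0.70, 0.20, 0.05, 0.05],
    "protB": [0.65, 0.25, 0.05, 0.05],
    "protC": [0.05, 0.05, 0.20, 0.70],
}
pairs = colocalised_pairs(signatures)
```

A real analysis would replace the fixed cut-off with a significance estimate, which is what the abstract means by "significant similarity".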
Protein structure similarity from Principle Component Correlation analysis.

PubMed

Zhou, Xiaobo; Chou, James; Wong, Stephen T C

2006-01-25

Owing to rapid expansion of protein structure databases in recent years, methods of structure comparison are becoming increasingly effective and important in revealing novel information on functional properties of proteins and their roles in the grand scheme of evolutionary biology. Currently, the structural similarity between two proteins is measured by the root-mean-square-deviation (RMSD) in their best-superimposed atomic coordinates. RMSD is the golden rule of measuring structural similarity when the structures are nearly identical; it, however, fails to detect the higher order topological similarities in proteins evolved into different shapes. We propose new algorithms for extracting geometrical invariants of proteins that can be effectively used to identify homologous protein structures or topologies in order to quantify both close and remote structural similarities. We measure structural similarity between proteins by correlating the principle components of their secondary structure interaction matrix. In our approach, the Principle Component Correlation (PCC) analysis, a symmetric interaction matrix for a protein structure is constructed with relationship parameters between secondary elements that can take the form of distance, orientation, or other relevant structural invariants. When using a distance-based construction in the presence or absence of encoded N to C terminal sense, there are strong correlations between the principle components of interaction matrices of structurally or topologically similar proteins. The PCC method is extensively tested for protein structures that belong to the same topological class but are significantly different by RMSD measure. The PCC analysis can also differentiate proteins having similar shapes but different topological arrangements. 
Additionally, we demonstrate that when using two independently defined interaction matrices, comparison of their maximum eigenvalues can be highly effective in clustering structurally or topologically similar proteins. We believe that the PCC analysis of interaction matrix is highly flexible in adopting various structural parameters for protein structure comparison.
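The core idea of correlating principal components of two interaction matrices can be sketched in a few lines of Python. This is only an illustration under stated assumptions: both proteins have the same number of secondary structure elements, the matrices hold pairwise distances, and only the leading component is compared (found here by power iteration). The function names and the distance values are hypothetical and are not taken from the paper.

```python
from math import sqrt

def leading_component(matrix, iterations=200):
    """Leading eigenvector of a symmetric non-negative matrix, by power iteration."""
    n = len(matrix)
    vec = [1.0] * n
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(v * v for v in nxt))
        if norm == 0:
            return vec  # all-zero matrix: nothing to iterate on
        vec = [v / norm for v in nxt]
    return vec

def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def component_correlation(matrix_a, matrix_b):
    """Correlation between the leading components of two interaction matrices."""
    return pearson(leading_component(matrix_a), leading_component(matrix_b))

# Hypothetical distances (in angstroms) between four secondary structure elements.
protein_a = [[0, 5, 9, 12], [5, 0, 6, 10], [9, 6, 0, 4], [12, 10, 4, 0]]
# The same arrangement, uniformly expanded by 10%.
protein_b = [[1.1 * d for d in row] for row in protein_a]
score = component_correlation(protein_a, protein_b)
```

Uniform scaling changes every atomic distance but leaves the eigenvectors unchanged, so the two toy structures correlate perfectly here. That is the kind of shape-independent similarity the abstract contrasts with RMSD.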
Prediction of multi-drug resistance transporters using a novel sequence analysis method [version 2; referees: 2 approved]

DOE PAGES

McDermott, Jason E.; Bruillard, Paul; Overall, Christopher C.; ...

2015-03-09

There are many examples of groups of proteins that have similar function, but the determinants of functional specificity may be hidden by lack of sequence similarity, or by large groups of similar sequences with different functions. Transporters are one such protein group in that the general function, transport, can be easily inferred from the sequence, but the substrate specificity can be impossible to predict from sequence with current methods. In this paper we describe a linguistic-based approach to identify functional patterns from groups of unaligned protein sequences and its application to predict multi-drug resistance transporters (MDRs) from bacteria. We first show that our method can recreate known patterns from PROSITE for several motifs from unaligned sequences. We then show that the method, MDRpred, can predict MDRs with greater accuracy and positive predictive value than a collection of currently available family-based models from the Pfam database. Finally, we apply MDRpred to a large collection of protein sequences from an environmental microbiome study to make novel predictions about drug resistance in a potential environmental reservoir.
GLADIATOR: a global approach for elucidating disease modules.

PubMed

Silberberg, Yael; Kupiec, Martin; Sharan, Roded

2017-05-26

Understanding the genetic basis of disease is an important challenge in biology and medicine. The observation that disease-related proteins often interact with one another has motivated numerous network-based approaches for deciphering disease mechanisms. In particular, protein-protein interaction networks were successfully used to illuminate disease modules, i.e., interacting proteins working in concert to drive a disease. The identification of these modules can further our understanding of disease mechanisms. We devised a global method for the prediction of multiple disease modules simultaneously named GLADIATOR (GLobal Approach for DIsease AssociaTed mOdule Reconstruction). GLADIATOR relies on a gold-standard disease phenotypic similarity to obtain a pan-disease view of the underlying modules. To traverse the search space of potential disease modules, we applied a simulated annealing algorithm aimed at maximizing the correlation between module similarity and the gold-standard phenotypic similarity. Importantly, this optimization is employed over hundreds of diseases simultaneously. GLADIATOR's predicted modules highly agree with current knowledge about disease-related proteins. Furthermore, the modules exhibit high coherence with respect to functional annotations and are highly enriched with known curated pathways, outperforming previous methods. Examination of the predicted proteins shared by similar diseases demonstrates the diverse role of these proteins in mediating related processes across similar diseases. Last, we provide a detailed analysis of the suggested molecular mechanism predicted by GLADIATOR for hyperinsulinism, suggesting novel proteins involved in its pathology. GLADIATOR predicts disease modules by integrating knowledge of disease-related proteins and phenotypes across multiple diseases. 
The predicted modules are functionally coherent and are more in line with current biological knowledge compared to modules obtained using previous disease-centric methods. The source code for GLADIATOR can be downloaded from http://www.cs.tau.ac.il/~roded/GLADIATOR.zip .
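The optimisation described above, simulated annealing that maximises the correlation between module similarity and a gold-standard phenotypic similarity, can be sketched as follows. This is a toy illustration and not the GLADIATOR implementation: it assumes Jaccard overlap as the module similarity, ignores the connectivity constraints of the protein-protein interaction network, and uses a linear cooling schedule. The disease names, proteins and similarity values are hypothetical.

```python
import math
import random
from itertools import combinations

def jaccard(a, b):
    """Jaccard overlap between two protein sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def objective(modules, phenotype_sim):
    """Correlation between module overlap and phenotypic similarity over all disease pairs."""
    pairs = list(combinations(sorted(modules), 2))
    module_sim = [jaccard(modules[d1], modules[d2]) for d1, d2 in pairs]
    gold = [phenotype_sim[(d1, d2)] for d1, d2 in pairs]
    return pearson(module_sim, gold)

def anneal(seeds, candidates, phenotype_sim, steps=2000, t0=1.0, seed=0):
    """Toggle protein membership in disease modules, keeping the best-scoring state seen."""
    rng = random.Random(seed)
    modules = {d: set(s) for d, s in seeds.items()}
    score = objective(modules, phenotype_sim)
    best, best_score = {d: set(m) for d, m in modules.items()}, score
    diseases = sorted(modules)
    for step in range(steps):
        temperature = t0 * (1 - step / steps) + 1e-9  # linear cooling
        disease = rng.choice(diseases)
        protein = rng.choice(candidates)
        if protein in seeds[disease]:
            continue  # seed proteins always stay in their module
        modules[disease] ^= {protein}  # propose: toggle membership
        new_score = objective(modules, phenotype_sim)
        delta = new_score - score
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            score = new_score
            if score > best_score:
                best, best_score = {d: set(m) for d, m in modules.items()}, score
        else:
            modules[disease] ^= {protein}  # reject: undo the move
    return best, best_score

# Hypothetical seed proteins, candidate pool and phenotypic similarities.
seeds = {"D1": {"p1"}, "D2": {"p2"}, "D3": {"p3"}}
candidates = ["p1", "p2", "p3", "p4", "p5", "p6"]
phenotype_sim = {("D1", "D2"): 0.9, ("D1", "D3"): 0.1, ("D2", "D3"): 0.2}
best, best_score = anneal(seeds, candidates, phenotype_sim)
```

Because all diseases share one objective, a change to one module is accepted or rejected according to its effect on every disease pair, which is the sense in which the abstract calls the method global.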
Relationship between global structural parameters and Enzyme Commission hierarchy: implications for function prediction.

PubMed

Boareto, Marcelo; Yamagishi, Michel E B; Caticha, Nestor; Leite, Vitor B P

2012-10-01

In protein databases there is a substantial number of proteins structurally determined but without function annotation. Understanding the relationship between function and structure can be useful to predict function on a large scale. We have analyzed the similarities in global physicochemical parameters for a set of enzymes which were classified according to the four Enzyme Commission (EC) hierarchical levels. Using relevance theory we introduced a distance between proteins in the space of physicochemical characteristics. This was done by minimizing a cost function of the metric tensor built to reflect the EC classification system. Using an unsupervised clustering method on a set of 1025 enzymes, we obtained no relevant clustering formation compatible with EC classification. The distance distributions between enzymes from the same EC group and from different EC groups were compared by histograms. Such analysis was also performed using sequence alignment similarity as a distance. Our results suggest that global structure parameters are not sufficient to segregate enzymes according to EC hierarchy. This indicates that features essential for function are local rather than global. Consequently, methods for predicting function based on global attributes should not obtain high accuracy in main EC classes prediction without relying on similarities between enzymes from training and validation datasets. Furthermore, these results are consistent with a substantial number of studies suggesting that function evolves fundamentally by recruitment, i.e., the same protein motif or fold can be used to perform different enzymatic functions and a few specific amino acids (AAs) are actually responsible for enzyme activity. These essential amino acids should belong to active sites and an effective method for predicting function should be able to recognize them. Copyright © 2012 Elsevier Ltd. All rights reserved.
Local functional descriptors for surface comparison based binding prediction

PubMed Central

2012-01-01

Background: Molecular recognition in proteins occurs due to appropriate arrangements of physical, chemical, and geometric properties of an atomic surface. Similar surface regions should create similar binding interfaces. Effective methods for comparing surface regions can be used in identifying similar regions, and to predict interactions without regard to the underlying structural scaffold that creates the surface. Results: We present a new descriptor for protein functional surfaces and algorithms for using these descriptors to compare protein surface regions to identify ligand binding interfaces. Our approach uses descriptors of local regions of the surface, and assembles collections of matches to compare larger regions. Our approach uses a variety of physical, chemical, and geometric properties, adaptively weighting these properties as appropriate for different regions of the interface. Our approach builds a classifier based on a training corpus of examples of binding sites of the target ligand. The constructed classifiers can be applied to a query protein providing a probability for each position on the protein that the position is part of a binding interface. We demonstrate the effectiveness of the approach on a number of benchmarks, demonstrating performance that is comparable to the state-of-the-art, with an approach with more generality than these prior methods. Conclusions: Local functional descriptors offer a new method for protein surface comparison that is sufficiently flexible to serve in a variety of applications. PMID:23176080
A Complex Prime Numerical Representation of Amino Acids for Protein Function Comparison.

PubMed

Chen, Duo; Wang, Jiasong; Yan, Ming; Bao, Forrest Sheng

2016-08-01

Computationally assessing the functional similarity between proteins is an important task of bioinformatics research. It can help molecular biologists transfer knowledge on certain proteins to others and hence reduce the amount of tedious and costly benchwork. Representation of amino acids, the building blocks of proteins, plays an important role in achieving this goal. Compared with symbolic representation, representing amino acids numerically can expand our ability to analyze proteins, including comparing the functional similarity of them. Among the state-of-the-art methods, electro-ion interaction pseudopotential (EIIP) is widely adopted for the numerical representation of amino acids. However, it could suffer from degeneracy that two different amino acid sequences have the same numerical representation, due to the design of EIIP. In light of this challenge, we propose a complex prime numerical representation (CPNR) of amino acids, inspired by the similarity between a pattern among prime numbers and the number of codons of amino acids. To empirically assess the effectiveness of the proposed method, we compare CPNR against EIIP. Experimental results demonstrate that the proposed method CPNR always achieves better performance than EIIP. We also develop a framework to combine the advantages of CPNR and EIIP, which enables us to improve the performance and study the unique characteristics of different representations.
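The degeneracy problem mentioned above can be made concrete with a short Python sketch: whenever a numerical encoding assigns the same value to two different amino acids, distinct sequences can collapse onto the same numeric signal. The values in `toy_encoding` are placeholders chosen for illustration, not published EIIP values, and the helper names are hypothetical.

```python
from collections import defaultdict

def degenerate_groups(encoding):
    """Group residues that share a numeric value; only groups of two or more are returned."""
    groups = defaultdict(list)
    for residue, value in encoding.items():
        groups[value].append(residue)
    return [sorted(g) for g in groups.values() if len(g) > 1]

def encode(sequence, encoding):
    """Map an amino acid sequence to its numeric representation."""
    return [encoding[residue] for residue in sequence]

# Placeholder values for six residues (illustrative, NOT the real EIIP table).
toy_encoding = {"A": 0.04, "L": 0.0, "I": 0.0, "S": 0.08, "C": 0.08, "G": 0.005}

collisions = degenerate_groups(toy_encoding)
same_signal = encode("ALS", toy_encoding) == encode("AIC", toy_encoding)
```

An encoding with a distinct value for every residue returns an empty list from `degenerate_groups`, which is the property a degeneracy-free representation needs.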
Fast protein tertiary structure retrieval based on global surface shape similarity.

PubMed

Sael, Lee; Li, Bin; La, David; Fang, Yi; Ramani, Karthik; Rustamov, Raif; Kihara, Daisuke

2008-09-01

Characterization and identification of similar tertiary structure of proteins provides rich information for investigating function and evolution. The importance of structure similarity searches is increasing as structure databases continue to expand, partly due to the structural genomics projects. A crucial drawback of conventional protein structure comparison methods, which compare structures by their main-chain orientation or the spatial arrangement of secondary structure, is that a database search is too slow to be done in real-time. Here we introduce a global surface shape representation by three-dimensional (3D) Zernike descriptors, which represent a protein structure compactly as a series expansion of 3D functions. With this simplified representation, a search against a few thousand structures takes less than a minute. To investigate the agreement between surface representation defined by 3D Zernike descriptor and conventional main-chain based representation, a benchmark was performed against a protein classification generated by the combinatorial extension algorithm. Despite the different representation, 3D Zernike descriptor retrieved proteins of the same conformation defined by combinatorial extension in 89.6% of the cases within the top five closest structures. The real-time protein structure search by 3D Zernike descriptor will open up new possibilities for large-scale global and local protein surface shape comparison. © 2008 Wiley-Liss, Inc.
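Once each structure is reduced to a fixed-length descriptor vector, retrieval becomes a nearest-neighbour search, which is why it can run in real time. The Python sketch below assumes the descriptors are already computed and compares them by Euclidean distance. The distance measure, the three-component vectors and the structure names are assumptions made for illustration; real 3D Zernike descriptors have many more components.

```python
import heapq
from math import dist

def top_k(query, database, k=5):
    """Names of the k database entries whose descriptors are closest to the query."""
    return heapq.nsmallest(k, database, key=lambda name: dist(query, database[name]))

# Hypothetical precomputed descriptors (illustrative values only).
database = {
    "structA": [1.00, 0.50, 0.25],
    "structB": [0.20, 0.90, 0.70],
    "structC": [0.90, 0.60, 0.20],
}
query = [1.00, 0.50, 0.20]
hits = top_k(query, database, k=2)
```

No superposition or alignment is performed per comparison, so the cost of a search grows only with the number of stored vectors and their length.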
Analysis of Nuclear Lamina Proteins in Myoblast Differentiation by Functional Complementation.

PubMed

Tapia, Olga; Gerace, Larry

2016-01-01

We describe straightforward methodology for structure-function mapping of nuclear lamina proteins in myoblast differentiation, using populations of C2C12 myoblasts in which the endogenous lamina components are replaced with ectopically expressed mutant versions of the proteins. The procedure involves bulk isolation of C2C12 cell populations expressing the ectopic proteins by lentiviral transduction, followed by depletion of the endogenous proteins using siRNA, and incubation of cells under myoblast differentiation conditions. Similar methodology may be applied to mouse embryo fibroblasts or to other cell types as well, for the identification and characterization of sequences of lamina proteins involved in functions that can be measured biochemically or cytologically.
An integrative approach to inferring biologically meaningful gene modules

PubMed Central

2011-01-01

Background: The ability to construct biologically meaningful gene networks and modules is critical for contemporary systems biology. Though recent studies have demonstrated the power of using gene modules to shed light on the functioning of complex biological systems, most modules in these networks have shown little association with meaningful biological function. We have devised a method which directly incorporates gene ontology (GO) annotation in construction of gene modules in order to gain better functional association. Results: We have devised a method, Semantic Similarity-Integrated approach for Modularization (SSIM), that integrates various gene-gene pairwise similarity values, including information obtained from gene expression, protein-protein interactions and GO annotations, in the construction of modules using affinity propagation clustering. We demonstrated the performance of the proposed method using data from two complex biological responses: 1. the osmotic shock response in Saccharomyces cerevisiae, and 2. the prion-induced pathogenic mouse model. In comparison with two previously reported algorithms, modules identified by SSIM showed significantly stronger association with biological functions. Conclusions: The incorporation of semantic similarity based on GO annotation with gene expression and protein-protein interaction data can greatly enhance the functional relevance of inferred gene modules. In addition, the SSIM approach can also reveal the hierarchical structure of gene modules to gain a broader functional view of the biological system. Hence, the proposed method can facilitate comprehensive and in-depth analysis of high throughput experimental data at the gene network level. PMID:21791051
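The integration step, merging several gene-gene pairwise similarities into one value per pair before clustering, might look like the following Python sketch. It assumes a simple weighted average with missing evidence counted as zero. The abstract does not specify how SSIM combines its sources, so the function name, the weights and the similarity values are hypothetical. The combined scores would then be passed to a clustering routine such as affinity propagation, which is not shown.

```python
def integrate_similarities(sources, weights):
    """Weighted average of pairwise similarities; a pair missing from a source scores 0 there."""
    total_weight = sum(weights[name] for name in sources)
    all_pairs = set()
    for similarities in sources.values():
        all_pairs.update(similarities)
    combined = {}
    for pair in all_pairs:
        weighted = sum(weights[name] * sources[name].get(pair, 0.0) for name in sources)
        combined[pair] = weighted / total_weight
    return combined

# Hypothetical similarity values in [0, 1], keyed by sorted gene pairs.
sources = {
    "expression": {("g1", "g2"): 0.8, ("g1", "g3"): 0.1},
    "ppi": {("g1", "g2"): 1.0},
    "go": {("g1", "g2"): 0.6, ("g1", "g3"): 0.3},
}
weights = {"expression": 1.0, "ppi": 1.0, "go": 1.0}
combined = integrate_similarities(sources, weights)
```

Raising the weight on the GO source is one simple way to push clusters toward functionally annotated groupings, at the cost of relying more heavily on existing annotation.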
Association of papillomavirus E6 proteins with either MAML1 or E6AP clusters E6 proteins by structure, function, and evolutionary relatedness

PubMed Central

Brimer, Nicole

2017-01-01

Papillomavirus E6 proteins bind to LXXLL peptide motifs displayed on targeted cellular proteins. Alpha genus HPV E6 proteins associate with the cellular ubiquitin ligase E6AP (UBE3A), by binding to an LXXLL peptide (ELTLQELLGEE) displayed by E6AP, thereby stimulating E6AP ubiquitin ligase activity. Beta, Gamma, and Delta genera E6 proteins bind a similar LXXLL peptide (WMSDLDDLLGS) on the cellular transcriptional co-activator MAML1 and thereby repress Notch signaling. We expressed 45 different animal and human E6 proteins from diverse papillomavirus genera to ascertain the overall preference of E6 proteins for E6AP or MAML1. E6 proteins from all HPV genera except Alpha preferentially interacted with MAML1 over E6AP. Among animal papillomaviruses, E6 proteins from certain ungulate (SsPV1 from pigs) and cetacean (porpoises and dolphins) hosts functionally resembled Alpha genus HPV by binding and targeting the degradation of E6AP. Beta genus HPV E6 proteins functionally clustered with Delta, Pi, Tau, Gamma, Chi, Mu, Lambda, Iota, Dyokappa, Rho, and Dyolambda E6 proteins to bind and repress MAML1. None of the tested E6 proteins physically and functionally interacted with both MAML1 and E6AP, indicating an evolutionary split. Further, interaction of an E6 protein was insufficient to activate degradation of E6AP, indicating that E6 proteins that target E6AP co-evolved to separately acquire both binding and triggering of ubiquitin ligase activation. E6 proteins with similar biological function clustered together in phylogenetic trees and shared structural features. This suggests that the divergence of E6 proteins from either MAML1 or E6AP binding preference is a major event in papillomavirus evolution. PMID:29281732

Studies of the structure-activity relationships of peptides and proteins involved in growth and development based on their three-dimensional structures.

PubMed

Nagata, Koji

2010-01-01

Peptides and proteins with similar amino acid sequences can have different biological functions. Knowledge of their three-dimensional molecular structures is critically important in identifying their functional determinants. In this review, I describe the results of our and other groups' structure-based functional characterization of insect insulin-like peptides, a crustacean hyperglycemic hormone-family peptide, a mammalian epidermal growth factor-family protein, and an intracellular signaling domain that recognizes proline-rich sequence.
Mechanism of Nucleic Acid Chaperone Function of Retroviral Nucleocapsid (NC) Proteins

NASA Astrophysics Data System (ADS)

Rouzina, Ioulia; Vo, My-Nuong; Stewart, Kristen; Musier-Forsyth, Karin; Cruceanu, Margareta; Williams, Mark

2006-03-01

Recent studies have highlighted two main activities of HIV-1 NC protein contributing to its function as a universal nucleic acid chaperone. The first is the ability of NC to weakly destabilize all nucleic acid (NA) secondary structures, thus resolving the kinetic traps for NA refolding, while leaving the annealed state stable. The second is the ability of NC to aggregate NA, facilitating the nucleation step of bi-molecular annealing by increasing the local NA concentration. In this work we use single molecule DNA stretching and gel-based annealing assays to characterize these two chaperone activities of NC by using various HIV-1 NC mutants and several other retroviral NC proteins. Our results suggest that the two NC functions are associated with its zinc fingers and cationic residues, respectively. NC proteins from other retroviruses have similar activities, although expressed to a different degree. Thus, NA aggregating ability improves, and NA duplex destabilizing activity decreases in the sequence: MLV NC, HIV NC, RSV NC. In contrast, HTLV NC protein works very differently from other NC proteins, and similarly to typical single stranded NA binding proteins. These features of retroviral NCs co-evolved with the structure of their genomes.
Protein-protein interface detection using the energy centrality relationship (ECR) characteristic of proteins.

PubMed

Sudarshan, Sanjana; Kodathala, Sasi B; Mahadik, Amruta C; Mehta, Isha; Beck, Brian W

2014-01-01

Specific protein interactions are responsible for most biological functions. Distinguishing Functionally Linked Interfaces of Proteins (FLIPs), from Functionally uncorrelated Contacts (FunCs), is therefore important to characterizing these interactions. To achieve this goal, we have created a database of protein structures called FLIPdb, containing proteins belonging to various functional sub-categories. Here, we use geometric features coupled with Kortemme and Baker's computational alanine scanning method to calculate the energetic sensitivity of each amino acid at the interface to substitution, identify hotspots, and identify other factors that may contribute towards an interface being FLIP or FunC. Using Principal Component Analysis and K-means clustering on a training set of 160 interfaces, we could distinguish FLIPs from FunCs with an accuracy of 76%. When these methods were applied to two test sets of 18 and 170 interfaces, we achieved similar accuracies of 78% and 80%. We have identified that FLIP interfaces have a stronger central organizing tendency than FunCs, due, we suggest, to greater specificity. We also observe that certain functional sub-categories, such as enzymes, antibody-heavy-light, antibody-antigen, and enzyme-inhibitors form distinct sub-clusters. The antibody-antigen and enzyme-inhibitors interfaces have patterns of physical characteristics similar to those of FunCs, which is in agreement with the fact that the selection pressures of these interfaces is differently evolutionarily driven. As such, our ECR model also successfully describes the impact of evolution and natural selection on protein-protein interfaces. Finally, we indicate how our ECR method may be of use in reducing the false positive rate of docking calculations.
Protein-Protein Interface Detection Using the Energy Centrality Relationship (ECR) Characteristic of Proteins

PubMed Central

Sudarshan, Sanjana; Kodathala, Sasi B.; Mahadik, Amruta C.; Mehta, Isha; Beck, Brian W.

2014-01-01

Specific protein interactions are responsible for most biological functions. Distinguishing Functionally Linked Interfaces of Proteins (FLIPs), from Functionally uncorrelated Contacts (FunCs), is therefore important to characterizing these interactions. To achieve this goal, we have created a database of protein structures called FLIPdb, containing proteins belonging to various functional sub-categories. Here, we use geometric features coupled with Kortemme and Baker's computational alanine scanning method to calculate the energetic sensitivity of each amino acid at the interface to substitution, identify hotspots, and identify other factors that may contribute towards an interface being FLIP or FunC. Using Principal Component Analysis and K-means clustering on a training set of 160 interfaces, we could distinguish FLIPs from FunCs with an accuracy of 76%. When these methods were applied to two test sets of 18 and 170 interfaces, we achieved similar accuracies of 78% and 80%. We have identified that FLIP interfaces have a stronger central organizing tendency than FunCs, due, we suggest, to greater specificity. We also observe that certain functional sub-categories, such as enzymes, antibody-heavy-light, antibody-antigen, and enzyme-inhibitors form distinct sub-clusters. The antibody-antigen and enzyme-inhibitors interfaces have patterns of physical characteristics similar to those of FunCs, which is in agreement with the fact that the selection pressures of these interfaces is differently evolutionarily driven. As such, our ECR model also successfully describes the impact of evolution and natural selection on protein-protein interfaces. Finally, we indicate how our ECR method may be of use in reducing the false positive rate of docking calculations. PMID:24830938
Identification of two p23 co-chaperone isoforms in Leishmania braziliensis exhibiting similar structures and Hsp90 interaction properties despite divergent stabilities.

PubMed

Batista, Fernanda A H; Almeida, Glessler S; Seraphim, Thiago V; Silva, Kelly P; Murta, Silvane M F; Barbosa, Leandro R S; Borges, Júlio C

2015-01-01

The small acidic protein called p23 acts as a co-chaperone for heat-shock protein of 90 kDa (Hsp90) during its ATPase cycle. p23 proteins inhibit Hsp90 ATPase activity and show intrinsic chaperone activity. A search for p23 in protozoa, especially trypanosomatids, led us to identify two putative proteins in the Leishmania braziliensis genome that share approximately 30% identity with each other and with the human p23. To understand the presence of two p23 isoforms in trypanosomatids, we obtained the recombinant p23 proteins of L. braziliensis (named Lbp23A and Lbp23B) and performed structural and functional studies. The recombinant proteins share similar solution structures; however, temperature- and chemical-induced unfolding experiments showed that Lbp23A is more stable than Lbp23B, suggesting that they may have different functions. Lbp23B prevented the temperature-induced aggregation of malic dehydrogenase more efficiently than did Lbp23A, whereas the two proteins had equivalent efficiencies with respect to preventing the temperature-induced aggregation of luciferase. Both proteins interacted with L. braziliensis Hsp90 (LbHsp90) and inhibited its ATPase activity, although their efficiencies differed. In vivo identification studies suggested that both proteins are present in L. braziliensis cells grown under different conditions, although Lbp23B may undergo post-translational modifications. Interaction studies indicated that both Lbp23 proteins interact with LbHsp90. Taken together, our data suggest that the two protozoa p23 isoforms act similarly when regulating Hsp90 function. However, they also have some differences, indicating that the L. braziliensis Hsp90 machine has features providing an opportunity for novel forms of selective inhibition of protozoan Hsp90. © 2014 FEBS.
Computational modeling of Repeat1 region of INI1/hSNF5: An evolutionary link with ubiquitin.

PubMed

Bhutoria, Savita; Kalpana, Ganjam V; Acharya, Seetharama A

2016-09-01

The structure of a protein can be very informative of its function. However, determining protein structures experimentally can often be very challenging. Computational methods have been used successfully in modeling structures with sufficient accuracy. Here we have used computational tools to predict the structure of an evolutionarily conserved and functionally significant domain of Integrase interactor (INI)1/hSNF5 protein. INI1 is a component of the chromatin remodeling SWI/SNF complex, a tumor suppressor and is involved in many protein-protein interactions. It belongs to SNF5 family of proteins that contain two conserved repeat (Rpt) domains. Rpt1 domain of INI1 binds to HIV-1 Integrase, and acts as a dominant negative mutant to inhibit viral replication. Rpt1 domain also interacts with oncogene c-MYC and modulates its transcriptional activity. We carried out an ab initio modeling of a segment of INI1 protein containing the Rpt1 domain. The structural model suggested the presence of a compact and well defined ββαα topology as core structure in the Rpt1 domain of INI1. This topology in Rpt1 was similar to PFU domain of Phospholipase A2 Activating Protein, PLAA. Interestingly, PFU domain shares similarity with Ubiquitin and has ubiquitin binding activity. Because of the structural similarity between Rpt1 domain of INI1 and PFU domain of PLAA, we propose that Rpt1 domain of INI1 may participate in ubiquitin recognition or binding with ubiquitin or ubiquitin related proteins. This modeling study may shed light on the mode of interactions of Rpt1 domain of INI1 and is likely to facilitate future functional studies of INI1. © 2016 The Protein Society.
MOCASSIN-prot: A multi-objective clustering approach for protein similarity networks

USDA-ARS's Scientific Manuscript database

Motivation: Proteins often include multiple conserved domains. Various evolutionary events including duplication and loss of domains, domain shuffling, as well as sequence divergence contribute to generating complexities in protein structures, and consequently, in their functions. The evolutionary h...
The Sla2p/HIP1/HIP1R family: similar structure, similar function in endocytosis?

PubMed

Gottfried, Irit; Ehrlich, Marcelo; Ashery, Uri

2010-02-01

HIP1 (huntingtin interacting protein 1) has two close relatives: HIP1R (HIP1-related) and yeast Sla2p. All three members of the family have a conserved domain structure, suggesting a common function. Over the past decade, a number of studies have characterized these proteins using a combination of biochemical, imaging, structural and genetic techniques. These studies provide valuable information on binding partners, structure and dynamics of HIP1/HIP1R/Sla2p. In general, all suggest a role in CME (clathrin-mediated endocytosis) for the three proteins, though some differences have emerged. In this mini-review we summarize the current views on the roles of these proteins, while emphasizing the unique attributes of each family member.
Sirius PSB: a generic system for analysis of biological sequences.

PubMed

Koh, Chuan Hock; Lin, Sharene; Jedd, Gregory; Wong, Limsoon

2009-12-01

Computational tools are essential components of modern biological research. For example, BLAST searches can be used to identify related proteins based on sequence homology, or when a new genome is sequenced, prediction models can be used to annotate functional sites such as transcription start sites, translation initiation sites and polyadenylation sites and to predict protein localization. Here we present Sirius Prediction Systems Builder (PSB), a new computational tool for sequence analysis, classification and searching. Sirius PSB has four main operations: (1) Building a classifier, (2) Deploying a classifier, (3) Search for proteins similar to query proteins, (4) Preliminary and post-prediction analysis. Sirius PSB supports all these operations via a simple and interactive graphical user interface. Besides being a convenient tool, Sirius PSB has also introduced two novelties in sequence analysis. Firstly, genetic algorithm is used to identify interesting features in the feature space. Secondly, instead of the conventional method of searching for similar proteins via sequence similarity, we introduced searching via features' similarity. To demonstrate the capabilities of Sirius PSB, we have built two prediction models - one for the recognition of Arabidopsis polyadenylation sites and another for the subcellular localization of proteins. Both systems are competitive against current state-of-the-art models based on evaluation of public datasets. More notably, the time and effort required to build each model is greatly reduced with the assistance of Sirius PSB. Furthermore, we show that under certain conditions when BLAST is unable to find related proteins, Sirius PSB can identify functionally related proteins based on their biophysical similarities. Sirius PSB and its related supplements are available at: http://compbio.ddns.comp.nus.edu.sg/~sirius.
MOCASSIN-prot: a multi-objective clustering approach for protein similarity networks.

PubMed

Keel, Brittney N; Deng, Bo; Moriyama, Etsuko N

2018-04-15

Proteins often include multiple conserved domains. Various evolutionary events including duplication and loss of domains, domain shuffling, as well as sequence divergence contribute to generating complexities in protein structures, and consequently, in their functions. The evolutionary history of proteins is hence best modeled through networks that incorporate information both from the sequence divergence and the domain content. Here, a game-theoretic approach proposed for protein network construction is adapted into the framework of multi-objective optimization, and extended to incorporate a clustering refinement procedure. The new method, MOCASSIN-prot, was applied to cluster multi-domain proteins from ten genomes. The performance of MOCASSIN-prot was compared against two protein clustering methods, Markov clustering (TRIBE-MCL) and spectral clustering (SCPS). We showed that compared to these two methods, MOCASSIN-prot, which uses both domain composition and quantitative sequence similarity information, generates fewer false positives. It achieves more functionally coherent protein clusters and better differentiates protein families. MOCASSIN-prot, implemented in Perl and Matlab, is freely available at http://bioinfolab.unl.edu/emlab/MOCASSINprot. Contact: emoriyama2@unl.edu. Supplementary data are available at Bioinformatics online.
A structural-alphabet-based strategy for finding structural motifs across protein families

PubMed Central

Wu, Chih Yuan; Chen, Yao Chi; Lim, Carmay

2010-01-01

Proteins with insignificant sequence and overall structure similarity may still share locally conserved contiguous structural segments; i.e. structural/3D motifs. Most methods for finding 3D motifs require a known motif to search for other similar structures or functionally/structurally crucial residues. Here, without requiring a query motif or essential residues, a fully automated method for discovering 3D motifs of various sizes across protein families with different folds based on a 16-letter structural alphabet is presented. It was applied to structurally non-redundant proteins bound to DNA, RNA, obligate/non-obligate proteins as well as free DNA-binding proteins (DBPs) and proteins with known structures but unknown function. Its usefulness was illustrated by analyzing the 3D motifs found in DBPs. A non-specific motif was found with a ‘corner’ architecture that confers a stable scaffold and enables diverse interactions, making it suitable for binding not only DNA but also RNA and proteins. Furthermore, DNA-specific motifs present ‘only’ in DBPs were discovered. The motifs found can provide useful guidelines in detecting binding sites and computational protein redesign. PMID:20525797
Efficient conformational space exploration in ab initio protein folding simulation.

PubMed

Ullah, Ahammed; Ahmed, Nasif; Pappu, Subrata Dey; Shatabda, Swakkhar; Ullah, A Z M Dayem; Rahman, M Sohel

2015-08-01

Ab initio protein folding simulation largely depends on knowledge-based energy functions that are derived from known protein structures using statistical methods. These knowledge-based energy functions provide us with a good approximation of real protein energetics. However, these energy functions are not very informative for search algorithms and fail to distinguish the types of amino acid interactions that contribute largely to the energy function from those that do not. As a result, search algorithms frequently become trapped in local minima. On the other hand, the hydrophobic-polar (HP) model considers hydrophobic interactions only. The simplified nature of the HP energy function limits it to a low-resolution model. In this paper, we present a strategy to derive a non-uniform scaled version of the real 20×20 pairwise energy function. The non-uniform scaling helps tackle the difficulty faced by a real energy function, whereas the integration of 20×20 pairwise information overcomes the limitations faced by the HP energy function. Here, we have applied a derived energy function with a genetic algorithm on discrete lattices. On a standard set of benchmark protein sequences, our approach significantly outperforms the state-of-the-art methods for similar models. Our approach has been able to explore regions of the conformational space which all the previous methods have failed to explore. The effectiveness of the derived energy function is presented by showing qualitative differences and similarities of the sampled structures to the native structures. The number of objective function evaluations in a single run of the algorithm is used as a comparison metric to demonstrate efficiency.
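The HP model that this abstract contrasts with the 20×20 pairwise function can be sketched as a contact-energy sum on a lattice. The version below assumes a 2D square lattice and the common convention of -1 per non-bonded H-H contact; the function names are hypothetical, and the `pair_energy` parameter only marks where a 20×20 residue-pair function could be substituted (the paper's scaled values are not reproduced here).

```python
def hp_pair_energy(a, b):
    """HP model: only hydrophobic-hydrophobic contacts are rewarded."""
    return -1 if a == "H" and b == "H" else 0


def lattice_energy(sequence, coords, pair_energy=hp_pair_energy):
    """Contact energy of a self-avoiding chain on a 2D square lattice.

    sequence: residue labels, e.g. "HPPH"
    coords:   list of (x, y) integer lattice positions, one per residue
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must be lattice neighbours")

    energy = 0
    for i in range(len(sequence)):
        # Start at i + 2: chain neighbours are bonded and do not count as contacts.
        for j in range(i + 2, len(sequence)):
            (x1, y1), (x2, y2) = coords[i], coords[j]
            if abs(x1 - x2) + abs(y1 - y2) == 1:
                energy += pair_energy(sequence[i], sequence[j])
    return energy
```

A search procedure such as a genetic algorithm would call a function of this kind as its objective and count how many evaluations a run needs.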
Crystal Structure Analysis and the Identification of Distinctive Functional Regions of the Protein Elicitor Mohrip2.

PubMed

Liu, Mengjie; Duan, Liangwei; Wang, Meifang; Zeng, Hongmei; Liu, Xinqi; Qiu, Dewen

2016-01-01

The protein elicitor MoHrip2, which was extracted from Magnaporthe oryzae as an exocrine protein, triggers the tobacco immune system and enhances blast resistance in rice. However, the detailed mechanisms by which MoHrip2 acts as an elicitor remain unclear. Here, we investigated the structure of MoHrip2 to elucidate its functions based on molecular structure. The three-dimensional structure of MoHrip2 was obtained. Overall, the crystal structure formed a β-barrel structure and showed high similarity to the pathogenesis-related (PR) thaumatin superfamily protein thaumatin-like xylanase inhibitor (TL-XI). To investigate the functional regions responsible for MoHrip2 elicitor activities, the full-length protein and eight truncated proteins were expressed in Escherichia coli and were evaluated for elicitor activity in tobacco. Biological function analysis showed that MoHrip2 triggered the defense system against Botrytis cinerea in tobacco. Moreover, only MoHrip2M14 and other fragments containing the 14 amino acid residues in the middle region of the protein showed the elicitor activity of inducing a hypersensitive response and resistance-related pathways, similar to that of full-length MoHrip2. These results revealed that the central 14 amino acid residues were essential for anti-pathogenic activity.
The limits of protein sequence comparison?

PubMed Central

Pearson, William R; Sierk, Michael L

2010-01-01

Modern sequence alignment algorithms are used routinely to identify homologous proteins, proteins that share a common ancestor. Homologous proteins always share similar structures and often have similar functions. Over the past 20 years, sequence comparison has become both more sensitive, largely because of profile-based methods, and more reliable, because of more accurate statistical estimates. As sequence and structure databases become larger, and comparison methods become more powerful, reliable statistical estimates will become even more important for distinguishing similarities that are due to homology from those that are due to analogy (convergence). The newest sequence alignment methods are more sensitive than older methods, but more accurate statistical estimates are needed for their full power to be realized. PMID:15919194
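The statistical estimates referred to above are usually reported as expectation values. A minimal sketch of the standard Karlin-Altschul relationship between a raw alignment score, its bit score and the E-value is given below. The function names are hypothetical, `lam` and `k` are scoring-system parameters that must be supplied by the caller, and real search programs apply effective-length corrections that this sketch omits.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalise a raw alignment score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits, query_len, db_len):
    """Expected number of chance alignments scoring at least `bits`: E = m * n * 2^(-S')."""
    return query_len * db_len * 2.0 ** (-bits)
```

For instance, a 10-bit alignment in a search space of 32 × 32 residues gives `expect_value(10, 32, 32) == 1.0`, meaning one hit of that quality is expected by chance alone, so it carries no evidence of homology.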
Local-global alignment for finding 3D similarities in protein structures

DOEpatents

Zemla, Adam T [Brentwood, CA]

2011-09-20

A method of finding 3D similarities in protein structures of a first molecule and a second molecule. The method comprises providing preselected information regarding the first molecule and the second molecule; comparing the first molecule and the second molecule using Longest Continuous Segments (LCS) analysis; comparing them using Global Distance Test (GDT) analysis; comparing them using Local Global Alignment Scoring function (LGA_S) analysis; and verifying the constructed alignment and repeating the steps to find the regions of 3D similarities in protein structures.
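The Global Distance Test step can be illustrated with a simplified score. The sketch below assumes the two Cα traces are already superimposed and matched residue by residue, and it uses the commonly quoted GDT_TS cutoffs of 1, 2, 4 and 8 Å. A full GDT search also explores many superpositions to maximise the count at each cutoff, which is not attempted here; the function name is hypothetical.

```python
import math


def gdt_ts(model_ca, reference_ca, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS (0-100) for two superimposed, residue-matched C-alpha traces.

    model_ca, reference_ca: equal-length lists of (x, y, z) coordinates in angstroms.
    """
    if not model_ca or len(model_ca) != len(reference_ca):
        raise ValueError("traces must be non-empty and of equal length")
    distances = [math.dist(a, b) for a, b in zip(model_ca, reference_ca)]
    # Fraction of residues within each distance cutoff, averaged over the cutoffs.
    fractions = [sum(d <= c for d in distances) / len(distances) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)
```

With four residues displaced by 0.5, 1.5, 3.0 and 10.0 Å, the per-cutoff fractions are 0.25, 0.5, 0.75 and 0.75, so the score is 56.25.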
Query3d: a new method for high-throughput analysis of functional residues in protein structures.

PubMed

Ausiello, Gabriele; Via, Allegra; Helmer-Citterich, Manuela

2005-12-01

The identification of local similarities between two protein structures can provide clues of a common function. Many different methods exist for searching for similar subsets of residues in proteins of known structure. However, the lack of functional and structural information on single residues, together with the low level of integration of this information in comparison methods, is a limitation that prevents these methods from being fully exploited in high-throughput analyses. Here we describe Query3d, a program that is both a structural DBMS (Database Management System) and a local comparison method. The method conserves a copy of all the residues of the Protein Data Bank annotated with a variety of functional and structural information. New annotations can be easily added from a variety of methods and known databases. The algorithm makes it possible to create complex queries based on the residues' function and then to compare only subsets of the selected residues. Functional information is also essential to speed up the comparison and the analysis of the results. With Query3d, users can easily obtain statistics on how many and which residues share certain properties in all proteins of known structure. At the same time, the method also finds their structural neighbours in the whole PDB. Programs and data can be accessed through the PdbFun web interface.
Query3d: a new method for high-throughput analysis of functional residues in protein structures

PubMed Central

Ausiello, Gabriele; Via, Allegra; Helmer-Citterich, Manuela

2005-01-01

Background: The identification of local similarities between two protein structures can provide clues of a common function. Many different methods exist for searching for similar subsets of residues in proteins of known structure. However, the lack of functional and structural information on single residues, together with the low level of integration of this information in comparison methods, is a limitation that prevents these methods from being fully exploited in high-throughput analyses. Results: Here we describe Query3d, a program that is both a structural DBMS (Database Management System) and a local comparison method. The method conserves a copy of all the residues of the Protein Data Bank annotated with a variety of functional and structural information. New annotations can be easily added from a variety of methods and known databases. The algorithm makes it possible to create complex queries based on the residues' function and then to compare only subsets of the selected residues. Functional information is also essential to speed up the comparison and the analysis of the results. Conclusion: With Query3d, users can easily obtain statistics on how many and which residues share certain properties in all proteins of known structure. At the same time, the method also finds their structural neighbours in the whole PDB. Programs and data can be accessed through the PdbFun web interface. PMID:16351754
Functional inclusion bodies produced in the yeast Pichia pastoris.

PubMed

Rueda, Fabián; Gasser, Brigitte; Sánchez-Chardi, Alejandro; Roldán, Mònica; Villegas, Sandra; Puxbaum, Verena; Ferrer-Miralles, Neus; Unzueta, Ugutz; Vázquez, Esther; Garcia-Fruitós, Elena; Mattanovich, Diethard; Villaverde, Antonio

2016-10-01

Bacterial inclusion bodies (IBs) are non-toxic protein aggregates commonly produced in recombinant bacteria. They are formed by a mixture of highly stable amyloid-like fibrils and releasable protein species with a significant extent of secondary structure, and are often functional. As nanostructured materials, they are gaining biomedical interest because of the combination of submicron size, mechanical stability and biological activity, together with their ability to interact with mammalian cell membranes for subsequent cell penetration in the absence of toxicity. Since essentially any protein species can be obtained as IBs, these entities, as well as related protein clusters (e.g., aggresomes), are being explored in biocatalysis and in biomedicine as mechanically stable sources of functional protein. One of the major bottlenecks for uses of IBs in biological interfaces is their potential contamination with endotoxins from producing bacteria. To overcome this hurdle, we have explored here the controlled production of functional IBs in the yeast Pichia pastoris (Komagataella spp.), an endotoxin-free host system for recombinant protein production, and determined the main physicochemical and biological traits of these materials. Quantitative and qualitative approaches clearly indicate the formation of IBs inside yeast, similar in morphology, size and biological activity to those produced in E. coli, that once purified, interact with mammalian cell membranes and penetrate cultured mammalian cells in the absence of toxicity. The controlled production in P. pastoris of IBs that are structurally and functionally similar to those produced in E. coli demonstrates that yeasts can be used as convenient platforms for the biological fabrication of self-organizing protein materials in the absence of potential endotoxin contamination and with additional advantages regarding, among others, post-translational modifications often required for protein functionality.
The complete genome sequence and genetic analysis of ΦCA82 a novel uncultured microphage from the turkey gastrointestinal system

PubMed Central

2011-01-01

The genomic DNA sequence of a novel enteric uncultured microphage, ΦCA82 from a turkey gastrointestinal system was determined utilizing metagenomics techniques. The entire circular, single-stranded nucleotide sequence of the genome was 5,514 nucleotides. The ΦCA82 genome is quite different from other microviruses as indicated by comparisons of nucleotide similarity, predicted protein similarity, and functional classifications. Only three genes showed significant similarity to microviral proteins as determined by local alignments using BLAST analysis. ORF1 encoded a predicted phage F capsid protein that was phylogenetically most similar to the Microviridae ΦMH2K member's major coat protein. The ΦCA82 genome also encoded a predicted minor capsid protein (ORF2) and putative replication initiation protein (ORF3) most similar to the microviral bacteriophage SpV4. The distant evolutionary relationship of ΦCA82 suggests that the divergence of this novel turkey microvirus from other microviruses may reflect unique evolutionary pressures encountered within the turkey gastrointestinal system. PMID:21714899
RAPSearch: a fast protein similarity search tool for short reads

PubMed Central

2011-01-01

Background: Next Generation Sequencing (NGS) is producing enormous corpuses of short DNA reads, affecting emerging fields like metagenomics. Protein similarity search--a key step to achieve annotation of protein-coding genes in these short reads, and identification of their biological functions--faces daunting challenges because of the very sizes of the short read datasets. Results: We developed a fast protein similarity search tool RAPSearch that utilizes a reduced amino acid alphabet and suffix array to detect seeds of flexible length. For short reads (translated in 6 frames) we tested, RAPSearch achieved ~20-90 times speedup as compared to BLASTX. RAPSearch missed only a small fraction (~1.3-3.2%) of BLASTX similarity hits, but it also discovered additional homologous proteins (~0.3-2.1%) that BLASTX missed. By contrast, BLAT, a tool that is even slightly faster than RAPSearch, had significant loss of sensitivity as compared to RAPSearch and BLAST. Conclusions: RAPSearch is implemented as open-source software and is accessible at http://omics.informatics.indiana.edu/mg/RAPSearch. It enables faster protein similarity search. The application of RAPSearch in metagenomics has also been demonstrated. PMID:21575167
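Seeding on a reduced amino acid alphabet can be sketched as follows. The residue grouping, the fixed seed length and the hash-table index are all simplifying assumptions for illustration: RAPSearch itself is described as using a suffix array and seeds of flexible length, and its actual alphabet is not given in the abstract.

```python
# Illustrative grouping of the 20 amino acids by broad physicochemical similarity.
# This is an assumption for demonstration, not the alphabet RAPSearch uses.
GROUPS = ["AST", "C", "DEN", "FWY", "G", "H", "ILMV", "KQR", "P"]
REDUCE = {aa: group[0] for group in GROUPS for aa in group}


def reduce_sequence(seq):
    """Map each residue to its group representative; unknown letters become 'X'."""
    return "".join(REDUCE.get(aa, "X") for aa in seq.upper())


def shared_seeds(query, subject, k=4):
    """Return (query_pos, subject_pos) pairs whose reduced k-mers match exactly."""
    rq, rs = reduce_sequence(query), reduce_sequence(subject)
    index = {}
    for j in range(len(rs) - k + 1):
        index.setdefault(rs[j:j + k], []).append(j)
    hits = []
    for i in range(len(rq) - k + 1):
        for j in index.get(rq[i:i + k], []):
            hits.append((i, j))
    return hits
```

Under this grouping, `ILKD` and `VMRE` share no identical residue yet reduce to the same string, so they produce a seed that an exact-match search on the full alphabet would not find. Seeds found this way would still need to be extended and scored on the original sequences.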

Phytomonas: A non-pathogenic trypanosomatid model for functional expression of proteins.

PubMed

Miranda, Mariana R; Sayé, Melisa; Reigada, Chantal; Carrillo, Carolina; Pereira, Claudio A

2015-10-01

Phytomonas are protozoan parasites from the Trypanosomatidae family which infect a wide variety of plants. Herein, Phytomonas Jma was tested as a model for functional expression of heterologous proteins. Green fluorescent protein expression was evaluated in Phytomonas and compared with Trypanosoma cruzi, the etiological agent of Chagas' disease. Phytomonas was able to express GFP at levels similar to T. cruzi although the transgenic selection time was higher. It was possible to establish an efficient transfection and selection protocol for protein expression. These results demonstrate that Phytomonas can be a good model for functional expression of proteins from other trypanosomatids, presenting the advantage of being completely safe for humans. Copyright © 2015 Elsevier Inc. All rights reserved.
Structural Determination of Functional Domains in Early B-cell Factor (EBF) Family of Transcription Factors Reveals Similarities to Rel DNA-binding Proteins and a Novel Dimerization Motif*

PubMed Central

Siponen, Marina I.; Wisniewska, Magdalena; Lehtiö, Lari; Johansson, Ida; Svensson, Linda; Raszewski, Grzegorz; Nilsson, Lennart; Sigvardsson, Mikael; Berglund, Helena

2010-01-01

The early B-cell factor (EBF) transcription factors are central regulators of development in several organs and tissues. This protein family shows low sequence similarity to other protein families, which is why structural information for the functional domains of these proteins is crucial to understand their biochemical features. We have used a modular approach to determine the crystal structures of the structured domains in the EBF family. The DNA binding domain reveals a striking resemblance to the DNA binding domains of the Rel homology superfamily of transcription factors but contains a unique zinc binding structure, termed zinc knuckle. Further the EBF proteins contain an IPT/TIG domain and an atypical helix-loop-helix domain with a novel type of dimerization motif. The data presented here provide insights into unique structural features of the EBF proteins and open possibilities for detailed molecular investigations of this important transcription factor family. PMID:20592035
Inferring the Functions of Proteins from the Interrelationships between Functional Categories.

PubMed

Taha, Kamal

2018-01-01

This study proposes a new method to determine the functions of an unannotated protein. The proteins and amino acid residues mentioned in biomedical texts associated with an unannotated protein can be considered characteristic terms for that protein, and they are highly predictive of its potential functions. Similarly, proteins and amino acid residues mentioned in biomedical texts associated with proteins annotated with a functional category can be considered characteristic terms of that category. We introduce in this paper an information extraction system called IFP_IFC that predicts the functions of an unannotated protein by representing the protein and each functional category by a vector of weights. Each weight reflects the degree of association between a characteristic term and the protein (or between a characteristic term and the category). First, IFP_IFC constructs a network whose nodes represent the different functional categories and whose edges represent the interrelationships between the nodes. Then, it determines the functions of the protein by employing random walks with restarts on this network. The walker is the protein's weight vector. Finally, the protein is assigned to the functional categories of the nodes in the network that are visited most by the walker. We evaluated the quality of IFP_IFC by comparing it experimentally with two other systems. Results showed marked improvement.
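A random walk with restart on a small network can be sketched as a simple power iteration. In the sketch below the nodes stand in for functional categories and the restart distribution stands in for the protein's normalised weight vector; both mappings, the function name and the default parameters are assumptions for illustration and are not taken from IFP_IFC.

```python
def random_walk_with_restart(adjacency, restart, restart_prob=0.15,
                             tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a random walk with restart.

    adjacency: dict node -> dict neighbour -> non-negative edge weight
               (every node, including every neighbour, must appear as a key)
    restart:   dict node -> restart probability, summing to 1
    """
    nodes = list(adjacency)
    out_weight = {u: sum(adjacency[u].values()) for u in nodes}
    p = {n: restart.get(n, 0.0) for n in nodes}
    for _ in range(max_iter):
        # With probability restart_prob the walker jumps back to the restart distribution.
        nxt = {n: restart_prob * restart.get(n, 0.0) for n in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                # A node without edges returns its mass to the restart distribution.
                for n in nodes:
                    nxt[n] += (1 - restart_prob) * p[u] * restart.get(n, 0.0)
                continue
            for v, w in adjacency[u].items():
                nxt[v] += (1 - restart_prob) * p[u] * w / out_weight[u]
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p
```

Sorting the returned probabilities in decreasing order gives the most-visited nodes, which in the setting described above would be the candidate functional categories.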
Deterministic protein inference for shotgun proteomics data provides new insights into Arabidopsis pollen development and function

PubMed Central

Grobei, Monica A.; Qeli, Ermir; Brunner, Erich; Rehrauer, Hubert; Zhang, Runxuan; Roschitzki, Bernd; Basler, Konrad; Ahrens, Christian H.; Grossniklaus, Ueli

2009-01-01

Pollen, the male gametophyte of flowering plants, represents an ideal biological system to study developmental processes, such as cell polarity, tip growth, and morphogenesis. Upon hydration, the metabolically quiescent pollen rapidly switches to an active state, exhibiting extremely fast growth. This rapid switch requires relevant proteins to be stored in the mature pollen, where they have to retain functionality in a desiccated environment. Using a shotgun proteomics approach, we unambiguously identified ∼3500 proteins in Arabidopsis pollen, including 537 proteins that were not identified in genetic or transcriptomic studies. To generate this comprehensive reference data set, which extends the previously reported pollen proteome by a factor of 13, we developed a novel deterministic peptide classification scheme for protein inference. This generally applicable approach considers the gene model–protein sequence–protein accession relationships. It allowed us to classify and eliminate ambiguities inherently associated with any shotgun proteomics data set, to report a conservative list of protein identifications, and to seamlessly integrate data from previous transcriptomics studies. Manual validation of proteins unambiguously identified by a single, information-rich peptide enabled us to significantly reduce the false discovery rate, while keeping valuable identifications of shorter and lower abundant proteins. Bioinformatic analyses revealed a higher stability of pollen proteins compared to those of other tissues and implied a protein family of previously unknown function in vesicle trafficking. Interestingly, the pollen proteome is most similar to that of seeds, indicating physiological similarities between these developmentally distinct tissues. PMID:19546170
Biophysical models of protein evolution: Understanding the patterns of evolutionary sequence divergence

PubMed Central

Echave, Julian; Wilke, Claus O.

2018-01-01

For decades, rates of protein evolution have been interpreted in terms of the vague concept of “functional importance”. Slowly evolving proteins or sites within proteins were assumed to be more functionally important and thus subject to stronger selection pressure. More recently, biophysical models of protein evolution, which combine evolutionary theory with protein biophysics, have completely revolutionized our view of the forces that shape sequence divergence. Slowly evolving proteins have been found to evolve slowly because of selection against toxic misfolding and misinteractions, linking their rate of evolution primarily to their abundance. Similarly, most slowly evolving sites in proteins are not directly involved in function, but mutating them has large impacts on protein structure and stability. Here, we review the studies of the emergent field of biophysical protein evolution that have shaped our current understanding of sequence divergence patterns. We also propose future research directions to develop this nascent field. PMID:28301766
Rebelling for a Reason: Protein Structural “Outliers”

PubMed Central

Arumugam, Gandhimathi; Nair, Anu G.; Hariharaputran, Sridhar; Ramanathan, Sowdhamini

2013-01-01

Analysis of structural variation in domain superfamilies can reveal constraints in protein evolution which aids protein structure prediction and classification. Structure-based sequence alignment of distantly related proteins, organized in PASS2 database, provides clues about structurally conserved regions among different functional families. Some superfamily members show large structural differences which are functionally relevant. This paper analyses the impact of structural divergence on function for multi-member superfamilies, selected from the PASS2 superfamily alignment database. Functional annotations within superfamilies, with structural outliers or ‘rebels’, are discussed in the context of structural variations. Overall, these data reinforce the idea that functional similarities cannot be extrapolated from mere structural conservation. The implication for fold-function prediction is that the functional annotations can only be inherited with very careful consideration, especially at low sequence identities. PMID:24073209
Combining protein sequence, structure, and dynamics: A novel approach for functional evolution analysis of PAS domain superfamily.

PubMed

Dong, Zheng; Zhou, Hongyu; Tao, Peng

2018-02-01

PAS domains are widespread in archaea, bacteria, and eukaryota, and play important roles in various functions. In this study, we aim to explore functional evolutionary relationship among proteins in the PAS domain superfamily in view of the sequence-structure-dynamics-function relationship. We collected protein sequences and crystal structure data from RCSB Protein Data Bank of the PAS domain superfamily belonging to three biological functions (nucleotide binding, photoreceptor activity, and transferase activity). Protein sequences were aligned and then used to select sequence-conserved residues and build phylogenetic tree. Three-dimensional structure alignment was also applied to obtain structure-conserved residues. The protein dynamics were analyzed using elastic network model (ENM) and validated by molecular dynamics (MD) simulation. The result showed that the proteins with same function could be grouped by sequence similarity, and proteins in different functional groups displayed statistically significant difference in their vibrational patterns. Interestingly, in all three functional groups, conserved amino acid residues identified by sequence and structure conservation analysis generally have a lower fluctuation than other residues. In addition, the fluctuation of conserved residues in each biological function group was strongly correlated with the corresponding biological function. This research suggested a direct connection in which the protein sequences were related to various functions through structural dynamics. This is a new attempt to delineate functional evolution of proteins using the integrated information of sequence, structure, and dynamics. © 2017 The Protein Society.
Bacterial Ice Crystal Controlling Proteins

PubMed Central

Lorv, Janet S. H.; Rose, David R.; Glick, Bernard R.

2014-01-01

Across the world, many ice active bacteria utilize ice crystal controlling proteins for aid in freezing tolerance at subzero temperatures. Ice crystal controlling proteins include both antifreeze and ice nucleation proteins. Antifreeze proteins minimize freezing damage by inhibiting growth of large ice crystals, while ice nucleation proteins induce formation of embryonic ice crystals. Although both protein classes have differing functions, these proteins use the same ice binding mechanisms. Rather than direct binding, it is probable that these protein classes create an ice surface prior to ice crystal surface adsorption. Function is differentiated by molecular size of the protein. This paper reviews the similar and different aspects of bacterial antifreeze and ice nucleation proteins, the role of these proteins in freezing tolerance, prevalence of these proteins in psychrophiles, and current mechanisms of protein-ice interactions. PMID:24579057
Bacterial ice crystal controlling proteins.

PubMed

Lorv, Janet S H; Rose, David R; Glick, Bernard R

2014-01-01

Across the world, many ice active bacteria utilize ice crystal controlling proteins for aid in freezing tolerance at subzero temperatures. Ice crystal controlling proteins include both antifreeze and ice nucleation proteins. Antifreeze proteins minimize freezing damage by inhibiting growth of large ice crystals, while ice nucleation proteins induce formation of embryonic ice crystals. Although both protein classes have differing functions, these proteins use the same ice binding mechanisms. Rather than direct binding, it is probable that these protein classes create an ice surface prior to ice crystal surface adsorption. Function is differentiated by molecular size of the protein. This paper reviews the similar and different aspects of bacterial antifreeze and ice nucleation proteins, the role of these proteins in freezing tolerance, prevalence of these proteins in psychrophiles, and current mechanisms of protein-ice interactions.
A Score of the Ability of a Three-Dimensional Protein Model to Retrieve Its Own Sequence as a Quantitative Measure of Its Quality and Appropriateness

PubMed Central

Martínez-Castilla, León P.; Rodríguez-Sotres, Rogelio

2010-01-01

Background: Despite the remarkable progress of bioinformatics, how the primary structure of a protein leads to a three-dimensional fold, and in turn determines its function remains an elusive question. Alignments of sequences with known function can be used to identify proteins with the same or similar function with high success. However, identification of function-related and structure-related amino acid positions is only possible after a detailed study of every protein. Folding pattern diversity seems to be much narrower than sequence diversity, and the amino acid sequences of natural proteins have evolved under a selective pressure comprising structural and functional requirements acting in parallel. Principal Findings: The approach described in this work begins by generating a large number of amino acid sequences using ROSETTA [Dantas G et al. (2003) J Mol Biol 332:449–460], a program with notable robustness in the assignment of amino acids to a known three-dimensional structure. The resulting sequence-sets showed no conservation of amino acids at active sites, or protein-protein interfaces. Hidden Markov models built from the resulting sequence sets were used to search sequence databases. Surprisingly, the models retrieved from the database sequences belonged to proteins with the same or a very similar function. Given an appropriate cutoff, the rate of false positives was zero. According to our results, this protocol, here referred to as Rd.HMM, detects fine structural details on the folding patterns, that seem to be tightly linked to the fitness of a structural framework for a specific biological function. Conclusion: Because the sequence of the native protein used to create the Rd.HMM model was always amongst the top hits, the procedure is a reliable tool to score, very accurately, the quality and appropriateness of computer-modeled 3D-structures, without the need for spectroscopy data. However, Rd.HMM is very sensitive to the conformational features of the models' backbone. PMID:20830209
Functional assay for T4 lysozyme-engineered G protein-coupled receptors with an ion channel reporter.

PubMed

Niescierowicz, Katarzyna; Caro, Lydia; Cherezov, Vadim; Vivaudou, Michel; Moreau, Christophe J

2014-01-07

Structural studies of G protein-coupled receptors (GPCRs) extensively use the insertion of globular soluble protein domains to facilitate their crystallization. However, when inserted in the third intracellular loop (i3 loop), the soluble protein domain disrupts their coupling to G proteins and impedes the GPCRs functional characterization by standard G protein-based assays. Therefore, activity tests of crystallization-optimized GPCRs are essentially limited to their ligand binding properties using radioligand binding assays. Functional characterization of additional thermostabilizing mutations requires the insertion of similar mutations in the wild-type receptor to allow G protein-activation tests. We demonstrate that ion channel-coupled receptor technology is a complementary approach for a comprehensive functional characterization of crystallization-optimized GPCRs and potentially of any engineered GPCR. Ligand-induced conformational changes of the GPCRs are translated into electrical signal and detected by simple current recordings, even though binding of G proteins is sterically blocked by the added soluble protein domain. Copyright © 2014 Elsevier Ltd. All rights reserved.
Using Variable-Length Aligned Fragment Pairs and an Improved Transition Function for Flexible Protein Structure Alignment.

PubMed

Cao, Hu; Lu, Yonggang

2017-01-01

With the rapid growth of known protein 3D structures in number, how to efficiently compare protein structures becomes an essential and challenging problem in computational structural biology. At present, many protein structure alignment methods have been developed. Among all these methods, flexible structure alignment methods are shown to be superior to rigid structure alignment methods in identifying structure similarities between proteins, which have gone through conformational changes. It is also found that the methods based on aligned fragment pairs (AFPs) have a special advantage over other approaches in balancing global structure similarities and local structure similarities. Accordingly, we propose a new flexible protein structure alignment method based on variable-length AFPs. Compared with other methods, the proposed method possesses three main advantages. First, it is based on variable-length AFPs. The length of each AFP is separately determined to maximally represent a local similar structure fragment, which reduces the number of AFPs. Second, it uses local coordinate systems, which simplify the computation at each step of the expansion of AFPs during the AFP identification. Third, it decreases the number of twists by rewarding the situation where nonconsecutive AFPs share the same transformation in the alignment, which is realized by dynamic programming with an improved transition function. The experimental data show that compared with FlexProt, FATCAT, and FlexSnap, the proposed method can achieve comparable results by introducing fewer twists. Meanwhile, it can generate results similar to those of the FATCAT method in much less running time due to the reduced number of AFPs.
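The chaining idea in the abstract above (dynamic programming over aligned fragment pairs, with a cost for each twist) can be illustrated with a toy example. This is only a sketch under stated assumptions, not the authors' algorithm: each AFP is reduced to a hypothetical tuple of start positions, a length and a transformation id, the score is simply the number of aligned residues, and a fixed `twist_penalty` is charged whenever two consecutive AFPs in the chain use different transformations. The paper's improved transition function, which also rewards nonconsecutive AFPs sharing a transformation, is not reproduced here.

```python
from typing import List, NamedTuple, Tuple


class AFP(NamedTuple):
    start_a: int    # start index of the fragment in protein A
    start_b: int    # start index of the fragment in protein B
    length: int     # number of aligned residues (assumed >= 1)
    transform: int  # hypothetical id of the rigid-body transformation


def chain_afps(afps: List[AFP], twist_penalty: float = 3.0) -> Tuple[float, List[AFP]]:
    """Return (score, chain) for the best-scoring ordered chain of AFPs.

    Score = total aligned length minus twist_penalty for every pair of
    consecutive AFPs in the chain whose transformations differ.
    """
    order = sorted(afps, key=lambda f: (f.start_a, f.start_b))
    n = len(order)
    if n == 0:
        return 0.0, []

    best = [float(f.length) for f in order]  # best chain score ending at each AFP
    prev = [-1] * n                          # back-pointers for traceback

    for j in range(n):
        for i in range(j):
            fi, fj = order[i], order[j]
            # fj may follow fi only if it starts after fi ends in both proteins.
            if fi.start_a + fi.length <= fj.start_a and fi.start_b + fi.length <= fj.start_b:
                cost = 0.0 if fi.transform == fj.transform else twist_penalty
                candidate = best[i] + fj.length - cost
                if candidate > best[j]:
                    best[j] = candidate
                    prev[j] = i

    # Trace back from the best-scoring end point.
    k = max(range(n), key=best.__getitem__)
    score = best[k]
    chain = []
    while k != -1:
        chain.append(order[k])
        k = prev[k]
    chain.reverse()
    return score, chain


afps = [AFP(0, 0, 10, 0), AFP(12, 12, 4, 1), AFP(20, 20, 10, 0)]
score, chain = chain_afps(afps)
print(score, [f.transform for f in chain])
```

With the default penalty the short middle fragment is skipped, because including it would cost two twists for only four aligned residues. Setting `twist_penalty=0.0` keeps all three fragments.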
Essential role of the HMG domain in the function of yeast mitochondrial histone HM: functional complementation of HM by the nuclear nonhistone protein NHP6A.

PubMed

Kao, L R; Megraw, T L; Chae, C B

1993-06-15

The yeast mitochondrial histone protein HM is required for maintenance of the mitochondrial genome, and disruption of the gene encoding HM (HIM1/ABF2) results in formation of a respiration-deficient petite mutant phenotype. HM contains two homologous regions, which share sequence similarity with the eukaryotic nuclear nonhistone protein, HMG-1. Experiments with various deletion mutants of HM show that a single HMG domain of HM is functional and can restore respiration competency to cells that lack HM protein (him1 mutant cells). The gene encoding the putative yeast nuclear HMG-1 homolog, the NHP6A protein, can functionally complement the him1 mutation. These results suggest that the HMG domain is the basic unit for the function of HM in mitochondria and that the function of HMG-1 proteins in the nucleus and HM in the mitochondrion may be equivalent.
Toxicological evaluation of proteins introduced into food crops

PubMed Central

Kough, John; Herouet-Guicheney, Corinne; Jez, Joseph M.

2013-01-01

This manuscript focuses on the toxicological evaluation of proteins introduced into GM crops to impart desired traits. In many cases, introduced proteins can be shown to have a history of safe use. Where modifications have been made to proteins, experience has shown that it is highly unlikely that modification of amino acid sequences can make a non-toxic protein toxic. Moreover, if the modified protein still retains its biological function, and this function is found in related proteins that have a history of safe use (HOSU) in food, and the exposure level is similar to functionally related proteins, then the modified protein could also be considered to be “as-safe-as” those that have a HOSU. Within nature, there can be considerable evolutionary changes in the amino acid sequence of proteins within the same family, yet these proteins share the same biological function. In general, food crops such as maize, soy, rice, canola etc. are subjected to a variety of processing conditions to generate different food products. Processing conditions such as cooking, modification of pH conditions, and mechanical shearing can often denature proteins in these crops resulting in a loss of functional activity. These same processing conditions can also markedly lower human dietary exposure to (functionally active) proteins. Safety testing of an introduced protein could be indicated if its biological function was not adequately characterized and/or it was shown to be structurally/functionally related to proteins that are known to be toxic to mammals. PMID:24164515
Complexity of Gene Expression Evolution after Duplication: Protein Dosage Rebalancing

PubMed Central

Rogozin, Igor B.

2014-01-01

Ongoing debates about functional importance of gene duplications have been recently intensified by a heated discussion of the “ortholog conjecture” (OC). Under the OC, which is central to functional annotation of genomes, orthologous genes are functionally more similar than paralogous genes at the same level of sequence divergence. However, a recent study challenged the OC by reporting a greater functional similarity, in terms of gene ontology (GO) annotations and expression profiles, among within-species paralogs compared to orthologs. These findings were taken to indicate that functional similarity of homologous genes is primarily determined by the cellular context of the genes, rather than evolutionary history. Subsequent studies suggested that the OC appears to be generally valid when applied to mammalian evolution but the complete picture of evolution of gene expression also has to incorporate lineage-specific aspects of paralogy. The observed complexity of gene expression evolution after duplication can be explained through selection for gene dosage effect combined with the duplication-degeneration-complementation model. This paper discusses expression divergence of recent duplications occurring before functional divergence of proteins encoded by duplicate genes. PMID:25197576
Computational modeling of Repeat1 region of INI1/hSNF5: An evolutionary link with ubiquitin

PubMed Central

Bhutoria, Savita

2016-01-01

The structure of a protein can be very informative of its function. However, determining protein structures experimentally can often be very challenging. Computational methods have been used successfully in modeling structures with sufficient accuracy. Here we have used computational tools to predict the structure of an evolutionarily conserved and functionally significant domain of the Integrase interactor (INI)1/hSNF5 protein. INI1 is a component of the chromatin remodeling SWI/SNF complex, a tumor suppressor and is involved in many protein‐protein interactions. It belongs to the SNF5 family of proteins that contain two conserved repeat (Rpt) domains. The Rpt1 domain of INI1 binds to HIV‐1 Integrase, and acts as a dominant negative mutant to inhibit viral replication. The Rpt1 domain also interacts with the oncogene c‐MYC and modulates its transcriptional activity. We carried out an ab initio modeling of a segment of the INI1 protein containing the Rpt1 domain. The structural model suggested the presence of a compact and well defined ββαα topology as the core structure in the Rpt1 domain of INI1. This topology in Rpt1 was similar to the PFU domain of Phospholipase A2 Activating Protein, PLAA. Interestingly, the PFU domain shares similarity with Ubiquitin and has ubiquitin binding activity. Because of the structural similarity between the Rpt1 domain of INI1 and the PFU domain of PLAA, we propose that the Rpt1 domain of INI1 may participate in ubiquitin recognition or binding with ubiquitin or ubiquitin related proteins. This modeling study may shed light on the mode of interactions of the Rpt1 domain of INI1 and is likely to facilitate future functional studies of INI1. PMID:27261671
Residues with similar hexagon neighborhoods share similar side-chain conformations.

PubMed

Li, Shuai Cheng; Bu, Dongbo; Li, Ming

2012-01-01

We present in this study a new approach to code protein side-chain conformations into hexagon substructures. Classical side-chain packing methods consist of two steps: first, side-chain conformations, known as rotamers, are extracted from known protein structures as candidates for each residue; second, a searching method along with an energy function is used to resolve conflicts among residues and to optimize the combinations of side-chain conformations for all residues. These methods benefit from the fact that the number of possible side-chain conformations is limited, and the rotamer candidates are readily extracted; however, these methods also suffer from the inaccuracy of energy functions. Inspired by threading and ab initio approaches to protein structure prediction, we propose to use hexagon substructures to implicitly capture subtle issues of energy functions. Our initial results indicate that even without guidance from an energy function, hexagon structures alone can capture side-chain conformations at an accuracy of 83.8 percent, higher than the 82.6 percent achieved by state-of-the-art side-chain packing methods.
Sequence/structural analysis of xylem proteome emphasizes pathogenesis-related proteins, chitinases and β-1, 3-glucanases as key players in grapevine defense against Xylella fastidiosa.

PubMed

Chakraborty, Sandeep; Nascimento, Rafael; Zaini, Paulo A; Gouran, Hossein; Rao, Basuthkar J; Goulart, Luiz R; Dandekar, Abhaya M

2016-01-01

Background. Xylella fastidiosa, the causative agent of various plant diseases including Pierce's disease in the US, and Citrus Variegated Chlorosis in Brazil, remains a continual source of concern and economic losses, especially since almost all commercial varieties are sensitive to this Gammaproteobacteria. Differential expression of proteins in infected tissue is an established methodology to identify key elements involved in plant defense pathways. Methods. In the current work, we developed a methodology named CHURNER that emphasizes relevant protein functions from proteomic data, based on identification of proteins with similar structures that do not necessarily have sequence homology. Such clustering emphasizes protein functions which have multiple copies that are up/down-regulated, and highlights similar proteins which are differentially regulated. As a working example we present proteomic data enumerating differentially expressed proteins in xylem sap from grapevines that were infected with X. fastidiosa. Results. Analysis of this data by CHURNER highlighted pathogenesis related PR-1 proteins, reinforcing this as the foremost protein function in xylem sap involved in the grapevine defense response to X. fastidiosa. β-1, 3-glucanase, which has both anti-microbial and anti-fungal activities, is also up-regulated. Simultaneously, chitinases are found to be both up and down-regulated by CHURNER, and thus the net gain of this protein function loses its significance in the defense response. Discussion. We demonstrate how structural data can be incorporated in the pipeline of proteomic data analysis prior to making inferences on the importance of individual proteins to plant defense mechanisms. We expect CHURNER to be applicable to any proteomic data set.
[Relationships between venomous function and innate immune function].

PubMed

Goyffon, Max; Saul, Frederick; Faure, Grazyna

2015-01-01

Venomous function is investigated in relation to innate immune function in two cases selected from scorpion venom and serpent venom. In the first case, structural analysis of scorpion toxins and defensins reveals a close interrelation between both functions (toxic and innate immune system function). In the second case, structural and functional studies of natural inhibitors of toxic snake venom phospholipases A2 reveal homology with components of the innate immune system, leading to a similar conclusion. Although there is a clear functional distinction between neurotoxins, which act by targeting membrane ion channels, and the circulating defensins which protect the organism from pathogens, the scorpion short toxins and defensins share a common protein folding scaffold with a conserved cysteine-stabilized alpha-beta motif of three disulfide bridges linking a short alpha helix and an antiparallel beta sheet. Genomic analysis suggests that these proteins share a common ancestor (long venom toxins were separated from an early gene family which gave rise to separate short toxin and defensin families). Furthermore, a scorpion toxin has been experimentally synthetized from an insect defensin, and an antibacterial scorpion peptide, androctonin (whose structure is similar to that of a cone snail venom toxin), was shown to have a similar high affinity for the postsynaptic acetylcholine receptor of Torpedo sp. Natural inhibitors of phospholipase A2 found in the blood of snakes are associated with the resistance of venomous snakes to their own highly neurotoxic venom proteins. Three classes of phospholipases A2 inhibitors (PLI-α, PLI-β, PLI-γ) have been identified. These inhibitors display diverse structural motifs related to innate immune proteins including carbohydrate recognition domains (CRD), leucine rich repeat domains (found in Toll-like receptors) and three finger domains, which clearly differentiate them from components of the adaptive immune system. 
Thus, in structure, function and phylogeny, venomous function in both vertebrates and invertebrates is clearly interrelated with innate immune function. © Société de Biologie, 2016.
The evolution of filamin – A protein domain repeat perspective

PubMed Central

Light, Sara; Sagit, Rauan; Ithychanda, Sujay S.; Qin, Jun; Elofsson, Arne

2013-01-01

Particularly in higher eukaryotes, some protein domains are found in tandem repeats, performing broad functions often related to cellular organization. For instance, the eukaryotic protein filamin interacts with many proteins and is crucial for the cytoskeleton. The functional properties of long repeat domains are governed by the specific properties of each individual domain as well as by the repeat copy number. To provide better understanding of the evolutionary and functional history of repeating domains, we investigated the mode of evolution of the filamin domain in some detail. Among the domains that are common in long repeat proteins, sushi and spectrin domains evolve primarily through cassette tandem duplications while scavenger and immunoglobulin repeats appear to evolve through clustered tandem duplications. Additionally, immunoglobulin and filamin repeats exhibit a unique pattern where every other domain shows high sequence similarity. This pattern may be the result of tandem duplications, serve to avert aggregation between adjacent domains or it is the result of functional constraints. In filamin, our studies confirm the presence of interspersed integrin binding domains in vertebrates, while invertebrates exhibit more varied patterns, including more clustered integrin binding domains. The most notable case is leech filamin, which contains a 20 repeat expansion and exhibits unique dimerization topology. Clearly, invertebrate filamins are varied and contain examples of similar adjacent integrin-binding domains. Given that invertebrate integrin shows more similarity to the weaker filamin binder, integrin β3, it is possible that the distance between integrin-binding domains is not as crucial for invertebrate filamins as for vertebrates. PMID:22414427

The evolution of filamin – a protein domain repeat perspective.

PubMed

Light, Sara; Sagit, Rauan; Ithychanda, Sujay S; Qin, Jun; Elofsson, Arne

2012-09-01

Particularly in higher eukaryotes, some protein domains are found in tandem repeats, performing broad functions often related to cellular organization. For instance, the eukaryotic protein filamin interacts with many proteins and is crucial for the cytoskeleton. The functional properties of long repeat domains are governed by the specific properties of each individual domain as well as by the repeat copy number. To provide better understanding of the evolutionary and functional history of repeating domains, we investigated the mode of evolution of the filamin domain in some detail. Among the domains that are common in long repeat proteins, sushi and spectrin domains evolve primarily through cassette tandem duplications while scavenger and immunoglobulin repeats appear to evolve through clustered tandem duplications. Additionally, immunoglobulin and filamin repeats exhibit a unique pattern where every other domain shows high sequence similarity. This pattern may be the result of tandem duplications, serve to avert aggregation between adjacent domains or it is the result of functional constraints. In filamin, our studies confirm the presence of interspersed integrin binding domains in vertebrates, while invertebrates exhibit more varied patterns, including more clustered integrin binding domains. The most notable case is leech filamin, which contains a 20 repeat expansion and exhibits unique dimerization topology. Clearly, invertebrate filamins are varied and contain examples of similar adjacent integrin-binding domains. Given that invertebrate integrin shows more similarity to the weaker filamin binder, integrin β3, it is possible that the distance between integrin-binding domains is not as crucial for invertebrate filamins as for vertebrates. Copyright © 2012 Elsevier Inc. All rights reserved.
Fusing literature and full network data improves disease similarity computation.

PubMed

Li, Ping; Nie, Yaling; Yu, Jingkai

2016-08-30

Identifying relatedness among diseases could help deepen understanding of the underlying pathogenic mechanisms of diseases and facilitate drug repositioning projects. A number of methods for computing disease similarity have been developed; however, none of them were designed to utilize information from the entire protein interaction network, using instead only those interactions involving disease-causing genes. Most previously published methods required gene-disease association data; unfortunately, many diseases still have very few or no associated genes, which has impeded broad adoption of those methods. In this study, we propose a new method (MedNetSim) for computing disease similarity by integrating medical literature and the protein interaction network. MedNetSim consists of a network-based method (NetSim), which employs the entire protein interaction network, and a MEDLINE-based method (MedSim), which computes disease similarity by mining the biomedical literature. Among function-based methods, NetSim achieved the best performance. Its average AUC (area under the receiver operating characteristic curve) reached 95.2%. MedSim, whose performance was even comparable to some function-based methods, achieved the highest average AUC among all semantic-based methods. Integration of MedSim and NetSim (MedNetSim) further improved the average AUC to 96.4%. We further studied the effectiveness of different data sources. It was found that the quality of protein interaction data was more important than its volume. In contrast, a higher volume of gene-disease association data was more beneficial, even with a lower reliability. Utilizing a higher volume of disease-related gene data further improved the average AUC of MedNetSim and NetSim to 97.5% and 96.7%, respectively. Integrating biomedical literature and the protein interaction network can be an effective way to compute disease similarity. 
When sufficient disease-related gene data are lacking, literature-based methods such as MedSim can be a great addition to function-based algorithms. It may be beneficial to steer more resources toward studying gene-disease associations and improving the quality of protein interaction data. Disease similarities can be computed using the proposed methods at http://www.digintelli.com:8000/.
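The AUC figures quoted in the abstract above summarize how well a similarity score ranks truly related disease pairs above unrelated ones. As a minimal sketch of the metric itself (not of MedNetSim, and with made-up scores and labels), the AUC can be computed as the fraction of positive/negative pairs in which the positive example receives the higher score, counting ties as one half. The function name `auc` is chosen here for illustration.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 1 for a positive (e.g. known similar disease pair), 0 for a negative.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Made-up similarity scores for four disease pairs; 1 marks a known related pair.
example_scores = [0.9, 0.8, 0.3, 0.1]
example_labels = [1, 0, 1, 0]
print(auc(example_scores, example_labels))  # 0.75
```

The quadratic pairwise loop is chosen for readability; for large benchmarks a rank-based computation gives the same value in O(n log n).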
DOE Office of Scientific and Technical Information (OSTI.GOV)

Bianchetti, Christopher M.; Bingman, Craig A.; Phillips, Jr., George N.

The thanatos (the Greek god of death)-associated protein (THAP) domain is a sequence-specific DNA-binding domain that contains a C2-CH (Cys-Xaa₂₋₄-Cys-Xaa₃₅₋₅₀-Cys-Xaa₂-His) zinc finger that is similar to the DNA domain of the P element transposase from Drosophila. THAP-containing proteins have been observed in the proteome of humans, pigs, cows, chickens, zebrafish, Drosophila, C. elegans, and Xenopus. To date, there are no known THAP domain proteins in plants, yeast, or bacteria. There are 12 identified human THAP domain-containing proteins (THAP0-11). In all human THAP proteins, the THAP domain is located at the N-terminus and is approximately 90 residues in length. Although all of the human THAP-containing proteins have a homologous N-terminus, there is extensive variation in both the predicted structure and length of the remaining protein. Even though the exact function of these THAP proteins is not well defined, there is evidence that they play a role in cell proliferation, apoptosis, cell cycle modulation, chromatin modification, and transcriptional regulation. THAP-containing proteins have also been implicated in a number of human disease states including heart disease, neurological defects, and several types of cancers. Human THAP4 is a 577-residue protein of unknown function that is proposed to bind DNA in a sequence-specific manner similar to THAP1 and has been found to be upregulated in response to heat shock. THAP4 is expressed in a relatively uniform manner in a broad range of tissues and appears to be upregulated in lymphoma cells and highly expressed in heart cells. The C-terminal domain of THAP4 (residues 415-577), designated here as cTHAP4, is evolutionarily conserved and is observed in all known THAP4 orthologs. Several single-domain proteins lacking a THAP domain are found in plants and bacteria and show significant levels of homology to cTHAP4. 
It appears that cTHAP4 belongs to a large class of proteins that have yet to be fully functionally characterized. On the basis of prior work, we predicted that cTHAP4 is composed of a heme-binding nitrobindin domain, making THAP4 the only human THAP protein predicted to bind a cofactor. Nitrobindin, a recently characterized protein from Arabidopsis thaliana, is structurally similar and exhibits nitric oxide (NO)-binding properties that resemble the heme-binding nitrophorins. Nitrophorins use a heme moiety to store, transport, and release NO in a pH-specific manner. Although the exact function of nitrobindin is not fully known, the similarities between the well-characterized nitrophorins imply a role in NO transport, sensing, or metabolism. To better elucidate the possible function of THAP4, we solved the heme-bound structure of cTHAP4 to a resolution of 1.79 Å.
Exploration of Uncharted Regions of the Protein Universe

PubMed Central

Jaroszewski, Lukasz; Li, Zhanwen; Krishna, S. Sri; Bakolitsa, Constantina; Wooley, John; Deacon, Ashley M.; Wilson, Ian A.; Godzik, Adam

2009-01-01

The genome projects have unearthed an enormous diversity of genes of unknown function that are still awaiting biological and biochemical characterization. These genes, as most others, can be grouped into families based on sequence similarity. The PFAM database currently contains over 2,200 such families, referred to as domains of unknown function (DUF). In a coordinated effort, the four large-scale centers of the NIH Protein Structure Initiative have determined the first three-dimensional structures for more than 250 of these DUF families. Analysis of the first 248 reveals that about two thirds of the DUF families likely represent very divergent branches of already known and well-characterized families, which allows hypotheses to be formulated about their biological function. The remainder can be formally categorized as new folds, although about one third of these show significant substructure similarity to previously characterized folds. These results suggest that, despite the enormous increase in the number and the diversity of new genes being uncovered, the fold space of the proteins they encode is gradually becoming saturated. The previously unexplored sectors of the protein universe appear to be primarily shaped by extreme diversification of known protein families, which then enables organisms to evolve new functions and adapt to particular niches and habitats. Notwithstanding, these DUF families still constitute the richest source for discovery of the remaining protein folds and topologies. PMID:19787035
Structural and functional diversity of CLAVATA3/ESR (CLE)-like genes from the potato cyst nematode Globodera rostochiensis.

PubMed

Lu, Shun-Wen; Chen, Shiyan; Wang, Jianying; Yu, Hang; Chronis, Demosthenis; Mitchum, Melissa G; Wang, Xiaohong

2009-09-01

Plant CLAVATA3/ESR-related (CLE) peptides have diverse roles in plant growth and development. Here, we report the isolation and functional characterization of five new CLE genes from the potato cyst nematode Globodera rostochiensis. Unlike typical plant CLE peptides that contain a single CLE motif, four of the five Gr-CLE genes encode CLE proteins with multiple CLE motifs. These Gr-CLE genes were found to be specifically expressed within the dorsal esophageal gland cell of nematode parasitic stages, suggesting a role for their encoded proteins in plant parasitism. Overexpression phenotypes of Gr-CLE genes in Arabidopsis mimicked those of plant CLE genes, and Gr-CLE proteins could rescue the Arabidopsis clv3-2 mutant phenotype when expressed within meristems. A short root phenotype was observed when synthetic GrCLE peptides were exogenously applied to roots of Arabidopsis or potato similar to the overexpression of Gr-CLE genes in Arabidopsis and potato hairy roots. These results reveal that G. rostochiensis CLE proteins with either single or multiple CLE motifs function similarly to plant CLE proteins and that CLE signaling components are conserved in both Arabidopsis and potato roots. Furthermore, our results provide evidence to suggest that the evolution of multiple CLE motifs may be an important mechanism for generating functional diversity in nematode CLE proteins to facilitate parasitism.
Mass Spectrometry Analysis of Spatial Protein Networks by Colocalization Analysis (COLA).

PubMed

Mardakheh, Faraz K

2017-01-01

A major challenge in systems biology is comprehensive mapping of protein interaction networks. Crucially, such interactions are often dynamic in nature, necessitating methods that can rapidly mine the interactome across varied conditions and treatments to reveal change in the interaction networks. Recently, we described a fast mass spectrometry-based method to reveal functional interactions in mammalian cells on a global scale, by revealing spatial colocalizations between proteins (COLA) (Mardakheh et al., Mol Biosyst 13:92-105, 2017). As protein localization and function are inherently linked, significant colocalization between two proteins is a strong indication for their functional interaction. COLA uses rapid complete subcellular fractionation, coupled with quantitative proteomics to generate a subcellular localization profile for each protein quantified by the mass spectrometer. Robust clustering is then applied to reveal significant similarities in protein localization profiles, indicative of colocalization.
Screening and expression of selected taxonomically conserved and unique hypothetical proteins in Burkholderia pseudomallei K96243

NASA Astrophysics Data System (ADS)

Akhir, Nor Azurah Mat; Nadzirin, Nurul; Mohamed, Rahmah; Firdaus-Raih, Mohd

2015-09-01

Hypothetical proteins of bacterial pathogens represent a large number of novel biological mechanisms that could belong to essential pathways in the bacteria. They lack functional characterization mainly because sequence homology-based methods cannot detect functional relationships in the absence of detectable sequence similarity. The dataset derived from this study contained 550 candidates that are conserved in genomes that have pathogenicity information and are present only in the order Burkholderiales. The dataset has been narrowed down to taxonomic clusters. Ten proteins were selected for ORF amplification; seven of them were successfully amplified, and only four were successfully expressed. These proteins are strong candidates for determining their true function via structural biology.
An Augmented Pocketome: Detection and Analysis of Small-Molecule Binding Pockets in Proteins of Known 3D Structure.

PubMed

Bhagavat, Raghu; Sankar, Santhosh; Srinivasan, Narayanaswamy; Chandra, Nagasuma

2018-03-06

Protein-ligand interactions form the basis of most cellular events. Identifying ligand binding pockets in proteins will greatly facilitate rationalizing and predicting protein function. Ligand binding sites are unknown for many proteins of known three-dimensional (3D) structure, creating a gap in our understanding of protein structure-function relationships. To bridge this gap, we detect pockets in proteins of known 3D structures, using computational techniques. This augmented pocketome (PocketDB) consists of 249,096 pockets, which is about seven times larger than what is currently known. We deduce possible ligand associations for about 46% of the newly identified pockets. The augmented pocketome, when subjected to clustering based on similarities among pockets, yielded 2,161 site types, which are associated with 1,037 ligand types, together providing fold-site-type-ligand-type associations. The PocketDB resource facilitates a structure-based function annotation, delineation of the structural basis of ligand recognition, and provides functional clues for domains of unknown functions, allosteric proteins, and druggable pockets. Copyright © 2018 Elsevier Ltd. All rights reserved.
Increasing protein production rates can decrease the rate at which functional protein is produced

NASA Astrophysics Data System (ADS)

Sharma, Ajeet; O'Brien, Edward

The rate at which soluble, functional protein is produced by the ribosome has recently been found to vary in complex and unexplained ways as various translation-associated rates are altered through synonymous codon substitutions. We combine a well-established ribosome-traffic model with a master-equation model of co-translational domain folding to explore the scenarios that are possible for the protein production rate, J, and the functional-nascent protein production rate, F, as the rates associated with translation are altered. We find that while J monotonically increases as the rates of translation-initiation, -elongation and -termination increase, F can either increase or decrease. F exhibits non-monotonic behavior because increasing these rates can cause a protein to be synthesized more rapidly but provide less time for nascent-protein domains to co-translationally fold, thereby producing less functional nascent protein immediately after synthesis. We further demonstrate that these non-monotonic changes in F affect the post-translational, steady-state levels of functional protein in a similar manner. Our results provide a possible explanation for recent experimental observations that the specific activity of enzymatic proteins can decrease with increased synthesis rates and can in principle be used to rationally design transcripts to maximize the production of functional nascent protein.
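The qualitative argument in the abstract above (J rises monotonically with translation rates while F can rise and then fall) can be reproduced with a deliberately simple toy calculation. Everything in this sketch is an assumption made for illustration and is not the authors' ribosome-traffic or master-equation model: the production rate is taken as a saturating function of a single elongation rate, the folding probability assumes first-order folding during a time window inversely proportional to that rate, and the parameter names `k_init`, `k_fold` and `n_codons` and their values are hypothetical.

```python
import math


def production_rate(k_elong: float, k_init: float = 1.0) -> float:
    """Toy production rate J: grows with k_elong and saturates at k_init."""
    return k_init * k_elong / (k_init + k_elong)


def fold_probability(k_elong: float, k_fold: float = 0.5, n_codons: int = 10) -> float:
    """Chance a domain folds co-translationally within the window n_codons / k_elong."""
    return 1.0 - math.exp(-k_fold * n_codons / k_elong)


def functional_rate(k_elong: float) -> float:
    """Toy functional production rate F = J * P(fold before release)."""
    return production_rate(k_elong) * fold_probability(k_elong)


rates = [0.1, 1.0, 10.0, 100.0]
J = [production_rate(k) for k in rates]
F = [functional_rate(k) for k in rates]

for k, j, f in zip(rates, J, F):
    print(f"k_elong={k:7.1f}  J={j:.3f}  F={f:.3f}")
```

In this toy setting J keeps increasing across the scanned rates, whereas F peaks at an intermediate elongation rate: faster synthesis eventually leaves too little time for folding.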
The Gam protein of bacteriophage Mu is an orthologue of eukaryotic Ku

PubMed Central

di Fagagna, Fabrizio d'Adda; Weller, Geoffrey R.; Doherty, Aidan J.; Jackson, Stephen P.

2003-01-01

Mu bacteriophage inserts its DNA into the genome of host bacteria and is used as a model for DNA transposition events in other systems. The eukaryotic Ku protein has key roles in DNA repair and in certain transposition events. Here we show that the Gam protein of phage Mu is conserved in bacteria, has sequence homology with both subunits of Ku, and has the potential to adopt a similar architecture to the core DNA-binding region of Ku. Through biochemical studies, we demonstrate that Gam and the related protein of Haemophilus influenzae display DNA binding characteristics remarkably similar to those of human Ku. In addition, we show that Gam can interfere with Ty1 retrotransposition in Saccharomyces cerevisiae. These data reveal structural and functional parallels between bacteriophage Gam and eukaryotic Ku and suggest that their functions have been evolutionarily conserved. PMID:12524520
Computational analysis of human and mouse CREB3L4 Protein

PubMed Central

Velpula, Kiran Kumar; Rehman, Azeem Abdul; Chigurupati, Soumya; Sanam, Ramadevi; Inampudi, Krishna Kishore; Akila, Chandra Sekhar

2012-01-01

CREB3L4 is a member of the CREB/ATF transcription factor family, characterized by their regulation of gene expression through the cAMP-responsive element. Previous studies identified this protein in mice and humans. Whereas CREB3L4 in mice (referred to as Tisp40) is found in the testes and functions in spermatogenesis, human CREB3L4 is primarily detected in the prostate and has been implicated in cancer. We conducted computational analyses to compare the structural homology between murine Tisp40α and human CREB3L4. Our results reveal that the primary and secondary structures of the two proteins are highly similar. Additionally, the predicted helical transmembrane structure indicates that the proteins likely have similar structure and function. This study offers preliminary findings that support the translation of mouse Tisp40α findings into human models, based on structural homology. PMID:22829733
L-GRAAL: Lagrangian graphlet-based network aligner.

PubMed

Malod-Dognin, Noël; Pržulj, Nataša

2015-07-01

Discovering and understanding patterns in networks of protein-protein interactions (PPIs) is a central problem in systems biology. Alignments between these networks aid functional understanding as they uncover important information, such as evolutionary conserved pathways, protein complexes and functional orthologs. A few methods have been proposed for global PPI network alignments, but because of NP-completeness of underlying sub-graph isomorphism problem, producing topologically and biologically accurate alignments remains a challenge. We introduce a novel global network alignment tool, Lagrangian GRAphlet-based ALigner (L-GRAAL), which directly optimizes both the protein and the interaction functional conservations, using a novel alignment search heuristic based on integer programming and Lagrangian relaxation. We compare L-GRAAL with the state-of-the-art network aligners on the largest available PPI networks from BioGRID and observe that L-GRAAL uncovers the largest common sub-graphs between the networks, as measured by edge-correctness and symmetric sub-structures scores, which allow transferring more functional information across networks. We assess the biological quality of the protein mappings using the semantic similarity of their Gene Ontology annotations and observe that L-GRAAL best uncovers functionally conserved proteins. Furthermore, we introduce for the first time a measure of the semantic similarity of the mapped interactions and show that L-GRAAL also uncovers best functionally conserved interactions. In addition, we illustrate on the PPI networks of baker's yeast and human the ability of L-GRAAL to predict new PPIs. Finally, L-GRAAL's results are the first to show that topological information is more important than sequence information for uncovering functionally conserved interactions. L-GRAAL is coded in C++. Software is available at: http://bio-nets.doc.ic.ac.uk/L-GRAAL/. 
Contact: n.malod-dognin@imperial.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
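The edge-correctness score mentioned in the abstract above can be illustrated with a short sketch. It uses the commonly cited definition (the fraction of edges in the first network whose aligned endpoints are also connected in the second network). The function name `edge_correctness` and the data layout are assumptions for illustration; this is not code from L-GRAAL, which is written in C++.

```python
def edge_correctness(edges_a, edges_b, mapping):
    """Fraction of edges in network A that are conserved in network B.

    edges_a, edges_b: iterables of (node, node) pairs (undirected edges).
    mapping: dict sending nodes of A to nodes of B (the alignment).
    All names here are hypothetical and chosen for illustration only.
    """
    edges_a = list(edges_a)
    # Store B's edges as unordered pairs so (x, y) and (y, x) match.
    b_edges = {frozenset(e) for e in edges_b}
    conserved = 0
    for u, v in edges_a:
        # An edge is conserved when both endpoints are aligned and
        # their images are connected in B.
        if u in mapping and v in mapping:
            if frozenset((mapping[u], mapping[v])) in b_edges:
                conserved += 1
    return conserved / len(edges_a) if edges_a else 0.0
```

A score of 1.0 means every interaction in the first network is preserved by the alignment; lower values mean that fewer interactions are preserved.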
Improving binding mode and binding affinity predictions of docking by ligand-based search of protein conformations: evaluation in D3R grand challenge 2015

NASA Astrophysics Data System (ADS)

Xu, Xianjin; Yan, Chengfei; Zou, Xiaoqin

2017-08-01

The growing number of protein-ligand complex structures, particularly the structures of proteins co-bound with different ligands, in the Protein Data Bank helps us tackle two major challenges in molecular docking studies: the protein flexibility and the scoring function. Here, we introduced a systematic strategy by using the information embedded in the known protein-ligand complex structures to improve both binding mode and binding affinity predictions. Specifically, a ligand similarity calculation method was employed to search for a receptor structure with a bound ligand sharing high similarity with the query ligand for use in docking. The strategy was applied to the two datasets (HSP90 and MAP4K4) in the recent D3R Grand Challenge 2015. In addition, for the HSP90 dataset, a system-specific scoring function (ITScore2_hsp90) was generated by recalibrating our statistical potential-based scoring function (ITScore2) using the known protein-ligand complex structures and the statistical mechanics-based iterative method. For the HSP90 dataset, better performances were achieved for both binding mode and binding affinity predictions compared with the original ITScore2 and with ensemble docking. For the MAP4K4 dataset, although there were only eight known protein-ligand complex structures, our docking strategy achieved a comparable performance with ensemble docking. Our method for receptor conformational selection and iterative method for the development of system-specific statistical potential-based scoring functions can be easily applied to other protein targets that have a number of protein-ligand complex structures available to improve predictions on binding.
Fuzzy measures on the Gene Ontology for gene product similarity.

PubMed

Popescu, Mihail; Keller, James M; Mitchell, Joyce A

2006-01-01

One of the most important objects in bioinformatics is a gene product (protein or RNA). For many gene products, functional information is summarized in a set of Gene Ontology (GO) annotations. For these genes, it is reasonable to include similarity measures based on the terms found in the GO or other taxonomy. In this paper, we introduce several novel measures for computing the similarity of two gene products annotated with GO terms. The fuzzy measure similarity (FMS) has the advantage that it takes into consideration the context of both complete sets of annotation terms when computing the similarity between two gene products. When the two gene products are not annotated by common taxonomy terms, we propose a method that avoids a zero similarity result. To account for the variations in the annotation reliability, we propose a similarity measure based on the Choquet integral. These similarity measures provide extra tools for the biologist in search of functional information for gene products. The initial testing on a group of 194 sequences representing three protein families shows a higher correlation of the FMS and Choquet similarities to the BLAST sequence similarities than the traditional similarity measures such as pairwise average or pairwise maximum.
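For orientation, the two traditional baselines named at the end of the abstract above (pairwise average and pairwise maximum) can be sketched as follows. The sketch assumes a term-to-term similarity function `term_sim` is supplied by the caller; that function and all names below are hypothetical. It does not implement the fuzzy measure similarity or the Choquet-integral measure, whose details are not given in the abstract.

```python
from itertools import product

def pairwise_scores(terms_a, terms_b, term_sim):
    """Similarity of every term of product A against every term of product B."""
    return [term_sim(s, t) for s, t in product(terms_a, terms_b)]

def pairwise_average(terms_a, terms_b, term_sim):
    """Mean of all term-to-term similarities (0.0 if either set is empty)."""
    scores = pairwise_scores(terms_a, terms_b, term_sim)
    return sum(scores) / len(scores) if scores else 0.0

def pairwise_maximum(terms_a, terms_b, term_sim):
    """Best single term-to-term similarity (0.0 if either set is empty)."""
    return max(pairwise_scores(terms_a, terms_b, term_sim), default=0.0)
```

Both baselines reduce the two annotation sets to isolated term pairs, which is the limitation the abstract contrasts with measures that consider both complete annotation sets.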
Versatile multi-functionalization of protein nanofibrils for biosensor applications

NASA Astrophysics Data System (ADS)

Sasso, L.; Suei, S.; Domigan, L.; Healy, J.; Nock, V.; Williams, M. A. K.; Gerrard, J. A.

2014-01-01

Protein nanofibrils offer advantages over other nanostructures due to the ease in their self-assembly and the versatility of surface chemistry available. Yet, an efficient and general methodology for their post-assembly functionalization remains a significant challenge. We introduce a generic approach, based on biotinylation and thiolation, for the multi-functionalization of protein nanofibrils self-assembled from whey proteins. Biochemical characterization shows the effects of the functionalization onto the nanofibrils' surface, giving insights into the changes in surface chemistry of the nanostructures. We show how these methods can be used to decorate whey protein nanofibrils with several components such as fluorescent quantum dots, enzymes, and metal nanoparticles. A multi-functionalization approach is used, as a proof of principle, for the development of a glucose biosensor platform, where the protein nanofibrils act as nanoscaffolds for glucose oxidase. Biotinylation is used for enzyme attachment and thiolation for nanoscaffold anchoring onto a gold electrode surface. Characterization via cyclic voltammetry shows an increase in glucose-oxidase mediated current response due to thiol-metal interactions with the gold electrode. The presented approach for protein nanofibril multi-functionalization is novel and has the potential of being applied to other protein nanostructures with similar surface chemistry. Electronic supplementary information (ESI) available: Cyclic voltammetry characterization of biosensor platforms including bare Au electrodes (Fig. S1), biosensor response to various glucose concentrations (Fig. S2), and AFM roughness measurements due to WPNF modifications (Fig. S3). See DOI: 10.1039/c3nr05752f
Composition, structure and functional properties of protein concentrates and isolates produced from walnut (Juglans regia L.).

PubMed

Mao, Xiaoying; Hua, Yufei

2012-01-01

In this study, the composition, structure and functional properties of protein concentrate (WPC) and protein isolate (WPI) produced from defatted walnut flour (DFWF) were investigated. The results showed that the composition and structure of walnut protein concentrate (WPC) and walnut protein isolate (WPI) were significantly different. The molecular weight distribution of WPI was uniform, whereas the protein composition of DFWF and WPC was complex, with protein aggregation. H(0) of WPC was significantly higher (p < 0.05) than those of DFWF and WPI, whilst WPI had a higher H(0) compared to DFWF. The secondary structure of WPI was similar to that of WPC. WPI showed big flaky plate-like structures, whereas WPC appeared as a small flaky and more compact structure. Most functional properties of WPI were better than those of WPC. In comparing most functional properties of WPI and WPC with soybean protein concentrate and isolate, WPI and WPC showed higher fat absorption capacity (FAC). Emulsifying properties and foam properties of WPC and WPI in alkaline pH were comparable with those of soybean protein concentrate and isolate. Walnut protein concentrates and isolates can be considered as potential functional food ingredients.
Protein structural similarity search by Ramachandran codes

PubMed Central

Lo, Wei-Cheng; Huang, Po-Jung; Chang, Chih-Hung; Lyu, Ping-Chiang

2007-01-01

Background Protein structural data has increased exponentially, such that fast and accurate tools are needed for structure similarity searches. To improve the search speed, several methods have been designed to reduce three-dimensional protein structures to one-dimensional text strings that are then analyzed by traditional sequence alignment methods; however, the accuracy is usually sacrificed and the speed is still unable to match sequence similarity search tools. Here, we aimed to improve the linear encoding methodology and develop efficient search tools that can rapidly retrieve structural homologs from large protein databases. Results We propose a new linear encoding method, SARST (Structural similarity search Aided by Ramachandran Sequential Transformation). SARST transforms protein structures into text strings through a Ramachandran map organized by nearest-neighbor clustering and uses a regenerative approach to produce substitution matrices. Then, classical sequence similarity search methods can be applied to the structural similarity search. Its accuracy is similar to that of Combinatorial Extension (CE), and it runs over 243,000 times faster, searching 34,000 proteins in 0.34 sec with a 3.2-GHz CPU. SARST provides statistically meaningful expectation values to assess the retrieved information. It has been implemented as a web service and a stand-alone Java program that is able to run on many different platforms. Conclusion As a database search method, SARST can rapidly distinguish high from low similarities and efficiently retrieve homologous structures. It demonstrates that the easily accessible linear encoding methodology has the potential to serve as a foundation for efficient protein structural similarity search tools. These search tools are expected to be applicable to automated and high-throughput functional annotations or predictions for the ever increasing number of published protein structures in this post-genomic era. PMID:17716377
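The general idea of linear encoding, turning backbone torsion angles into a text string that sequence tools can search, can be sketched as follows. This is a deliberately simplified stand-in: it partitions the Ramachandran map with a uniform 60-degree grid, whereas SARST organizes the map by nearest-neighbor clustering. The function name, the grid size, and the 36-letter alphabet are all assumptions made for illustration.

```python
import string

BIN_SIZE = 60                      # assumed grid spacing in degrees
BINS = 360 // BIN_SIZE             # 6 bins per axis, 36 cells in total
ALPHABET = string.ascii_uppercase + string.digits  # one symbol per cell

def encode_torsions(torsions):
    """Map a list of (phi, psi) angles in degrees to a text string.

    Each residue is assigned the symbol of the grid cell its (phi, psi)
    pair falls into; angles are wrapped into the range [-180, 180).
    """
    letters = []
    for phi, psi in torsions:
        row = int((phi + 180) % 360 // BIN_SIZE)
        col = int((psi + 180) % 360 // BIN_SIZE)
        letters.append(ALPHABET[row * BINS + col])
    return "".join(letters)
```

Once structures are reduced to strings in this way, residues with similar backbone conformations share a symbol, so ordinary sequence alignment can be applied to the encoded strings.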
Domain architecture conservation in orthologs

PubMed Central

2011-01-01

Background As orthologous proteins are expected to retain function more often than other homologs, they are often used for functional annotation transfer between species. However, ortholog identification methods do not take into account changes in domain architecture, which are likely to modify a protein's function. By domain architecture we refer to the sequential arrangement of domains along a protein sequence. To assess the level of domain architecture conservation among orthologs, we carried out a large-scale study of such events between human and 40 other species spanning the entire evolutionary range. We designed a score to measure domain architecture similarity and used it to analyze differences in domain architecture conservation between orthologs and paralogs relative to the conservation of primary sequence. We also statistically characterized the extents of different types of domain swapping events across pairs of orthologs and paralogs. Results The analysis shows that orthologs exhibit greater domain architecture conservation than paralogous homologs, even when differences in average sequence divergence are compensated for, for homologs that have diverged beyond a certain threshold. We interpret this as an indication of a stronger selective pressure on orthologs than paralogs to retain the domain architecture required for the proteins to perform a specific function. In general, orthologs as well as the closest paralogous homologs have very similar domain architectures, even at large evolutionary separation. The most common domain architecture changes observed in both ortholog and paralog pairs involved insertion/deletion of new domains, while domain shuffling and segment duplication/deletion were very infrequent. Conclusions On the whole, our results support the hypothesis that function conservation between orthologs demands higher domain architecture conservation than other types of homologs, relative to primary sequence conservation. 
This supports the notion that orthologs are functionally more similar than other types of homologs at the same evolutionary distance. PMID:21819573
Water dynamics in protein hydration shells: the molecular origins of the dynamical perturbation.

PubMed

Fogarty, Aoife C; Laage, Damien

2014-07-17

Protein hydration shell dynamics play an important role in biochemical processes including protein folding, enzyme function, and molecular recognition. We present here a comparison of the reorientation dynamics of individual water molecules within the hydration shell of a series of globular proteins: acetylcholinesterase, subtilisin Carlsberg, lysozyme, and ubiquitin. Molecular dynamics simulations and analytical models are used to access site-resolved information on hydration shell dynamics and to elucidate the molecular origins of the dynamical perturbation of hydration shell water relative to bulk water. We show that all four proteins have very similar hydration shell dynamics, despite their wide range of sizes and functions, and differing secondary structures. We demonstrate that this arises from the similar local surface topology and surface chemical composition of the four proteins, and that such local factors alone are sufficient to rationalize the hydration shell dynamics. We propose that these conclusions can be generalized to a wide range of globular proteins. We also show that protein conformational fluctuations induce a dynamical heterogeneity within the hydration layer. We finally address the effect of confinement on hydration shell dynamics via a site-resolved analysis and connect our results to experiments via the calculation of two-dimensional infrared spectra.
Water Dynamics in Protein Hydration Shells: The Molecular Origins of the Dynamical Perturbation

PubMed Central

2014-01-01

Protein hydration shell dynamics play an important role in biochemical processes including protein folding, enzyme function, and molecular recognition. We present here a comparison of the reorientation dynamics of individual water molecules within the hydration shell of a series of globular proteins: acetylcholinesterase, subtilisin Carlsberg, lysozyme, and ubiquitin. Molecular dynamics simulations and analytical models are used to access site-resolved information on hydration shell dynamics and to elucidate the molecular origins of the dynamical perturbation of hydration shell water relative to bulk water. We show that all four proteins have very similar hydration shell dynamics, despite their wide range of sizes and functions, and differing secondary structures. We demonstrate that this arises from the similar local surface topology and surface chemical composition of the four proteins, and that such local factors alone are sufficient to rationalize the hydration shell dynamics. We propose that these conclusions can be generalized to a wide range of globular proteins. We also show that protein conformational fluctuations induce a dynamical heterogeneity within the hydration layer. We finally address the effect of confinement on hydration shell dynamics via a site-resolved analysis and connect our results to experiments via the calculation of two-dimensional infrared spectra. PMID:24479585

PDB to AMPL Conversion

DOE Office of Scientific and Technical Information (OSTI.GOV)

Anna Johnston, SNL 9215

2002-09-01

PDB to AMPL Conversion was written to convert protein data base files to AMPL files. The protein data bases on the internet contain a wealth of information about the structure and makeup of proteins. Each file contains information derived by one or more experiments and contains information on how the experiment was performed, the amino acid building blocks of each chain, and often the three-dimensional structure of the protein extracted from the experiments. The way a protein folds determines much about its function. Thus, studying the three-dimensional structure of the protein is of great interest. Analysing the contact maps is one way to examine the structure. A contact map is a graph which has a linear backbone of amino acids for nodes (i.e., adjacent amino acids are always connected) and edges between non-adjacent nodes if they are close enough to be considered in contact. If the graphs are similar then the folds of the proteins and their functions should also be similar. This software extracts the contact maps from a protein data base file and puts them into AMPL data format. This format is designed for use in AMPL, a programming language for simplifying linear programming formulations.
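The contact-map construction described above can be sketched in a few lines. The sketch assumes one representative coordinate per residue (for example the C-alpha atom) and an 8 Å distance cutoff; the function name `contact_map`, the cutoff, and the input layout are illustrative assumptions and are not taken from the PDB to AMPL software.

```python
import math

def contact_map(coords, cutoff=8.0):
    """Return the edge set of a contact map as pairs of residue indices.

    coords: list of (x, y, z) tuples, one per residue, in chain order.
    cutoff: assumed contact distance in angstroms (hypothetical default).
    """
    n = len(coords)
    edges = set()
    # Backbone edges: adjacent residues are always connected.
    for i in range(n - 1):
        edges.add((i, i + 1))
    # Contact edges: non-adjacent residues within the cutoff distance.
    for i in range(n):
        for j in range(i + 2, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                edges.add((i, j))
    return edges
```

Two structures can then be compared through their edge sets, which is the graph-similarity view of fold similarity that the abstract describes.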
Unique nonstructural proteins of Pneumonia Virus of Mice (PVM) promote degradation of interferon (IFN) pathway components and IFN-stimulated gene proteins.

PubMed

Dhar, Jayeeta; Barik, Sailen

2016-12-01

Pneumonia Virus of Mice (PVM) is the only virus that shares the Pneumovirus genus of the Paramyxoviridae family with Respiratory Syncytial Virus (RSV). A deadly mouse pathogen, PVM has the potential to serve as a robust animal model of RSV infection, since human RSV does not fully replicate the human pathology in mice. Like RSV, PVM also encodes two nonstructural proteins that have been implicated to suppress the IFN pathway, but surprisingly, they exhibit no sequence similarity with their RSV equivalents. The molecular mechanism of PVM NS function, therefore, remains unknown. Here, we show that recombinant PVM NS proteins degrade the mouse counterparts of the IFN pathway components. Proteasomal degradation appears to be mediated by ubiquitination promoted by PVM NS proteins. Interestingly, NS proteins of PVM lowered the levels of several ISG (IFN-stimulated gene) proteins as well. These results provide a molecular foundation for the mechanisms by which PVM efficiently subverts the IFN response of the murine cell. They also reveal that in spite of their high sequence dissimilarity, the two pneumoviral NS proteins are functionally and mechanistically similar.
Claudins reign: The claudin/EMP/PMP22/γ channel protein family in C. elegans.

PubMed

Simske, Jeffrey S

2013-07-01

The claudin family of integral membrane proteins was identified as the major protein component of the tight junctions in all vertebrates. Since their identification, claudins and their associated pfam00822 superfamily of proteins have been implicated in a wide variety of cellular processes. Claudin homologs have been identified in invertebrates as well, including Drosophila and C. elegans. Recent studies demonstrate that the C. elegans claudins, clc-1 to clc-5, and similar proteins in the greater PMP22/EMP/claudin/voltage-gated calcium channel γ subunit family, including nsy-4 and vab-9, while highly divergent at a sequence level from each other and from the vertebrate claudins, in many cases play roles similar to those traditionally assigned to their vertebrate homologs. These include regulating cell adhesion and passage of small molecules through the paracellular space, channel activity, protein aggregation, sensitivity to pore-forming toxins, intercellular signaling, cell fate specification and dynamic changes in cell morphology. Study of claudin superfamily proteins in C. elegans should continue to provide clues as to how claudin family protein function has been adapted to perform diverse functions at specialized cell-cell contacts in metazoans.
Genomic Enzymology: Web Tools for Leveraging Protein Family Sequence-Function Space and Genome Context to Discover Novel Functions.

PubMed

Gerlt, John A

2017-08-22

The exponentially increasing number of protein and nucleic acid sequences provides opportunities to discover novel enzymes, metabolic pathways, and metabolites/natural products, thereby adding to our knowledge of biochemistry and biology. The challenge has evolved from generating sequence information to mining the databases to integrating and leveraging the available information, i.e., the availability of "genomic enzymology" web tools. Web tools that allow identification of biosynthetic gene clusters are widely used by the natural products/synthetic biology community, thereby facilitating the discovery of novel natural products and the enzymes responsible for their biosynthesis. However, many novel enzymes with interesting mechanisms participate in uncharacterized small-molecule metabolic pathways; their discovery and functional characterization also can be accomplished by leveraging information in protein and nucleic acid databases. This Perspective focuses on two genomic enzymology web tools that assist the discovery of novel metabolic pathways: (1) Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST) for generating sequence similarity networks to visualize and analyze sequence-function space in protein families and (2) Enzyme Function Initiative-Genome Neighborhood Tool (EFI-GNT) for generating genome neighborhood networks to visualize and analyze the genome context in microbial and fungal genomes. Both tools have been adapted to other applications to facilitate target selection for enzyme discovery and functional characterization. As the natural products community has demonstrated, the enzymology community needs to embrace the essential role of web tools that allow the protein and genome sequence databases to be leveraged for novel insights into enzymological problems.
Genomic Enzymology: Web Tools for Leveraging Protein Family Sequence–Function Space and Genome Context to Discover Novel Functions

PubMed Central

2017-01-01

The exponentially increasing number of protein and nucleic acid sequences provides opportunities to discover novel enzymes, metabolic pathways, and metabolites/natural products, thereby adding to our knowledge of biochemistry and biology. The challenge has evolved from generating sequence information to mining the databases to integrating and leveraging the available information, i.e., the availability of “genomic enzymology” web tools. Web tools that allow identification of biosynthetic gene clusters are widely used by the natural products/synthetic biology community, thereby facilitating the discovery of novel natural products and the enzymes responsible for their biosynthesis. However, many novel enzymes with interesting mechanisms participate in uncharacterized small-molecule metabolic pathways; their discovery and functional characterization also can be accomplished by leveraging information in protein and nucleic acid databases. This Perspective focuses on two genomic enzymology web tools that assist the discovery of novel metabolic pathways: (1) Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST) for generating sequence similarity networks to visualize and analyze sequence–function space in protein families and (2) Enzyme Function Initiative-Genome Neighborhood Tool (EFI-GNT) for generating genome neighborhood networks to visualize and analyze the genome context in microbial and fungal genomes. Both tools have been adapted to other applications to facilitate target selection for enzyme discovery and functional characterization. As the natural products community has demonstrated, the enzymology community needs to embrace the essential role of web tools that allow the protein and genome sequence databases to be leveraged for novel insights into enzymological problems. PMID:28826221
The devil is in the details: comparison between COP9 signalosome (CSN) and the LID of the 26S proteasome.

PubMed

Meister, Cindy; Gulko, Miriam Kolog; Köhler, Anna M; Braus, Gerhard H

2016-02-01

The COP9 signalosome (CSN) and the proteasomal LID are conserved macromolecular complexes composed of at least eight subunits with molecular weights of approximately 350 kDa. CSN and LID are part of the ubiquitin–proteasome pathway and cleave isopeptide linkages of lysine side chains on target proteins. CSN cleaves the isopeptide bond of ubiquitin-like protein Nedd8 from cullins, whereas the LID cleaves ubiquitin from target proteins sentenced for degradation. CSN and LID are structurally and functionally similar but the order of the assembly pathway seems to be different. The assembly differs in at least the last subunit joining the pre-assembled subcomplex. This review addresses the similarities and differences in structure, function and assembly of CSN and LID.
Structure modification and functionality of whey proteins: quantitative structure-activity relationship approach.

PubMed

Nakai, S; Li-Chan, E

1985-10-01

According to the original idea of quantitative structure-activity relationship, electric, hydrophobic, and structural parameters should be taken into consideration for elucidating functionality. Changes in these parameters are reflected in the property of protein solubility upon modification of whey proteins by heating. Although solubility is itself a functional property, it has been utilized to explain other functionalities of proteins. However, better correlations were obtained when hydrophobic parameters of the proteins were used in conjunction with solubility. Various treatments reported in the literature were applied to whey protein concentrate in an attempt to obtain whipping and gelling properties similar to those of egg white. Mapping simplex optimization was used to search for the best results. Improvement in whipping properties by pepsin hydrolysis may have been due to higher protein solubility, and good gelling properties resulting from polyphosphate treatment may have been due to an increase in exposable hydrophobicity. However, the results of angel food cake making were still unsatisfactory.
Construction of a multi-functional extracellular matrix protein that increases number of N1E-115 neuroblast cells having neurites.

PubMed

Nakamura, Makiko; Mie, Masayasu; Mihara, Hisakazu; Nakamura, Makoto; Kobatake, Eiry

2009-10-01

An artificially designed fusion protein, which was designed to have strong cell adhesive activity and an active functional unit that enhances neuronal differentiation of mouse N1E-115 neuroblast cells, was developed. In this study, a laminin-1-derived IKVAV sequence, which stimulates neurite outgrowth in conditions of serum deprivation, was engineered and incorporated into an elastin-derived structural unit. The designed fusion protein also had a cell-adhesive RGD sequence derived from fibronectin. The resultant fusion protein could adsorb efficiently onto hydrophobic culture surfaces and showed cell adhesion activity similar to laminin. N1E-115 cells grown on the fusion protein exhibited more cells with neurites than cells grown on laminin-1. These results indicated that the constructed protein could retain properties of incorporated functional peptides and could provide effective signal transport. The strategy of designing multi-functional fusion proteins has the possibility for supporting current tissue engineering techniques. (c) 2009 Wiley Periodicals, Inc.
Characterizing protein domain associations by Small-molecule ligand binding

PubMed Central

Li, Qingliang; Cheng, Tiejun; Wang, Yanli; Bryant, Stephen H.

2012-01-01

Background Protein domains are evolutionarily conserved building blocks for protein structure and function, which are conventionally identified based on protein sequence or structure similarity. Small molecule binding domains are of great importance for the recognition of small molecules in biological systems and drug development. Many small molecules, including drugs, have been increasingly identified to bind to multiple targets, leading to promiscuous interactions with protein domains. Thus, a large scale characterization of the protein domains and their associations with respect to small-molecule binding is of particular interest to system biology research, drug target identification, as well as drug repurposing. Methods We compiled a collection of 13,822 physical interactions of small molecules and protein domains derived from the Protein Data Bank (PDB) structures. Based on the chemical similarity of these small molecules, we characterized pairwise associations of the protein domains and further investigated their global associations from a network point of view. Results We found that protein domains, despite lack of similarity in sequence and structure, were comprehensively associated through binding the same or similar small-molecule ligands. Moreover, we identified modules in the domain network that consisted of closely related protein domains by sharing similar biochemical mechanisms, being involved in relevant biological pathways, or being regulated by the same cognate cofactors. Conclusions A novel protein domain relationship was identified in the context of small-molecule binding, which is complementary to those identified by traditional sequence-based or structure-based approaches. The protein domain network constructed in the present study provides a novel perspective for chemogenomic study and network pharmacology, as well as target identification for drug repurposing. PMID:23745168
Construction of ontology augmented networks for protein complex prediction.

PubMed

Zhang, Yijia; Lin, Hongfei; Yang, Zhihao; Wang, Jian

2013-01-01

Protein complexes are of great importance in understanding the principles of cellular organization and function. The increase in available protein-protein interaction data, gene ontology and other resources makes it possible to develop computational methods for protein complex prediction. Most existing methods focus mainly on the topological structure of protein-protein interaction networks, and largely ignore the gene ontology annotation information. In this article, we constructed ontology augmented networks with protein-protein interaction data and gene ontology, which effectively unified the topological structure of protein-protein interaction networks and the similarity of gene ontology annotations into unified distance measures. After constructing ontology augmented networks, a novel method (clustering based on ontology augmented networks) was proposed to predict protein complexes, which was capable of taking into account the topological structure of the protein-protein interaction network, as well as the similarity of gene ontology annotations. Our method was applied to two different yeast protein-protein interaction datasets and predicted many well-known complexes. The experimental results showed that (i) ontology augmented networks and the unified distance measure can effectively combine the structure closeness and gene ontology annotation similarity; (ii) our method is valuable in predicting protein complexes and has higher F1 and accuracy compared to other competing methods.
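One way to picture a distance that blends network topology with annotation similarity is sketched below. The abstract does not give the authors' formula, so this is only an illustration under stated assumptions: topological closeness is taken as the Jaccard overlap of closed neighborhoods, annotation similarity as the Jaccard overlap of term sets, and the mixing weight alpha is a hypothetical parameter. Protein names and term labels are placeholders.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def unified_distance(p, q, neighbors, go_terms, alpha=0.5):
    """Blend topological and annotation similarity into one distance in [0, 1].

    alpha weights the topological part; (1 - alpha) weights the annotation part.
    This weighting scheme is an assumption, not the published measure.
    """
    # Closed neighborhoods include the protein itself, so direct
    # interaction partners already share some overlap.
    topo = jaccard(neighbors[p] | {p}, neighbors[q] | {q})
    sem = jaccard(go_terms[p], go_terms[q])
    return 1.0 - (alpha * topo + (1.0 - alpha) * sem)


# Toy interaction network and placeholder annotation labels.
neighbors = {
    "P1": {"P2", "P3"},
    "P2": {"P1", "P3"},
    "P3": {"P1", "P2", "P4"},
    "P4": {"P3"},
}
go_terms = {
    "P1": {"term_a", "term_b"},
    "P2": {"term_a", "term_b"},
    "P3": {"term_b"},
    "P4": {"term_c"},
}

d_close = unified_distance("P1", "P2", neighbors, go_terms)
d_far = unified_distance("P1", "P4", neighbors, go_terms)
```

P1 and P2 share both neighborhood and annotations and end up at distance 0, whereas P1 and P4 overlap only through P3 and are far apart. A clustering step over such distances would then propose candidate complexes.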
Structure and function of homodomain-leucine zipper (HD-Zip) proteins.

PubMed

Elhiti, Mohamed; Stasolla, Claudio

2009-02-01

Homeodomain-leucine zipper (HD-Zip) proteins are transcription factors unique to plants and are encoded by more than 25 genes in Arabidopsis thaliana. Based on sequence analyses these proteins have been classified into four distinct groups: HD-Zip I-IV. HD-Zip proteins are characterized by the presence of two functional domains: a homeodomain (HD) responsible for DNA binding and a leucine zipper domain (Zip) located immediately C-terminal to the homeodomain and involved in protein-protein interaction. Despite sequence similarities, HD-Zip proteins participate in a variety of processes during plant growth and development. HD-Zip I proteins are generally involved in responses related to abiotic stress, abscisic acid (ABA), blue light, de-etiolation and embryogenesis. HD-Zip II proteins participate in light response, shade avoidance and auxin signalling. Members of the third group (HD-Zip III) control embryogenesis, leaf polarity, lateral organ initiation and meristem function. HD-Zip IV proteins play significant roles during anthocyanin accumulation, differentiation of epidermal cells, trichome formation and root development.
Computation-Guided Backbone Grafting of a Discontinuous Motif onto a Protein Scaffold

DOE Office of Scientific and Technical Information (OSTI.GOV)

Azoitei, Mihai L.; Correia, Bruno E.; Ban, Yih-En Andrew

2012-02-07

The manipulation of protein backbone structure to control interaction and function is a challenge for protein engineering. We integrated computational design with experimental selection for grafting the backbone and side chains of a two-segment HIV gp120 epitope, targeted by the cross-neutralizing antibody b12, onto an unrelated scaffold protein. The final scaffolds bound b12 with high specificity and with affinity similar to that of gp120, and crystallographic analysis of a scaffold bound to b12 revealed high structural mimicry of the gp120-b12 complex structure. The method can be generalized to design other functional proteins through backbone grafting.
Regulation of Macrophage Recognition through the Interplay of Nanoparticle Surface Functionality and Protein Corona.

PubMed

Saha, Krishnendu; Rahimi, Mehran; Yazdani, Mahdieh; Kim, Sung Tae; Moyano, Daniel F; Hou, Singyuk; Das, Ridhha; Mout, Rubul; Rezaee, Farhad; Mahmoudi, Morteza; Rotello, Vincent M

2016-04-26

Using a family of cationic gold nanoparticles (NPs) with similar size and charge, we demonstrate that proper surface engineering can control the nature and identity of the protein corona in physiological serum conditions. The protein coronas were highly dependent on the hydrophobicity and arrangement of chemical motifs on the NP surface. The NPs were taken up by macrophages in a corona-dependent manner, predominantly through recognition of specific complement proteins in the NP corona. Taken together, this study shows that surface functionality can be used to tune the protein corona formed on the NP surface, dictating the interaction of NPs with macrophages.
G-LoSA for Prediction of Protein-Ligand Binding Sites and Structures.

PubMed

Lee, Hui Sun; Im, Wonpil

2017-01-01

Recent advances in high-throughput structure determination and computational protein structure prediction have significantly enriched the universe of protein structure. However, there is still a large gap between the number of available protein structures and that of proteins with annotated function in high accuracy. Computational structure-based protein function prediction has emerged to reduce this knowledge gap. The identification of a ligand binding site and its structure is critical to the determination of a protein's molecular function. We present a computational methodology for predicting small molecule ligand binding site and ligand structure using G-LoSA, our protein local structure alignment and similarity measurement tool. All the computational procedures described here can be easily implemented using G-LoSA Toolkit, a package of standalone software programs and preprocessed PDB structure libraries. G-LoSA and G-LoSA Toolkit are freely available to academic users at http://compbio.lehigh.edu/GLoSA. We also illustrate a case study to show the potential of our template-based approach harnessing G-LoSA for protein function prediction.
Genome-wide analysis of the homeodomain-leucine zipper (HD-ZIP) gene family in peach (Prunus persica).

PubMed

Zhang, C H; Ma, R J; Shen, Z J; Sun, X; Korir, N K; Yu, M L

2014-04-08

In this study, 33 homeodomain-leucine zipper (HD-ZIP) genes were identified in peach using the HD-ZIP amino acid sequences of Arabidopsis thaliana as a probe. Based on the phylogenetic analysis and the individual gene or protein characteristics, the HD-ZIP gene family in peach can be classified into 4 subfamilies, HD-ZIP I, II, III, and IV, containing 14, 7, 4, and 8 members, respectively. The most closely related peach HD-ZIP members within the same subfamilies shared very similar gene structure in terms of either intron/exon numbers or lengths. Almost all members of the same subfamily shared common motif compositions, thereby implying that the HD-ZIP proteins within the same subfamily may have functional similarity. The 33 peach HD-ZIP genes were distributed across scaffolds 1 to 7. Although the primary structure varied among HD-ZIP family proteins, their tertiary structures were similar. The results from this study will be useful in selecting candidate genes from specific subfamilies for functional analysis.
Activity-Based Protein Profiling Reveals Mitochondrial Oxidative Enzyme Impairment and Restoration in Diet-Induced Obese Mice

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sadler, Natalie C.; Angel, Thomas E.; Lewis, Michael P.

High-fat diet (HFD) induced obesity and concomitant development of insulin resistance (IR) and type 2 diabetes mellitus have been linked to mitochondrial dysfunction. However, it is not clear whether mitochondrial dysfunction is a direct effect of a HFD or if the mitochondrial function is reduced with increased HFD duration. We hypothesized that the function of mitochondrial oxidative and lipid metabolism functions in skeletal muscle mitochondria for HFD mice are similar or elevated relative to standard diet (SD) mice, thereby IR is neither cause nor consequence of mitochondrial dysfunction. We applied a chemical probe approach to identify functionally reactive ATPases and nucleotide-binding proteins in mitochondria isolated from skeletal muscle of C57Bl/6J mice fed HFD or SD chow for 2-, 8-, or 16-weeks; feeding time points known to induce IR. A total of 293 probe-labeled proteins were identified by mass spectrometry-based proteomics, of which 54 differed in abundance between HFD and SD mice. We found proteins associated with the TCA cycle, oxidative phosphorylation (OXPHOS), and lipid metabolism were altered in function when comparing SD to HFD fed mice at 2-weeks, however by 16-weeks HFD mice had TCA cycle, β-oxidation, and respiratory chain function at levels similar to or higher than SD mice.
The dynamics of single protein molecules is non-equilibrium and self-similar over thirteen decades in time

NASA Astrophysics Data System (ADS)

Hu, Xiaohu; Hong, Liang; Dean Smith, Micholas; Neusius, Thomas; Cheng, Xiaolin; Smith, Jeremy C.

2016-02-01

Internal motions of proteins are essential to their function. The time dependence of protein structural fluctuations is highly complex, manifesting subdiffusive, non-exponential behaviour with effective relaxation times existing over many decades in time, from ps up to ~10² s. Here, using molecular dynamics simulations, we show that, on timescales from 10⁻¹² to 10⁻⁵ s, motions in single proteins are self-similar, non-equilibrium and exhibit ageing. The characteristic relaxation time for a distance fluctuation, such as inter-domain motion, is observation-time-dependent, increasing in a simple, power-law fashion, arising from the fractal nature of the topology and geometry of the energy landscape explored. Diffusion over the energy landscape follows a non-ergodic continuous time random walk. Comparison with single-molecule experiments suggests that the non-equilibrium self-similar dynamical behaviour persists up to timescales approaching the in vivo lifespan of individual protein molecules.
Evidence for the principle of minimal frustration in the evolution of protein folding landscapes.

PubMed

Tzul, Franco O; Vasilchuk, Daniel; Makhatadze, George I

2017-02-28

Theoretical and experimental studies have firmly established that protein folding can be described by a funneled energy landscape. This funneled energy landscape is the result of foldable protein sequences evolving following the principle of minimal frustration, which allows proteins to rapidly fold to their native biologically functional conformations. For a protein family with a given functional fold, the principle of minimal frustration suggests that, independent of sequence, all proteins within this family should fold with similar rates. However, depending on the optimal living temperature of the organism, proteins also need to modulate their thermodynamic stability. Consequently, the difference in thermodynamic stability should be primarily caused by differences in the unfolding rates. To test this hypothesis experimentally, we performed comprehensive thermodynamic and kinetic analyses of 15 different proteins from the thioredoxin family. Eight of these thioredoxins were extant proteins from psychrophilic, mesophilic, or thermophilic organisms. The other seven protein sequences were obtained using ancestral sequence reconstruction and can be dated back over 4 billion years. We found that all studied proteins fold with very similar rates but unfold with rates that differ up to three orders of magnitude. The unfolding rates correlate well with the thermodynamic stability of the proteins. Moreover, proteins that unfold slower are more resistant to proteolysis. These results provide direct experimental support to the principle of minimal frustration hypothesis.
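The link between unfolding rates and stability can be made concrete with a short calculation. The sketch below assumes a simple two-state folding model, in which the unfolding free energy is RT ln(k_f / k_u); the abstract does not state which kinetic model the authors fitted, and the rate constants used here are hypothetical round numbers chosen only to span three orders of magnitude in the unfolding rate.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def unfolding_free_energy(k_fold: float, k_unfold: float,
                          temperature_k: float = 298.15) -> float:
    """Unfolding free energy (kcal/mol) for a two-state folder: RT * ln(k_f / k_u)."""
    return R_KCAL * temperature_k * math.log(k_fold / k_unfold)


# Hypothetical rates (per second): identical folding rate,
# unfolding rates that differ by a factor of 1000.
dg_stable = unfolding_free_energy(k_fold=100.0, k_unfold=1e-5)
dg_labile = unfolding_free_energy(k_fold=100.0, k_unfold=1e-2)

# With equal folding rates the stability gap depends only on the
# ratio of unfolding rates: RT * ln(1000).
ddg = dg_stable - dg_labile
print(f"stability difference: {ddg:.2f} kcal/mol")
```

Under these assumptions, a thousand-fold difference in unfolding rate at a common folding rate corresponds to roughly 4 kcal/mol of stability at room temperature, which shows how unfolding kinetics alone could account for sizeable stability differences within a family.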
Automated prediction of protein function and detection of functional sites from structure.

PubMed

Pazos, Florencio; Sternberg, Michael J E

2004-10-12

Current structural genomics projects are yielding structures for proteins whose functions are unknown. Accordingly, there is a pressing requirement for computational methods for function prediction. Here we present PHUNCTIONER, an automatic method for structure-based function prediction using automatically extracted functional sites (residues associated to functions). The method relates proteins with the same function through structural alignments and extracts 3D profiles of conserved residues. Functional features to train the method are extracted from the Gene Ontology (GO) database. The method extracts these features from the entire GO hierarchy and hence is applicable across the whole range of function specificity. 3D profiles associated with 121 GO annotations were extracted. We tested the power of the method both for the prediction of function and for the extraction of functional sites. The success of function prediction by our method was compared with the standard homology-based method. In the zone of low sequence similarity (approximately 15%), our method assigns the correct GO annotation in 90% of the protein structures considered, approximately 20% higher than inheritance of function from the closest homologue.
Structure-Functional Basis of Ion Transport in Sodium–Calcium Exchanger (NCX) Proteins

PubMed Central

Giladi, Moshe; Shor, Reut; Lisnyansky, Michal; Khananshvili, Daniel

2016-01-01

The membrane-bound sodium–calcium exchanger (NCX) proteins shape Ca²⁺ homeostasis in many cell types, thus participating in a wide range of physiological and pathological processes. Determination of the crystal structure of an archaeal NCX (NCX_Mj) paved the way for a thorough and systematic investigation of ion transport mechanisms in NCX proteins. Here, we review the data gathered from the X-ray crystallography, molecular dynamics simulations, hydrogen–deuterium exchange mass-spectrometry (HDX-MS), and ion-flux analyses of mutants. Strikingly, the apo NCX_Mj protein exhibits characteristic patterns in the local backbone dynamics at particular helix segments, thereby possessing characteristic HDX profiles, suggesting structure-dynamic preorganization (geometric arrangements of catalytic residues before the transition state) of conserved α1 and α2 repeats at ion-coordinating residues involved in transport activities. Moreover, dynamic preorganization of local structural entities in the apo protein predefines the status of ion-occlusion and transition states, even though Na⁺ or Ca²⁺ binding modifies the preceding backbone dynamics near functionally important residues. Future challenges include resolving the structural-dynamic determinants governing the ion selectivity, functional asymmetry and ion-induced alternating access. Taking into account the structural similarities of NCX_Mj with the other proteins belonging to the Ca²⁺/cation exchanger superfamily, the recent findings can significantly improve our understanding of ion transport mechanisms in NCX and similar proteins. PMID:27879668

Structure-Functional Basis of Ion Transport in Sodium-Calcium Exchanger (NCX) Proteins.

PubMed

Giladi, Moshe; Shor, Reut; Lisnyansky, Michal; Khananshvili, Daniel

2016-11-22

The membrane-bound sodium-calcium exchanger (NCX) proteins shape Ca²⁺ homeostasis in many cell types, thus participating in a wide range of physiological and pathological processes. Determination of the crystal structure of an archaeal NCX (NCX_Mj) paved the way for a thorough and systematic investigation of ion transport mechanisms in NCX proteins. Here, we review the data gathered from the X-ray crystallography, molecular dynamics simulations, hydrogen-deuterium exchange mass-spectrometry (HDX-MS), and ion-flux analyses of mutants. Strikingly, the apo NCX_Mj protein exhibits characteristic patterns in the local backbone dynamics at particular helix segments, thereby possessing characteristic HDX profiles, suggesting structure-dynamic preorganization (geometric arrangements of catalytic residues before the transition state) of conserved α₁ and α₂ repeats at ion-coordinating residues involved in transport activities. Moreover, dynamic preorganization of local structural entities in the apo protein predefines the status of ion-occlusion and transition states, even though Na⁺ or Ca²⁺ binding modifies the preceding backbone dynamics near functionally important residues. Future challenges include resolving the structural-dynamic determinants governing the ion selectivity, functional asymmetry and ion-induced alternating access. Taking into account the structural similarities of NCX_Mj with the other proteins belonging to the Ca²⁺/cation exchanger superfamily, the recent findings can significantly improve our understanding of ion transport mechanisms in NCX and similar proteins.
Kangaroo IGF-II is structurally and functionally similar to the human [Ser29]-IGF-II variant.

PubMed

Yandell, C A; Francis, G L; Wheldrake, J F; Upton, Z

1999-06-01

Kangaroo IGF-II has been purified from western grey kangaroo (Macropus fuliginosus) serum and characterised in a number of in vitro assays. In addition, the complete cDNA sequence of mature IGF-II has been obtained by reverse-transcription polymerase chain reaction. Comparison of the kangaroo IGF-II cDNA sequence with known IGF-II sequences from other species revealed that it is very similar to the human variant, [Ser29]-hIGF-II. Both the variant and kangaroo IGF-II contain an insert of nine nucleotides that encode the amino acids Leu-Pro-Gly at the junction of the B and C domains of the mature protein. The deduced kangaroo IGF-II protein sequence also contains three other amino acid changes that are not observed in human IGF-II. These amino acid differences share similarities with the changes described in many of the IGF-IIs reported for non-mammalian species. Characterisation of human IGF-II, kangaroo IGF-II, chicken IGF-II and [Ser29]-hIGF-II in a number of in vitro assays revealed that all four proteins are functionally very similar. No significant differences were observed in the ability of the IGF-IIs to bind to the bovine IGF-II/cation-independent mannose 6-phosphate receptor or to stimulate protein synthesis in rat L6 myoblasts. However, differences were observed in their abilities to bind to IGF-binding proteins (IGFBPs) present in human serum. Kangaroo, chicken and [Ser29]-hIGF-II had lower apparent affinities for human IGFBPs than did human IGF-II. Thus, it appears that the major circulating form of IGF-II in the kangaroo and a minor form of IGF-II found in human serum are structurally and functionally very similar. This suggests that the splice site that generates both the variant and major form of human IGF-II must have evolved after the divergence of marsupials from placental mammals.
An assessment of catalytic residue 3D ensembles for the prediction of enzyme function.

PubMed

Žváček, Clemens; Friedrichs, Gerald; Heizinger, Leonhard; Merkl, Rainer

2015-11-04

The central element of each enzyme is the catalytic site, which commonly catalyzes a single biochemical reaction with high specificity. It was unclear to us how often sites that catalyze the same or highly similar reactions evolved on different, i.e., non-homologous protein folds and how similar their 3D poses are. Both similarities are key criteria for assessing the usability of pose comparison for function prediction. We have analyzed the SCOP database on the superfamily level in order to estimate the number of non-homologous enzymes possessing the same function according to their EC number. 89% of the 873 substrate-specific functions (four digit EC number) assigned to mono-functional, single-domain enzymes were only found in one superfamily. For a reaction-specific grouping (three digit EC number), this value dropped to 35%, indicating that in approximately 65% of all enzymes the same function evolved in two or more non-homologous proteins. For these isofunctional enzymes, structural similarity of the catalytic sites may help to predict function, because neither high sequence similarity nor identical folds are required for a comparison. To assess the specificity of catalytic 3D poses, we compiled the redundancy-free set ENZ_SITES, which comprises 695 sites, whose composition and function are well-defined. We compared their poses with the help of the program Superpose3D and determined classification performance. If the sites were from different superfamilies, the number of true and false positive predictions was similarly high, both for a coarse and a detailed grouping of enzyme function. Moreover, classification performance did not improve drastically, if we additionally used homologous sites to predict function. For a large number of enzymatic functions, dissimilar sites evolved that catalyze the same reaction and it is the individual substrate that determines the arrangement of the catalytic site and its local environment. These substrate-specific requirements turn the comparison of catalytic residues into a weak classifier for the prediction of enzyme function.
Effect of two lipid-lowering strategies on high-density lipoprotein function and some HDL-related proteins: a randomized clinical trial.

PubMed

Lee, Chan Joo; Choi, Seungbum; Cheon, Dong Huey; Kim, Kyeong Yeon; Cheon, Eun Jeong; Ann, Soo-Jin; Noh, Hye-Min; Park, Sungha; Kang, Seok-Min; Choi, Donghoon; Lee, Ji Eun; Lee, Sang-Hak

2017-02-28

The influence of lipid-lowering therapy on high-density lipoprotein (HDL) is incompletely understood. We compared the effect of two lipid-lowering strategies on HDL functions and identified some HDL-related proteins. Thirty-two patients were initially screened and HDLs of 21 patients were finally analyzed. Patients were randomized to receive atorvastatin 20 mg (n = 11) or atorvastatin 5 mg/ezetimibe 10 mg combination (n = 10) for 8 weeks. The cholesterol efflux capacity and other anti-inflammatory functions were assessed based on HDLs of the participants before and after treatment. Pre-specified HDL proteins of the same HDL samples were measured. The post-treatment increase in cholesterol efflux capacities was similar between the groups (35.6% and 34.6% for mono-therapy and combination, respectively, p = 0.60). Changes in nitric oxide (NO) production, vascular cell adhesion molecule-1 (VCAM-1) expression, and reactive oxygen species (ROS) production were similar between the groups. The baseline cholesterol efflux capacity correlated positively with apolipoprotein (apo)A1 and C3, whereas apoA1 and apoC1 showed inverse associations with VCAM-1 expression. The changes in the cholesterol efflux capacity were positively correlated with multiple HDL proteins, especially apoA2. Two regimens increased the cholesterol efflux capacity of HDL comparably. Multiple HDL proteins, not limited to apoA1, showed a correlation with HDL functions. These results indicate that conventional lipid therapy may have additional effects on HDL functions with changes in HDL proteins. ClinicalTrials.gov, number NCT02942602.
Cellular Retinoic Acid Binding Proteins: Genomic and Non-genomic Functions and their Regulation.

PubMed

Wei, Li-Na

Cellular retinoic acid binding proteins (CRABPs) are high-affinity retinoic acid (RA) binding proteins that mainly reside in the cytoplasm. In mammals, this family has two members, CRABPI and II, both highly conserved during evolution. The two proteins share a very similar structure that is characteristic of a "β-clam" motif built up from 10 strands. The proteins are encoded by two different genes that share a very similar genomic structure. CRABPI is widely distributed and CRABPII has restricted expression in only certain tissues. The CrabpI gene is driven by a housekeeping promoter, but can be regulated by numerous factors, including thyroid hormones and RA, which engage a specific chromatin-remodeling complex containing either TRAP220 or RIP140 as coactivator and corepressor, respectively. The chromatin-remodeling complex binds the DR4 element in the CrabpI gene promoter to activate or repress this gene in different cellular backgrounds. The CrabpII gene promoter contains a TATA-box and is rapidly activated by RA through an RA response element. Biochemical and cell culture studies carried out in vitro show the two proteins have distinct biological functions. CRABPII mainly functions to deliver RA to the nuclear RA receptors for gene regulation, although recent studies suggest that CRABPII may also be involved in other cellular events, such as RNA stability. In contrast, biochemical and cell culture studies suggest that CRABPI functions mainly in the cytoplasm to modulate intracellular RA availability/concentration and to engage other signaling components such as ERK activity. However, these functional studies remain inconclusive because knocking out one or both genes in mice does not produce definitive phenotypes. Further studies are needed to unambiguously decipher the exact physiological activities of these two proteins.
Reproducing Crystal Binding Modes of Ligand Functional Groups using Site-Identification by Ligand Competitive Saturation (SILCS) Simulations

PubMed Central

Raman, E. Prabhu; Yu, Wenbo; Guvench, Olgun; MacKerell, Alexander D.

2011-01-01

The applicability of a computational method, Site Identification by Ligand Competitive Saturation (SILCS), to identify regions on a protein surface with which different types of functional groups on low-molecular weight inhibitors interact is demonstrated. The method involves molecular dynamics (MD) simulations of a protein in an aqueous solution of chemically diverse small molecules from which probability distributions of fragments types, termed FragMaps, are obtained. In the present application, SILCS simulations are performed with an aqueous solution of 1 M benzene and propane to map the affinity pattern of the protein for aromatic and aliphatic functional groups. In addition, water hydrogen and oxygen atoms serve as probes for hydrogen bond donor and acceptor affinity, respectively. The method is tested using a set of 7 proteins for which crystal structures of complexes with several high affinity inhibitors are known. Good agreement is obtained between FragMaps and the positions of chemically similar functional groups in inhibitors as observed in the X-ray crystallographic structures. Quantitative capabilities of the SILCS approach are demonstrated by converting FragMaps to free energies, termed Grid Free Energies (GFE), and showing correlation between the GFE values and experimental binding affinities. For proteins for which ligand decoy sets are available, GFE values are shown to typically score the crystal conformation and conformations similar to it more favorable than decoys. Additionally, SILCS is tested for its ability to capture the subtle differences in ligand affinity across homologous proteins, information which may be of utility towards specificity-guided drug design. Taken together, our results show that SILCS can recapitulate the known location of functional groups of bound inhibitors for a number of proteins, suggesting that the method may be of utility for rational drug design. PMID:21456594
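The conversion of fragment probability maps into free energies can be pictured as a Boltzmann inversion of the occupancy in each grid voxel relative to bulk solution. The abstract does not spell out the formula, so the sketch below is an assumption about the general form rather than the SILCS implementation; the function name, the occupancy values and the default temperature are all hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def grid_free_energy(occupancy: float, bulk_occupancy: float,
                     temperature_k: float = 298.15) -> float:
    """Boltzmann-invert a voxel occupancy into a free energy (kcal/mol).

    Negative values mean the probe is enriched relative to bulk solution,
    positive values mean it is depleted. Unvisited voxels get +infinity.
    """
    if bulk_occupancy <= 0:
        raise ValueError("bulk occupancy must be positive")
    if occupancy <= 0:
        return math.inf
    return -R_KCAL * temperature_k * math.log(occupancy / bulk_occupancy)


# Hypothetical voxel sampled ten times more often than bulk.
gfe_enriched = grid_free_energy(occupancy=0.10, bulk_occupancy=0.01)
# Hypothetical voxel sampled ten times less often than bulk.
gfe_depleted = grid_free_energy(occupancy=0.001, bulk_occupancy=0.01)
```

Under this form, a tenfold enrichment corresponds to about -1.4 kcal/mol at room temperature. A ligand score could then be obtained by combining the values at the voxels its atoms occupy, although the exact scoring scheme is not given in the abstract.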
Functional significance of O-GlcNAc modification in regulating neuronal properties.

PubMed

Hwang, Hongik; Rhim, Hyewhon

2018-03-01

Post-translational modifications (PTMs) covalently modify proteins and diversify protein functions. Along with protein phosphorylation, another common PTM is the addition of O-linked β-N-acetylglucosamine (O-GlcNAc) to serine and/or threonine residues. O-GlcNAc modification is similar to phosphorylation in that it occurs to serine and threonine residues and cycles on and off with a similar time scale. However, a striking difference is that the addition and removal of the O-GlcNAc moiety on all substrates are mediated by the two enzymes regardless of proteins, O-GlcNAc transferase (OGT) and O-GlcNAcase (OGA), respectively. O-GlcNAcylation can interact or potentially compete with phosphorylation on serine and threonine residues, and thus serves as an important molecular mechanism to modulate protein functions and activation. However, it has been challenging to address the role of O-GlcNAc modification in regulating protein functions at the molecular level due to the lack of convenient tools to determine the sites and degrees of O-GlcNAcylation. Studies in this field have only begun to expand significantly thanks to the recent advances in detection and manipulation methods such as quantitative proteomics and highly selective small-molecule inhibitors for OGT and OGA. Interestingly, multiple brain regions, especially hippocampus, express high levels of both OGT and OGA, and a number of neuron-specific proteins have been reported to undergo O-GlcNAcylation. This review aims to discuss the recent updates concerning the impacts of O-GlcNAc modification on neuronal functions at multiple levels ranging from intrinsic neuronal properties to synaptic plasticity and animal behaviors. Copyright © 2017 Elsevier Ltd. All rights reserved.
Quantum coupled mutation finder: predicting functionally or structurally important sites in proteins using quantum Jensen-Shannon divergence and CUDA programming.

PubMed

Gültas, Mehmet; Düzgün, Güncel; Herzog, Sebastian; Jäger, Sven Joachim; Meckbach, Cornelia; Wingender, Edgar; Waack, Stephan

2014-04-03

The identification of functionally or structurally important non-conserved residue sites in protein MSAs is an important challenge for understanding the structural basis and molecular mechanism of protein functions. Despite the rich literature on compensatory mutations as well as sequence conservation analysis for the detection of those important residues, previous methods often rely on classical information-theoretic measures. However, these measures usually do not take into account dis/similarities of amino acids which are likely to be crucial for those residues. In this study, we present a new method, the Quantum Coupled Mutation Finder (QCMF) that incorporates significant dis/similar amino acid pair signals in the prediction of functionally or structurally important sites. The result of this study is twofold. First, using the essential sites of two human proteins, namely epidermal growth factor receptor (EGFR) and glucokinase (GCK), we tested the QCMF-method. The QCMF includes two metrics based on quantum Jensen-Shannon divergence to measure both sequence conservation and compensatory mutations. We found that the QCMF reaches an improved performance in identifying essential sites from MSAs of both proteins with a significantly higher Matthews correlation coefficient (MCC) value in comparison to previous methods. Second, using a data set of 153 proteins, we made a pairwise comparison between QCMF and three conventional methods. This comparison study strongly suggests that QCMF complements the conventional methods for the identification of correlated mutations in MSAs. QCMF utilizes the notion of entanglement, which is a major resource of quantum information, to model significant dissimilar and similar amino acid pair signals in the detection of functionally or structurally important sites. 
Our results suggest that on the one hand QCMF significantly outperforms the previous method, which mainly focuses on dissimilar amino acid signals, to detect essential sites in proteins. On the other hand, it is complementary to the existing methods for the identification of correlated mutations. The method of QCMF is computationally intensive. To ensure a feasible computation time of the QCMF's algorithm, we leveraged Compute Unified Device Architecture (CUDA). The QCMF server is freely accessible at http://qcmf.informatik.uni-goettingen.de/.
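The Matthews correlation coefficient (MCC) used above as the performance measure can be computed from a confusion matrix as sketched below. This only illustrates the standard MCC definition; it is not part of the QCMF software, and the counts in the example are invented.

```python
import math


def matthews_cc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance level)
    to +1 (perfect prediction). Returns 0.0 when any marginal is empty,
    a common convention for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Invented example: 18 truly essential sites among 100 alignment columns,
# of which a predictor recovers 8 while raising 2 false alarms.
score = matthews_cc(tp=8, tn=80, fp=2, fn=10)
print(f"MCC = {score:.3f}")
```

MCC is a reasonable choice for this kind of task because essential sites are typically a small minority of alignment columns, and the coefficient stays informative under such class imbalance where plain accuracy does not.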
Discrepancy between mRNA and protein abundance: Insight from information retrieval process in computers

PubMed Central

Wang, Degeng

2008-01-01

Discrepancy between the abundance of cognate protein and RNA molecules is frequently observed. A theoretical understanding of this discrepancy remains elusive, and it is frequently described as surprises and/or technical difficulties in the literature. Protein and RNA represent different steps of the multi-stepped cellular genetic information flow process, in which they are dynamically produced and degraded. This paper explores a comparison with a similar process in computers - multi-step information flow from storage level to the execution level. Functional similarities can be found in almost every facet of the retrieval process. Firstly, common architecture is shared, as the ribonome (RNA space) and the proteome (protein space) are functionally similar to the computer primary memory and the computer cache memory respectively. Secondly, the retrieval process functions, in both systems, to support the operation of dynamic networks – biochemical regulatory networks in cells and, in computers, the virtual networks (of CPU instructions) that the CPU travels through while executing computer programs. Moreover, many regulatory techniques are implemented in computers at each step of the information retrieval process, with a goal of optimizing system performance. Cellular counterparts can be easily identified for these regulatory techniques. In other words, this comparative study attempted to utilize theoretical insight from computer system design principles as catalysis to sketch an integrative view of the gene expression process, that is, how it functions to ensure efficient operation of the overall cellular regulatory network. In context of this bird’s-eye view, discrepancy between protein and RNA abundance became a logical observation one would expect. It was suggested that this discrepancy, when interpreted in the context of system operation, serves as a potential source of information to decipher regulatory logics underneath biochemical network operation. 
PMID:18757239
Reduced expression of the NMDA receptor-interacting protein SynGAP causes behavioral abnormalities that model symptoms of Schizophrenia.

PubMed

Guo, Xiaochuan; Hamilton, Peter J; Reish, Nicholas J; Sweatt, J David; Miller, Courtney A; Rumbaugh, Gavin

2009-06-01

Abnormal function of NMDA receptors is believed to be a contributing factor to the pathophysiology of schizophrenia. NMDAR subunits and postsynaptic-interacting proteins of these channels are abnormally expressed in some patients with this illness. In mice, reduced NMDAR expression leads to behaviors analogous to symptoms of schizophrenia, but animals with mutations in core postsynaptic density proteins that have a similar phenotype have yet to be reported. Here we show that reduced expression of the neuronal RasGAP and NMDAR-associated protein, SynGAP, results in abnormal behaviors strikingly similar to those reported in mice with reduced NMDAR function. SynGAP mutant mice exhibited nonhabituating and persistent hyperactivity that was ameliorated by the antipsychotic clozapine. An NMDAR antagonist, MK-801, induced hyperactivity in normal mice but SynGAP mutants were less responsive, suggesting that NMDAR hypofunction contributes to this behavioral abnormality. SynGAP mutants exhibited enhanced startle reactivity and impaired sensory-motor gating. These mice also displayed a complete lack of social memory and a propensity toward social isolation. Finally, SynGAP mutants had deficits in cued fear conditioning and working memory, indicating abnormal function of circuits that control emotion and choice. Our results demonstrate that SynGAP mutant mice have gross neurological deficits similar to other mouse models of schizophrenia. Because SynGAP interacts with NMDARs, and the signaling activity of this protein is regulated by these channels, our data indicate that SynGAP lies downstream of NMDARs and is a required intermediate for normal neural circuit function and behavior. Taken together, these data support the idea that schizophrenia may arise from abnormal signaling pathways that are mediated by NMDA receptors.
Discrepancy between mRNA and protein abundance: insight from information retrieval process in computers.

PubMed

Wang, Degeng

2008-12-01

Discrepancy between the abundance of cognate protein and RNA molecules is frequently observed. A theoretical understanding of this discrepancy remains elusive, and it is frequently described as surprises and/or technical difficulties in the literature. Protein and RNA represent different steps of the multi-stepped cellular genetic information flow process, in which they are dynamically produced and degraded. This paper explores a comparison with a similar process in computers: multi-step information flow from storage level to the execution level. Functional similarities can be found in almost every facet of the retrieval process. Firstly, common architecture is shared, as the ribonome (RNA space) and the proteome (protein space) are functionally similar to the computer primary memory and the computer cache memory, respectively. Secondly, the retrieval process functions, in both systems, to support the operation of dynamic networks: biochemical regulatory networks in cells and, in computers, the virtual networks (of CPU instructions) that the CPU travels through while executing computer programs. Moreover, many regulatory techniques are implemented in computers at each step of the information retrieval process, with a goal of optimizing system performance. Cellular counterparts can be easily identified for these regulatory techniques. In other words, this comparative study attempted to utilize theoretical insight from computer system design principles as catalysis to sketch an integrative view of the gene expression process, that is, how it functions to ensure efficient operation of the overall cellular regulatory network. In context of this bird's-eye view, discrepancy between protein and RNA abundance became a logical observation one would expect. It was suggested that this discrepancy, when interpreted in the context of system operation, serves as a potential source of information to decipher regulatory logics underneath biochemical network operation.
Potential functions of LEA proteins from the brine shrimp Artemia franciscana - anhydrobiosis meets bioinformatics.

PubMed

Janis, Brett; Uversky, Vladimir N; Menze, Michael A

2017-10-23

Late embryogenesis abundant (LEA) proteins are a large group of anhydrobiosis-associated intrinsically disordered proteins, which are commonly found in plants and some animals. The brine shrimp Artemia franciscana is the only known animal that expresses LEA proteins from three, and not only one, different groups in its anhydrobiotic life stage. The reason for the higher complexity in the A. franciscana LEA proteome (LEAome), compared with other anhydrobiotic animals, remains mostly unknown. To address this issue, we have employed a suite of bioinformatics tools to evaluate the disorder status of the Artemia LEAome and to analyze the roles of intrinsic disorder in functioning of brine shrimp LEA proteins. We show here that A. franciscana LEA proteins from different groups are more similar to each other than one originally expected, while functional differences among members of group three are possibly larger than commonly anticipated. Our data show that although these proteins are characterized by a large variety of forms and possible functions, as a general strategy, A. franciscana utilizes glassy matrix forming LEAs concurrently with proteins that more readily interact with binding partners. It is likely that the function(s) of both types, the matrix-forming and partner-binding LEA proteins, are regulated by changing water availability during desiccation.
ProBiS-database: precalculated binding site similarities and local pairwise alignments of PDB structures.

PubMed

Konc, Janez; Cesnik, Tomo; Konc, Joanna Trykowska; Penca, Matej; Janežič, Dušanka

2012-02-27

ProBiS-Database is a searchable repository of precalculated local structural alignments in proteins detected by the ProBiS algorithm in the Protein Data Bank. Identification of functionally important binding regions of the protein is facilitated by structural similarity scores mapped to the query protein structure. PDB structures that have been aligned with a query protein may be rapidly retrieved from the ProBiS-Database, which is thus able to generate hypotheses concerning the roles of uncharacterized proteins. Presented with an uncharacterized protein structure, ProBiS-Database can discern relationships between such a query protein and other better-known proteins in the PDB. Fast access and a user-friendly graphical interface promote easy exploration of this database of over 420 million local structural alignments. The ProBiS-Database is updated weekly and is freely available online at http://probis.cmm.ki.si/database.
DNA repair and recombination in higher plants: insights from comparative genomics of Arabidopsis and rice.

PubMed

Singh, Sanjay K; Roy, Sujit; Choudhury, Swarup Roy; Sengupta, Dibyendu N

2010-07-21

The DNA repair and recombination (DRR) proteins protect organisms against genetic damage, caused by environmental agents and other genotoxic agents, by removing DNA lesions or helping to tolerate them. We identified genes potentially involved in DRR mechanisms in Arabidopsis and rice using similarity searches and conserved domain analysis against proteins known to be involved in DRR in human, yeast and E. coli. As expected, many of the DRR genes are very similar to those found in other eukaryotes. Besides these eukaryote-specific genes, several prokaryote-specific genes were also found to be well conserved in plants. In Arabidopsis, several functionally important DRR gene duplications are present, which do not occur in rice. Among DRR proteins, we found that proteins belonging to the nucleotide excision repair pathway were relatively more conserved than proteins needed for the other DRR pathways. Sub-cellular localization studies of DRR genes suggest that these proteins mostly reside in the nucleus, while gene drain between the nucleus and cell organelles was also found in some cases. The similarities and dissimilarities between the DRR pathways of plants and other organisms are discussed. The observed differences broaden our knowledge about DRR in the plant world and raise the question of whether differentiated functions have evolved in some cases. These results, altogether, provide a useful framework for further experimental studies in these organisms.
Big domains are novel Ca²⁺-binding modules: evidences from big domains of Leptospira immunoglobulin-like (Lig) proteins.

PubMed

Raman, Rajeev; Rajanikanth, V; Palaniappan, Raghavan U M; Lin, Yi-Pin; He, Hongxuan; McDonough, Sean P; Sharma, Yogendra; Chang, Yung-Fu

2010-12-29

Many bacterial surface-exposed proteins mediate the host-pathogen interaction more effectively in the presence of Ca²⁺. Leptospiral immunoglobulin-like (Lig) proteins, LigA and LigB, are surface-exposed proteins containing bacterial immunoglobulin-like (Big) domains. The function of proteins that contain the Big fold is not known. Based on the possible similarities of immunoglobulin and βγ-crystallin folds, we here explore the important question of whether Ca²⁺ binds to Big domains, which would provide a novel functional role for the proteins containing the Big fold. We selected six individual Big domains for this study (three from the conserved part of LigA and LigB, denoted as Lig A3, Lig A4, and LigBCon5; two from the variable region of LigA, i.e., the 9th (Lig A9) and 10th (Lig A10) repeats; and one from the variable region of LigB, i.e., LigBCen2). We have also studied the conserved region covering the three and six repeats (LigBCon1-3 and LigCon). All these proteins bind the calcium-mimic dye Stains-all. All the selected four domains bind Ca²⁺ with dissociation constants of 2-4 µM. Lig A9 and Lig A10 domains fold well with moderate thermal stability, have β-sheet conformation and form homodimers. Fluorescence spectra of Big domains show a specific doublet (at 317 and 330 nm), probably due to Trp interaction with a Phe residue. Equilibrium unfolding of selected Big domains is similar and follows a two-state model, suggesting the similarity in their fold. We demonstrate that the Lig proteins are Ca²⁺-binding proteins, with Big domains harbouring the binding motif. We conclude that despite differences in sequence, a Big motif binds Ca²⁺. This work thus sets up a strong possibility for classifying the proteins containing Big domains as a novel family of Ca²⁺-binding proteins. Since the Big domain is a part of many proteins in the bacterial kingdom, we suggest a possible function for these proteins via Ca²⁺ binding.
Big Domains Are Novel Ca2+-Binding Modules: Evidences from Big Domains of Leptospira Immunoglobulin-Like (Lig) Proteins

PubMed Central

Palaniappan, Raghavan U. M.; Lin, Yi-Pin; He, Hongxuan; McDonough, Sean P.; Sharma, Yogendra; Chang, Yung-Fu

2010-01-01

Background: Many bacterial surface-exposed proteins mediate the host-pathogen interaction more effectively in the presence of Ca2+. Leptospiral immunoglobulin-like (Lig) proteins, LigA and LigB, are surface-exposed proteins containing bacterial immunoglobulin-like (Big) domains. The function of proteins that contain the Big fold is not known. Based on the possible similarities of immunoglobulin and βγ-crystallin folds, we here explore the important question of whether Ca2+ binds to Big domains, which would provide a novel functional role for the proteins containing the Big fold. Principal Findings: We selected six individual Big domains for this study (three from the conserved part of LigA and LigB, denoted as Lig A3, Lig A4, and LigBCon5; two from the variable region of LigA, i.e., the 9th (Lig A9) and 10th (Lig A10) repeats; and one from the variable region of LigB, i.e., LigBCen2). We have also studied the conserved region covering the three and six repeats (LigBCon1-3 and LigCon). All these proteins bind the calcium-mimic dye Stains-all. All the selected four domains bind Ca2+ with dissociation constants of 2–4 µM. Lig A9 and Lig A10 domains fold well with moderate thermal stability, have β-sheet conformation and form homodimers. Fluorescence spectra of Big domains show a specific doublet (at 317 and 330 nm), probably due to Trp interaction with a Phe residue. Equilibrium unfolding of selected Big domains is similar and follows a two-state model, suggesting the similarity in their fold. Conclusions: We demonstrate that the Lig proteins are Ca2+-binding proteins, with Big domains harbouring the binding motif. We conclude that despite differences in sequence, a Big motif binds Ca2+. This work thus sets up a strong possibility for classifying the proteins containing Big domains as a novel family of Ca2+-binding proteins. Since the Big domain is a part of many proteins in the bacterial kingdom, we suggest a possible function for these proteins via Ca2+ binding. PMID:21206924
Comparative interactomics: analysis of arabidopsis 14-3-3 complexes reveals highly conserved 14-3-3 interactions between humans and plants.

PubMed

Paul, Anna-Lisa; Liu, Li; McClung, Scott; Laughner, Beth; Chen, Sixue; Ferl, Robert J

2009-04-01

As a first step in the broad characterization of plant 14-3-3 multiprotein complexes in vivo, stringent and specific antibody affinity purification was used to capture 14-3-3s together with their interacting proteins from extracts of Arabidopsis cell suspension cultures. Approximately 120 proteins were identified as potential in vivo 14-3-3 interacting proteins by mass spectrometry of the recovered complexes. Comparison of the proteins in this data set with the 14-3-3 interacting proteins from a similar study in human embryonic kidney cell cultures revealed eight interacting proteins that likely represent reasonably abundant, fundamental 14-3-3 interaction complexes that are highly conserved across all eukaryotes. The Arabidopsis 14-3-3 interaction data set was also compared to a yeast in vivo 14-3-3 interaction data set. Four 14-3-3 interacting proteins are conserved in yeast, humans, and Arabidopsis. Comparisons of the data sets based on biochemical function revealed many additional similarities in the human and Arabidopsis data sets that represent conserved functional interactions, while also leaving many proteins uniquely identified in either Arabidopsis or human cells. In particular, the Arabidopsis interaction data set is enriched for proteins involved in metabolism.
Protein classification using probabilistic chain graphs and the Gene Ontology structure.

PubMed

Carroll, Steven; Pavlovic, Vladimir

2006-08-01

Probabilistic graphical models have been developed in the past for the task of protein classification. In many cases, classifications obtained from the Gene Ontology have been used to validate these models. In this work we directly incorporate the structure of the Gene Ontology into the graphical representation for protein classification. We present a method in which each protein is represented by a replicate of the Gene Ontology structure, effectively modeling each protein in its own 'annotation space'. Proteins are also connected to one another according to different measures of functional similarity, after which belief propagation is run to make predictions at all ontology terms. The proposed method was evaluated on a set of 4879 proteins from the Saccharomyces Genome Database whose interactions were also recorded in the GRID project. Results indicate that direct utilization of the Gene Ontology improves predictive ability, outperforming traditional models that do not take advantage of dependencies among functional terms. Average increase in accuracy (precision) of positive and negative term predictions of 27.8% (2.0%) over three different similarity measures and three subontologies was observed. C/C++/Perl implementation is available from authors upon request.
Basonuclin 2 has a function in the multiplication of embryonic craniofacial mesenchymal cells and is orthologous to disco proteins

PubMed Central

Vanhoutteghem, Amandine; Maciejewski-Duval, Anna; Bouche, Cyril; Delhomme, Brigitte; Hervé, Françoise; Daubigney, Fabrice; Soubigou, Guillaume; Araki, Masatake; Araki, Kimi; Yamamura, Ken-ichi; Djian, Philippe

2009-01-01

Basonuclin 2 is a recently discovered zinc finger protein of unknown function. Its paralog, basonuclin 1, is associated with the ability of keratinocytes to multiply. The basonuclin zinc fingers are closely related to those of the Drosophila proteins disco and discorelated, but the relation between disco proteins and basonuclins has remained elusive because the function of the disco proteins in larval head development seems to have no relation to that of basonuclin 1 and because the amino acid sequence of disco, apart from the zinc fingers, also has no similarity to that of the basonuclins. We have generated mice lacking basonuclin 2. These mice die within 24 h of birth with a cleft palate and abnormalities of craniofacial bones and tongue. In the embryonic head, expression of the basonuclin 2 gene is restricted to mesenchymal cells in the palate, at the periphery of the tongue, and in the mesenchymal sheaths that surround the brain and the osteocartilagineous structures. In late embryos, the rate of multiplication of these mesenchymal cells is greatly diminished. Therefore, basonuclin 2 is essential for the multiplication of craniofacial mesenchymal cells during embryogenesis. Non-Drosophila insect databases available since 2008 reveal that the basonuclins and the disco proteins share much more extensive sequence and gene structure similarity than noted when only Drosophila sequences were examined. We conclude that basonuclin 2 is both structurally and functionally the vertebrate ortholog of the disco proteins. We also note the possibility that some human craniofacial abnormalities are due to a lack of basonuclin 2. PMID:19706529
Conserved properties of Drosophila Insomniac link sleep regulation and synaptic function.

PubMed

Li, Qiuling; Kellner, David A; Hatch, Hayden A M; Yumita, Tomohiro; Sanchez, Sandrine; Machold, Robert P; Frank, C Andrew; Stavropoulos, Nicholas

2017-05-01

Sleep is an ancient animal behavior that is regulated similarly in species ranging from flies to humans. Various genes that regulate sleep have been identified in invertebrates, but whether the functions of these genes are conserved in mammals remains poorly explored. Drosophila insomniac (inc) mutants exhibit severely shortened and fragmented sleep. Inc protein physically associates with the Cullin-3 (Cul3) ubiquitin ligase, and neuronal depletion of Inc or Cul3 strongly curtails sleep, suggesting that Inc is a Cul3 adaptor that directs the ubiquitination of neuronal substrates that impact sleep. Three proteins similar to Inc exist in vertebrates-KCTD2, KCTD5, and KCTD17-but are uncharacterized within the nervous system and their functional conservation with Inc has not been addressed. Here we show that Inc and its mouse orthologs exhibit striking biochemical and functional interchangeability within Cul3 complexes. Remarkably, KCTD2 and KCTD5 restore sleep to inc mutants, indicating that they can substitute for Inc in vivo and engage its neuronal targets relevant to sleep. Inc and its orthologs localize similarly within fly and mammalian neurons and can traffic to synapses, suggesting that their substrates may include synaptic proteins. Consistent with such a mechanism, inc mutants exhibit defects in synaptic structure and physiology, indicating that Inc is essential for both sleep and synaptic function. Our findings reveal that molecular functions of Inc are conserved through ~600 million years of evolution and support the hypothesis that Inc and its orthologs participate in an evolutionarily conserved ubiquitination pathway that links synaptic function and sleep regulation.

Two novel Mesocestoides vogae fatty acid binding proteins--functional and evolutionary implications.

PubMed

Alvite, Gabriela; Canclini, Lucía; Corvo, Ileana; Esteves, Adriana

2008-01-01

This work describes two new fatty acid binding proteins (FABPs) identified in the parasite platyhelminth Mesocestoides vogae (syn. corti). The corresponding polypeptide chains share 62% identical residues and overall 90% similarity according to CLUSTALX default conditions. Compared with Cestoda FABPs, these proteins share the highest similarity score with the Taenia solium protein. M. vogae FABPs are also phylogenetically related to the FABP3/FABP4 mammalian FABP subfamilies. The native proteins were purified by chromatographical procedures, and apparent molecular mass and isoelectric point were determined. Immunolocalization studies determined the localization of the expression of these proteins in the larval form of the parasite. The genomic exon-intron organization of both genes is also reported, and supports new insights on intron evolution. Consensus motifs involved in splicing were identified.
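The identity and similarity percentages quoted above come from CLUSTALX under its default settings. As a minimal sketch of how such figures are derived from a pairwise alignment, the example below counts identical and similar positions over the gap-free columns. The residue groups, the choice of denominator, and the two toy sequences are assumptions for illustration; they are not the CLUSTALX definitions and not the M. vogae sequences.

```python
# Illustrative residue groups only (an assumption, not CLUSTALX's groups).
SIMILAR_GROUPS = ["AVLIM", "FYW", "KRH", "DE", "ST", "NQ"]
GROUP_OF = {aa: i for i, group in enumerate(SIMILAR_GROUPS) for aa in group}


def identity_and_similarity(aligned_a, aligned_b):
    """Percent identity and similarity over columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = similar = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # skip gapped columns (one of several conventions)
        compared += 1
        if x == y:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif x in GROUP_OF and GROUP_OF[x] == GROUP_OF.get(y):
            similar += 1
    if compared == 0:
        return 0.0, 0.0
    return 100.0 * identical / compared, 100.0 * similar / compared


# Hypothetical aligned fragments.
identity, similarity = identity_and_similarity("MKV-LDE", "MRVALEQ")
print(f"identity {identity:.1f}%, similarity {similarity:.1f}%")
```

Reported percentages depend on the denominator (alignment length, shorter sequence, or gap-free columns) and on the substitution groups used, so values from different tools are not directly comparable.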
Beta-propellers: associated functions and their role in human diseases.

PubMed

Pons, Tirso; Gómez, Raú; Chinea, Glay; Valencia, Alfonso

2003-03-01

The beta-propeller fold appears as a very fascinating architecture based on four-stranded antiparallel and twisted beta-sheets, radially arranged around a central tunnel. Similar to the alpha/beta-barrel (TIM-barrel) fold, the beta-propeller has a wide range of different functions, and is gaining substantial attention. Some proteins containing beta-propeller domains have been implicated in the pathogenesis of a variety of diseases such as cancer, Alzheimer, Huntington, arthritis, familial hypercholesterolemia, retinitis pigmentosa, osteogenesis, hypertension, and microbial and viral infections. This article reviews some aspects of 3D structure, amino acids sequence regularities, and biological functions of the proteins containing beta-propeller domains. Major emphasis has been laid on beta-propellers whose functions are associated to human diseases. Recent research efforts reported in the fields of protein engineering, drug design, and protein structure-function relationship studies, concerning the beta-propeller architecture, have also been discussed.
Zebrafish Meis functions to stabilize Pbx proteins and regulate hindbrain patterning.

PubMed

Waskiewicz, A J; Rikhof, H A; Hernandez, R E; Moens, C B

2001-11-01

Homeodomain-containing Hox proteins regulate segmental identity in Drosophila in concert with two partners known as Extradenticle (Exd) and Homothorax (Hth). These partners are themselves DNA-binding, homeodomain proteins, and probably function by revealing the intrinsic specificity of Hox proteins. Vertebrate orthologs of Exd and Hth, known as Pbx and Meis (named for a myeloid ecotropic leukemia virus integration site), respectively, are encoded by multigene families and are present in multimeric complexes together with vertebrate Hox proteins. Previous results have demonstrated that the zygotically encoded Pbx4/Lazarus (Lzr) protein is required for segmentation of the zebrafish hindbrain and proper expression and function of Hox genes. We demonstrate that Meis functions in the same pathway as Pbx in zebrafish hindbrain development, as expression of a dominant-negative mutant Meis results in phenotypes that are remarkably similar to that of lzr mutants. Surprisingly, expression of Meis protein partially rescues the lzr(-) phenotype. Lzr protein levels are increased in embryos overexpressing Meis and are reduced for lzr mutants that cannot bind to Meis. This implies a mechanism whereby Meis rescues lzr mutants by stabilizing maternally encoded Lzr. Our results define two functions of Meis during zebrafish hindbrain segmentation: that of a DNA-binding partner of Pbx proteins, and that of a post-transcriptional regulator of Pbx protein levels.
Multilineage potential and proteomic profiling of human dental stem cells derived from a single donor

DOE Office of Scientific and Technical Information (OSTI.GOV)

Patil, Rajreddy; Kumar, B. Mohana; Lee, Won-Jae

Dental tissues provide an alternative autologous source of mesenchymal stem cells (MSCs) for regenerative medicine. In this study, we isolated human dental MSCs of follicle, pulp and papilla tissue from a single donor tooth after impacted third molar extraction, thereby excluding individual differences. We then compared the morphology, proliferation rate, expression of MSC-specific and pluripotency markers, and in vitro differentiation ability into osteoblasts, adipocytes, chondrocytes and functional hepatocyte-like cells (HLCs). Finally, we analyzed the protein expression profiles of undifferentiated dental MSCs using 2DE coupled with MALDI-TOF-MS. Three types of dental MSCs largely shared similar morphology, proliferation potential, expression of surface markers and pluripotent transcription factors, and differentiation ability into osteoblasts, adipocytes, and chondrocytes. Upon hepatogenic induction, all MSCs were transdifferentiated into functional HLCs, and acquired hepatocyte functions by showing their ability for glycogen storage and urea production. Based on the proteome profiling results, we identified nineteen proteins either found commonly or differentially expressed among the three types of dental MSCs. In conclusion, three kinds of dental MSCs from a single donor tooth possessed largely similar cellular properties and multilineage potential. Further, these dental MSCs had similar proteomic profiles, suggesting their interchangeable applications for basic research and cell therapy. Highlights: • Isolated and characterized three types of human dental MSCs from a single donor. • MSCs of dental follicle, pulp and papilla had largely similar biological properties. • All MSCs were capable of transdifferentiating into functional hepatocyte-like cells. • 2DE proteomics with MALDI-TOF/MS identified 19 proteins in three types of MSCs. • Similar proteomic profiles suggest interchangeable applications of dental MSCs.
Identification of functional modules using network topology and high-throughput data.

PubMed

Ulitsky, Igor; Shamir, Ron

2007-01-26

With the advent of systems biology, biological knowledge is often represented today by networks. These include regulatory and metabolic networks, protein-protein interaction networks, and many others. At the same time, high-throughput genomics and proteomics techniques generate very large data sets, which require sophisticated computational analysis. Usually, separate and different analysis methodologies are applied to each of the two data types. An integrated investigation of network and high-throughput information together can improve the quality of the analysis by accounting simultaneously for topological network properties alongside intrinsic features of the high-throughput data. We describe a novel algorithmic framework for this challenge. We first transform the high-throughput data into similarity values, (e.g., by computing pairwise similarity of gene expression patterns from microarray data). Then, given a network of genes or proteins and similarity values between some of them, we seek connected sub-networks (or modules) that manifest high similarity. We develop algorithms for this problem and evaluate their performance on the osmotic shock response network in S. cerevisiae and on the human cell cycle network. We demonstrate that focused, biologically meaningful and relevant functional modules are obtained. In comparison with extant algorithms, our approach has higher sensitivity and higher specificity. We have demonstrated that our method can accurately identify functional modules. Hence, it carries the promise to be highly useful in analysis of high throughput data.
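The problem stated above, finding connected sub-networks whose members have high pairwise similarity, can be illustrated with a simple greedy seed-and-extend heuristic. This is a sketch under stated assumptions and not the authors' algorithm: the scoring rule (mean pairwise similarity), the threshold, the treatment of missing similarity values as 0, and the toy network are all hypothetical.

```python
from itertools import combinations


def mean_similarity(nodes, sim):
    """Mean pairwise similarity; pairs without a value count as 0 (assumption)."""
    pairs = list(combinations(sorted(nodes), 2))
    if not pairs:
        return 0.0
    return sum(sim.get(frozenset(p), 0.0) for p in pairs) / len(pairs)


def grow_module(seed, adjacency, sim, threshold=0.6):
    """Greedily add network neighbours while the module stays above threshold."""
    module = {seed}
    while True:
        # Only neighbours of current members are candidates, so the
        # module always remains connected in the network.
        frontier = {n for m in module for n in adjacency[m]} - module
        scored = [(mean_similarity(module | {c}, sim), c) for c in sorted(frontier)]
        scored = [item for item in scored if item[0] >= threshold]
        if not scored:
            return module
        module.add(max(scored)[1])


# Hypothetical toy network: a chain A-B-C-D-E.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

# Hypothetical similarity values, e.g. expression-profile correlations.
sim = {
    frozenset(("A", "B")): 0.9, frozenset(("A", "C")): 0.8,
    frozenset(("B", "C")): 0.85, frozenset(("C", "D")): 0.1,
    frozenset(("B", "D")): 0.2, frozenset(("A", "D")): 0.1,
    frozenset(("D", "E")): 0.9,
}

module_a = grow_module("A", adjacency, sim)
module_e = grow_module("E", adjacency, sim)
print(sorted(module_a), sorted(module_e))
```

Growing only through network neighbours enforces connectivity, while the similarity threshold enforces coherence in the high-throughput data. A greedy search like this can stop at a local optimum, which is one reason more careful algorithms are needed for real data.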
Arabidopsis leucine-rich repeat extensin (LRX) proteins modify cell wall composition and influence plant growth.

PubMed

Draeger, Christian; Ndinyanka Fabrice, Tohnyui; Gineau, Emilie; Mouille, Grégory; Kuhn, Benjamin M; Moller, Isabel; Abdou, Marie-Therese; Frey, Beat; Pauly, Markus; Bacic, Antony; Ringli, Christoph

2015-06-24

Leucine-rich repeat extensins (LRXs) are extracellular proteins consisting of an N-terminal leucine-rich repeat (LRR) domain and a C-terminal extensin domain containing the typical features of this class of structural hydroxyproline-rich glycoproteins (HRGPs). The LRR domain is likely to bind an interaction partner, whereas the extensin domain has an anchoring function to insolubilize the protein in the cell wall. Based on the analysis of the root hair-expressed LRX1 and LRX2 of Arabidopsis thaliana, LRX proteins are important for cell wall development. The importance of LRX proteins in non-root hair cells and the structural changes induced by mutations in LRX genes remain elusive. The LRX gene family of Arabidopsis consists of eleven members, of which LRX3, LRX4, and LRX5 are expressed in aerial organs, such as leaves and stem. The importance of these LRX genes for plant development and particularly cell wall formation was investigated. Synergistic effects of mutations with gradually more severe growth retardation phenotypes in double and triple mutants suggest a similar function of the three genes. Analysis of cell wall composition revealed a number of changes to cell wall polysaccharides in the mutants. LRX3, LRX4, and LRX5, and most likely LRX proteins in general, are important for cell wall development. Due to the complexity of changes in cell wall structures in the lrx mutants, the exact function of LRX proteins remains to be determined. The increasingly strong growth-defect phenotypes in double and triple mutants suggest that the LRX proteins have similar functions and that they are important for proper plant development.
Evolution and Conservation of Plant NLR Functions

PubMed Central

Jacob, Florence; Vernaldi, Saskia; Maekawa, Takaki

2013-01-01

In plants and animals, nucleotide-binding domain and leucine-rich repeats (NLR)-containing proteins play pivotal roles in innate immunity. Despite their similar biological functions and protein architecture, comparative genome-wide analyses of NLRs and genes encoding NLR-like proteins suggest that plant and animal NLRs have independently arisen in evolution. Furthermore, the demonstration of interfamily transfer of plant NLR functions from their original species to phylogenetically distant species implies evolutionary conservation of the underlying immune principle across plant taxonomy. In this review we discuss plant NLR evolution and summarize recent insights into plant NLR-signaling mechanisms, which might constitute evolutionarily conserved NLR-mediated immune mechanisms. PMID:24093022
VASP-E: Specificity Annotation with a Volumetric Analysis of Electrostatic Isopotentials

PubMed Central

Chen, Brian Y.

2014-01-01

Algorithms for comparing protein structure are frequently used for function annotation. By searching for subtle similarities among very different proteins, these algorithms can identify remote homologs with similar biological functions. In contrast, few comparison algorithms focus on specificity annotation, where the identification of subtle differences among very similar proteins can assist in finding small structural variations that create differences in binding specificity. Few specificity annotation methods consider electrostatic fields, which play a critical role in molecular recognition. To fill this gap, this paper describes VASP-E (Volumetric Analysis of Surface Properties with Electrostatics), a novel volumetric comparison tool based on the electrostatic comparison of protein-ligand and protein-protein binding sites. VASP-E exploits the central observation that three dimensional solids can be used to fully represent and compare both electrostatic isopotentials and molecular surfaces. With this integrated representation, VASP-E is able to dissect the electrostatic environments of protein-ligand and protein-protein binding interfaces, identifying individual amino acids that have an electrostatic influence on binding specificity. VASP-E was used to examine a nonredundant subset of the serine and cysteine proteases as well as the barnase-barstar and Rap1a-raf complexes. Based on amino acids established by various experimental studies to have an electrostatic influence on binding specificity, VASP-E identified electrostatically influential amino acids with 100% precision and 83.3% recall. We also show that VASP-E can accurately classify closely related ligand binding cavities into groups with different binding preferences. These results suggest that VASP-E should prove a useful tool for the characterization of specific binding and the engineering of binding preferences in proteins. PMID:25166865
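The precision and recall figures quoted above follow the usual set-based definitions, which can be sketched as below. The residue labels and counts are hypothetical, chosen only so that the arithmetic reproduces 100% and 83.3%; they are not the residues or counts from the VASP-E study.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) for a predicted set against a reference set."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical residue labels (assumption): six experimentally established
# residues, five of which are predicted, and no false positives.
reference = {"R1", "R2", "R3", "R4", "R5", "R6"}
predicted = {"R1", "R2", "R3", "R4", "R5"}

precision, recall = precision_recall(predicted, reference)
print(f"precision = {precision:.1%}, recall = {recall:.1%}")
```

Any pair of counts in the same ratio (for example 10 of 12) would give the same percentages, so the reported values alone do not fix the number of residues examined.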
In silico peptide-binding predictions of passerine MHC class I reveal similarities across distantly related species, suggesting convergence on the level of protein function.

PubMed

Follin, Elna; Karlsson, Maria; Lundegaard, Claus; Nielsen, Morten; Wallin, Stefan; Paulsson, Kajsa; Westerdahl, Helena

2013-04-01

The major histocompatibility complex (MHC) genes are the most polymorphic genes found in the vertebrate genome, and they encode proteins that play an essential role in the adaptive immune response. Many songbirds (passerines) have been shown to have a large number of transcribed MHC class I genes compared to most mammals. To elucidate the reason for this large number of genes, we compared 14 MHC class I alleles (α1-α3 domains), from great reed warbler, house sparrow and tree sparrow, via phylogenetic analysis, homology modelling and in silico peptide-binding predictions to investigate their functional and genetic relationships. We found more pronounced clustering of the MHC class I allomorphs (allele specific proteins) in regards to their function (peptide-binding specificities) compared to their genetic relationships (amino acid sequences), indicating that the high number of alleles is of functional significance. The MHC class I allomorphs from house sparrow and tree sparrow, species that diverged 10 million years ago (MYA), had overlapping peptide-binding specificities, and these similarities across species were also confirmed in phylogenetic analyses based on amino acid sequences. Notably, there were also overlapping peptide-binding specificities in the allomorphs from house sparrow and great reed warbler, although these species diverged 30 MYA. This overlap was not found in a tree based on amino acid sequences. Our interpretation is that convergent evolution on the level of the protein function, possibly driven by selection from shared pathogens, has resulted in allomorphs with similar peptide-binding repertoires, although trans-species evolution in combination with gene conversion cannot be ruled out.
Finding Protein and Nucleotide Similarities with FASTA

PubMed Central

Pearson, William R.

2016-01-01

The FASTA programs provide a comprehensive set of rapid similarity searching tools (fasta36, fastx36, tfastx36, fasty36, tfasty36), similar to those provided by the BLAST package, as well as programs for slower, optimal, local and global similarity searches (ssearch36, ggsearch36) and for searching with short peptides and oligonucleotides (fasts36, fastm36). The FASTA programs use an empirical strategy for estimating statistical significance that accommodates a range of similarity scoring matrices and gap penalties, improving alignment boundary accuracy and search sensitivity (Unit 3.5). The FASTA programs can produce “BLAST-like” alignment and tabular output, for ease of integration into existing analysis pipelines, and can search small, representative databases, and then report results for a larger set of sequences, using links from the smaller dataset. The FASTA programs work with a wide variety of database formats, including mySQL and postgreSQL databases (Unit 9.4). The programs also provide a strategy for integrating domain and active site annotations into alignments and highlighting the mutational state of functionally critical residues. These protocols describe how to use the FASTA programs to characterize protein and DNA sequences, using protein:protein, protein:DNA, and DNA:DNA comparisons. PMID:27010337
Finding Protein and Nucleotide Similarities with FASTA.

PubMed

Pearson, William R

2016-03-24

The FASTA programs provide a comprehensive set of rapid similarity searching tools (fasta36, fastx36, tfastx36, fasty36, tfasty36), similar to those provided by the BLAST package, as well as programs for slower, optimal, local, and global similarity searches (ssearch36, ggsearch36), and for searching with short peptides and oligonucleotides (fasts36, fastm36). The FASTA programs use an empirical strategy for estimating statistical significance that accommodates a range of similarity scoring matrices and gap penalties, improving alignment boundary accuracy and search sensitivity. The FASTA programs can produce "BLAST-like" alignment and tabular output, for ease of integration into existing analysis pipelines, and can search small, representative databases, and then report results for a larger set of sequences, using links from the smaller dataset. The FASTA programs work with a wide variety of database formats, including mySQL and postgreSQL databases. The programs also provide a strategy for integrating domain and active site annotations into alignments and highlighting the mutational state of functionally critical residues. These protocols describe how to use the FASTA programs to characterize protein and DNA sequences, using protein:protein, protein:DNA, and DNA:DNA comparisons. Copyright © 2016 John Wiley & Sons, Inc.
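Because the abstract mentions "BLAST-like" tabular output for pipeline integration, a minimal sketch of consuming such output may be useful. It assumes the conventional 12-column, tab-separated BLAST tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score); the example rows are hypothetical, and the option that makes a given FASTA program emit this layout should be taken from its own documentation.

```python
import csv
import io
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    percent_identity: float
    alignment_length: int
    evalue: float
    bit_score: float


def parse_tabular(text):
    """Parse 12-column tab-separated hits, skipping blank and '#' comment lines."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        hits.append(Hit(row[0], row[1], float(row[2]), int(row[3]),
                        float(row[10]), float(row[11])))
    return hits


def significant_hits(hits, max_evalue=1e-3):
    """Keep hits whose E-value is at or below the chosen cutoff."""
    return [hit for hit in hits if hit.evalue <= max_evalue]


# Hypothetical rows for illustration only, not real search output.
EXAMPLE = (
    "# query\tsubject\t...\n"
    "query1\tsubjA\t62.5\t120\t40\t2\t1\t118\t5\t124\t1e-30\t115.0\n"
    "query1\tsubjB\t28.0\t90\t60\t5\t10\t95\t3\t92\t0.5\t28.1\n"
)

for hit in significant_hits(parse_tabular(EXAMPLE)):
    print(hit.query, hit.subject, hit.evalue)
```

The E-value cutoff of 1e-3 is an arbitrary illustrative default, not a recommendation from the protocol.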
Evolution, functions, and mysteries of plant ARGONAUTE proteins.

PubMed

Zhang, Han; Xia, Rui; Meyers, Blake C; Walbot, Virginia

2015-10-01

ARGONAUTE (AGO) proteins bind small RNAs (sRNAs) to form RNA-induced silencing complexes for transcriptional and post-transcriptional gene silencing. Genomes of primitive plants encode only a few AGO proteins. The Arabidopsis thaliana genome encodes ten AGO proteins, designated AGO1 to AGO10. Most early studies focused on these ten proteins and their interacting sRNAs. AGOs in other flowering plant species have duplicated and diverged from this set, presumably corresponding to new, diverged or specific functions. Among these, the grass-specific AGO18 family has been discovered and implicated as playing important roles during plant reproduction and viral defense. This review covers our current knowledge about functions and features of AGO proteins in both eudicots and monocots and compares their similarities and differences. On the basis of these features, we propose a new nomenclature for some plant AGOs. Copyright © 2015 Elsevier Ltd. All rights reserved.
Peroxisome biogenesis, protein targeting mechanisms and PEX gene functions in plants.

PubMed

Cross, Laura L; Ebeed, Heba Talat; Baker, Alison

2016-05-01

Peroxisomes play diverse and important roles in plants. The functions of peroxisomes are dependent upon their steady state protein composition which in turn reflects the balance of formation and turnover of the organelle. Protein import and turnover of constituent peroxisomal proteins are controlled by the state of cell growth and environment. The evolutionary origin of the peroxisome and the role of the endoplasmic reticulum in peroxisome biogenesis are discussed, as informed by studies of the trafficking of peroxisome membrane proteins. The process of matrix protein import in plants and its similarities and differences with peroxisomes in other organisms is presented and discussed in the context of peroxin distribution across the green plants. Copyright © 2015 Elsevier B.V. All rights reserved.
Functional and rheological properties of proteins in frozen turkey breast meat with different ultimate pH.

PubMed

Chan, J T Y; Omana, D A; Betti, M

2011-05-01

Functional and rheological properties of proteins from frozen turkey breast meat with different ultimate pH at 24 h postmortem (pH(24)) have been studied. Sixteen breast fillets from Hybrid Tom turkeys were initially selected based on lightness (L*) values for each color group (pale, normal, and dark), with a total of 48 breast fillets. Further selection of 8 breast samples was made within each class of meat according to the pH(24). The average L* and pH values of the samples were within the following range: pale (L* > 52; pH ≤ 5.7), normal (46 < L* < 52; 5.9 < pH < 6.1), and dark (L* < 46; pH ≥ 6.3), referred to as low, normal, and high pH meat, respectively. Ultimate pH did not cause major changes in the emulsifying and foaming properties of the extracted sarcoplasmic and myofibrillar proteins. An SDS-PAGE profile of proteins from low and normal pH meat was similar, which revealed that the extent of protein denaturation was the same. Low pH meat had the lowest water-holding capacity compared with normal and high pH meat as shown by the increase in cooking loss, which can be explained by factors other than protein denaturation. Gel strength analysis and folding test revealed that gel-forming ability was better for high pH meat compared with low and normal pH meat. Dynamic viscoelastic behavior showed that myosin denaturation temperature was independent of pH(24). Normal and high pH meat had similar hardness, springiness, and chewiness values as revealed by texture profile analysis. The results from this study indicate that high pH meat had similar or better functional properties than normal pH meat. Therefore, high pH meat is suitable for further processed products, whereas low pH meat may need additional treatment or ingredient formulations to improve its functionality.
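The three meat classes above are defined by joint ranges of L* and pH(24), which can be written as a small rule-based classifier. This is a sketch of the stated ranges only; the function name and the choice to return None for fillets that fall between classes are assumptions, not part of the study protocol.

```python
def classify_fillet(lightness, ph24):
    """Assign a fillet to a class using the L* and pH(24) ranges quoted above.

    Returns None when the pair of values falls outside all three ranges.
    """
    if lightness > 52 and ph24 <= 5.7:
        return "pale (low pH)"
    if 46 < lightness < 52 and 5.9 < ph24 < 6.1:
        return "normal (normal pH)"
    if lightness < 46 and ph24 >= 6.3:
        return "dark (high pH)"
    return None


# Hypothetical measurements for illustration.
for l_star, ph in [(54.0, 5.6), (49.0, 6.0), (44.0, 6.4), (50.0, 5.8)]:
    print(l_star, ph, classify_fillet(l_star, ph))
```

The gaps between the ranges (for example pH between 5.7 and 5.9) mean that some fillets belong to no class, which matches the selection of only 8 samples per class from the initial 16.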
What's in the Gift? Towards a Molecular Dissection of Nuptial Feeding in a Cricket.

PubMed

Pauchet, Yannick; Wielsch, Natalie; Wilkinson, Paul A; Sakaluk, Scott K; Svatoš, Aleš; ffrench-Constant, Richard H; Hunt, John; Heckel, David G

2015-01-01

Nuptial gifts produced by males and transferred to females during copulation are common in insects. Yet, their precise composition and subsequent physiological effects on the female recipient remain unresolved. Male decorated crickets Gryllodes sigillatus transfer a spermatophore to the female during copulation that is composed of an edible gift, the spermatophylax, and the ampulla that contains the ejaculate. After transfer of the spermatophore, the female detaches the spermatophylax and starts to eat it while sperm from the ampulla are evacuated into the female reproductive tract. When the female has finished consuming the spermatophylax, she detaches the ampulla and terminates sperm transfer. Hence, one simple function of the spermatophylax is to ensure complete sperm transfer by distracting the female from prematurely removing the ampulla. However, the majority of orally active components of the spermatophylax itself and their subsequent effects on female behavior have not been identified. Here, we report the first analysis of the proteome of the G. sigillatus spermatophylax and the transcriptome of the male accessory glands that make these proteins. The accessory gland transcriptome was assembled into 17,691 transcripts whilst about 30 proteins were detected within the mature spermatophylax itself. Of these 30 proteins, 18 were encoded by accessory gland encoded messages. Most spermatophylax proteins show no similarity to proteins with known biological functions and are therefore largely novel. A spermatophylax protein shows similarity to protease inhibitors suggesting that it may protect the biologically active components from digestion within the gut of the female recipient. Another protein shares similarity with previously characterized insect polypeptide growth factors suggesting that it may play a role in altering female reproductive physiology concurrent with fertilization. 
Characterization of the spermatophylax proteome provides the first step in identifying the genes encoding these proteins in males and in understanding their biological functions in the female recipient.
What’s in the Gift? Towards a Molecular Dissection of Nuptial Feeding in a Cricket

PubMed Central

Pauchet, Yannick; Wielsch, Natalie; Wilkinson, Paul A.; Sakaluk, Scott K.; Svatoš, Aleš

2015-01-01

Nuptial gifts produced by males and transferred to females during copulation are common in insects. Yet, their precise composition and subsequent physiological effects on the female recipient remain unresolved. Male decorated crickets Gryllodes sigillatus transfer a spermatophore to the female during copulation that is composed of an edible gift, the spermatophylax, and the ampulla that contains the ejaculate. After transfer of the spermatophore, the female detaches the spermatophylax and starts to eat it while sperm from the ampulla are evacuated into the female reproductive tract. When the female has finished consuming the spermatophylax, she detaches the ampulla and terminates sperm transfer. Hence, one simple function of the spermatophylax is to ensure complete sperm transfer by distracting the female from prematurely removing the ampulla. However, the majority of orally active components of the spermatophylax itself and their subsequent effects on female behavior have not been identified. Here, we report the first analysis of the proteome of the G. sigillatus spermatophylax and the transcriptome of the male accessory glands that make these proteins. The accessory gland transcriptome was assembled into 17,691 transcripts whilst about 30 proteins were detected within the mature spermatophylax itself. Of these 30 proteins, 18 were encoded by accessory gland encoded messages. Most spermatophylax proteins show no similarity to proteins with known biological functions and are therefore largely novel. A spermatophylax protein shows similarity to protease inhibitors suggesting that it may protect the biologically active components from digestion within the gut of the female recipient. Another protein shares similarity with previously characterized insect polypeptide growth factors suggesting that it may play a role in altering female reproductive physiology concurrent with fertilization. 
Characterization of the spermatophylax proteome provides the first step in identifying the genes encoding these proteins in males and in understanding their biological functions in the female recipient. PMID:26439494
Mechanisms of EHD/RME-1 Protein Function in Endocytic Transport

PubMed Central

Grant, Barth D.; Caplan, Steve

2009-01-01

The evolutionarily conserved Eps15 homology domain (EHD)/receptor-mediated endocytosis (RME)-1 family of C-terminal EH domain proteins has recently come under intense scrutiny because of its importance in intracellular membrane transport, especially with regard to the recycling of receptors from endosomes to the plasma membrane. Recent studies have shed new light on the mode by which these adenosine triphosphatases function on endosomal membranes in mammals and Caenorhabditis elegans. This review highlights our current understanding of the physiological roles of these proteins in vivo, discussing conserved features as well as emerging functional differences between individual mammalian paralogs. In addition, these findings are discussed in light of the identification of novel EHD/RME-1 protein and lipid interactions and new structural data for proteins in this family, indicating intriguing similarities to the Dynamin superfamily of large guanosine triphosphatases. PMID:18801062
Molecular switch-like regulation in motor proteins.

PubMed

Tafoya, Sara; Bustamante, Carlos

2018-06-19

Motor proteins are powered by nucleotide hydrolysis and exert mechanical work to carry out many fundamental biological tasks. To ensure their correct and efficient performance, the motors' activities are allosterically regulated by additional factors that enhance or suppress their NTPase activity. Here, we review two highly conserved mechanisms of ATP hydrolysis activation and repression operating in motor proteins (the glutamate switch and the arginine finger) and their associated regulatory factors. We examine the implications of these regulatory mechanisms in proteins that are formed by multiple ATPase subunits. We argue that the regulatory mechanisms employed by motor proteins display features similar to those described in small GTPases, which require external regulatory elements, such as dissociation inhibitors, exchange factors and activating proteins, to switch the protein's function 'on' and 'off'. Likewise, similar regulatory roles are taken on by the motor's substrate, additional binding factors, and even adjacent subunits in multimeric complexes. However, in motor proteins, more than one regulatory factor and the two mechanisms described here often underlie the machine's operation. Furthermore, ATPase regulation takes place throughout the motor's cycle, which enables a more complex function than the binary 'active' and 'inactive' states. This article is part of a discussion meeting issue 'Allostery and molecular machines'. © 2018 The Author(s).
Genome analysis of Excretory/Secretory proteins in Taenia solium reveals their Abundance of Antigenic Regions (AAR).

PubMed

Gomez, Sandra; Adalid-Peralta, Laura; Palafox-Fonseca, Hector; Cantu-Robles, Vito Adrian; Soberón, Xavier; Sciutto, Edda; Fragoso, Gladis; Bobes, Raúl J; Laclette, Juan P; Yauner, Luis del Pozo; Ochoa-Leyva, Adrián

2015-05-19

Excretory/Secretory (ES) proteins play an important role in the host-parasite interactions. Experimental identification of ES proteins is time-consuming and expensive. Alternative bioinformatics approaches are cost-effective and can be used to prioritize the experimental analysis of therapeutic targets for parasitic diseases. Here we predicted and functionally annotated the ES proteins in T. solium genome using an integration of bioinformatics tools. Additionally, we developed a novel measurement to evaluate the potential antigenicity of T. solium secretome using sequence length and number of antigenic regions of ES proteins. This measurement was formalized as the Abundance of Antigenic Regions (AAR) value. AAR value for secretome showed a similar value to that obtained for a set of experimentally determined antigenic proteins and was different to the calculated value for the non-ES proteins of T. solium genome. Furthermore, we calculated the AAR values for known helminth secretomes and they were similar to that obtained for T. solium. The results reveal the utility of AAR value as a novel genomic measurement to evaluate the potential antigenicity of secretomes. This comprehensive analysis of T. solium secretome provides functional information for future experimental studies, including the identification of novel ES proteins of therapeutic, diagnosis and immunological interest.
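The abstract states only that the AAR value is built from sequence length and the number of antigenic regions. The sketch below assumes the simplest such combination, a ratio of length to region count averaged over a protein set; this formula, the function names, and the example numbers are all assumptions and should be checked against the paper's methods before use.

```python
def aar(sequence_length, antigenic_regions):
    """Assumed AAR for one protein: residues per predicted antigenic region."""
    if antigenic_regions <= 0:
        raise ValueError("AAR is undefined without at least one antigenic region")
    return sequence_length / antigenic_regions


def mean_aar(proteins):
    """Average the assumed per-protein AAR over (length, region count) pairs."""
    values = [aar(length, regions) for length, regions in proteins]
    return sum(values) / len(values)


# Hypothetical (length, antigenic region count) pairs, for illustration only.
secretome = [(300, 12), (150, 6), (450, 15)]
non_secreted = [(300, 6), (500, 10)]

print(mean_aar(secretome), mean_aar(non_secreted))
```

Under this assumed definition a lower value means antigenic regions are more densely packed along the sequence, which is one way two protein sets could be compared as the abstract describes.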
Genome analysis of Excretory/Secretory proteins in Taenia solium reveals their Abundance of Antigenic Regions (AAR)

PubMed Central

Gomez, Sandra; Adalid-Peralta, Laura; Palafox-Fonseca, Hector; Cantu-Robles, Vito Adrian; Soberón, Xavier; Sciutto, Edda; Fragoso, Gladis; Bobes, Raúl J.; Laclette, Juan P.; Yauner, Luis del Pozo; Ochoa-Leyva, Adrián

2015-01-01

Excretory/Secretory (ES) proteins play an important role in the host-parasite interactions. Experimental identification of ES proteins is time-consuming and expensive. Alternative bioinformatics approaches are cost-effective and can be used to prioritize the experimental analysis of therapeutic targets for parasitic diseases. Here we predicted and functionally annotated the ES proteins in T. solium genome using an integration of bioinformatics tools. Additionally, we developed a novel measurement to evaluate the potential antigenicity of T. solium secretome using sequence length and number of antigenic regions of ES proteins. This measurement was formalized as the Abundance of Antigenic Regions (AAR) value. AAR value for secretome showed a similar value to that obtained for a set of experimentally determined antigenic proteins and was different to the calculated value for the non-ES proteins of T. solium genome. Furthermore, we calculated the AAR values for known helminth secretomes and they were similar to that obtained for T. solium. The results reveal the utility of AAR value as a novel genomic measurement to evaluate the potential antigenicity of secretomes. This comprehensive analysis of T. solium secretome provides functional information for future experimental studies, including the identification of novel ES proteins of therapeutic, diagnosis and immunological interest. PMID:25989346
Functional properties of protein from frozen mantle and fin of jumbo squid Dosidicus gigas in function of pH and ionic strength.

PubMed

Rocha-Estrada, J G; Córdova-Murueta, J H; García-Carreño, F L

2010-10-01

Functional properties of protein from mantle and fin of the jumbo squid Dosidicus gigas were explained based on microscopic muscle fiber and protein fractions profiles as observed in SDS-PAGE. Fin has higher content of connective tissue and complex fiber arrangement, and we observed higher hardness of fin gels as expected. Myosin heavy chain (MHC) was found in sarcoplasmic, myofibril and soluble-in-alkali fractions of mantle and only in sarcoplasmic and soluble-in-alkali fractions of fin. An additive effect of salt concentration and pH affected the solubility and foaming properties. Fin and mantle proteins yielded similar results in solubility tests, but significant differences occurred for specific pH and concentrations of salt. Foaming capacity was proportional to solubility; foam stability was also affected by pH and salt concentration. Hardness and fracture strength of fin gels were significantly higher than mantle gels; gels from proteins of both tissues reached the highest level in the folding test. Structural and molecular properties, such as MHC and paramyosin solubility, arrangement of muscle fibers and the content of connective tissue were useful to explain the differences observed in these protein properties. High-strength gels can be formed from squid mantle or fin muscle. Fin displayed similar or better properties than mantle in all tests.
Cube - an online tool for comparison and contrasting of protein sequences.

PubMed

Zhang, Zong Hong; Khoo, Aik Aun; Mihalek, Ivana

2013-01-01

When comparing sequences of similar proteins, two kinds of questions can be asked, and the related two kinds of inference made. First, one may ask to what degree they are similar, and then, how they differ. In the first case one may tentatively conclude that the conserved elements common to all sequences are of central and common importance to the protein's function. In the latter case the regions of specialization may be discriminative of the function or binding partners across subfamilies of related proteins. Experimental efforts - mutagenesis or pharmacological intervention - can then be pointed in either direction, depending on the context of the study. Cube simplifies this process for users that already have their favorite sets of sequences, and helps them collate the information by visualization of the conservation and specialization scores on the sequence and on the structure, and by spreadsheet tabulation. All information can be visualized on the spot, or downloaded for reference and later inspection. http://eopsf.org/cube.
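The idea of scoring conservation per alignment position can be illustrated with a generic measure. The sketch below uses Shannon entropy of each column of a pre-aligned set of sequences; it is not Cube's scoring scheme, which the abstract does not specify, and the toy alignment is hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column; 0.0 means fully conserved."""
    total = len(column)
    counts = Counter(column)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def conservation_scores(alignment):
    """Return one entropy value per column of equal-length aligned sequences."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy(column) for column in zip(*alignment)]


# Hypothetical three-column alignment of four sequences.
alignment = ["MKV", "MRV", "MKV", "MQV"]
print(conservation_scores(alignment))
```

Low-entropy columns correspond to the "conserved elements common to all sequences"; detecting specialization would additionally require comparing columns between predefined subfamilies, which this sketch does not attempt.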
Interactome INSIDER: a structural interactome browser for genomic studies.

PubMed

Meyer, Michael J; Beltrán, Juan Felipe; Liang, Siqi; Fragoza, Robert; Rumack, Aaron; Liang, Jin; Wei, Xiaomu; Yu, Haiyuan

2018-01-01

We present Interactome INSIDER, a tool to link genomic variant information with structural protein-protein interactomes. Underlying this tool is the application of machine learning to predict protein interaction interfaces for 185,957 protein interactions with previously unresolved interfaces in human and seven model organisms, including the entire experimentally determined human binary interactome. Predicted interfaces exhibit functional properties similar to those of known interfaces, including enrichment for disease mutations and recurrent cancer mutations. Through 2,164 de novo mutagenesis experiments, we show that mutations of predicted and known interface residues disrupt interactions at a similar rate and much more frequently than mutations outside of predicted interfaces. To spur functional genomic studies, Interactome INSIDER (http://interactomeinsider.yulab.org) enables users to identify whether variants or disease mutations are enriched in known and predicted interaction interfaces at various resolutions. Users may explore known population variants, disease mutations, and somatic cancer mutations, or they may upload their own set of mutations for this purpose.
Characterization of a Smad motif similar to Drosophila mad in the mouse Msx 1 promoter.

PubMed

Alvarez Martinez, Cristina E; Binato, Renata; Gonzalez, Sayonara; Pereira, Monica; Robert, Benoit; Abdelhay, Eliana

2002-03-01

The mouse Msx 1 gene, an ortholog of the Drosophila msh gene, is involved in several developmental processes. BMP family members are major proteins in the regulation of Msx 1 expression. BMP signaling activates Smad 1/5/8 proteins, which associate with Smad 4 before translocating to the nucleus. Analysis of the Msx 1 promoter revealed the presence of three elements similar to the consensus established for Mad, the Drosophila counterpart of Smad 1. Notably, such an element was identified in an enhancer important for Msx 1 regulation. Gel shift analysis demonstrated that proteins from 13.5 dpc embryos associate with this enhancer. Remarkably, supershift assays showed that Smad proteins are present in the complex. Purified Smad 1 and 4 also bind to this fragment. We demonstrate that functional binding sites in this enhancer are confined to the Mad motif and flanking region. Our data suggest that this Mad motif may be functional in response to BMP signaling. ©2002 Elsevier Science (USA).
Functional classification of protein structures by local structure matching in graph representation.

PubMed

Mills, Caitlyn L; Garg, Rohan; Lee, Joslynn S; Tian, Liang; Suciu, Alexandru; Cooperman, Gene; Beuning, Penny J; Ondrechen, Mary Jo

2018-03-31

As a result of high-throughput protein structure initiatives, over 14,400 protein structures have been solved by structural genomics (SG) centers and participating research groups. While the totality of SG data represents a tremendous contribution to genomics and structural biology, reliable functional information for these proteins is generally lacking. Better functional predictions for SG proteins will add substantial value to the structural information already obtained. Our method described herein, Graph Representation of Active Sites for Prediction of Function (GRASP-Func), predicts quickly and accurately the biochemical function of proteins by representing residues at the predicted local active site as graphs rather than in Cartesian coordinates. We compare the GRASP-Func method to our previously reported method, structurally aligned local sites of activity (SALSA), using the ribulose phosphate binding barrel (RPBB), 6-hairpin glycosidase (6-HG), and Concanavalin A-like Lectins/Glucanase (CAL/G) superfamilies as test cases. In each of the superfamilies, SALSA and the much faster method GRASP-Func yield similar correct classification of previously characterized proteins, providing a validated benchmark for the new method. In addition, we analyzed SG proteins using our SALSA and GRASP-Func methods to predict function. Forty-one SG proteins in the RPBB superfamily, nine SG proteins in the 6-HG superfamily, and one SG protein in the CAL/G superfamily were successfully classified into one of the functional families in their respective superfamily by both methods. This improved, faster, validated computational method can yield more reliable predictions of function that can be used for a wide variety of applications by the community. © 2018 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.
The Impact of Protein Structure and Sequence Similarity on the Accuracy of Machine-Learning Scoring Functions for Binding Affinity Prediction

PubMed Central

Peng, Jiangjun; Leung, Yee; Leung, Kwong-Sak; Wong, Man-Hon; Lu, Gang; Ballester, Pedro J.

2018-01-01

It has recently been claimed that the outstanding performance of machine-learning scoring functions (SFs) is exclusively due to the presence of training complexes with highly similar proteins to those in the test set. Here, we revisit this question using 24 similarity-based training sets, a widely used test set, and four SFs. Three of these SFs employ machine learning instead of the classical linear regression approach of the fourth SF (X-Score which has the best test set performance out of 16 classical SFs). We have found that random forest (RF)-based RF-Score-v3 outperforms X-Score even when 68% of the most similar proteins are removed from the training set. In addition, unlike X-Score, RF-Score-v3 is able to keep learning with an increasing training set size, becoming substantially more predictive than X-Score when the full 1105 complexes are used for training. These results show that machine-learning SFs owe a substantial part of their performance to training on complexes with dissimilar proteins to those in the test set, against what has been previously concluded using the same data. Given that a growing amount of structural and interaction data will be available from academic and industrial sources, this performance gap between machine-learning SFs and classical SFs is expected to enlarge in the future. PMID:29538331
The Impact of Protein Structure and Sequence Similarity on the Accuracy of Machine-Learning Scoring Functions for Binding Affinity Prediction.

PubMed

Li, Hongjian; Peng, Jiangjun; Leung, Yee; Leung, Kwong-Sak; Wong, Man-Hon; Lu, Gang; Ballester, Pedro J

2018-03-14

It has recently been claimed that the outstanding performance of machine-learning scoring functions (SFs) is exclusively due to the presence of training complexes with highly similar proteins to those in the test set. Here, we revisit this question using 24 similarity-based training sets, a widely used test set, and four SFs. Three of these SFs employ machine learning instead of the classical linear regression approach of the fourth SF (X-Score which has the best test set performance out of 16 classical SFs). We have found that random forest (RF)-based RF-Score-v3 outperforms X-Score even when 68% of the most similar proteins are removed from the training set. In addition, unlike X-Score, RF-Score-v3 is able to keep learning with an increasing training set size, becoming substantially more predictive than X-Score when the full 1105 complexes are used for training. These results show that machine-learning SFs owe a substantial part of their performance to training on complexes with dissimilar proteins to those in the test set, against what has been previously concluded using the same data. Given that a growing amount of structural and interaction data will be available from academic and industrial sources, this performance gap between machine-learning SFs and classical SFs is expected to enlarge in the future.
Yeast syntaxins Sso1p and Sso2p belong to a family of related membrane proteins that function in vesicular transport.

PubMed Central

Aalto, M K; Ronne, H; Keränen, S

1993-01-01

The yeast SEC1 gene encodes a hydrophilic protein that functions at the terminal stage in secretion. We have cloned two yeast genes, SSO1 and SSO2, which in high copy number can suppress sec1 mutations and also mutations in several other late acting SEC genes, such as SEC3, SEC5, SEC9 and SEC15. SSO1 and SSO2 encode small proteins with N-terminal hydrophilic domains and C-terminal hydrophobic tails. The two proteins are 72% identical in sequence and together perform an essential function late in secretion. Sso1p and Sso2p show significant sequence similarity to six other proteins. Two of these, Sed5p and Pep12p, are yeast proteins that function in transport from ER to Golgi and from Golgi to the vacuole, respectively. Also related to Sso1p and Sso2p are three mammalian proteins: epimorphin, syntaxin A/HPC-1 and syntaxin B. A nematode cDNA product also belongs to the new protein family. The new protein family is thus present in a wide variety of eukaryotic cells, where its members function at different stages in vesicular transport. PMID:8223426
RNA aptamers that functionally interact with green fluorescent protein and its derivatives

PubMed Central

Shui, Bo; Ozer, Abdullah; Zipfel, Warren; Sahu, Nevedita; Singh, Avtar; Lis, John T.; Shi, Hua; Kotlikoff, Michael I.

2012-01-01

Green Fluorescent Protein (GFP) and related fluorescent proteins (FPs) have been widely used to tag proteins, allowing their expression and subcellular localization to be examined in real time in living cells and animals. Similar fluorescent methods are highly desirable to detect and track RNA and other biological molecules in living cells. For this purpose, we have developed a group of RNA aptamers that bind GFP and related proteins, which we term Fluorescent Protein-Binding Aptamers (FPBA). These aptamers bind GFP, YFP and CFP with low nanomolar affinity; binding decreases GFP fluorescence while slightly augmenting YFP and CFP brightness. Aptamer binding results in an increase in the pKa of EGFP, decreasing the 475 nm excited green fluorescence at a given pH. We report the secondary structure of FPBA and the ability to synthesize functional multivalent dendrimers. FPBA expressed in live cells decreased GFP fluorescence in a valency-dependent manner, indicating that the RNA aptamers function within cells. The development of aptamers that bind fluorescent proteins with high affinity and alter their function markedly expands their use in the study of biological pathways. PMID:22189104
Functional microdomains in bacterial membranes.

PubMed

López, Daniel; Kolter, Roberto

2010-09-01

The membranes of eukaryotic cells harbor microdomains known as lipid rafts that contain a variety of signaling and transport proteins. Here we show that bacterial membranes contain microdomains functionally similar to those of eukaryotic cells. These membrane microdomains from diverse bacteria harbor homologs of Flotillin-1, a eukaryotic protein found exclusively in lipid rafts, along with proteins involved in signaling and transport. Inhibition of lipid raft formation through the action of zaragozic acid--a known inhibitor of squalene synthases--impaired biofilm formation and protein secretion but not cell viability. The orchestration of physiological processes in microdomains may be a more widespread feature of membranes than previously appreciated.
Transcriptional Activity and Nuclear Localization of Cabut, the Drosophila Ortholog of Vertebrate TGF-β-Inducible Early-Response Gene (TIEG) Proteins

PubMed Central

Belacortu, Yaiza; Weiss, Ron; Kadener, Sebastian; Paricio, Nuria

2012-01-01

Background: Cabut (Cbt) is a C2H2-class zinc finger transcription factor involved in embryonic dorsal closure, epithelial regeneration and other developmental processes in Drosophila melanogaster. Cbt orthologs have been identified in other Drosophila species and insects as well as in vertebrates. Indeed, Cbt is the Drosophila ortholog of the group of vertebrate proteins encoded by the TGF-β-inducible early-response genes (TIEGs), which belong to Sp1-like/Krüppel-like family of transcription factors. Several functional domains involved in transcriptional control and subcellular localization have been identified in the vertebrate TIEGs. However, little is known of whether these domains and functions are also conserved in the Cbt protein. Methodology/Principal Findings: To determine the transcriptional regulatory activity of the Drosophila Cbt protein, we performed Gal4-based luciferase assays in S2 cells and showed that Cbt is a transcriptional repressor and able to regulate its own expression. Truncated forms of Cbt were then generated to identify its functional domains. This analysis revealed a sequence similar to the mSin3A-interacting repressor domain found in vertebrate TIEGs, although located in a different part of the Cbt protein. Using β-Galactosidase and eGFP fusion proteins, we also showed that Cbt contains the bipartite nuclear localization signal (NLS) previously identified in TIEG proteins, although it is non-functional in insect cells. Instead, a monopartite NLS, located at the amino terminus of the protein and conserved across insects, is functional in Drosophila S2 and Spodoptera exigua Sec301 cells. Last but not least, genetic interaction and immunohistochemical assays suggested that Cbt nuclear import is mediated by Importin-α2.
Conclusions/Significance: Our results constitute the first characterization of the molecular mechanisms of Cbt-mediated transcriptional control as well as of Cbt nuclear import, and demonstrate the existence of similarities and differences in both aspects of Cbt function between the insect and the vertebrate TIEG proteins. PMID:22359651
Systematic Differences in Signal Emitting and Receiving Revealed by PageRank Analysis of a Human Protein Interactome

PubMed Central

Li, Xiu-Qing

2012-01-01

Most protein PageRank studies do not use signal flow direction information in protein interactions because this information was not readily available in large protein databases until recently. Therefore, four questions have yet to be answered: A) What is the general difference between signal emitting and receiving in a protein interactome? B) Which proteins are among the top ranked in directional ranking? C) Are high ranked proteins more evolutionarily conserved than low ranked ones? D) Do proteins with similar ranking tend to have similar subcellular locations? In this study, we address these questions using the forward, reverse, and non-directional PageRank approaches to rank an information-directional network of human proteins and study their evolutionary conservation. The forward ranking gives credit to information receivers, reverse ranking to information emitters, and non-directional ranking mainly to the number of interactions. The protein lists generated by the forward and non-directional rankings are highly correlated, but those by the reverse and non-directional rankings are not. The results suggest that the signal emitting/receiving system in the human protein interactome is characterized by key emitters and relatively even receiving. Signaling pathway proteins are frequent among the top-ranked proteins. Eight proteins are both top information emitters and top information receivers. Top ranked proteins, except a few species-related novel-function ones, are evolutionarily well conserved. Protein-subunit ranking position reflects subunit function. These results demonstrate the usefulness of different PageRank approaches in characterizing protein networks and provide insights into protein interactions in the cell. PMID:23028653
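The three rankings described in this abstract can be illustrated with a minimal sketch. It assumes a plain power-iteration PageRank with a damping factor of 0.85 and a toy network of made-up proteins A to D; the study's actual implementation, parameters, and data are not given in the record, so every name below is hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (emitter, receiver); toy data, not from the study


def pagerank(edges: List[Edge], damping: float = 0.85,
             iterations: int = 100) -> Dict[str, float]:
    """Plain power-iteration PageRank on a directed edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links: Dict[str, List[str]] = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edge is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


def forward_rank(edges: List[Edge]) -> Dict[str, float]:
    """Follow edges emitter -> receiver, so receivers accumulate credit."""
    return pagerank(edges)


def reverse_rank(edges: List[Edge]) -> Dict[str, float]:
    """Flip every edge, so emitters accumulate credit."""
    return pagerank([(dst, src) for src, dst in edges])


def nondirectional_rank(edges: List[Edge]) -> Dict[str, float]:
    """Use each interaction in both directions; degree dominates."""
    return pagerank(edges + [(dst, src) for src, dst in edges])


toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "D"), ("C", "D")]
```

In this toy network the forward ranking places D (a pure receiver) first, the reverse ranking places A (a pure emitter) first, and the non-directional ranking scores A and D equally because both have three interactions.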
Chlamydia trachomatis protein CT009 is a structural and functional homolog to the key morphogenesis component RodZ and interacts with division septal plane localized MreB

DOE PAGES

Kemege, Kyle E.; Hickey, John M.; Barta, Michael L.; ...

2014-11-10

Cell division in Chlamydiae is poorly understood as apparent homologs to most conserved bacterial cell division proteins are lacking and the presence of elongation (rod shape) associated proteins indicates non-canonical mechanisms may be employed. The rod-shape determining protein MreB has been proposed as playing a unique role in chlamydial cell division. In other organisms, MreB is part of an elongation complex that requires RodZ for proper function. A recent study reported that the protein encoded by ORF CT009 interacts with MreB despite low sequence similarity to RodZ. The studies in this paper expand on those observations through protein structure, mutagenesis and cellular localization analyses. Structural analysis indicated that CT009 shares a high level of structural similarity to RodZ, revealing the conserved orientation of two residues critical for MreB interaction. Substitutions eliminated MreB protein interaction and partial complementation provided by CT009 in RodZ-deficient Escherichia coli. Cellular localization analysis of CT009 showed uniform membrane staining in Chlamydia. This was in contrast to the localization of MreB, which was restricted to predicted septal planes. Finally, MreB localization to septal planes provides direct experimental observation for the role of MreB in cell division and supports the hypothesis that it serves as a functional replacement for FtsZ in Chlamydia.
Chlamydia trachomatis protein CT009 is a structural and functional homolog to the key morphogenesis component RodZ and interacts with division septal plane localized MreB

PubMed Central

Kemege, Kyle E.; Hickey, John M.; Barta, Michael L.; Wickstrum, Jason; Balwalli, Namita; Lovell, Scott; Battaile, Kevin P.; Hefty, P. Scott

2015-01-01

Summary: Cell division in Chlamydiae is poorly understood as apparent homologs to most conserved bacterial cell division proteins are lacking and the presence of elongation (rod shape) associated proteins indicates non-canonical mechanisms may be employed. The rod-shape determining protein MreB has been proposed as playing a unique role in chlamydial cell division. In other organisms, MreB is part of an elongation complex that requires RodZ for proper function. A recent study reported that the protein encoded by ORF CT009 interacts with MreB despite low sequence similarity to RodZ. The studies herein expand on those observations through protein structure, mutagenesis, and cellular localization analyses. Structural analysis indicated that CT009 shares a high level of structural similarity to RodZ, revealing the conserved orientation of two residues critical for MreB interaction. Substitutions eliminated MreB protein interaction and partial complementation provided by CT009 in RodZ-deficient E. coli. Cellular localization analysis of CT009 showed uniform membrane staining in Chlamydia. This was in contrast to the localization of MreB, which was restricted to predicted septal planes. MreB localization to septal planes provides direct experimental observation for the role of MreB in cell division and supports the hypothesis that it serves as a functional replacement for FtsZ in Chlamydia. PMID:25382739
Chlamydia trachomatis protein CT009 is a structural and functional homolog to the key morphogenesis component RodZ and interacts with division septal plane localized MreB.

PubMed

Kemege, Kyle E; Hickey, John M; Barta, Michael L; Wickstrum, Jason; Balwalli, Namita; Lovell, Scott; Battaile, Kevin P; Hefty, P Scott

2015-02-01

Cell division in Chlamydiae is poorly understood as apparent homologs to most conserved bacterial cell division proteins are lacking and the presence of elongation (rod shape) associated proteins indicates non-canonical mechanisms may be employed. The rod-shape determining protein MreB has been proposed as playing a unique role in chlamydial cell division. In other organisms, MreB is part of an elongation complex that requires RodZ for proper function. A recent study reported that the protein encoded by ORF CT009 interacts with MreB despite low sequence similarity to RodZ. The studies herein expand on those observations through protein structure, mutagenesis and cellular localization analyses. Structural analysis indicated that CT009 shares a high level of structural similarity to RodZ, revealing the conserved orientation of two residues critical for MreB interaction. Substitutions eliminated MreB protein interaction and partial complementation provided by CT009 in RodZ-deficient Escherichia coli. Cellular localization analysis of CT009 showed uniform membrane staining in Chlamydia. This was in contrast to the localization of MreB, which was restricted to predicted septal planes. MreB localization to septal planes provides direct experimental observation for the role of MreB in cell division and supports the hypothesis that it serves as a functional replacement for FtsZ in Chlamydia. © 2014 John Wiley & Sons Ltd.
Identification of novel biomass-degrading enzymes from genomic dark matter: Populating genomic sequence space with functional annotation.

PubMed

Piao, Hailan; Froula, Jeff; Du, Changbin; Kim, Tae-Wan; Hawley, Erik R; Bauer, Stefan; Wang, Zhong; Ivanova, Nathalia; Clark, Douglas S; Klenk, Hans-Peter; Hess, Matthias

2014-08-01

Although recent nucleotide sequencing technologies have significantly enhanced our understanding of microbial genomes, the function of ∼35% of genes identified in a genome currently remains unknown. To improve the understanding of microbial genomes and consequently of microbial processes it will be crucial to assign a function to this "genomic dark matter." Due to the urgent need for additional carbohydrate-active enzymes for improved production of transportation fuels from lignocellulosic biomass, we screened the genomes of more than 5,500 microorganisms for hypothetical proteins that are located in the proximity of already known cellulases. We identified, synthesized and expressed a total of 17 putative cellulase genes with insufficient sequence similarity to currently known cellulases to be identified as such using traditional sequence annotation techniques that rely on significant sequence similarity. The recombinant proteins of the newly identified putative cellulases were subjected to enzymatic activity assays to verify their hydrolytic activity towards cellulose and lignocellulosic biomass. Eleven (65%) of the tested enzymes had significant activity towards at least one of the substrates. This high success rate highlights that a gene context-based approach can be used to assign function to genes that are otherwise categorized as "genomic dark matter" and to identify biomass-degrading enzymes that have little sequence similarity to already known cellulases. The ability to assign function to genes that have no related sequence representatives with functional annotation will be important to enhance our understanding of microbial processes and to identify microbial proteins for a wide range of applications. © 2014 Wiley Periodicals, Inc.
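The gene context-based screen described in this abstract can be sketched as follows. This is a minimal illustration only: the record does not state how "proximity" was defined, so the window of two neighboring genes, the annotation strings, and the locus names are all assumptions.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    locus: str       # hypothetical locus tag
    annotation: str  # free-text functional annotation


def candidates_near_cellulases(genes: List[Gene], window: int = 2) -> List[str]:
    """Return loci of hypothetical proteins within `window` genes of a known cellulase.

    `genes` is assumed to be ordered by position along one contig.
    """
    known = [i for i, g in enumerate(genes) if "cellulase" in g.annotation.lower()]
    hits = []
    for i, gene in enumerate(genes):
        if gene.annotation.lower() != "hypothetical protein":
            continue
        # Keep the gene if any known cellulase lies within the window.
        if any(abs(i - k) <= window for k in known):
            hits.append(gene.locus)
    return hits


toy_contig = [
    Gene("g1", "hypothetical protein"),
    Gene("g2", "transporter"),
    Gene("g3", "hypothetical protein"),
    Gene("g4", "cellulase"),
    Gene("g5", "hypothetical protein"),
    Gene("g6", "kinase"),
    Gene("g7", "kinase"),
    Gene("g8", "hypothetical protein"),
]
```

On the toy contig, only g3 and g5 sit within two genes of the annotated cellulase g4, so they would be the candidates passed on to expression and activity assays.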
Physiological functions of MTA family of proteins.

PubMed

Sen, Nirmalya; Gui, Bin; Kumar, Rakesh

2014-12-01

Although the functional significance of the metastatic tumor antigen (MTA) family of chromatin remodeling proteins in the pathobiology of cancer is fairly well recognized, the physiological role of MTA proteins continues to be an understudied research area and is just beginning to be recognized. Similar to cancer cells, MTA1 also modulates the expression of target genes in normal cells either by acting as a corepressor or coactivator. In addition, physiological functions of MTA proteins are likely to be influenced by their differential expression, subcellular localization, and regulation by upstream modulators and extracellular signals. This review summarizes our current understanding of the physiological functions of the MTA proteins in model systems. In particular, we highlight recent advances in understanding the role MTA proteins play in the brain, eye, circadian rhythm, mammary gland biology, spermatogenesis, liver, immunomodulation and inflammation, cellular radio-sensitivity, and hematopoiesis and differentiation. Based on the growth of knowledge regarding the exciting new facets of the MTA family of proteins in biology and medicine, we speculate that the next burst of findings in this field may reveal further molecular regulatory insights of non-redundant functions of MTA coregulators in the normal physiology as well as in pathological conditions outside cancer.
Solution NMR structure of hypothetical protein CV_2116 encoded by a viral prophage element in Chromobacterium violaceum.

PubMed

Yang, Yunhuang; Ramelot, Theresa A; Cort, John R; Garcia, Maite; Yee, Adelinda; Arrowsmith, Cheryl H; Kennedy, Michael A

2012-01-01

CV_2116 is a small hypothetical protein of 82 amino acids from the Gram-negative coccobacillus Chromobacterium violaceum. A PSI-BLAST search using the CV_2116 sequence as a query identified only one hit (E = 2e-07) corresponding to a hypothetical protein OR16_04617 from Cupriavidus basilensis OR16, which failed to provide insight into the function of CV_2116. The CV_2116 gene was cloned into the p15TvLic expression plasmid, transformed into E. coli, and ¹³C- and ¹⁵N-labeled NMR samples of CV_2116 were overexpressed in E. coli and purified for structure determination using NMR spectroscopy. The resulting high-quality solution NMR structure of CV_2116 revealed a novel α + β fold containing two anti-parallel β-sheets in the N-terminal two-thirds of the protein and one α-helix in the C-terminal third of the protein. CV_2116 does not belong to any known protein sequence family and a Dali search indicated that no similar structures exist in the Protein Data Bank. Although no function of CV_2116 could be derived from either sequence or structural similarity searches, the neighboring genes of CV_2116 encode various proteins annotated as similar to bacteriophage tail assembly proteins. Interestingly, C. violaceum exhibits an extensive network of bacteriophage tail-like structures that likely result from lateral gene transfer by incorporation of viral DNA into its genome (prophages) due to bacteriophage infection. Indeed, C. violaceum has been shown to contain four prophage elements and CV_2116 resides in the fourth of these elements. Analysis of the putative operon in which CV_2116 resides indicates that CV_2116 might be a component of the bacteriophage tail-like assembly that occurs in C. violaceum.
Structural and functional effects of nucleotide variation on the human TB drug metabolizing enzyme arylamine N-acetyltransferase 1.

PubMed

Cloete, Ruben; Akurugu, Wisdom A; Werely, Cedric J; van Helden, Paul D; Christoffels, Alan

2017-08-01

The human arylamine N-acetyltransferase 1 (NAT1) enzyme plays a vital role in determining the duration of action of amine-containing drugs such as para-aminobenzoic acid (PABA) by influencing the balance between detoxification and metabolic activation of these drugs. Recently, four novel single nucleotide polymorphisms (SNPs) were identified within a South African mixed ancestry population. The effects of these SNPs on the protein structure were modeled to assess possible changes in the structure and function of the enzyme. The use of molecular dynamics simulations and stability predictions indicated less thermodynamically stable protein structures containing E264K and V231G, while the N245I change showed a stabilizing effect. Coincidentally, the N245I change displayed a similar free energy landscape profile to the known R64W amino acid substitution (slow acetylator), while the R242M displayed a similar profile to the published variant, I263V (proposed fast acetylator), and the wild type protein structure. Similarly, principal component analysis indicated that two amino acid substitutions (E264K and V231G) occupied fewer conformational clusters of folded states as compared to the WT and were found to be destabilizing (may affect protein function). However, two of the four novel SNPs, those resulting in the V231G and N245I substitutions, were predicted by both SIFT and POLYPHEN-2 algorithms to affect NAT1 protein function, while the two other SNPs, which result in the R242M and E264K substitutions, showed contradictory results based on SIFT and POLYPHEN-2 analysis. In conclusion, the structural methods were able to verify that two non-synonymous substitutions (E264K and V231G) can destabilize the protein structure, and are in agreement with mCSM predictions, and should therefore be experimentally tested for NAT1 activity. These findings could inform a strategy of incorporating genotypic data (i.e., functional SNP alleles) with phenotypic information (slow or fast acetylator) to better prescribe effective treatment using drugs metabolized by NAT1. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Genome-Wide Protein Interaction Screens Reveal Functional Networks Involving Sm-Like Proteins

PubMed Central

Fromont-Racine, Micheline; Mayes, Andrew E.; Brunet-Simon, Adeline; Rain, Jean-Christophe; Colley, Alan; Dix, Ian; Decourty, Laurence; Joly, Nicolas; Ricard, Florence; Beggs, Jean D.

2000-01-01

A set of seven structurally related Sm proteins forms the core of the snRNP particles containing the spliceosomal U1, U2, U4 and U5 snRNAs. A search of the genomic sequence of Saccharomyces cerevisiae has identified a number of open reading frames that potentially encode structurally similar proteins termed Lsm (Like Sm) proteins. With the aim of analysing all possible interactions between the Lsm proteins and any protein encoded in the yeast genome, we performed exhaustive and iterative genomic two-hybrid screens, starting with the Lsm proteins as baits. Indeed, extensive interactions amongst eight Lsm proteins were found that suggest the existence of an Lsm complex or complexes. These Lsm interactions apparently involve the conserved Sm domain that also mediates interactions between the Sm proteins. The screens also reveal functionally significant interactions with splicing factors, in particular with Prp4 and Prp24, compatible with genetic studies and with the reported association of Lsm proteins with spliceosomal U6 and U4/U6 particles. In addition, interactions with proteins involved in mRNA turnover, such as Mrt1, Dcp1, Dcp2 and Xrn1, point to roles for Lsm complexes in distinct RNA metabolic processes that are confirmed in independent functional studies. These results provide compelling evidence that two-hybrid screens yield functionally meaningful information about protein–protein interactions and can suggest functions for uncharacterized proteins, especially when they are performed on a genome-wide scale. PMID:10900456

Evaluating Functional Annotations of Enzymes Using the Gene Ontology.

PubMed

Holliday, Gemma L; Davidson, Rebecca; Akiva, Eyal; Babbitt, Patricia C

2017-01-01

The Gene Ontology (GO) (Ashburner et al., Nat Genet 25(1):25-29, 2000) is a powerful tool in the informatics arsenal of methods for evaluating annotations in a protein dataset. Identifying the nearest well-annotated homologue of a protein of interest, predicting where misannotation has occurred, and knowing how confident you can be in the annotations assigned to those proteins are all critical. In this chapter we explore what makes an enzyme unique and how we can use GO to infer aspects of protein function based on sequence similarity. These can range from identification of misannotation or other errors in a predicted function to accurate function prediction for an enzyme of entirely unknown function. Although GO annotation applies to any gene product, we focus here on describing our approach for hierarchical classification of enzymes in the Structure-Function Linkage Database (SFLD) (Akiva et al., Nucleic Acids Res 42(Database issue):D521-530, 2014) as a guide for informed utilisation of annotation transfer based on GO terms.
LC3/GABARAP family proteins: autophagy-(un)related functions.

PubMed

Schaaf, Marco B E; Keulers, Tom G; Vooijs, Marc A; Rouschop, Kasper M A

2016-12-01

From yeast to mammals, autophagy is an important mechanism for sustaining cellular homeostasis through facilitating the degradation and recycling of aged and cytotoxic components. During autophagy, cargo is captured in double-membraned vesicles, the autophagosomes, and degraded through lysosomal fusion. In yeast, autophagy initiation, cargo recognition, cargo engulfment, and vesicle closure are Atg8 dependent. In higher eukaryotes, Atg8 has evolved into the LC3/GABARAP protein family, consisting of 7 family proteins [LC3A (2 splice variants), LC3B, LC3C, GABARAP, GABARAPL1, and GABARAPL2]. LC3B, the most studied family protein, is associated with autophagosome development and maturation and is used to monitor autophagic activity. Given the high homology, the other LC3/GABARAP family proteins are often presumed to fulfill similar functions. Nevertheless, substantial evidence shows that the LC3/GABARAP family proteins are unique in function and important in autophagy-independent mechanisms. In this review, we discuss the current knowledge and functions of the LC3/GABARAP family proteins. We focus on processing of the individual family proteins and their role in autophagy initiation, cargo recognition, vesicle closure, and trafficking, a complex and tightly regulated process that requires selective presentation and recruitment of these family proteins. In addition, functions unrelated to autophagy of the LC3/GABARAP protein family members are discussed. © FASEB.
Characterization of a translation inhibitory protein from Luffa aegyptiaca.

PubMed

Ramakrishnan, S; Enghlid, J J; Bryant, H L; Xu, F J

1989-04-28

A protein with a molecular weight of about 30,000 was purified from the seeds of Luffa aegyptiaca. This protein inhibited cell free translation at pM concentrations. In spite of functional similarity to other ribosomal inhibitory proteins, the NH2-terminal analysis did not show any significant homology. Competitive inhibition studies indicate no immunological crossreactivity between the inhibitory protein from Luffa aegyptiaca, pokeweed antiviral protein (PAP) and recombinant ricin A chain. Chemical linkage of the protein to a monoclonal antibody reactive to transferrin receptor resulted in a highly cytotoxic conjugate.
Structural basis for host membrane remodeling induced by protein 2B of hepatitis A virus.

PubMed

Vives-Adrián, Laia; Garriga, Damià; Buxaderas, Mònica; Fraga, Joana; Pereira, Pedro José Barbosa; Macedo-Ribeiro, Sandra; Verdaguer, Núria

2015-04-01

The complexity of viral RNA synthesis and the numerous participating factors require a mechanism to topologically coordinate and concentrate these multiple viral and cellular components, ensuring a concerted function. Similarly to all other positive-strand RNA viruses, picornaviruses induce rearrangements of host intracellular membranes to create structures that act as functional scaffolds for genome replication. The membrane-targeting proteins 2B and 2C, their precursor 2BC, and protein 3A appear to be primarily involved in membrane remodeling. Little is known about the structure of these proteins and the mechanisms by which they induce massive membrane remodeling. Here we report the crystal structure of the soluble region of hepatitis A virus (HAV) protein 2B, consisting of two domains: a C-terminal helical bundle preceded by an N-terminally curved five-stranded antiparallel β-sheet that displays striking structural similarity to the β-barrel domain of enteroviral 2A proteins. Moreover, the helicoidal arrangement of the protein molecules in the crystal provides a model for 2B-induced host membrane remodeling during HAV infection. No structural information is currently available for the 2B protein of any picornavirus despite it being involved in a critical process in viral factory formation: the rearrangement of host intracellular membranes. Here we present the structure of the soluble domain of the 2B protein of hepatitis A virus (HAV). Its arrangement, both in crystals and in solution under physiological conditions, can help to understand its function and sheds some light on the membrane rearrangement process, a putative target of future antiviral drugs. Moreover, this first structure of a picornaviral 2B protein also unveils a closer evolutionary relationship between the hepatovirus and enterovirus genera within the Picornaviridae family. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
Structural Basis for Host Membrane Remodeling Induced by Protein 2B of Hepatitis A Virus

PubMed Central

Vives-Adrián, Laia; Garriga, Damià; Buxaderas, Mònica; Fraga, Joana; Pereira, Pedro José Barbosa

2015-01-01

ABSTRACT: The complexity of viral RNA synthesis and the numerous participating factors require a mechanism to topologically coordinate and concentrate these multiple viral and cellular components, ensuring a concerted function. Similarly to all other positive-strand RNA viruses, picornaviruses induce rearrangements of host intracellular membranes to create structures that act as functional scaffolds for genome replication. The membrane-targeting proteins 2B and 2C, their precursor 2BC, and protein 3A appear to be primarily involved in membrane remodeling. Little is known about the structure of these proteins and the mechanisms by which they induce massive membrane remodeling. Here we report the crystal structure of the soluble region of hepatitis A virus (HAV) protein 2B, consisting of two domains: a C-terminal helical bundle preceded by an N-terminally curved five-stranded antiparallel β-sheet that displays striking structural similarity to the β-barrel domain of enteroviral 2A proteins. Moreover, the helicoidal arrangement of the protein molecules in the crystal provides a model for 2B-induced host membrane remodeling during HAV infection. IMPORTANCE: No structural information is currently available for the 2B protein of any picornavirus despite it being involved in a critical process in viral factory formation: the rearrangement of host intracellular membranes. Here we present the structure of the soluble domain of the 2B protein of hepatitis A virus (HAV). Its arrangement, both in crystals and in solution under physiological conditions, can help to understand its function and sheds some light on the membrane rearrangement process, a putative target of future antiviral drugs. Moreover, this first structure of a picornaviral 2B protein also unveils a closer evolutionary relationship between the hepatovirus and enterovirus genera within the Picornaviridae family. PMID:25589659
Sel1-like repeat proteins in signal transduction.

PubMed

Mittl, Peer R E; Schneider-Brachert, Wulf

2007-01-01

Solenoid proteins, which are distinguished from general globular proteins by their modular architectures, are frequently involved in signal transduction pathways. Proteins from the tetratricopeptide repeat (TPR) and Sel1-like repeat (SLR) families share similar alpha-helical conformations but different consensus sequence lengths and superhelical topologies. Both families are characterized by low sequence similarity levels, rendering the identification of functional homologs difficult. Therefore current knowledge of the molecular and cellular functions of the SLR proteins Sel1, Hrd3, Chs4, Nif1, PodJ, ExoR, AlgK, HcpA, Hsp12, EnhC, LpnE, MotX, and MerG has been reviewed. Although SLR proteins possess different cellular functions they all seem to serve as adaptor proteins for the assembly of macromolecular complexes. Sel1, Hrd3, Hsp12 and LpnE are activated under cellular stress. The eukaryotic Sel1 and Hrd3 proteins are involved in the ER-associated protein degradation, whereas the bacterial LpnE, EnhC, HcpA, ExoR, and AlgK proteins mediate the interactions between bacterial and eukaryotic host cells. LpnE and EnhC are responsible for the entry of L. pneumophila into epithelial cells and macrophages. ExoR from the symbiotic microorganism S. meliloti and AlgK from the pathogen P. aeruginosa regulate exopolysaccharide synthesis. Nif1 and Chs4 from yeast are responsible for the regulation of mitosis and septum formation during cell division, respectively, and PodJ guides the cellular differentiation during the cell cycle of the bacterium C. crescentus. Taken together the SLR motif establishes a link between signal transduction pathways from eukaryotes and bacteria. The SLR motif is so far absent from archaea. Therefore the SLR could have developed in the last common ancestor between eukaryotes and bacteria.
A functional role of Rv1738 in Mycobacterium tuberculosis persistence suggested by racemic protein crystallography

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bunker, Richard D.; Mandal, Kalyaneswar; Bashiri, Ghader

Racemic protein crystallography was used to determine the X-ray structure of the predicted Mycobacterium tuberculosis protein Rv1738, which had been completely recalcitrant to crystallization in its natural L-form. Native chemical ligation was used to synthesize both L-protein and D-protein enantiomers of Rv1738. Crystallization of the racemic {D-protein + L-protein} mixture was immediately successful. The resulting crystals diffracted to high resolution and also enabled facile structure determination because of the quantized phases of the data from centrosymmetric crystals. The X-ray structure of Rv1738 revealed striking similarity with bacterial hibernation factors, despite minimal sequence similarity. As a result, we predict that Rv1738, which is highly up-regulated in conditions that mimic the onset of persistence, helps trigger dormancy by association with the bacterial ribosome.
A functional role of Rv1738 in Mycobacterium tuberculosis persistence suggested by racemic protein crystallography

DOE PAGES

Bunker, Richard D.; Mandal, Kalyaneswar; Bashiri, Ghader; ...

2015-04-07

Racemic protein crystallography was used to determine the X-ray structure of the predicted Mycobacterium tuberculosis protein Rv1738, which had been completely recalcitrant to crystallization in its natural L-form. Native chemical ligation was used to synthesize both L-protein and D-protein enantiomers of Rv1738. Crystallization of the racemic {D-protein + L-protein} mixture was immediately successful. The resulting crystals diffracted to high resolution and also enabled facile structure determination because of the quantized phases of the data from centrosymmetric crystals. The X-ray structure of Rv1738 revealed striking similarity with bacterial hibernation factors, despite minimal sequence similarity. As a result, we predict that Rv1738, which is highly up-regulated in conditions that mimic the onset of persistence, helps trigger dormancy by association with the bacterial ribosome.
Robustness of Reconstructed Ancestral Protein Functions to Statistical Uncertainty.

PubMed

Eick, Geeta N; Bridgham, Jamie T; Anderson, Douglas P; Harms, Michael J; Thornton, Joseph W

2017-02-01

Hypotheses about the functions of ancient proteins and the effects of historical mutations on them are often tested using ancestral protein reconstruction (APR): phylogenetic inference of ancestral sequences followed by synthesis and experimental characterization. Usually, some sequence sites are ambiguously reconstructed, with two or more statistically plausible states. The extent to which the inferred functions and mutational effects are robust to uncertainty about the ancestral sequence has not been studied systematically. To address this issue, we reconstructed ancestral proteins in three domain families that have different functions, architectures, and degrees of uncertainty; we then experimentally characterized the functional robustness of these proteins when uncertainty was incorporated using several approaches, including sampling amino acid states from the posterior distribution at each site and incorporating the alternative amino acid state at every ambiguous site in the sequence into a single "worst plausible case" protein. In every case, qualitative conclusions about the ancestral proteins' functions and the effects of key historical mutations were robust to sequence uncertainty, with similar functions observed even when scores of alternate amino acids were incorporated. There was some variation in quantitative descriptors of function among plausible sequences, suggesting that experimentally characterizing robustness is particularly important when quantitative estimates of ancient biochemical parameters are desired. The worst plausible case method appears to provide an efficient strategy for characterizing the functional robustness of ancestral proteins to large amounts of sequence uncertainty. Sampling from the posterior distribution sometimes produced artifactually nonfunctional proteins for sequences reconstructed with substantial ambiguity. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
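The three ways of building an ancestral sequence mentioned in this abstract (most probable state, "worst plausible case", and posterior sampling) can be illustrated with a minimal sketch. The per-site posterior probabilities, the 0.2 plausibility cutoff and all function names below are hypothetical and chosen only for illustration; the abstract does not state them.

```python
import random

# Hypothetical posterior probabilities of amino acid states at four sites.
posteriors = [
    {"A": 0.97, "S": 0.03},
    {"L": 0.55, "I": 0.40, "V": 0.05},
    {"K": 0.70, "R": 0.30},
    {"G": 1.00},
]

def ml_sequence(post):
    """Take the most probable state at every site."""
    return "".join(max(site, key=site.get) for site in post)

def worst_plausible_case(post, threshold=0.2):
    """Take the second-best state wherever it is plausible.

    The cutoff of 0.2 is an assumption made for this sketch.
    """
    seq = []
    for site in post:
        ranked = sorted(site, key=site.get, reverse=True)
        if len(ranked) > 1 and site[ranked[1]] >= threshold:
            seq.append(ranked[1])   # ambiguous site: use the alternative
        else:
            seq.append(ranked[0])   # unambiguous site: keep the best state
    return "".join(seq)

def sample_sequence(post, rng):
    """Draw one state per site in proportion to its posterior probability."""
    return "".join(
        rng.choices(list(site), weights=list(site.values()))[0]
        for site in post
    )

print(ml_sequence(posteriors))            # ALKG
print(worst_plausible_case(posteriors))   # AIRG
print(sample_sequence(posteriors, random.Random(0)))
```

Sampling can draw low-probability states at many sites at once, which is one way to picture why sampled sequences may behave worse than either deterministic construction when ambiguity is substantial.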
Comparison of structure, function and regulation of plant cold shock domain proteins to bacterial and animal cold shock domain proteins.

PubMed

Chaikam, Vijay; Karlson, Dale T

2010-01-01

The cold shock domain (CSD) is among the most ancient and well conserved nucleic acid binding domains from bacteria to higher animals and plants. The CSD facilitates binding to RNA, ssDNA and dsDNA and most functions attributed to cold shock domain proteins are mediated by this nucleic acid binding activity. In prokaryotes, cold shock domain proteins only contain a single CSD and are termed cold shock proteins (Csps). In animal model systems, various auxiliary domains are present in addition to the CSD and are commonly named Y-box proteins. Similar to animal CSPs, plant CSPs contain auxiliary C-terminal domains in addition to their N-terminal CSD. Cold shock domain proteins have been shown to play important roles in development and stress adaptation in a wide variety of organisms. In this review, the structure, function and regulation of plant CSPs are compared and contrasted to the characteristics of bacterial and animal CSPs. [BMB reports 2010; 43(1): 1-8].
Designer amphiphilic proteins as building blocks for the intracellular formation of organelle-like compartments

NASA Astrophysics Data System (ADS)

Huber, Matthias C.; Schreiber, Andreas; von Olshausen, Philipp; Varga, Balázs R.; Kretz, Oliver; Joch, Barbara; Barnert, Sabine; Schubert, Rolf; Eimer, Stefan; Kele, Péter; Schiller, Stefan M.

2015-01-01

Nanoscale biological materials formed by the assembly of defined block-domain proteins control the formation of cellular compartments such as organelles. Here, we introduce an approach to intentionally ‘program’ the de novo synthesis and self-assembly of genetically encoded amphiphilic proteins to form cellular compartments, or organelles, in Escherichia coli. These proteins serve as building blocks for the formation of artificial compartments in vivo in a similar way to lipid-based organelles. We investigated the formation of these organelles using epifluorescence microscopy, total internal reflection fluorescence microscopy and transmission electron microscopy. The in vivo modification of these protein-based de novo organelles, by means of site-specific incorporation of unnatural amino acids, allows the introduction of artificial chemical functionalities. Co-localization of membrane proteins results in the formation of functionalized artificial organelles combining artificial and natural cellular function. Adding these protein structures to the cellular machinery may have consequences in nanobiotechnology, synthetic biology and materials science, including the constitution of artificial cells and bio-based metamaterials.
Biochemical and functional analysis of CTR1, a protein kinase that negatively regulates ethylene signaling in Arabidopsis

NASA Technical Reports Server (NTRS)

Huang, Yafan; Li, Hui; Hutchison, Claire E.; Laskey, James; Kieber, Joseph J.

2003-01-01

CTR1 encodes a negative regulator of the ethylene response pathway in Arabidopsis thaliana. The C-terminal domain of CTR1 is similar to the Raf family of protein kinases, but its first two-thirds encodes a novel protein domain. We used a variety of approaches to investigate the function of these two CTR1 domains. Recombinant CTR1 protein was purified from a baculoviral expression system, and shown to possess intrinsic Ser/Thr protein kinase activity with enzymatic properties similar to Raf-1. Deletion of the N-terminal domain did not elevate the kinase activity of CTR1, indicating that, at least in vitro, this domain does not autoinhibit kinase function. Molecular analysis of loss-of-function ctr1 alleles indicated that several mutations disrupt the kinase catalytic domain, and in vitro studies confirmed that at least one of these eliminates kinase activity, which indicates that kinase activity is required for CTR1 function. One missense mutation, ctr1-8, was found to result from an amino acid substitution within a new conserved motif within the N-terminal domain. Ctr1-8 has no detectable effect on the kinase activity of CTR1 in vitro, but rather disrupts the interaction with the ethylene receptor ETR1. This mutation also disrupts the dominant negative effect that results from overexpression of the CTR1 amino-terminal domain in transgenic Arabidopsis. These results suggest that CTR1 interacts with ETR1 in vivo, and that this association is required to turn off the ethylene-signaling pathway.
A new multi-scale method to reveal hierarchical modular structures in biological networks.

PubMed

Jiao, Qing-Ju; Huang, Yan; Shen, Hong-Bin

2016-11-15

Biological networks are effective tools for studying molecular interactions. Modular structure, in which genes or proteins may tend to be associated with functional modules or protein complexes, is a remarkable feature of biological networks. Mining modular structure from biological networks enables us to focus on a set of potentially important nodes, which provides a reliable guide to future biological experiments. The first fundamental challenge in mining modular structure from biological networks is that the quality of the observed network data is usually low owing to noise and incompleteness in the obtained networks. The second problem that poses a challenge to existing approaches to the mining of modular structure is that the organization of both functional modules and protein complexes in networks is far more complicated than was ever thought. For instance, the sizes of different modules vary considerably from each other and they often form multi-scale hierarchical structures. To solve these problems, we propose a new multi-scale protocol for mining modular structure (named ISIMB) driven by a node similarity metric, which works in an iteratively converged space to reduce the effects of the low data quality of the observed network data. The multi-scale node similarity metric couples both the local and the global topology of the network with a resolution regulator. By varying this resolution regulator to give different weightings to the local and global terms in the metric, the ISIMB method is able to fit the shape of modules and to detect them on different scales. Experiments on protein-protein interaction and genetic interaction networks show that our method can not only mine functional modules and protein complexes successfully, but can also predict functional modules from specific to general and reveal the hierarchical organization of protein complexes.
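The idea of a node similarity that blends a local and a global topological term through a resolution parameter can be sketched as follows. This is not the ISIMB metric, whose exact form the abstract does not give; the Jaccard local term, the inverse-distance global term, the linear blend and all names are assumptions made for illustration.

```python
from collections import deque

def bfs_distance(graph, source, target):
    """Shortest-path length in an unweighted graph, or None if unreachable."""
    seen = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return seen[node]
        for nb in graph[node]:
            if nb not in seen:
                seen[nb] = seen[node] + 1
                queue.append(nb)
    return None

def similarity(graph, u, v, resolution):
    """Blend a local and a global term; resolution lies in [0, 1].

    resolution = 1 uses only neighbourhood overlap (local topology),
    resolution = 0 uses only path distance (global topology).
    """
    nu = graph[u] | {u}
    nv = graph[v] | {v}
    local = len(nu & nv) / len(nu | nv)          # Jaccard of closed neighbourhoods
    d = bfs_distance(graph, u, v)
    global_term = 0.0 if d is None else 1.0 / (1.0 + d)
    return resolution * local + (1.0 - resolution) * global_term

# Hypothetical toy interaction network as an adjacency dict.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}

for res in (0.0, 0.5, 1.0):
    print(res, similarity(graph, "A", "B", res), similarity(graph, "A", "E", res))
```

Sweeping the resolution value changes which node pairs look similar, which is the intuition behind detecting modules at different scales.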
The dynamics of single protein molecules is non-equilibrium and self-similar over thirteen decades in time

DOE PAGES

Hu, Xiaohu; Hong, Liang; Smith, Micholas Dean; ...

2015-11-23

Internal motions of proteins are essential to their function. The time dependence of protein structural fluctuations is highly complex, manifesting subdiffusive, non-exponential behavior with effective relaxation times existing over many decades in time, from ps up to ~10^2 s (refs 1-4). Here, using molecular dynamics simulations, we show that, on timescales from 10^-12 to 10^-5 s, motions in single proteins are self-similar, non-equilibrium and exhibit ageing. The characteristic relaxation time for a distance fluctuation, such as inter-domain motion, is observation-time-dependent, increasing in a simple, power-law fashion, arising from the fractal nature of the topology and geometry of the energy landscape explored. Diffusion over the energy landscape follows a non-ergodic continuous time random walk. Comparison with single-molecule experiments suggests that the non-equilibrium self-similar dynamical behavior persists up to timescales approaching the in vivo lifespan of individual protein molecules.
Similarity Measures for Protein Ensembles

PubMed Central

Lindorff-Larsen, Kresten; Ferkinghoff-Borg, Jesper

2009-01-01

Analyses of similarities and changes in protein conformation can provide important information regarding protein function and evolution. Many scores, including the commonly used root mean square deviation, have therefore been developed to quantify the similarities of different protein conformations. However, instead of examining individual conformations it is in many cases more relevant to analyse ensembles of conformations that have been obtained either through experiments or from methods such as molecular dynamics simulations. We here present three approaches that can be used to compare conformational ensembles in the same way as the root mean square deviation is used to compare individual pairs of structures. The methods are based on the estimation of the probability distributions underlying the ensembles and subsequent comparison of these distributions. We first validate the methods using a synthetic example from molecular dynamics simulations. We then apply the algorithms to revisit the problem of ensemble averaging during structure determination of proteins, and find that an ensemble refinement method is able to recover the correct distribution of conformations better than standard single-molecule refinement. PMID:19145244
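The contrast between comparing two individual structures and comparing two ensembles can be made concrete with a short sketch. The first function is the usual root mean square deviation for coordinates that are already superimposed; the second compares two ensembles through histograms of a one-dimensional observable using the Jensen-Shannon divergence. The choice of divergence, the binning and all names are assumptions for illustration; the abstract does not specify the three approaches.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two pre-superimposed conformations (lists of (x, y, z))."""
    squared = sum(
        (a - b) ** 2
        for pa, pb in zip(coords_a, coords_b)
        for a, b in zip(pa, pb)
    )
    return math.sqrt(squared / len(coords_a))

def histogram(values, edges):
    """Normalised histogram; values are assumed to fall inside the edges."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical inter-domain distances (in Å) sampled from two ensembles.
ensemble_1 = [10.1, 10.4, 11.0, 11.2, 12.5, 12.9]
ensemble_2 = [12.2, 12.8, 13.1, 13.5, 14.0, 14.6]
edges = [10, 11, 12, 13, 14, 15]

p = histogram(ensemble_1, edges)
q = histogram(ensemble_2, edges)
print(js_divergence(p, q))
```

A single number per pair of ensembles, analogous to one RMSD per pair of structures, is what makes such measures usable in clustering or refinement comparisons.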
Comparative analysis of human milk and infant formula derived peptides following in vitro digestion.

PubMed

Su, M-Y; Broadhurst, M; Liu, C-P; Gathercole, J; Cheng, W-L; Qi, X-Y; Clerens, S; Dyer, J M; Day, L; Haigh, B

2017-04-15

It has long been recognised that there are differences between human milk and infant formulas which lead to differences in health and nutrition for the neonate. In this study we examine and compare the peptide profile of human milk and an exemplar infant formula. The study identifies both similarities and differences in the endogenous and postdigestion peptide profiles of human milk and infant formula. These include differences in the protein source of the peptides and in the regions within each protein that produce the dominant peptides. Clustering of similar peptides around regions of high sequence identity and known bioactivity was also observed. Together the data may explain some of the functional differences between human milk and infant formula, while identifying some aspects of conserved function between bovine and human milks which contribute to the effectiveness of modern infant formula as a substitute for human milk. Copyright © 2016 Elsevier Ltd. All rights reserved.
The crystal structure of a bacterial Sufu-like protein defines a novel group of bacterial proteins that are similar to the N-terminal domain of human Sufu

PubMed Central

Das, Debanu; Finn, Robert D; Abdubek, Polat; Astakhova, Tamara; Axelrod, Herbert L; Bakolitsa, Constantina; Cai, Xiaohui; Carlton, Dennis; Chen, Connie; Chiu, Hsiu-Ju; Chiu, Michelle; Clayton, Thomas; Deller, Marc C; Duan, Lian; Ellrott, Kyle; Farr, Carol L; Feuerhelm, Julie; Grant, Joanna C; Grzechnik, Anna; Han, Gye Won; Jaroszewski, Lukasz; Jin, Kevin K; Klock, Heath E; Knuth, Mark W; Kozbial, Piotr; Sri Krishna, S; Kumar, Abhinav; Lam, Winnie W; Marciano, David; Miller, Mitchell D; Morse, Andrew T; Nigoghossian, Edward; Nopakun, Amanda; Okach, Linda; Puckett, Christina; Reyes, Ron; Tien, Henry J; Trame, Christine B; van den Bedem, Henry; Weekes, Dana; Wooten, Tiffany; Xu, Qingping; Yeh, Andrew; Zhou, Jiadong; Hodgson, Keith O; Wooley, John; Elsliger, Marc-André; Deacon, Ashley M; Godzik, Adam; Lesley, Scott A; Wilson, Ian A

2010-01-01

Sufu (Suppressor of Fused), a two-domain protein, plays a critical role in regulating Hedgehog signaling and is conserved from flies to humans. A few bacterial Sufu-like proteins have previously been identified based on sequence similarity to the N-terminal domain of eukaryotic Sufu proteins, but none have been structurally or biochemically characterized and their function in bacteria is unknown. We have determined the crystal structure of a more distantly related Sufu-like homolog, NGO1391 from Neisseria gonorrhoeae, at 1.4 Å resolution, which provides the first biophysical characterization of a bacterial Sufu-like protein. The structure revealed a striking similarity to the N-terminal domain of human Sufu (r.m.s.d. of 2.6 Å over 93% of the NGO1391 protein), despite an extremely low sequence identity of ∼15%. Subsequent sequence analysis revealed that NGO1391 defines a new subset of smaller, Sufu-like proteins that are present in ∼200 bacterial species and has resulted in expansion of the SUFU (PF05076) family in Pfam. PMID:20836087
Retrieval of Enterobacteriaceae drug targets using singular value decomposition.

PubMed

Silvério-Machado, Rita; Couto, Bráulio R G M; Dos Santos, Marcos A

2015-04-15

The identification of potential drug target proteins in bacteria is important in pharmaceutical research for the development of new antibiotics to combat bacterial agents that cause diseases. A new model that combines the singular value decomposition (SVD) technique with biological filters composed of a set of protein properties associated with bacterial drug targets and similarity to protein-coding essential genes of Escherichia coli (strain K12) has been created to predict potential antibiotic drug targets in the Enterobacteriaceae family. This model identified 99 potential drug target proteins in the studied family, which exhibit eight different functions and are protein-coding essential genes or similar to protein-coding essential genes of E. coli (strain K12), indicating that the disruption of the activities of these proteins is critical for cells. Proteins from bacteria with described drug resistance were found among the retrieved candidates. These candidates have no similarity to the human proteome, therefore exhibiting the advantage of causing no adverse effects or at least no known adverse effects on humans. Contact: rita_silverio@hotmail.com. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Paleo-Immunology: Evidence Consistent with Insertion of a Primordial Herpes Virus-Like Element in the Origins of Acquired Immunity

PubMed Central

Dreyfus, David H.

2009-01-01

Background The RAG-encoded proteins, RAG-1 and RAG-2, regulate site-specific recombination events in somatic immune B- and T-lymphocytes to generate the acquired immune repertoire. Catalytic activities of the RAG proteins are related to the recombinase functions of a pre-existing mobile DNA element in the DDE recombinase/RNAse H family, sometimes termed the “RAG transposon”. Methodology/Principal Findings Novel to this work is the suggestion that the DDE recombinase responsible for the origins of acquired immunity was encoded by a primordial herpes virus, rather than a “RAG transposon.” A subsequent “arms race” between immunity to herpes infection and the immune system obscured primary amino acid similarities between herpes and immune system proteins but preserved regulatory, structural and functional similarities between the respective recombinase proteins. In support of this hypothesis, evidence is reviewed from previously published data that a modern herpes virus protein family with properties of a viral recombinase is co-regulated with both RAG-1 and RAG-2 by closely linked cis-acting co-regulatory sequences. Structural and functional similarity is also reviewed between the putative herpes recombinase and both the DDE site of the RAG-1 protein and another DDE/RNAse H family nuclease, the Argonaute protein component of RISC (RNA induced silencing complex). Conclusions/Significance A “co-regulatory” model of the origins of V(D)J recombination and the acquired immune system can account for the observed linked genomic structure of RAG-1 and RAG-2 in non-vertebrate organisms such as the sea urchin that lack an acquired immune system and V(D)J recombination. Initially the regulated expression of a viral recombinase in immune cells may have been positively selected by its ability to stimulate innate immunity to herpes virus infection rather than V(D)J recombination. Unlike the “RAG-transposon” hypothesis, the proposed model can be readily tested by comparative functional analysis of herpes virus replication and V(D)J recombination. PMID:19492059
Gene Ontology annotation of the rice blast fungus, Magnaporthe oryzae

PubMed Central

Meng, Shaowu; Brown, Douglas E; Ebbole, Daniel J; Torto-Alalibo, Trudy; Oh, Yeon Yee; Deng, Jixin; Mitchell, Thomas K; Dean, Ralph A

2009-01-01

Background Magnaporthe oryzae, the causal agent of blast disease of rice, causes the most destructive disease of rice worldwide. The genome of this fungal pathogen has been sequenced and an automated annotation has recently been updated to Version 6. However, a comprehensive manual curation remains to be performed. Gene Ontology (GO) annotation is a valuable means of assigning functional information using standardized vocabulary. We report an overview of the GO annotation for Version 5 of M. oryzae genome assembly. Methods A similarity-based (i.e., computational) GO annotation with manual review was conducted, which was then integrated with a literature-based GO annotation with computational assistance. For similarity-based GO annotation a stringent reciprocal best hits method was used to identify similarity between predicted proteins of M. oryzae and GO proteins from multiple organisms with published associations to GO terms. Significant alignment pairs were manually reviewed. Functional assignments were further cross-validated with manually reviewed data, conserved domains, or data determined by wet lab experiments. Additionally, biological appropriateness of the functional assignments was manually checked. Results In total, 6,286 proteins received GO term assignment via the homology-based annotation, including 2,870 hypothetical proteins. Literature-based experimental evidence, such as microarray, MPSS, T-DNA insertion mutation, or gene knockout mutation, resulted in 2,810 proteins being annotated with GO terms. Of these, 1,673 proteins were annotated with new terms developed for Plant-Associated Microbe Gene Ontology (PAMGO). In addition, 67 experiment-determined secreted proteins were annotated with PAMGO terms. Integration of the two data sets resulted in 7,412 proteins (57%) being annotated with 1,957 distinct and specific GO terms. Unannotated proteins were assigned to the 3 root terms. The Version 5 GO annotation is publicly queryable via the GO site. Additionally, the genome of M. oryzae is constantly being refined and updated as new information is incorporated. For the latest GO annotation of Version 6 genome, please visit our website. The preliminary GO annotation of Version 6 genome is placed at a local MySQL database that is publicly queryable via a user-friendly interface Adhoc Query System. Conclusion Our analysis provides comprehensive and robust GO annotations of the M. oryzae genome assemblies that will be solid foundations for further functional interrogation of M. oryzae. PMID:19278556
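The reciprocal best hits step named in the Methods can be sketched in a few lines: a pair is kept only when each protein is the other's top-scoring match. The identifiers and alignment scores below are made up for illustration, and the abstract does not state which score or cutoff was used, so this is only a minimal outline of the general technique.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: dict mapping (query, subject) to an alignment score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(forward, reverse):
    """Keep pairs that are each other's best hit in both search directions."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted((q, s) for q, s in fwd.items() if rev.get(s) == q)

# Hypothetical scores: fungal proteins searched against GO-annotated proteins...
forward = {
    ("MGG_1", "GO_a"): 250, ("MGG_1", "GO_b"): 90,
    ("MGG_2", "GO_b"): 120, ("MGG_3", "GO_a"): 200,
}
# ...and the same GO-annotated proteins searched back against the fungal set.
reverse = {
    ("GO_a", "MGG_1"): 245, ("GO_a", "MGG_3"): 190,
    ("GO_b", "MGG_1"): 95, ("GO_b", "MGG_2"): 118,
}

print(reciprocal_best_hits(forward, reverse))
# [('MGG_1', 'GO_a'), ('MGG_2', 'GO_b')]
```

Here MGG_3 is dropped because its best hit, GO_a, prefers MGG_1 in the reverse search; this asymmetry filter is what makes the method stringent.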

Structural and functional analysis of PucM, a hydrolase in the ureide pathway and a member of the transthyretin-related protein family

PubMed Central

Jung, Du-Kyo; Lee, Youra; Park, Sung Goo; Park, Byoung Chul; Kim, Ghyung-Hwa; Rhee, Sangkee

2006-01-01

The ureide pathway, which produces ureides from uric acid, is an essential purine catabolic process for storing and transporting the nitrogen fixed in leguminous plants and some bacteria. PucM from Bacillus subtilis was recently characterized and found to catalyze the second reaction of the pathway, hydrolyzing 5-hydroxyisourate (HIU), a product of uricase in the first step. PucM has 121 amino acid residues and shows high sequence similarity to the functionally unrelated protein transthyretin (TTR), a thyroid hormone-binding protein. Therefore, PucM belongs to the TTR-related proteins (TRP) family. The crystal structures of PucM at 2.0 Å and its complexes with the substrate analogs 8-azaxanthine and 5,6-diaminouracil reveal that even with their overall structure similarity, homotetrameric PucM and TTR are completely different, both in their electrostatic potential and in the size of the active sites located at the dimeric interface. Nevertheless, the absolutely conserved residues across the TRP family, including His-14, Arg-49, His-105, and the C-terminal Tyr-118–Arg-119–Gly-120–Ser-121, indeed form the active site of PucM. Based on the results of site-directed mutagenesis of these residues, we propose a possible mechanism for HIU hydrolysis. The PucM structure determined for the TRP family leads to the conclusion that diverse members of the TRP family would function similarly to PucM as HIU hydrolase. PMID:16782815
Geomfinder: a multi-feature identifier of similar three-dimensional protein patterns: a ligand-independent approach.

PubMed

Núñez-Vivanco, Gabriel; Valdés-Jiménez, Alejandro; Besoaín, Felipe; Reyes-Parada, Miguel

2016-01-01

Since the structure of proteins is more conserved than the sequence, the identification of conserved three-dimensional (3D) patterns among a set of proteins can be important for protein function prediction, protein clustering, drug discovery and the establishment of evolutionary relationships. Thus, several computational applications to identify, describe and compare 3D patterns (or motifs) have been developed. Often, these tools consider a 3D pattern as that described by the residues surrounding co-crystallized/docked ligands available from X-ray crystal structures or homology models. Nevertheless, many of the protein structures stored in public databases do not provide information about the location and characteristics of ligand binding sites and/or other important 3D patterns such as allosteric sites, enzyme-cofactor interaction motifs, etc. This makes it necessary to develop new ligand-independent methods to search and compare 3D patterns in all available protein structures. Here we introduce Geomfinder, an intuitive, flexible, alignment-free and ligand-independent web server for detailed estimation of similarities between all pairs of 3D patterns detected in any two given protein structures. We used around 1100 protein structures to form pairs of proteins which were assessed with Geomfinder. In these analyses each protein was considered in only one pair (e.g. in a subset of 100 different proteins, 50 pairs of proteins can be defined). Thus: (a) Geomfinder detected identical pairs of 3D patterns in a series of monoamine oxidase-B structures, which corresponded to the effectively similar ligand binding sites at these proteins; (b) we identified structural similarities among pairs of protein structures which are targets of compounds such as acarbose, benzamidine, adenosine triphosphate and pyridoxal phosphate; these similar 3D patterns are not detected using sequence-based methods; (c) the detailed evaluation of three specific cases showed the versatility of Geomfinder, which was able to discriminate between similar and different 3D patterns related to binding sites of common substrates in a range of diverse proteins. Geomfinder allows detecting similar 3D patterns between any pair of protein structures, regardless of the divergence among their amino acid sequences. Although the software is not intended for simultaneous multiple comparisons in a large number of proteins, it can be particularly useful in cases such as the structure-based design of multitarget drugs, where a detailed analysis of 3D pattern similarities between a few selected protein targets is essential.
PROFESS: a PROtein Function, Evolution, Structure and Sequence database

PubMed Central

Triplet, Thomas; Shortridge, Matthew D.; Griep, Mark A.; Stark, Jaime L.; Powers, Robert; Revesz, Peter

2010-01-01

The proliferation of biological databases and the easy access enabled by the Internet is having a beneficial impact on biological sciences and transforming the way research is conducted. There are ∼1100 molecular biology databases dispersed throughout the Internet. To assist in the functional, structural and evolutionary analysis of the abundant number of novel proteins continually identified from whole-genome sequencing, we introduce the PROFESS (PROtein Function, Evolution, Structure and Sequence) database. Our database is designed to be versatile and expandable and will not confine analysis to a pre-existing set of data relationships. A fundamental component of this approach is the development of an intuitive query system that incorporates a variety of similarity functions capable of generating data relationships not conceived during the creation of the database. The utility of PROFESS is demonstrated by the analysis of the structural drift of homologous proteins and the identification of potential pancreatic cancer therapeutic targets based on the observation of protein–protein interaction networks. Database URL: http://cse.unl.edu/~profess/ PMID:20624718
SANSparallel: interactive homology search against Uniprot

PubMed Central

Somervuo, Panu; Holm, Liisa

2015-01-01

Proteins evolve by mutations and natural selection. The network of sequence similarities is a rich source for mining homologous relationships that inform on protein structure and function. There are many servers available to browse the network of homology relationships but one has to wait up to a minute for results. The SANSparallel webserver provides protein sequence database searches with immediate response and professional alignment visualization by third-party software. The output is a list, pairwise alignment or stacked alignment of sequence-similar proteins from Uniprot, UniRef90/50, Swissprot or Protein Data Bank. The stacked alignments are viewed in Jalview or as sequence logos. The database search uses the suffix array neighborhood search (SANS) method, which has been re-implemented as a client-server, improved and parallelized. The method is extremely fast and as sensitive as BLAST above 50% sequence identity. Benchmarks show that the method is highly competitive compared to previously published fast database search programs: UBLAST, DIAMOND, LAST, LAMBDA, RAPSEARCH2 and BLAT. The web server can be accessed interactively or programmatically at http://ekhidna2.biocenter.helsinki.fi/cgi-bin/sans/sans.cgi. It can be used to make protein functional annotation pipelines more efficient, and it is useful in interactive exploration of the detailed evidence supporting the annotation of particular proteins of interest. PMID:25855811
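The general idea behind a suffix array neighborhood search can be pictured with a toy sketch: database suffixes are kept in sorted order, each query suffix is located in that order, and the database sequences owning the nearby suffixes receive votes. This is not the SANSparallel implementation; the window size, the voting scheme, the sequences and all names are assumptions for illustration, and storing full suffix strings as done here would not scale to a real database.

```python
import bisect
from collections import Counter

def build_suffix_list(db):
    """Sorted list of (suffix, sequence_id) for every suffix in the database."""
    return sorted(
        (seq[i:], name) for name, seq in db.items() for i in range(len(seq))
    )

def neighborhood_votes(suffixes, query, window=2):
    """Vote for sequences owning suffixes adjacent to each query suffix."""
    keys = [suffix for suffix, _ in suffixes]
    votes = Counter()
    for i in range(len(query)):
        # Position where this query suffix would be inserted in sorted order.
        pos = bisect.bisect_left(keys, query[i:])
        # Collect up to `window` neighbours on each side of that position.
        for _, name in suffixes[max(0, pos - window): pos + window]:
            votes[name] += 1
    return votes

# Hypothetical two-sequence database and a query differing from p1 at one residue.
db = {"p1": "MKTAYIAK", "p2": "GGSGGSGG"}
suffixes = build_suffix_list(db)
votes = neighborhood_votes(suffixes, "MKTAYIAR")
print(votes.most_common(1)[0][0])   # p1
```

Because the lookup is a binary search in a presorted list rather than an alignment against every sequence, the cost per query suffix is logarithmic in database size, which gives a rough intuition for the speed reported in the abstract.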
Isolation and characterization of a dual function protein from Allium sativum bulbs which exhibits proteolytic and hemagglutinating activities.

PubMed

Parisi, Mónica G; Moreno, Silvia; Fernández, Graciela

2008-04-01

A dual function protein was isolated from Allium sativum bulbs and was characterized. The protein had a molecular mass of 25-26 kDa under non-reducing conditions, whereas two polypeptide chains of 12.5 ± 0.5 kDa were observed under reducing conditions. E-64 and leupeptin inhibited the proteolytic activity of the protein, which exhibited characteristics similar to cysteine peptidase. The enzyme exhibited substrate specificity and hydrolyzed natural substrates such as α-casein (Km: 23.0 µM), azocasein, haemoglobin and gelatin. It also showed a high affinity for synthetic peptides such as Cbz-Ala-Arg-Arg-OMe-β-Nam (Km: 55.24 µM, kcat: 0.92 s⁻¹). The cysteine peptidase activity showed a remarkable stability after incubation at moderate temperatures (40-50 °C) over a pH range of 5.5-6.5. The N-terminus of the protein displayed a 100% sequence similarity to the sequences of a mannose-binding lectin isolated from garlic bulbs. Moreover, the purified protein was retained in the chromatographic column when Con-A Sepharose affinity chromatography was performed and the protein was able to agglutinate trypsin-treated rabbit red cells. Therefore, our results indicate the presence of an additional cysteine peptidase activity on a lectin previously described.
fusionDB: assessing microbial diversity and environmental preferences via functional similarity networks

PubMed Central

Zhu, Chengsheng; Miller, Maximilian

2018-01-01

Microbial functional diversification is driven by environmental factors, i.e. microorganisms inhabiting the same environmental niche tend to be more functionally similar than those from different environments. In some cases, even closely phylogenetically related microbes differ more across environments than across taxa. While microbial similarities are often reported in terms of taxonomic relationships, no existing databases directly link microbial functions to the environment. We previously developed a method for comparing microbial functional similarities on the basis of proteins translated from their sequenced genomes. Here, we describe fusionDB, a novel database that uses our functional data to represent 1374 taxonomically distinct bacteria annotated with available metadata: habitat/niche, preferred temperature, and oxygen use. Each microbe is encoded as a set of functions represented by its proteome and individual microbes are connected via common functions. Users can search fusionDB via combinations of organism names and metadata. Moreover, the web interface allows mapping new microbial genomes to the functional spectrum of reference bacteria, rendering interactive similarity networks that highlight shared functionality. fusionDB provides a fast means of comparing microbes, identifying potential horizontal gene transfer events, and highlighting key environment-specific functionality. PMID:29112720
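The representation described above, with each microbe encoded as a set of functions and microbes linked through the functions they share, can be illustrated with a minimal sketch. The organism names, function labels, and the Jaccard index used as the edge weight are assumptions made for this example; the abstract does not say which similarity measure fusionDB uses.

```python
from itertools import combinations

def shared_function_network(microbes, min_similarity=0.0):
    """Build edges between microbes that share at least one function.

    microbes: dict mapping organism name -> set of function labels
              (hypothetical labels, not fusionDB identifiers).
    Returns a list of (org_a, org_b, n_shared, jaccard) tuples.
    """
    edges = []
    for a, b in combinations(sorted(microbes), 2):
        shared = microbes[a] & microbes[b]
        if not shared:
            continue  # no common function, no edge
        # Jaccard index is an assumed weight, chosen for illustration only
        jaccard = len(shared) / len(microbes[a] | microbes[b])
        if jaccard >= min_similarity:
            edges.append((a, b, len(shared), jaccard))
    return edges

example = {
    "orgA": {"f1", "f2", "f3"},
    "orgB": {"f2", "f3", "f4"},
    "orgC": {"f9"},
}
print(shared_function_network(example))
```

In this toy input only orgA and orgB share functions, so they form the single edge of the network.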
Erwinia amylovora effector protein Eop1 suppresses PAMP-triggered immunity in Malus

USDA-ARS's Scientific Manuscript database

Erwinia amylovora (Ea) utilizes a type three secretion system (T3SS) to deliver effector proteins into plant host cells. Several Ea effectors have been identified based on their sequence similarity to plant and animal bacterial pathogen effectors; however, the function of the majority of Ea effecto...
Comparison of the Heme Iron Utilization Systems of Pathogenic Vibrios

PubMed Central

O’Malley, S. M.; Mouton, S. L.; Occhino, D. A.; Deanda, M. T.; Rashidi, J. R.; Fuson, K. L.; Rashidi, C. E.; Mora, M. Y.; Payne, S. M.; Henderson, D. P.

1999-01-01

Vibrio alginolyticus, Vibrio fluvialis, and Vibrio parahaemolyticus utilized heme and hemoglobin as iron sources and contained chromosomal DNA similar to several Vibrio cholerae heme iron utilization genes. A V. parahaemolyticus gene that performed the function of V. cholerae hutA was isolated. A portion of the tonB1 locus of V. parahaemolyticus was sequenced and found to encode proteins similar in amino acid sequence to V. cholerae HutW, TonB1, and ExbB1. A recombinant plasmid containing the V. cholerae tonB1 and exbB1D1 genes complemented a V. alginolyticus heme utilization mutant. These data suggest that the heme iron utilization systems of the pathogenic vibrios tested, particularly V. parahaemolyticus and V. alginolyticus, are similar at the DNA level, the functional level, and, in the case of V. parahaemolyticus, the amino acid sequence or protein level to that of V. cholerae. PMID:10348876
Mining for class-specific motifs in protein sequence classification

PubMed Central

2013-01-01

Background: In protein sequence classification, identification of the sequence motifs or n-grams that can precisely discriminate between classes is a more interesting scientific question than the classification itself. A number of classification methods aim at accurate classification but fail to explain which sequence features indeed contribute to the accuracy. We hypothesize that sequences in lower denominations (n-grams) can be used to explore the sequence landscape and to identify class-specific motifs that discriminate between classes during classification. Discriminative n-grams are short peptide sequences that are highly frequent in one class but are either minimally present or absent in other classes. In this study, we present a new substitution-based scoring function for identifying discriminative n-grams that are highly specific to a class. Results: We present a scoring function based on discriminative n-grams that can effectively discriminate between classes. The scoring function, initially, harvests the entire set of 4- to 8-grams from the protein sequences of different classes in the dataset. Similar n-grams of the same size are combined to form new n-grams, where the similarity is defined by positive amino acid substitution scores in the BLOSUM62 matrix. Substitution has resulted in a large increase in the number of discriminatory n-grams harvested. Due to the unbalanced nature of the dataset, the frequencies of the n-grams are normalized using a dampening factor, which gives more weight to the n-grams that appear in fewer classes and vice versa. After the n-grams are normalized, the scoring function identifies discriminative 4- to 8-grams for each class that are frequent enough to be above a selection threshold. By mapping these discriminative n-grams back to the protein sequences, we obtained contiguous n-grams that represent short class-specific motifs in protein sequences. Our method fared well compared to an existing motif finding method known as Wordspy. We have validated our enriched set of class-specific motifs against the functionally important motifs obtained from the NLSdb, Prosite and ELM databases. We demonstrate that this method is very generic and can thus be widely applied to detect class-specific motifs in many protein sequence classification tasks. Conclusion: The proposed scoring function and methodology are able to identify class-specific motifs using discriminative n-grams derived from the protein sequences. The implementation of amino acid substitution scores for similarity detection, and the dampening factor to normalize the unbalanced datasets, have a significant effect on the performance of the scoring function. Our multipronged validation tests demonstrate that this method can detect class-specific motifs from a wide variety of protein sequence classes with a potential application to detecting proteome-specific motifs of different organisms. PMID:23496846
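The harvesting, dampening and thresholding steps described in this abstract can be sketched roughly as follows. This is a simplified illustration and not the authors' scoring function: the BLOSUM62-based merging of similar n-grams is omitted, the dampening factor is an assumed logarithmic weight (in the spirit of inverse document frequency), and the class names, sequences and threshold are hypothetical.

```python
import math
from collections import Counter

def harvest_ngrams(seq, n_min=4, n_max=8):
    """Return every contiguous n-gram of length n_min..n_max found in seq."""
    return [seq[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(seq) - n + 1)]

def discriminative_ngrams(classes, threshold, n_min=4, n_max=8):
    """classes: dict class name -> list of protein sequences.

    Returns dict class name -> {ngram: score}, keeping scores >= threshold.
    """
    # Per-class frequency: fraction of the class's sequences containing the n-gram
    freq = {}
    for name, seqs in classes.items():
        counts = Counter()
        for s in seqs:
            counts.update(set(harvest_ngrams(s, n_min, n_max)))
        freq[name] = {g: c / len(seqs) for g, c in counts.items()}

    # Number of classes in which each n-gram occurs at all
    class_count = Counter(g for per_class in freq.values() for g in per_class)
    n_classes = len(classes)

    result = {}
    for name, per_class in freq.items():
        kept = {}
        for g, p in per_class.items():
            # Assumed dampening: larger weight for n-grams seen in fewer classes
            damp = math.log(1 + n_classes / class_count[g])
            if p * damp >= threshold:
                kept[g] = p * damp
        result[name] = kept
    return result

toy = {"nls": ["PKKKRKV", "APKKKRKVG"], "other": ["MSTAGLLE", "MSTAGKKK"]}
selected = discriminative_ngrams(toy, threshold=1.0)
print(sorted(selected["nls"]))
```

An n-gram that occurs in every sequence of one class and in no other class receives the highest score, while an n-gram shared across classes is damped.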
Systematically Ranking the Tightness of Membrane Association for Peripheral Membrane Proteins (PMPs)*

PubMed Central

Gao, Liyan; Ge, Haitao; Huang, Xiahe; Liu, Kehui; Zhang, Yuanya; Xu, Wu; Wang, Yingchun

2015-01-01

Large-scale quantitative evaluation of the tightness of membrane association for nontransmembrane proteins is important for identifying true peripheral membrane proteins with functional significance. Herein, we simultaneously ranked more than 1000 proteins of the photosynthetic model organism Synechocystis sp. PCC 6803 for their relative tightness of membrane association using a proteomic approach. Using multiple precisely ranked and experimentally verified peripheral subunits of photosynthetic protein complexes as the landmarks, we found that proteins involved in two-component signal transduction systems and transporters are overall tightly associated with the membranes, whereas the associations of ribosomal proteins are much weaker. Moreover, we found that hypothetical proteins containing the same domains generally have similar tightness. This work provided a global view of the structural organization of the membrane proteome with respect to divergent functions, and built the foundation for future investigation of the dynamic membrane proteome reorganization in response to different environmental or internal stimuli. PMID:25505158
Identification and Herc5-mediated ISGylation of novel target proteins.

PubMed

Takeuchi, Tomoharu; Inoue, Satoshi; Yokosawa, Hideyoshi

2006-09-22

ISG15, a protein containing two ubiquitin-like domains, is an interferon-stimulated gene product that functions in antiviral response and is conjugated to various cellular proteins (ISGylation) upon interferon stimulation. ISGylation occurs via a pathway similar to the pathway for ubiquitination that requires the sequential action of E1/E2/E3: the E1 (UBE1L), E2 (UbcH8), and E3 (Efp/Herc5) enzymes for ISGylation have been hitherto identified. In this study, we identified six novel candidate target proteins for ISGylation by a proteomic approach. Four candidate target proteins were demonstrated to be ISGylated in UBE1L- and UbcH8-dependent manners, and ISGylation of the respective target proteins was stimulated by Herc5. In addition, Herc5 was capable of binding with the respective target proteins. Thus, these results suggest that Herc5 functions as a general E3 ligase for protein ISGylation.
DaGO-Fun: tool for Gene Ontology-based functional analysis using term information content measures.

PubMed

Mazandu, Gaston K; Mulder, Nicola J

2013-09-25

The use of Gene Ontology (GO) data in protein analyses has largely contributed to the improved outcomes of these analyses. Several GO semantic similarity measures have been proposed in recent years and provide tools that allow the integration of biological knowledge embedded in the GO structure into different biological analyses. There is a need for a unified tool that provides the scientific community with the opportunity to explore these different GO similarity measure approaches and their biological applications. We have developed DaGO-Fun, an online tool available at http://web.cbio.uct.ac.za/ITGOM, which incorporates many different GO similarity measures for exploring, analyzing and comparing GO terms and proteins within the context of GO. It uses GO data and UniProt proteins with their GO annotations as provided by the Gene Ontology Annotation (GOA) project to precompute GO term information content (IC), enabling rapid response to user queries. The DaGO-Fun online tool presents the advantage of integrating all the relevant IC-based GO similarity measures, including topology- and annotation-based approaches to facilitate effective exploration of these measures, thus enabling users to choose the most relevant approach for their application. Furthermore, this tool includes several biological applications related to GO semantic similarity scores, including the retrieval of genes based on their GO annotations, the clustering of functionally related genes within a set, and term enrichment analysis.
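As an illustration of what an annotation-based, IC-driven similarity measure looks like, the sketch below computes information content as the negative log of a term's annotation frequency and scores two terms by their most informative common ancestor (the Resnik measure, one well-known IC-based approach). The tiny ontology, the term names and the annotation counts are hypothetical, and nothing here is taken from the DaGO-Fun implementation.

```python
import math

# Toy ontology: child -> list of parents (hypothetical names, not real GO IDs)
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "dna_binding": ["binding"],
    "rna_binding": ["binding"],
}

# Hypothetical number of proteins annotated to each term or its descendants
ANNOTATED = {"root": 100, "binding": 40, "catalysis": 60,
             "dna_binding": 10, "rna_binding": 20}

def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def information_content(term):
    """IC(t) = -log p(t), with p(t) the annotation frequency of t."""
    return -math.log(ANNOTATED[term] / ANNOTATED["root"])

def resnik_similarity(t1, t2):
    """IC of the most informative ancestor shared by t1 and t2."""
    return max(information_content(t) for t in ancestors(t1) & ancestors(t2))

print(resnik_similarity("dna_binding", "rna_binding"))
```

Precomputing `information_content` for every term once, as the abstract describes for IC values, turns each similarity query into a lookup over shared ancestors.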
Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL): adapting the Partial Phylogenetic Profiling algorithm to scan sequences for signatures that predict protein function

PubMed Central

2010-01-01

Background: Comparative genomics methods such as phylogenetic profiling can mine powerful inferences from inherently noisy biological data sets. We introduce Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL), a method that applies the Partial Phylogenetic Profiling (PPP) approach locally within a protein sequence to discover short sequence signatures associated with functional sites. The approach is based on the basic scoring mechanism employed by PPP, namely the use of binomial distribution statistics to optimize sequence similarity cutoffs during searches of partitioned training sets. Results: Here we illustrate and validate the ability of the SIMBAL method to find functionally relevant short sequence signatures by application to two well-characterized protein families. In the first example, we partitioned a family of ABC permeases using a metabolic background property (urea utilization). Thus, the TRUE set for this family comprised members whose genome of origin encoded a urea utilization system. By moving a sliding window across the sequence of a permease, and searching each subsequence in turn against the full set of partitioned proteins, the method found which local sequence signatures best correlated with the urea utilization trait. Mapping of SIMBAL "hot spots" onto crystal structures of homologous permeases reveals that the significant sites are gating determinants on the cytosolic face rather than, say, docking sites for the substrate-binding protein on the extracellular face. In the second example, we partitioned a protein methyltransferase family using gene proximity as a criterion. In this case, the TRUE set comprised those methyltransferases encoded near the gene for the substrate RF-1. SIMBAL identifies sequence regions that map onto the substrate-binding interface while ignoring regions involved in the methyltransferase reaction mechanism in general. Neither method for training set construction requires any prior experimental characterization. Conclusions: SIMBAL shows that, in functionally divergent protein families, selected short sequences often significantly outperform their full-length parent sequence for making functional predictions by sequence similarity, suggesting avenues for improved functional classifiers. When combined with structural data, SIMBAL affords the ability to localize and model functional sites. PMID:20102603
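The sliding-window scan with binomial scoring can be sketched as follows. This is a toy reconstruction from the abstract alone, not the SIMBAL code: the similarity function is a crude stand-in for a real sequence search, and the sequences, window width and function names are hypothetical. For each window, every observed similarity value is tried as a cutoff, and the cutoff whose hit list is most enriched in TRUE proteins (smallest binomial tail probability) gives the window's score.

```python
from math import comb

def binomial_tail(hits, draws, p):
    """P(X >= hits) for X ~ Binomial(draws, p)."""
    return sum(comb(draws, k) * p**k * (1 - p)**(draws - k)
               for k in range(hits, draws + 1))

def window_similarity(window, target):
    """Stand-in for a sequence search: best count of identical positions
    between the window and any equally long stretch of target."""
    w = len(window)
    return max((sum(a == b for a, b in zip(window, target[i:i + w]))
                for i in range(len(target) - w + 1)), default=0)

def best_cutoff_pvalue(window, training):
    """training: list of (sequence, is_true) pairs (the partitioned set)."""
    scored = [(window_similarity(window, seq), is_true)
              for seq, is_true in training]
    p_true = sum(is_true for _, is_true in scored) / len(scored)
    best = 1.0
    for cutoff in {score for score, _ in scored}:
        # Proteins retrieved at this similarity cutoff
        selected = [is_true for score, is_true in scored if score >= cutoff]
        best = min(best, binomial_tail(sum(selected), len(selected), p_true))
    return best

def scan(query, training, width):
    """Score every window of the query; lower values mark candidate hot spots."""
    return {i: best_cutoff_pvalue(query[i:i + width], training)
            for i in range(len(query) - width + 1)}

training = [("CCHWGDCC", True), ("TTHWGDTT", True),
            ("CCCCCCCC", False), ("TTTTTTTT", False)]
for start, pvalue in scan("AAAAHWGDAAAA", training, 4).items():
    print(start, round(pvalue, 4))
```

Windows overlapping the shared HWGD stretch retrieve only TRUE proteins at a strict cutoff and therefore score lower than flanking windows, which retrieve the whole training set indiscriminately.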
IsoCleft Finder – a web-based tool for the detection and analysis of protein binding-site geometric and chemical similarities

PubMed Central

Najmanovich, Rafael

2013-01-01

IsoCleft Finder is a web-based tool for the detection of local geometric and chemical similarities between potential small-molecule binding cavities and a non-redundant dataset of ligand-bound known small-molecule binding-sites. The non-redundant dataset developed as part of this study is composed of 7339 entries representing unique Pfam/PDB-ligand (hetero group code) combinations with known levels of cognate ligand similarity. The query cavity can be uploaded by the user or detected automatically by the system using existing PDB entries as well as user-provided structures in PDB format. In all cases, the user can refine the definition of the cavity interactively via a browser-based Jmol 3D molecular visualization interface. Furthermore, users can restrict the search to a subset of the dataset using a cognate-similarity threshold. Local structural similarities are detected using the IsoCleft software and ranked according to two criteria (number of atoms in common and Tanimoto score of local structural similarity) and the associated Z-score and p-value measures of statistical significance. The results, including predicted ligands, target proteins, similarity scores, number of atoms in common, etc., are shown in a powerful interactive graphical interface. This interface permits the visualization of target ligands superimposed on the query cavity and additionally provides a table of pairwise ligand topological similarities. Similarities between top scoring ligands serve as an additional tool to judge the quality of the results obtained. We present several examples where IsoCleft Finder provides useful functional information. IsoCleft Finder results are complementary to existing approaches for the prediction of protein function from structure, rational drug design and x-ray crystallography. IsoCleft Finder can be found at: http://bcb.med.usherbrooke.ca/isocleftfinder. PMID:24555058
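For readers unfamiliar with the second ranking criterion, a Tanimoto score is generally the size of the common part divided by the size of the union. The sketch below applies that general definition to atom counts; the exact quantities IsoCleft feeds into its score are not given in the abstract, so the function signature and the numbers are assumptions for illustration.

```python
def tanimoto(n_query, n_target, n_common):
    """Tanimoto coefficient from atom counts: common / (query + target - common)."""
    if n_common > min(n_query, n_target):
        raise ValueError("common atoms cannot exceed either cavity's atom count")
    union = n_query + n_target - n_common
    return n_common / union if union else 0.0

# Hypothetical cavities: 20 and 30 atoms, 15 of them superimposed
print(round(tanimoto(20, 30, 15), 3))
```

Ranking by both the raw number of common atoms and this normalized score guards against large cavities scoring well merely because of their size.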
Functional diversification and specialization of cytosolic 70-kDa heat shock proteins.

PubMed

McCallister, Chelsea; Siracusa, Matthew C; Shirazi, Farzaneh; Chalkia, Dimitra; Nikolaidis, Nikolas

2015-03-20

A fundamental question in molecular evolution is how protein functional differentiation alters the ability of cells and organisms to cope with stress and survive. To answer this question we used two paralogous Hsp70s from mouse and explored whether these highly similar cytosolic molecular chaperones, which apart from their temporal expression have been considered functionally interchangeable, are differentiated with respect to their lipid-binding function. We demonstrate that the two proteins bind to diverse lipids with different affinities and therefore are functionally specialized. The observed lipid-binding patterns may be related to the ability of both Hsp70s to induce cell death by binding to a particular plasma-membrane lipid, and the potential of only one of them to promote cell survival by binding to a specific lysosomal-membrane lipid. These observations reveal that two seemingly identical proteins differentially modulate cellular adaptation and survival by having acquired specialized functions via sequence divergence. Therefore, this study provides an evolutionary paradigm, where promiscuity, specificity, sub- and neo-functionalization orchestrate one of the most conserved systems in nature, the cellular stress-response.
Function-based classification of carbohydrate-active enzymes by recognition of short, conserved peptide motifs.

PubMed

Busk, Peter Kamp; Lange, Lene

2013-06-01

Functional prediction of carbohydrate-active enzymes is difficult due to low sequence identity. However, similar enzymes often share a few short motifs, e.g., around the active site, even when the overall sequences are very different. To exploit this notion for functional prediction of carbohydrate-active enzymes, we developed a simple algorithm, peptide pattern recognition (PPR), that can divide proteins into groups of sequences that share a set of short conserved sequences. When this method was used on 118 glycoside hydrolase 5 proteins with 9% average pairwise identity and representing four characterized enzymatic functions, 97% of the proteins were sorted into groups correlating with their enzymatic activity. Furthermore, we analyzed 8,138 glycoside hydrolase 13 proteins including 204 experimentally characterized enzymes with 28 different functions. There was a 91% correlation between group and enzyme activity. These results indicate that the function of carbohydrate-active enzymes can be predicted with high precision by finding short, conserved motifs in their sequences. The glycoside hydrolase 61 family is important for fungal biomass conversion, but only a few proteins of this family have been functionally characterized. Interestingly, PPR divided 743 glycoside hydrolase 61 proteins into 16 subfamilies useful for targeted investigation of the function of these proteins and pinpointed three conserved motifs with putative importance for enzyme activity. Furthermore, the conserved sequences were useful for cloning of new, subfamily-specific glycoside hydrolase 61 proteins from 14 fungi. In conclusion, identification of conserved sequence motifs is a new approach to sequence analysis that can predict carbohydrate-active enzyme functions with high precision.
Mutant phenotypes for thousands of bacterial genes of unknown function

DOE PAGES

Price, Morgan N.; Wetmore, Kelly M.; Waters, R. Jordan; ...

2018-05-16

One-third of all protein-coding genes from bacterial genomes cannot be annotated with a function. Here, to investigate the functions of these genes, we present genome-wide mutant fitness data from 32 diverse bacteria across dozens of growth conditions. We identified mutant phenotypes for 11,779 protein-coding genes that had not been annotated with a specific function. Many genes could be associated with a specific condition because the gene affected fitness only in that condition, or with another gene in the same bacterium because they had similar mutant phenotypes. Of the poorly annotated genes, 2,316 had associations that have high confidence because they are conserved in other bacteria. By combining these conserved associations with comparative genomics, we identified putative DNA repair proteins; in addition, we propose specific functions for poorly annotated enzymes and transporters and for uncharacterized protein families. Lastly, our study demonstrates the scalability of microbial genetics and its utility for improving gene annotations.
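Associating a gene with another gene that has a similar mutant phenotype can be illustrated by comparing fitness profiles across conditions. The sketch below uses Pearson correlation as the similarity measure, which is an assumption (the abstract does not name the statistic); the gene names, fitness values and the 0.8 cutoff are likewise hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long numeric profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def similar_phenotype_partners(fitness, gene, min_r=0.8):
    """fitness: dict gene -> list of mutant fitness values, one per condition.

    Returns (partner, r) pairs with correlation >= min_r, strongest first.
    """
    target = fitness[gene]
    pairs = [(other, pearson(target, profile))
             for other, profile in fitness.items() if other != gene]
    return sorted([(g, r) for g, r in pairs if r >= min_r],
                  key=lambda item: -item[1])

fitness = {
    "geneA": [0.0, -2.0, 0.1, -1.9],
    "geneB": [0.1, -2.1, 0.0, -2.0],
    "geneC": [-1.0, 0.0, -1.2, 0.1],
}
print(similar_phenotype_partners(fitness, "geneA"))
```

Here geneA and geneB are impaired in the same two conditions and are paired, whereas geneC shows the opposite pattern and is not.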
Mutant phenotypes for thousands of bacterial genes of unknown function

DOE Office of Scientific and Technical Information (OSTI.GOV)

Price, Morgan N.; Wetmore, Kelly M.; Waters, R. Jordan

One-third of all protein-coding genes from bacterial genomes cannot be annotated with a function. Here, to investigate the functions of these genes, we present genome-wide mutant fitness data from 32 diverse bacteria across dozens of growth conditions. We identified mutant phenotypes for 11,779 protein-coding genes that had not been annotated with a specific function. Many genes could be associated with a specific condition because the gene affected fitness only in that condition, or with another gene in the same bacterium because they had similar mutant phenotypes. Of the poorly annotated genes, 2,316 had associations that have high confidence because they are conserved in other bacteria. By combining these conserved associations with comparative genomics, we identified putative DNA repair proteins; in addition, we propose specific functions for poorly annotated enzymes and transporters and for uncharacterized protein families. Lastly, our study demonstrates the scalability of microbial genetics and its utility for improving gene annotations.
Protein design to understand peptide ligand recognition by tetratricopeptide repeat proteins.

PubMed

Cortajarena, Aitziber L; Kajander, Tommi; Pan, Weilan; Cocco, Melanie J; Regan, Lynne

2004-04-01

Protein design aims to understand the fundamentals of protein structure by creating novel proteins with pre-specified folds. An equally important goal is to understand protein function by creating novel proteins with pre-specified activities. Here we describe the design and characterization of a tetratricopeptide (TPR) protein, which binds to the C-terminal peptide of the eukaryotic chaperone Hsp90. The design emphasizes the importance of both direct, short-range protein-peptide interactions and of long-range electrostatic optimization. We demonstrate that the designed protein binds specifically to the desired peptide and discriminates between it and the similar C-terminal peptide of Hsp70.
An additional function of the rough endoplasmic reticulum protein complex prolyl 3-hydroxylase 1·cartilage-associated protein·cyclophilin B: the CXXXC motif reveals disulfide isomerase activity in vitro.

PubMed

Ishikawa, Yoshihiro; Bächinger, Hans Peter

2013-11-01

Collagen biosynthesis occurs in the rough endoplasmic reticulum, and many molecular chaperones and folding enzymes are involved in this process. The folding mechanism of type I procollagen has been well characterized, and protein disulfide isomerase (PDI) has been suggested as a key player in the formation of the correct disulfide bonds in the noncollagenous carboxyl-terminal and amino-terminal propeptides. Prolyl 3-hydroxylase 1 (P3H1) forms a hetero-trimeric complex with cartilage-associated protein and cyclophilin B (CypB). This complex is a multifunctional complex acting as a prolyl 3-hydroxylase, a peptidyl prolyl cis-trans isomerase, and a molecular chaperone. Two major domains are predicted from the primary sequence of P3H1: an amino-terminal domain and a carboxyl-terminal domain corresponding to the 2-oxoglutarate- and iron-dependent dioxygenase domains similar to the α-subunit of prolyl 4-hydroxylase and lysyl hydroxylases. The amino-terminal domain contains four CXXXC sequence repeats. The primary sequence of cartilage-associated protein is homologous to the amino-terminal domain of P3H1 and also contains four CXXXC sequence repeats. However, the function of the CXXXC sequence repeats is not known. Several publications have reported that short peptides containing a CXC or a CXXC sequence show oxido-reductase activity similar to PDI in vitro. We hypothesize that CXXXC motifs have oxido-reductase activity similar to the CXXC motif in PDI. We have tested the enzyme activities on model substrates in vitro using a GCRALCG peptide and the P3H1 complex. Our results suggest that this complex could function as a disulfide isomerase in the rough endoplasmic reticulum.
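The motif notation used here (CXC, CXXC, CXXXC: two cysteines separated by one, two or three arbitrary residues) maps directly onto a regular expression. The short sketch below locates such motifs in a one-letter amino acid sequence and confirms that the GCRALCG model peptide contains a CXXXC motif; the function name is hypothetical.

```python
import re

def find_cys_motifs(sequence, gap):
    """Return (start_index, motif) for every C-X{gap}-C occurrence.

    A lookahead is used so that overlapping motifs are all reported.
    """
    pattern = re.compile(rf"(?=(C[A-Z]{{{gap}}}C))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]

print(find_cys_motifs("GCRALCG", 3))  # CXXXC in the model peptide
print(find_cys_motifs("GCRALCG", 2))  # no CXXC in the same peptide
```

Running the same function with `gap=3` over a full-length protein sequence would list candidate CXXXC repeats and their positions.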

Comparative Genome Analysis of “Candidatus Phytoplasma australiense” (Subgroup tuf-Australia I; rp-A) and “Ca. Phytoplasma asteris” Strains OY-M and AY-WB

PubMed Central

Tran-Nguyen, L. T. T.; Kube, M.; Schneider, B.; Reinhardt, R.; Gibb, K. S.

2008-01-01

The chromosome sequence of “Candidatus Phytoplasma australiense” (subgroup tuf-Australia I; rp-A), associated with dieback in papaya, Australian grapevine yellows in grapevine, and several other important plant diseases, was determined. The circular chromosome is represented by 879,324 nucleotides, a GC content of 27%, and 839 protein-coding genes. Five hundred two of these protein-coding genes were functionally assigned, while 337 genes were hypothetical proteins with unknown function. Potential mobile units (PMUs) containing clusters of DNA repeats comprised 12.1% of the genome. These PMUs encoded genes involved in DNA replication, repair, and recombination; nucleotide transport and metabolism; translation; and ribosomal structure. Elements with similarities to phage integrases found in these mobile units were difficult to classify, as they were similar to both insertion sequences and bacteriophages. Comparative analysis of “Ca. Phytoplasma australiense” with “Ca. Phytoplasma asteris” strains OY-M and AY-WB showed that the gene order was more conserved between the closely related “Ca. Phytoplasma asteris” strains than to “Ca. Phytoplasma australiense.” Differences observed between “Ca. Phytoplasma australiense” and “Ca. Phytoplasma asteris” strains included the chromosome size (18,693 bp larger than OY-M), a larger number of genes with assigned function, and hypothetical proteins with unknown function. PMID:18359806
Rye B chromosomes encode a functional Argonaute-like protein with in vitro slicer activities similar to its A chromosome paralog.

PubMed

Ma, Wei; Gabriel, Tobias Sebastian; Martis, Mihaela Maria; Gursinsky, Torsten; Schubert, Veit; Vrána, Jan; Doležel, Jaroslav; Grundlach, Heidrun; Altschmied, Lothar; Scholz, Uwe; Himmelbach, Axel; Behrens, Sven-Erik; Banaei-Moghaddam, Ali Mohammad; Houben, Andreas

2017-01-01

B chromosomes (Bs) are supernumerary, dispensable parts of the nuclear genome, which appear in many different species of eukaryote. So far, Bs have been considered to be genetically inert elements without any functional genes. Our comparative transcriptome analysis and the detection of active RNA polymerase II (RNAPII) in the proximity of B chromatin demonstrate that the Bs of rye (Secale cereale) contribute to the transcriptome. In total, 1954 and 1218 B-derived transcripts with an open reading frame were expressed in generative and vegetative tissues, respectively. In addition to B-derived transposable element transcripts, a high percentage of short transcripts without detectable similarity to known proteins and gene fragments from A chromosomes (As) were found, suggesting an ongoing gene erosion process. In vitro analysis of the A- and B-encoded AGO4B protein variants demonstrated that both possess RNA slicer activity. These data demonstrate unambiguously the presence of a functional AGO4B gene on Bs and that these Bs carry both functional protein coding genes and pseudogene copies. Thus, B-encoded genes may provide an additional level of gene control and complexity in combination with their related A-located genes. Hence, physiological effects, associated with the presence of Bs, may partly be explained by the activity of B-located (pseudo)genes. © 2016 IPK Gatersleben. New Phytologist © 2016 New Phytologist Trust.
The Bcr-Abl kinase regulates the actin cytoskeleton via a GADS/Slp-76/Nck1 adaptor protein pathway.

PubMed

Preisinger, Christian; Kolch, Walter

2010-05-01

Bcr-Abl is the transforming principle underlying chronic myelogenous leukaemia (CML). Here, we use a functional interaction proteomics approach to map pathways by which Bcr-Abl regulates defined cellular processes. The results show that Bcr-Abl regulates the actin cytoskeleton and non-apoptotic membrane blebbing via a GADS/Slp-76/Nck1 adaptor protein pathway. The binding of GADS to Bcr-Abl requires Bcr-Abl tyrosine kinase activity and is sensitive to the Bcr-Abl inhibitor imatinib, while the GADS/Slp-76 and Slp-76/Nck interactions are tyrosine phosphorylation independent. All three adaptor proteins co-localize with cortical actin in membrane blebs. Downregulation of each adaptor protein disrupts the actin cytoskeleton and membrane blebbing in a similar fashion and similar to imatinib. These findings highlight the importance of protein interaction dependent adaptor protein pathways in oncogenic kinase signaling. 2010 Elsevier Inc. All rights reserved.
Usher syndrome protein network functions in the retina and their relation to other retinal ciliopathies.

PubMed

Sorusch, Nasrin; Wunderlich, Kirsten; Bauss, Katharina; Nagel-Wolfrum, Kerstin; Wolfrum, Uwe

2014-01-01

The human Usher syndrome (USH) is the most frequent cause of combined hereditary deaf-blindness. USH is genetically and clinically heterogeneous: 15 chromosomal loci are assigned to 3 clinical types, USH1-3. All USH1 and 2 proteins are organized into protein networks by the scaffold proteins harmonin (USH1C), whirlin (USH2D) and SANS (USH1G). This has contributed essentially to our current understanding of the USH protein function in the eye and the ear and explains why defects in proteins of different families cause very similar phenotypes. Ongoing in-depth analyses of USH protein networks in the eye indicated cytoskeletal functions as well as roles in molecular transport processes and ciliary cargo delivery in photoreceptor cells. The analysis of USH protein networks revealed molecular links of USH to other ciliopathies, including non-syndromic inner ear defects and isolated retinal dystrophies but also to kidney diseases and syndromes such as Bardet-Biedl syndrome. These findings provide emerging evidence that USH is a ciliopathy molecularly related to other ciliopathies, which opens an avenue for common therapy strategies to treat these diseases.
Identification and Characterization of Arabidopsis Seed Coat Mucilage Proteins.

PubMed

Tsai, Allen Yi-Lun; Kunieda, Tadashi; Rogalski, Jason; Foster, Leonard J; Ellis, Brian E; Haughn, George W

2017-02-01

Plant cell wall proteins are important regulators of cell wall architecture and function. However, because cell wall proteins are difficult to extract and analyze, they are generally poorly understood. Here, we describe the identification and characterization of proteins integral to the Arabidopsis (Arabidopsis thaliana) seed coat mucilage, a specialized layer of the extracellular matrix composed of plant cell wall carbohydrates that is used as a model for cell wall research. The proteins identified in mucilage include those previously identified by genetic analysis, and several mucilage proteins are reduced in mucilage-deficient mutant seeds, suggesting that these proteins are genuinely associated with the mucilage. Arabidopsis mucilage has both nonadherent and adherent layers. Both layers have similar protein profiles except for proteins involved in lipid metabolism, which are present exclusively in the adherent mucilage. The most abundant mucilage proteins include a family of proteins named TESTA ABUNDANT1 (TBA1) to TBA3; a less abundant fourth homolog was named TBA-LIKE (TBAL). TBA and TBAL transcripts and promoter activities were detected in developing seed coats, and their expression requires seed coat differentiation regulators. TBA proteins are secreted to the mucilage pocket during differentiation. Although reverse genetics failed to identify a function for TBAs/TBAL, the TBA promoters are highly expressed and cell type specific and so should be very useful tools for targeting proteins to the seed coat epidermis. Altogether, these results highlight the mucilage proteome as a model for cell walls in general, as it shares similarities with other cell wall proteomes while also containing mucilage-specific features. © 2017 American Society of Plant Biologists. All Rights Reserved.
Identification and Characterization of Arabidopsis Seed Coat Mucilage Proteins

PubMed Central

Tsai, Allen Yi-Lun; Kunieda, Tadashi; Rogalski, Jason; Foster, Leonard J.; Ellis, Brian E.

2017-01-01

Plant cell wall proteins are important regulators of cell wall architecture and function. However, because cell wall proteins are difficult to extract and analyze, they are generally poorly understood. Here, we describe the identification and characterization of proteins integral to the Arabidopsis (Arabidopsis thaliana) seed coat mucilage, a specialized layer of the extracellular matrix composed of plant cell wall carbohydrates that is used as a model for cell wall research. The proteins identified in mucilage include those previously identified by genetic analysis, and several mucilage proteins are reduced in mucilage-deficient mutant seeds, suggesting that these proteins are genuinely associated with the mucilage. Arabidopsis mucilage has both nonadherent and adherent layers. Both layers have similar protein profiles except for proteins involved in lipid metabolism, which are present exclusively in the adherent mucilage. The most abundant mucilage proteins include a family of proteins named TESTA ABUNDANT1 (TBA1) to TBA3; a less abundant fourth homolog was named TBA-LIKE (TBAL). TBA and TBAL transcripts and promoter activities were detected in developing seed coats, and their expression requires seed coat differentiation regulators. TBA proteins are secreted to the mucilage pocket during differentiation. Although reverse genetics failed to identify a function for TBAs/TBAL, the TBA promoters are highly expressed and cell type specific and so should be very useful tools for targeting proteins to the seed coat epidermis. Altogether, these results highlight the mucilage proteome as a model for cell walls in general, as it shares similarities with other cell wall proteomes while also containing mucilage-specific features. PMID:28003327
Structure of Rhodococcus equi virulence-associated protein B (VapB) reveals an eight-stranded antiparallel β-barrel consisting of two Greek-key motifs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Geerds, Christina; Wohlmann, Jens; Haas, Albert

The structure of VapB, a member of the Vap protein family that is involved in virulence of the bacterial pathogen R. equi, was determined by SAD phasing and reveals an eight-stranded antiparallel β-barrel similar to avidin, suggestive of a binding function. Made up of two Greek-key motifs, the topology of VapB is unusual or even unique. Members of the virulence-associated protein (Vap) family from the pathogen Rhodococcus equi regulate virulence in an unknown manner. They do not share recognizable sequence homology with any protein of known structure. VapB and VapA are normally associated with isolates from pigs and horses, respectively. To contribute to a molecular understanding of Vap function, the crystal structure of a protease-resistant VapB fragment was determined at 1.4 Å resolution. The structure was solved by SAD phasing employing the anomalous signal of one endogenous S atom and two bound Co ions with low occupancy. VapB is an eight-stranded antiparallel β-barrel with a single helix. Structural similarity to avidins suggests a potential binding function. Unlike other eight- or ten-stranded β-barrels found in avidins, bacterial outer membrane proteins, fatty-acid-binding proteins and lysozyme inhibitors, Vaps do not have a next-neighbour arrangement but consist of two Greek-key motifs with strand order 41238567, suggesting an unusual or even unique topology.
Molecular cloning and characterization of human trabeculin-alpha, a giant protein defining a new family of actin-binding proteins.

PubMed

Sun, Y; Zhang, J; Kraeft, S K; Auclair, D; Chang, M S; Liu, Y; Sutherland, R; Salgia, R; Griffin, J D; Ferland, L H; Chen, L B

1999-11-19

We describe the molecular cloning and characterization of a novel giant human cytoplasmic protein, trabeculin-alpha (M(r) = 614,000). Analysis of the deduced amino acid sequence reveals homologies with several putative functional domains, including a pair of alpha-actinin-like actin binding domains; regions of homology to plakins at either end of the giant polypeptide; 29 copies of a spectrin-like motif in the central region of the protein; two potential Ca(2+)-binding EF-hand motifs; and a Ser-rich region containing a repeated GSRX motif. With similarities to both plakins and spectrins, trabeculin-alpha appears to have evolved as a hybrid of these two families of proteins. The functionality of the actin binding domains located near the N terminus was confirmed with an F-actin binding assay using glutathione S-transferase fusion proteins comprising amino acids 9-486 of the deduced peptide. Northern and Western blotting and immunofluorescence studies suggest that trabeculin is ubiquitously expressed and is distributed throughout the cytoplasm, though the protein was found to be greatly up-regulated upon differentiation of myoblasts into myotubes. Finally, the presence of cDNAs similar to, yet distinct from, trabeculin-alpha in both human and mouse suggests that trabeculins may form a new subfamily of giant actin-binding/cytoskeletal cross-linking proteins.
A proteomic analysis of leaf sheaths from rice.

PubMed

Shen, Shihua; Matsubae, Masami; Takao, Toshifumi; Tanaka, Naoki; Komatsu, Setsuko

2002-10-01

The proteins extracted from the leaf sheaths of rice seedlings were separated by 2-D PAGE, and analyzed by Edman sequencing and mass spectrometry, followed by database searching. Image analysis revealed 352 protein spots on 2-D PAGE after staining with Coomassie Brilliant Blue. The amino acid sequences of 44 of 84 proteins were determined; for 31 of these proteins, a clear function could be assigned, whereas for 12 proteins, no function could be assigned. Forty proteins did not yield amino acid sequence information, because they were N-terminally blocked, or the obtained sequences were too short and/or did not give unambiguous results. Fifty-nine proteins were analyzed by mass spectrometry; all of these proteins were identified by matching to the protein database. The amino acid sequences of 19 of 27 proteins analyzed by mass spectrometry were similar to the results of Edman sequencing. These results suggest that 2-D PAGE combined with Edman sequencing and mass spectrometry analysis can be effectively used to identify plant proteins.
Functional metabolite assemblies—a review

NASA Astrophysics Data System (ADS)

Aizen, Ruth; Tao, Kai; Rencus-Lazar, Sigal; Gazit, Ehud

2018-05-01

Metabolites are essential for the normal operation of cells and fulfill various physiological functions. It was recently found that in several metabolic disorders, the associated metabolites could self-assemble to generate amyloid-like structures, similar to canonical protein amyloids that have a role in neurodegenerative disorders. Yet, assemblies with typical amyloid characteristics are also known to have physiological function. In addition, many non-natural proteins and peptides presenting amyloidal properties have been used for the fabrication of functional nanomaterials. Similarly, functional metabolite assemblies are also found in nature, demonstrating various physiological roles. A notable example is the structural color formed by guanine crystals or fluorescent crystals in feline eyes responsible for enhanced night vision. Moreover, some metabolites have been used for the in vitro fabrication of functional materials, such as glycine crystals presenting remarkable piezoelectric properties or indigo films used to assemble organic semi-conductive electronic devices. Therefore, we believe that the study of metabolite assemblies is not only important in order to understand their role in normal physiology and in pathology, but also paves a new route in exploring the fabrication of organic, bio-compatible materials.
Prediction of Protein Structural Classes for Low-Similarity Sequences Based on Consensus Sequence and Segmented PSSM.

PubMed

Liang, Yunyun; Liu, Sanyang; Zhang, Shengli

2015-01-01

Prediction of protein structural classes for low-similarity sequences is useful for understanding fold patterns, regulation, functions, and interactions of proteins. It is well known that feature extraction is significant to prediction of protein structural class and it mainly uses protein primary sequence, predicted secondary structure sequence, and position-specific scoring matrix (PSSM). Currently, prediction solely based on the PSSM has played a key role in improving the prediction accuracy. In this paper, we propose a novel method called CSP-SegPseP-SegACP by fusing consensus sequence (CS), segmented PsePSSM, and segmented autocovariance transformation (ACT) based on PSSM. Three widely used low-similarity datasets (1189, 25PDB, and 640) are adopted in this paper. Then a 700-dimensional (700D) feature vector is constructed and the dimension is decreased to 224D by using principal component analysis (PCA). To verify the performance of our method, rigorous jackknife cross-validation tests are performed on 1189, 25PDB, and 640 datasets. Comparison of our results with the existing PSSM-based methods demonstrates that our method achieves favorable and competitive performance. This will offer an important complement to other PSSM-based methods for prediction of protein structural classes for low-similarity sequences.
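The segmented autocovariance idea mentioned in the abstract above can be illustrated with a minimal sketch. It assumes one common definition of autocovariance (the mean product of mean-centred scores at positions separated by a fixed lag, computed per PSSM column) and an equal-length segmentation; the function names, the toy matrix, and both choices are assumptions for illustration, not the authors' published implementation.

```python
def pssm_autocovariance(pssm, max_lag):
    """Autocovariance features for every PSSM column and every lag 1..max_lag.

    pssm is a list of rows (one row per residue), each row a list of scores.
    Assumed definition: mean over i of (P[i][j] - mean_j) * (P[i+lag][j] - mean_j).
    """
    length = len(pssm)
    n_cols = len(pssm[0])
    if max_lag >= length:
        raise ValueError("max_lag must be smaller than the number of rows")
    means = [sum(row[j] for row in pssm) / length for j in range(n_cols)]
    features = []
    for j in range(n_cols):
        for lag in range(1, max_lag + 1):
            total = sum(
                (pssm[i][j] - means[j]) * (pssm[i + lag][j] - means[j])
                for i in range(length - lag)
            )
            features.append(total / (length - lag))
    return features


def segment(pssm, n_segments):
    """Split the PSSM rows into n_segments contiguous, nearly equal blocks."""
    length = len(pssm)
    bounds = [k * length // n_segments for k in range(n_segments + 1)]
    return [pssm[bounds[k]:bounds[k + 1]] for k in range(n_segments)]


# Hypothetical 6-residue, 2-column toy matrix (a real PSSM has 20 columns).
toy_pssm = [[1, 0], [2, 0], [3, 0], [4, 0], [5, 0], [6, 0]]
whole_features = pssm_autocovariance(toy_pssm, max_lag=1)
segment_features = [pssm_autocovariance(block, max_lag=1) for block in segment(toy_pssm, 2)]
```

Concatenating the per-segment feature lists gives a fixed-length vector regardless of sequence length, which is what makes a later dimension-reduction step such as PCA applicable.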
Proteins with similar architecture exhibit similar large-scale dynamic behavior.

PubMed Central

Keskin, O; Jernigan, R L; Bahar, I

2000-01-01

We have investigated the similarities and differences in the computed dynamic fluctuations exhibited by six members of a protein fold family with a coarse-grained Gaussian network model. Specifically, we consider the cofactor binding fragment of CysB; the lysine/arginine/ornithine-binding protein (LAO); the enzyme porphobilinogen deaminase (PBGD); the ribose-binding protein (RBP); the N-terminal lobe of ovotransferrin in apo-form (apo-OVOT); and the leucine/isoleucine/valine-binding protein (LIVBP). All have domains that resemble a Rossmann fold, but there are also some significant differences. Results indicate that similar global dynamic behavior is preserved for the members of a fold family, and that differences usually occur in regions only where specific function is localized. The present work is a computational demonstration that the scaffold of a protein fold may be utilized for diverse purposes. LAO requires a bound ligand before it conforms to the large-scale fluctuation behavior of the three other members of the family, CysB, PBGD, and RBP, all of which contain a substrate (cofactor) at the active site cleft. The dynamics of the ligand-free enzymes LIVBP and apo-OVOT, on the other hand, concur with that of unliganded LAO. The present results suggest that it is possible to construct structure alignments based on dynamic fluctuation behavior. PMID:10733987
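As a rough illustration of the coarse-grained Gaussian network model named in the abstract above: residues are treated as nodes (commonly placed at the Cα atoms) and any two nodes within a cutoff distance are joined by identical springs, which yields a Kirchhoff (connectivity) matrix. In the model, residue mean-square fluctuations are proportional to the diagonal of the pseudo-inverse of that matrix. The sketch below only builds the matrix; the 7.0 Å cutoff, the function name, and the coordinates are assumptions for illustration and are not taken from the paper.

```python
import math


def kirchhoff_matrix(coords, cutoff=7.0):
    """Build the GNM Kirchhoff matrix from a list of (x, y, z) node positions.

    Off-diagonal entries are -1 for node pairs within the cutoff and 0 otherwise;
    each diagonal entry is the number of contacts of that node.
    """
    n = len(coords)
    gamma = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1
                gamma[i][i] += 1
                gamma[j][j] += 1
    return gamma


# Hypothetical straight chain of four nodes spaced 3.8 Å apart.
chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
gamma = kirchhoff_matrix(chain)
```

With this spacing only sequential neighbours fall inside the cutoff, so the matrix is the graph Laplacian of a simple chain and every row sums to zero.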
Coagulation parameters and platelet function analysis in patients with acromegaly.

PubMed

Colak, A; Yılmaz, H; Temel, Y; Demirpence, M; Simsek, N; Karademirci, İ; Bozkurt, U; Yasar, E

2016-01-01

Acromegaly is associated with increased cardiovascular morbidity and mortality. The data about the evaluation of coagulation and fibrinolysis in acromegalic patients are very limited and, to our knowledge, platelet function analysis has never been investigated. So, we aimed to investigate the levels of protein C, protein S, fibrinogen and antithrombin III, together with platelet function, in patients with acromegaly. Thirty-nine patients with active acromegaly and 35 healthy subjects were included in the study. Plasma glucose and lipid profile, fibrinogen levels, GH and IGF-1 levels and protein C, protein S and antithrombin III activities were measured in all study subjects. Also, platelet function analysis was evaluated with collagen/ADP- and collagen/epinephrine-closure times. Demographic characteristics of the patient and control groups were similar. As expected, fasting blood glucose levels and serum GH and IGF-1 levels were significantly higher in the patient group compared with the control group (p glucose: 0.002, p GH: 0.006, p IGF-1: 0.001, respectively). But lipid parameters were similar between the two groups. While serum fibrinogen and antithrombin III levels were found to be significantly higher in the acromegaly group (p fibrinogen: 0.005 and p antithrombin III: 0.001), protein S and protein C activity values were significantly lower in the patient group (p protein S: 0.001, p protein C: 0.001). Also, significantly enhanced platelet function (measured by collagen/ADP- and collagen/epinephrine-closure times) was demonstrated in acromegaly (p col-ADP: 0.002, p col-epinephrine: 0.002). The results did not change when we excluded six patients with type 2 diabetes in the acromegaly group. There was a negative correlation between serum GH levels and protein S (r: -0.25, p: 0.04) and protein C (r: -0.26, p: 0.04) values. Likewise, there was a negative correlation between IGF-1 levels and protein C values (r: -0.39, p: 0.002), protein S values (r: -0.39, p: 0.001), collagen/ADP-closure times (r: -0.28, p: 0.02) and collagen/epinephrine-closure times (r: -0.26, p: 0.04). Also, we observed a positive correlation between IGF-1 levels and fibrinogen levels (r: 0.31, p: 0.01). Acromegaly was found to be associated with an increased tendency toward coagulation and enhanced platelet activity. This hypercoagulable state might increase the risk for cardiovascular and cerebrovascular events in acromegaly.
Finding functional features in Saccharomyces genomes by phylogenetic footprinting.

PubMed

Cliften, Paul; Sudarsanam, Priya; Desikan, Ashwin; Fulton, Lucinda; Fulton, Bob; Majors, John; Waterston, Robert; Cohen, Barak A; Johnston, Mark

2003-07-04

The sifting and winnowing of DNA sequence that occur during evolution cause nonfunctional sequences to diverge, leaving phylogenetic footprints of functional sequence elements in comparisons of genome sequences. We searched for such footprints among the genome sequences of six Saccharomyces species and identified potentially functional sequences. Comparison of these sequences allowed us to revise the catalog of yeast genes and identify sequence motifs that may be targets of transcriptional regulatory proteins. Some of these conserved sequence motifs reside upstream of genes with similar functional annotations or similar expression patterns or those bound by the same transcription factor and are thus good candidates for functional regulatory sequences.
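The footprinting idea described above, that functional elements stay conserved while nonfunctional sequence diverges, can be sketched as a scan for alignment columns that are identical in every species. This is a simplified illustration only: it assumes the sequences are already aligned, requires perfect identity, and uses a hypothetical minimum block length; the function name and toy alignment are invented for the example and do not reproduce the authors' pipeline.

```python
def conserved_blocks(aligned, min_len=6):
    """Return half-open (start, end) column ranges that are identical and ungapped
    in every aligned sequence and at least min_len columns long."""
    n_cols = len(aligned[0])
    if any(len(seq) != n_cols for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    blocks = []
    start = None
    # Iterate one column past the end so that a block reaching the last column is closed.
    for col in range(n_cols + 1):
        conserved = (
            col < n_cols
            and aligned[0][col] != "-"
            and all(seq[col] == aligned[0][col] for seq in aligned)
        )
        if conserved and start is None:
            start = col
        elif not conserved and start is not None:
            if col - start >= min_len:
                blocks.append((start, col))
            start = None
    return blocks


# Hypothetical three-species alignment of an upstream region.
alignment = [
    "ACGTTGACGTAA",
    "TCGTTGACGTCA",
    "GCGTTGACG-AA",
]
footprints = conserved_blocks(alignment)
motifs = [alignment[0][start:end] for start, end in footprints]
```

Real comparisons would typically tolerate some mismatches and weight conservation by the evolutionary distance between species, which this sketch does not attempt.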
A carrot leucine-rich-repeat protein that inhibits ice recrystallization.

PubMed

Worrall, D; Elias, L; Ashford, D; Smallwood, M; Sidebottom, C; Lillford, P; Telford, J; Holt, C; Bowles, D

1998-10-02

Many organisms adapted to live at subzero temperatures express antifreeze proteins that improve their tolerance to freezing. Although structurally diverse, all antifreeze proteins interact with ice surfaces, depress the freezing temperature of aqueous solutions, and inhibit ice crystal growth. A protein purified from carrot shares these functional features with antifreeze proteins of fish. Expression of the carrot complementary DNA in tobacco resulted in the accumulation of antifreeze activity in the apoplast of plants grown at greenhouse temperatures. The sequence of carrot antifreeze protein is similar to that of polygalacturonase inhibitor proteins and contains leucine-rich repeats.
Distribution and Evolution of Yersinia Leucine-Rich Repeat Proteins

PubMed Central

Hu, Yueming; Huang, He; Hui, Xinjie; Cheng, Xi; White, Aaron P.

2016-01-01

Leucine-rich repeat (LRR) proteins are widely distributed in bacteria, playing important roles in various protein-protein interaction processes. In Yersinia, the well-characterized type III secreted effector YopM also belongs to the LRR protein family and is encoded by virulence plasmids. However, little is known about other LRR members encoded by Yersinia genomes or their evolution. In this study, the Yersinia LRR proteins were comprehensively screened, categorized, and compared. The LRR proteins encoded by chromosomes (LRR1 proteins) appeared to be more similar to each other and different from those encoded by plasmids (LRR2 proteins) with regard to repeat-unit length, amino acid composition profile, and gene expression regulation circuits. LRR1 proteins were also different from LRR2 proteins in that the LRR1 proteins contained an E3 ligase domain (NEL domain) in the C-terminal region or an NEL domain-encoding nucleotide relic in flanking genomic sequences. The LRR1 protein-encoding genes (LRR1 genes) varied dramatically and were categorized into 4 subgroups (a to d), with the LRR1a to -c genes evolving from the same ancestor and LRR1d genes evolving from another ancestor. The consensus and ancestor repeat-unit sequences were inferred for different LRR1 protein subgroups by use of a maximum parsimony modeling strategy. Structural modeling disclosed very similar repeat-unit structures between LRR1 and LRR2 proteins despite the different unit lengths and amino acid compositions. Structural constraints may serve as the driving force to explain the observed mutations in the LRR regions. This study suggests that there may be functional variation and lays the foundation for future experiments investigating the functions of the chromosomally encoded LRR proteins of Yersinia. PMID:27217422
Towards understanding the first genome sequence of a crenarchaeon by genome annotation using clusters of orthologous groups of proteins (COGs).

PubMed

Natale, D A; Shankavaram, U T; Galperin, M Y; Wolf, Y I; Aravind, L; Koonin, E V

2000-01-01

Standard archival sequence databases have not been designed as tools for genome annotation and are far from being optimal for this purpose. We used the database of Clusters of Orthologous Groups of proteins (COGs) to reannotate the genomes of two archaea, Aeropyrum pernix, the first member of the Crenarchaea to be sequenced, and Pyrococcus abyssi. A. pernix and P. abyssi proteins were assigned to COGs using the COGNITOR program; the results were verified on a case-by-case basis and augmented by additional database searches using the PSI-BLAST and TBLASTN programs. Functions were predicted for over 300 proteins from A. pernix, which could not be assigned a function using conventional methods with a conservative sequence similarity threshold, an approximately 50% increase compared to the original annotation. A. pernix shares most of the conserved core of proteins that were previously identified in the Euryarchaeota. Cluster analysis or distance matrix tree construction based on the co-occurrence of genomes in COGs showed that A. pernix forms a distinct group within the archaea, although grouping with the two species of Pyrococci, indicative of similar repertoires of conserved genes, was observed. No indication of a specific relationship between Crenarchaeota and eukaryotes was obtained in these analyses. Several proteins that are conserved in Euryarchaeota and most bacteria are unexpectedly missing in A. pernix, including the entire set of de novo purine biosynthesis enzymes, the GTPase FtsZ (a key component of the bacterial and euryarchaeal cell-division machinery), and the tRNA-specific pseudouridine synthase, previously considered universal. A. pernix is represented in 48 COGs that do not contain any euryarchaeal members. Many of these proteins are TCA cycle and electron transport chain enzymes, reflecting the aerobic lifestyle of A. pernix. 
Special-purpose databases organized on the basis of phylogenetic analysis and carefully curated with respect to known and predicted protein functions provide for a significant improvement in genome annotation. A differential genome display approach helps in a systematic investigation of common and distinct features of gene repertoires and in some cases reveals unexpected connections that may be indicative of functional similarities between phylogenetically distant organisms and of lateral gene exchange.
Towards understanding the first genome sequence of a crenarchaeon by genome annotation using clusters of orthologous groups of proteins (COGs)

PubMed Central

Natale, Darren A; Shankavaram, Uma T; Galperin, Michael Y; Wolf, Yuri I; Aravind, L; Koonin, Eugene V

2000-01-01

Background: Standard archival sequence databases have not been designed as tools for genome annotation and are far from being optimal for this purpose. We used the database of Clusters of Orthologous Groups of proteins (COGs) to reannotate the genomes of two archaea, Aeropyrum pernix, the first member of the Crenarchaea to be sequenced, and Pyrococcus abyssi. Results: A. pernix and P. abyssi proteins were assigned to COGs using the COGNITOR program; the results were verified on a case-by-case basis and augmented by additional database searches using the PSI-BLAST and TBLASTN programs. Functions were predicted for over 300 proteins from A. pernix, which could not be assigned a function using conventional methods with a conservative sequence similarity threshold, an approximately 50% increase compared to the original annotation. A. pernix shares most of the conserved core of proteins that were previously identified in the Euryarchaeota. Cluster analysis or distance matrix tree construction based on the co-occurrence of genomes in COGs showed that A. pernix forms a distinct group within the archaea, although grouping with the two species of Pyrococci, indicative of similar repertoires of conserved genes, was observed. No indication of a specific relationship between Crenarchaeota and eukaryotes was obtained in these analyses. Several proteins that are conserved in Euryarchaeota and most bacteria are unexpectedly missing in A. pernix, including the entire set of de novo purine biosynthesis enzymes, the GTPase FtsZ (a key component of the bacterial and euryarchaeal cell-division machinery), and the tRNA-specific pseudouridine synthase, previously considered universal. A. pernix is represented in 48 COGs that do not contain any euryarchaeal members. Many of these proteins are TCA cycle and electron transport chain enzymes, reflecting the aerobic lifestyle of A. pernix. 
Conclusions: Special-purpose databases organized on the basis of phylogenetic analysis and carefully curated with respect to known and predicted protein functions provide for a significant improvement in genome annotation. A differential genome display approach helps in a systematic investigation of common and distinct features of gene repertoires and in some cases reveals unexpected connections that may be indicative of functional similarities between phylogenetically distant organisms and of lateral gene exchange. PMID:11178258
Phytochemicals perturb membranes and promiscuously alter protein function.

PubMed

Ingólfsson, Helgi I; Thakur, Pratima; Herold, Karl F; Hobart, E Ashley; Ramsey, Nicole B; Periole, Xavier; de Jong, Djurre H; Zwama, Martijn; Yilmaz, Duygu; Hall, Katherine; Maretzky, Thorsten; Hemmings, Hugh C; Blobel, Carl; Marrink, Siewert J; Koçer, Armağan; Sack, Jon T; Andersen, Olaf S

2014-08-15

A wide variety of phytochemicals are consumed for their perceived health benefits. Many of these phytochemicals have been found to alter numerous cell functions, but the mechanisms underlying their biological activity tend to be poorly understood. Phenolic phytochemicals are particularly promiscuous modifiers of membrane protein function, suggesting that some of their actions may be due to a common, membrane bilayer-mediated mechanism. To test whether bilayer perturbation may underlie this diversity of actions, we examined five bioactive phenols reported to have medicinal value: capsaicin from chili peppers, curcumin from turmeric, EGCG from green tea, genistein from soybeans, and resveratrol from grapes. We find that each of these widely consumed phytochemicals alters lipid bilayer properties and the function of diverse membrane proteins. Molecular dynamics simulations show that these phytochemicals modify bilayer properties by localizing to the bilayer/solution interface. Bilayer-modifying propensity was verified using a gramicidin-based assay, and indiscriminate modulation of membrane protein function was demonstrated using four proteins: membrane-anchored metalloproteases, mechanosensitive ion channels, and voltage-dependent potassium and sodium channels. Each protein exhibited similar responses to multiple phytochemicals, consistent with a common, bilayer-mediated mechanism. Our results suggest that many effects of amphiphilic phytochemicals are due to cell membrane perturbations, rather than specific protein binding.
Phytochemicals Perturb Membranes and Promiscuously Alter Protein Function

PubMed Central

2015-01-01

A wide variety of phytochemicals are consumed for their perceived health benefits. Many of these phytochemicals have been found to alter numerous cell functions, but the mechanisms underlying their biological activity tend to be poorly understood. Phenolic phytochemicals are particularly promiscuous modifiers of membrane protein function, suggesting that some of their actions may be due to a common, membrane bilayer-mediated mechanism. To test whether bilayer perturbation may underlie this diversity of actions, we examined five bioactive phenols reported to have medicinal value: capsaicin from chili peppers, curcumin from turmeric, EGCG from green tea, genistein from soybeans, and resveratrol from grapes. We find that each of these widely consumed phytochemicals alters lipid bilayer properties and the function of diverse membrane proteins. Molecular dynamics simulations show that these phytochemicals modify bilayer properties by localizing to the bilayer/solution interface. Bilayer-modifying propensity was verified using a gramicidin-based assay, and indiscriminate modulation of membrane protein function was demonstrated using four proteins: membrane-anchored metalloproteases, mechanosensitive ion channels, and voltage-dependent potassium and sodium channels. Each protein exhibited similar responses to multiple phytochemicals, consistent with a common, bilayer-mediated mechanism. Our results suggest that many effects of amphiphilic phytochemicals are due to cell membrane perturbations, rather than specific protein binding. PMID:24901212

Huntingtin-interacting protein 1 influences worm and mouse presynaptic function and protects Caenorhabditis elegans neurons against mutant polyglutamine toxicity.

PubMed

Parker, J Alex; Metzler, Martina; Georgiou, John; Mage, Marilyne; Roder, John C; Rose, Ann M; Hayden, Michael R; Néri, Christian

2007-10-10

Huntingtin-interacting protein 1 (HIP1) was identified through its interaction with htt (huntingtin), the Huntington's disease (HD) protein. HIP1 is an endocytic protein that influences transport and function of AMPA and NMDA receptors in the brain. However, little is known about its contribution to neuronal dysfunction in HD. We report that the Caenorhabditis elegans HIP1 homolog hipr-1 modulates presynaptic activity and the abundance of synaptobrevin, a protein involved in synaptic vesicle fusion. Presynaptic function was also altered in hippocampal brain slices of HIP1-/- mice demonstrating delayed recovery from synaptic depression and a reduction in paired-pulse facilitation, a form of presynaptic plasticity. Interestingly, neuronal dysfunction in transgenic nematodes expressing mutant N-terminal huntingtin was specifically enhanced by hipr-1 loss of function. A similar effect was observed with several other mutant proteins that are expressed at the synapse and involved in endocytosis, such as unc-11/AP180, unc-26/synaptojanin, and unc-57/endophilin. Thus, HIP1 is involved in presynaptic nerve terminal activity and modulation of mutant polyglutamine-induced neuronal dysfunction. Moreover, synaptic proteins involved in endocytosis may protect neurons against amino acid homopolymer expansion.
African swine fever virus encodes two genes which share significant homology with the two largest subunits of DNA-dependent RNA polymerases.

PubMed Central

Yáñez, R J; Boursnell, M; Nogal, M L; Yuste, L; Viñuela, E

1993-01-01

A random sequencing strategy applied to two large SalI restriction fragments (SB and SD) of the African swine fever virus (ASFV) genome revealed that they might encode proteins similar to the two largest RNA polymerase subunits of eukaryotes, poxviruses and Escherichia coli. After further mapping by dot-blot hybridization, two large open reading frames (ORFs) were completely sequenced. The first ORF (NP1450L) encodes a protein of 1450 amino acids with extensive similarity to the largest subunit of RNA polymerases. The second one (EP1242L) codes for a protein of 1242 amino acids similar to the second largest RNA polymerase subunit. Proteins NP1450L and EP1242L are more similar to the corresponding subunits of eukaryotic RNA polymerase II than to those of vaccinia virus, the prototype poxvirus, which shares many functional characteristics with ASFV. ORFs NP1450L and EP1242L are mainly expressed late in ASFV infection, after the onset of DNA replication. PMID:8506138
LEAping to conclusions: a computational reanalysis of late embryogenesis abundant proteins and their possible roles.

PubMed

Wise, Michael J

2003-10-29

The late embryogenesis abundant (LEA) proteins cover a number of loosely related groups of proteins, originally found in plants but now being found in non-plant species. Their precise function is unknown, though considerable evidence suggests that LEA proteins are involved in desiccation resistance. Using a number of statistically-based bioinformatics tools, the classification of a large set of LEA proteins, covering all Groups, is reexamined together with some previous findings. Searches based on peptide composition return proteins with similar composition to different LEA Groups; keyword clustering is then applied to reveal keywords and phrases suggestive of the Groups' properties. Previous research has suggested that glycine is characteristic of LEA proteins, but it is only highly over-represented in Groups 1 and 2, while alanine, thought characteristic of Group 2, is over-represented in Groups 3, 4 and 6 but under-represented in Groups 1 and 2. However, for LEA Groups 1, 2 and 3 it is shown that glutamine is very significantly over-represented, while cysteine, phenylalanine, isoleucine, leucine and tryptophan are significantly under-represented. There is also evidence that the Group 4 LEA proteins are more appropriately redistributed to Group 2 and Group 3. Similarly, Group 5 is better found among the Group 3 LEA proteins. There is evidence that Group 2 and Group 3 LEA proteins, though distinct, might be related. This relationship is also evident in the overlapping sets of keywords for the two Groups, emphasising alpha-helical structure and, at a larger scale, filaments, all of which fits well with experimental evidence that proteins from both Groups are natively unstructured, but become structured under stress conditions. The keywords support localisation of LEA proteins both in the nucleus and associated with the cytoskeleton, and a mode of action similar to chaperones, perhaps the cold shock chaperones, via a role in DNA-binding.
In general, non-globular and low-complexity proteins, such as the LEA proteins, pose particular challenges in determining their functions and modes of action. Rather than masking off and ignoring low-complexity domains, novel tools and tool combinations are needed which are capable of analysing such proteins in their entirety.
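The notion of an amino acid being over- or under-represented in a protein group, as used in the abstract above, can be illustrated with a descriptive sketch that compares residue frequencies in a group against a background set. It computes only a log2 frequency ratio with a small pseudocount; it does not perform the statistical testing reported in the paper, and the function names, pseudocount, and toy sequences are assumptions for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequences):
    """Fraction of each of the 20 standard amino acids across the given sequences."""
    counts = Counter(
        residue
        for seq in sequences
        for residue in seq.upper()
        if residue in AMINO_ACIDS
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def log2_enrichment(group, background, pseudocount=1e-3):
    """log2 ratio of group to background frequency; positive means over-represented."""
    group_freq = composition(group)
    background_freq = composition(background)
    return {
        aa: math.log2((group_freq[aa] + pseudocount) / (background_freq[aa] + pseudocount))
        for aa in AMINO_ACIDS
    }


# Hypothetical toy sequences, far shorter than real proteins.
toy_group = ["GGGQ"]
toy_background = ["GQAC"]
enrichment = log2_enrichment(toy_group, toy_background)
```

In this toy case glycine comes out over-represented, glutamine neutral, and alanine under-represented; deciding whether such differences are significant on real data would need an appropriate statistical test.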
Single Honeybee Silk Protein Mimics Properties of Multi-Protein Silk

PubMed Central

Sutherland, Tara D.; Church, Jeffrey S.; Hu, Xiao; Huson, Mickey G.; Kaplan, David L.; Weisman, Sarah

2011-01-01

Honeybee silk is composed of four fibrous proteins that, unlike other silks, are readily synthesized at full-length and high yield. The four silk genes have been conserved for over 150 million years in all investigated bee, ant and hornet species, implying a distinct functional role for each protein. However, the amino acid composition and molecular architecture of the proteins are similar, suggesting functional redundancy. In this study we compare materials generated from a single honeybee silk protein to materials containing all four recombinant proteins or to natural honeybee silk. We analyse solution conformation by dynamic light scattering and circular dichroism, solid state structure by Fourier Transform Infrared spectroscopy and Raman spectroscopy, and fiber tensile properties by stress-strain analysis. The results demonstrate that fibers artificially generated from a single recombinant silk protein can reproduce the structural and mechanical properties of the natural silk. The importance of the four protein complex found in natural silk may lie in biological silk storage or hierarchical self-assembly. The finding that the functional properties of the mature material can be achieved with a single protein greatly simplifies the route to production for artificial honeybee silk. PMID:21311767
Lamin-like analogues in plants: the characterization of NMCP1 in Allium cepa

PubMed Central

Moreno Díaz de la Espina, Susana

2013-01-01

The nucleoskeleton of plants contains a peripheral lamina (also called plamina) and, even though lamins are absent in plants, their roles are still fulfilled in plant nuclei. One of the most intriguing topics in plant biology concerns the identity of lamin protein analogues in plants. Good candidates to play lamin functions in plants are the members of the NMCP (nuclear matrix constituent protein) family, which exhibit the typical tripartite structure of lamins. This paper describes a bioinformatics analysis and classification of the NMCP family based on phylogenetic relationships, sequence similarity and the distribution of conserved regions in 76 homologues. In addition, NMCP1 in the monocot Allium cepa was characterized by its sequence and structure, biochemical properties, and subnuclear distribution, and alterations in its expression throughout the root were identified. The results demonstrate that these proteins exhibit many similarities to lamins (structural organization, conserved regions, subnuclear distribution, and solubility) and that they may fulfil the functions of lamins in plants. These findings significantly advance understanding of the structural proteins of the plant lamina and nucleoskeleton and provide a basis for further investigation of the protein networks forming these structures. PMID:23378381
Lamin-like analogues in plants: the characterization of NMCP1 in Allium cepa.

PubMed

Ciska, Malgorzata; Masuda, Kiyoshi; Moreno Díaz de la Espina, Susana

2013-04-01

The nucleoskeleton of plants contains a peripheral lamina (also called plamina) and, even though lamins are absent in plants, their roles are still fulfilled in plant nuclei. One of the most intriguing topics in plant biology concerns the identity of lamin protein analogues in plants. Good candidates to play lamin functions in plants are the members of the NMCP (nuclear matrix constituent protein) family, which exhibit the typical tripartite structure of lamins. This paper describes a bioinformatics analysis and classification of the NMCP family based on phylogenetic relationships, sequence similarity and the distribution of conserved regions in 76 homologues. In addition, NMCP1 in the monocot Allium cepa was characterized by its sequence and structure, biochemical properties, and subnuclear distribution, and alterations in its expression throughout the root were identified. The results demonstrate that these proteins exhibit many similarities to lamins (structural organization, conserved regions, subnuclear distribution, and solubility) and that they may fulfil the functions of lamins in plants. These findings significantly advance understanding of the structural proteins of the plant lamina and nucleoskeleton and provide a basis for further investigation of the protein networks forming these structures.
Crystal structure of secretory abundant heat soluble protein 4 from one of the toughest “water bears” micro‐animals Ramazzottius Varieornatus

PubMed Central

Fukuda, Yohta

2018-01-01

Though anhydrobiotic tardigrades (micro‐animals also known as water bears) possess many genes of secretory abundant heat soluble (SAHS) proteins unique to Tardigrada, their functions are unknown. A previous crystallographic study revealed that a SAHS protein (RvSAHS1) from one of the toughest tardigrades, Ramazzottius varieornatus, has a β‐barrel architecture similar to fatty acid binding proteins (FABPs) and two putative ligand binding sites (LBS1 and LBS2) where fatty acids can bind. However, some SAHS proteins such as RvSAHS4 have different sets of amino acid residues at LBS1 and LBS2, implying that they prefer other ligands and have different functions. Here RvSAHS4 was crystallized and analyzed under a condition similar to that for RvSAHS1. There was no electron density corresponding to a fatty acid at LBS1 of RvSAHS4, where a putative fatty acid was observed in RvSAHS1. Instead, LBS2 of RvSAHS4, which was composed of uncharged residues, captured a putative polyethylene glycol molecule. These results suggest that RvSAHS4 mainly uses LBS2 for the binding of uncharged molecules. PMID:29493034
The petunia AGL6 gene has a SEPALLATA-like function in floral patterning.

PubMed

Rijpkema, Anneke S; Zethof, Jan; Gerats, Tom; Vandenbussche, Michiel

2009-10-01

SEPALLATA (SEP) MADS-box genes are required for the regulation of floral meristem determinacy and the specification of sepals, petals, stamens, carpels and ovules, specifically in angiosperms. The SEP subfamily is closely related to the AGAMOUS LIKE6 (AGL6) and SQUAMOSA (SQUA) subfamilies. So far, of these three groups only AGL6-like genes have been found in extant gymnosperms. AGL6 genes are more similar to SEP than to SQUA genes, both in sequence and in expression pattern. Despite the ancestry and wide distribution of AGL6-like MADS-box genes, not a single loss-of-function mutant exhibiting a clear phenotype has yet been reported; consequently the function of AGL6-like genes has remained elusive. Here, we characterize the Petunia hybrida AGL6 (PhAGL6, formerly called PETUNIA MADS BOX GENE4/pMADS4) gene, and show that it functions redundantly with the SEP genes FLORAL BINDING PROTEIN2 (FBP2) and FBP5 in petal and anther development. Moreover, expression analysis suggests a function for PhAGL6 in ovary and ovule development. The PhAGL6 and FBP2 proteins interact in in vitro experiments overall with the same partners, indicating that the two proteins are biochemically quite similar. It will be interesting to determine the functions of AGL6-like genes of other species, especially those of gymnosperms.
Classification of Phylogenetic Profiles for Protein Function Prediction: An SVM Approach

NASA Astrophysics Data System (ADS)

Kotaru, Appala Raju; Joshi, Ramesh C.

Predicting the function of an uncharacterized protein is a major challenge in the post-genomic era due to the problem's complexity and scale. Knowledge of protein function is a crucial link in the development of new drugs, better crops, and even biochemicals such as biofuels. Recently, numerous high-throughput experimental procedures have been invented to investigate the mechanisms leading to the accomplishment of a protein's function, and the phylogenetic profile is one of them. A phylogenetic profile is a way of representing a protein that encodes its evolutionary history. In this paper we propose a method for classifying phylogenetic profiles using a supervised machine learning method, support vector machine classification with a radial basis function kernel, to identify functionally linked proteins. We experimentally evaluated the performance of the classifier with the linear and polynomial kernels and compared the results with the existing tree kernel. In our study we used proteins of the budding yeast Saccharomyces cerevisiae genome. We generated the phylogenetic profiles of 2465 yeast genes and used the functional annotations available in the MIPS database. Our experiments show that the radial basis kernel performs similarly to the polynomial kernel in some functional classes, that both perform better than the linear and tree kernels, and that overall the radial basis kernel outperformed the polynomial, linear and tree kernels. In analyzing these results we show that it is feasible to use an SVM classifier with a radial basis function kernel to predict gene function from phylogenetic profiles.
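As a rough illustration of the kernel named in this abstract, the sketch below evaluates a radial basis function kernel, K(x, y) = exp(-gamma * ||x - y||^2), between phylogenetic profiles. The 0/1 presence/absence encoding, the value of gamma, the gene names and the function names are assumptions for illustration; the abstract does not specify them, and no SVM training is shown.

```python
import math

def rbf_kernel(profile_a, profile_b, gamma=0.5):
    """RBF kernel between two equal-length numeric profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same length")
    # For 0/1 profiles the squared Euclidean distance equals the Hamming distance.
    sq_dist = sum((a - b) ** 2 for a, b in zip(profile_a, profile_b))
    return math.exp(-gamma * sq_dist)

def gram_matrix(profiles, gamma=0.5):
    """Pairwise kernel matrix, the input an SVM solver would consume."""
    return [[rbf_kernel(p, q, gamma) for q in profiles] for p in profiles]

# Hypothetical profiles: presence (1) or absence (0) of a homologue in six genomes.
gene_x = [1, 0, 1, 1, 0, 1]
gene_y = [1, 0, 1, 0, 0, 0]  # differs from gene_x at two positions
gene_z = [0, 1, 0, 0, 1, 0]  # differs from gene_x at all six positions

kernel = gram_matrix([gene_x, gene_y, gene_z])
```

Genes with similar profiles get kernel values near 1 and dissimilar ones values near 0, which is the sense in which the kernel can group functionally linked proteins.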
Structural and Functional Similarities of Calcium Homeostasis Modulator 1 (CALHM1) Ion Channel with Connexins, Pannexins, and Innexins

PubMed Central

Siebert, Adam P.; Ma, Zhongming; Grevet, Jeremy D.; Demuro, Angelo; Parker, Ian; Foskett, J. Kevin

2013-01-01

CALHM1 (calcium homeostasis modulator 1) forms a plasma membrane ion channel that mediates neuronal excitability in response to changes in extracellular Ca2+ concentration. Six human CALHM homologs exist with no homology to other proteins, although CALHM1 is conserved across >20 species. Here we demonstrate that CALHM1 shares functional and quaternary and secondary structural similarities with connexins and evolutionarily distinct innexins and their vertebrate pannexin homologs. A CALHM1 channel is a hexamer, comprised of six monomers, each of which possesses four transmembrane domains, cytoplasmic amino and carboxyl termini, an amino-terminal helix, and conserved extracellular cysteines. The estimated pore diameter of the CALHM1 channel is ∼14 Å, enabling permeation of large charged molecules. Thus, CALHMs, connexins, and pannexins and innexins are structurally related protein families with shared and distinct functional properties. PMID:23300080
MetalS2: a tool for the structural alignment of minimal functional sites in metal-binding proteins and nucleic acids.

PubMed

Andreini, Claudia; Cavallaro, Gabriele; Rosato, Antonio; Valasatava, Yana

2013-11-25

We developed a new software tool, MetalS(2), for the structural alignment of Minimal Functional Sites (MFSs) in metal-binding biological macromolecules. MFSs are 3D templates that describe the local environment around the metal(s) independently of the larger context of the macromolecular structure. Such local environment has a determinant role in tuning the chemical reactivity of the metal, ultimately contributing to the functional properties of the whole system. On our example data sets, MetalS(2) unveiled structural similarities that other programs for protein structure comparison do not consistently point out and overall identified a larger number of structurally similar MFSs. MetalS(2) supports the comparison of MFSs harboring different metals and/or with different nuclearity and is available both as a stand-alone program and a Web tool (http://metalweb.cerm.unifi.it/tools/metals2/).
Influence of extraction pH on the foaming, emulsification, oil-binding and visco-elastic properties of marama protein.

PubMed

Gulzar, Muhammad; Taylor, John Rn; Minnaar, Amanda

2017-11-01

Marama bean protein, as extracted previously at pH 8, forms a viscous, adhesive and extensible dough. To obtain a protein isolate with optimum functional properties, protein extraction under slightly acidic conditions (pH 6) was investigated. Two-dimensional electrophoresis showed that pH 6 extracted marama protein lacked some basic 11S legumin polypeptides, present in pH 8 extracted protein. However, it additionally contained acidic high molecular weight polypeptides (∼180 kDa), which were disulfide crosslinked into larger proteins. pH 6 extracted marama proteins had similar emulsification properties to soy protein isolate and several times higher foaming capacity than pH 8 extracted protein, egg white and soy protein isolate. pH 6 extracted protein dough was more elastic than pH 8 extracted protein, approaching the elasticity of wheat gluten. Marama protein extracted at pH 6 has excellent food-type functional properties, probably because it lacks some 11S polypeptides but has additional high molecular weight proteins. © 2017 Society of Chemical Industry.
Similarity and functional analyses of expressed parasitism genes in Heterodera schachtii and Heterodera glycines

USDA-ARS's Scientific Manuscript database

The secreted proteins encoded by “parasitism genes” expressed within the esophageal glands cells of cyst nematodes play important roles in plant parasitism. Homologous transcripts and encoded proteins of the Heterodera glycines pioneer parasitism genes Hgsyv46, Hg4e02 and Hg5d08 were identified and ...
PASS2: an automated database of protein alignments organised as structural superfamilies.

PubMed

Bhaduri, Anirban; Pugalenthi, Ganesan; Sowdhamini, Ramanathan

2004-04-02

The functional selection and three-dimensional structural constraints of proteins in nature often relate to the retention of significant sequence similarity between proteins of similar fold and function despite poor sequence identity. Organization of structure-based sequence alignments for distantly related proteins provides a map of the conserved and critical regions of the protein universe that is useful for the analysis of folding principles, for the evolutionary unification of protein families and for maximizing the information return from experimental structure determination. The Protein Alignment organised as Structural Superfamily (PASS2) database represents continuously updated, structural alignments for evolutionarily related, sequentially distant proteins. An automated and updated version of PASS2 is in direct correspondence with SCOP 1.63 and consists of sequences having identity below 40% among themselves. Protein domains have been grouped into 628 multi-member superfamilies and 566 single member superfamilies. Structure-based sequence alignments for the superfamilies have been obtained using COMPARER, while initial equivalencies have been derived from a preliminary superposition using LSQMAN or STAMP 4.0. The final sequence alignments have been annotated for structural features using JOY4.0. The database is supplemented with sequence relatives belonging to different genomes, conserved spatially interacting and structural motifs, probabilistic hidden Markov models of superfamilies based on the alignments and useful links to other databases. Probabilistic models and sensitive position specific profiles obtained from reliable superfamily alignments aid annotation of remote homologues and are useful tools in structural and functional genomics. PASS2 presents the phylogeny of its members based on both sequence and structural dissimilarities. Clustering of members allows us to understand diversification of the family members.
The search engine has been improved for simpler browsing of the database. The database resolves alignments among the structural domains consisting of evolutionarily diverged set of sequences. Availability of reliable sequence alignments of distantly related proteins despite poor sequence identity and single-member superfamilies permit better sampling of structures in libraries for fold recognition of new sequences and for the understanding of protein structure-function relationships of individual superfamilies. PASS2 is accessible at http://www.ncbs.res.in/~faculty/mini/campass/pass2.html
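The 40% identity limit mentioned in this abstract can be made concrete with a small sketch that computes percent identity from two already-aligned sequences. Several conventions exist for the denominator; this version counts only columns where neither sequence has a gap, which is an assumption and not necessarily the definition used by PASS2. The function name and toy alignment are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a.upper() == b.upper():
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared

# Hypothetical toy alignment: three matches over four gap-free columns.
identity = percent_identity("ACD-F", "ACE-F")
below_cutoff = identity < 40.0  # the filter described in the abstract
```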
[Molecular aspects of allergy to plant products. Part II. Pathogenesis-related proteins (PRs), apple allergenicity governed by Mal d 1 gene].

PubMed

Bokszczanin, Kamila Ł; Przybyła, Andrzej A

2012-03-01

Of the plant allergens listed in the Official Allergen Database of the International Union of Immunological Societies, approximately 25% belong to the group of pathogenesis-related proteins (PRs). They have been classified into 17 PR families based on similarities in their amino acid sequence, enzymatic activities, or other functional properties. Plant-derived allergens have been identified with sequence similarities to PR families 2, 3, 4, 5, 8, 10, and 14. The main birch allergen in northern Europe is a class 10 (PR-10) protein from the European white birch (Betula pendula) termed Bet v 1. Pollen of other Fagales species contains PR-10 homologues that share epitopes with Bet v 1, as do several fruits, nuts and vegetables. Among plant foods, fruits of the Rosaceae family are most frequently responsible for allergic reactions. It is documented that approximately 2% of the European population is allergic to apples. The article presents a molecular characterization of PR-10 proteins with regard to their structure and function, as well as apple allergenicity determined by the Mal d 1 gene.
Neutron scattering studies on protein dynamics using the human myelin peripheral membrane protein P2

NASA Astrophysics Data System (ADS)

Laulumaa, Saara; Kursula, Petri; Natali, Francesca

2015-01-01

Myelin is a multilayered proteolipid membrane structure surrounding selected axons in the vertebrate nervous system, which allows the rapid saltatory conduction of nerve impulses. Deficits in myelin formation and maintenance may lead to chronic neurological disease. P2 is an abundant myelin protein from peripheral nerves, binding between two apposing lipid bilayers. We studied the dynamics of the human myelin protein P2 and its mutated P38G variant in hydrated powders using elastic incoherent neutron scattering. The local harmonic vibrations at low temperatures were very similar for both samples, but the mutant protein had increased flexibility and softness close to physiological temperatures. The results indicate that a drastic mutation of proline to glycine at a functional site can affect protein dynamics, and in the case of P2, they may explain functional differences between the two proteins.
Scallop DMT functions as a Ca2+ transporter.

PubMed

Toyohara, Haruhiko; Yamamoto, Sayuri; Hosoi, Masatomi; Takagi, Masaya; Hayashi, Isao; Nakao, Kenji; Kaneko, Shuji

2005-05-09

We identified a DMT (divalent metal transporter) homologous protein that functions as a Ca(2+) transporter. Scallop DMT cDNA encodes a 539-amino-acid protein with 12 putative membrane-spanning domains and has a consensus transport motif in the fourth extracellular loop. Since its mRNA is significantly expressed in the gill and intestine, it is assumed that scallop DMT transports Ca(2+) from seawater by the gill and from food by the intestine. Scallop DMT lacks the iron-responsive element commonly found in iron-regulatory proteins, suggesting that it is free of post-transcriptional regulation by intracellular Fe(2+) concentration. Unlike other DMTs, scallop DMT distinctly functions as a Ca(2+) transporter; however, like them, it also transports Fe(2+) and Cd(2+).
Evolutionary, Molecular and Genetic Analyses of Tic22 Homologues in Arabidopsis thaliana Chloroplasts

PubMed Central

Kasmati, Ali Reza; Patel, Ramesh; Ling, Qihua; Karim, Sazzad; Aronsson, Henrik; Jarvis, Paul

2013-01-01

The Tic22 protein was previously identified in pea as a putative component of the chloroplast protein import apparatus. It is a peripheral protein of the inner envelope membrane, residing in the intermembrane space. In Arabidopsis, there are two Tic22 homologues, termed atTic22-III and atTic22-IV, both of which are predicted to localize in chloroplasts. These two proteins defined clades that are conserved in all land plants, which appear to have evolved at similar rates since their separation >400 million years ago, suggesting functional conservation. The atTIC22-IV gene was expressed several-fold more highly than atTIC22-III, but the genes exhibited similar expression profiles and were expressed throughout development. Knockout mutants lacking atTic22-IV were visibly normal, whereas those lacking atTic22-III exhibited moderate chlorosis. Double mutants lacking both isoforms were more strongly chlorotic, particularly during early development, but were viable and fertile. Double-mutant chloroplasts were small and under-developed relative to those in wild type, and displayed inefficient import of precursor proteins. The data indicate that the two Tic22 isoforms act redundantly in chloroplast protein import, and that their function is non-essential but nonetheless required for normal chloroplast biogenesis, particularly during early plant development. PMID:23675512
Measuring and comparing structural fluctuation patterns in large protein datasets.

PubMed

Fuglebakk, Edvin; Echave, Julián; Reuter, Nathalie

2012-10-01

The function of a protein depends not only on its structure but also on its dynamics. This is at the basis of a large body of experimental and theoretical work on protein dynamics. Further insight into the dynamics-function relationship can be gained by studying the evolutionary divergence of protein motions. To investigate this, we need appropriate comparative dynamics methods. The most used dynamical similarity score is the correlation between the root mean square fluctuations (RMSF) of aligned residues. Despite its usefulness, RMSF is in general less evolutionarily conserved than the native structure. A fundamental issue is whether RMSF is not as conserved as structure because dynamics is less conserved or because RMSF is not the best property to use to study its conservation. We performed a systematic assessment of several scores that quantify the (dis)similarity between protein fluctuation patterns. We show that the best scores perform as well as or better than structural dissimilarity, as assessed by their consistency with the SCOP classification. We conclude that to uncover the full extent of the evolutionary conservation of protein fluctuation patterns, it is important to measure the directions of fluctuations and their correlations between sites. Contact: Nathalie.Reuter@mbi.uib.no. Supplementary data are available at Bioinformatics Online.
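The RMSF-based similarity score mentioned in this abstract can be sketched as a Pearson correlation between the per-residue RMSF values of two proteins at aligned positions. The values below are invented, and the sketch assumes the alignment has already been done so that index i in one list corresponds to index i in the other. It illustrates only this baseline score, not the direction-aware scores the authors recommend.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical RMSF values (in angstroms) for five aligned residue pairs.
rmsf_protein_a = [0.6, 0.9, 1.8, 1.1, 0.7]
rmsf_protein_b = [0.5, 1.0, 2.1, 1.0, 0.8]

similarity = pearson(rmsf_protein_a, rmsf_protein_b)
```

Because RMSF is a per-site magnitude, this score ignores the direction of motion and correlations between sites, which is the limitation the abstract points to.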
Exploring the Midgut Transcriptome and Brush Border Membrane Vesicle Proteome of the Rice Stem Borer, Chilo suppressalis (Walker)

PubMed Central

Peng, Chuanhua; Wang, Xiaoping; Li, Fei; Lin, Yongjun

2012-01-01

The rice stem borer, Chilo suppressalis (Walker) (Lepidoptera: Pyralidae), is one of the most detrimental pests affecting rice crops. The use of Bacillus thuringiensis (Bt) toxins has been explored as a means to control this pest, but the potential for C. suppressalis to develop resistance to Bt toxins makes this approach problematic. Few C. suppressalis gene sequences are known, which makes in-depth study of gene function difficult. Herein, we sequenced the midgut transcriptome of the rice stem borer. In total, 37,040 contigs were obtained, with a mean size of 497 bp. As expected, the transcripts of C. suppressalis shared high similarity with arthropod genes. Gene ontology and KEGG analysis were used to classify the gene functions in C. suppressalis. Using the midgut transcriptome data, we conducted a proteome analysis to identify proteins expressed abundantly in the brush border membrane vesicles (BBMV). Of the 100 most abundant proteins that were excised and subjected to mass spectrometry analysis, 74 share high similarity with known proteins. Among these proteins, Western blot analysis showed that Aminopeptidase N and EH domain-containing protein bind the Bt toxin Cry1Ac. These data provide invaluable information about the gene sequences of C. suppressalis and the proteins that bind Cry1Ac. PMID:22666467

Glucan Binding Protein C of Streptococcus mutans Mediates both Sucrose-Independent and Sucrose-Dependent Adherence.

PubMed

Mieher, Joshua L; Larson, Matthew R; Schormann, Norbert; Purushotham, Sangeetha; Wu, Ren; Rajashankar, Kanagalaghatta R; Wu, Hui; Deivanayagam, Champion

2018-07-01

The high-resolution structure of glucan binding protein C (GbpC) at 1.14 Å, a sucrose-dependent virulence factor of the dental caries pathogen Streptococcus mutans, has been determined. GbpC shares not only structural similarities with the V regions of AgI/II and SspB but also functional adherence to salivary agglutinin (SAG) and its scavenger receptor cysteine-rich domains (SRCRs). This is not only a newly identified function for GbpC but also an additional fail-safe binding mechanism for S. mutans. Despite the structural similarities with S. mutans antigen I/II (AgI/II) and SspB of Streptococcus gordonii, GbpC remains unique among these surface proteins in its propensity to adhere to dextran/glucans. The complex crystal structure of GbpC with dextrose (β-d-glucose; Protein Data Bank ligand BGC) highlights exclusive structural features that facilitate this interaction with dextran. Targeted deletion mutant studies on GbpC's divergent loop region in the vicinity of a highly conserved calcium binding site confirm its role in biofilm formation. Finally, we present a model for adherence to dextran. The structure of GbpC highlights how artfully microbes have engineered the lectin-like folds to broaden their functional adherence repertoire. Copyright © 2018 American Society for Microbiology.
Three new members of the RNP protein family in Xenopus.

PubMed Central

Good, P J; Rebbert, M L; Dawid, I B

1993-01-01

Many RNP proteins contain one or more copies of the RNA recognition motif (RRM) and are thought to be involved in cellular RNA metabolism. We have previously characterized in Xenopus a nervous system specific gene, nrp1, that is more similar to the hnRNP A/B proteins than to other known proteins (K. Richter, P. J. Good, and I. B. Dawid (1990), New Biol. 2, 556-565). PCR amplification with degenerate primers was used to identify additional cDNAs encoding two RRMs in Xenopus. Three previously uncharacterized genes were identified. Two genes encode hnRNP A/B proteins with two RRMs and a glycine-rich domain. One of these is the Xenopus homolog of the human A2/B1 gene; the other, named hnRNP A3, is similar to both the A1 and A2 hnRNP genes. The Xenopus hnRNP A1, A2 and A3 genes are expressed throughout development and in all adult tissues. Multiple protein isoforms for the hnRNP A2 gene are predicted that differ by the insertion of short peptide sequences in the glycine-rich domain. The third newly isolated gene, named xrp1, encodes a protein that is related by sequence to the nrp1 protein but is expressed ubiquitously. Despite the similarity to nuclear RNP proteins, both the nrp1 and xrp1 proteins are localized to the cytoplasm in the Xenopus oocyte. The xrp1 gene may have a function in all cells that is similar to that executed by nrp1 specifically within the nervous system. PMID:8451200
eF-seek: prediction of the functional sites of proteins by searching for similar electrostatic potential and molecular surface shape.

PubMed

Kinoshita, Kengo; Murakami, Yoichi; Nakamura, Haruki

2007-07-01

We have developed a method to predict ligand-binding sites in a new protein structure by searching for similar binding sites in the Protein Data Bank (PDB). The similarities are measured according to the shapes of the molecular surfaces and their electrostatic potentials. A new web server, eF-seek, provides an interface to our search method. It simply requires a coordinate file in the PDB format, and generates a prediction result as a virtual complex structure, with the putative ligands in a PDB format file as the output. In addition, the predicted interacting interface is displayed to facilitate the examination of the virtual complex structure on our own applet viewer with the web browser (URL: http://eF-site.hgc.jp/eF-seek).
NovelFam3000 – Uncharacterized human protein domains conserved across model organisms

PubMed Central

Kemmer, Danielle; Podowski, Raf M; Arenillas, David; Lim, Jonathan; Hodges, Emily; Roth, Peggy; Sonnhammer, Erik LL; Höög, Christer; Wasserman, Wyeth W

2006-01-01

Background: Despite significant efforts from the research community, an extensive portion of the proteins encoded by human genes lack an assigned cellular function. Most metazoan proteins are composed of structural and/or functional domains, of which many appear in multiple proteins. Once a domain is characterized in one protein, the presence of a similar sequence in an uncharacterized protein serves as a basis for inference of function. Thus knowledge of a domain's function, or the protein within which it arises, can facilitate the analysis of an entire set of proteins. Description: From the Pfam domain database, we extracted uncharacterized protein domains represented in proteins from humans, worms, and flies. A data centre was created to facilitate the analysis of the uncharacterized domain-containing proteins. The centre both provides researchers with links to dispersed internet resources containing gene-specific experimental data and enables them to post relevant experimental results or comments. For each human gene in the system, a characterization score is posted, allowing users to track the progress of characterization over time or to identify for study uncharacterized domains in well-characterized genes. As a test of the system, a subset of 39 domains was selected for analysis and the experimental results posted to the NovelFam3000 system. For 25 human protein members of these 39 domain families, detailed sub-cellular localizations were determined. Specific observations are presented based on the analysis of the integrated information provided through the online NovelFam3000 system. Conclusion: Consistent experimental results between multiple members of a domain family allow for inferences of the domain's functional role. We unite bioinformatics resources and experimental data in order to accelerate the functional characterization of scarcely annotated domain families. PMID:16533400
Function, dynamics and evolution of network motif modules in integrated gene regulatory networks of worm and plant.

PubMed

Defoort, Jonas; Van de Peer, Yves; Vermeirssen, Vanessa

2018-06-05

Gene regulatory networks (GRNs) consist of different molecular interactions that closely work together to establish proper gene expression in time and space. Especially in higher eukaryotes, many questions remain on how these interactions collectively coordinate gene regulation. We study high quality GRNs consisting of undirected protein-protein, genetic and homologous interactions, and directed protein-DNA, regulatory and miRNA-mRNA interactions in the worm Caenorhabditis elegans and the plant Arabidopsis thaliana. Our data-integration framework integrates interactions in composite network motifs, clusters these in biologically relevant, higher-order topological network motif modules, overlays these with gene expression profiles and discovers novel connections between modules and regulators. Similar modules exist in the integrated GRNs of worm and plant. We show how experimental or computational methodologies underlying a certain data type impact network topology. Through phylogenetic decomposition, we found that proteins of worm and plant tend to functionally interact with proteins of a similar age, while at the regulatory level TFs favor same age, but also older target genes. Despite some influence of the duplication mode difference, we also observe at the motif and module level for both species a preference for age homogeneity for undirected and age heterogeneity for directed interactions. This leads to a model where novel genes are added together to the GRNs in a specific biological functional context, regulated by one or more TFs that also target older genes in the GRNs. Overall, we detected topological, functional and evolutionary properties of GRNs that are potentially universal in all species.
HMMER Cut-off Threshold Tool (HMMERCTTER): Supervised classification of superfamily protein sequences with a reliable cut-off threshold.

PubMed

Pagnuco, Inti Anabela; Revuelta, María Victoria; Bondino, Hernán Gabriel; Brun, Marcel; Ten Have, Arjen

2018-01-01

Protein superfamilies can be divided into subfamilies of proteins with different functional characteristics. Their sequences can be classified hierarchically, which is part of sequence function assignation. Typically, there are no clear subfamily hallmarks that would allow pattern-based function assignation by which this task is mostly achieved based on the similarity principle. This is hampered by the lack of a score cut-off that is both sensitive and specific. HMMER Cut-off Threshold Tool (HMMERCTTER) adds a reliable cut-off threshold to the popular HMMER. Using a high quality superfamily phylogeny, it clusters a set of training sequences such that the cluster-specific HMMER profiles show cluster or subfamily member detection with 100% precision and recall (P&R), thereby generating a specific threshold as inclusion cut-off. Profiles and thresholds are then used as classifiers to screen a target dataset. Iterative inclusion of novel sequences to groups and the corresponding HMMER profiles results in high sensitivity while specificity is maintained by imposing 100% P&R self detection. In three presented case studies of protein superfamilies, classification of large datasets with 100% precision was achieved with over 95% recall. Limits and caveats are presented and explained. HMMERCTTER is a promising protein superfamily sequence classifier provided high quality training datasets are used. It provides a decision support system that aids in the difficult task of sequence function assignation in the twilight zone of sequence similarity. All relevant data and source codes are available from the Github repository at the following URL: https://github.com/BBCMdP/HMMERCTTER.
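The "100% precision and recall" condition in this abstract amounts to requiring that every member of a cluster scores above every non-member against that cluster's profile. The sketch below checks that condition and derives an inclusion cut-off from it. The scores, the function names and the choice of the lowest member score as the cut-off are assumptions for illustration; the tool's phylogeny-guided clustering and iterative profile updates are not reproduced here.

```python
def perfect_cutoff(member_scores, other_scores):
    """Return a cut-off giving 100% precision and recall, or None.

    A cut-off exists only if the lowest-scoring member still scores
    strictly higher than the highest-scoring non-member.
    """
    if not member_scores:
        raise ValueError("need at least one member score")
    lowest_member = min(member_scores)
    if other_scores and max(other_scores) >= lowest_member:
        return None  # members and non-members overlap; no perfect separation
    return lowest_member

def classify(score, cutoff):
    """Accept a sequence if its profile score reaches the cut-off."""
    return score >= cutoff

# Hypothetical profile scores for training sequences.
subfamily_scores = [120.5, 98.0, 101.2]
outside_scores = [40.0, 75.3]

cutoff = perfect_cutoff(subfamily_scores, outside_scores)
```

When the function returns None, the cluster as defined cannot be self-detected perfectly, which in the approach described above would call for a different grouping of the training sequences.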
HMMER Cut-off Threshold Tool (HMMERCTTER): Supervised classification of superfamily protein sequences with a reliable cut-off threshold

PubMed Central

Pagnuco, Inti Anabela; Revuelta, María Victoria; Bondino, Hernán Gabriel; Brun, Marcel

2018-01-01

Background: Protein superfamilies can be divided into subfamilies of proteins with different functional characteristics. Their sequences can be classified hierarchically, which is part of sequence function assignation. Typically, there are no clear subfamily hallmarks that would allow pattern-based function assignation by which this task is mostly achieved based on the similarity principle. This is hampered by the lack of a score cut-off that is both sensitive and specific. Results: HMMER Cut-off Threshold Tool (HMMERCTTER) adds a reliable cut-off threshold to the popular HMMER. Using a high quality superfamily phylogeny, it clusters a set of training sequences such that the cluster-specific HMMER profiles show cluster or subfamily member detection with 100% precision and recall (P&R), thereby generating a specific threshold as inclusion cut-off. Profiles and thresholds are then used as classifiers to screen a target dataset. Iterative inclusion of novel sequences to groups and the corresponding HMMER profiles results in high sensitivity while specificity is maintained by imposing 100% P&R self detection. In three presented case studies of protein superfamilies, classification of large datasets with 100% precision was achieved with over 95% recall. Limits and caveats are presented and explained. Conclusions: HMMERCTTER is a promising protein superfamily sequence classifier provided high quality training datasets are used. It provides a decision support system that aids in the difficult task of sequence function assignation in the twilight zone of sequence similarity. All relevant data and source codes are available from the GitHub repository at the following URL: https://github.com/BBCMdP/HMMERCTTER. PMID:29579071
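The idea of a cut-off that yields 100% precision and recall on the training set can be illustrated with a minimal sketch. This is not the HMMERCTTER code: the function names and the example scores are hypothetical, and it only assumes that each training sequence already has a profile score and a known cluster membership. A perfect cut-off exists only when the lowest-scoring cluster member still outscores every non-member.

```python
from typing import Optional, Sequence


def perfect_cutoff(member_scores: Sequence[float],
                   nonmember_scores: Sequence[float]) -> Optional[float]:
    """Return an inclusion cut-off giving 100% precision and recall on the
    training scores, or None if members and non-members overlap.

    Hypothetical helper for illustration; scores are arbitrary numbers,
    not real HMMER output.
    """
    if not member_scores:
        return None
    lowest_member = min(member_scores)
    highest_other = max(nonmember_scores, default=float("-inf"))
    # Every member must score strictly above every non-member.
    if lowest_member > highest_other:
        return lowest_member  # accept any sequence scoring >= this value
    return None


def is_member(score: float, cutoff: float) -> bool:
    """Classify a target sequence with the stored cut-off."""
    return score >= cutoff


if __name__ == "__main__":
    cutoff = perfect_cutoff([120.5, 98.0, 143.2], [40.1, 77.3])
    print(cutoff)                     # 98.0
    print(is_member(85.0, cutoff))    # False
```

When the two score ranges overlap, the sketch returns `None`, which corresponds to a cluster that cannot be accepted under the 100% P&R requirement.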
Single TRAM domain RNA-binding proteins in Archaea: functional insight from Ctr3 from the Antarctic methanogen Methanococcoides burtonii.

PubMed

Taha; Siddiqui, K S; Campanaro, S; Najnin, T; Deshpande, N; Williams, T J; Aldrich-Wright, J; Wilkins, M; Curmi, P M G; Cavicchioli, R

2016-09-01

TRAM domain proteins present in Archaea and Bacteria have a β-barrel shape with anti-parallel β-sheets that form a nucleic acid binding surface; a structure also present in cold shock proteins (Csps). Aside from protein structures, experimental data defining the function of TRAM domains is lacking. Here, we explore the possible functional properties of a single TRAM domain protein, Ctr3 (cold-responsive TRAM domain protein 3) from the Antarctic archaeon Methanococcoides burtonii that has increased abundance during low temperature growth. Ribonucleic acid (RNA) bound by Ctr3 in vitro was determined using RNA-seq. Ctr3 bound M. burtonii RNA with a preference for transfer (t)RNA and 5S ribosomal RNA, and a potential binding motif was identified. In tRNA, the motif represented the C loop; a region that is conserved in tRNA from all domains of life and appears to be solvent exposed, potentially providing access for Ctr3 to bind. Ctr3 and Csps are structurally similar and are both inferred to function in low temperature translation. The broad representation of single TRAM domain proteins within Archaea compared with their apparent absence in Bacteria, and scarcity of Csps in Archaea but prevalence in Bacteria, suggests they represent distinct evolutionary lineages of functionally equivalent RNA-binding proteins. © 2016 Society for Applied Microbiology and John Wiley & Sons Ltd.
Plant Cation-Chloride Cotransporters (CCC): Evolutionary Origins and Functional Insights.

PubMed

Henderson, Sam W; Wege, Stefanie; Gilliham, Matthew

2018-02-06

Genomes of unicellular and multicellular green algae, mosses, grasses and dicots harbor genes encoding cation-chloride cotransporters (CCC). CCC proteins from the plant kingdom have been comparatively less well investigated than their animal counterparts, but proteins from both plants and animals have been shown to mediate ion fluxes, and are involved in regulation of osmotic processes. In this review, we show that CCC proteins from plants form two distinct phylogenetic clades (CCC1 and CCC2). Some lycophytes and bryophytes possess members from each clade, most land plants only have members of the CCC1 clade, and green algae possess only the CCC2 clade. It is currently unknown whether CCC1 and CCC2 proteins have similar or distinct functions; however, they are both more closely related to animal KCC proteins than to NKCCs. Existing heterologous expression systems that have been used to functionally characterize plant CCC proteins, namely yeast and Xenopus laevis oocytes, have limitations that are discussed. Studies from plants exposed to chemical inhibitors of animal CCC protein function are reviewed for their potential to discern CCC function in planta. Thus far, mutations in plant CCC genes have been evaluated only in two species of angiosperms, and such mutations cause a diverse array of phenotypes, seemingly more than could simply be explained by localized disruption of ion transport alone. We evaluate the putative roles of plant CCC proteins and suggest areas for future investigation.
Nucleic acids encoding plant glutamine phenylpyruvate transaminase (GPT) and uses thereof

DOEpatents

Unkefer, Pat J.; Anderson, Penelope S.; Knight, Thomas J.

2016-03-29

Glutamine phenylpyruvate transaminase (GPT) proteins, nucleic acid molecules encoding GPT proteins, and uses thereof are disclosed. Provided herein are various GPT proteins and GPT gene coding sequences isolated from a number of plant species. As disclosed herein, GPT proteins share remarkable structural similarity within plant species, and are active in catalyzing the synthesis of 2-hydroxy-5-oxoproline (2-oxoglutaramate), a powerful signal metabolite which regulates the function of a large number of genes involved in the photosynthesis apparatus, carbon fixation and nitrogen metabolism.
SChloro: directing Viridiplantae proteins to six chloroplastic sub-compartments.

PubMed

Savojardo, Castrense; Martelli, Pier Luigi; Fariselli, Piero; Casadio, Rita

2017-02-01

Chloroplasts are organelles found in plants and involved in several important cell processes. Similarly to other compartments in the cell, chloroplasts have an internal structure comprising several sub-compartments, where different proteins are targeted to perform their functions. Given the relation between protein function and localization, the availability of effective computational tools to predict protein sub-organelle localizations is crucial for large-scale functional studies. In this paper we present SChloro, a novel machine-learning approach to predict protein sub-chloroplastic localization, based on targeting signal detection and membrane protein information. The proposed approach performs multi-label predictions discriminating six chloroplastic sub-compartments that include inner membrane, outer membrane, stroma, thylakoid lumen, plastoglobule and thylakoid membrane. In comparative benchmarks, the proposed method outperforms current state-of-the-art methods in both single- and multi-compartment predictions, with an overall multi-label accuracy of 74%. The results demonstrate the relevance of the approach that is eligible as a good candidate for integration into more general large-scale annotation pipelines of protein subcellular localization. The method is available as web server at http://schloro.biocomp.unibo.it. Contact: gigi@biocomp.unibo.it.
Prokaryotic cytoskeletons: protein filaments organizing small cells.

PubMed

Wagstaff, James; Löwe, Jan

2018-04-01

Most, if not all, bacterial and archaeal cells contain at least one protein filament system. Although these filament systems in some cases form structures that are very similar to eukaryotic cytoskeletons, the term 'prokaryotic cytoskeletons' is used to refer to many different kinds of protein filaments. Cytoskeletons achieve their functions through polymerization of protein monomers and the resulting ability to access length scales larger than the size of the monomer. Prokaryotic cytoskeletons are involved in many fundamental aspects of prokaryotic cell biology and have important roles in cell shape determination, cell division and nonchromosomal DNA segregation. Some of the filament-forming proteins have been classified into a small number of conserved protein families, for example, the almost ubiquitous tubulin and actin superfamilies. To understand what makes filaments special and how the cytoskeletons they form enable cells to perform essential functions, the structure and function of cytoskeletal molecules and their filaments have been investigated in diverse bacteria and archaea. In this Review, we bring these data together to highlight the diverse ways that linear protein polymers can be used to organize other molecules and structures in bacteria and archaea.
Protein architecture and core residues in unwound α-helices provide insights to the transport function of plant AtCHX17

DOE Office of Scientific and Technical Information (OSTI.GOV)

Czerny, Daniel D.; Padmanaban, Senthilkumar; Anishkin, Andriy

Using Arabidopsis thaliana AtCHX17 as an example, we combine structural modeling and mutagenesis to provide insights on its protein architecture and transport function which is poorly characterized. This approach is based on the observation that protein structures are significantly more conserved in evolution than linear sequences, and mechanistic similarities among diverse transporters are emerging. Two homology models of AtCHX17 were obtained that show a protein fold similar to known structures of bacterial Na+/H+ antiporters, EcNhaA and TtNapA. The distinct secondary and tertiary structure models highlighted residues at positions potentially important for CHX17 activity. Mutagenesis showed that asparagine-N200 and aspartate-D201 inside transmembrane 5 (TM5), and lysine-K355 inside TM10 are critical for AtCHX17 activity. We reveal previously unrecognized threonine-T170 and lysine-K383 as key residues at unwound regions in the middle of TM4 and TM11 α-helices, respectively. Mutation of glutamate-E111 located near the membrane surface inhibited AtCHX17 activity, suggesting a role in pH sensing. The long carboxylic tail of unknown purpose has an alternating β-sheet and α-helix secondary structure that is conserved in prokaryote universal stress proteins. Here, these results support the overall architecture of AtCHX17 and identify D201, N200 and novel residues T170 and K383 at the functional core which likely participates in ion recognition, coordination and/or translocation, similar to characterized cation/H+ exchangers. The core of AtCHX17 models according to EcNhaA and TtNapA templates faces inward and outward, respectively, which may reflect two conformational states of the alternating access transport mode for proteins belonging to the plant CHX family.
Protein architecture and core residues in unwound α-helices provide insights to the transport function of plant AtCHX17

DOE PAGES

Czerny, Daniel D.; Padmanaban, Senthilkumar; Anishkin, Andriy; ...

2016-05-11

Using Arabidopsis thaliana AtCHX17 as an example, we combine structural modeling and mutagenesis to provide insights on its protein architecture and transport function which is poorly characterized. This approach is based on the observation that protein structures are significantly more conserved in evolution than linear sequences, and mechanistic similarities among diverse transporters are emerging. Two homology models of AtCHX17 were obtained that show a protein fold similar to known structures of bacterial Na+/H+ antiporters, EcNhaA and TtNapA. The distinct secondary and tertiary structure models highlighted residues at positions potentially important for CHX17 activity. Mutagenesis showed that asparagine-N200 and aspartate-D201 inside transmembrane 5 (TM5), and lysine-K355 inside TM10 are critical for AtCHX17 activity. We reveal previously unrecognized threonine-T170 and lysine-K383 as key residues at unwound regions in the middle of TM4 and TM11 α-helices, respectively. Mutation of glutamate-E111 located near the membrane surface inhibited AtCHX17 activity, suggesting a role in pH sensing. The long carboxylic tail of unknown purpose has an alternating β-sheet and α-helix secondary structure that is conserved in prokaryote universal stress proteins. Here, these results support the overall architecture of AtCHX17 and identify D201, N200 and novel residues T170 and K383 at the functional core which likely participates in ion recognition, coordination and/or translocation, similar to characterized cation/H+ exchangers. The core of AtCHX17 models according to EcNhaA and TtNapA templates faces inward and outward, respectively, which may reflect two conformational states of the alternating access transport mode for proteins belonging to the plant CHX family.
Characterization of the fusion core in zebrafish endogenous retroviral envelope protein

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shi, Jian; State Key Laboratory of Virology, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, Hubei 430071; Zhang, Huaidong

2015-05-08

Zebrafish endogenous retrovirus (ZFERV) is, as yet, the only endogenous retrovirus in the zebrafish genome containing intact open reading frames of its envelope protein gene. Similarly, several envelope proteins of endogenous retroviruses in human and other mammalian genomes (such as syncytin-1 and 2 in human, syncytin-A and B in mouse) were identified and shown to be functional in induction of cell–cell fusion involved in placental development. The ZFERV envelope protein (Env) gene also appears to be functional in vivo because it is expressible. After sequence alignment, we found ZFERV Env shares similar structural profiles with syncytin and other type I viral envelopes, especially in the regions of N- and C-terminal heptad repeats (NHR and CHR) which were crucial for membrane fusion. We expressed the regions of N + C protein in the ZFERV Env (residues 459–567, including predicted NHR and CHR) to characterize the fusion core structure. We found N + C protein could form a stable coiled-coil trimer that consists of three helical NHR regions forming a central trimeric core, and three helical CHR regions packing into the grooves on the surface of the central core. The structural characterization of the fusion core revealed the possible mechanism of fusion mediated by ZFERV Env. These results gave a comprehensive explanation of how the ancient virus infected the zebrafish and integrated into the genome millions of years ago, and provided a rational clue for discovery of physiological significance (e.g., mediating cell–cell fusion). Highlights: • ZFERV Env shares similar structural profiles with syncytin and other type I viral envelopes. • The fusion core of ZFERV Env forms stable coiled-coil trimer including three NHRs and three CHRs. • The structural mechanism of viral entry mediated by ZFERV Env is disclosed. • The results are helpful for further discovery of physiological function of ZFERV Env in zebrafish.
Unrelated solubility-enhancing fusion partners MBP and NusA utilize a similar mode of action

PubMed Central

Raran-Kurussi, Sreejith; Waugh, David S.

2014-01-01

The tendency of recombinant proteins to accumulate in the form of insoluble aggregates in Escherichia coli is a major hindrance to their overproduction. One of the more effective approaches to circumvent this problem is to use translation fusion partners (solubility-enhancers, SEs). E. coli maltose binding protein (MBP) and N-utilization substance A (NusA) are arguably the most effective solubilizing agents that have been discovered so far. Here, we show that although these two proteins are structurally, functionally, and physicochemically distinct, they influence the solubility and folding of their fusion partners in a very similar manner. These SEs act as “holdases” that prevent the aggregation of their fusion partners. Subsequent folding of the passenger proteins, when it occurs, is either spontaneous or chaperone-mediated. PMID:24942647
New in protein structure and function annotation: hotspots, single nucleotide polymorphisms and the 'Deep Web'.

PubMed

Bromberg, Yana; Yachdav, Guy; Ofran, Yanay; Schneider, Reinhard; Rost, Burkhard

2009-05-01

The rapidly increasing quantity of protein sequence data continues to widen the gap between available sequences and annotations. Comparative modeling suggests some aspects of the 3D structures of approximately half of all known proteins; homology- and network-based inferences annotate some aspect of function for a similar fraction of the proteome. For most known protein sequences, however, there is detailed knowledge about neither their function nor their structure. Comprehensive efforts towards the expert curation of sequence annotations have failed to meet the demand of the rapidly increasing number of available sequences. Only the automated prediction of protein function in the absence of homology can close the gap between available sequences and annotations in the foreseeable future. This review focuses on two novel methods for automated annotation, and briefly presents an outlook on how modern web software may revolutionize the field of protein sequence annotation. First, predictions of protein binding sites and functional hotspots, and the evolution of these into the most successful type of prediction of protein function from sequence will be discussed. Second, a new tool, comprehensive in silico mutagenesis, which contributes important novel predictions of function and at the same time prepares for the onset of the next sequencing revolution, will be described. While these two new sub-fields of protein prediction represent the breakthroughs that have been achieved methodologically, it will then be argued that a different development might further change the way biomedical researchers benefit from annotations: modern web software can connect the worldwide web in any browser with the 'Deep Web' (ie, proprietary data resources). The availability of this direct connection, and the resulting access to a wealth of data, may impact drug discovery and development more than any existing method that contributes to protein annotation.
Activated protein C cofactor function of protein S: a critical role for Asp95 in the EGF1-like domain

PubMed Central

Andersson, Helena M.; Arantes, Márcia J.; Crawley, James T. B.; Luken, Brenda M.; Tran, Sinh; Dahlbäck, Björn; Rezende, Suely M.

2010-01-01

Protein S has an established role in the protein C anticoagulant pathway, where it enhances the factor Va (FVa) and factor VIIIa (FVIIIa) inactivating property of activated protein C (APC). Despite its physiological role and clinical importance, the molecular basis of its action is not fully understood. To clarify the mechanism of the protein S interaction with APC, we have constructed and expressed a library of composite or point variants of human protein S, with residue substitutions introduced into the Gla, thrombin-sensitive region (TSR), epidermal growth factor 1 (EGF1), and EGF2 domains. Cofactor activity for APC was evaluated by calibrated automated thrombography (CAT) using protein S–deficient plasma. Of 27 variants tested initially, only one, protein S D95A (within the EGF1 domain), was largely devoid of functional APC cofactor activity. Protein S D95A was, however, γ-carboxylated and bound phospholipids with an apparent dissociation constant (Kdapp) similar to that of wild-type (WT) protein S. In a purified assay using FVa R506Q/R679Q, purified protein S D95A was shown to have greatly reduced ability to enhance APC-induced cleavage of FVa Arg306. It is concluded that residue Asp95 within EGF1 is critical for APC cofactor function of protein S and could define a principal functional interaction site for APC. PMID:20308596
AptRank: an adaptive PageRank model for protein function prediction on bi-relational graphs.

PubMed

Jiang, Biaobin; Kloster, Kyle; Gleich, David F; Gribskov, Michael

2017-06-15

Diffusion-based network models are widely used for protein function prediction using protein network data and have been shown to outperform neighborhood-based and module-based methods. Recent studies have shown that integrating the hierarchical structure of the Gene Ontology (GO) data dramatically improves prediction accuracy. However, previous methods usually either used the GO hierarchy to refine the prediction results of multiple classifiers, or flattened the hierarchy into a function-function similarity kernel. No study has taken the GO hierarchy into account together with the protein network as a two-layer network model. We first construct a Bi-relational graph (Birg) model comprised of both protein-protein association and function-function hierarchical networks. We then propose two diffusion-based methods, BirgRank and AptRank, both of which use PageRank to diffuse information on this two-layer graph model. BirgRank is a direct application of traditional PageRank with fixed decay parameters. In contrast, AptRank utilizes an adaptive diffusion mechanism to improve the performance of BirgRank. We evaluate the ability of both methods to predict protein function on yeast, fly and human protein datasets, and compare with four previous methods: GeneMANIA, TMC, ProteinRank and clusDCA. We design four different validation strategies: missing function prediction, de novo function prediction, guided function prediction and newly discovered function prediction to comprehensively evaluate predictability of all six methods. We find that both BirgRank and AptRank outperform the previous methods, especially in missing function prediction when using only 10% of the data for training. The MATLAB code is available at https://github.rcac.purdue.edu/mgribsko/aptrank. Contact: gribskov@purdue.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved.
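The fixed-decay PageRank diffusion mentioned for BirgRank can be sketched on a toy two-layer graph. This is a simplified illustration and not the authors' MATLAB implementation: the graph, the node layout (three protein nodes, two function nodes), the function name `personalized_pagerank` and the value `alpha=0.85` are all assumptions, and the adaptive mechanism of AptRank is not reproduced.

```python
from typing import List


def personalized_pagerank(adj: List[List[float]], seed: List[float],
                          alpha: float = 0.85, iters: int = 100) -> List[float]:
    """Power iteration for PageRank with restart to `seed`.

    adj[j][i] is the weight of the edge j -> i; `seed` must sum to 1.
    With probability alpha the walker follows an edge, otherwise it
    restarts at the seed distribution.
    """
    n = len(adj)
    out = [sum(row) for row in adj]          # total outgoing weight per node
    x = list(seed)
    for _ in range(iters):
        nxt = [(1.0 - alpha) * seed[i] for i in range(n)]
        for j in range(n):
            if out[j] == 0:
                # Dangling node: send its mass back to the seed distribution.
                for i in range(n):
                    nxt[i] += alpha * x[j] * seed[i]
            else:
                for i in range(n):
                    if adj[j][i]:
                        nxt[i] += alpha * x[j] * adj[j][i] / out[j]
        x = nxt
    return x


# Toy graph: nodes 0-2 are proteins (P0-P1-P2 chain), nodes 3-4 are functions.
# Edges: P0-P1, P1-P2 (protein layer), F0-F1 (function layer),
# P0-F0 and P2-F1 (annotations). All edges are undirected here.
ADJ = [
    [0, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 1, 1, 0],
]

if __name__ == "__main__":
    # Diffuse from the unannotated protein P1 and read off the function scores.
    scores = personalized_pagerank(ADJ, seed=[0, 1, 0, 0, 0])
    print({"F0": scores[3], "F1": scores[4]})
```

In this toy graph the scores at the two function nodes are equal by symmetry, so neither function is preferred for P1; on real networks the diffusion scores would be ranked to propose annotations.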
Sequence space and the ongoing expansion of the protein universe.

PubMed

Povolotskaya, Inna S; Kondrashov, Fyodor A

2010-06-17

The need to maintain the structural and functional integrity of an evolving protein severely restricts the repertoire of acceptable amino-acid substitutions. However, it is not known whether these restrictions impose a global limit on how far homologous protein sequences can diverge from each other. Here we explore the limits of protein evolution using sequence divergence data. We formulate a computational approach to study the rate of divergence of distant protein sequences and measure this rate for ancient proteins, those that were present in the last universal common ancestor. We show that ancient proteins are still diverging from each other, indicating an ongoing expansion of the protein sequence universe. The slow rate of this divergence is imposed by the sparseness of functional protein sequences in sequence space and the ruggedness of the protein fitness landscape: approximately 98 per cent of sites cannot accept an amino-acid substitution at any given moment but a vast majority of all sites may eventually be permitted to evolve when other, compensatory, changes occur. Thus, approximately 3.5 × 10^9 yr has not been enough to reach the limit of divergent evolution of proteins, and for most proteins the limit of sequence similarity imposed by common function may not exceed that of random sequences.

PoSSuM v.2.0: data update and a new function for investigating ligand analogs and target proteins of small-molecule drugs.

PubMed

Ito, Jun-ichi; Ikeda, Kazuyoshi; Yamada, Kazunori; Mizuguchi, Kenji; Tomii, Kentaro

2015-01-01

PoSSuM (http://possum.cbrc.jp/PoSSuM/) is a database for detecting similar small-molecule binding sites on proteins. Since its initial release in 2011, PoSSuM has grown to provide information related to 49 million pairs of similar binding sites discovered among 5.5 million known and putative binding sites. This enlargement of the database is expected to enhance opportunities for biological and pharmaceutical applications, such as predictions of new functions and drug discovery. In this release, we have provided a new service named PoSSuM drug search (PoSSuMds) at http://possum.cbrc.jp/PoSSuM/drug_search/, in which we selected 194 approved drug compounds retrieved from ChEMBL, and detected their known binding pockets and pockets that are similar to them. Users can access and download all of the search results via a new web interface, which is useful for finding ligand analogs as well as potential target proteins. Furthermore, PoSSuMds enables users to explore the binding pocket universe within PoSSuM. Additionally, we have improved the web interface with new functions, including sortable tables and a viewer for visualizing and downloading superimposed pockets. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Activation of different split functionalities upon re-association of RNA-DNA hybrids

PubMed Central

Afonin, Kirill A.; Viard, Mathias; Martins, Angelica N.; Lockett, Stephen J.; Maciag, Anna E.; Freed, Eric O.; Heldman, Eliahu; Jaeger, Luc; Blumenthal, Robert; Shapiro, Bruce A.

2013-01-01

Split-protein systems, an approach that relies on fragmentation of proteins with their further conditional re-association to form functional complexes, are increasingly used for various biomedical applications. This approach offers tight control of the protein functions and improved detection sensitivity. Here we show a similar technique based on a pair of RNA-DNA hybrids that can be generally used for triggering different split functionalities. Individually, each hybrid is inactive but when two cognate hybrids re-associate, different functionalities are triggered inside mammalian cells. As a proof of concept this work is mainly focused on activation of RNA interference; however the release of other functionalities (resonance energy transfer and RNA aptamer) is also shown. Furthermore, in vivo studies demonstrate a significant uptake of the hybrids by tumors together with specific gene silencing. This split-functionality approach presents a new route in the development of “smart” nucleic acids based nanoparticles and switches for various biomedical applications. PMID:23542902
A proteomic study of the arabidopsis nuclear matrix.

PubMed

Calikowski, Tomasz T; Meulia, Tea; Meier, Iris

2003-10-01

The eukaryotic nucleus has been proposed to be organized by two interdependent nucleoprotein structures, the DNA-based chromatin and the RNA-dependent nuclear matrix. The functional composition and molecular organization of the second component have not yet been resolved. Here, we describe the isolation of the nuclear matrix from the model plant Arabidopsis, its initial characterization by confocal and electron microscopy, and the identification of 36 proteins by mass spectrometry. Electron microscopy of resinless samples confirmed a structure very similar to that described for the animal nuclear matrix. Two-dimensional gel electrophoresis resolved approximately 300 protein spots. Proteins were identified in batches by ESI tandem mass spectrometry after resolution by 1D SDS-PAGE. Among the identified proteins were a number of demonstrated or predicted Arabidopsis homologs of nucleolar proteins such as IMP4, Nop56, Nop58, fibrillarins, nucleolin, as well as ribosomal components and a putative histone deacetylase. Others included homologs of eEF-1, HSP/HSC70, and DnaJ, which have also been identified in the nucleolus or nuclear matrix of human cells, as well as a number of novel proteins with unknown function. This study is the first proteomic approach towards the characterization of a higher plant nuclear matrix. It demonstrates the striking similarities both in structure and protein composition of the operationally defined nuclear matrix across kingdoms whose unicellular ancestors have separated more than one billion years ago. Copyright 2003 Wiley-Liss, Inc.
SANSparallel: interactive homology search against Uniprot.

PubMed

Somervuo, Panu; Holm, Liisa

2015-07-01

Proteins evolve by mutations and natural selection. The network of sequence similarities is a rich source for mining homologous relationships that inform on protein structure and function. There are many servers available to browse the network of homology relationships but one has to wait up to a minute for results. The SANSparallel webserver provides protein sequence database searches with immediate response and professional alignment visualization by third-party software. The output is a list, pairwise alignment or stacked alignment of sequence-similar proteins from Uniprot, UniRef90/50, Swissprot or Protein Data Bank. The stacked alignments are viewed in Jalview or as sequence logos. The database search uses the suffix array neighborhood search (SANS) method, which has been re-implemented as a client-server, improved and parallelized. The method is extremely fast and as sensitive as BLAST above 50% sequence identity. Benchmarks show that the method is highly competitive compared to previously published fast database search programs: UBLAST, DIAMOND, LAST, LAMBDA, RAPSEARCH2 and BLAT. The web server can be accessed interactively or programmatically at http://ekhidna2.biocenter.helsinki.fi/cgi-bin/sans/sans.cgi. It can be used to make protein functional annotation pipelines more efficient, and it is useful in interactive exploration of the detailed evidence supporting the annotation of particular proteins of interest. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Gibbs motif sampling: detection of bacterial outer membrane protein repeats.

PubMed Central

Neuwald, A. F.; Liu, J. S.; Lawrence, C. E.

1995-01-01

The detection and alignment of locally conserved regions (motifs) in multiple sequences can provide insight into protein structure, function, and evolution. A new Gibbs sampling algorithm is described that detects motif-encoding regions in sequences and optimally partitions them into distinct motif models; this is illustrated using a set of immunoglobulin fold proteins. When applied to sequences sharing a single motif, the sampler can be used to classify motif regions into related submodels, as is illustrated using helix-turn-helix DNA-binding proteins. Other statistically based procedures are described for searching a database for sequences matching motifs found by the sampler. When applied to a set of 32 very distantly related bacterial integral outer membrane proteins, the sampler revealed that they share a subtle, repetitive motif. Although BLAST (Altschul SF et al., 1990, J Mol Biol 215:403-410) fails to detect significant pairwise similarity between any of the sequences, the repeats present in these outer membrane proteins, taken as a whole, are highly significant (based on a generally applicable statistical test for motifs described here). Analysis of bacterial porins with known trimeric beta-barrel structure and related proteins reveals a similar repetitive motif corresponding to alternating membrane-spanning beta-strands. These beta-strands occur on the membrane interface (as opposed to the trimeric interface) of the beta-barrel. The broad conservation and structural location of these repeats suggests that they play important functional roles. PMID:8520488
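The core Gibbs sampling step behind motif detection can be sketched as follows. This is a deliberately simplified site sampler (one motif occurrence per sequence, fixed width, uniform background), not the multi-model algorithm described in the abstract; the function name, parameters and pseudocount value are illustrative assumptions.

```python
import random
from collections import Counter
from typing import List

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids


def gibbs_site_sampler(seqs: List[str], width: int, sweeps: int = 200,
                       seed: int = 0, pseudo: float = 0.5) -> List[int]:
    """Sample one motif start position per sequence.

    Each step holds one sequence out, builds per-column residue counts
    from the motif sites currently chosen in the other sequences, and
    re-samples the held-out site in proportion to its probability under
    that model. A uniform background is assumed, so the background term
    is a constant and is omitted from the weights.
    """
    rng = random.Random(seed)
    pos = [rng.randrange(len(s) - width + 1) for s in seqs]
    denom = len(seqs) - 1 + pseudo * len(ALPHABET)
    for _ in range(sweeps):
        for i, s in enumerate(seqs):
            # Column counts from every sequence except the held-out one.
            cols = [Counter() for _ in range(width)]
            for k, t in enumerate(seqs):
                if k != i:
                    for c in range(width):
                        cols[c][t[pos[k] + c]] += 1
            # Weight of each candidate start in the held-out sequence.
            weights = []
            for start in range(len(s) - width + 1):
                w = 1.0
                for c in range(width):
                    w *= (cols[c][s[start + c]] + pseudo) / denom
                weights.append(w)
            pos[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return pos


if __name__ == "__main__":
    toy = ["ACDEFWWWWGHIK", "WWWWACDEFGHIK", "KLMNWWWWPQRST"]
    print(gibbs_site_sampler(toy, width=4))
```

Because the sampler is stochastic, the returned positions depend on the random seed; in practice several runs are compared and the alignment with the best score is kept.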
Purely Structural Protein Scoring Functions Using Support Vector Machine and Ensemble Learning.

PubMed

Mirzaei, Shokoufeh; Sidi, Tomer; Keasar, Chen; Crivelli, Silvia

2016-08-24

The function of a protein is determined by its structure, which creates a need for efficient methods of protein structure determination to advance scientific and medical research. Because current experimental structure determination methods carry a high price tag, computational predictions are highly desirable. Given a protein sequence, computational methods produce numerous 3D structures known as decoys. However, selection of the best-quality decoys is challenging, as end users can handle only a few. Therefore, scoring functions are central to decoy selection. They combine measurable features into a single-number indicator of decoy quality. Unfortunately, current scoring functions do not consistently select the best decoys. Machine learning techniques offer great potential to improve decoy scoring. This paper presents two machine-learning-based scoring functions to predict the quality of protein structures, i.e., the similarity between the predicted structure and the experimental one without knowing the latter. We use different metrics to compare these scoring functions against three state-of-the-art scores. This is a first attempt at comparing different scoring functions using the same non-redundant dataset for training and testing and the same features. The results show that adding informative features may be more significant than the method used.
Poly(A) polymerase contains multiple functional domains.

PubMed Central

Raabe, T; Murthy, K G; Manley, J L

1994-01-01

Poly(A) polymerase (PAP) contains regions of similarity with several known protein domains. Through site-directed mutagenesis, we provide evidence that PAP contains a functional ribonucleoprotein-type RNA binding domain (RBD) that is responsible for primer binding, making it the only known polymerase to contain such a domain. The RBD is adjacent to, and probably overlaps with, an apparent catalytic region responsible for polymerization. Despite the presence of sequence similarities, this catalytic domain appears to be distinct from the conserved polymerase module found in a large number of RNA-dependent polymerases. PAP contains two nuclear localization signals (NLSs) in its C terminus, each by itself similar to the consensus bipartite NLS found in many nuclear proteins. Mutagenesis experiments indicate that both signals, which are separated by nearly 140 residues, play important roles in directing PAP exclusively to the nucleus. Surprisingly, basic amino acids in the N-terminal-most NLS are also essential for AAUAAA-dependent polyadenylation but not for nonspecific poly(A) synthesis, suggesting that this region of PAP is involved in interactions both with nuclear targeting proteins and with nuclear polyadenylation factors. The serine/threonine-rich C terminus is multiply phosphorylated, including at sites affected by mutations in either NLS. PMID:8164653
RNA-binding proteins in plants: the tip of an iceberg?

NASA Technical Reports Server (NTRS)

Fedoroff, Nina V.; Federoff, N. V. (Principal Investigator)

2002-01-01

RNA-binding proteins, which are involved in the synthesis, processing, transport, translation, and degradation of RNA, are emerging as important, often multifunctional, cellular regulatory proteins. Although relatively few RNA-binding proteins have been studied in plants, they are being identified with increasing frequency, both genetically and biochemically. RNA-binding proteins that regulate chloroplast mRNA stability and translation in response to light and that have been elegantly analyzed in Chlamydomonas reinhardtii have counterparts with similar functions in higher plants. Several recent reports describe mutations in genes encoding RNA-binding proteins that affect plant development and hormone signaling.
A mammalian germ cell-specific RNA-binding protein interacts with ubiquitously expressed proteins involved in splice site selection

NASA Astrophysics Data System (ADS)

Elliott, David J.; Bourgeois, Cyril F.; Klink, Albrecht; Stévenin, James; Cooke, Howard J.

2000-05-01

RNA-binding motif (RBM) genes are found on all mammalian Y chromosomes and are implicated in spermatogenesis. Within human germ cells, RBM protein shows a similar nuclear distribution to components of the pre-mRNA splicing machinery. To address the function of RBM, we have used protein-protein interaction assays to test for possible physical interactions between these proteins. We find that RBM protein directly interacts with members of the SR family of splicing factors and, in addition, strongly interacts with itself. We have mapped the protein domains responsible for mediating these interactions and expressed the mouse RBM interaction region as a bacterial fusion protein. This fusion protein can pull-down several functionally active SR protein species from cell extracts. Depletion and add-back experiments indicate that these SR proteins are the only splicing factors bound by RBM which are required for the splicing of a panel of pre-mRNAs. Our results suggest that RBM protein is an evolutionarily conserved mammalian splicing regulator which operates as a germ cell-specific cofactor for more ubiquitously expressed pre-mRNA splicing activators.
Sponge non-metastatic Group I Nme gene/protein - structure and function is conserved from sponges to humans

PubMed Central

2011-01-01

Background Nucleoside diphosphate kinases (NDPKs) are evolutionarily conserved enzymes present in Bacteria, Archaea and Eukarya, with human Nme1 the most studied representative of the family and the first identified metastasis suppressor. Sponges (Porifera) are simple metazoans without tissues, closest to the common ancestor of all animals. They changed little during evolution and probably provide the best insight into the metazoan ancestor's genomic features. Recent studies show that sponges have a wide repertoire of genes, many of which are involved in diseases in more complex metazoans. The original function of those genes and the way it has evolved in the animal lineage are largely unknown. Here we report new results on the metastasis suppressor gene/protein homolog from the marine sponge Suberites domuncula, NmeGp1Sd. The purpose of this study was to investigate the properties of the sponge Group I Nme gene and protein, and compare it to its human homolog in order to elucidate the evolution of the structure and function of Nme. Results We found that sponge genes coding for Group I Nme protein are intron-rich. Furthermore, we discovered that the sponge NmeGp1Sd protein has a similar level of kinase activity as its human homolog Nme1, does not cleave negatively supercoiled DNA and shows nonspecific DNA-binding activity. The sponge NmeGp1Sd forms a hexamer, like human Nme1, and all other eukaryotic Nme proteins. NmeGp1Sd interacts with human Nme1 in human cells and exhibits the same subcellular localization. Stable clones expressing sponge NmeGp1Sd inhibited the migratory potential of CAL 27 cells, as already reported for human Nme1, which suggests that Nme's function in migratory processes was engaged long before the composition of true tissues.
Conclusions This study suggests that the ancestor of all animals possessed a NmeGp1 protein with properties and functions similar to evolutionarily recent versions of the protein, even before the appearance of true tissues and the origin of tumors and metastasis. PMID:21457554
Overexpression of neurofilament H disrupts normal cell structure and function

NASA Technical Reports Server (NTRS)

Szebenyi, Gyorgyi; Smith, George M.; Li, Ping; Brady, Scott T.

2002-01-01

Studying exogenously expressed tagged proteins in live cells has become a standard technique for evaluating protein distribution and function. Typically, expression levels of experimentally introduced proteins are not regulated, and high levels are often preferred to facilitate detection. However, overexpression of many proteins leads to mislocalization and pathologies. Therefore, for normative studies, moderate levels of expression may be more suitable. To understand better the dynamics of intermediate filament formation, transport, and stability in a healthy, living cell, we inserted neurofilament heavy chain (NFH)-green fluorescent protein (GFP) fusion constructs in adenoviral vectors with tetracycline (tet)-regulated promoters. This system allows for turning on or off the synthesis of NFH-GFP at a selected time, for a defined period, in a dose-dependent manner. We used this inducible system for live cell imaging of changes in filament structure and cell shape, motility, and transport associated with increasing NFH-GFP expression. Cells with low to intermediate levels of NFH-GFP were structurally and functionally similar to neighboring, nonexpressing cells. In contrast, overexpression led to pathological alterations in both filament organization and cell function. Copyright 2002 Wiley-Liss, Inc.
Cytotoxicity of Protein-Carbon Nanotubes on J774 Macrophages Is a Functionalization Grade-Dependent Effect

PubMed Central

Montes-Fonseca, Silvia Lorena; Sánchez-Ramírez, Blanca; Luna-Velasco, Antonia; Arzate-Quintana, Carlos; Silva-Cazares, Macrina Beatriz; González Horta, Carmen

2015-01-01

Carbon nanotubes (CNTs) are used as carriers in medicine due to their ability to be functionalized with chemical substances. However, cytotoxicity analysis is required prior to use for in vivo models. The aim of this study was to evaluate the cytotoxic effect of CNTs functionalized with a 46 kDa surface protein from Entamoeba histolytica (P46-CNTs) on J774A macrophages. With this purpose, CNTs were synthesized by spray pyrolysis and purified (P-CNTs) using sonication for 48 h. A 46 kDa protein, with a 4.6–5.4 pI range, was isolated from E. histolytica HM1:IMSS strain trophozoites using an OFFGEL system. The P-CNTs were functionalized with the purified 46 kDa protein, classified according to their degree of functionalization, and characterized by Raman and Infrared spectroscopy. In vitro cytotoxicity was evaluated by MTT, apoptosis, and morphological assays. The results demonstrated that P46-CNTs exhibited cytotoxicity dependent upon the functionalized grade. Contrary to what was expected, P46-CNTs with a high grade of functionalization were more toxic to J774 macrophages than P46-CNTs with a low grade of functionalization, than P-CNTs, and had a similar level of toxicity as UP-CNT. This suggests that the nature of the functionalized protein plays a key role in the cytotoxicity of these nanoparticles. PMID:26075262
Effect of acid- and alkaline-aided extractions on functional and rheological properties of proteins recovered from mechanically separated turkey meat (MSTM).

PubMed

Hrynets, Yuliya; Omana, Dileep A; Xu, Yan; Betti, Mirko

2010-09-01

Functional and rheological characteristics of acid- and alkali-extracted proteins from mechanically separated turkey meat (MSTM) have been investigated. Extractions were carried out at 4 pH values (2.5, 3.5, 10.5, and 11.5). The study demonstrated that alkali and acid extractions resulted in significant (P < 0.0001) decreases of cooking and water loss compared to raw MSTM; however, the cooking loss was found to be similar (P = 0.5699) among the different protein isolates. Proteins extracted at pH 10.5 showed the lowest (P = 0.0249) water loss. Emulsion and foaming properties were found to be slightly higher in alkali-extracted proteins compared to those for acid extractions. The myofibrillar protein fraction showed better ability to form and stabilize emulsions compared to sarcoplasmic proteins. Myofibrillar proteins also showed better foam expansion; however, foam volume stability was similar for both myofibrillar and sarcoplasmic protein fractions. Textural characteristics (hardness, chewiness, springiness, and cohesiveness) of recovered proteins were found to be unaffected (P > 0.05) by different extraction pH. The protein extracted at pH 3.5 formed a highly viscoelastic gel network as evidenced by storage modulus (G') values, whereas the gel formed from proteins extracted at pH 10.5 was found to be the weakest. The work also revealed that acid treatments were more effective for removal of total heme pigments from MSTM. Color characteristics of protein isolates were markedly improved compared to the initial material and tended to be better when subjected to acid extractions. Mechanically separated meat is one of the cheapest sources of protein obtained by grinding meat and bones together and forcing the mixture through a perforated drum. The use of mechanically separated turkey meat (MSTM) for the production of further processed poultry products is limited due to its undesirable color and textural properties. 
Recovery of proteins from MSTM using pH shifting process will help the poultry processors to get better returns and also create opportunity to produce functional food ingredients.
An orthologue of the host-defense protein psoriasin (S100A7) is expressed in frog skin.

PubMed

Matthijs, Severine; Hernalsteens, Jean-Pierre; Roelants, Kim

2017-02-01

Host-defense peptides and proteins are vital for first line protection against bacteria. Most host-defense peptides and proteins common in vertebrates have been studied primarily in mammals, while their orthologues in non-mammalian vertebrates received less attention. We found that the European Common Frog Rana temporaria expresses a protein in its skin that is evolutionarily related to the host-defense protein S100A7. This prompted us to test if the encoded protein, which is an important microbicidal protein in human skin, shows similar activity in frogs. The R. temporaria protein lacks the zinc-binding sites that are key to the antimicrobial activity of human S100A7 at neutral pH. However, despite being less potent, the R. temporaria protein does compromise bacterial membranes at low pH, similar to its human counterpart. We postulate that, while amphibian S100A7 likely serves other functions, the capacity to compromise bacterial cell membranes evolved early in tetrapod evolution. Copyright © 2016 Elsevier Ltd. All rights reserved.
Structure, Biology, and Therapeutic Application of Toxin-Antitoxin Systems in Pathogenic Bacteria.

PubMed

Lee, Ki-Young; Lee, Bong-Jin

2016-10-22

Bacterial toxin-antitoxin (TA) systems have received increasing attention for their diverse identities, structures, and functional implications in cell cycle arrest and survival against environmental stresses such as nutrient deficiency, antibiotic treatments, and immune system attacks. In this review, we describe the biological functions and the auto-regulatory mechanisms of six different types of TA systems, among which the type II TA system has been most extensively studied. The functions of type II toxins include mRNA/tRNA cleavage, gyrase/ribosome poison, and protein phosphorylation, which can be neutralized by their cognate antitoxins. We mainly explore the similar but divergent structures of type II TA proteins from 12 important pathogenic bacteria, including various aspects of protein-protein interactions. Accumulating knowledge about the structure-function correlation of TA systems from pathogenic bacteria has facilitated a novel strategy to develop antibiotic drugs that target specific pathogens. These molecules could increase the intrinsic activity of the toxin by artificially interfering with the intermolecular network of the TA systems.
Hypothesis: NDL proteins function in stress responses by regulating microtubule organization

PubMed Central

Khatri, Nisha; Mudgil, Yashwanti

2015-01-01

N-MYC DOWNREGULATED-LIKE (NDL) proteins, members of the alpha/beta hydrolase superfamily, were recently rediscovered as interactors of G-protein signaling in Arabidopsis thaliana. Although the precise molecular function of NDL proteins is still elusive, in animals these proteins play a protective role in hypoxia, and their expression is induced by hypoxia and nickel, indicating a role in stress. Homology of NDL1 with its animal counterpart N-MYC DOWNREGULATED GENE (NDRG) suggests similar functions in animals and plants. It is well established that stress responses lead to microtubule depolymerization and reorganization, which is crucial for stress tolerance. NDRG is a microtubule-associated protein that mediates microtubule organization in animals by causing acetylation and increasing the stability of α-tubulin. As NDL1 is highly homologous to NDRG, involvement of NDL1 in microtubule organization during plant stress can also be expected. The discovery of interactions of NDL with the protein kinesin light chain-related 1, endomembrane family protein 70, syntaxin-23, and tubulin alpha-2 chain as part of the G protein interactome initiative encourages us to postulate microtubule-stabilizing functions for the NDL family in plants. Our search for NDL interactors in the G protein interactome also predicts a role for NDL proteins in abiotic stress tolerance management. Published reports in animals and the predicted interacting partners of NDL in the G protein interactome lead us to hypothesize that NDL is involved in microtubule organization during abiotic stress management in plants. PMID:26583023
Hypothesis: NDL proteins function in stress responses by regulating microtubule organization.

PubMed

Khatri, Nisha; Mudgil, Yashwanti

2015-01-01

N-MYC DOWNREGULATED-LIKE (NDL) proteins, members of the alpha/beta hydrolase superfamily, were recently rediscovered as interactors of G-protein signaling in Arabidopsis thaliana. Although the precise molecular function of NDL proteins is still elusive, in animals these proteins play a protective role in hypoxia, and their expression is induced by hypoxia and nickel, indicating a role in stress. Homology of NDL1 with its animal counterpart N-MYC DOWNREGULATED GENE (NDRG) suggests similar functions in animals and plants. It is well established that stress responses lead to microtubule depolymerization and reorganization, which is crucial for stress tolerance. NDRG is a microtubule-associated protein that mediates microtubule organization in animals by causing acetylation and increasing the stability of α-tubulin. As NDL1 is highly homologous to NDRG, involvement of NDL1 in microtubule organization during plant stress can also be expected. The discovery of interactions of NDL with the protein kinesin light chain-related 1, endomembrane family protein 70, syntaxin-23, and tubulin alpha-2 chain as part of the G protein interactome initiative encourages us to postulate microtubule-stabilizing functions for the NDL family in plants. Our search for NDL interactors in the G protein interactome also predicts a role for NDL proteins in abiotic stress tolerance management. Published reports in animals and the predicted interacting partners of NDL in the G protein interactome lead us to hypothesize that NDL is involved in microtubule organization during abiotic stress management in plants.
Structural Disorder Provides Increased Adaptability for Vesicle Trafficking Pathways

PubMed Central

Tompa, Peter

2013-01-01

Vesicle trafficking systems play essential roles in the communication between the organelles of eukaryotic cells and also between cells and their environment. Endocytosis and the late secretory route are mediated by clathrin-coated vesicles, while the COat Protein I and II (COPI and COPII) routes stand for the bidirectional traffic between the ER and the Golgi apparatus. Despite similar fundamental organizations, the molecular machinery, functions, and evolutionary characteristics of the three systems are very different. In this work, we compiled the basic functional protein groups of the three main routes for human and yeast and analyzed them from the structural disorder perspective. We found similar overall disorder content in yeast and human proteins, confirming the well-conserved nature of these systems. Most functional groups contain highly disordered proteins, supporting the general importance of structural disorder in these routes, although some of them seem to heavily rely on disorder, while others do not. Interestingly, the clathrin system is significantly more disordered (∼23%) than the other two, COPI (∼9%) and COPII (∼8%). We show that this structural phenomenon enhances the inherent plasticity and increased evolutionary adaptability of the clathrin system, which distinguishes it from the other two routes. Since multi-functionality (moonlighting) is indicative of both plasticity and adaptability, we studied its prevalence in vesicle trafficking proteins and correlated it with structural disorder. Clathrin adaptors have the highest capability for moonlighting while also comprising the most highly disordered members. The ability to acquire tissue specific functions was also used to approach adaptability: clathrin route genes have the most tissue specific exons encoding for protein segments enriched in structural disorder and interaction sites. 
Overall, our results confirm the general importance of structural disorder in vesicle trafficking and suggest major roles for this structural property in shaping the differences of evolutionary adaptability in the three routes. PMID:23874186
A new member of the aldo-keto reductase family from the plant pathogen Xylella fastidiosa.

PubMed

Rosselli, Luciana K; Oliveira, Cristiano L P; Azzoni, Adriano R; Tada, Susely F S; Catani, Cleide F; Saraiva, Antonio M; Soares, José Sérgio M; Medrano, Francisco J; Torriani, Iris L; Souza, Anete P

2006-09-15

The Xylella fastidiosa genome program generated a large number of gene sequences that belong to pathogenicity, virulence and adaptation categories from this important plant pathogen. One of these genes (XF1729) encodes a protein similar to a superfamily of aldo-keto reductases together with a number of structurally and functionally related NADPH-dependent oxidoreductases. In this work, the similar sequence XF1729 from X. fastidiosa was cloned into the pET32Xa/LIC vector in order to overexpress a recombinant His-tag fusion protein in Escherichia coli BL21(DE3). The expressed protein in the soluble fraction was purified by immobilized metal affinity chromatography (agarose-IDA-Ni resin). Secondary structure contents were verified by circular dichroism spectroscopy. Small angle X-ray scattering (SAXS) measurements furnish general structural parameters and provide a strong indication that the protein has a monomeric form in solution. Also, ab initio calculations show that the protein has some similarities with a previously crystallized aldo-keto reductase protein. The recombinant XF1729 purified to homogeneity catalyzed the reduction of dl-glyceraldehyde (kcat 2.26 s^-1, Km 8.20 ± 0.98 mM) and 2-nitrobenzaldehyde (kcat 11.74 s^-1, Km 0.14 ± 0.04 mM) in the presence of NADPH. The amino acid sequence deduced from XF1729 showed the highest identity (40% or higher) with several proteins of unknown function. Among the identified AKRs, we found approximately 29% identity with YakC (AKR13), and 30% and 28% with AKR11A and AKR11B, respectively. The results establish XF1729 as a new member of the AKR family, AKR13B1. Finally, the first characterization by gel filtration chromatography assays indicates that the protein has an elongated shape, which generates an apparent higher molecular weight. The study of this protein is an effort to fight X. fastidiosa, which causes tremendous losses in many economically important plants.
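The two sets of kinetic constants reported in this abstract can be compared through the catalytic efficiency kcat/Km. The short sketch below is an illustrative calculation only, assuming simple Michaelis-Menten kinetics; the function names are hypothetical, and the reported uncertainties on Km are ignored.

```python
def catalytic_efficiency(kcat_per_s, km_millimolar):
    """Return kcat/Km in M^-1 s^-1, converting Km from mM to M."""
    return kcat_per_s / (km_millimolar * 1e-3)


def michaelis_menten_rate(kcat_per_s, km_millimolar, substrate_millimolar):
    """Turnover per enzyme molecule (s^-1) at a given substrate concentration."""
    return kcat_per_s * substrate_millimolar / (km_millimolar + substrate_millimolar)


# (kcat in s^-1, Km in mM) as reported in the abstract above.
substrates = {
    "dl-glyceraldehyde": (2.26, 8.20),
    "2-nitrobenzaldehyde": (11.74, 0.14),
}
for name, (kcat, km) in substrates.items():
    print(f"{name}: kcat/Km = {catalytic_efficiency(kcat, km):.0f} M^-1 s^-1")
```

With these numbers the efficiency is roughly 2.8 × 10^2 M^-1 s^-1 for dl-glyceraldehyde and roughly 8.4 × 10^4 M^-1 s^-1 for 2-nitrobenzaldehyde, a difference of about 300-fold in favor of the aromatic aldehyde.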
Metamorphic Proteins: Emergence of Dual Protein Folds from One Primary Sequence.

PubMed

Lella, Muralikrishna; Mahalakshmi, Radhakrishnan

2017-06-20

Every amino acid exhibits a different propensity for distinct structural conformations. Hence, decoding how the primary amino acid sequence undergoes the transition to a defined secondary structure and its final three-dimensional fold is presently considered predictable with reasonable certainty. However, protein sequences that defy the first principles of secondary structure prediction (they attain two different folds) have recently been discovered. Such proteins, aptly named metamorphic proteins, decrease the conformational constraint by increasing flexibility in the secondary structure and thereby result in efficient functionality. In this review, we discuss the major factors driving the conformational switch related both to protein sequence and to structure using illustrative examples. We discuss the concept of an evolutionary transition in sequence and structure, the functional impact of the tertiary fold, and the pressure of intrinsic and external factors that give rise to metamorphic proteins. We mainly focus on the major components of protein architecture, namely, the α-helix and β-sheet segments, which are involved in conformational switching within the same or highly similar sequences. These chameleonic sequences are widespread in both cytosolic and membrane proteins, and these folds are equally important for protein structure and function. We discuss the implications of metamorphic proteins and chameleonic peptide sequences in de novo peptide design.

DaGO-Fun: tool for Gene Ontology-based functional analysis using term information content measures

PubMed Central

2013-01-01

Background The use of Gene Ontology (GO) data in protein analyses has largely contributed to the improved outcomes of these analyses. Several GO semantic similarity measures have been proposed in recent years and provide tools that allow the integration of biological knowledge embedded in the GO structure into different biological analyses. There is a need for a unified tool that provides the scientific community with the opportunity to explore these different GO similarity measure approaches and their biological applications. Results We have developed DaGO-Fun, an online tool available at http://web.cbio.uct.ac.za/ITGOM, which incorporates many different GO similarity measures for exploring, analyzing and comparing GO terms and proteins within the context of GO. It uses GO data and UniProt proteins with their GO annotations as provided by the Gene Ontology Annotation (GOA) project to precompute GO term information content (IC), enabling rapid response to user queries. Conclusions The DaGO-Fun online tool presents the advantage of integrating all the relevant IC-based GO similarity measures, including topology- and annotation-based approaches to facilitate effective exploration of these measures, thus enabling users to choose the most relevant approach for their application. Furthermore, this tool includes several biological applications related to GO semantic similarity scores, including the retrieval of genes based on their GO annotations, the clustering of functionally related genes within a set, and term enrichment analysis. PMID:24067102
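As a rough illustration of what an annotation-based information content measure involves, the sketch below computes IC(t) = -log p(t) on a toy ontology and derives the Resnik similarity, one widely used IC-based measure. It is not DaGO-Fun's code or API: the term names, the annotation counts and the function names are hypothetical, and p(t) is taken as the fraction of annotations that fall on a term or any of its descendants.

```python
import math

# Hypothetical toy ontology: child -> list of parents (not real GO identifiers).
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "rna_binding": ["binding"],
    "dna_binding": ["binding"],
}
# Hypothetical number of proteins annotated directly to each term.
DIRECT_COUNTS = {"root": 0, "binding": 2, "catalysis": 8,
                 "rna_binding": 4, "dna_binding": 6}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log p(t), with p(t) counting annotations to t or a descendant."""
    cumulative = dict.fromkeys(PARENTS, 0)
    for term, count in DIRECT_COUNTS.items():
        for anc in ancestors(term):
            cumulative[anc] += count
    total = cumulative["root"]
    # log(total / c) equals -log(c / total).
    return {t: math.log(total / c) for t, c in cumulative.items()}


def resnik(term_a, term_b, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(ic[t] for t in common)


ic = information_content()
print(resnik("rna_binding", "dna_binding", ic))
print(resnik("rna_binding", "catalysis", ic))
```

Precomputing the IC table once, as the abstract describes, means that each later similarity query only needs ancestor lookups and a maximum.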
Fluorescein isothiocyanate-labeled human plasma fibronectin in extracellular matrix remodeling.

PubMed

Hoffmann, Celine; Leroy-Dudal, Johanne; Patel, Salima; Gallet, Olivier; Pauthe, Emmanuel

2008-01-01

Fluorescein isothiocyanate (FITC) is a well-known probe for labeling biologically relevant proteins. However, the impact of the labeling procedure on protein structure and biological activities remains unclear. In this work, FITC-labeled human plasma fibronectin (Fn) was developed to gain insight into the dynamic relationship between cells and Fn. The similarities and differences concerning the structure and function between Fn-FITC and standard Fn were evaluated using biochemical as well as cellular approaches. By varying the FITC/Fn ratio, we demonstrated that overlabeling (>10 FITC molecules/Fn molecule) induces probe fluorescence quenching, protein aggregation, and cell growth modifications. A correct balance between reliable fluorescence for detection and no significant modifications to structure and biological function compared with standard Fn was obtained with a final ratio of 3 FITC molecules per Fn molecule (Fn-FITC3). Fn-FITC3, similar to standard Fn, is correctly recruited into the cell matrix network. Also, Fn-FITC3 is proposed to be a powerful molecular tool to investigate Fn organization and cellular behavior concomitantly.
A Novel Gibberellin-Induced Gene from Rice and Its Potential Regulatory Role in Stem Growth

PubMed Central

van der Knaap, Esther; Kim, Jeong Hoe; Kende, Hans

2000-01-01

Os-GRF1 (Oryza sativa-GROWTH-REGULATING FACTOR1) was identified in a search for genes that are differentially expressed in the intercalary meristem of deepwater rice (Oryza sativa L.) internodes in response to gibberellin (GA). Os-GRF1 displays general features of transcription factors, contains a functional nuclear localization signal, and has three regions with similarities to sequences in the database. One of these regions is similar to a protein interaction domain of SWI2/SNF2, which is a subunit of a chromatin-remodeling complex in yeast. The two other domains are novel and found only in plant proteins of unknown function. To study its role in plant growth, Os-GRF1 was expressed in Arabidopsis. Stem elongation of transformed plants was severely inhibited, and normal growth could not be recovered by the application of GA. Our results indicate that Os-GRF1 belongs to a novel class of plant proteins and may play a regulatory role in GA-induced stem elongation. PMID:10712532
Rapid comparison of properties on protein surface

PubMed Central

Sael, Lee; La, David; Li, Bin; Rustamov, Raif; Kihara, Daisuke

2008-01-01

The mapping of physicochemical characteristics onto the surface of a protein provides crucial insights into its function and evolution. This information can be further used in the characterization and identification of similarities within protein surface regions. We propose a novel method which quantitatively compares global and local properties on the protein surface. We have tested the method on comparison of electrostatic potentials and hydrophobicity. The method is based on 3D Zernike descriptors, which provide a compact representation of a given property defined on a protein surface. Compactness and rotational invariance of this descriptor enable fast comparison suitable for database searches. The usefulness of this method is exemplified by studying several protein families including globins, thermophilic and mesophilic proteins, and active sites of TIM β/α barrel proteins. In all the cases studied, the descriptor is able to cluster proteins into functionally relevant groups. The proposed approach can also be easily extended to other surface properties. This protein surface-based approach will add a new way of viewing and comparing proteins to conventional methods, which compare proteins in terms of their primary sequence or tertiary structure. PMID:18618695
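The reason a compact, rotation-invariant descriptor suits database searches is that comparing two surfaces reduces to comparing two fixed-length vectors, with no structural superposition. The sketch below shows only that comparison step, using Euclidean distance as an assumed metric. It does not compute 3D Zernike descriptors, and the vectors and names are hypothetical placeholders.

```python
import math


def descriptor_distance(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_similarity(query, database):
    """Return (name, distance) pairs sorted from most to least similar."""
    return sorted(
        ((name, descriptor_distance(query, vec)) for name, vec in database.items()),
        key=lambda item: item[1],
    )


# Hypothetical precomputed descriptors (real ones would be much longer).
database = {
    "globin_like": [0.9, 0.1, 0.3],
    "tim_barrel_like": [0.2, 0.8, 0.5],
}
print(rank_by_similarity([0.85, 0.15, 0.35], database))
```

Each comparison costs time proportional to the descriptor length, so a whole database can be scanned linearly or indexed with standard nearest-neighbor structures.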
Rapid comparison of properties on protein surface.

PubMed

Sael, Lee; La, David; Li, Bin; Rustamov, Raif; Kihara, Daisuke

2008-10-01

The mapping of physicochemical characteristics onto the surface of a protein provides crucial insights into its function and evolution. This information can be further used in the characterization and identification of similarities within protein surface regions. We propose a novel method which quantitatively compares global and local properties on the protein surface. We have tested the method on comparison of electrostatic potentials and hydrophobicity. The method is based on 3D Zernike descriptors, which provide a compact representation of a given property defined on a protein surface. Compactness and rotational invariance of this descriptor enable fast comparison suitable for database searches. The usefulness of this method is exemplified by studying several protein families including globins, thermophilic and mesophilic proteins, and active sites of TIM beta/alpha barrel proteins. In all the cases studied, the descriptor is able to cluster proteins into functionally relevant groups. The proposed approach can also be easily extended to other surface properties. This protein surface-based approach will add a new way of viewing and comparing proteins to conventional methods, which compare proteins in terms of their primary sequence or tertiary structure.
The Arabidopsis KIN17 and its homolog KLP mediate different aspects of plant growth and development.

PubMed

Garcia-Molina, Antoni; Xing, Shuping; Huijser, Peter

2014-01-01

Proteins harboring the kin17 domain (KIN17) constitute a family of well-conserved eukaryotic nuclear proteins involved in nucleic acid metabolism. In mammals, KIN17 orthologs contribute to DNA replication, RNA splicing, and DNA integrity maintenance. Recently, we reported a functional characterization of an Arabidopsis thaliana KIN17 homolog (AtKIN17) that uncovered a role for this protein in tuning physiological responses during copper (Cu) deficiency and oxidative stress. However, functions similar to those described in mammals may also be expected in plants given the conservation of functional domains in KIN17 orthologs. Here, we provide additional data consistent with the participation of AtKIN17 in controlling general plant growth and development, as well as in response to UV radiation. Furthermore, the Arabidopsis genome codes for a second KIN17 homolog, which we refer to as KIN17-like-protein (KLP). KLP loss-of-function lines exhibited a reduced inhibition of root growth in response to copper excess and relatively elongated hypocotyls in etiolated seedlings. Altogether, our experimental data point to a general function of the kin17 domain proteins in plant growth and development.
The Arabidopsis KIN17 and its homolog KLP mediate different aspects of plant growth and development

PubMed Central

Garcia-Molina, Antoni; Xing, Shuping; Huijser, Peter

2014-01-01

Proteins harboring the kin17 domain (KIN17) constitute a family of well-conserved eukaryotic nuclear proteins involved in nucleic acid metabolism. In mammals, KIN17 orthologs contribute to DNA replication, RNA splicing, and DNA integrity maintenance. Recently, we reported a functional characterization of an Arabidopsis thaliana KIN17 homolog (AtKIN17) that uncovered a role for this protein in tuning physiological responses during copper (Cu) deficiency and oxidative stress. However, functions similar to those described in mammals may also be expected in plants given the conservation of functional domains in KIN17 orthologs. Here, we provide additional data consistent with the participation of AtKIN17 in controlling general plant growth and development, as well as in response to UV radiation. Furthermore, the Arabidopsis genome codes for a second KIN17 homolog, which we refer to as KIN17-LIKE-PROTEIN (KLP). KLP loss-of-function lines exhibited a reduced inhibition of root growth in response to copper excess and relatively elongated hypocotyls in etiolated seedlings. Altogether, our experimental data point to a general function of the kin17 domain proteins in plant growth and development. PMID:24713636
Global analysis of the rat and human platelet proteome – the molecular blueprint for illustrating multi-functional platelets and cross-species function evolution

PubMed Central

Yu, Yanbao; Leng, Taohua; Yun, Dong; Liu, Na; Yao, Jun; Dai, Ying; Yang, Pengyuan; Chen, Xian

2013-01-01

Emerging evidence indicates that blood platelets function in multiple biological processes including immune response, bone metastasis and liver regeneration in addition to their known roles in hemostasis and thrombosis. Global elucidation of the platelet proteome will provide the molecular basis of these platelet functions. Here, we set up a high-throughput platform for maximum exploration of the rat/human platelet proteome using integrated proteomics technologies, and then applied it to identify the largest number of proteins expressed in both rat and human platelets. After stringent statistical filtration, a total of 837 unique proteins matched with at least two unique peptides were precisely identified, making it the first comprehensive protein database so far for rat platelets. Meanwhile, quantitative analyses of the thrombin-stimulated platelets offered great insights into the biological functions of platelet proteins and therefore confirmed our global profiling data. A comparative proteomic analysis between rat and human platelets was also conducted, which revealed not only a significant similarity, but also an across-species evolutionary link that the orthologous proteins representing ‘core proteome’, and the ‘evolutionary proteome’ is actually a relatively static proteome. PMID:20443191
Protein Aggregates and Novel Presenilin Gene Variants in Idiopathic Dilated Cardiomyopathy

PubMed Central

Gianni, Davide; Li, Airong; Tesco, Giuseppina; McKay, Kenneth M.; Moore, John; Raygor, Kunal; Rota, Marcello; Gwathmey, Judith K; Dec, G William; Aretz, Thomas; Leri, Annarosa; Semigran, Marc J; Anversa, Piero; Macgillivray, Thomas E; Tanzi, Rudolph E.; Monte, Federica del

2010-01-01

Background Heart failure (HF) is a debilitating condition resulting in severe disability and death. In a subset of cases, clustered as Idiopathic Dilated Cardiomyopathy (iDCM), the origin of HF is unknown. In the brain of patients with dementia, proteinaceous aggregates and abnormal oligomeric assemblies of β-amyloid impair cell function and lead to cell death. Methods and Results We have similarly characterized fibrillar and oligomeric assemblies in the hearts of iDCM patients pointing to abnormal protein aggregation as a determinant of iDCM. We also showed that oligomers alter myocyte Ca2+ homeostasis. Additionally, we have identified two new sequence variants in the presenilin-1 (PSEN1) gene promoter leading to reduced gene and protein expression. We also show that presenilin-1 co-immunoprecipitates with SERCA2a. Conclusions Based on these findings we propose that two mechanisms may link protein aggregation and cardiac function: oligomer-induced changes on Ca2+ handling and a direct effect of PSEN1 sequence variants on EC-coupling protein function. PMID:20194882
The proteome: structure, function and evolution

PubMed Central

Fleming, Keiran; Kelley, Lawrence A; Islam, Suhail A; MacCallum, Robert M; Muller, Arne; Pazos, Florencio; Sternberg, Michael J.E

2006-01-01

This paper reports two studies to model the inter-relationships between protein sequence, structure and function. First, an automated pipeline to provide a structural annotation of proteomes in the major genomes is described. The results are stored in a database at Imperial College, London (3D-GENOMICS) that can be accessed at www.sbg.bio.ic.ac.uk. Analysis of the assignments to structural superfamilies provides evolutionary insights. 3D-GENOMICS is being integrated with related proteome annotation data at University College London and the European Bioinformatics Institute in a project known as e-protein (http://www.e-protein.org/). The second topic is motivated by the developments in structural genomics projects in which the structure of a protein is determined prior to knowledge of its function. We have developed a new approach PHUNCTIONER that uses the gene ontology (GO) classification to supervise the extraction of the sequence signal responsible for protein function from a structure-based sequence alignment. Using GO we can obtain profiles for a range of specificities described in the ontology. In the region of low sequence similarity (around 15%), our method is more accurate than assignment from the closest structural homologue. The method is also able to identify the specific residues associated with the function of the protein family. PMID:16524832
Zebrafish ("Danio rerio") endomembrane antiporter similar to a yeast cation/H(+) transporter is required for neural crest development

USDA-ARS's Scientific Manuscript database

CAtion/H (+) eXchangers (CAXs) are integral membrane proteins that transport Ca (2+) or other cations by exchange with protons. While several yeast and plant CAX proteins have been characterized, no functional analysis of a vertebrate CAX homologue has yet been reported. In this study, we further ch...
Hormone signaling through protein destruction: a lesson from plants.

PubMed

Tan, Xu; Zheng, Ning

2009-02-01

Ubiquitin-dependent protein degradation has emerged as a major pathway regulating eukaryotic biology. By employing a variety of ubiquitin ligases to target specific cellular proteins, the ubiquitin-proteasome system controls physiological processes in a highly regulated fashion. Recent studies on a plant hormone auxin have unveiled a novel paradigm of signal transduction in which ubiquitin ligases function as hormone receptors. Perceived by the F-box protein subunit of the SCF(TIR1) ubiquitin ligase, auxin directly promotes the recruitment of a family of transcriptional repressors for ubiquitination, thereby activating extensive transcriptional programs. Structural studies have revealed that auxin functions through a "molecular glue" mechanism to enhance protein-protein interactions with the assistance of another small molecule cofactor, inositol hexakisphosphate. Given the extensive repertoire of similar ubiquitin ligases in eukaryotic cells, this novel and widely adopted hormone-signaling mechanism in plants may also exist in other organisms.
A phylogenetic analysis of normal modes evolution in enzymes and its relationship to enzyme function

PubMed Central

Lai, Jason; Jin, Jing; Kubelka, Jan; Liberles, David A.

2012-01-01

Since the dynamic nature of protein structures is essential for enzymatic function, it is expected that the functional evolution can be inferred from the changes in the protein dynamics. However, dynamics can also diverge neutrally with sequence substitution between enzymes without changes of function. In this study, a phylogenetic approach is implemented to explore the relationship between enzyme dynamics and function through evolutionary history. Protein dynamics are described by normal mode analysis based on a simplified harmonic potential force field applied to the reduced Cα representation of the protein structure while enzymatic function is described by Enzyme Commission (EC) numbers. Similarity of the binding pocket dynamics at each branch of the protein family’s phylogeny was analyzed in two ways: 1) explicitly by quantifying the normal mode overlap calculated for the reconstructed ancestral proteins at each end and 2) implicitly using a diffusion model to obtain the reconstructed lineage-specific changes in the normal modes. Both explicit and implicit ancestral reconstruction identified generally faster rates of change in dynamics compared with the expected change from neutral evolution at the branches of potential functional divergences for the alpha-amylase, D-isomer specific 2-hydroxyacid dehydrogenase, and copper-containing amine oxidase protein families. Normal modes analysis added additional information over just comparing the RMSD of static structures. However, the branch-specific changes were not statistically significant compared to background function-independent neutral rates of change of dynamic properties and blind application of the analysis would not enable prediction of changes in enzyme specificity. PMID:22651983
A phylogenetic analysis of normal modes evolution in enzymes and its relationship to enzyme function.

PubMed

Lai, Jason; Jin, Jing; Kubelka, Jan; Liberles, David A

2012-09-21

Since the dynamic nature of protein structures is essential for enzymatic function, it is expected that functional evolution can be inferred from the changes in protein dynamics. However, dynamics can also diverge neutrally with sequence substitution between enzymes without changes of function. In this study, a phylogenetic approach is implemented to explore the relationship between enzyme dynamics and function through evolutionary history. Protein dynamics are described by normal mode analysis based on a simplified harmonic potential force field applied to the reduced C(α) representation of the protein structure while enzymatic function is described by Enzyme Commission numbers. Similarity of the binding pocket dynamics at each branch of the protein family's phylogeny was analyzed in two ways: (1) explicitly by quantifying the normal mode overlap calculated for the reconstructed ancestral proteins at each end and (2) implicitly using a diffusion model to obtain the reconstructed lineage-specific changes in the normal modes. Both explicit and implicit ancestral reconstruction identified generally faster rates of change in dynamics compared with the expected change from neutral evolution at the branches of potential functional divergences for the α-amylase, D-isomer-specific 2-hydroxyacid dehydrogenase, and copper-containing amine oxidase protein families. Normal mode analysis added additional information over just comparing the RMSD of static structures. However, the branch-specific changes were not statistically significant compared to background function-independent neutral rates of change of dynamic properties and blind application of the analysis would not enable prediction of changes in enzyme specificity. Copyright © 2012 Elsevier Ltd. All rights reserved.
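As an illustration of the "normal mode overlap" mentioned in the abstract above, the following minimal sketch computes the overlap between two mode vectors as the absolute value of their normalized dot product. This is a commonly used definition and is an assumption here; the abstract does not state which formula the authors used. The function name `mode_overlap` and the toy vectors are hypothetical.

```python
import math


def mode_overlap(u, v):
    """Overlap of two mode vectors: |u . v| / (|u| |v|).

    Assumed definition (absolute cosine similarity); the abstract does not
    specify the exact formula. Returns a value between 0 and 1.
    """
    if len(u) != len(v):
        raise ValueError("mode vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("mode vectors must be non-zero")
    return abs(dot) / (norm_u * norm_v)


# Toy displacement vectors (hypothetical values, not data from the study).
ancestral_mode = [1.0, 0.0, 0.0]
descendant_mode = [-2.0, 0.0, 0.0]
print(mode_overlap(ancestral_mode, descendant_mode))
```

The absolute value is taken because a mode vector and its negative describe the same motion, so the sign of the dot product carries no information.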
Structural and Sequence Similarities of Hydra Xeroderma Pigmentosum A Protein to Human Homolog Suggest Early Evolution and Conservation

PubMed Central

Ghaskadbi, Saroj

2013-01-01

Xeroderma pigmentosum group A (XPA) is a protein that binds to damaged DNA, verifies presence of a lesion, and recruits other proteins of the nucleotide excision repair (NER) pathway to the site. Though its homologs from yeast, Drosophila, humans, and so forth are well studied, XPA has not so far been reported from protozoa and lower animal phyla. Hydra is a fresh-water cnidarian with a remarkable capacity for regeneration and apparent lack of organismal ageing. Cnidarians are among the first metazoa with a defined body axis, tissue grade organisation, and nervous system. We report here for the first time presence of XPA gene in hydra. Putative protein sequence of hydra XPA contains nuclear localization signal and bears the zinc-finger motif. It contains two conserved Pfam domains and various characterized features of XPA proteins like regions for binding to excision repair cross-complementing protein-1 (ERCC1) and replication protein A 70 kDa subunit (RPA70) proteins. Hydra XPA shows a high degree of similarity with vertebrate homologs and clusters with deuterostomes in phylogenetic analysis. Homology modelling corroborates the very close similarity between hydra and human XPA. The protein thus most likely functions in hydra in the same manner as in other animals, indicating that it arose early in evolution and has been conserved across animal phyla. PMID:24083246
Structural and sequence similarities of hydra xeroderma pigmentosum A protein to human homolog suggest early evolution and conservation.

PubMed

Barve, Apurva; Ghaskadbi, Saroj; Ghaskadbi, Surendra

2013-01-01

Xeroderma pigmentosum group A (XPA) is a protein that binds to damaged DNA, verifies presence of a lesion, and recruits other proteins of the nucleotide excision repair (NER) pathway to the site. Though its homologs from yeast, Drosophila, humans, and so forth are well studied, XPA has not so far been reported from protozoa and lower animal phyla. Hydra is a fresh-water cnidarian with a remarkable capacity for regeneration and apparent lack of organismal ageing. Cnidarians are among the first metazoa with a defined body axis, tissue grade organisation, and nervous system. We report here for the first time presence of XPA gene in hydra. Putative protein sequence of hydra XPA contains nuclear localization signal and bears the zinc-finger motif. It contains two conserved Pfam domains and various characterized features of XPA proteins like regions for binding to excision repair cross-complementing protein-1 (ERCC1) and replication protein A 70 kDa subunit (RPA70) proteins. Hydra XPA shows a high degree of similarity with vertebrate homologs and clusters with deuterostomes in phylogenetic analysis. Homology modelling corroborates the very close similarity between hydra and human XPA. The protein thus most likely functions in hydra in the same manner as in other animals, indicating that it arose early in evolution and has been conserved across animal phyla.
[Biological evaluation of a protein mixture intended for enteral nutrition].

PubMed

Meneses, J Olza; Foulquie, J Porres; Valero, G Urbano; de Victoria, E Martínez; Hernández, A Gil

2008-01-01

Enteral nutrition is the best way to feed or supplement the diet when gastrointestinal tract functions of patients are partially or totally preserved. Whenever total enteral nutrition is needed, it represents the only source of nutrients for patients. Thus, it is mandatory to ensure that high biological value proteins are included in enteral formulae. The aim was to assess the biological quality of a protein blend constituted by 50% potassium caseinate, 25% whey protein and 25% pea protein intended to be used in enteral nutrition products. Forty Wistar rats (20 male and 20 female), with an initial body weight of 51 g, were divided into four groups and fed for 10 days with: casein (Control), experimental protein blend (Experimental), or lyophilized normo- and hyperproteic enteral nutrition formulae adapted to the animal nutritional requirements (Normoproteic and Hyperproteic). Protein efficiency ratio (PER), apparent digestibility coefficient (ADC), relationship between retained and absorbed nitrogen (R/A) and relationship between retained and consumed nitrogen (R/I) were calculated. Experimental and control groups had similar values for all analysed indices (PER, ADC, R/A and R/I). These indices were also similar between normo- and hyperproteic groups, but lower than in the experimental and control groups, except for PER, where the normoproteic group was similar to both the control and hyperproteic groups. The quality of the protein blend used in this study is high. It is a good protein source to be used in the development of new enteral nutritional products.
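The four indices named in the abstract above can be sketched as simple ratios. The formulas below are the conventional textbook definitions (PER as weight gain per gram of protein consumed; nitrogen absorbed as intake minus fecal loss; nitrogen retained as absorbed minus urinary loss) and are an assumption, since the abstract does not spell them out. The function name, parameter names and input values are hypothetical.

```python
def protein_quality_indices(weight_gain_g, protein_intake_g,
                            n_intake, n_fecal, n_urinary):
    """Return PER, ADC (%), R/A and R/I from balance data.

    Assumed conventional definitions, not taken from the study itself:
      absorbed N = intake N - fecal N
      retained N = absorbed N - urinary N
    """
    absorbed = n_intake - n_fecal
    retained = absorbed - n_urinary
    return {
        "PER": weight_gain_g / protein_intake_g,  # g gained per g protein eaten
        "ADC": 100.0 * absorbed / n_intake,       # apparent digestibility, %
        "R/A": retained / absorbed,               # retained / absorbed nitrogen
        "R/I": retained / n_intake,               # retained / consumed nitrogen
    }


# Hypothetical balance data for one animal (not values from the study).
indices = protein_quality_indices(
    weight_gain_g=30.0, protein_intake_g=10.0,
    n_intake=2.0, n_fecal=0.5, n_urinary=0.5,
)
for name, value in indices.items():
    print(name, round(value, 3))
```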
Similarity in Shape Dictates Signature Intrinsic Dynamics Despite No Functional Conservation in TIM Barrel Enzymes

PubMed Central

Tiwari, Sandhya P.; Reuter, Nathalie

2016-01-01

The conservation of the intrinsic dynamics of proteins emerges as we attempt to understand the relationship between sequence, structure and functional conservation. We characterise the conservation of such dynamics in a case where the structure is conserved but function differs greatly. The triosephosphate isomerase barrel fold (TBF), renowned for its 8 β-strand-α-helix repeats that close to form a barrel, is one of the most diverse and abundant folds found in known protein structures. Proteins with this fold have diverse enzymatic functions spanning five of six Enzyme Commission classes, and we have picked five different superfamily candidates for our analysis using elastic network models. We find that the overall shape is a large determinant in the similarity of the intrinsic dynamics, regardless of function. In particular, the β-barrel core is highly rigid, while the α-helices that flank the β-strands have greater relative mobility, allowing for the many possibilities for placement of catalytic residues. We find that these elements correlate with each other via the loops that link them, as opposed to being directly correlated. We are also able to analyse the types of motions encoded by the normal mode vectors of the α-helices. We suggest that the global conservation of the intrinsic dynamics in the TBF contributes greatly to its success as an enzymatic scaffold both through evolution and enzyme design. PMID:27015412
The Saccharomyces cerevisiae enolase-related regions encode proteins that are active enolases.

PubMed

Kornblatt, M J; Richard Albert, J; Mattie, S; Zakaib, J; Dayanandan, S; Hanic-Joyce, P J; Joyce, P B M

2013-02-01

In addition to two genes (ENO1 and ENO2) known to code for enolase (EC4.2.1.11), the Saccharomyces cerevisiae genome contains three enolase-related regions (ERR1, ERR2 and ERR3) which could potentially encode proteins with enolase function. Here, we show that products of these genes (Err2p and Err3p) have secondary and quaternary structures similar to those of yeast enolase (Eno1p). In addition, Err2p and Err3p can convert 2-phosphoglycerate to phosphoenolpyruvate, with kinetic parameters similar to those of Eno1p, suggesting that these proteins could function as enolases in vivo. To address this possibility, we overexpressed the ERR2 and ERR3 genes individually in a double-null yeast strain lacking ENO1 and ENO2, and showed that either ERR2 or ERR3 could complement the growth defect in this strain when cells are grown in medium with glucose as the carbon source. Taken together, these data suggest that the ERR genes in Saccharomyces cerevisiae encode a protein that could function in glycolysis as enolase. The presence of these enolase-related regions in Saccharomyces cerevisiae and their absence in other related yeasts suggests that these genes may play some unique role in Saccharomyces cerevisiae. Further experiments will be required to determine whether these functions are related to glycolysis or other cellular processes. Copyright © 2012 John Wiley & Sons, Ltd.
MitProNet: A Knowledgebase and Analysis Platform of Proteome, Interactome and Diseases for Mammalian Mitochondria

PubMed Central

Mao, Song; Chai, Xiaoqiang; Hu, Yuling; Hou, Xugang; Tang, Yiheng; Bi, Cheng; Li, Xiao

2014-01-01

The mitochondrion plays a central role in diverse biological processes in most eukaryotes, and its dysfunctions are critically involved in a large number of diseases and the aging process. A systematic identification of mitochondrial proteomes and characterization of functional linkages among mitochondrial proteins are fundamental in understanding the mechanisms underlying biological functions and human diseases associated with mitochondria. Here we present a database, MitProNet, which provides a comprehensive knowledgebase for mitochondrial proteome, interactome and human diseases. First, an inventory of mammalian mitochondrial proteins was compiled by widely collecting proteomic datasets, and the proteins were classified by machine learning to achieve a high-confidence list of mitochondrial proteins. The current version of MitProNet covers 1124 high-confidence proteins, and the remainder were further classified as middle- or low-confidence. An organelle-specific network of functional linkages among mitochondrial proteins was then generated by integrating genomic features encoded by a wide range of datasets including genomic context, gene expression profiles, protein-protein interactions, functional similarity and metabolic pathways. The functional-linkage network should be a valuable resource for the study of biological functions of mitochondrial proteins and human mitochondrial diseases. Furthermore, we utilized the network to predict candidate genes for mitochondrial diseases using prioritization algorithms. All proteins, functional linkages and disease candidate genes in MitProNet were annotated according to the information collected from their original sources including GO, GEO, OMIM, KEGG, MIPS, HPRD and so on. MitProNet features a user-friendly graphic visualization interface to present functional analysis of linkage networks.
As an up-to-date database and analysis platform, MitProNet should be particularly helpful in comprehensive studies of complicated biological mechanisms underlying mitochondrial functions and human mitochondrial diseases. MitProNet is freely accessible at http://bio.scu.edu.cn:8085/MitProNet. PMID:25347823

Functions of the cellular prion protein, the end of Moore's law, and Ockham's razor theory.

PubMed

del Río, José A; Gavín, Rosalina

2016-01-01

Since its discovery the cellular prion protein (encoded by the Prnp gene) has been associated with a large number of functions. The proposed functions range from basic cellular processes such as cell cycle and survival to neural functions such as behavior and neuroprotection, following a pattern similar to that of Moore's law for electronics. In addition, particular interest is increasing in the participation of Prnp in neurodegeneration. However, in recent years a redefinition of these functions has begun, since examples of previously attributed functions were increasingly re-associated with other proteins. Most of these functions are linked to so-called "Prnp-flanking genes" that are close to the genomic locus of Prnp and which are present in the genome of some Prnp mouse models. In addition, their role in neuroprotection against convulsive insults has been confirmed in recent studies. Lastly, in recent years a large number of models indicating the participation of different domains of the protein in apoptosis have been uncovered. However, after more than 10 years of molecular dissection our view is that the simplest mechanistic model in PrP(C)-mediated cell death should be considered, as Ockham's razor theory suggested.
Binary Classification using Decision Tree based Genetic Programming and Its Application to Analysis of Bio-mass Data

NASA Astrophysics Data System (ADS)

To, Cuong; Pham, Tuan D.

2010-01-01

In machine learning, pattern recognition may be the most popular task. Identification of "similar" patterns is also very important in biology because, first, it is useful for prediction of patterns associated with disease, for example cancer tissue (normal or tumor); second, similarity or dissimilarity of the kinetic patterns is used to identify coordinately controlled genes or proteins involved in the same regulatory process; and third, similar genes (proteins) share similar functions. In this paper, we present an algorithm which uses genetic programming to create a decision tree for binary classification problems. The algorithm was applied to five real biological databases. Based on the results of comparisons with well-known methods, we see that the algorithm is outstanding in most cases.
Genetic and molecular characterization of a gene encoding a wide specificity purine permease of Aspergillus nidulans reveals a novel family of transporters conserved in prokaryotes and eukaryotes.

PubMed

Diallinas, G; Gorfinkiel, L; Arst, H N; Cecchetto, G; Scazzocchio, C

1995-04-14

In Aspergillus nidulans, loss-of-function mutations in the uapA and azgA genes, encoding the major uric acid-xanthine and hypoxanthine-adenine-guanine permeases, respectively, result in impaired utilization of these purines as sole nitrogen sources. The residual growth of the mutant strains is due to the activity of a broad specificity purine permease. We have identified uapC, the gene coding for this third permease through the isolation of both gain-of-function and loss-of-function mutations. Uptake studies with wild-type and mutant strains confirmed the genetic analysis and showed that the UapC protein contributes 30% and 8-10% to uric acid and hypoxanthine transport rates, respectively. The uapC gene was cloned, its expression studied, its sequence and transcript map established, and the sequence of its putative product analyzed. uapC message accumulation is: (i) weakly induced by 2-thiouric acid; (ii) repressed by ammonium; (iii) dependent on functional uaY and areA regulatory gene products (mediating uric acid induction and nitrogen metabolite repression, respectively); (iv) increased by uapC gain-of-function mutations which specifically, but partially, suppress a leucine to valine mutation in the zinc finger of the protein coded by the areA gene. The putative uapC gene product is a highly hydrophobic protein of 580 amino acids (M(r) = 61,251) including 12-14 putative transmembrane segments. The UapC protein is highly similar (58% identity) to the UapA permease and significantly similar (23-34% identity) to a number of bacterial transporters. Comparisons of the sequences and hydropathy profiles of members of this novel family of transporters yield insights into their structure, functionally important residues, and possible evolutionary relationships.
Comparative characterization of random-sequence proteins consisting of 5, 12, and 20 kinds of amino acids

PubMed Central

Tanaka, Junko; Doi, Nobuhide; Takashima, Hideaki; Yanagawa, Hiroshi

2010-01-01

Screening of functional proteins from a random-sequence library has been used to evolve novel proteins in the field of evolutionary protein engineering. However, random-sequence proteins consisting of the 20 natural amino acids tend to aggregate, and the occurrence rate of functional proteins in a random-sequence library is low. From the viewpoint of the origin of life, it has been proposed that primordial proteins consisted of a limited set of amino acids that could have been abundantly formed early during chemical evolution. We have previously found that members of a random-sequence protein library constructed with five primitive amino acids show high solubility (Doi et al., Protein Eng Des Sel 2005;18:279–284). Although such a library is expected to be appropriate for finding functional proteins, the functionality may be limited, because they have no positively charged amino acid. Here, we constructed three libraries of 120-amino acid, random-sequence proteins using alphabets of 5, 12, and 20 amino acids by preselection using mRNA display (to eliminate sequences containing stop codons and frameshifts) and characterized and compared the structural properties of random-sequence proteins arbitrarily chosen from these libraries. We found that random-sequence proteins constructed with the 12-member alphabet (including five primitive amino acids and positively charged amino acids) have higher solubility than those constructed with the 20-member alphabet, though other biophysical properties are very similar in the two libraries. Thus, a library of moderate complexity constructed from 12 amino acids may be a more appropriate resource for functional screening than one constructed from 20 amino acids. PMID:20162614
Proteome analysis of the almond kernel (Prunus dulcis).

PubMed

Li, Shugang; Geng, Fang; Wang, Ping; Lu, Jiankang; Ma, Meihu

2016-08-01

Almond (Prunus dulcis) is a popular tree nut worldwide and offers many benefits to human health. However, the importance of almond kernel proteins in the nutrition and function in human health requires further evaluation. The present study presents a systematic evaluation of the proteins in the almond kernel using proteomic analysis. The nutrient and amino acid content in almond kernels from Xinjiang is similar to that of American varieties; however, Xinjiang varieties have a higher protein content. Two-dimensional electrophoresis analysis demonstrated a wide distribution of molecular weights and isoelectric points of almond kernel proteins. A total of 434 proteins were identified by LC-MS/MS, and most were proteins that were experimentally confirmed for the first time. Gene ontology (GO) analysis of the 434 proteins indicated that the proteins are mainly involved in primary biological processes, including metabolic processes (67.5%), cellular processes (54.1%), and single-organism processes (43.4%); that their main molecular functions are catalytic activity (48.0%), binding (45.4%) and structural molecule activity (11.9%); and that they are primarily distributed in the cell (59.9%), organelle (44.9%), and membrane (22.8%). Almond kernel is a source of a wide variety of proteins. This study provides important information contributing to the screening and identification of almond proteins, the understanding of almond protein function, and the development of almond protein products. © 2015 Society of Chemical Industry.
Structure of the virulence-associated protein VapD from the intracellular pathogen Rhodococcus equi

DOE Office of Scientific and Technical Information (OSTI.GOV)

Whittingham, Jean L.; Blagova, Elena V.; Finn, Ciaran E.

2014-08-01

VapD is one of a set of highly homologous virulence-associated proteins from the multi-host pathogen Rhodococcus equi. The crystal structure reveals an eight-stranded β-barrel with a novel fold and a glycine rich ‘bald’ surface. Rhodococcus equi is a multi-host pathogen that infects a range of animals as well as immune-compromised humans. Equine and porcine isolates harbour a virulence plasmid encoding a homologous family of virulence-associated proteins associated with the capacity of R. equi to divert the normal processes of endosomal maturation, enabling bacterial survival and proliferation in alveolar macrophages. To provide a basis for probing the function of the Vap proteins in virulence, the crystal structure of VapD was determined. VapD is a monomer as determined by multi-angle laser light scattering. The structure reveals an elliptical, compact eight-stranded β-barrel with a novel strand topology and pseudo-twofold symmetry, suggesting evolution from an ancestral dimer. Surface-associated octyl-β-d-glucoside molecules may provide clues to function. Circular-dichroism spectroscopic analysis suggests that the β-barrel structure is preceded by a natively disordered region at the N-terminus. Sequence comparisons indicate that the core folds of the other plasmid-encoded virulence-associated proteins from R. equi strains are similar to that of VapD. It is further shown that sequences encoding putative R. equi Vap-like proteins occur in diverse bacterial species. Finally, the functional implications of the structure are discussed in the light of the unique structural features of VapD and its partial structural similarity to other β-barrel proteins.
Identification of learning and memory genes in canine; promoter investigation and determining the selective pressure.

PubMed

Seifi Moroudi, Reihane; Masoudi, Ali Akbar; Vaez Torshizi, Rasoul; Zandi, Mohammad

2014-12-01

One of the important behaviors of dogs is trainability, which is affected by learning and memory genes. Genes of this kind have not yet been identified in dogs. In the current research, these genes were found in animal models by mining biological data and the scientific literature. The proteins of these genes were obtained from the UniProt database in dogs and humans. Not all homologous proteins perform similar functions; thus, these proteins were compared in terms of protein families, domains, biological processes, molecular functions, and cellular location of metabolic pathways in the InterPro, KEGG, QuickGO and PSORT databases. The results showed that some of these proteins perform the same function in the rat or mouse, dog, and human. It is anticipated that the proteins of these genes may be effective in learning and memory in dogs. Then, the expression pattern of the recognized genes was investigated in the dog hippocampus using the existing information in the GEO profile. The results showed that the BDNF, TAC1 and CCK genes are expressed in the dog hippocampus; therefore, these genes could be strong candidates associated with learning and memory in dogs. Subsequently, due to the importance of the promoter regions in gene function, this region was investigated in the above genes. Analysis of the promoters indicated that the HNF-4 site of the BDNF gene and the transcription start site of the CCK gene are exposed to methylation. Phylogenetic analysis of the protein sequences of these genes showed high similarity in each of these three genes among the studied species. The dN/dS ratio for the BDNF, TAC1 and CCK genes indicates purifying selection during the evolution of the genes.
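The link between a dN/dS ratio and the type of selection in the abstract above can be illustrated with a minimal sketch. It relies on the standard reading that a ratio below 1 suggests purifying selection, a ratio near 1 suggests neutral evolution, and a ratio above 1 suggests positive selection. The function name, the tolerance around 1, and the example rates are illustrative assumptions; the abstract does not report the actual dN and dS values.

```python
def classify_selection(dn: float, ds: float, tolerance: float = 0.05) -> str:
    """Classify selection from nonsynonymous (dN) and synonymous (dS) rates.

    The tolerance band around 1 is an arbitrary illustrative choice,
    not a statistical test.
    """
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    omega = dn / ds  # the dN/dS ratio
    if omega < 1 - tolerance:
        return "purifying"
    if omega > 1 + tolerance:
        return "positive"
    return "neutral"


# Hypothetical rates, chosen only to exercise each branch.
print(classify_selection(0.02, 0.40))  # ratio well below 1
print(classify_selection(0.50, 0.50))  # ratio equal to 1
print(classify_selection(0.90, 0.30))  # ratio well above 1
```

In practice the ratio would be estimated from a codon alignment with a dedicated tool and accompanied by a significance test; the threshold check here only shows how the ratio is interpreted.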
Evolution of heliobacteria: implications for photosynthetic reaction center complexes

NASA Technical Reports Server (NTRS)

Vermaas, W. F.; Blankenship, R. E. (Principal Investigator)

1994-01-01

The evolutionary position of the heliobacteria, a group of green photosynthetic bacteria with a photosynthetic apparatus functionally resembling Photosystem I of plants and cyanobacteria, has been investigated with respect to the evolutionary relationship to Gram-positive bacteria and cyanobacteria. On the basis of 16S rRNA sequence analysis, the heliobacteria appear to be most closely related to Gram-positive bacteria, but an evolutionary link to cyanobacteria is also evident. Interestingly, a 46-residue domain including the putative sixth membrane-spanning region of the heliobacterial reaction center protein shows rather strong similarity (33% identity and 72% similarity) to a region including the sixth membrane-spanning region of the CP47 protein, a chlorophyll-binding core antenna polypeptide of Photosystem II. The N-terminal half of the heliobacterial reaction center polypeptide shows a moderate sequence similarity (22% identity over 232 residues) with the CP47 protein, which is significantly more than the similarity with the Photosystem I core polypeptides in this region. An evolutionary model for photosynthetic reaction center complexes is discussed, in which an ancestral homodimeric reaction center protein (possibly resembling the heliobacterial reaction center protein) with 11 membrane-spanning regions per polypeptide has diverged to give rise to the core of Photosystem I, Photosystem II, and of the photosynthetic apparatus in green, purple, and heliobacteria.
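The identity and similarity percentages quoted in the abstract above are computed over an alignment. The sketch below shows one common way to derive such figures from two already-aligned sequences. The residue groups used for "similarity" and the choice to ignore gapped columns are assumptions made for illustration; the abstract does not state which scoring scheme or denominator the authors used.

```python
# Hypothetical conservative-substitution groups; real analyses usually
# derive similarity from a substitution matrix instead.
SIMILAR_GROUPS = ["AVLIM", "FYW", "KRH", "DE", "ST", "NQ"]


def _similar(a: str, b: str) -> bool:
    """True when two residues are identical or share a group."""
    return a == b or any(a in g and b in g for g in SIMILAR_GROUPS)


def identity_and_similarity(aligned_a: str, aligned_b: str) -> tuple[float, float]:
    """Return (% identity, % similarity) over columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if "-" not in (a, b)]
    if not pairs:
        return 0.0, 0.0
    identical = sum(a == b for a, b in pairs)
    similar = sum(_similar(a, b) for a, b in pairs)
    return 100.0 * identical / len(pairs), 100.0 * similar / len(pairs)


# Toy alignment: 5 ungapped columns, 4 identical, K/R counted as similar.
print(identity_and_similarity("MKV-LA", "MRVGLA"))
```

Because similarity counts identical residues as well as conservative substitutions, it is always at least as large as identity, which matches the pattern of the 33% and 72% figures quoted above.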
Nitrogen Balance and Protein Requirements for Critically Ill Older Patients.

PubMed

Dickerson, Roland N

2016-04-18

Critically ill older patients with sarcopenia experience greater morbidity and mortality than younger patients. It is anticipated that unabated protein catabolism would be detrimental for the critically ill older patient. Healthy older subjects experience a diminished response to protein supplementation when compared to their younger counterparts, but this anabolic resistance can be overcome by increasing protein intake. Preliminary evidence suggests that older patients may respond differently to protein intake than younger patients during critical illness as well. If sufficient protein intake is given, older patients can achieve a similar nitrogen accretion response as younger patients even during critical illness. However, there is concern among some clinicians that increasing protein intake in older patients during critical illness may lead to azotemia due to decreased renal functional reserve which may augment the propensity towards worsened renal function and worsened clinical outcomes. Current evidence regarding protein requirements, nitrogen balance, ureagenesis, and clinical outcomes during nutritional therapy for critically ill older patients is reviewed.
Nitrogen Balance and Protein Requirements for Critically Ill Older Patients

PubMed Central

Dickerson, Roland N.

2016-01-01

Critically ill older patients with sarcopenia experience greater morbidity and mortality than younger patients. It is anticipated that unabated protein catabolism would be detrimental for the critically ill older patient. Healthy older subjects experience a diminished response to protein supplementation when compared to their younger counterparts, but this anabolic resistance can be overcome by increasing protein intake. Preliminary evidence suggests that older patients may respond differently to protein intake than younger patients during critical illness as well. If sufficient protein intake is given, older patients can achieve a similar nitrogen accretion response as younger patients even during critical illness. However, there is concern among some clinicians that increasing protein intake in older patients during critical illness may lead to azotemia due to decreased renal functional reserve which may augment the propensity towards worsened renal function and worsened clinical outcomes. Current evidence regarding protein requirements, nitrogen balance, ureagenesis, and clinical outcomes during nutritional therapy for critically ill older patients is reviewed. PMID:27096868
SARS-unique fold in the Rousettus bat coronavirus HKU9.

PubMed

Hammond, Robert G; Tan, Xuan; Johnson, Margaret A

2017-09-01

The coronavirus nonstructural protein 3 (nsp3) is a multifunctional protein that comprises multiple structural domains. This protein assists in viral polyprotein cleavage and host immune interference, and may play other roles in genome replication or transcription. Here, we report the solution NMR structure of a protein from the "SARS-unique region" of the bat coronavirus HKU9. The protein contains a frataxin fold or double-wing motif, which is an α + β fold that is associated with protein/protein interactions, DNA binding, and metal ion binding. High structural similarity to the human severe acute respiratory syndrome (SARS) coronavirus nsp3 is present. A possible functional site that is conserved among some betacoronaviruses has been identified using bioinformatics and biochemical analyses. This structure provides strong experimental support for the recent proposal advanced by us and others that the "SARS-unique" region is not unique to the human SARS virus, but is conserved among several different phylogenetic groups of coronaviruses and provides essential functions. © 2017 The Protein Society.
SAFE Software and FED Database to Uncover Protein-Protein Interactions using Gene Fusion Analysis.

PubMed

Tsagrasoulis, Dimosthenis; Danos, Vasilis; Kissa, Maria; Trimpalis, Philip; Koumandou, V Lila; Karagouni, Amalia D; Tsakalidis, Athanasios; Kossida, Sophia

2012-01-01

Domain Fusion Analysis takes advantage of the fact that certain proteins in a given proteome A are found to have statistically significant similarity with two separate proteins in another proteome B. In other words, the result of a fusion event between two separate proteins in proteome B is a specific full-length protein in proteome A. In such a case, it can be safely concluded that the protein pair has a common biological function or even interacts physically. In this paper, we present the Fusion Events Database (FED), a database for the maintenance and retrieval of fusion data in both prokaryotic and eukaryotic organisms, and the Software for the Analysis of Fusion Events (SAFE), a computational platform implemented for the automated detection, filtering and visualization of fusion events (both available at: http://www.bioacademy.gr/bioinformatics/projects/ProteinFusion/index.htm). Finally, we analyze the proteomes of three microorganisms using these tools in order to demonstrate their functionality.
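The detection idea described in this abstract can be sketched as follows: a protein in proteome A is flagged when it has significant hits to two different proteins in proteome B and those hits cover separate regions of the A protein. This is not the SAFE implementation; the `Hit` record, the E-value cutoff, and the zero-overlap rule are hypothetical choices for illustration.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class Hit(NamedTuple):
    """One similarity hit (hypothetical record layout)."""
    query: str     # protein in proteome A
    subject: str   # protein in proteome B
    q_start: int   # 1-based start of the aligned region on the query
    q_end: int     # inclusive end of the aligned region on the query
    evalue: float


def overlap(a: Hit, b: Hit) -> int:
    """Number of query positions covered by both hits."""
    return max(0, min(a.q_end, b.q_end) - max(a.q_start, b.q_start) + 1)


def candidate_fusions(hits, max_evalue=1e-10, max_overlap=0):
    """Return sorted (query, subject1, subject2) candidate fusion triples."""
    by_query = defaultdict(list)
    for h in hits:
        if h.evalue <= max_evalue:  # keep only significant hits
            by_query[h.query].append(h)
    found = set()
    for query, qhits in by_query.items():
        for a, b in combinations(qhits, 2):
            # Two distinct B proteins matching separate parts of the A protein.
            if a.subject != b.subject and overlap(a, b) <= max_overlap:
                found.add((query, *sorted((a.subject, b.subject))))
    return sorted(found)


hits = [
    Hit("A1", "B1", 1, 200, 1e-50),
    Hit("A1", "B2", 210, 400, 1e-40),   # separate region: candidate fusion
    Hit("A2", "B3", 1, 300, 1e-60),
    Hit("A2", "B4", 50, 280, 1e-30),    # same region as B3: not a fusion
    Hit("A3", "B5", 1, 100, 1e-2),      # not significant
]
print(candidate_fusions(hits))
```

A real pipeline would add further filters, for example excluding cases where the two B proteins are themselves similar to each other, which is part of the filtering step the abstract mentions without detailing.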
SAFE Software and FED Database to Uncover Protein-Protein Interactions using Gene Fusion Analysis

PubMed Central

Tsagrasoulis, Dimosthenis; Danos, Vasilis; Kissa, Maria; Trimpalis, Philip; Koumandou, V. Lila; Karagouni, Amalia D.; Tsakalidis, Athanasios; Kossida, Sophia

2012-01-01

Domain Fusion Analysis takes advantage of the fact that certain proteins in a given proteome A are found to have statistically significant similarity with two separate proteins in another proteome B. In other words, the result of a fusion event between two separate proteins in proteome B is a specific full-length protein in proteome A. In such a case, it can be safely concluded that the protein pair has a common biological function or even interacts physically. In this paper, we present the Fusion Events Database (FED), a database for the maintenance and retrieval of fusion data in both prokaryotic and eukaryotic organisms, and the Software for the Analysis of Fusion Events (SAFE), a computational platform implemented for the automated detection, filtering and visualization of fusion events (both available at: http://www.bioacademy.gr/bioinformatics/projects/ProteinFusion/index.htm). Finally, we analyze the proteomes of three microorganisms using these tools in order to demonstrate their functionality. PMID:22267904
The Caenorhabditis elegans EGL-15 Signaling Pathway Implicates a DOS-Like Multisubstrate Adaptor Protein in Fibroblast Growth Factor Signal Transduction

PubMed Central

Schutzman, Jennifer L.; Borland, Christina Z.; Newman, John C.; Robinson, Matthew K.; Kokel, Michelle; Stern, Michael J.

2001-01-01

EGL-15 is a fibroblast growth factor receptor in the nematode Caenorhabditis elegans. Components that mediate EGL-15 signaling have been identified via mutations that confer a Clear (Clr) phenotype, indicative of hyperactivity of this pathway, or a suppressor-of-Clr (Soc) phenotype, indicative of reduced pathway activity. We have isolated a gain-of-function allele of let-60 ras that confers a Clr phenotype and implicated both let-60 ras and components of a mitogen-activated protein kinase cascade in EGL-15 signaling by their Soc phenotype. Epistasis analysis indicates that the gene soc-1 functions in EGL-15 signaling by acting either upstream of or independently of LET-60 RAS. soc-1 encodes a multisubstrate adaptor protein with an amino-terminal pleckstrin homology domain that is structurally similar to the DOS protein in Drosophila and mammalian GAB1. DOS is known to act with the cytoplasmic tyrosine phosphatase Corkscrew (CSW) in signaling pathways in Drosophila. Similarly, the C. elegans CSW ortholog PTP-2 was found to be involved in EGL-15 signaling. Structure-function analysis of SOC-1 and phenotypic analysis of single and double mutants are consistent with a model in which SOC-1 and PTP-2 act together in a pathway downstream of EGL-15 and the Src homology domain 2 (SH2)/SH3-adaptor protein SEM-5/GRB2 contributes to SOC-1-independent activities of EGL-15. PMID:11689700
Structural analysis of the Quaking homodimerization interface

PubMed Central

Beuck, Christine; Qu, Song; Fagg, W. Samuel; Ares, Manuel; Williamson, James R.

2012-01-01

Quaking is a prototypical member of the STAR protein family, which plays key roles in posttranscriptional gene regulation by controlling mRNA translation, stability and splicing. QkI-5 has been shown to regulate mRNA expression in the central nervous system, but little is known about its roles in other tissues. STAR proteins function as dimers and bind to bipartite RNA sequences; however, the structural and functional roles of homo- and hetero-dimerization are still unclear. Here, we present the crystal structure of the QkI dimerization domain, which adopts a stacked helix-turn-helix arrangement similar to that of its homologs GLD-1 and Sam68, but differs by an additional helix inserted in the dimer interface. Variability of the dimer interface residues likely ensures selective homodimerization by preventing association with non-cognate STAR family proteins in the cell. Mutations that inhibit dimerization also significantly impair RNA binding in vitro, alter QkI-5 protein levels, and impair QkI function in a splicing assay in vivo. Together our results indicate that a functional Qua1 homodimerization domain is required for QkI-5 function in mammalian cells. PMID:22982292
SALAD database: a motif-based database of protein annotations for plant comparative genomics

PubMed Central

Mihara, Motohiro; Itoh, Takeshi; Izawa, Takeshi

2010-01-01

Proteins often have several motifs with distinct evolutionary histories. Proteins with similar motifs have similar biochemical properties and thus related biological functions. We constructed a unique comparative genomics database termed the SALAD database (http://salad.dna.affrc.go.jp/salad/) from plant-genome-based proteome data sets. We extracted evolutionarily conserved motifs by MEME software from 209 529 protein-sequence annotation groups selected by BLASTP from the proteome data sets of 10 species: rice, sorghum, Arabidopsis thaliana, grape, a lycophyte, a moss, 3 algae, and yeast. Similarity clustering of each protein group was performed by pairwise scoring of the motif patterns of the sequences. The SALAD database provides a user-friendly graphical viewer that displays a motif pattern diagram linked to the resulting bootstrapped dendrogram for each protein group. Amino-acid-sequence-based and nucleotide-sequence-based phylogenetic trees for motif combination alignment, a logo comparison diagram for each clade in the tree, and a Pfam-domain pattern diagram are also available. We also developed a viewer named ‘SALAD on ARRAYs’ to view arbitrary microarray data sets of paralogous genes linked to the same dendrogram in a window. The SALAD database is a powerful tool for comparing protein sequences and can provide valuable hints for biological analysis. PMID:19854933
Comprehensive and quantitative proteomic analyses of zebrafish plasma reveals conserved protein profiles between genders and between zebrafish and human.

PubMed

Li, Caixia; Tan, Xing Fei; Lim, Teck Kwang; Lin, Qingsong; Gong, Zhiyuan

2016-04-13

Omic approaches have been increasingly used in the zebrafish model for holistic understanding of molecular events and mechanisms of tissue functions. However, plasma is rarely used for omic profiling because of the technical challenges in collecting sufficient blood. In this study, we employed two mass spectrometric (MS) approaches for a comprehensive characterization of the zebrafish plasma proteome, i.e. conventional shotgun liquid chromatography-tandem mass spectrometry (LC-MS/MS) for an overview study and quantitative SWATH (Sequential Window Acquisition of all THeoretical fragment-ion spectra) for comparison between genders. 959 proteins were identified in the shotgun profiling with estimated concentrations spanning almost five orders of magnitude. Other than the presence of a few highly abundant female egg yolk precursor proteins (vitellogenins), the proteomic profiles of male and female plasmas were very similar in both number and abundance and there were basically no other highly gender-biased proteins. The types of plasma proteins based on IPA (Ingenuity Pathway Analysis) classification and tissue sources of production were also very similar. Furthermore, the zebrafish plasma proteome shares significant similarities with the human plasma proteome, in particular in top abundant proteins including apolipoproteins and complements. Thus, the current study provided a valuable dataset for future evaluation of plasma proteins in zebrafish.
Comprehensive and quantitative proteomic analyses of zebrafish plasma reveals conserved protein profiles between genders and between zebrafish and human

PubMed Central

Li, Caixia; Tan, Xing Fei; Lim, Teck Kwang; Lin, Qingsong; Gong, Zhiyuan

2016-01-01

Omic approaches have been increasingly used in the zebrafish model for holistic understanding of molecular events and mechanisms of tissue functions. However, plasma is rarely used for omic profiling because of the technical challenges in collecting sufficient blood. In this study, we employed two mass spectrometric (MS) approaches for a comprehensive characterization of the zebrafish plasma proteome, i.e. conventional shotgun liquid chromatography-tandem mass spectrometry (LC-MS/MS) for an overview study and quantitative SWATH (Sequential Window Acquisition of all THeoretical fragment-ion spectra) for comparison between genders. 959 proteins were identified in the shotgun profiling with estimated concentrations spanning almost five orders of magnitude. Other than the presence of a few highly abundant female egg yolk precursor proteins (vitellogenins), the proteomic profiles of male and female plasmas were very similar in both number and abundance and there were basically no other highly gender-biased proteins. The types of plasma proteins based on IPA (Ingenuity Pathway Analysis) classification and tissue sources of production were also very similar. Furthermore, the zebrafish plasma proteome shares significant similarities with the human plasma proteome, in particular in top abundant proteins including apolipoproteins and complements. Thus, the current study provided a valuable dataset for future evaluation of plasma proteins in zebrafish. PMID:27071722
Bacterial-like PPP protein phosphatases: novel sequence alterations in pathogenic eukaryotes and peculiar features of bacterial sequence similarity.

PubMed

Kerk, David; Uhrig, R Glen; Moorhead, Greg B

2013-01-01

Reversible phosphorylation is a widespread modification affecting the great majority of eukaryotic cellular proteins, and whose effects influence nearly every cellular function. Protein phosphatases are increasingly recognized as exquisitely regulated contributors to these changes. The PPP (phosphoprotein phosphatase) family comprises enzymes, which catalyze dephosphorylation at serine and threonine residues. Nearly a decade ago, "bacterial-like" enzymes were recognized with similarity to proteins from various bacterial sources: SLPs (Shewanella-like phosphatases), RLPHs (Rhizobiales-like phosphatases), and ALPHs (ApaH-like phosphatases). A recent article from our laboratory appearing in Plant Physiology characterizes their extensive organismal distribution, abundance in plant species, predicted subcellular localization, motif organization, and sequence evolution. One salient observation is the distinct evolutionary trajectory followed by SLP genes and proteins in photosynthetic eukaryotes vs. animal and plant pathogens derived from photosynthetic ancestors. We present here a closer look at sequence data that emphasizes the distinctiveness of pathogen SLP proteins and that suggests that they might represent novel drug targets. A second observation in our original report was the high degree of similarity between the bacterial-like PPPs of eukaryotes and closely related proteins of the "eukaryotic-like" phyla Myxococcales and Planctomycetes. We here reflect on the possible implications of these observations and their importance for future research.
The nonstructural proteins of Pneumoviruses are remarkably distinct in substrate diversity and specificity.

PubMed

Ribaudo, Michael; Barik, Sailen

2017-11-06

Interferon (IFN) inhibits viruses by inducing several hundred cellular genes, aptly named 'interferon (IFN)-stimulated genes' (ISGs). The only two RNA viruses of the Pneumovirus genus of the Paramyxoviridae family, namely Respiratory Syncytial Virus (RSV) and Pneumonia Virus of Mice (PVM), each encode two nonstructural (NS) proteins that share no sequence similarity but yet suppress IFN. Since suppression of IFN underlies the ability of these viruses to replicate in the host cells, the mechanism of such suppression has become an important area of research. This Short Report is an important extension of our previous efforts in defining this mechanism. We show that, like their PVM counterparts, the RSV NS proteins also target multiple members of the ISG family. While significantly extending the substrate repertoire of the RSV NS proteins, these results, unexpectedly, also reveal that the target preferences of the NS proteins of the two viruses are entirely different. This is surprising since the two Pneumoviruses are phylogenetically close with similar genome organization and gene function, and the NS proteins of both also serve as suppressors of host IFN response. The finding that the NS proteins of the two highly similar viruses suppress entirely different members of the ISG family raises intriguing questions of pneumoviral NS evolution and mechanism of action.

SALAD database: a motif-based database of protein annotations for plant comparative genomics.

PubMed

Mihara, Motohiro; Itoh, Takeshi; Izawa, Takeshi

2010-01-01

Proteins often have several motifs with distinct evolutionary histories. Proteins with similar motifs have similar biochemical properties and thus related biological functions. We constructed a unique comparative genomics database termed the SALAD database (http://salad.dna.affrc.go.jp/salad/) from plant-genome-based proteome data sets. We extracted evolutionarily conserved motifs by MEME software from 209,529 protein-sequence annotation groups selected by BLASTP from the proteome data sets of 10 species: rice, sorghum, Arabidopsis thaliana, grape, a lycophyte, a moss, 3 algae, and yeast. Similarity clustering of each protein group was performed by pairwise scoring of the motif patterns of the sequences. The SALAD database provides a user-friendly graphical viewer that displays a motif pattern diagram linked to the resulting bootstrapped dendrogram for each protein group. Amino-acid-sequence-based and nucleotide-sequence-based phylogenetic trees for motif combination alignment, a logo comparison diagram for each clade in the tree, and a Pfam-domain pattern diagram are also available. We also developed a viewer named 'SALAD on ARRAYs' to view arbitrary microarray data sets of paralogous genes linked to the same dendrogram in a window. The SALAD database is a powerful tool for comparing protein sequences and can provide valuable hints for biological analysis.
Calcium/calmodulin-mediated signal network in plants

NASA Technical Reports Server (NTRS)

Yang, Tianbao; Poovaiah, B. W.

2003-01-01

Various extracellular stimuli elicit specific calcium signatures that can be recognized by different calcium sensors. Calmodulin, the predominant calcium receptor, is one of the best-characterized calcium sensors in eukaryotes. In recent years, completion of the Arabidopsis genome project and advances in functional genomics have helped to identify and characterize numerous calmodulin-binding proteins in plants. There are some similarities in Ca(2+)/calmodulin-mediated signaling in plants and animals. However, plants possess multiple calmodulin genes and many calmodulin target proteins, including unique protein kinases and transcription factors. Some of these proteins are likely to act as "hubs" during calcium signal transduction. Hence, a better understanding of the function of these calmodulin target proteins should help in deciphering the Ca(2+)/calmodulin-mediated signal network and its role in plant growth, development and response to environmental stimuli.
Membrane nanodomains in plants: capturing form, function, and movement.

PubMed

Tapken, Wiebke; Murphy, Angus S

2015-03-01

The plasma membrane is the interface between the cell and the external environment. Plasma membrane lipids provide scaffolds for proteins and protein complexes that are involved in cell to cell communication, signal transduction, immune responses, and transport of small molecules. In animals, fungi, and plants, a substantial subset of these plasma membrane proteins function within ordered sterol- and sphingolipid-rich nanodomains. High-resolution microscopy, lipid dyes, pharmacological inhibitors of lipid biosynthesis, and lipid biosynthetic mutants have been employed to examine the relationship between the lipid environment and protein activity in plants. They have also been used to identify proteins associated with nanodomains and the pathways by which nanodomain-associated proteins are trafficked to their plasma membrane destinations. These studies suggest that plant membrane nanodomains function in a context-specific manner, analogous to similar structures in animals and fungi. In addition to the highly conserved flotillin and remorin markers, some members of the B and G subclasses of ATP binding cassette transporters have emerged as functional markers for plant nanodomains. Further, the glycophosphatidylinositol-anchored fasciclin-like arabinogalactan proteins, that are often associated with detergent-resistant membranes, appear also to have a functional role in membrane nanodomains. © The Author 2015. Published by Oxford University Press on behalf of the Society for Experimental Biology. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Pyranopterin conformation defines the function of molybdenum and tungsten enzymes.

PubMed

Rothery, Richard A; Stein, Benjamin; Solomonson, Matthew; Kirk, Martin L; Weiner, Joel H

2012-09-11

We have analyzed the conformations of 319 pyranopterins in 102 protein structures of mononuclear molybdenum and tungsten enzymes. These span a continuum between geometries anticipated for quinonoid dihydro, tetrahydro, and dihydro oxidation states. We demonstrate that pyranopterin conformation is correlated with the protein folds defining the three major mononuclear molybdenum and tungsten enzyme families, and that binding-site micro-tuning controls pyranopterin oxidation state. Enzymes belonging to the bacterial dimethyl sulfoxide reductase (DMSOR) family contain a metal-bis-pyranopterin cofactor, the two pyranopterins of which have distinct conformations, with one similar to the predicted tetrahydro form, and the other similar to the predicted dihydro form. Enzymes containing a single pyranopterin belong to either the xanthine dehydrogenase (XDH) or sulfite oxidase (SUOX) families, and these have pyranopterin conformations similar to those predicted for tetrahydro and dihydro forms, respectively. This work provides keen insight into the roles of pyranopterin conformation and oxidation state in catalysis, redox potential modulation of the metal site, and catalytic function.
Small proteins in cyanobacteria provide a paradigm for the functional analysis of the bacterial micro-proteome.

PubMed

Baumgartner, Desiree; Kopf, Matthias; Klähn, Stephan; Steglich, Claudia; Hess, Wolfgang R

2016-11-28

Despite their versatile functions in multimeric protein complexes, in the modification of enzymatic activities, intercellular communication or regulatory processes, proteins shorter than 80 amino acids (μ-proteins) are a systematically underestimated class of gene products in bacteria. Photosynthetic cyanobacteria provide a paradigm for small protein functions due to extensive work on the photosynthetic apparatus that led to the functional characterization of 19 small proteins of less than 50 amino acids. In analogy, previously unstudied small ORFs with similar degrees of conservation might encode small proteins of high relevance also in other functional contexts. Here we used comparative transcriptomic information available for two model cyanobacteria, Synechocystis sp. PCC 6803 and Synechocystis sp. PCC 6714, for the prediction of small ORFs. We found 293 transcriptional units containing candidate small ORFs ≤80 codons in Synechocystis sp. PCC 6803, also including the known mRNAs encoding small proteins of the photosynthetic apparatus. From these transcriptional units, 146 are shared between the two strains, 42 are shared with the higher plant Arabidopsis thaliana and 25 with E. coli. To verify the existence of the respective μ-proteins in vivo, we selected five genes as examples to which a FLAG tag sequence was added and re-introduced them into Synechocystis sp. PCC 6803. These were the previously annotated gene ssr1169, two newly defined genes norf1 and norf4, as well as nsiR6 (nitrogen stress-induced RNA 6) and hliR1 (high light-inducible RNA 1), which originally were considered non-coding. Upon activation of expression via the Cu2+-responsive petE promoter or from the native promoters, all five proteins were detected in Western blot experiments. The distribution and conservation of these five genes as well as their regulation of expression and the physico-chemical properties of the encoded proteins underline the likely great bandwidth of small protein functions in bacteria and make them attractive candidates for functional studies.
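As a rough illustration of what a search for candidate small ORFs of at most 80 codons involves, the sketch below scans the forward strand of a nucleotide sequence. It is not the authors' pipeline, which relied on comparative transcriptomic data. The ATG-only start codon, the forward-strand restriction, the minimum length, and the convention of counting the start codon but not the stop codon are all assumptions made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def small_orfs(seq: str, max_codons: int = 80, min_codons: int = 10):
    """Find forward-strand ORFs (ATG ... stop) within a codon-length window.

    Returns (start, end, n_codons) tuples with 0-based start, exclusive end
    (stop codon included), and n_codons counted without the stop codon.
    """
    seq = seq.upper()
    results = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            # Extend codon by codon until a stop codon or the sequence end.
            j = i
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # no in-frame stop downstream in this frame
            n_codons = (j - i) // 3
            if min_codons <= n_codons <= max_codons:
                results.append((i, j + 3, n_codons))
            i = j + 3  # resume scanning after the stop codon
    return results


# Toy sequence: ATG followed by 12 GCT codons and a TAA stop, in frame 2.
toy = "CC" + "ATG" + "GCT" * 12 + "TAA" + "GG"
print(small_orfs(toy))
```

A pure sequence scan like this produces many false positives in a real genome, which is one reason evidence such as transcription and conservation, as used in the study, is needed to prioritize candidates.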
Antibody to CCDC104 is associated with a paraneoplastic antibody to CDR2 (anti-Yo).

PubMed

Totland, Cecilie; Bredholt, Geir; Haugen, Mette; Haukanes, Bjørn Ivar; Vedeler, Christian A

2010-02-01

Patients with cancer may develop paraneoplastic neurological syndromes (PNS) in which onconeural antibodies are important diagnostic findings. As the functional role of onconeural antibodies is largely unknown, insight gained by identifying associated antibodies may help to clarify the pathogenesis of the PNS. In this study, we identified patients with Yo antibodies who also had antibodies to an uncharacterized protein called coiled-coil domain-containing protein 104 (CCDC104). We found a significant association between CCDC104 and Yo antibodies (4 of 38, 10.5%), but not other onconeural antibodies (0 of 158) (P = 0.007, Fisher's exact test). The prevalence of CCDC104 antibodies was similar in patients with cancer (8 of 756, 1.1%) and in healthy blood donors (2 of 300, 0.7%). CCDC104 antibodies were not associated with PNS, as PNS was found in only two of the ten CCDC104-positive patients. The CCDC104 protein, whose function is unknown, is expressed in various human tissues, including the brain, and is localized mainly to the nucleus, but is also found in the cytoplasm. The association between Yo and CCDC104 antibodies may indicate functional similarities.
A transthyretin-related protein is functionally expressed in Herbaspirillum seropedicae.

PubMed

Matiollo, Camila; Vernal, Javier; Ecco, Gabriela; Bertoldo, Jean Borges; Razzera, Guilherme; de Souza, Emanuel M; Pedrosa, Fábio O; Terenzi, Hernán

2009-10-02

Transthyretin-related proteins (TRPs) constitute a family of proteins structurally related to transthyretin (TTR) and are found in a large range of bacterial, fungal, plant, invertebrate, and vertebrate species. However, it was recently recognized that both prokaryotic and eukaryotic members of this family are not functionally related to transthyretins. TRPs are in fact involved in the purine catabolic pathway and function as hydroxyisourate hydrolases. An open reading frame encoding a protein similar to the Escherichia coli TRP was identified in Herbaspirillum seropedicae genome (Hs_TRP). It was cloned, overexpressed in E. coli, and purified to homogeneity. Mass spectrometry data confirmed the identity of this protein, and circular dichroism spectrum indicated a predominance of beta-sheet structure, as expected for a TRP. We have demonstrated that Hs_TRP is a 5-hydroxyisourate hydrolase and by site-directed mutagenesis the importance of three conserved catalytic residues for Hs_TRP activity was further confirmed. The production of large quantities of this recombinant protein opens up the possibility of obtaining its 3D-structure and will help further investigations into purine catabolism.
PrPC has nucleic acid chaperoning properties similar to the nucleocapsid protein of HIV-1.

PubMed

Derrington, Edmund; Gabus, Caroline; Leblanc, Pascal; Chnaidermann, Jonas; Grave, Linda; Dormont, Dominique; Swietnicki, Wieslaw; Morillas, Manuel; Marck, Daniel; Nandi, Pradip; Darlix, Jean-Luc

2002-01-01

The function of the cellular prion protein (PrPC) remains obscure. Studies suggest that PrPC functions in several processes including signal transduction and Cu2+ metabolism. PrPC has also been established to bind nucleic acids. Therefore we investigated the properties of PrPC as a putative nucleic acid chaperone. Surprisingly, PrPC possesses all the nucleic acid chaperoning properties previously specific to retroviral nucleocapsid proteins. PrPC appears to be a molecular mimic of NCP7, the nucleocapsid protein of HIV-1. Thus PrPC, like NCP7, chaperones the annealing of tRNA(Lys) to the HIV-1 primer binding site, the initial step of retrovirus replication. PrPC also chaperones the two DNA strand transfers required for production of a complete proviral DNA with LTRs. Concerning the functions of NCP7 during budding, PrPC also mimics NCP7 by dimerizing the HIV-1 genomic RNA. These data are unprecedented because, although many cellular proteins have been identified as nucleic acid chaperones, none have the properties of retroviral nucleocapsid proteins.
Conformational diversity analysis reveals three functional mechanisms in proteins

PubMed Central

Fornasari, María Silvina

2017-01-01

Protein motions are a key feature to understand biological function. Recently, a large-scale analysis of protein conformational diversity showed a positively skewed distribution with a peak at 0.5 Å C-alpha root-mean-square-deviation (RMSD). To understand this distribution in terms of structure-function relationships, we studied a well curated and large dataset of ~5,000 proteins with experimentally determined conformational diversity. We searched for global behaviour patterns studying how structure-based features change among the available conformer population for each protein. This procedure allowed us to describe the RMSD distribution in terms of three main protein classes sharing given properties. The largest of these protein subsets (~60%), which we call “rigid” (average RMSD = 0.83 Å), has no disordered regions, shows low conformational diversity, the largest tunnels and smaller and buried cavities. The two additional subsets contain disordered regions, but with differential sequence composition and behaviour. Partially disordered proteins have on average 67% of their conformers with disordered regions, average RMSD = 1.1 Å, the highest number of hinges and the longest disordered regions. In contrast, malleable proteins have on average only 25% of disordered conformers and average RMSD = 1.3 Å, flexible cavities affected in size by the presence of disordered regions and show the highest diversity of cognate ligands. Proteins in each set are mostly non-homologous to each other, share no given fold class, nor functional similarity but do share features derived from their conformer population. These shared features could represent conformational mechanisms related with biological functions. PMID:28192432
ProtPhylo: identification of protein-phenotype and protein-protein functional associations via phylogenetic profiling.

PubMed

Cheng, Yiming; Perocchi, Fabiana

2015-07-01

ProtPhylo is a web-based tool to identify proteins that are functionally linked to either a phenotype or a protein of interest based on co-evolution. ProtPhylo infers functional associations by comparing protein phylogenetic profiles (co-occurrence patterns of orthology relationships) for more than 9.7 million non-redundant protein sequences from all three domains of life. Users can query any of 2048 fully sequenced organisms, including 1678 bacteria, 255 eukaryotes and 115 archaea. In addition, they can tailor ProtPhylo to a particular kind of biological question by choosing among four main orthology inference methods based either on pair-wise sequence comparisons (One-way Best Hits and Best Reciprocal Hits) or clustering of orthologous proteins across multiple species (OrthoMCL and eggNOG). Next, ProtPhylo ranks phylogenetic neighbors of query proteins or phenotypic properties using the Hamming distance as a measure of similarity between pairs of phylogenetic profiles. Candidate hits can be easily and flexibly prioritized by complementary clues on subcellular localization, known protein-protein interactions, membrane spanning regions and protein domains. The resulting protein list can be quickly exported into a csv text file for further analyses. ProtPhylo is freely available at http://www.protphylo.org. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
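The ranking step described in this abstract (Hamming distance between pairs of phylogenetic profiles) can be illustrated with a minimal Python sketch. It is not ProtPhylo's implementation: the protein names, the six-organism profiles, and the helper names `hamming` and `rank_neighbors` are all hypothetical, and profiles are assumed to be simple presence/absence vectors.

```python
from typing import Dict, List, Tuple

# Assumption: a profile is a presence/absence vector over the same ordered
# list of organisms (1 = ortholog present, 0 = absent).
Profile = Tuple[int, ...]


def hamming(a: Profile, b: Profile) -> int:
    """Count the organisms in which two profiles disagree."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same organisms")
    return sum(x != y for x, y in zip(a, b))


def rank_neighbors(query: str, profiles: Dict[str, Profile]) -> List[Tuple[str, int]]:
    """Rank all other proteins by Hamming distance to the query (closest first)."""
    q = profiles[query]
    scored = [(name, hamming(q, p)) for name, p in profiles.items() if name != query]
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(scored, key=lambda item: (item[1], item[0]))


# Hypothetical toy data, for illustration only.
profiles = {
    "protA": (1, 1, 0, 1, 0, 0),
    "protB": (1, 1, 0, 1, 0, 1),
    "protC": (0, 0, 1, 0, 1, 1),
    "protD": (1, 1, 0, 0, 0, 0),
}
ranking = rank_neighbors("protA", profiles)
print(ranking)
```

A small distance means two proteins tend to be present and absent in the same organisms, which is the co-occurrence signal the abstract treats as evidence of a functional link.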
Toxic gain of function from mutant FUS protein is crucial to trigger cell autonomous motor neuron loss.

PubMed

Scekic-Zahirovic, Jelena; Sendscheid, Oliver; El Oussini, Hajer; Jambeau, Mélanie; Sun, Ying; Mersmann, Sina; Wagner, Marina; Dieterlé, Stéphane; Sinniger, Jérome; Dirrig-Grosch, Sylvie; Drenner, Kevin; Birling, Marie-Christine; Qiu, Jinsong; Zhou, Yu; Li, Hairi; Fu, Xiang-Dong; Rouaux, Caroline; Shelkovnikova, Tatyana; Witting, Anke; Ludolph, Albert C; Kiefer, Friedemann; Storkebaum, Erik; Lagier-Tourenne, Clotilde; Dupuis, Luc

2016-05-17

FUS is an RNA-binding protein involved in amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD). Cytoplasmic FUS-containing aggregates are often associated with concomitant loss of nuclear FUS. Whether loss of nuclear FUS function, gain of a cytoplasmic function, or a combination of both leads to neurodegeneration remains elusive. To address this question, we generated knockin mice expressing mislocalized cytoplasmic FUS and complete FUS knockout mice. Both mouse models display similar perinatal lethality with respiratory insufficiency, reduced body weight and length, and largely similar alterations in gene expression and mRNA splicing patterns, indicating that mislocalized FUS results in loss of its normal function. However, FUS knockin mice, but not FUS knockout mice, display reduced motor neuron numbers at birth, associated with enhanced motor neuron apoptosis, which can be rescued by cell-specific CRE-mediated expression of wild-type FUS within motor neurons. Together, our findings indicate that cytoplasmic FUS mislocalization not only leads to nuclear loss of function, but also triggers motor neuron death through a toxic gain of function within motor neurons. © 2016 The Authors. Published under the terms of the CC BY NC ND 4.0 license.
A Common Fold Mediates Vertebrate Defense and Bacterial Attack

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rosado, Carlos J.; Buckle, Ashley M.; Law, Ruby H.P.

2008-10-02

Proteins containing membrane attack complex/perforin (MACPF) domains play important roles in vertebrate immunity, embryonic development, and neural-cell migration. In vertebrates, the ninth component of complement and perforin form oligomeric pores that lyse bacteria and kill virus-infected cells, respectively. However, the mechanism of MACPF function is unknown. We determined the crystal structure of a bacterial MACPF protein, Plu-MACPF from Photorhabdus luminescens, to 2.0 angstrom resolution. The MACPF domain reveals structural similarity with pore-forming cholesterol-dependent cytolysins (CDCs) from Gram-positive bacteria. This suggests that lytic MACPF proteins may use a CDC-like mechanism to form pores and disrupt cell membranes. Sequence similarity between bacterial and vertebrate MACPF domains suggests that the fold of the CDCs, a family of proteins important for bacterial pathogenesis, is probably used by vertebrates for defense against infection.
Systematic discovery of novel eukaryotic transcriptional regulators using sequence homology independent prediction.

PubMed

Bossi, Flavia; Fan, Jue; Xiao, Jun; Chandra, Lilyana; Shen, Max; Dorone, Yanniv; Wagner, Doris; Rhee, Seung Y

2017-06-26

The molecular function of a gene is most commonly inferred by sequence similarity. Therefore, genes that lack sufficient sequence similarity to characterized genes (such as certain classes of transcriptional regulators) are difficult to classify using most function prediction algorithms and have remained uncharacterized. To identify novel transcriptional regulators systematically, we used a feature-based pipeline to screen protein families of unknown function. This method predicted 43 transcriptional regulator families in Arabidopsis thaliana, 7 families in Drosophila melanogaster, and 9 families in Homo sapiens. Literature curation validated 12 of the predicted families to be involved in transcriptional regulation. We tested 33 out of the 195 Arabidopsis putative transcriptional regulators for their ability to activate transcription of a reporter gene in planta and found twelve coactivators, five of which had no prior literature support. To investigate mechanisms of action in which the predicted regulators might work, we looked for interactors of an Arabidopsis candidate that did not show transactivation activity in planta and found that it might work with other members of its own family and a subunit of the Polycomb Repressive Complex 2 to regulate transcription. Our results demonstrate the feasibility of assigning molecular function to proteins of unknown function without depending on sequence similarity. In particular, we identified novel transcriptional regulators using biological features enriched in transcription factors. The predictions reported here should accelerate the characterization of novel regulators.
An improved approach to infer protein-protein interaction based on a hierarchical vector space model.

PubMed

Zhang, Jiongmin; Jia, Ke; Jia, Jinmeng; Qian, Ying

2018-04-27

Comparing and classifying functions of gene products are important in today's biomedical research. The semantic similarity derived from the Gene Ontology (GO) annotation has been regarded as one of the most widely used indicators for protein interaction. Among the various approaches proposed, those based on the vector space model are relatively simple, but their effectiveness is far from satisfying. We propose a Hierarchical Vector Space Model (HVSM) for computing semantic similarity between different genes or their products, which enhances the basic vector space model by introducing the relation between GO terms. Besides the directly annotated terms, HVSM also takes their ancestors and descendants related by "is_a" and "part_of" relations into account. Moreover, HVSM introduces the concept of a Certainty Factor to calibrate the semantic similarity based on the number of terms annotated to genes. To assess the performance of our method, we applied HVSM to Homo sapiens and Saccharomyces cerevisiae protein-protein interaction datasets. Compared with TCSS, Resnik, and other classic similarity measures, HVSM achieved significant improvement for distinguishing positive from negative protein interactions. We also tested its correlation with sequence, EC, and Pfam similarity using online tool CESSM. HVSM showed an improvement of up to 4% compared to TCSS, 8% compared to IntelliGO, 12% compared to basic VSM, 6% compared to Resnik, 8% compared to Lin, 11% compared to Jiang, 8% compared to Schlicker, and 11% compared to SimGIC using AUC scores. CESSM test showed HVSM was comparable to SimGIC, and superior to all other similarity measures in CESSM as well as TCSS. Supplementary information and the software are available at https://github.com/kejia1215/HVSM .
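The idea of enriching a basic vector space model with related GO terms can be sketched as follows. This is only a toy illustration under stated assumptions, not the HVSM algorithm: the ontology in `PARENTS` is invented, only ancestors are added (the abstract also mentions descendants and a Certainty Factor, both omitted here), and every term gets weight 1.

```python
import math
from typing import Dict, Iterable, Set

# Hypothetical toy ontology: child term -> set of direct parent terms.
PARENTS: Dict[str, Set[str]] = {
    "GO:C": {"GO:B"},
    "GO:D": {"GO:B"},
    "GO:B": {"GO:A"},
    "GO:E": {"GO:A"},
    "GO:A": set(),
}


def with_ancestors(terms: Iterable[str]) -> Set[str]:
    """Return the annotated terms plus every ancestor reachable in PARENTS."""
    seen: Set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(PARENTS.get(term, ()))
    return seen


def cosine(a: Set[str], b: Set[str]) -> float:
    """Cosine similarity of two binary term vectors represented as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


# Two hypothetical genes annotated with sibling terms.
gene1 = {"GO:C"}
gene2 = {"GO:D"}

basic = cosine(gene1, gene2)                                    # direct terms only
extended = cosine(with_ancestors(gene1), with_ancestors(gene2))  # plus ancestors
print(basic, extended)
```

With direct annotations alone the two genes share no term and score zero; once shared ancestors are counted, the sibling relationship becomes visible in the score, which is the kind of gain over the basic model that the abstract describes.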
The TIR domain of TIR-NB-LRR resistance proteins is a signaling domain involved in cell death induction.

PubMed

Swiderski, Michal R; Birker, Doris; Jones, Jonathan D G

2009-02-01

In plants, the TIR (toll interleukin 1 receptor) domain is found almost exclusively in nucleotide-binding (NB) leucine-rich repeat resistance proteins and their truncated homologs, and has been proposed to play a signaling role during resistance responses mediated by TIR-containing R proteins. Transient expression in Nicotiana benthamiana leaves of "TIR + 80", the RPS4 truncation without the NB-ARC domain, leads to EDS1-, SGT1-, and HSP90-dependent cell death. Transgenic Arabidopsis plants expressing the RPS4 TIR+80 from either dexamethasone or estradiol-inducible promoters display inducer-dependent cell death. Cell death is also elicited by transient expression of similarly truncated constructs from two other R proteins, RPP1A and At4g19530, but is not elicited by similar constructs representing RPP2A and RPP2B proteins. Site-directed mutagenesis of the RPS4 TIR domain identified many loss-of-function mutations but also revealed several gain-of-function substitutions. Lack of cell death induction by the E160A substitution suggests that amino acids outside of the TIR domain contribute to cell death signaling in addition to the TIR domain itself. This is consistent with previous observations that the TIR domain itself is insufficient to induce cell death upon transient expression.
Structural and Functional Characterization of an Ancient Bacterial Transglutaminase Sheds Light on the Minimal Requirements for Protein Cross-Linking.

PubMed

Fernandes, Catarina G; Plácido, Diana; Lousa, Diana; Brito, José A; Isidro, Anabela; Soares, Cláudio M; Pohl, Jan; Carrondo, Maria A; Archer, Margarida; Henriques, Adriano O

2015-09-22

Transglutaminases are best known for their ability to catalyze protein cross-linking reactions that impart chemical and physical resilience to cellular structures. Here, we report the crystal structure and characterization of Tgl, a transglutaminase from the bacterium Bacillus subtilis. Tgl is produced during sporulation and cross-links the surface of the highly resilient spore. Tgl-like proteins are found only in spore-forming bacteria of the Bacillus and Clostridia classes, indicating an ancient origin. Tgl is a single-domain protein, produced in active form, and the smallest transglutaminase characterized to date. We show that Tgl is structurally similar to bacterial cell wall endopeptidases and has an NlpC/P60 catalytic core, thought to represent the ancestral unit of the cysteine protease fold. We show that Tgl functions through a unique partially redundant catalytic dyad formed by Cys116 and Glu187 or Glu115. Strikingly, the catalytic Cys is insulated within a hydrophobic tunnel that traverses the molecule from side to side. The lack of similarity of Tgl to other transglutaminases together with its small size suggests that an NlpC/P60 catalytic core and insulation of the active site during catalysis may be essential requirements for protein cross-linking.
Plant Cation-Chloride Cotransporters (CCC): Evolutionary Origins and Functional Insights

PubMed Central

Gilliham, Matthew

2018-01-01

Genomes of unicellular and multicellular green algae, mosses, grasses and dicots harbor genes encoding cation-chloride cotransporters (CCC). CCC proteins from the plant kingdom have been comparatively less well investigated than their animal counterparts, but proteins from both plants and animals have been shown to mediate ion fluxes, and are involved in regulation of osmotic processes. In this review, we show that CCC proteins from plants form two distinct phylogenetic clades (CCC1 and CCC2). Some lycophytes and bryophytes possess members from each clade, most land plants only have members of the CCC1 clade, and green algae possess only the CCC2 clade. It is currently unknown whether CCC1 and CCC2 proteins have similar or distinct functions, however they are both more closely related to animal KCC proteins compared to NKCCs. Existing heterologous expression systems that have been used to functionally characterize plant CCC proteins, namely yeast and Xenopus laevis oocytes, have limitations that are discussed. Studies from plants exposed to chemical inhibitors of animal CCC protein function are reviewed for their potential to discern CCC function in planta. Thus far, mutations in plant CCC genes have been evaluated only in two species of angiosperms, and such mutations cause a diverse array of phenotypes—seemingly more than could simply be explained by localized disruption of ion transport alone. We evaluate the putative roles of plant CCC proteins and suggest areas for future investigation. PMID:29415511
A scoring function based on solvation thermodynamics for protein structure prediction

PubMed Central

Du, Shiqiao; Harano, Yuichi; Kinoshita, Masahiro; Sakurai, Minoru

2012-01-01

We predict protein structure using our recently developed free energy function for describing protein stability, which is focused on solvation thermodynamics. The function is combined with the current most reliable sampling methods, i.e., fragment assembly (FA) and comparative modeling (CM). The prediction is tested using 11 small proteins for which high-resolution crystal structures are available. For 8 of these proteins, sequence similarities are found in the database, and the prediction is performed with CM. Fairly accurate models with average Cα root mean square deviation (RMSD) ∼ 2.0 Å are successfully obtained for all cases. For the rest of the target proteins, we perform the prediction following FA protocols. For 2 cases, we obtain predicted models with an RMSD ∼ 3.0 Å as the best-scored structures. For the other case, the RMSD remains larger than 7 Å. For all the 11 target proteins, our scoring function identifies the experimentally determined native structure as the best structure. Starting from the predicted structure, replica exchange molecular dynamics is performed to further refine the structures. However, we are unable to improve its RMSD toward the experimental structure. The exhaustive sampling by coarse-grained normal mode analysis around the native structures reveals that our function has a linear correlation with RMSDs < 3.0 Å. These results suggest that the function is quite reliable for the protein structure prediction while the sampling method remains one of the major limiting factors in it. The aspects through which the methodology could further be improved are discussed. PMID:27493529
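The Cα RMSD values quoted in this abstract measure how far a predicted model lies from the experimental structure. A minimal Python sketch of the quantity follows. It assumes the two coordinate sets are already optimally superposed and listed in the same atom order; the function name `rmsd` and the two-atom coordinates are hypothetical, and the superposition step that a real comparison needs is left out.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]  # x, y, z in angstroms


def rmsd(model: Sequence[Coord], native: Sequence[Coord]) -> float:
    """Root mean square deviation between two pre-superposed coordinate sets."""
    if len(model) != len(native) or not model:
        raise ValueError("need two non-empty coordinate sets of equal length")
    squared = sum(
        (xm - xn) ** 2 + (ym - yn) ** 2 + (zm - zn) ** 2
        for (xm, ym, zm), (xn, yn, zn) in zip(model, native)
    )
    return math.sqrt(squared / len(model))


# Hypothetical two-atom example: the second atom is displaced by 5 angstroms.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (1.0, 3.0, 4.0)]
value = rmsd(model, native)
print(round(value, 3))
```

Because the squared deviations are averaged over all atoms before the square root is taken, one badly placed region can dominate the value, which is worth keeping in mind when comparing figures such as 2.0 Å and 7 Å.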
Comparative analysis of the L, M, and S RNA segments of Crimean-Congo haemorrhagic fever virus isolates from southern Africa.

PubMed

Goedhals, Dominique; Bester, Phillip A; Paweska, Janusz T; Swanepoel, Robert; Burt, Felicity J

2015-05-01

Crimean-Congo haemorrhagic fever virus (CCHFV) is a member of the Bunyaviridae family with a tripartite, negative sense RNA genome. This study used predictive software to analyse the L (large), M (medium), and S (small) segments of 14 southern African CCHFV isolates. The OTU-like cysteine protease domain and the RdRp domain of the L segment are highly conserved among southern African CCHFV isolates. The M segment encodes the structural glycoproteins, GN and GC, and the non-structural glycoproteins which are post-translationally cleaved at highly conserved furin and subtilase SKI-1 cleavage sites. All of the sites previously identified were shown to be conserved among southern African CCHFV isolates. The heavily O-glycosylated N-terminal variable mucin-like domain of the M segment shows the highest sequence variability of the CCHFV proteins. Five transmembrane domains are predicted in the M segment polyprotein resulting in three regions internal to and three regions external to the membrane across the G(N), NS(M) and G(C) glycoproteins. The corroboration of conserved genome domains and sequence identity among geographically diverse isolates may assist in the identification of protein function and pathogenic mechanisms, as well as the identification of potential targets for antiviral therapy and vaccine design. As detailed functional studies are lacking for many of the CCHFV proteins, identification of functional domains by prediction of protein structure, and identification of amino acid level similarity to functionally characterised proteins of related viruses or viruses with similar pathogenic mechanisms are a necessary step for selection of areas for further study. © 2015 Wiley Periodicals, Inc.
Functional insights from proteome-wide structural modeling of Treponema pallidum subspecies pallidum, the causative agent of syphilis.

PubMed

Houston, Simon; Lithgow, Karen Vivien; Osbak, Kara Krista; Kenyon, Chris Richard; Cameron, Caroline E

2018-05-16

Syphilis continues to be a major global health threat with 11 million new infections each year, and a global burden of 36 million cases. The causative agent of syphilis, Treponema pallidum subspecies pallidum, is a highly virulent bacterium; however, the molecular mechanisms underlying T. pallidum pathogenesis remain to be definitively identified. This is due to the fact that T. pallidum is currently uncultivatable, inherently fragile and thus difficult to work with, and phylogenetically distinct with no conventional virulence factor homologs found in other pathogens. In fact, approximately 30% of its predicted protein-coding genes have no known orthologs or assigned functions. Here we employed a structural bioinformatics approach using Phyre2-based tertiary structure modeling to improve our understanding of T. pallidum protein function on a proteome-wide scale. Phyre2-based tertiary structure modeling generated high-confidence predictions for 80% of the T. pallidum proteome (780/978 predicted proteins). Tertiary structure modeling also inferred the same function as primary structure-based annotations from genome sequencing pipelines for 525/605 proteins (87%), which represents 54% (525/978) of all T. pallidum proteins. Of the 175 T. pallidum proteins modeled with high confidence that were not assigned functions in the previously annotated published proteome, 167 (95%) were able to be assigned predicted functions. Twenty-one of the 175 hypothetical proteins modeled with high confidence were also predicted to exhibit significant structural similarity with proteins experimentally confirmed to be required for virulence in other pathogens. Phyre2-based structural modeling is a powerful bioinformatics tool that has provided insight into the potential structure and function of the majority of T. pallidum proteins and helped validate the primary structure-based annotation of more than 50% of all T. pallidum proteins with high confidence. This work represents the first T. pallidum proteome-wide structural modeling study and is one of few studies to apply this approach for the functional annotation of a whole proteome.

"Multiple partial recognitions in dynamic equilibrium" in the binding sites of proteins form the molecular basis of promiscuous recognition of structurally diverse ligands.

PubMed

Kohda, Daisuke

2018-04-01

Promiscuous recognition of ligands by proteins is as important as strict recognition in numerous biological processes. In living cells, many short, linear amino acid motifs function as targeting signals in proteins to specify the final destination of the protein transport. In general, the target signal is defined by a consensus sequence containing wildcard characters, and hence represented by diverse amino acid sequences. The classical lock-and-key or induced-fit/conformational selection mechanism may not cover all aspects of the promiscuous recognition. On the basis of our crystallographic and NMR studies on the mitochondrial Tom20 protein-presequence interaction, we proposed a new hypothetical mechanism based on "a rapid equilibrium of multiple states with partial recognitions". This dynamic, multiple recognition mode enables the Tom20 receptor to recognize diverse mitochondrial presequences with nearly equal affinities. The plant Tom20 is evolutionarily unrelated to the animal Tom20 in our study, but is a functional homolog of the animal/fungal Tom20. NMR studies by another research group revealed that the presequence binding by the plant Tom20 was not fully explained by simple interaction modes, suggesting the presence of a similar dynamic, multiple recognition mode. Circumstantial evidence also suggested that similar dynamic mechanisms may be applicable to other promiscuous recognitions of signal peptides by the SRP54/Ffh and SecA proteins.
Comparative proteomic analysis of developing rhizomes of the ancient vascular plant Equisetum hyemale and different monocot species.

PubMed

Salvato, Fernanda; Balbuena, Tiago S; Nelson, William; Rao, R Shyama Prasad; He, Ruifeng; Soderlund, Carol A; Gang, David R; Thelen, Jay J

2015-04-03

The rhizome is responsible for the invasiveness and competitiveness of many plants with great economic and agricultural impact worldwide. Besides its value as an invasive organ, the rhizome plays a role in the establishment and massive growth of forage, providing biomass for biofuel production. Despite these features, little is known about the molecular mechanisms that contribute to rhizome growth, development, and function in plants. In this work, we characterized the proteome of rhizome apical tips and elongation zones from different species using a GeLC-MS/MS (one-dimensional electrophoresis in combination with liquid chromatography coupled online with tandem mass spectrometry) spectral-counting proteomics strategy. Five rhizomatous grasses and an ancient species were compared to study the protein regulation in rhizomes. An average of 2200 rhizome proteins per species were confidently identified and quantified. Rhizome-characteristic proteins showed similar functional distributions across all species analyzed. The over-representation of proteins associated with central roles in cellular, metabolic, and developmental processes indicated accelerated metabolism in growing rhizomes. Moreover, 61 rhizome-characteristic proteins appeared to be regulated similarly among analyzed plants. In addition, 36 showed conserved regulation between rhizome apical tips and elongation zones across species. These proteins were preferentially expressed in rhizome tissues regardless of the species analyzed, making them interesting candidates for more detailed investigative studies about their roles in rhizome development.
Purification, amino acid sequence and characterisation of kangaroo IGF-I.

PubMed

Yandell, C A; Francis, G L; Wheldrake, J F; Upton, Z

1998-01-01

Insulin-like growth factor-I (IGF-I) and IGF-II have been purified to homogeneity from kangaroo (Macropus fuliginosus) serum, thus this represents the first report of the purification, sequencing and characterisation of marsupial IGFs. N-Terminal protein sequencing reveals that there are six amino acid differences between kangaroo and human IGF-I. Kangaroo IGF-II has been partially sequenced and no differences were found between human and kangaroo IGF-II in the 53 residues identified. Thus the IGFs appear to be remarkably structurally conserved during mammalian radiation. In addition, in vitro characterisation of kangaroo IGF-I demonstrated that the functional properties of human, kangaroo and chicken IGF-I are very similar. In an assay measuring the ability of the proteins to stimulate protein synthesis in rat L6 myoblasts, all IGF-I proteins were found to be equally potent. The ability of all three proteins to compete for binding with radiolabelled human IGF-I to type-1 IGF receptors in L6 myoblasts and in Sminthopsis crassicaudata transformed lung fibroblasts, a marsupial cell line, was comparable. Furthermore, kangaroo and human IGF-I react equally in a human IGF-I RIA using a human reference standard, radiolabelled human IGF-I and a polyclonal antibody raised against recombinant human IGF-I. This study indicates that not only is the primary structure of eutherian and metatherian IGF-I conserved, but also the proteins appear to be functionally similar.
GASS-WEB: a web server for identifying enzyme active sites based on genetic algorithms.

PubMed

Moraes, João P A; Pappa, Gisele L; Pires, Douglas E V; Izidoro, Sandro C

2017-07-03

Enzyme active sites are important and conserved functional regions of proteins whose identification can be an invaluable step toward protein function prediction. Most of the existing methods for this task are based on active site similarity and present limitations, including performing only exact matches on template residues, template size restraints, and an inability to find inter-domain active sites. To fill this gap, we proposed GASS-WEB, a user-friendly web server that uses GASS (Genetic Active Site Search), a method based on an evolutionary algorithm to search for similar active sites in proteins. GASS-WEB can be used under two different scenarios: (i) given a protein of interest, to match a set of specific active site templates; or (ii) given an active site template, to look for it in a database of protein structures. The method has been shown to be very effective on a range of experiments and was able to correctly identify >90% of the catalogued active sites from the Catalytic Site Atlas. It also managed to achieve a Matthews correlation coefficient of 0.63 using the Critical Assessment of protein Structure Prediction (CASP 10) dataset. In our analysis, GASS ranked fourth among 18 methods. GASS-WEB is freely available at http://gass.unifei.edu.br/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
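The correlation coefficient reported in this abstract is a standard summary of binary prediction quality, computed from the four confusion-matrix counts. A minimal Python sketch of the standard formula follows. The function name and the example counts are hypothetical and are not taken from the GASS evaluation; returning 0.0 when the denominator is zero is one common convention, assumed here.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: an empty row or column yields 0.0.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts: residues predicted as catalytic vs. annotated as catalytic.
score = matthews_corrcoef(tp=6, tn=3, fp=1, fn=2)
print(round(score, 3))
```

Unlike plain accuracy, this coefficient stays informative when catalytic residues are a small minority of all residues, which is the usual situation in active site prediction.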
[Architecture of receptor-operated ionic channels of biological membranes].

PubMed

Bregestovski, P D

2011-01-01

Ion channels of biological membranes are the key proteins, which provide bioelectric functioning of living systems. These proteins are homo- or heterooligomers assembled from several identical or different subunits. Understanding the architectural organization and functioning of ion channels has been significantly extended due to resolving the crystal structure of several types of voltage-gated and receptor-operated channels. This review summarizes the information obtained from crystal structures of potassium, nicotinic acetylcholine receptor, P2X, and other ligand-gated ion channels. Despite the differences in the function, topology, ionic selectivity, and the subunit stoichiometry, a high similarity in the principles of organization of these macromolecular complexes has been revealed.
Protein phosphorylations in poliovirus infected cells.

PubMed

James, L A; Tershak, D R

1981-01-01

In vivo phosphorylation of proteins that are associated with polysomes of poliovirus-infected VERO (African green monkey kidney) and HeLa (Henrietta Lacks) cells differed from phosphorylations observed with uninfected cells that were fed fresh medium. With both types of cells infection stimulated phosphorylation of proteins with molecular weights of 40 000-41 000, 39 000, 34 000, 32 000, and 24 000. Similarities of phosphorylations in VERO and HeLa cells suggest that they are a specific consequence of infection and might serve a regulatory function during protein synthesis.
A Conserved Deubiquitinating Enzyme Uses Intrinsically Disordered Regions to Scaffold Multiple Protein Interaction Sites*

PubMed Central

Reed, Benjamin J.; Locke, Melissa N.; Gardner, Richard G.

2015-01-01

In the canonical view of protein function, it is generally accepted that the three-dimensional structure of a protein determines its function. However, the past decade has seen a dramatic growth in the identification of proteins with extensive intrinsically disordered regions (IDRs), which are conformationally plastic and do not appear to adopt single three-dimensional structures. One current paradigm for IDR function is that disorder enables IDRs to adopt multiple conformations, expanding the ability of a protein to interact with a wide variety of disparate proteins. The capacity for many interactions is an important feature of proteins that occupy the hubs of protein networks, in particular protein-modifying enzymes that usually have a broad spectrum of substrates. One such protein modification is ubiquitination, where ubiquitin is attached to proteins through ubiquitin ligases (E3s) and removed through deubiquitinating enzymes. Numerous proteomic studies have found that thousands of proteins are dynamically regulated by cycles of ubiquitination and deubiquitination. Thus, how these enzymes target their wide array of substrates is of considerable importance for understanding the function of the cell's diverse ubiquitination networks. Here, we characterize a yeast deubiquitinating enzyme, Ubp10, that possesses IDRs flanking its catalytic protease domain. We show that Ubp10 possesses multiple, distinct binding modules within its IDRs that are necessary and sufficient for directing protein interactions important for Ubp10's known roles in gene silencing and ribosome biogenesis. The human homolog of Ubp10, USP36, also has IDRs flanking its catalytic domain, and these IDRs similarly contain binding modules important for protein interactions. This work highlights the significant protein interaction scaffolding abilities of IDRs in the regulation of dynamic protein ubiquitination. PMID:26149687
Function, structure, and stability of enzymes confined in agarose gels.

PubMed

Kunkel, Jeffrey; Asuri, Prashanth

2014-01-01

Research over the past few decades has attempted to answer how proteins behave in molecularly confined or crowded environments when compared to dilute buffer solutions. This information is vital to understanding in vivo protein behavior, as the average spacing between macromolecules in the cell cytosol is much smaller than the size of the macromolecules themselves. In our study, we attempt to address this question using three structurally and functionally different model enzymes encapsulated in agarose gels of different porosities. Our studies reveal that under standard buffer conditions, the initial reaction rates of the agarose-encapsulated enzymes are lower than that of the solution phase enzymes. However, the encapsulated enzymes retain a higher percentage of their activity in the presence of denaturants. Moreover, the concentration of agarose used for encapsulation had a significant effect on the enzyme functional stability; enzymes encapsulated in higher percentages of agarose were more stable than the enzymes encapsulated in lower percentages of agarose. Similar results were observed through structural measurements of enzyme denaturation using an 8-anilinonaphthalene-1-sulfonic acid fluorescence assay. Our work demonstrates the utility of hydrogels to study protein behavior in highly confined environments similar to those present in vivo; furthermore, the enhanced stability of gel-encapsulated enzymes may find use in the delivery of therapeutic proteins, as well as the design of novel strategies for biohybrid medical devices.
Analysis of expressed sequence tags for Frankliniella occidentalis, the western flower thrips.

PubMed

Rotenberg, D; Whitfield, A E

2010-08-01

Thrips are members of the insect order Thysanoptera and Frankliniella occidentalis (the western flower thrips) is the most economically important pest within this order. F. occidentalis is both a direct pest of crops and an efficient vector of plant viruses, including Tomato spotted wilt virus (TSWV). Despite the world-wide importance of thrips in agriculture, there is little knowledge of the F. occidentalis genome or gene functions at this time. A normalized cDNA library was constructed from first instar thrips and 13 839 expressed sequence tags (ESTs) were obtained. Our EST data assembled into 894 contigs and 11 806 singletons (12 700 nonredundant sequences). We found that 31% of these sequences had significant similarity (E ≤ 10⁻¹⁰) to protein sequences in the National Center for Biotechnology Information nonredundant (nr) protein database, and 25% were functionally annotated using Blast2GO. We identified 74 sequences with putative homology to proteins associated with insect innate immunity. Sixteen sequences had significant similarity to proteins associated with small RNA-mediated gene silencing pathways (RNA interference; RNAi), including the antiviral pathway (short interfering RNA-mediated pathway). Our EST collection provides new sequence resources for characterizing gene functions in F. occidentalis and other thrips species with regards to vital biological processes, studying the mechanism of interactions with the viruses harboured and transmitted by the vector, and identifying new insect gene-centred targets for plant disease and insect control.
Protrudin regulates endoplasmic reticulum morphology and function associated with the pathogenesis of hereditary spastic paraplegia.

PubMed

Hashimoto, Yutaka; Shirane, Michiko; Matsuzaki, Fumiko; Saita, Shotaro; Ohnishi, Takafumi; Nakayama, Keiichi I

2014-05-09

Protrudin is a membrane protein that regulates polarized vesicular trafficking in neurons. The protrudin gene (ZFYVE27) is mutated in a subset of individuals with hereditary spastic paraplegia (HSP), and protrudin is therefore also referred to as spastic paraplegia (SPG) 33. We have now generated mice that express a transgene for dual epitope-tagged protrudin under control of a neuron-specific promoter, and we have subjected highly purified protrudin-containing complexes isolated from the brain of these mice to proteomics analysis to identify proteins that associate with protrudin. Protrudin was found to interact with other HSP-related proteins including myelin proteolipid protein 1 (SPG2), atlastin-1 (SPG3A), REEP1 (SPG31), REEP5 (similar to REEP1), Kif5A (SPG10), Kif5B, Kif5C, and reticulon 1, 3, and 4 (similar to reticulon 2, SPG12). Membrane topology analysis indicated that one of three hydrophobic segments of protrudin forms a hydrophobic hairpin domain similar to those of other SPG proteins. Protrudin was found to localize predominantly to the tubular endoplasmic reticulum (ER), and forced expression of protrudin promoted the formation and stabilization of the tubular ER network. The protrudin(G191V) mutant, which has been identified in a subset of HSP patients, manifested an increased intracellular stability, and cells expressing this mutant showed an increased susceptibility to ER stress. Our results thus suggest that protrudin contributes to the regulation of ER morphology and function, and that its deregulation by mutation is a causative defect in HSP.
Modulating protein adsorption onto hydroxyapatite particles using different amino acid treatments

PubMed Central

Lee, Wing-Hin; Loo, Ching-Yee; Van, Kim Linh; Zavgorodniy, Alexander V.; Rohanizadeh, Ramin

2012-01-01

Hydroxyapatite (HA) is a material of choice for bone grafts owing to its chemical and structural similarities to the mineral phase of hard tissues. The combination of osteogenic proteins with HA materials that carry and deliver the proteins to the bone-defective areas will accelerate bone regeneration. The study investigated the treatment of HA particles with different amino acids such as serine (Ser), asparagine (Asn), aspartic acid (Asp) and arginine (Arg) to enhance the adsorption ability of HA carrier for delivering therapeutic proteins to the body. The crystallinity of HA decreased when amino acids were added during HA preparation. Depending on the type of amino acid, the specific surface area of the amino acid-functionalized HA particles varied from 105 to 149 m² g⁻¹. Bovine serum albumin (BSA) and lysozyme were used as model proteins for the adsorption study. The protein adsorption onto the surface of amino acid-functionalized HA depended on the polarities of HA particles, whereby, compared with lysozyme, BSA demonstrated higher affinity towards positively charged Arg-HA. Alternatively, the binding affinity of lysozyme onto the negatively charged Asp-HA was higher when compared with BSA. The BSA and lysozyme adsorptions onto the amino acid-functionalized HA fitted better into the Freundlich than Langmuir model. The amino acid-functionalized HA particles that had higher protein adsorption demonstrated a lower protein-release rate. PMID:21957116
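The comparison between the Freundlich and Langmuir models mentioned in this abstract can be illustrated with a minimal sketch. The code below is not the authors' fitting procedure: it assumes the standard textbook forms of the two isotherms, fits each through its usual linearization, and uses made-up synthetic data. The function names and parameter values are hypothetical, and comparing R² across two different linearizations is only a rough guide to which model fits better.

```python
import math

def langmuir(c, q_max, k_l):
    """Langmuir isotherm: monolayer adsorption that saturates at q_max."""
    return q_max * k_l * c / (1.0 + k_l * c)

def freundlich(c, k_f, n):
    """Freundlich isotherm: empirical power law with no saturation."""
    return k_f * c ** (1.0 / n)

def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, plus R^2."""
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot

def fit_freundlich(conc, q):
    """Fit log q = log k_f + (1/n) log c; returns (k_f, n, r2)."""
    log_c = [math.log(c) for c in conc]
    log_q = [math.log(v) for v in q]
    slope, intercept, r2 = linear_fit(log_c, log_q)
    return math.exp(intercept), 1.0 / slope, r2

def fit_langmuir(conc, q):
    """Fit c/q = c/q_max + 1/(k_l * q_max); returns (q_max, k_l, r2)."""
    slope, intercept, r2 = linear_fit(conc, [c / v for c, v in zip(conc, q)])
    q_max = 1.0 / slope
    return q_max, 1.0 / (intercept * q_max), r2

# Synthetic data generated from a Freundlich curve (illustrative only,
# not measurements from the study).
conc = [0.5, 1.0, 2.0, 4.0, 8.0]
q = [freundlich(c, k_f=2.0, n=2.0) for c in conc]

k_f, n, r2_freundlich = fit_freundlich(conc, q)
q_max, k_l, r2_langmuir = fit_langmuir(conc, q)
print(f"Freundlich: k_f={k_f:.3f}, n={n:.3f}, R^2={r2_freundlich:.4f}")
print(f"Langmuir:   q_max={q_max:.3f}, k_l={k_l:.3f}, R^2={r2_langmuir:.4f}")
```

Because the synthetic data follow a power law, the Freundlich fit recovers the generating parameters and scores the higher R². With real adsorption data, a nonlinear fit on the untransformed values would usually be the more defensible comparison.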
Eukaryotic ribonucleases P/MRP: the crystal structure of the P3 domain.

PubMed

Perederina, Anna; Esakova, Olga; Quan, Chao; Khanova, Elena; Krasilnikov, Andrey S

2010-02-17

Ribonuclease (RNase) P is a site-specific endoribonuclease found in all kingdoms of life. Typical RNase P consists of a catalytic RNA component and a protein moiety. In the eukaryotes, the RNase P lineage has split into two, giving rise to a closely related enzyme, RNase MRP, which has similar components but has evolved to have different specificities. The eukaryotic RNases P/MRP have acquired an essential helix-loop-helix protein-binding RNA domain P3 that has an important function in eukaryotic enzymes and distinguishes them from bacterial and archaeal RNases P. Here, we present a crystal structure of the P3 RNA domain from Saccharomyces cerevisiae RNase MRP in a complex with RNase P/MRP proteins Pop6 and Pop7 solved to 2.7 Å. The structure suggests similar structural organization of the P3 RNA domains in RNases P/MRP and possible functions of the P3 domains and proteins bound to them in the stabilization of the holoenzymes' structures as well as in interactions with substrates. It provides the first insight into the structural organization of the eukaryotic enzymes of the RNase P/MRP family.
The La and related RNA-binding proteins (LARPs): structures, functions, and evolving perspectives.

PubMed

Maraia, Richard J; Mattijssen, Sandy; Cruz-Gallardo, Isabel; Conte, Maria R

2017-11-01

La was first identified as a polypeptide component of ribonucleic protein complexes targeted by antibodies in autoimmune patients and is now known to be a eukaryote cell-ubiquitous protein. Structure and function studies have shown that La binds to a common terminal motif, UUU-3'-OH, of nascent RNA polymerase III (RNAP III) transcripts and protects them from exonucleolytic decay. For precursor-tRNAs, the most diverse and abundant of these transcripts, La also functions as an RNA chaperone that helps to prevent their misfolding. Related to this, we review evidence that suggests that La and its link to RNAP III were significant in the great expansions of the tRNAomes that occurred in eukaryotes. Four families of La-related proteins (LARPs) emerged during eukaryotic evolution with specialized functions. We provide an overview of the high-resolution structural biology of La and LARPs. LARP7 family members most closely resemble La but function with a single RNAP III nuclear transcript, 7SK, or telomerase RNA. A cytoplasmic isoform of La protein as well as LARPs 6, 4, and 1 function in mRNA metabolism and translation in distinct but similar ways, sometimes with the poly(A)-binding protein, and in some cases by direct binding to poly(A)-RNA. New structures of LARP domains, some complexed with RNA, provide novel insights into the functional versatility of these proteins. We also consider LARPs in relation to ancestral La protein and potential retention of links to specific RNA-related pathways. One such link may be tRNA surveillance and codon usage by LARP-associated mRNAs. WIREs RNA 2017, 8:e1430. doi: 10.1002/wrna.1430 © 2017 Wiley Periodicals, Inc.
Entropyology: the application of bioinformatics and data modeling to digital virus and malware recognition

NASA Astrophysics Data System (ADS)

Jaenisch, Holger M.; Handley, James W.

2010-04-01

Malware are analogs of viruses. Viruses are composed of large numbers of polypeptide proteins. The shape and function of the protein strands determine the functionality of the segment, similar to a subroutine in malware. The full combination of subroutines is the malware organism, in the same way that a collection of polypeptides forms information-bearing protein structures. We propose to apply the methods of bioinformatics, originally developed for amino acid sequencing, to the analysis of malware, providing a rich feature set for a unique and novel detection and classification scheme. Our proposed methods enable real-time in situ (in contrast to in vivo) detection applications.
A retroviral oncogene, akt, encoding a serine-threonine kinase containing an SH2-like region.

PubMed

Bellacosa, A; Testa, J R; Staal, S P; Tsichlis, P N

1991-10-11

The v-akt oncogene codes for a 105-kilodalton fusion phosphoprotein containing Gag sequences at its amino terminus. Sequence analysis of v-akt and biochemical characterization of its product revealed that it codes for a protein kinase C-related serine-threonine kinase whose cellular homolog is expressed in most tissues, with the highest amount found in thymus. Although Akt is a serine-threonine kinase, part of its regulatory region is similar to the Src homology-2 domain, a structural motif characteristic of cytoplasmic tyrosine kinases that functions in protein-protein interactions. This suggests that Akt may form a functional link between tyrosine and serine-threonine phosphorylation pathways.
Identification of key residues for protein conformational transition using elastic network model.

PubMed

Su, Ji Guo; Xu, Xian Jin; Li, Chun Hua; Chen, Wei Zu; Wang, Cun Xin

2011-11-07

Proteins usually undergo conformational transitions between structurally disparate states to fulfill their functions. The large-scale allosteric conformational transitions are believed to involve some key residues that mediate the conformational movements between different regions of the protein. In the present work, a thermodynamic method based on the elastic network model is proposed to predict the key residues involved in protein conformational transitions. In our method, the key functional sites are identified as the residues whose perturbations largely influence the free energy difference between the protein states before and after transition. Two proteins, nucleotide binding domain of the heat shock protein 70 and human/rat DNA polymerase β, are used as case studies to identify the critical residues responsible for their open-closed conformational transitions. The results show that the functionally important residues mainly locate at the following regions for these two proteins: (1) the bridging point at the interface between the subdomains that control the opening and closure of the binding cleft; (2) the hinge region between different subdomains, which mediates the cooperative motions between the corresponding subdomains; and (3) the substrate binding sites. The similarity in the positions of the key residues for these two proteins may indicate a common mechanism in their conformational transitions.
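As a rough illustration of the elastic network idea this abstract builds on, the sketch below constructs the Kirchhoff (connectivity) matrix of a Gaussian network model, one common form of elastic network model. It covers only the network construction step and does not reproduce the authors' thermodynamic perturbation method or their free energy calculation. The function name, the toy coordinates, and the cutoff value are all assumptions made for illustration.

```python
import math

def kirchhoff_matrix(coords, cutoff):
    """Build the Gaussian network model connectivity matrix.

    Off-diagonal entry (i, j) is -1 when nodes i and j lie within
    `cutoff` of each other and 0 otherwise. Diagonal entry (i, i) is
    the number of contacts of node i, so every row sums to zero.
    """
    n = len(coords)
    gamma = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1
                gamma[i][i] += 1
                gamma[j][j] += 1
    return gamma

# Toy example: three nodes on a line, 1.0 apart (hypothetical coordinates,
# standing in for C-alpha positions taken from a real structure).
toy_coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
toy_gamma = kirchhoff_matrix(toy_coords, cutoff=1.5)
for row in toy_gamma:
    print(row)
```

In practice each node would typically be a residue's C-alpha atom, and a perturbation analysis of the kind described in the abstract would then ask how changing the springs around one residue shifts a quantity computed from this network.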
Redox proteomics of tomato in response to Pseudomonas syringae infection

PubMed Central

Balmant, Kelly Mayrink; Parker, Jennifer; Yoo, Mi-Jeong; Zhu, Ning; Dufresne, Craig; Chen, Sixue

2015-01-01

Unlike mammals with adaptive immunity, plants rely on their innate immunity based on pattern-triggered immunity (PTI) and effector-triggered immunity (ETI) for pathogen defense. Reactive oxygen species, known to play crucial roles in PTI and ETI, can perturb cellular redox homeostasis and lead to changes of redox-sensitive proteins through modification of cysteine sulfhydryl groups. Although redox regulation of protein functions has emerged as an important mechanism in several biological processes, little is known about redox proteins and how they function in PTI and ETI. In this study, cysTMT proteomics technology was used to identify similarities and differences of protein redox modifications in tomato resistant (PtoR) and susceptible (prf3) genotypes in response to Pseudomonas syringae pv tomato (Pst) infection. In addition, the results of the redox changes were compared and corrected with the protein level changes. A total of 90 potential redox-regulated proteins were identified with functions in carbohydrate and energy metabolism, biosynthesis of cysteine, sucrose and brassinosteroid, cell wall biogenesis, polysaccharide/starch biosynthesis, cuticle development, lipid metabolism, proteolysis, tricarboxylic acid cycle, protein targeting to vacuole, and oxidation–reduction. This inventory of previously unknown protein redox switches in tomato pathogen defense lays a foundation for future research toward understanding the biological significance of protein redox modifications in plant defense responses. PMID:26504582
PIGN prevents protein aggregation in the endoplasmic reticulum independently of its function in the GPI synthesis.

PubMed

Ihara, Shinji; Nakayama, Sohei; Murakami, Yoshiko; Suzuki, Emiko; Asakawa, Masayo; Kinoshita, Taroh; Sawa, Hitoshi

2017-02-01

Quality control of proteins in the endoplasmic reticulum (ER) is essential for ensuring the integrity of secretory proteins before their release into the extracellular space. Secretory proteins that fail to pass quality control form aggregates. Here we show that PIGN-1/PIGN is required for quality control in Caenorhabditis elegans and in mammalian cells. In C. elegans pign-1 mutants, several proteins fail to be secreted and instead form abnormal aggregates. PIGN-knockout HEK293 cells also showed similar protein aggregation. Although PIGN-1/PIGN is responsible for glycosylphosphatidylinositol (GPI)-anchor biosynthesis in the ER, certain mutations in C. elegans pign-1 caused protein aggregation in the ER without affecting GPI-anchor biosynthesis. These results show that PIGN-1/PIGN has a conserved and non-canonical function to prevent deleterious protein aggregation in the ER independently of the GPI-anchor biosynthesis. PIGN is a causative gene for some human diseases including multiple congenital seizure-related syndrome (MCAHS1). Two pign-1 mutations created by CRISPR/Cas9 that correspond to MCAHS1 also cause protein aggregation in the ER, implying that the dysfunction of the PIGN non-canonical function might affect symptoms of MCAHS1 and potentially those of other diseases. © 2017. Published by The Company of Biologists Ltd.
Nucleotide sequence and phylogenetic analysis of Cucurbit yellow stunting disorder virus RNA 2.

PubMed

Livieratos, Ioannis C; Coutts, Robert H A

2002-06-01

The complete nucleotide sequence of Cucurbit yellow stunting disorder virus (CYSDV) RNA 2, a whitefly (Bemisia tabaci)-transmitted closterovirus with a bi-partite genome, is reported. CYSDV RNA 2 is 7,281 nucleotides long and contains the closterovirus hallmark gene array with a similar arrangement to the prototype member of the genus Crinivirus, Lettuce infectious yellows virus (LIYV). CYSDV RNA 2 contains open reading frames (ORFs) potentially encoding in a 5' to 3' direction for proteins of 5 kDa (ORF 1; hydrophobic protein), 62 kDa (ORF 2; heat shock protein 70 homolog, HSP70h), 59 kDa (ORF 3; protein of unknown function), 9 kDa (ORF 4; protein of unknown function), 28.5 kDa (ORF 5; coat protein, CP), 53 kDa (ORF 6; coat protein minor, CPm), and 26.5 kDa (ORF 7; protein of unknown function). Pairwise comparisons of CYSDV RNA 2-encoded proteins (HSP70h, p59 and CPm) among the closteroviruses showed that CYSDV is closely related to LIYV. Phylogenetic analysis based on the amino acid sequence of the HSP70h, indicated that CYSDV clusters with other members of the genus Crinivirus, and it is related to Little cherry virus-1 (LChV-1), but is distinct from the aphid- or mealybug-transmitted closteroviruses.
Analysis of yeast prp20 mutations and functional complementation by the human homologue RCC1, a protein involved in the control of chromosome condensation.

PubMed

Fleischmann, M; Clark, M W; Forrester, W; Wickens, M; Nishimoto, T; Aebi, M

1991-07-01

Mutations in the PRP20 gene of yeast show a pleiotropic phenotype, in which both mRNA metabolism and nuclear structure are affected. srm1 mutants, defective in the same gene, influence the signal transduction pathway for the pheromone response. The yeast PRP20/SRM1 protein is highly homologous to the RCC1 protein of man, hamster and frog. In mammalian cells, this protein is a negative regulator for initiation of chromosome condensation. We report the analysis of two, independently isolated, recessive temperature-sensitive prp20 mutants. They have identical G to A transitions, leading to the alteration of a highly conserved glycine residue to glutamic acid. By immunofluorescence microscopy the PRP20 protein was localized in the nucleus. Expression of the RCC1 protein can complement the temperature-sensitive phenotype of prp20 mutants, demonstrating the functional similarity of the yeast and mammalian proteins.
